Mobile Video Big Data Architecture with Spring XD/Hadoop/HAWQ/Redis: Measuring Live Usage

October 13, 2014 Allan Baril

Earlier this year, mobile video usage overtook video usage on personal computers, and research continues to show how important mobile video is to consumers. With these trends, mobile carriers and media companies want to learn more about usage to improve user experience and advertising revenue, especially around big advertising moneymakers like sports.

Earlier this year, a major sports network and mobile carrier approached Pivotal to create a system for measuring cellular data and live video usage through mobile applications. The approach used Spring XD, Pivotal HD, Redis, and Spring Boot to quickly build a big data platform and a set of analytical dashboards. The resulting analytics application is helping business executives from both companies understand how live video is used and what peak data usage looks like. The whole project took only a small team and a few months.

Mobile/Media Big Data Architecture Overview

At a high level, data moves through the pipeline depicted in the diagram below. Raw data is captured from mobile phones in JSON format by a Spring XD cluster, where it is filtered and validated. The data is then stored in an HDFS cluster, where Spring XD batch jobs run SQL reports against the HDFS data through the HAWQ interface and store the calculated reports in Redis. Spring Boot is then used with Angular.js and D3.js to present the analytics to end users.

(Diagram: solution architecture for the mobile video analytics pipeline)

Capturing Raw Data from Mobile Devices in JSON

To produce the raw data, a mobile device is instrumented to capture the number of bytes used while viewing a video. The mobile application sends a “heartbeat” message, formatted in JSON, containing the number of bytes consumed since the last report. Generally, the mobile app sends one of these reports every minute. Since there can be a large number of viewers during a live game, the system receives millions of heartbeat messages per hour. When designing the system, we decided to retain all of the heartbeat data over time, in keeping with the data lake approach, so that teams could use the data for additional analysis in the future. In addition, the customers had a tight timeline: they needed the system in place before the start of the next sports season.
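
For illustration, a heartbeat message along these lines could be modeled as a simple Java class. The article does not show the actual message schema, so the field names below (deviceId, videoId, bytesSinceLastReport, timestamp) and the sample payload are assumptions, not the real format.

    import com.fasterxml.jackson.databind.ObjectMapper;

    // Hypothetical model of a heartbeat message -- the real schema is not shown
    // in the article, so these field names are illustrative assumptions only.
    public class Heartbeat {
        public String deviceId;           // assumed device/session identifier
        public String videoId;            // assumed identifier of the live stream being watched
        public long bytesSinceLastReport; // bytes consumed since the previous heartbeat
        public long timestamp;            // assumed client-side report time (epoch millis)

        public static void main(String[] args) throws Exception {
            // Illustrative payload only; parses the JSON into the model above.
            String json = "{\"deviceId\":\"abc\",\"videoId\":\"game-42\","
                    + "\"bytesSinceLastReport\":1048576,\"timestamp\":1413158400000}";
            Heartbeat hb = new ObjectMapper().readValue(json, Heartbeat.class);
            System.out.println(hb.videoId + " used " + hb.bytesSinceLastReport + " bytes");
        }
    }

Even a small per-message payload adds up quickly at millions of messages per hour, which is why the ingest and storage tiers described next had to scale horizontally.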

Ingesting and Processing Data with Spring XD and HDFS

To handle the data ingestion, the Pivotal team used Spring XD to receive and transform the raw data. The processed data was stored in an Apache Hadoop® Distributed File System (HDFS) cluster as part of Pivotal HD. Spring XD and HDFS provided a flexible, scalable solution for handling the massive amount of incoming JSON heartbeat data, as much as half a terabyte per sports season.

After the data was stored in HDFS, we used Pivotal HAWQ and SQL to generate the necessary reports. Spring XD’s batch job capability was used to execute the report queries after each game finished. The batch jobs produced JSON reports and stored them in Redis, giving the web-based dashboard immediate access. Spring XD and Pivotal HD provided clustering and failover support to ensure business continuity in the event of a server failure.

Spring XD turned out to be a perfect fit because it was designed as a distributed, extensible pipeline for ingesting data, processing it, performing real-time analytics on it, and exporting it. Using Spring XD, we quickly implemented a system to receive the raw heartbeat data from the mobile applications. Spring XD provides several “sources” and “sinks” out of the box, and we were able to adapt the existing HTTP source and HDFS sink to receive and store the raw heartbeat data. We also created a custom “processor” module to filter and validate the raw data, ensuring that it was formatted as expected and that the expected JSON fields held valid values.

Spring XD also provides a mechanism to “tap” data from one stream into another. Using a tap, it was simple to configure a stream that counts the number of heartbeat packets received, which let us confirm the system was working as expected: we could see the amount of live data flowing through the system in real time. Since Spring XD supports a distributed runtime deployment, inbound data would continue to be processed in the event of any server failure.
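
The sketch below shows the kind of plain-Java validation logic such a processor module might delegate to. The article does not include the actual module or stream definitions, so the module name, stream names, and field checks here are assumptions; the Spring Integration wiring that packages this class as a Spring XD processor module is omitted.

    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    // Validation step a custom Spring XD processor module could delegate to.
    // The surrounding stream definitions would look roughly like this
    // (names and options are illustrative):
    //   stream create --name heartbeats \
    //     --definition "http --port=9000 | heartbeat-validator | hdfs" --deploy
    //   stream create --name heartbeat-count \
    //     --definition "tap:stream:heartbeats > counter" --deploy
    public class HeartbeatValidator {

        private final ObjectMapper mapper = new ObjectMapper();

        // Returns true only for payloads that parse as JSON and carry the
        // fields the downstream reports depend on (field names are assumptions).
        public boolean accept(String payload) {
            try {
                JsonNode node = mapper.readTree(payload);
                return node.hasNonNull("deviceId")
                        && node.hasNonNull("videoId")
                        && node.path("bytesSinceLastReport").isNumber();
            } catch (Exception e) {
                return false; // malformed JSON is dropped rather than stored
            }
        }
    }

Keeping the validation as a small, stateless class makes it easy to test in isolation and to reuse in other streams.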

Creating Analytical Reports with Spring XD, HAWQ, Redis, and Spring Boot

With the raw data in HDFS, we used the Pivotal Xtension Framework (PXF) and HAWQ to run SQL queries directly against the JSON data. PXF maps the JSON fields to database columns, so HAWQ can query the JSON files in HDFS as if they were tables. Using SQL, the product manager could then define the report queries.
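
As a rough sketch of what such a report query might look like from Java, the example below runs an aggregate over a PXF-backed external table using Spring’s JdbcTemplate. The table name (heartbeats_ext), column names, and the external-table definition itself are hypothetical; the exact PXF syntax depends on the HAWQ and PXF versions in use.

    import java.util.List;
    import java.util.Map;
    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Illustrative only: 'heartbeats_ext' stands in for a PXF external table
    // that exposes the JSON heartbeat fields as columns.
    public class UsageReportQuery {

        private final JdbcTemplate jdbc;

        public UsageReportQuery(DataSource hawqDataSource) {
            this.jdbc = new JdbcTemplate(hawqDataSource);
        }

        // Total bytes streamed per video for a given game day -- the kind of
        // aggregate a product manager could express directly in SQL.
        public List<Map<String, Object>> bytesPerVideo(String gameDate) {
            String sql =
                "SELECT video_id, SUM(bytes_since_last_report) AS total_bytes " +
                "FROM heartbeats_ext " +
                "WHERE report_date = ? " +
                "GROUP BY video_id " +
                "ORDER BY total_bytes DESC";
            return jdbc.queryForList(sql, gameDate);
        }
    }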

For the next step, we turned to Spring XD’s batch capabilities. Custom “jobs” were configured with the appropriate SQL queries to execute. The jobs are scheduled to run early in the morning after a game has occurred, and the report results are stored in Redis. Redis holds all of the reports, so users can view any of them without waiting for them to be computed on demand. Finally, we implemented a simple REST API with Spring Boot to serve the JSON reports to the user interface.
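
A minimal sketch of that hand-off is shown below: the batch side writes each finished report to Redis as a JSON string, and a Spring Boot REST controller serves it back to the UI. The Redis key format, URL path, and the idea of collapsing both concerns into one class are illustrative assumptions for brevity, not the actual implementation.

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.data.redis.core.StringRedisTemplate;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RequestMapping;
    import org.springframework.web.bind.annotation.RequestMethod;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ReportController {

        private final StringRedisTemplate redis;

        @Autowired
        public ReportController(StringRedisTemplate redis) {
            this.redis = redis;
        }

        // Called after a game's report queries have run; stores the JSON
        // report under an assumed "report:<id>" key.
        public void storeReport(String reportId, String reportJson) {
            redis.opsForValue().set("report:" + reportId, reportJson);
        }

        // Serves a stored report to the Angular.js/D3.js dashboard.
        @RequestMapping(value = "/reports/{reportId}", method = RequestMethod.GET,
                        produces = "application/json")
        public String getReport(@PathVariable String reportId) {
            return redis.opsForValue().get("report:" + reportId);
        }
    }

Because the reports are precomputed and cached as strings, the controller does no work beyond a single Redis lookup per request.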

A web-based dashboard was also built, allowing users to view the weekly reports as soon as they are available. Here, Spring Boot was used with JavaScript libraries such as Angular.js and D3.js. Spring Boot and Spring Security made it quick to secure the dashboard by adding a login page and requiring HTTPS for access.
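
The rough shape of that Spring Security Java configuration is sketched below: every request must be authenticated via a form login, and HTTPS is required. The article does not show the actual configuration, so treat this as an assumption about how it was likely wired.

    import org.springframework.context.annotation.Configuration;
    import org.springframework.security.config.annotation.web.builders.HttpSecurity;
    import org.springframework.security.config.annotation.web.configuration.EnableWebSecurity;
    import org.springframework.security.config.annotation.web.configuration.WebSecurityConfigurerAdapter;

    @Configuration
    @EnableWebSecurity
    public class DashboardSecurityConfig extends WebSecurityConfigurerAdapter {

        @Override
        protected void configure(HttpSecurity http) throws Exception {
            http
                .authorizeRequests().anyRequest().authenticated() // no anonymous access
                    .and()
                .formLogin()                                      // login page for dashboard users
                    .and()
                .requiresChannel().anyRequest().requiresSecure(); // force HTTPS
        }
    }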

Together, Spring XD, Pivotal HD and HAWQ, Redis, and Spring Boot were used to quickly build a big data solution for ingesting, analyzing, and reporting on massive volumes of cellular data usage during live, game-day video. Now, the client can identify trends and optimize media revenue.


Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
