Data Driving The Future of Cars: Data Science Innovations in the Automotive Industry

February 5, 2014 Adam Bloom

According to leading automotive industry research, there are now over one billion cars, trucks, and buses in operation around the world.

In today’s world of the Internet of Things (IoT), this means a lot of data. And data means opportunity.

Today’s “connected car” is the poster child for gaining insight, innovation, and economic growth from data science and new sources of big data about consumers. Recently, we interviewed two members of the Pivotal Data Science team, Ian Huston and Noelle Sio, who contributed to this article. They explained the importance of the connected car, introduced the innovative work Pivotal has done in the sector, and described how that work applies to other industries. At the time, they were preparing a presentation with colleague Alex Kagoshima, “Driving the Future of Smart Cities – How to Beat the Traffic,” to be given at Strata next week.

The Connected Car as Poster Child for Data Science Innovation

Cars and trucks are great examples of data science innovation because many industries have an economic relationship with the automotive industry. The Auto Alliance and Center for Automotive Research both highlight auto manufacturing’s impact on many other industries as a buyer of materials and services, including significant spending on research and development.

Cars are also a great opportunity for data science innovation because of their close ties to consumer spending. Beyond spending on oil and gas, there is significant spending on financing and insurance. Consumers also do a lot of other things in their cars: find directions, report accidents, pick up food, take vacations, transport retail purchases, avoid traffic, pay tolls, find parking, listen to music, check the weather, talk to people, and more. Each of these behaviors generates a major stream of information that multiple industries can use to innovate.

Lastly, in today’s connected vehicle, big data and data science projects are already underway in areas like in-car telematics, navigation, social media, ecommerce, entertainment, and communication. For example, car companies want to leverage data from as many as 50 separate microprocessors to optimize fuel efficiency, perform predictive car maintenance, and better understand the entire driving experience. Insurance companies want to monitor behavior to adjust premiums. Real-time traffic information from satellites, cell phones, GPS instruments, and social media is being used to improve navigation and optimize routes for drivers, companies, and governments.

Examples of Data Science Applications in the Automotive Industry

In two of Pivotal’s recent automotive projects, the team addressed several auto manufacturer goals by looking at traffic data sets. One company wanted to use GPS data to identify important patterns around speeding, determine why the behavior was happening, and see how the vehicle could help the driver make smarter decisions. In another effort, the data science team needed to provide predictions for the smart city and smart vehicle of the future—answering how long an accident and related traffic would take to clear so vehicle routes could be optimized in real-time, perhaps avoiding catastrophes like the Snowpocalypse that took place last week in Atlanta, Georgia.

As with other data science problems, they began by looking at the data sets to identify how to access, explore, analyze, and visualize the data. As they circulated the problems within the cross-disciplinary team at Pivotal Data Labs, they realized that similar problems had been solved in bioinformatics. To attack these problems, they began placing various separate data sets together in one place, much like a data lake, so that the data could easily be “mashed up.” Then, they combined different machine learning methods to run in parallel. In the first problem, they performed an in-depth analysis of velocity and traffic signal patterns over time. In the other case, they combined location data with reference information from local transport authorities about previous incidents, along with local weather conditions and crowd-sourced social media data.
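To make the “mash up” idea concrete, here is a minimal sketch of joining incident records with weather observations on a shared hourly key. It is an illustration only, not the Pivotal team’s method: the field names, the `attach_weather` helper, and every record are made up for the example.

```python
from datetime import datetime

# All records below are made-up illustrations, not data from the projects.
incidents = [
    {"id": 1, "segment": "A1", "start": datetime(2014, 1, 6, 14, 20)},
    {"id": 2, "segment": "B7", "start": datetime(2014, 1, 7, 9, 5)},
    {"id": 3, "segment": "A1", "start": datetime(2014, 1, 8, 18, 45)},
]

# Hypothetical hourly weather observations, keyed by the start of the hour.
weather_by_hour = {
    datetime(2014, 1, 6, 14): "snow",
    datetime(2014, 1, 7, 9): "clear",
}


def attach_weather(incident_rows, weather_lookup):
    """Return copies of the incident rows with a 'weather' field added."""
    combined = []
    for row in incident_rows:
        # Truncate the incident start time to the hour so it matches the lookup key.
        hour = row["start"].replace(minute=0, second=0, microsecond=0)
        # Incidents with no matching observation are marked "unknown".
        combined.append({**row, "weather": weather_lookup.get(hour, "unknown")})
    return combined


combined = attach_weather(incidents, weather_by_hour)
```

Once the sources share a key like this, additional feeds such as social media reports could be attached the same way before any modeling step.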

These analyses were used to help:

  • understand driving behaviors like what types of roads are used, how many turns are taken each day, and when drivers sit in traffic or move,
  • provide a more accurate and detailed view of rush hour by time and road segment,
  • predict how long incidents would take to clear,
  • optimize directions and reduce travel delays in real-time, and
  • optimize traffic light patterns to reduce congestion.
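As a rough illustration of the rush hour view by time and road segment, the sketch below groups GPS speed readings by segment and hour and flags slow slots. The readings, the function names, and the 25 km/h threshold are all assumptions made for this example, not details from the projects.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical GPS readings: (road_segment_id, hour_of_day, speed_kmh).
readings = [
    ("A1", 8, 22.0),
    ("A1", 8, 18.0),
    ("A1", 13, 54.0),
    ("B7", 8, 41.0),
    ("B7", 17, 15.0),
    ("B7", 17, 25.0),
]


def mean_speed_by_segment_hour(rows):
    """Group speeds by (segment, hour) and return the mean of each group."""
    groups = defaultdict(list)
    for segment, hour, speed in rows:
        groups[(segment, hour)].append(speed)
    return {key: mean(speeds) for key, speeds in groups.items()}


def congested_slots(rows, threshold_kmh=25.0):
    """Return the (segment, hour) slots whose mean speed is below the threshold."""
    averages = mean_speed_by_segment_hour(rows)
    return sorted(key for key, avg in averages.items() if avg < threshold_kmh)


slow = congested_slots(readings)
```

A fixed threshold is the simplest possible choice; a real analysis would more likely compare each slot against that segment’s own free-flow speed.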

Ultimately, these types of data science insights can be woven together to inform car design, predict wear and tear, and improve fuel efficiency.

Applying Automotive Data Science to Other Industries

The underlying “Internet of Things” model opens the door for data science innovation in virtually every industry. GE is a prime example of the IoT model across several B2B industries. The IoT model can be applied anywhere digital or physical sensors track usage information and place it on the network for storage and analysis, and EVERY industry’s products and services can include digital or physical sensors to track usage.

To initiate data science projects like these auto-industry examples, there are two questions to start asking: “What do we want to learn about our customers by gaining a deeper understanding of their use of our product or service?” and “Where will this information lead to capabilities that improve the user experience and benefit our company?”

With these types of goals in mind, data science can then be applied to help:

  • determine how sets of sensor data can be combined with other data,
  • identify patterns, relationships, and groups of use or usage events,
  • use information to react in real-time or to predict probable behaviors, and
  • optimize activities that impact costs and revenue.


To learn more about the Pivotal Data Labs team mentioned in this article, contact us or download our white papers:

  • The Eight-Fold Path of Data Science
  • Disruptive Data Science – Transforming Your Company into a Data Science-Driven Enterprise

Join us in Santa Clara, CA on February 11-13, 2014 for Strata 2014. You can use Pivotal20 as a 20% discount code when registering, and here is where you can find us:

  • Pivotal will be hosting booth #201 as an Elite sponsor.
  • Data Science Keynote @9AM on Thursday, 02/13/2014 by one of Pivotal’s top data scientists, Kaushik Das. The presentation will showcase data science in practice and build a digital brain. Examples will include collecting big data from a large number of sensors, storing information in a Data Lake, and applying data science methods with sophisticated machine learning and statistics to find value in the data.
  • Panel on: The Business Data Lake: An Evolution in Data Infrastructure @10:40AM on Thursday, 02/13/2014. The panel will be moderated by Jeffrey Kelly of the Wikibon Project and include the Chief Data Officer at NYSE, Group Director of Pivotal Technologies at Capgemini, and Director of Information & Analysis at Kaiser Permanente. They will discuss how organizations are moving beyond rigid, high-latency data warehouse environments to more flexible and cost-effective Data Lakes: centrally managed repositories that use low-cost technologies such as Hadoop, SQL, In-Memory, and others to land any and all data that might be valuable for analysis and for operationalizing that insight.
  • Presentation: Driving the Future of Smart Cities: How to Beat the Traffic @4:00PM on Thursday 02/13/2014. Three experts from Pivotal’s Data Science Team will share several innovative methods and predictive algorithms, demonstrate how they combined various machine learning methods and data sources, and show the results from analyzing traffic data via in-car sources like GPS as well as local weather, and more.
