This Month in Data Science

October 31, 2014 Paul M. Davis

This Month in Data Science: October 2014

Data science might be the fastest and most important growth industry in the history of the world.

Every industry is benefiting from data science. Large, medium, and small companies, as well as governments and non-profits, already show that data science and applied engineering turn into results.

This month shows the growth in the data science field with an influx of research dollars, workforce development, and salary increases. Surveys show companies MUST use data science to compete. Analysts point out where the biggest trends are. Journalists explore how cell phone data can help prevent Ebola. We also share what is happening with data science at Netflix, Spotify, and McLaren, along with data science perspectives on data lakes, churn prediction, real-time mobile video analytics, mobile banking, Apache Spark, and more.

Here are 16 of the top stories from October to keep you abreast of this hyper-growth environment.

Research Dollars Flow into Data Science from NSF and NIH

The National Science Foundation (NSF) put $31 million towards 17 innovative data science projects to improve the workforce and data science tooling. The National Institutes of Health (NIH) put $32 million towards helping researchers analyze and use big biology data.

84% Of Enterprises See Big Data Analytics and Data Science Impacting Competition

Forbes covers the Industrial Internet Insights Report for 2015, by Accenture and General Electric, which highlights how the worlds of IoT, big data, and data science overlap. The study also covers competition, risks, and priorities for enterprise initiatives in various sectors.

Using Data Science from Mobile Phones to Combat Ebola

The Economist explores how epidemiologists, scientists who study the patterns, causes, and effects of disease, might use location data from mobile phones. The data can explain where and when people go places, even predicting where humans will spread diseases.

How Netflix Uses Data Science to Produce Hit Shows

With 33 million subscribers, Netflix has troves of data about our interests, habits, and behaviors regarding media and entertainment. This article from the Huffington Post describes how sociology, marketing, and big data are used to build a recipe for hit TV shows.

Spotify’s Secret Weapon—Adding Machine Learning to Our Music

Running algorithms over one trillion music-related data points, Spotify’s acquisition, “The Echo Nest,” performs text analytics on over 10 million music-related web pages per day. Founded out of the MIT Media Lab, the analytics-driven application tracks our relationship with music.

How McLaren Won the Monaco Grand Prix and 32 Olympic Medals

During a race, sensors transmit performance details, and data scientists then run simulations on the data, analyzing win probabilities under various scenarios to improve real-time decision-making. This process was used to win the Grand Prix and 32 Olympic medals, and will improve London Heathrow’s on-time arrivals by a whopping 19%.

8 Big Trends in Big Data Analytics

This Computerworld article explains why companies across industries are moving into big data to stay ahead. The trends include cloud analytics, Apache Hadoop® as OS, data lakes, predictive analytics, SQL on Hadoop, open source NoSQL, deep learning, and in-memory analytics.

300% Increase in Demand for Data Scientists Last Year

Mashable shares how one recruiting company has seen a 300% increase in demand for data scientists and engineers in the past year—ad tech, financial services, ecommerce, and social media lead the way. This post provides a FAQ, covering basics, salaries, hiring tips, and more.

This Month in Pivotal Data Science

The Future Architecture of a Data Lake: In-memory Data Exchange Platform Using Tachyon and Apache Spark

In partnership with the AMPLab at UC Berkeley, Pivotal envisions a future architecture with an in-memory data exchange platform on Tachyon and in-memory compute layer augmented by Apache Spark. This is the journey to one central data repository.

Text Analytics and Natural Language Processing in the Era of Big Data

This post explores the importance of two related capabilities, natural language processing and text analytics, and how these computing methods are helping the financial services, insurance, and legal industries today.

Churn Prediction in Retail Finance and Asset Management (Part 2)

Financial services firms are often at the forefront of using data. In this post, two Pivotal data scientists explain how external data can be incorporated into a churn model, discuss the development of churn prediction models, and describe how these models can create value.

Security Analytics in Action: Use Cases for Deep Monitoring of Privileged Users

Big data analysis and data science are playing an important role in security, risk, fraud, and online banking. This post explains how Pivotal is associating behaviors across help desk tickets and command-line activities to prevent misuse by users with privileged access.

Mobile Video Big Data Architecture with Spring XD/Apache Hadoop®/HAWQ/Redis: Measuring Live Usage

This post outlines the architecture and approach taken by a major sports network and mobile carrier to create better analytics on the big data associated with live sports broadcasts on mobile phones. We explain the architecture of the entire data lifecycle.

Driving Loyalty, Engagement, and Profit in Mobile Banking with Analytics

Based on Pivotal’s experience developing over 400 mobile applications, we explain why push notifications and analytics are important in mobile banking, give examples of mobile use cases, and outline the approaches and tools needed to succeed.

EMC and Pivotal Release Scale Out Bundle for Apache Hadoop® Data Lakes

The concept of the data lake continues to catch on in the industry as a variety of companies flesh out what it means to implement and how it should evolve. In this post, Silicon Angle reports on the way EMC’s bundle will simplify the deployment and scale of Apache Hadoop®.

Top Apache Hadoop® Vendors and Data Scientists Talk Apache Spark

Because it runs 10-100x faster than Apache Hadoop® on disk or in memory, Spark is gaining momentum. In this video, the top Hadoop distribution vendors, including Pivotal, talk about vision, strategy, and the capabilities of Apache Spark. Use cases and approaches are discussed alongside the place of Spark in solution architectures.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
