This Month in Data Science: October 2014
Data science might be the fastest and most important growth industry in the history of the world.
Every industry is benefiting from data science. Large, medium, and small companies as well as governments and non-profits already show that data science and applied engineering turn into results.
This month shows the growth in the data science field with an influx of research dollars, workforce development, and salary increases. Surveys show companies must use data science to compete. Analysts point out where the biggest trends are. Journalists explore how cell phone data can help prevent Ebola. And we share what is happening with data science at Netflix, Spotify, and McLaren along with data science perspectives on data lakes, churn prediction, real-time mobile video analytics, mobile banking, Apache Spark, and more.
Here are 16 of the top stories from October to keep you abreast of this hyper-growth environment.
The National Science Foundation (NSF) put $31 million towards 17 innovative data science projects to develop the workforce and improve data science tooling. The National Institutes of Health poured $32 million into helping researchers analyze and use big biology data.
Forbes covers the Industrial Internet Insights Report for 2015, by Accenture and General Electric, which highlights how the worlds of IoT, big data, and data science overlap. The study also covers competition, risks, and priorities for enterprise initiatives in various sectors.
The Economist explores how epidemiologists, scientists who study the patterns, causes, and effects of disease, might use location data from mobile phones. The data can explain where and when people go places, even predicting where humans will spread diseases.
With 33 million subscribers, Netflix has troves of data about our interests, habits, and behaviors regarding media and entertainment. In this article from the Huffington Post, sociology, marketing, and big data are used to build a recipe for hit TV shows.
Running algorithms over one trillion music-related data points, Spotify’s acquisition, “The Echo Nest,” performs text analytics on over 10 million music-related web pages per day. Founded out of the MIT Media Lab, the analytics-driven application tracks our relationship with music.
During a race, sensors transmit performance details, and data scientists then run simulations on the data, analyzing win probabilities under various scenarios to improve real-time decision-making. This process was used to win the Grand Prix and 32 Olympic medals, and it will improve London Heathrow’s on-time arrivals by a whopping 19%.
This Computerworld article explains why companies across industries are moving into big data to stay ahead. The trends include cloud analytics, Apache Hadoop® as OS, data lakes, predictive analytics, SQL on Hadoop, open source NoSQL, deep learning, and in-memory analytics.
Mashable shares how one recruiting company has seen a 300% increase in demand for data scientists and engineers in the past year—ad tech, financial services, ecommerce, and social media lead the way. This post provides a FAQ, covering basics, salaries, hiring tips, and more.
This Month in Pivotal Data Science
The Future Architecture of a Data Lake: In-memory Data Exchange Platform Using Tachyon and Apache Spark
In partnership with the AMPLab at UC Berkeley, Pivotal envisions a future architecture with an in-memory data exchange platform on Tachyon and in-memory compute layer augmented by Apache Spark. This is the journey to one central data repository.
This post explores the importance of two related capabilities, natural language processing and text analytics, and how these computing methods are helping the financial services, insurance, and legal industries today.
Financial services firms are often at the forefront of using data. In this post, two Pivotal data scientists explain how external data can be incorporated into a churn model, describe the development of churn prediction models, and discuss how these models can create value.
Big data analysis and data science are playing an important role in security, risk, fraud, and online banking. This post explains how Pivotal is associating behaviors across help desk tickets and command-line activities to prevent misuse by users with privileged access.
This post outlines the architecture and approach taken by a major sports network and mobile carrier to create better analytics on the big data associated with live sports broadcasts on mobile phones. We explain the architecture of the entire data lifecycle.
Based on Pivotal’s experience developing over 400 mobile applications, we explain the background on why push notifications and analytics are important in mobile banking, give examples of mobile use cases, and outline the approaches and tools needed to succeed.
The concept of the data lake continues to catch on in the industry as a variety of companies flesh out what it means to implement and how it should evolve. In this post, Silicon Angle reports on the way EMC’s bundle will simplify the deployment and scale of Apache Hadoop®.
Because it can run 10 to 100 times faster than Apache Hadoop®, whether on disk or in memory, Spark is gaining momentum. In this video, the top Hadoop distribution vendors, including Pivotal, talk about vision, strategy, and the capabilities of Apache Spark. Use cases and approaches are discussed alongside the location of Spark in solution architectures.