Data science’s critical role within a wide range of industries and areas of research was evident during the month of June, as Spotify announced a new data analytics team, free-to-play games received increased scrutiny, and much more. Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.
With the announcement of Apple Music, the competition among streaming music services heated up this month. Spotify, the current champ in the streaming music space, is doubling down on data science to improve its recommendation engine by acquiring consulting firm Seed Scientific. The goal for the company’s new data analytics team is to analyze audio attributes, metadata, and user behavior to improve its recommendations and better target advertising.
Gamification’s time as the buzzword du jour may have passed, but its underlying principles are informing user interface design across many industries. Ironically, it may have proven most disruptive to the video game industry, where console makers and developers have lost mindshare and gamers to the free-to-play mobile explosion. In this in-depth article for ESPN, Simon Parkin looks at how mobile gaming companies are using psychology and data science to draw gamers into an ongoing feedback loop of monetization.
In an interactive infographic, Eric Roston and Blacki Migliozzi for Bloomberg Business visualize what factors have contributed to global warming, from 1880 to 2014. Using data from NASA’s Goddard Institute for Space Studies, the visualization takes into account the degree to which both natural and industrial factors contribute to global warming.
The difficulty of finding skilled data scientists who hold the diverse skillsets required is well established at this point. This unique mixture of mathematical, coding, and analytical prowess can be imposing to students looking to break into the field. In this article, Datanami speaks to a number of professional data scientists to determine what technical and professional skills would-be practitioners need to develop, as well as which educational paths will pay off for students.
To understand what makes legendary British motorcycle racer John McGuinness so fast, EMC embedded sensors in his suit and bike in 2015 to collect data at a Spanish test circuit. The data was released to the data science community in two competitions, one focused on data analysis and the other on data visualization, to determine what contributes to McGuinness’s unprecedented speed. Data analytics winner Stefan Jol identified how performance in one area of the track impacted McGuinness’s entire race, while visualization winner Charlotte Wickham developed a live visualization of relative performance among racers.
Astronomical data may be the original source of big data, with research revealing a mind-boggling number of celestial bodies. It’s not surprising that the final frontier produced some of the earliest applications for big data technologies and data science. But as Maya Dillon at the Guardian details, even with new technologies and techniques, there remain many data challenges for astronomers as they contend with the vastness of space, including finding effective approaches to visualization, developing efficient algorithms, and introducing machine learning methodologies.
This Month in Pivotal Data Science
Based on a listener suggestion, podcast host Simon Elisha discusses examples of some of the data science Pivotal Labs performs for customers. Sticking to common, universally understandable examples, this podcast covers two use cases in retail and how each either makes money or saves money.
Purdue University has become a leader in using data and data science to increase student success rates, flag issues, and improve teacher effectiveness. With the help of Pivotal Big Data Suite, data mining techniques, and predictive analytics, the university can give students and teachers an early warning system for situations where students might face challenges.
One of the goals for the Spring XD 1.2 release was to obtain baseline performance metrics on a typical cluster of machines and then optimize stream performance where necessary. Spring XD is a unified, distributed, and extensible system for data ingestion, real-time analytics, batch processing, and data export. Our testing drove several optimizations to increase streaming performance. The benchmarks found that a single-threaded Spring XD stream can handle over 2 million 100-byte events per second, using Apache Kafka as a transport.
The Spring XD engineering team has some big announcements regarding Spring XD 1.2 and 1.1.3 along with Flo for Spring XD. Focusing on developer experience and productivity, the new features cover Flo, performance optimization, new sources/processors/sinks/batches, runtime refactoring to act as native apps in Pivotal Cloud Foundry, Apache Ambari installed clusters, resiliency improvements, registry HA support, and improved integration with Pivotal HAWQ, Pivotal GemFire, Pivotal Greenplum Database, Pivotal HD, and Sqoop.
Today, Pivotal announced an exciting acquisition of big data query technology from the University of Wisconsin-Madison. As part of the acquisition, Professor Jignesh Patel will be joining Pivotal, and he starts his tenure here by sharing why this is such a great move for Pivotal customers, the Quickstep technologies, and himself.
After attending the Pivotal Big Data Roadshow in Atlanta, Pivotal’s Stacey Schneider confirms that it is still early days for most companies on their journey to becoming data-savvy technology companies. Many attendees use the roadshow as a free orientation for starting this journey, and they have many questions. Summarizing the three most popular questions from the event, she answers: How can I convince my org to start on big data now? Do I really have to run it? Is a Data Lake really all one big thing?
Pivotal Education makes it easy to fully realize the capabilities of our technologies by offering a series of free training courses. Designed for developers, system architects, and data practitioners, these online courses engage students through a sandbox environment and interactive labs. The introductory courses enable technologists at any point of engagement with Pivotal technologies, whether during evaluation or after deployment, to become better versed, more efficient, and more effective in their efforts. The current classes provide hands-on experience with Pivotal technologies such as Pivotal HD, Pivotal Cloud Foundry, Pivotal Greenplum Database, HAWQ, Redis, and GemFire.
Pivotal Data Events in July
Jul 7, 2015—Melbourne, Australia
Jul 8, 2015—Sydney, Australia
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.
Jul 20, 2015—San Francisco, CA
Pivotal is proud to be a Gold Sponsor of the Data Science Summit 2015. The Summit brings together researchers and data scientists from academia as well as industry to discuss state-of-the-art data science, applied machine learning, and predictive applications.
About the Author