Data science news in May emphasized that simply ingesting and analyzing large datasets is no longer enough. Companies need to be truly data-driven, whether they are enterprises featured at the DataBeat conference or popular startups like the house-sharing service Airbnb. Here’s our monthly roundup of the top data science news, both from Pivotal and from across the industry.
Applying data science to human-generated content remains an imposing Big Data challenge. GigaOm features LEVAN, a machine learning system developed by the Allen Institute for Artificial Intelligence and the University of Washington, which scours text and images on the web to teach itself concepts and their relevant subsets, such as “‘heavyweight boxing,’ ‘boxing ring’ and ‘ali boxing’ which are all part of the larger concept of ‘boxing.’”
Inside BigData points to a recent research paper by Burtch Works that reveals new insights into the salaries and demographic characteristics of data science professionals. The report places the median base salary of individual data scientists at $120,000 and that of managers in data science positions at $160,000.
Upstart Business Journal spotlights Experfy, a “marketplace for ‘data geeks’” that addresses the exploding demand for data scientists. The online job marketplace is focused on professionals who hold this highly specialized and coveted skill set.
The meteoric rise of house-sharing service Airbnb has earned the startup a valuation of $10B. While the appeal of the service for many is its peer-to-peer rental model, behind those friendly faces is a heavily data-driven platform for sharing. VentureBeat speaks with Riley Newman, Airbnb’s head of data science, about how he believes the company’s collective data gives “voice” to its customers.
VentureBeat reflects on its two-day DataBeat conference by noting the prevalent themes from the event. Jordan Novet observes that the conversation surrounding big data has evolved, with companies aiming to reap demonstrable value from their data rather than focusing on the vast amounts they have collected. Other trends Novet identifies include an increased desire among companies for rapid data ingestion and insight, a preference for building on existing tools, and the continued growth of Hadoop.
The Stanford School of Medicine hosted its second annual Big Data in BioMedicine Conference last week. The conference brought together academics, policy makers, and industry leaders to discuss how Big Data analytics are transforming medical health, issues surrounding patient privacy and self-reported medical data, and the implications and potential applications for global health.
Gretchen Gavett at the Harvard Business Review looks at how retailers are using location analytics to map the in-store behavior of customers, using store security camera footage to visualize trends in customer movement within a store. Through such analytics and visualizations, store owners are gaining insight into customers’ shopping patterns and learning how to optimize sales by identifying the most heavily trafficked areas of their stores.
This Month in Pivotal Data Science
Last month’s announcement of the Pivotal Big Data Suite had a major impact on the industry, with its innovative redefinition of the economics of Big Data. This month, VentureBeat featured a video interview from EMC World with Pivotal’s VP of Product Marketing, Todd Paoletti, in which he discusses the economic and technological benefits of Pivotal’s Big Data Suite.
Wouldn’t it be great if there were a way to harness the familiarity and usability of a tool like R, and at the same time take advantage of the performance and scalability benefits of in-database/in-Hadoop computation? Earlier this week, Pivotal announced an R distribution that does just that. PivotalR, a package that translates R code into SQL for processing, is available to download from GitHub today.
In this post, Senior Field Engineer Alfred Domingo shows SQL administrators and developers how easy it is to set up SQL on Hadoop. After providing a quick overview of Pivotal HD and the Pivotal Command Center, he shows us how to use the Pivotal Command Center’s graphical user interface to set up a Hadoop cluster with HAWQ (SQL). He also walks through all the basic steps in the set-up wizard—defining the cluster, versions/services/hosts, topology, configuration, and deployment status.
Upcoming Data Science Events
The 7th Annual Hadoop Summit is the leading conference for the Apache Hadoop community.
Pivotal Open Source Hub Meetup: Develop powerful Big Data Applications easily with Spring XD, New York City – June 4, 2014
Learn how to develop powerful Big Data Applications easily with Spring XD with Mark Pollack, the Spring XD co-lead and Spring Data Lead for Pivotal.
Learn how to harness the power of Hadoop and integrate it into your Data environment. Take a look at some of the key concepts involved; reduce time to insight and build data driven applications that can be deployed on top of your infrastructure or in the cloud. We will walk through a prototype that can help showcase an end to end workflow from data creation, data ingestion to actionable business analytics.
Meet the innovators and thinkers who are building infrastructure to run the applications of the next decade.