This Month in Data Science: Data Scientist Salaries, All-Knowing Algorithms, and Airbnb's Data-Driven Success

May 30, 2014 Paul M. Davis

Data science news in May emphasized that simply ingesting and analyzing large datasets is no longer enough. Companies need to be truly data-driven, whether they are enterprises featured at the DataBeat conference or popular startups like the house-sharing service Airbnb. Here’s our roundup of the month’s top data science news, both from Pivotal and from across the industry.

This Month in Data Science for April 2014

Meet the algorithm that can learn “everything about anything”

Applying data science to human-generated content remains an imposing Big Data challenge. GigaOm features LEVAN, a machine learning system developed by the Allen Institute for Artificial Intelligence and the University of Washington, which scours text and images on the web to teach itself concepts and their relevant subsets, such as “‘heavyweight boxing,’ ‘boxing ring’ and ‘ali boxing’ which are all part of the larger concept of ‘boxing.’”

2014 Data Scientist Salary Survey

Inside BigData points to a recent research paper by Burtch Works that reveals new insights into the salaries and demographic characteristics of data science professionals. The report places the median base salary of individual data scientists at $120,000, and that of managers in data science positions at $160,000.

Will this Harvard-born startup be the LinkedIn of big data?

Upstart Business Journal spotlights Experfy, a “marketplace for ‘data geeks’” that addresses the exploding demand for data scientists. The online job marketplace focuses on professionals who hold this highly specialized and coveted skill set.

How Airbnb used data to propel its growth to a $10B valuation

The meteoric rise of house-sharing service Airbnb has earned the startup a valuation of $10B. While the appeal of the service for many is its peer-to-peer rental model, behind those friendly faces is a heavily data-driven platform for sharing. VentureBeat speaks with Riley Newman, Airbnb’s head of data science, about how he believes the company’s collective data gives “voice” to its customers.

Here’s the coolest tech we saw at DataBeat 2014

VentureBeat reflects on its two-day DataBeat conference by noting the prevalent themes from the event. Jordan Novet observes that the conversation surrounding big data has evolved, with companies aiming to reap demonstrable value from their data rather than focusing on the vast amounts they have collected. Other trends Novet identifies include an increased desire among companies for rapid data ingestion and insight, a preference for building on existing tools, and the continued growth of Hadoop.

Big Data in BioMedicine conference focuses on the implications of data for world health

The Stanford School of Medicine hosted its second annual Big Data in BioMedicine Conference last week. The conference brought together academics, policy makers, and industry leaders to discuss how Big Data analytics are transforming medical health, issues surrounding patient privacy and self-reported medical data, and the implications and potential applications for global health.

How Data Visualization Answered One of Retail’s Most Vexing Questions

Gretchen Gavett at the Harvard Business Review looks at how retailers are using location analytics to map customers’ in-store behavior, drawing on store security camera footage to visualize trends in customer movement. Through such analytics and visualizations, store owners are gaining insight into customers’ shopping patterns and learning how to optimize sales by identifying the most heavily trafficked areas of their stores.

This Month in Pivotal Data Science

Pivotal’s Big Data Suite Offers the Best of EMC and VMware

Last month’s announcement of the Pivotal Big Data Suite had a major impact on the industry, with its innovative redefinition of the economics of Big Data. This month, VentureBeat featured a video interview from EMC World with Pivotal’s VP of Product Marketing, Todd Paoletti, in which he discusses the economic and technological benefits of Pivotal’s Big Data Suite.

Introducing R for Big Data with PivotalR

Wouldn’t it be great if there were a way to harness the familiarity and usability of a tool like R, and at the same time take advantage of the performance and scalability benefits of in-database/in-Hadoop computation? Earlier this week, Pivotal announced an R distribution that does just that. PivotalR, a package that translates R code into SQL for processing, is available to download from GitHub today.

Transform Your Skills: Simple Steps to Set Up SQL on Hadoop

In this post, Senior Field Engineer Alfred Domingo shows SQL administrators and developers how easy it is to set up SQL on Hadoop. After providing a quick overview of Pivotal HD and the Pivotal Command Center, he shows us how to use the Pivotal Command Center’s graphical user interface to set up a Hadoop cluster with HAWQ (SQL). He also walks through all the basic steps in the set-up wizard—defining the cluster, versions/services/hosts, topology, configuration, and deployment status.

Upcoming Data Science Events

Hadoop Summit, San Jose – June 3–5, 2014

The 7th Annual Hadoop Summit is the leading conference for the Apache Hadoop community.

Pivotal Open Source Hub Meetup: Develop powerful Big Data Applications easily with Spring XD, New York City – June 4, 2014

Learn how to develop powerful Big Data Applications easily with Spring XD with Mark Pollack, the Spring XD co-lead and Spring Data Lead for Pivotal.

Pivotal Open Source Hub Meetup: How to integrate Hadoop with Systems, San Francisco – June 12, 2014

Learn how to harness the power of Hadoop and integrate it into your data environment. Take a look at some of the key concepts involved, reduce time to insight, and build data-driven applications that can be deployed on top of your infrastructure or in the cloud. We will walk through a prototype that showcases an end-to-end workflow, from data creation and ingestion to actionable business analytics.

GigaOm Structure 2014, San Francisco – June 18–19, 2014

Meet the innovators and thinkers who are building infrastructure to run the applications of the next decade.
