This Month in Data Science: Data Scientist Salaries, All-Knowing Algorithms, and Airbnb's Data-Driven Success

May 30, 2014 Paul M. Davis

Data science news in May emphasized that simply ingesting and analyzing large datasets is no longer enough. Companies need to be truly data-driven, whether they are enterprises featured at the DataBeat conference or popular startups like the house-sharing service Airbnb. Here’s our roundup of the month’s top data science news, both from Pivotal and from across the industry.

Meet the algorithm that can learn “everything about anything”

Applying data science to human-generated content remains an imposing Big Data challenge. GigaOm features LEVAN, a machine learning system developed by the Allen Institute for Artificial Intelligence and the University of Washington, which scours text and images on the web to teach itself concepts and their relevant subsets, such as “‘heavyweight boxing,’ ‘boxing ring’ and ‘ali boxing’ which are all part of the larger concept of ‘boxing.’”

2014 Data Scientist Salary Survey

Inside BigData points to a recent research paper by Burtch Works that reveals new insights on the salaries and demographic characteristics of data science professionals. The report places the median base salary of individual data scientists at $120,000, with the median base salary for managers in data science positions being $160,000.

Will this Harvard-born startup be the LinkedIn of big data?

Upstart Business Journal spotlights Experfy, a “marketplace for ‘data geeks’” that addresses the exploding demand for data scientists. The online job marketplace focuses on professionals who hold this highly specialized and coveted skillset.

How Airbnb used data to propel its growth to a $10B valuation

The meteoric rise of house-sharing service Airbnb has earned the startup a valuation of $10B. While the appeal of the service for many is its peer-to-peer rental model, behind those friendly faces is a heavily data-driven platform for sharing. VentureBeat speaks with Riley Newman, Airbnb’s head of data science, about how he believes the company’s collective data gives “voice” to its customers.

Here’s the coolest tech we saw at DataBeat 2014

VentureBeat reflects on its two-day DataBeat conference by noting the prevalent themes from the event. Jordan Novet observes that the conversation surrounding big data has evolved, with companies aiming to reap demonstrable value from their data, rather than focusing on the vast amounts they have collected. Other trends Novet identifies include an increased desire for rapid data ingestion and insight among companies, a preference to build on existing tools, and the continued growth of Hadoop.

Big Data in BioMedicine conference focuses on the implications of data for world health

The Stanford School of Medicine hosted its second annual Big Data in BioMedicine Conference last week. The conference brought together academics, policy makers, and industry leaders to discuss how Big Data analytics are transforming medical health, issues surrounding patient privacy and self-reported medical data, and the implications and potential applications for global health.

How Data Visualization Answered One of Retail’s Most Vexing Questions

Gretchen Gavett at the Harvard Business Review looks at how retailers apply location analytics to store security camera footage to map and visualize how customers move through a store. Through such analytics and visualizations, store owners are gaining insight into customers’ shopping patterns and learning how to optimize sales by identifying the most heavily trafficked areas of their stores.

This Month in Pivotal Data Science

Pivotal’s Big Data Suite Offers the Best of EMC and VMware

Last month’s announcement of the Pivotal Big Data Suite had a major impact on the industry, with its innovative redefinition of the economics of Big Data. This month, VentureBeat featured a video interview from EMC World with Pivotal’s VP of Product Marketing, Todd Paoletti, in which he discusses the economic and technological benefits of Pivotal’s Big Data Suite.

Introducing R for Big Data with PivotalR

Wouldn’t it be great if there were a way to harness the familiarity and usability of a tool like R while also taking advantage of the performance and scalability benefits of in-database/in-Hadoop computation? Earlier this week, Pivotal announced an R distribution that does just that. PivotalR, a package that translates R code into SQL for processing, is available to download from GitHub today.
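To make the idea of translating data-frame-style code into SQL more concrete, here is a minimal sketch in Python. It is not PivotalR’s implementation or API: the `LazyTable` class, its methods, and the `trips` table are all hypothetical, and SQLite stands in for the database. The sketch only illustrates the general pattern of recording operations on the client and sending a single SQL statement to the database, so the data itself never leaves it.

```python
import sqlite3


class LazyTable:
    """Hypothetical data-frame-like handle to a database table.

    Operations are recorded, not executed, until an aggregate is requested.
    """

    def __init__(self, conn, name, filters=()):
        self.conn = conn
        self.name = name
        self.filters = tuple(filters)

    def where(self, column, op, value):
        # Record the filter and return a new handle; no query runs yet.
        if op not in {"=", "<", ">", "<=", ">="}:
            raise ValueError(f"unsupported operator: {op}")
        return LazyTable(self.conn, self.name, self.filters + ((column, op, value),))

    def to_sql(self, aggregate, column):
        # Translate the recorded operations into one parameterized SQL statement.
        # Table and column names are interpolated directly for brevity; a real
        # tool would validate or quote these identifiers.
        sql = f"SELECT {aggregate}({column}) FROM {self.name}"
        clause = " AND ".join(f"{c} {op} ?" for c, op, _ in self.filters)
        if clause:
            sql += f" WHERE {clause}"
        params = [value for _, _, value in self.filters]
        return sql, params

    def mean(self, column):
        # The database computes the average; only the result comes back.
        sql, params = self.to_sql("AVG", column)
        return self.conn.execute(sql, params).fetchone()[0]


# Illustrative data only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (city TEXT, nights INTEGER)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("NYC", 2), ("NYC", 4), ("SF", 7)],
)

nyc = LazyTable(conn, "trips").where("city", "=", "NYC")
sql, params = nyc.to_sql("AVG", "nights")
result = nyc.mean("nights")
print(sql, params, result)
```

The design choice worth noting is that `where` returns a new handle rather than fetching rows, so filters can be chained cheaply and the heavy computation stays in the database.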

Transform Your Skills: Simple Steps to Set Up SQL on Hadoop

In this post, Senior Field Engineer Alfred Domingo shows SQL administrators and developers how easy it is to set up SQL on Hadoop. After providing a quick overview of Pivotal HD and the Pivotal Command Center, he shows us how to use the Pivotal Command Center’s graphical user interface to set up a Hadoop cluster with HAWQ (SQL). He also walks through all the basic steps in the set-up wizard—defining the cluster, versions/services/hosts, topology, configuration, and deployment status.

Upcoming Data Science Events

Hadoop Summit, San Jose – June 3–5, 2014

The 7th Annual Hadoop Summit is the leading conference for the Apache Hadoop community.

Pivotal Open Source Hub Meetup: Develop powerful Big Data Applications easily with Spring XD, New York City – June 4, 2014

Learn how to develop powerful Big Data Applications easily with Spring XD with Mark Pollack, the Spring XD co-lead and Spring Data Lead for Pivotal.

Pivotal Open Source Hub Meetup: How to integrate Hadoop with Systems, San Francisco – June 12, 2014

Learn how to harness the power of Hadoop and integrate it into your data environment. Take a look at some of the key concepts involved, reduce time to insight, and build data-driven applications that can be deployed on top of your infrastructure or in the cloud. We will walk through a prototype that showcases an end-to-end workflow, from data creation and ingestion to actionable business analytics.

GigaOm Structure 2014, San Francisco – June 18–19, 2014

Meet the innovators and thinkers who are building infrastructure to run the applications of the next decade.
