This Month in Data Science: July 2015

July 31, 2015 Paul M. Davis

In July, education programs for data science gained traction, biologists considered how the big data deluge will affect genomics research, and Airbnb revealed how the company introduced data science operations from the outset.

Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.

Real-time data science: How to embrace this new reality

Large-scale data analysis is transitioning from working with a mass of historical data to working with the ever-growing deluge of data in real time. In this post for TechRepublic, John Weathington considers this transition in the context of the Data, Information, Knowledge, Wisdom (DIKW) pyramid, detailing how businesses can gain insights and profit from real-time data.

DataCamp Gets $1M Seed Round To Develop Data Science Learning Platform

Self-led training programs such as Codecademy have grown increasingly popular in recent years, as coding has become an essential skill for a growing number of careers. DataCamp, a Belgian training startup launched in 2014, aims to do the same for data literacy and data science skills, including R, Apache Hadoop®, and Apache Spark™. The startup received a $1 million seed round in July from investors who see continued and rapid growth in the data science job market, which boasts only an estimated 100,000 practitioners to date.

Taming the ‘Genomical Beast’ with Big Data Resources

The big data explosion is particularly imposing for the biology field of genomics, which expects the amount of genomic data to increase rapidly as the cost of sequencing decreases. A new study published in PLoS Biology compares the computational resources that genomics demands with those of three other significant generators of big data: astronomy, YouTube, and Twitter. The paper details the greater capacity for storage and analysis that the field will require to meet increasing demand in the next decade.

How we scaled data science to all sides of Airbnb over 5 years of hypergrowth

In a blog post at VentureBeat, Airbnb’s first data scientist, Riley Newman, explains how the ascendant peer-to-peer lodging startup introduced data science to the company’s operations from the outset. Five years ago, this required Newman to directly query the MySQL stores of the nascent platform, but as the company and its technology have evolved, so have its data science capabilities, which now extend to every aspect of Airbnb’s operations.

How Data Science Is Fueling Social Entrepreneurship

Social entrepreneurship, a mission-based approach to business that matches business innovation with the desire to do good, has seen significant technology-driven growth in recent years. Increasingly that growth, and the effectiveness of social entrepreneurs’ efforts, are tied to having accurate data and keen insights on the social issues the business aims to address. In an essay at Entrepreneur, Sujan Patel explains that social entrepreneurs armed with the right data are not only more effective in their efforts, but also boast important facts and figures that will engage more individuals with their causes.

Visualizing Data Science Education Programs Across the Globe

The projected growth of data science and the shortfall of skilled practitioners is a well-documented issue, with a McKinsey study estimating that the “United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise.” To respond to this challenge, universities have been quick to add data science curricula to their offerings, with more than 450 certificate, undergraduate, and graduate programs now available. This visualization from Pulse maps these programs around the world, a useful tool for potential students looking for data science programs in their neck of the woods.

This Month in Pivotal Data Science

Discussing Modern Data Architecture

Pivotal enterprise architect Alexey Grishchenko joins the podcast to discuss the evolution from traditional data architectures through advanced, Lambda, and streaming architectures. He also explains how modern data stores like MPP, Apache Hadoop®, and in-memory data stores fit into these modern data architectures.

Using Data Science in Health and Life Sciences

In this episode, Pivotal Perspective’s podcast host Simon Elisha speaks with Sarah Aerni, Pivotal’s principal data scientist who leads our Healthcare and Life Sciences vertical. In the podcast, we explore with Sarah some of the work she’s been doing in healthcare and life science, and how that looks from a data science perspective.

Build Newsletter—Industrial Internet Advances Alongside Big Data

This month’s Build Newsletter highlights how software and data are everywhere by featuring one of the most prolific topics in software development today—the Industrial Internet, also referred to as the Internet of Things (or IoT). We also feature the current impact and methods for big data cleanliness, another hot topic in the past month. Of course, there will be news and commentary on topics we often cover as well—app development, big data, in-memory data grids, open source software, and data science.

UC Berkeley’s AMPLab Drives Big Data Innovation

The massive influx of data, and the role of technologies such as Apache Hadoop®, are well established among enterprises, industries, and government institutions. But ever-increasing amounts of data and new use cases present new challenges and a need for faster and more malleable technologies. UC Berkeley’s AMPLab brings together academics and businesses to engage in dialogue and collaboration to develop the next generation of big data technologies. Pivotal is a sponsor of AMPLab, supporting UC Berkeley’s effort to connect the brightest minds who are innovating within the big data and data science sphere.

Simplifying Data Science Workflows With Pivotal Cloud Foundry

Ian Huston and Alexander Kagoshima of Data Science at Pivotal Labs delivered a presentation at the Cloud Foundry Summit 2015 demonstrating how they have used Pivotal Cloud Foundry to deliver data-driven applications to clients. Data scientists synthesize a wide range of skills in their efforts to understand complex data sets and deliver insights, and Cloud Foundry enables practitioners to quickly get to work, rather than losing time setting up servers or performing operations tasks. During their talk, the pair detailed the ways that Cloud Foundry can simplify data science workflows and deliver insights to users.

Upcoming Data Science Events

Pivotal Big Data Roadshow: Singapore
Aug 4, 2015
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.

Technology in Government
Aug 4 – 5, 2015 • Canberra, Australia
The Technology in Government Summit (TiG) 2015 brings together suppliers of ICT and emerging digital technologies at an innovative, case-study-packed, public-sector-led conference.

VMworld 2015 US
Aug 30 – Sep 3, 2015 • San Francisco, CA
Come meet experts in the Pivotal Booth. This will be four full days of learning—best practices, training, new innovations—all on virtualization and the cloud.

Very Large Data Bases
Aug 31 – Sep 4, 2015 • Hawaii
VLDB is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users.

Editor’s Note: ©2015 Pivotal Software, Inc. All rights reserved. Pivotal, Pivotal Cloud Foundry and HAWQ are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
