In the month of July, education programs for data science gained traction, biologists considered how the big data deluge will impact genomics research, and Airbnb revealed how the company introduced data science operations from the outset.
Here’s our roundup of the biggest data science news of the month, both from Pivotal and beyond.
Large-scale data analysis is transitioning from working with a mass of historical data to working with the ever-growing deluge of data in real time. In this post for TechRepublic, John Weathington considers this transition in the context of the Data, Information, Knowledge, Wisdom (DIKW) pyramid, detailing how businesses can gain insights and profit from real-time data.
Self-led training programs such as Codecademy have grown increasingly popular in recent years, as coding has become an essential skill for an increasing number of careers. DataCamp, a Belgian training startup launched in 2014, aims to do the same for data literacy and data science skills, including R, Apache Hadoop®, and Apache Spark™. The startup received a $1 million seed round in July from investors who see continued and rapid growth in the data science job market, which boasts only an estimated 100,000 practitioners to date.
The big data explosion is particularly imposing for the biology field of genomics, which expects the amount of genomic data to rapidly increase as the cost of sequencing decreases. A new study published in PLoS Biology compares the computational resource demands of genomics with those of three other significant generators of big data: astronomy, YouTube, and Twitter. The paper details the greater capacity for storage and analysis that the field will require to meet increasing demand in the next decade.
In a blog post at VentureBeat, Airbnb’s first data scientist Riley Newman explains how the ascendant peer-to-peer lodging startup introduced data science to the company’s operations from the outset. Five years ago, this required Newman to directly query the MySQL stores of the nascent platform, but as the company and its technology have evolved, so have its data science capabilities, now expanded to every aspect of Airbnb’s operations.
Social entrepreneurship, a mission-based approach to business that matches business innovation with the desire to do good, has seen significant technology-driven growth in recent years. Increasingly that growth, and the effectiveness of social entrepreneurs’ efforts, are tied to having accurate data and keen insights on the social issues the business aims to address. In an essay at Entrepreneur, Sujan Patel explains that social entrepreneurs armed with the right data are not only more effective in their efforts, but also boast important facts and figures that will engage more individuals with their causes.
The projected growth of data science and the shortfall of skilled practitioners are well documented, with a McKinsey study estimating that the “United States alone faces a shortage of 140,000 to 190,000 people with analytical expertise.” To respond to this challenge, universities have been quick to add data science curricula to their offerings, with over 450 certificate, undergraduate, and graduate programs now available. This visualization from Pulse maps these programs around the world, a useful tool for potential students looking for data science programs in their neck of the woods.
This Month in Pivotal Data Science
Pivotal enterprise architect Alexey Grishchenko joins the podcast to discuss the evolution from traditional data architectures through advanced, Lambda, and streaming architectures. He also explains how modern data stores such as MPP databases, Apache Hadoop®, and in-memory data stores fit into these modern data architectures.
In this episode, Pivotal Perspective’s podcast host Simon Elisha speaks with Sarah Aerni, Pivotal’s principal data scientist who leads our Healthcare and Life Sciences vertical. In the podcast, we explore with Sarah some of the work she’s been doing in healthcare and life science, and how that looks from a data science perspective.
This month’s Build Newsletter highlights how software and data are everywhere by featuring one of the most prolific topics in software development today—the Industrial Internet, also referred to as the Internet of Things (or IoT). We also feature the current impact and methods for big data cleanliness, another hot topic in the past month. Of course, there will be news and commentary on topics we often cover as well—app development, big data, in-memory data grids, open source software, and data science.
The massive influx of data and the role of technologies such as Apache Hadoop® are well established among enterprises, industries, and government institutions. But ever-increasing amounts of data and new use cases present new challenges and a need for faster and more malleable technologies. UC Berkeley’s AMPLab brings together academics and businesses to engage in dialogue and collaboration to develop the next generation of big data technologies. Pivotal is a sponsor of AMPLab, supporting UC Berkeley’s effort to connect the brightest minds who are innovating within the big data and data science sphere.
Ian Huston and Alexander Kagoshima of Data Science at Pivotal Labs delivered a presentation at the Cloud Foundry Summit 2015 demonstrating how they have used Pivotal Cloud Foundry to deliver data-driven applications to clients. Data scientists synthesize a wide range of skills in their efforts to understand complex data sets and deliver insights, and Cloud Foundry enables practitioners to quickly get to work, rather than losing time setting up servers or performing operations tasks. During their talk, the pair detailed the ways that Cloud Foundry can simplify data science workflows and deliver insights to users.
Upcoming Data Science Events
Pivotal Big Data Roadshow: Singapore
Aug 4, 2015
Join data technology experts from Pivotal to get the latest perspective on how big data analytics and applications are transforming organizations across industries.
Technology in Government
Aug 4 – 5, 2015 • Canberra, Australia
The Technology in Government Summit (TiG) 2015 brings together suppliers of ICT and emerging digital technologies in an innovative, case-study-packed, public-sector-led conference.
VMworld 2015 US
Aug 30 – Sep 3, 2015 • San Francisco, CA
Come meet experts in the Pivotal Booth. This will be four full days of learning—best practices, training, new innovations—all on virtualization and the cloud.
Very Large Data Bases
Aug 31 – Sep 4, 2015 • Hawaii
VLDB is a premier annual international forum for data management and database researchers, vendors, practitioners, application developers, and users.
Editor’s Note: ©2015 Pivotal Software, Inc. All rights reserved. Pivotal, Pivotal Cloud Foundry and HAWQ are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other countries. Apache, Apache Hadoop, Hadoop, and Apache Spark are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
About the Author