The Eight-Fold Path of Data Science

December 2, 2016


I recently spent an evening in the San Francisco Bay Area with the emerging community of data scientists. Our group included students and academics, practitioners with varying levels of experience, and even a smattering of venture capitalists and angel investors, meeting to discuss practical applications of data science with DJ Patil, one of the gurus of this new community.

After years of practitioners, academics and industries applying mathematical models to solve practical problems, big data technologies have created an opportunity for scientists and engineers from many different backgrounds to come together and consider how we create meaning from vast amounts of data.

The Bay Area is a hotbed for data science, but its relevance and impact are global. If the Industrial Revolution strengthened the muscular and skeletal systems of the global economy, the Internet of Things is ready to do the same for the economy’s brain and nervous system. Many smart devices already exist, such as smart energy meters and sensors on car and plane engines. The challenge lies in connecting these devices and the data they produce to accelerate insights and action. We see examples of this in the applications data scientists are already building, such as efforts to make oil drilling platforms smarter so they can catch issues early and activate appropriate shut-off procedures, preventing explosions such as the one that caused the Gulf of Mexico oil spill.

The basic methodology of analytics, such as the Cross-Industry Standard Process for Data Mining (CRISP-DM), remains unchanged. What data scientists have done is build upon that foundation to take into account the increasing complexity of our problems and the growing capabilities of our tools. Annika Jimenez, who leads the Data Science team here at Pivotal, has talked about eight steps of value creation from data in her Disruptive Data Science white paper.

In this paper, I am going to zero in on the data science practices that are part of this process of value creation. The key to a successful data science project is following an eightfold path, consisting of four phases and four differentiating factors.

