This Month in Data Science

January 30, 2014 Paul M. Davis

As we close out the first month of 2014, we’ve seen a plethora of data-driven innovations and breakthroughs. From cancer to climate research, social networking apps to viral sensations, January has been an exciting month that demonstrates the promise and increasing prominence of data science.

Top Data Science News in January 2014

Why data science matters to Foursquare

Foursquare users are spending 30% more time with the app since its latest update. In this feature, The Guardian explains the key role that Foursquare’s data scientist Blake Shaw played in this process.

Big Data systems are making a difference in the fight against cancer

O’Reilly Radar investigates how Apache Spark and Hadoop have sped up a key processing pipeline for genomics data at UC Berkeley’s AMPlab, accelerating cancer research efforts.

UN’s climate panel report: Humans ‘Dominant Cause’ of Global Warming

An expansive new report from the UN’s climate panel states that scientists are 95% certain that humans are the “dominant cause” of global warming. According to the BBC, “this dense, 36-page document is considered the most comprehensive statement on our understanding of the mechanics of a warming planet.”

Revealing Data Science’s Job Potential

Much has been made of data science’s job prospects and the challenges of finding skilled practitioners. Until now, evidence of this trend has been largely anecdotal. That changes with this survey, recently released by O’Reilly Media and conducted at its Strata Conference during 2012 and 2013. While it represents a select group of respondents attending the tech conferences, the survey reveals intriguing insights on the demographics, salaries, and skills of practitioners in the nascent field.

The data science behind that amazing U.S. dialect quiz

The U.S. dialect quiz, a news app released by The New York Times over the holiday break, quickly went viral and became one of the paper’s most-viewed features of 2013. In this interview with GeekWire Radio, data scientist Scott Golder offers insight on the impressive research undertaken a decade ago by the Harvard Dialect Project, which inspired The New York Times’ viral sensation.

Information Destruction Through History

History is tragically littered with instances of large-scale information destruction, from the burning of the Library of Alexandria to the loss of 400,000 priceless books from the Iraq National Library during the Iraq War. This infographic from Global DataVault visualizes the global impact of these events through history, and approximates how many gigabytes of data were lost from each event.

Data Stories #30: The Information Flaneur w/ Marian Dörk

Data science podcast Data Stories interviews Marian Dörk, Research Professor at the University of Applied Sciences Potsdam, about his “Information Flaneur” approach to data visualization, which is “centered on navigating, exploring, browsing and observing the data with curiosity to learn about what’s there to see and to be surprised by new thoughts and discoveries.”

This Month in Pivotal Data Science

Pivotal Hires Hugh E. Williams To Lead R&D

Pivotal announced the addition of Microsoft and eBay veteran Hugh E. Williams as senior vice president of research and development (R&D). In addition to R&D, Williams will oversee quality assurance, product management, technology operations, and university and technical community relations for the Application and Data Fabric teams.

Pivotal’s Data Science Predictions for 2014

The promise of data science was more celebrated and scrutinized last year than ever before. 2014 is set to be the year that the hype subsides and data science becomes an essential part of business operations. Moreover, it will drive the development of not only apps but also network-connected objects and devices in the next year.

How Data Science Enables a New Way of Thinking for Corporate IT

No longer merely the unexciting but essential back-room plumbing of an enterprise, IT will increasingly take a front and center role in company strategy. Businesses with better, more innovative execution of IT will gain a competitive advantage. As IT operational needs demand increasing complexity, and attacks against the infrastructure grow more sophisticated, a new data science-driven way of thinking will become necessary.

From Data Silos to Data Lakes: Realizing the Accessible Dream


The era of data silos is nearing its end. The ongoing cycle of data science, and the rapid development of applications built upon those models and insights, will not wait for an IT infrastructure that stores critical information in numerous disconnected locations. The speed and scalability of Hadoop have given rise to the concept of the data lake, which is key to Pivotal’s vision of a unified PaaS. In an article at Forbes, Edd Dumbill characterizes the data lake as “a dream” given the current enterprise climate, but one that remains “an accessible dream.”

Time Series Analysis #1: Introduction to Window Functions

Time series data is an ordered sequence of observations of a particular variable. It is found in many real world applications, including click stream processing, financial analysis, and sensor data. Modeling time series data within a database presents a challenge, but fortunately there are tools that can aid in solving many common problems. The first of those tools, and the subject of this article, is the Window Function.
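To give a feel for what a window function computes, here is a minimal sketch in Python rather than in a database. It is an illustration only, not code from the article: the function name `moving_average` and the sample click counts are hypothetical. Each output value is the average of the current observation and up to two preceding ones, which is roughly what a SQL expression such as `AVG(x) OVER (ORDER BY t ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` would return.

```python
from collections import deque


def moving_average(values, window=3):
    """Average each value with up to (window - 1) preceding values.

    Hypothetical helper for illustration; the frame slides over the
    ordered sequence the way a window frame slides over ordered rows.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    frame = deque(maxlen=window)  # holds the rows currently in the frame
    total = 0.0
    result = []
    for value in values:
        if len(frame) == window:
            total -= frame[0]  # the oldest row is about to leave the frame
        frame.append(value)
        total += value
        result.append(total / len(frame))
    return result


# Made-up sample data: clicks observed in four consecutive minutes.
clicks_per_minute = [10, 20, 30, 40]
print(moving_average(clicks_per_minute, window=3))
# prints [10.0, 15.0, 20.0, 30.0]
```

Unlike a plain aggregate, the result keeps one output value per input observation, which is the property that makes window functions useful for time series.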

Upcoming Data Science Events

Storage Solutions for Big Data with Hadoop Architect, Sameer Tiwari

Feb 6, 2014, 5:30pm, Pivotal Labs, San Francisco, CA

There is a plethora of storage solutions for big data, each having its own pros and cons. The objective of this talk is to delve deeper into specific classes of storage types, such as Distributed File Systems, in-memory Key Value Stores, and Big Table Stores, and to provide insights on how to choose the right storage solution for a specific class of problems, for instance running large analytic workloads, iterative machine learning algorithms, or real-time analytics. The talk will cover HDFS, HBase, and Redis, and introduce Software Defined Storage.

Strata Conference: Driving the Future of Smart Cities – How to Beat the Traffic

Feb 13, 2014, 4pm, Santa Clara, CA

As traffic volumes in cities around the world keep growing, we face the challenge of tracking and controlling car movements in a more detailed and intelligent way to beat the traffic. Pivotal’s Data Science Team has developed several innovative methods to analyze traffic flow information harvested from real-time and in-car data sources, including GPS. Pivotal’s Ian Huston, Alexander Kagoshima, and Noelle Sio will describe how they created these algorithms and show a range of interesting results from their application.

GigaOm Structure Data 2014

March 19–20, 2014, New York, NY

The world’s biggest and most innovative companies are using data to make better products, build bigger profits and even change the world. Join 900+ big data practitioners, technologists and executives as they examine how big data can drive business success. From grand new uses to the nuts and bolts of capturing, storing, analyzing and serving it, get the bottom line on big data now.
