This Month in Data Science

July 31, 2014 Paul M. Davis

Data science was a hot topic of discussion in July, with much debate over Facebook’s experiments manipulating users’ emotions through their news feeds. Even so, data scientists enjoyed growing prominence in sales, healthcare, and sports. Here’s our roundup of the month’s top data science news, both from Pivotal and from across the industry.

Three Reasons Your Sales Team Needs Data Science

Writing for VentureBeat, Michael Howard, chief executive of analytics consultancy C9, explains the growing importance of data science in sales. He says that data science can improve clarity in sales communications, reduce the complexity of forecasting, and rein in inconsistency and subjective over-analysis within sales teams.

Why Big Data Isn’t Enough: Tomorrow’s Technology Will Be Built Around Workflows

The healthcare industry’s extensive efforts to collect and store patient records have yet to yield significant improvements in care, argues Acupera Chief Technology Officer Imran Qureshi in Wired. Qureshi uses this as a case study in the limits of warehousing data without extracting actionable insight from it. He explains how integrating data analysis workflows into patient care could improve the entire cycle of treatment, combining medical records, sensor data, and social and behavioral assessments to better inform practitioners’ decisions.

Data Journalism Could Use a Jolt of Data Science, Too

Data journalism has received plenty of attention this year with the launch of data-driven news sites such as FiveThirtyEight and Vox, along with some disappointment at the rigor and depth these sites have delivered. Derrick Harris at GigaOm argues that much of today’s data-driven journalism merely charts numbers from recent reports, and that the breadth of sources and techniques dedicated data scientists bring to the table could add insight to data journalism efforts.

Forbes Names Data Scientist as The Best Job For Work-Life Balance

A recent study by Forbes and Glassdoor declared data scientist to be the best job for work-life balance, based on salary data and employee feedback posted on the site. Reasons cited include the high demand for data scientists, which is leading companies to adopt more flexible schedules and workflows to attract top talent.

Lionel Messi Is Impossible

Following his star turn at the World Cup, Benjamin Morris at FiveThirtyEight takes a deep dive into the play of Argentinian soccer player Lionel Messi, examining the scoring stats behind the vaunted “Messi magic.”

Facebook is Always Trying to Alter People’s Behavior, Says Former Data Scientist

The revelation that Facebook had experimented with its news feed to affect users’ emotions and behavior ignited a firestorm of debate over how technology companies use personal data, and whether the company and its researchers crossed the line. According to former Facebook data scientist Andrew Ledvina, the company’s data team has conducted similar experiments with little oversight. Some academic researchers pushed back against the public outrage, among them the University of Michigan’s Clifford Lampe, who stated, “Facebook deserves a lot of credit for pushing as much research into the public domain as they do.” Late in the month, OkCupid’s blog OkTrends came to Facebook’s defense, declaring that data scientists frequently experiment with user behavior and that this should be expected. For its part, Facebook responded to the uproar with contrition and stated that it is overhauling its internal review process.

Meet Data for Good, the Hacker News for Showing Off Your World-Changing Data Science

VentureBeat reports on Data for Good, a fork of data science news site DataTau focused on data-for-social-good projects and initiatives. The site aims to document and model these projects so that others can replicate them in their own regions and communities.

This Month in Pivotal Data Science

Data Science How To: Massively Parallel, In-Database Image Processing: Part 1

Better, faster image processing can have a huge effect on a number of industries, ranging from neurobiology and cancer detection to cognitive vision and control robotics. In Part 1, image processing expert and Pivotal Senior Data Scientist Ailey Crow gives a short introduction to how this science is applied and then demonstrates six steps of the process.

Pivotal Connects ‘Girls Who Code’ With Data Science and Agile Software Development

On July 18th, 28 high school-aged girls arrived at Pivotal’s San Francisco offices for an immersive introduction to Pivotal’s agile software development and data science practices. As part of the Girls Who Code initiative, participants were given an opportunity to try out Pivotal’s pair programming approach while receiving guidance and mentorship from a number of Pivotal’s expert women developers and data scientists.

Exploratory Data Science: When To Use An MPP Database, SQL on Hadoop or Map Reduce

Members of the Pivotal Data Labs team are often asked what tools and platforms they use to analyze large datasets and build cutting-edge predictive models. In this post, Ian Huston considers the importance of choosing the right platform, with a focus on exploratory data science. The team always wants to use the right tool for the job, which means understanding the data processing needed, the performance requirements, and the budgetary limitations.

How To: 20 Minute Guide to Get Started with PivotalR

In this article, Pivotal engineer and predictive analytics expert Hai Qian explains how someone new to R can get started performing statistical analysis on data stored in Greenplum Database, Pivotal HD, and PostgreSQL in just 20 minutes using PivotalR. After some background on R’s popularity, the article dives into key topics such as installation, data loading, and data manipulation with PivotalR.
