This Month In Data Science

September 30, 2014 Paul M. Davis

This Month in Data Science: September 2014

September was a big month for data science news, with discussion about how the discipline can be applied to social good projects, and a number of fascinating case studies about how it is being leveraged by hot social web properties such as Facebook, Uber, and eHarmony. Here’s our roundup of the biggest data science news of the month, both from Pivotal and the wider industry.

Data Science That Makes a Difference

InformationWeek looks at how governments, social good organizations, and NGOs are utilizing data science to address societal issues, and why such “Data do-gooder” projects should be supported and encouraged by large companies.

Looking to the Future of Data Science

In a recap of the recent Association for Computing Machinery annual conference, the New York Times details the prognostications of leading industry and academic data scientists. Hot topics of discussion at the event were the potential of machine learning technologies, the ethical implications of data mining, applications to the healthcare industry, and more.

Three Marks Of Real Data Science

In an opinion piece for TechCrunch, C9 CEO Michael Howard draws a distinction between “real data science (and) pseudo science,” and urges investors to be on the alert for “quackery.” Many companies that claim to be performing data science are in fact “just querying subsets of data to deliver limited findings,” Howard states. He lays out three ways to tell whether a company’s offerings are true examples of data science rather than mere hype.

JUST MARRIED: Facebook Data Science Team Examines 2014 Honeymoon Check-Ins

AllFacebook writes about a recent study by the Facebook Data Science Team that tracked the behavior of recently married users to determine the top honeymoon destinations for couples. Las Vegas and tropical locales such as Hawaii, Cancún, Jamaica, and Brazil led the list.

Forget GMOs. The Future of Food Is Data—Mountains of It

Wired profiles the efforts of Dan Zigmond, previously chief data scientist for YouTube and Google Maps, to take a data-driven approach to transforming “the future of food.” Zigmond has brought together biochemists, food scientists, chefs, and data scientists to develop a research and development lab for food distribution, preparation, and cooking with the goal of improving nutrition and reducing food waste worldwide.

Uber uses data science to predict where its riders want to go

Uber has become famous, perhaps infamous, for its aggressive efforts to disrupt the taxi business. According to a recent VentureBeat article, that drive to innovate extends to how it leverages data science. The company is using its deep stores of user activity data to optimize operations and predict the likely destinations of riders during peak times.

How eHarmony uses data science for matchmaking

eHarmony has long guarded the secret sauce behind its matchmaking algorithms, but Jonathan Beber, the company’s senior analyst of US research and development, lifts the veil slightly in a recent interview with CIO magazine. The company has assembled a data science team that includes three psychologists and three computer scientists to continually refine and improve the effectiveness of eHarmony’s services for users.

This Month in Pivotal Data Science

Content-Based Image Retrieval using Pivotal HD with HAWQ

The number of images and videos captured by humans using digital cameras is staggering. It has been estimated that an average of 350 million photos are uploaded to Facebook daily, and about 100 hours of video are uploaded to YouTube every minute. This has resulted in enormous “image and video lakes,” a term we use to describe these ever-growing collections of images and videos. Such gigantic image lakes are not unique to consumer services; they are also found in domains such as healthcare and astronomy. In this post, we will demonstrate how a Content-Based Image Retrieval system can be easily and efficiently realized using Apache Hadoop® with HAWQ, our SQL engine for Hadoop®.

Churn Prediction in Retail Finance and Asset Management (Part 1)

Financial firms collect large volumes of data from all realms of our daily lives. These data assets are used to build predictive models for many purposes, such as understanding and predicting customer behavior. In this blog series, we will explain the important factors that enable banks in the retail finance and asset management industries to build, operationalize, and derive actionable insight from such models.

Data Science How To: Massively Parallel, In-Database Image Processing: Part 2

This post continues part 1, in which image processing expert and Pivotal Senior Data Scientist Ailey Crow gives a short introduction to how data science can be applied to better, faster image processing. This approach can have a huge effect on a number of industries ranging from neurobiology and cancer detection to cognitive vision and control robotics. The full workflow has six steps: 1) image loading, 2) filtering (smoothing), 3) thresholding, 4) cleanup via morphological operations, 5) object recognition using connected components, and 6) counting. Part 2 covers steps three through six.
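To give a feel for what the smoothing, thresholding, cleanup, connected-components, and counting steps involve, here is a minimal single-machine sketch in plain Python. It is not the massively parallel, in-database implementation the post describes. The function names, the 3x3 mean filter, the plus-shaped neighborhood used for cleanup, and the choice of 4-connectivity are all assumptions made for illustration.

```python
from collections import deque

def smooth(img):
    """3x3 mean filter; edge pixels average over the neighbors that exist."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            vals = [img[rr][cc]
                    for rr in range(max(0, r - 1), min(h, r + 2))
                    for cc in range(max(0, c - 1), min(w, c + 2))]
            out[r][c] = sum(vals) / len(vals)
    return out

def threshold(img, cutoff):
    """Binary mask: 1 where intensity >= cutoff, else 0."""
    return [[1 if v >= cutoff else 0 for v in row] for row in img]

def _neighborhood(mask, r, c):
    """Values of the pixel and its in-bounds 4-connected neighbors."""
    h, w = len(mask), len(mask[0])
    cells = ((r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
    return [mask[rr][cc] for rr, cc in cells if 0 <= rr < h and 0 <= cc < w]

def opening(mask):
    """Morphological opening (erosion, then dilation) to drop small specks."""
    h, w = len(mask), len(mask[0])
    eroded = [[int(all(_neighborhood(mask, r, c))) for c in range(w)]
              for r in range(h)]
    return [[int(any(_neighborhood(eroded, r, c))) for c in range(w)]
            for r in range(h)]

def count_objects(mask):
    """Count 4-connected components of 1-pixels using a breadth-first fill."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                count += 1
                seen[r][c] = True
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return count

def count_in_image(img, cutoff):
    """Chain the steps: smooth, threshold, clean up, then count objects."""
    return count_objects(opening(threshold(smooth(img), cutoff)))
```

Calling `count_in_image(img, cutoff)` on a list-of-lists grayscale image runs the whole chain. The cleanup step matters because a single noisy pixel that survives thresholding would otherwise be counted as an object of its own.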

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
