September was a big month for data science news, with discussion of how the discipline can be applied to social-good projects and a number of fascinating case studies showing how popular web properties such as Facebook, Uber, and eHarmony are putting it to work. Here’s our roundup of the biggest data science news of the month, both from Pivotal and from the wider industry.
InformationWeek looks at how governments, social-good organizations, and NGOs are using data science to address societal issues, and why large companies should support and encourage such “Data do-gooder” projects.
In a recap of the recent Association for Computing Machinery annual conference, the New York Times details the predictions of leading industry and academic data scientists. Hot topics at the event included the potential of machine learning technologies, the ethical implications of data mining, and applications in the healthcare industry.
In an opinion piece for TechCrunch, C9 CEO Michael Howard draws a distinction between “real data science (and) pseudo science,” and urges investors to be on the alert for “quackery.” A number of companies that claim to perform data science are in fact “just querying subsets of data to deliver limited findings,” Howard states. He lays out three ways to tell whether a company’s offerings are true examples of data science rather than mere hype.
AllFacebook writes about a recent study by the Facebook Data Science Team that tracked the behavior of recently married users to determine the most popular honeymoon destinations. Las Vegas topped the list, along with tropical locales such as Hawaii, Cancún, Jamaica, and Brazil.
Wired profiles the efforts of Dan Zigmond, previously chief data scientist for YouTube and Google Maps, to take a data-driven approach to transforming “the future of food.” Zigmond has brought together biochemists, food scientists, chefs, and data scientists to build a research and development lab for food distribution, preparation, and cooking, with the goal of improving nutrition and reducing food waste worldwide.
Uber has become famous, perhaps infamous, for its aggressive efforts to disrupt the taxi business. According to a recent VentureBeat article, that innovative streak extends to how the company uses data science: it mines its deep stores of user activity data to optimize operations and to predict riders’ likely destinations during peak times.
eHarmony has long guarded the secret sauce behind its matchmaking algorithms, but Jonathan Beber, the company’s senior analyst of US research and development, lifts the veil slightly in a recent interview with CIO magazine. The company has assembled a data science team of three psychologists and three computer scientists who continually refine and improve the effectiveness of eHarmony’s services for users.
This Month in Pivotal Data Science
The number of images and videos people capture with digital cameras is staggering. An estimated 350 million photos are uploaded to Facebook each day on average, and about 100 hours of video are uploaded to YouTube every minute. The result is enormous “image and video lakes,” a term we use to describe these ever-growing collections of images and videos. Such gigantic image lakes are not unique to consumer services; they are also found in domains such as healthcare and astronomy. In this post, we demonstrate how a Content-Based Image Retrieval (CBIR) system can be built easily and efficiently using Apache Hadoop® with HAWQ, our SQL engine for Hadoop®.
Financial firms collect large volumes of data from nearly every aspect of our daily lives. These data assets are used to build predictive models for many purposes, such as understanding and predicting customer behavior. In this blog series, we explain the key factors that enable banks in the retail finance and asset management industries to build and operationalize such models and to derive actionable insight from them.
This post, part 2, continues part 1, in which image processing expert and Pivotal Senior Data Scientist Ailey Crow gives a short introduction to how data science can make image processing better and faster. This approach can have a huge effect on industries ranging from neurobiology and cancer detection to cognitive vision and robotic control. The full pipeline has six steps: 1) image loading, 2) filtering (smoothing), 3) thresholding, 4) cleanup via morphological operations, 5) object recognition using connected components, and 6) counting. Part 2 covers steps three through six.
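To make the six steps concrete, here is a minimal, dependency-free Python sketch of such a pipeline. It is not the method from Ailey Crow’s post; it assumes a grayscale image held as a nested list, and every choice in it (the hypothetical `load_image` loader, a 3x3 mean filter for smoothing, a fixed threshold of 50, a 3x3 square structuring element for morphological opening, and 4-connectivity for component labeling) is an illustrative assumption.

```python
from collections import deque
from typing import Callable, Iterable, List

# Assumption: a grayscale image is a list of rows of floats; a mask is rows of bools.
Image = List[List[float]]
Mask = List[List[bool]]


def load_image(rows: List[List[float]]) -> Image:
    """Step 1 (hypothetical loader): copy nested grayscale values.
    A real pipeline would decode an image file with an imaging library."""
    return [[float(v) for v in row] for row in rows]


def smooth(img: Image) -> Image:
    """Step 2: 3x3 mean filter; neighbours outside the image are skipped."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [img[j][i]
                      for j in range(max(0, y - 1), min(h, y + 2))
                      for i in range(max(0, x - 1), min(w, x + 2))]
            out[y][x] = sum(window) / len(window)
    return out


def threshold(img: Image, t: float) -> Mask:
    """Step 3: a pixel is foreground if its intensity is strictly above t."""
    return [[v > t for v in row] for row in img]


def _morph(mask: Mask, combine: Callable[[Iterable[bool]], bool]) -> Mask:
    """Combine each 3x3 neighbourhood with `combine`; outside pixels count as False."""
    h, w = len(mask), len(mask[0])

    def at(j: int, i: int) -> bool:
        return 0 <= j < h and 0 <= i < w and mask[j][i]

    return [[combine(at(y + dy, x + dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1))
             for x in range(w)] for y in range(h)]


def open_mask(mask: Mask) -> Mask:
    """Step 4: opening = erosion (all) then dilation (any); it deletes
    foreground specks that cannot contain the 3x3 structuring element."""
    return _morph(_morph(mask, all), any)


def label_components(mask: Mask) -> List[List[int]]:
    """Step 5: 4-connected component labeling by breadth-first search.
    Background stays 0; objects are numbered 1, 2, 3, ..."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    next_label = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and labels[y][x] == 0:
                next_label += 1
                labels[y][x] = next_label
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
    return labels


def count_objects(labels: List[List[int]]) -> int:
    """Step 6: the object count is the largest label that was assigned."""
    return max((max(row) for row in labels), default=0)


def count_cells(rows: List[List[float]], t: float = 50.0) -> int:
    """Run steps 1-6 end to end (t = 50.0 is an assumed default)."""
    mask = open_mask(threshold(smooth(load_image(rows)), t))
    return count_objects(label_components(mask))


# Toy 12x16 image: two bright 5x5 "cells" plus one dim speck of noise.
img = [[0] * 16 for _ in range(12)]
for y in range(2, 7):
    for x in list(range(2, 7)) + list(range(9, 14)):
        img[y][x] = 100
img[9][7] = 100
print(count_cells(img))  # prints 2
```

In this toy run, smoothing spreads the isolated speck so thinly that thresholding discards it, while each bright square survives as a single blob. Opening plays the complementary role: it removes small foreground specks that do make it past the threshold, at the cost of slightly rounding object corners.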
Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
About the Author