Data Science: Neither Elementary Nor Magic

October 18, 2012 Paul M. Davis

Photo by Ian-S on Flickr.

A recent GigaOm post, “Why becoming a data scientist might be easier than you think”, has sparked considerable debate in the community. The headline alone probably tells you why. Derrick Harris points to a handful of programming neophytes who won Kaggle competitions after taking a free online machine learning course on Coursera, the education platform co-founded by Stanford professor Andrew Ng. “Maybe the business world has jumped the gun with all the talk about a looming skills shortage in big data and advanced analytics,” Harris writes. “There’s mounting evidence that it doesn’t take much to turn a novice programmer or statistician into a perfectly capable data scientist.”

Not so fast, argues SocialQ co-founder Joseph Misiti, who points to the range of skills data scientists bring: the ability to code in several languages, familiarity with software such as Hadoop and programming models like MapReduce, a deep understanding of real-world probability and statistics problems, the ability to explain questions and predictions to non-practitioners, and the intuition that comes with experience.

The debate has inspired a long thread on Hacker News, where a consensus emerged: although practitioners come from a wide range of backgrounds, sophisticated statistical analysis, developing predictive algorithms, and dealing with Big Data are not skills you pick up over a few weekends. Experienced practitioners bring a large body of specialized knowledge and research experience.

There’s no doubt value in demystifying the tools and process of doing data science: many people could benefit from a basic understanding of programming and predictive analytics. Resources like Coursera can offer much to practitioners and neophytes alike, both for professional development and for grasping the principles and techniques involved. Still, a wide gap remains between basic literacy and expertise.

