A Rough Guide to Data Science

July 9, 2013 Paul M. Davis

Visualization by speedoflife via Flickr.

If Big Data was last year’s buzzword, Data Science may reach the same level of hype this year. There’s no shortage of discussion about the high demand for data scientists, the term’s usefulness as a designation, and even declarations of its “sexiness” as a career. And as with many terms that reach a critical mass on social media, data science is a concept more widely discussed than understood. What is data science? What differentiates the practice to justify this new term? And how does someone become a data scientist?

The definition of data science varies among practitioners, but it is widely understood as the application of statistical analysis and software engineering to transform vast amounts of data into useful insight. Beyond this, the data scientist iterates on models to further explore the questions posed by the data, and then uses techniques such as visualization to communicate the insights and stories revealed by the process.

In a useful new document, “A Practical Introduction to Data Science Skills”, Google’s Michael Manoochehri offers a syllabus for those wanting to learn more about data science, its role in organizations and society, and the common skills, platforms, and frameworks used by practitioners. Manoochehri is the author of the forthcoming book Data Just Right, which aims to clarify the role of big data within the modern enterprise and to explore how organizations can not only adapt to this paradigm shift but also embrace it.

While expert data scientists command numerous mathematical and programming skills, Manoochehri offers some entry points and potential projects for the curious. Many of the “short term skills” he identifies are common among reasonably technical users (proficiency in Python and JavaScript, familiarity with UNIX and SQL), along with data science-specific learning tasks such as gaining a basic understanding of R and running a Hadoop instance locally. While the long-term skills may be more daunting to neophytes, there are plenty of free tools, tutorials, and datasets to learn from, and even entry-level skills can be useful to non-profits and municipalities that lack such expertise.
