10 Skills of Effective Data Scientists

September 3, 2013 Paul M. Davis

Icon by George Montana Harkin via the Noun Project.


The field of data science is relatively new, and only a small but growing number of academic programs have begun trying to establish a standard curriculum for it. As a result, both its definition and the roles it encompasses remain up for debate. Broad definitions abound, but the specific skills that successful practitioners rely on are harder to pin down. Data science clearly involves statistical modeling and computer engineering, but what precisely does that entail? A recent post at Data Science Central by Mitchell A. Sanders presents an in-depth survey of the skills shared by effective data scientists in the industry.

Sanders segments the activities of a data scientist into three discrete categories:

  • Capture
    • Programming and Database Skills
    • Business Domain Expertise and Knowledge
    • Data Modeling, Warehouse, and Unstructured Data Skills
  • Analyze
    • Statistical Tool Skills
    • Math Skills
  • Present
    • Visualization Tool Skills
    • Storytelling Skills

Sanders goes further in his breakdown, acknowledging the diversity of opinions in the field by citing other takes on the topic. Some of the other lists recognize personality traits that are just as important as technical skill and expertise, such as curiosity, innovative thinking, and the indefinable yet useful quality of intuition.

At Data Community DC, NASA research scientist Oscar Olmedo diagrams the process of data science as a pyramid that shares much with Sanders’ outline, though it divides the tasks differently.

Olmedo echoes others in the field by emphasizing that a data scientist must be able to pose the right question. This ability draws on many of the skills Sanders cites in his post, such as Domain Expertise and Knowledge and Data Modeling Skills, as well as innovative thinking and intuition. Olmedo writes:

Probably the most difficult part of beginning a data science project is knowing what questions to ask of data. For example, let us say we had a set of data with X number of parameters. We are not just blindly going to start analyzing the data; we must first ask a question of the data. Furthermore, we may not even necessarily have the data, or any data, at hand at the time of asking. But once we know the question, we can move to the bottommost tier of the pyramid.

Though there are plenty of perspectives on the skill sets of effective data scientists, most have much in common and differ largely in which skills they prioritize and which ancillary traits they consider desirable.

What do you think are the most important skills and traits for data scientists?
