Pivotal People–Sarah Aerni on How To Become a Data Scientist

April 23, 2015 Sarah Aerni

Data science continues to grow as a field. In this post, Pivotal Principal Data Scientist Sarah Aerni answers 7 questions. She describes her colleagues on the Pivotal Data Labs team, the type of people Pivotal recruits in data science, and what it's like to work on some of the world's most compelling data science projects.

Over her career, Sarah has been a researcher, consultant, and entrepreneur. She graduated from UCSD with a degree in biology and a specialization in bioinformatics, then completed her master's and PhD in Biomedical Informatics at Stanford.

Why should someone pursue expertise in data science?
Three thoughts come to mind.

One. As a member of one of the largest data science teams, a team with broad industry experience and expertise, I can tell you firsthand that this is an area of tremendous growth. Pivotal is recruiting and has a significant need for more data scientists, because data science is integrated into everything we do at Pivotal: mobile, IoT, cloud, agile, and big data. We get to work on some of the most amazing data science projects for some of the most amazing companies on the planet, and we make a difference.

Two. From a broader perspective, there is a lot of talk about the demand for data science skills in general. Perhaps you've come across articles in trade magazines such as Network World reporting that data scientists get 100 recruiter emails a day. Executive recruiters who research the area say that some data scientists are earning more than doctors or lawyers. Or maybe you saw the McKinsey Global Institute study projecting that, by 2018, the U.S. will face a shortage of 140,000 to 190,000 people with deep analytical expertise, a 50 to 60 percent gap between supply and demand. The need is real, and the work is valuable. Data is only growing, and we need passionate people to mine it for the golden nuggets.

Three. If you are in another field or still in college, keep in mind that data science is not a brand-new field. It is actually a rebranding of a number of different fields that are now enabled by massive advances in technology, and these advances let you do some pretty extraordinary things. When you step back and look at the big picture, it is quite amazing that you can use numbers and math with the right data to predict an outcome, or even change it. Years ago, I think, this was all just speculative. We could imagine building models to predict whether a hospital patient's condition might deteriorate, so that the patient could be transferred to the ICU before it happened, monitored more closely, and spared a catastrophic outcome. Today you can do just that, as described in a Pivotal case study, and it's really just math and data! The work can have a big impact. It calls for innovative, creative thinking in a technical realm. You can also move between industries fairly easily: financial services, healthcare, security, technology, oil and gas, bioinformatics, and others.

Why is data science so popular and why do people get excited about it?
Alongside the changing economics of big data, there have been great recent strides in machine learning, text analytics, natural language processing, and related areas of data science, as well as in open source tools. Together, these advances open up a new realm of finding insights in massive amounts of data. Data science crosses every industry, and its ability to provide value has been clearly proven. Some people enter the field to save or improve lives, some get excited about sales and marketing innovations, and others want to focus on cost savings or process optimization. For example, we've worked on projects covering the potency of vaccines, TV viewer behavior, teen crisis hotlines, financial compliance, churn prediction, and much more. One of the more interesting projects involved car data. We combined surveys, service visits, and sensor-based manufacturing data. With this combined data set, we looked at what happens over a car's lifetime, searching for relationships between manufacturing issues and customer satisfaction in order to improve engineering.

Why is Pivotal a great place for data scientists to work?
Pivotal has world-class technology solutions for big data and data science: Hadoop, Pivotal Greenplum Database, support for Apache Spark™, Pivotal HAWQ (SQL on Hadoop), and MADlib. In addition, Pivotal GemFire has just been open sourced as Project Geode. We also have amazing data science customers and unparalleled opportunities to work on really innovative projects. I would also say our team is awesome. Pivotal Data Science Labs is full of smart data geeks who love sharing ideas and collaborating. Lastly, we are a leader in the space. In fact, our business unit leader spoke at the White House about 18 months ago. It's not just that we're kids in a candy store; we're kids in a chocolate factory (one that won't go wonky) with only the best tools and the best ingredients: the customer data, and the creative people to make the most amazing candy in the world!

What do people need to be good data scientists, particularly at Pivotal?
The number one thing is probably curiosity, and number two is tenacity. To be a good data scientist, I think you need an underlying desire to understand data as deeply as possible and to push through any challenges in understanding it. You might see something weird, and you just have to know why. You have to interrogate your own models to figure out why they worked and, just as importantly, why they didn't. As for tenacity, you can't just say "oh well" and move along. You have to dig in and figure out why your model is not performing well, try multiple angles, and see whether you can discover a better way of doing it. You also need to keep learning new tools in a quickly changing landscape.

Third, I would say you need to communicate and work well with people. For example, it's important to collaborate with team members, incorporate feedback from domain experts, and present findings to executives. Our models often surface big insights, and it's natural for others to want to make sense of them even if they don't understand the math underneath.

Of course, in all this, data scientists should naturally be passionate about technology and about using it to crunch data and numbers. Tenacity comes into play here too. Sometimes you have to come up with clever ways to represent things and really get down to the bits and bytes. You have to be creative and not just say "oh well, there isn't a tool that does it." After all, we are still scientists, searching for answers that haven't been found before to questions that may not have been asked before. If there is an out-of-the-box solution, then it doesn't need a scientist.

What skills and experience have you seen transfer well to data science? In terms of alternate skill sets, what types of backgrounds would Pivotal consider hiring?
We would look at people who do research and analysis with a lot of data, for example in biomedical informatics, electrical engineering, or operations research. People who know basic programming and apply it to large amounts of data can also be quite a good fit. In financial services, you find quantitative analysts, people who specialize in mathematical and statistical methods. These folks are very good as long as they don't stay focused on answering questions in a purely knowledge-driven way; instead, they need to let the data take them places rather than starting out with a fixed question. The same is true of actuaries. People with solid engineering skills and a strong interest in data can also make the move fairly easily, and the same goes for programmers. If you are a computer science engineer who is excited by A/B testing, by how predictive models work, or by how data could change your product, you could probably move into the field.

One question people often ask is whether a PhD is required. It is not.

