Pivotal People: Henry Saputra, the Engineer Behind Apache Hadoop, YARN and Spark on Cloud Foundry

June 11, 2014 Adam Bloom

What if Apache Hadoop®, YARN, and Spark all had a multi-tenant, elastic model to run better on Cloud Foundry?

It would be pretty powerful, right? Well, that is what Pivotal engineer Henry Saputra is working on. In this Q&A, Henry shares what is going on at the front line of Pivotal’s engineering efforts around open source, big data, and PaaS.

As a short introduction, Henry is an Apache Software Foundation Member, Project Management Committee Member, and committer to several top level Apache projects. He was an active mentor of Apache Spark and is mentoring several ASF incubator projects like MetaModel, Aurora, and Flink. He also has a pedigreed engineering background with work at Siebel, Oracle, Informatica, Yahoo!, Cisco, Jive, Platfora and the OpenSocial Foundation. He also loves playing acoustic guitar! (Please note, we are looking for more experienced open source engineers to help Henry and the team.)

Q: Could you give us a bit about how you grew up, school, activities, interests, and early career?
Sure. I was born and raised in Jakarta, Indonesia. I had three siblings—two brothers and a sister. Growing up was a lot of fun. I really appreciate how my parents provided for us. We had a great childhood. For the most part, I was always interested in math and science. Honestly, I never liked to memorize things or write a lot of words. So, social studies and history were my Achilles’ heel.

After high school in Indonesia, I came to the United States to complete my undergraduate study at the University of Wisconsin-Madison (UW-Madison). Originally, I wanted to be an Industrial Engineer and focus on operations research because I loved math. Looking back, I believed that, someday, I would get paid to do work that I really, truly enjoyed. So many people don’t love what they do, and this was important to me. Later in my studies, I started to create more software programs to solve complex problems in my operations research courses. So, I decided on a double major and earned a Bachelor’s degree in Industrial Engineering and Computer Science.

After graduating, I decided to focus on programming, moved to sunny California, and took a job offer from Siebel Systems. At the time, they were the world’s leading CRM company and the fastest company to reach one billion dollars in revenue. They are now part of Oracle, and I worked at Oracle for a while. Over time, I received a Master of Science from the University of Illinois at Urbana-Champaign (UIUC) via the I2CS program. This is an online program offered to students who were admitted to the Computer Science Master’s program. It allowed me to keep my daytime job and continue my education.

These days, I am happily married with two kids, a three-year-old and a six-year-old. So, life can get hectic. Outside of family time and daytime work, I am actively involved in many open source projects, but I mostly work on Apache Software Foundation (ASF) projects.

Q: How did you end up at Pivotal?
Well, there were really three things that paved the path and one that ignited my passion.

First, my software interests evolved. Earlier in my career, I was really interested in algorithms and systems programming. Then, I started getting more and more interested in building distributed systems and application platforms. With the explosion of data (a.k.a. big data), distributed systems are everywhere, and it is where the most innovative engineering is happening. Distributed systems are the only way to process and make sense of massive data sets efficiently. It is where the future is going, and my interest is really piqued by solving these types of problems.

Second, I started getting exposure to open source contributions while I was working at Oracle. My team focused on the Application Developer Framework (ADF), and we built plumbing code for Oracle app developers to build apps on top of Oracle middleware. At the time, I was working on the ADF Faces project, and it was based on top of the Apache Software Foundation’s (ASF) MyFaces project. We were adding new features and custom data bindings on top of MyFaces. This allowed us to shorten development time instead of building everything ourselves. I was really intrigued by the number of people who contributed fixes and new features to Apache MyFaces. We benefited greatly from the additional tests, and it reduced the number of man-hours it took us to build underlying framework code. From that point, I became an open source evangelist and believe that all companies should both build on top of open source and contribute back. The model works extremely well, and you just cannot deny it after you work with it firsthand.

Third, I really saw how open source was used at a big internet company instead of a traditional software company. After Siebel and Oracle, I went to Informatica for a stint. Then, I was at Yahoo! and actively involved in the Apache Software Foundation as part of the Yahoo! Application Platform (YAP) team. We were a founding organization behind the OpenSocial specification. These specs govern the social APIs that allow websites to embed external content or applications. The specification also has a reference implementation called Apache Shindig, and our team decided to build our social engine based on Apache Shindig. We contributed many patches and new features so that everyone using Apache Shindig would benefit. After a while, I was invited to join the Project Management Committee (PMC) and become a committer for Apache Shindig. This allowed me to influence the direction of the project, and I had the right to commit code directly to the source repository.

What ignited my passion is the formation of a new team under Roman Shaposhnik. A year ago, I was approached by Pivotal, but I was hesitant. Even though Pivotal had awesome, open source projects, there were not many ASF contributions in the area of Apache Hadoop®. So, I punted on the opportunity. Then, Roman Shaposhnik joined Pivotal earlier this year, and, at that point, I could see there was a strong champion for Apache Hadoop® inside Pivotal. It was very clear how the areas of integration and APIs for an open Apache Hadoop® platform could evolve and how it could collaborate nicely with proprietary software. So, I came on board.

Q: In your words, what is your role in this new Pivotal organization?
Well, I am part of a small Apache Hadoop® open platform team. We are providing leadership, innovation, and code contributions to the Apache Hadoop® ecosystem along with other data-related open source projects. Our engineering team is going to develop stuff that impacts big data’s future.

For example, I am going to focus on improving the integration and deployment of the Apache Hadoop® ecosystem, including YARN and Apache Spark. We are also going to collaborate with the Pivotal HD and Pivotal One’s Pivotal HD Service engineering teams to improve the development, deployment, management, and scaling of Apache Hadoop®.

Personally, I believe that open source can co-exist with proprietary software to add value to customers. We are going to add more plug-in endpoints and APIs to the open source projects we use at Pivotal. This way, we are a good citizen of the open source community, and we are still able to develop our own capabilities and compete in the marketplace. Open source can and should provide most of the plumbing and core infrastructure code so we don’t slow down development with a “reinventing the wheel” syndrome. From my viewpoint, this allows for more investment in innovation.

Q: What are the benefits of being a developer at Pivotal?
There are several! As an open source guy with experience at traditional software companies, I was attracted to Pivotal’s healthy mix of great software products AND open source projects. For example, Spring, RabbitMQ, Cloud Foundry, Redis, and MADlib are all very successful open source projects. In addition, Pivotal HD with HAWQ, Pivotal Greenplum, and GemFire are all world-class products with solid track records. This breadth and depth of solutions gives any user a well-rounded set of tools to solve any problem, and provides both variety and a sense of completeness for a developer.

Personally, I am most interested in working with data-related distributed systems like Apache Hadoop® and Apache Spark, which is where my focus is now. Yet, Pivotal’s big data products are part of a larger open source ecosystem, so I have the opportunity to really work on how big data technologies fit easily into a larger cloud-based application environment.

Also, as an open source advocate and a contributor, Pivotal’s open source culture fits perfectly with my vision for how software development should be done by commercial companies. Each day, I get to work in different areas and with different super-talented engineers. As a software developer, this is like being a kid in a toy store.

Lastly, there is the influence of software development practices, like agile programming, from Pivotal Labs. There is a great culture for software development.

If a developer likes awesome projects that are in or around some of the largest open source projects in the world, and is also looking to continue a journey of self-improvement, they would probably like it here at Pivotal.

Q: One of our executives, Scott Yara, said that one of the strategic things we are doing for Pivotal One is to address the developer. What does that mean to your product line?
From the Pivotal HD perspective, it means that we need to develop the Apache Hadoop® platform to easily extend and work well with any other product. Of course, we will make sure the Apache Hadoop® ecosystem works well with Pivotal One, Cloud Foundry, HAWQ, GemFire XD, and other Pivotal products like Spring and RabbitMQ. Developers should be able to use our big data environments to easily configure, deploy, run, scale, and manage their apps.

Q: All Pivotal products are obviously targeting the Cloud Foundry environment. What does this mean for your product areas?
Primarily, this means we will improve Apache Hadoop® distribution, provisioning, and multi-tenancy support. It also means better monitoring and diagnostics in the cloud for Apache Hadoop® environments.

As well, Apache Hadoop® is becoming more and more integrated into Cloud Foundry. Developers will be able to access it as a cloud service—either as a data lake or as a stand-alone, distributed processing engine.

Lastly, we are making MapReduce and other distributed processing via YARN available as a long-running, elastic service for data processing. This means Cloud Foundry application developers can provision, use, and release resources for distributed computing in an extremely efficient manner.

Q: What is the major focus for you and the new Pivotal One platform?
We are going to make the Apache Hadoop® ecosystem on Cloud Foundry more developer friendly, especially in the areas of multi-tenancy and elastic provisioning.

Q: What new Pivotal products are you most excited about working with?
So far, I really like the promise of HAWQ and GemFire XD and how they sit alongside our Apache Hadoop® distribution, Pivotal HD. With support for open standard data formats such as Parquet, customers should have greater assurance—we are not aiming to build proprietary data lock-in with our products.

Also, I am really looking forward to playing more with RabbitMQ. It is amazing to work at the company responsible for RabbitMQ. It is one of the few open source projects I admire outside of ASF projects.

Q: What do you like to do in your personal time when you aren’t living and breathing Pivotal products?
Outside of work, I stay busy with two things: family and open source work at the Apache Software Foundation. These two things basically take 100% of my time, but I am enjoying every second of it.

Q: What is your biggest achievement?
It was amazing to be acknowledged and invited to become a member of the ASF.

Q: What is top on your bucket list of things to do while still on this little rock we call earth?
It is really important to keep teaching my children to be good people. Hopefully, when it is my time to go, they are well equipped to face the real world and make useful contributions to society.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
