All About The New Open Source Greenplum Database

November 11, 2015 Simon Elisha

It is an exciting time for customers using data-related software, with more data being processed and analysed than ever. To do this effectively you need some heavy-duty technology, and one of the best for a long time has been the Pivotal Greenplum Database. Now this database is available as open source, giving customers even more flexibility and enabling a flourishing community of developers and contributors. In this episode, we explore what this means and how you can be a part of it.

TRANSCRIPT

Announcer:
Welcome to the Pivotal Perspectives Podcast, the podcast at the intersection of Agile, Cloud, and Big Data. Stay tuned for regular updates, technical deep dives, architecture discussions, and interviews. Now let’s join Pivotal’s Australia and New Zealand CTO, Simon Elisha, for the Pivotal Perspectives Podcast.
Simon Elisha:
Hello, everyone, and welcome back to the podcast. Thanks for making the time to have a listen.

Coming to you from Auckland, New Zealand. Not my normal base of operations, but I’m out here visiting some customers and some partners, and having a good old time seeing what’s what here in the North Island of New Zealand, which is a very beautiful place if you ever get a chance to come on down.

Today, I wanted to talk to you a little bit about Greenplum, the Greenplum Database, and what is now the world’s first open source MPP data warehouse. What is taking place? Well, we have taken the Greenplum Database technology, which is a massively parallel processing data warehouse, and committed it to open source.

Let’s step back a little bit and just define what it is we’re talking about here. The Greenplum Database is a very well-known database technology. In fact, it was built over the last ten years. It is deployed with many, many customers with some pretty impressive workloads and does a huge amount of work. What it is, is a massively parallel processing, or MPP, database.

This is different from a traditional database in that it spreads the workload across many nodes, or segments, which basically means it slices and dices the work, but provides the end user or application with the same single interface that they are already familiar with.

What does this mean? This means that we can process far greater amounts of data, i.e. terabytes to petabytes, with very quick response times, because what we do with this very efficient and, to be honest, complicated technology is split up the work between different segment nodes that handle components independently. They have their own RAM and their own data, and they are connected over a high-bandwidth network. They basically divide and conquer the problem space, delivering results that are many times faster than pretty much any other technology that you could use.
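
To make the divide-and-conquer idea concrete, here is a minimal Python sketch of the general pattern: rows are spread across segments by a distribution key, each segment computes a partial result over only its own rows, and the partial results are combined into one answer. This is an illustration only and not how Greenplum is implemented. The names `NUM_SEGMENTS`, `distribute`, `segment_partial` and `parallel_average` are hypothetical, and Python threads here simply stand in for independent segment nodes.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SEGMENTS = 4  # hypothetical segment count, chosen only for illustration


def distribute(rows, num_segments):
    """Assign each (key, amount) row to a segment based on its integer key."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[key % num_segments].append((key, amount))
    return segments


def segment_partial(rows):
    """Work one segment does on its own rows: a partial sum and a row count."""
    return sum(amount for _, amount in rows), len(rows)


def parallel_average(rows, num_segments=NUM_SEGMENTS):
    """Scatter the rows, aggregate each segment concurrently, then combine."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(segment_partial, segments))
    total = sum(partial_sum for partial_sum, _ in partials)
    count = sum(partial_count for _, partial_count in partials)
    return total / count if count else None


rows = [(i, i * 10) for i in range(1, 101)]
print(parallel_average(rows))  # prints 505.0
```

The caller only ever sees `parallel_average`, in the same way an application sees a single interface while the work is split up behind it.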

What this means is you can have a really powerful data warehouse capability and data analytics capability in the one place, because Greenplum also supports things like MADlib and other tools to run queries and run data analytics very close to the data itself in a parallel way. Again, the secret of speed in most cases is parallelization. If you can run many small tasks at the same time, you get the outcome you want.

Greenplum, the Greenplum Database, is now being released under the Apache License 2.0 and is now available for you to enjoy, play with, use, experiment with, and explore as you like.

This represents over ten years of development and some two million lines of code, which is pretty exciting. It also includes the next-generation query optimization technology, which has never been available outside of Pivotal’s commercial offerings. That allows you to get unbelievably great performance from your queries. In fact, some big data queries get up to one thousand times the performance. This is the Pivotal Query Optimizer. A huge amount of work has gone into this.

What this means now is that you have access to this technology that you can explore, contribute to, and use, and you can also get commercial support, of course, through Pivotal.

Now, what we’ve done is decided to take a stewardship role of this project, because we believe this is a strategic component of our big data efforts as an overall company. The project is maintained for reuse and collaboration with a broader community. We think that particularly the PostgreSQL community will be really active in this because this is where the Greenplum technology came from. We will continue to sponsor development, maintenance, and innovation for the Greenplum technology itself. This is what we’ve also done with technologies like Pivotal Cloud Foundry®, Spring, RabbitMQ, and Geode.

It’s really exciting. It’s a really exciting milestone because what we’re doing is moving away from the days of closed-source vendor lock-in. We’re not accepting legacy models anymore. We’re not expecting customers to be locked into particular technology decisions. We’re giving them complete choice and complete flexibility.

In fact, at Pivotal we’ve open sourced all of our cloud and data products inside of ten months. That’s a pretty amazing step when you think about it. Some ten million lines of code have moved from the commercial proprietary sphere into a thriving open source ecosystem.

We’re pretty excited about that because it means that customers can build better things. You can be involved. You can drive the project in the direction you want, and you can contribute changes and improvements as you like.

Where do you go to do all this? Well, have a look at Greenplum.org. That’s g-r-e-e-n-p-l-u-m dot org where you can download the source code, you can join the mailing lists, and you can contribute to that, as well. Also, you can find the Twitter handle: @greenplum, on Twitter, obviously, and participate that way, as well.

It’s something to have a look at if you’re in the market for a big data solution with a lot of processing power, one where you want to use SQL and in-memory analytics, and that you want to deploy as a software component in the cloud, on your local premises, or in a virtualized environment. You can make the choice and be part of what is promising to be a very exciting and thriving new community.

That’s a bit of a snapshot of the new Greenplum Database in its open sourced form. Go visit the website and enjoy.

As ever, if there are things you’d like to hear on the podcast, you can make suggestions at podcasts@pivotal.io. Until next time, from beautiful New Zealand, keep on building.

About the Author

Simon Elisha is CTO & Senior Manager of Field Engineering for Australia & New Zealand at Pivotal. With over 24 years of industry experience in everything from mainframes to the latest cloud architectures, Simon brings a refreshing and insightful view of the business value of IT. Passionate about technology, he is a pragmatist who looks for the best solution to the task at hand. He has held roles at EDS, PricewaterhouseCoopers, VERITAS Software, Hitachi Data Systems, Cisco Systems and Amazon Web Services.
