March 19, 2019


AT POSTGRESCONF   |   NEW YORK   |   MAR 18-22, 2019

Greenplum Summit 2019 has passed. Review the presentations here, and mark your calendar for next year's event:

Greenplum Summit at PostgresConf 2020
March 23-27, 2020
New York City

Greenplum Summit is where decision makers, data scientists, analysts, DBAs, and developers will meet to discuss, share, and shape the future of advanced data technologies.

Greenplum Database® is “Massively Parallel Postgres” for analytics, machine learning, and AI. The open source Greenplum database is designed to run on any platform, including on-premises, public and private clouds, and containers. It provides powerful, rapid analytics on petabyte-scale data volumes. PostgresConf is a community-driven conference series delivering the largest education and advocacy platform in the Postgres ecosystem. Over 800 attendees are anticipated at PostgresConf 2019, and we want you to be part of it.

Pivotal, as a Diamond Sponsor, has partnered with PostgresConf 2019 to present Greenplum Summit, a full-day event dedicated to Greenplum Database. At Greenplum Summit you’ll examine customer case studies, develop new skills through in-depth tutorials, share emerging best practices in Postgres-based data analytics, and envision the future of data technology while networking with your peers.


Greenplum Summit is where the best minds in massively parallel processing (MPP), open source, Postgres, and advanced analytics/AI come together. At this event you will:

  • Understand how MPP data platforms and powerful ANSI SQL analytics are driving business transformation
  • Find new ways to derive more business value from your data assets by applying machine learning, graph, geospatial, text, and other advanced analytics to your use cases
  • Learn how to take new data analytics and AI projects from experimentation to an operational business solution
  • Discover training and career opportunities for database and analytics professionals
  • Meet face-to-face with the developers and executives driving Greenplum Database innovation

Stay connected

Join the conversation on Twitter! Follow @PivotalData and @Greenplum, and track #ScaleMatters for all the news and updates.


Greenplum Summit and PostgresConf 2019 are taking place at the Sheraton New York Times Square. Located in the center of everything, the Sheraton is close to many of New York’s most famous landmarks. Discounted hotel reservations are available.

Monday, March 18 @ PostgresConf

Accelerated, Hands-on Greenplum Training Course
Petabyte Scale Data Warehousing with Open Source Greenplum Database
Marshall Presser, Greenplum Fellow and Author of Data Warehousing with Greenplum
Andreas Scherbaum, Principal Software Engineer, Pivotal
Craig Sylvester, Data Engineer, Pivotal


It's more than just storing and retrieving data. Equally important are loading high-volume data in parallel and running analytics in the database. This hands-on session will lead you through the entire process of creating, loading, and analyzing data in the Greenplum MPP database. It's PostgreSQL, but bigger and built for data warehousing.

By the end of this workshop, attendees will have learned modern data warehousing techniques on a PostgreSQL-based massively parallel processing platform: the basic architecture of Greenplum Database, the parallel techniques for loading, querying, and analyzing structured and semi-structured data, and the tools Greenplum provides for running analytics in the database.

Workshop Agenda:

  1. Introduction to MPP and Greenplum
  2. Distribution -- a key to good performance in Greenplum
  3. Parallel loading -- loading multiple terabytes per hour
  4. Loading from S3 and external connectivity
  5. Polymorphic storage and external partitions
  6. Compare external tables to Foreign Data Wrappers
  7. Partitioning vs. Distribution -- how they interact
  8. Difference between PG and GP partitions
  9. Query response time exercises
  10. Running Analytics in Greenplum: MADlib exercise
  11. Analyzing Free Form Text with SOLR and GPText
  12. Monitoring and Managing Greenplum with Command Center
  13. Managing Concurrency with Resource groups and Workload Manager
  14. Running PL/Python and PL/R as Trusted Languages with PL/Container
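A key theme of the agenda is distribution (item 2): Greenplum hashes each row's distribution key to decide which segment stores it, so an evenly distributed key avoids data skew. The mechanism can be sketched in plain Python (a toy illustration with a hypothetical segment count; Greenplum's actual hash function and catalogs differ):

```python
# Toy illustration of hash distribution across MPP segments.
# Greenplum assigns each row to a segment by hashing its
# distribution key; an evenly distributed key spreads the load.
import hashlib

NUM_SEGMENTS = 4  # hypothetical cluster size for this sketch

def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution-key value to a segment id (0..num_segments-1)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % num_segments

rows = [f"customer-{i}" for i in range(1000)]
counts = [0] * NUM_SEGMENTS
for key in rows:
    counts[segment_for(key)] += 1

# A well-chosen key places roughly 250 of the 1000 rows on each segment.
print(counts)
```

With a skewed key (say, a column that is mostly one value), the counts would pile up on a single segment, which is why choosing the distribution key is called out as a key to good performance.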

Prerequisites: a laptop with a modern browser and an SSH client (instructions on using SSH on Windows will be provided); basic knowledge of SQL.

Users will connect to a cloud-based Greenplum cluster.

There will be a maximum of 25 attendees.

Suggested pre-work: videos on the YouTube channel

  • GP Database basics
  • GP & analytics
  • GP & MADlib


Tuesday, March 19 @ Greenplum Summit

Pivotal Greenplum: Postgres-Based. Multi-Cloud. Built for Analytics & AI
Keaton Adams, Advisory Data Engineer, Pivotal


Welcome to Greenplum Summit 2019! We are excited to come together once again to share insights and updates on the latest advances to the world's leading fully featured, multi-cloud, open source, Postgres-based, massively parallel analytical database. In the presentations and technical deep dives in this year's lineup, you will discover just how far Greenplum has progressed in integrated analytics, system configuration and monitoring, and ease of deployment, along with advances in industry-leading performance, all delivered by an energetic team focused on open source innovation.

This session gives an overview of Pivotal Greenplum, a platform engineered to analyze data at speed and scale. It provides the flexibility customers require to integrate a wide variety of data sets, protects and isolates workloads with the latest container technology, and performs advanced analytics using integrated tools such as MADlib, GPText, and PostGIS, along with a host of well-known procedural languages. All of this is accomplished through familiar tools and features that Postgres architects, administrators, and end users will quickly adopt to bring powerful analytics and insights to the organizations they serve.



The Present and Future of Greenplum, a Massively Parallel Postgres Database
Ivan Novick, Product Manager, Greenplum Database, Pivotal


Greenplum Database is at the forefront of global R&D for large-scale big data and analytics use cases. In this session, we will outline the new capabilities and power in Greenplum Database Version 6, as well as summarize the ongoing engineering work in progress including Postgres merging, analytics in a post-Hadoop world, GPU acceleration, high concurrency mixed workloads, Apache Kafka integration, elasticity, disaster recovery and backup, and manageability at scale.



AI on Greenplum Using Apache MADlib and MADlib Flow
Frank McQuillan, Director of Product Management, Pivotal
Sridhar Paladugu, Advisory Data Engineer, Pivotal


Advanced analytics and machine learning are rapidly growing in importance in enterprise computing. Key enterprise data typically resides in relational form, and it is inefficient to copy data between systems to perform analytical operations.

In addition to leveraging the rich set of Postgres analytics like window functions, Greenplum offers machine learning, graph analytics, statistics, and data transformations via the mature Apache MADlib open source project. These capabilities are all Postgres-compatible but designed for massively parallel use cases.
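As a flavor of what MADlib parallelizes: its linear-regression module fits ordinary least squares across the cluster's segments. The underlying statistic, shown here for the single-feature case in plain Python, is a sketch of the math only, not MADlib's SQL API:

```python
# Simple (single-feature) ordinary least squares: the statistic that an
# in-database linear-regression routine computes, here in its plainest form.
def ols_fit(xs, ys):
    """Return (slope, intercept) minimizing the squared error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]   # roughly y = 2x + 1
slope, intercept = ols_fit(xs, ys)
print(round(slope, 2), round(intercept, 2))
# → 1.95 1.15
```

The sums in `cov` and `var` are exactly the kind of aggregate an MPP database can compute per segment and then combine, which is what makes regression at scale a natural fit for Greenplum.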

When it comes to production, modern enterprise AI deployments are ecosystems of machine learning solutions that tightly integrate a feedback loop, triggering automated updates to the underlying algorithms and thus creating closed-loop machine learning systems. MADlib Flow is designed for containerized deployment of AI pipelines to Kubernetes, where Cloud Foundry with Postgres plays a key role in low-latency prediction.

In this session, we will give an overview of Apache MADlib on Greenplum and MADlib Flow. Topics will include scalability results, roadmap, and an example of a real-time financial transaction fraud prevention system that continuously learns new threat signatures.



A Modern Interface for Data Science on Postgres and Greenplum
Scott Hajek, Senior Data Scientist, Pivotal


Data scientists today expect to work with tools that have good abstractions and interfaces. Pure SQL is not the best interface for data science, but the power and scale of SQL-based systems can be beneficial. This talk introduces a modern interface for Postgres and Greenplum that appeals to data scientists.

The importance of good abstractions and interfaces can be seen in the dominance of R, Python, and PySpark in the data science field and in the similarity between their notions of dataframes. Data scientists do not relish the thought of writing SQL strings by hand. Nor, for that matter, do application developers, which is why they prefer object-relational mappers like ActiveRecord and Django's ORM. In addition to the cognitive benefits of abstraction, such frameworks cut out error-prone manual steps, avoid dangerous string formatting, and enable more robust testing.

So why wouldn’t a data scientist just avoid SQL-based platforms altogether? Relational databases such as Postgres offer rich analytical abilities and stability, and their MPP variants offer massive scale in storage and distributed processing. Data scientists would value the ability to harness the scale of such systems while having nice abstractions to work with.

Ibis offers Python-focused data scientists the best of both worlds. It is a framework for specifying queries and transformations with deferred execution on big data platforms, and it looks and feels similar to DataFrame-based tools like pandas and PySpark. Lazy execution with client-side error checking makes certain mistakes fail fast, and it encourages delegating all processing to the database. Ibis already supports Postgres and thus already works with much of Greenplum's functionality; minor extensions can add the functionality that is specific to GPDB. Ibis supports several other pluggable backends, so code written for Postgres/Greenplum can easily be run against other systems like BigQuery, HDFS, and Impala.
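The deferred-execution pattern described above can be illustrated with a toy expression builder that records operations and only produces SQL when asked. This is a simplified sketch of the pattern, not Ibis's actual API:

```python
# Toy deferred expression: build a query description lazily and
# compile it to SQL only when the result is requested.
class Table:
    def __init__(self, name, filters=None, columns=None):
        self.name = name
        self.filters = filters or []
        self.columns = columns

    def filter(self, predicate: str) -> "Table":
        # Each operation returns a new expression; nothing runs yet.
        return Table(self.name, self.filters + [predicate], self.columns)

    def select(self, *cols: str) -> "Table":
        return Table(self.name, self.filters, list(cols))

    def compile(self) -> str:
        # Only here does the recorded expression become SQL.
        cols = ", ".join(self.columns) if self.columns else "*"
        sql = f"SELECT {cols} FROM {self.name}"
        if self.filters:
            sql += " WHERE " + " AND ".join(self.filters)
        return sql

expr = Table("events").filter("amount > 100").select("user_id", "amount")
print(expr.compile())
# → SELECT user_id, amount FROM events WHERE amount > 100
```

Because nothing executes until `compile()` (or, in Ibis, until the expression is evaluated), the whole query can be pushed down to Postgres or Greenplum rather than pulling data to the client.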



Distributing Big Astronomical Catalogs with Greenplum
Pilar de Teodoro, Database Expert and Software Engineer, European Space Agency Science Data Centre


When there is no option to continue scaling up resources, you need to scale out. At the European Space Agency (ESA) Science Data Centre (ESDC), we anticipate that the archive data stored in our databases will grow by about 50 TB over two years. The current technology, vanilla PostgreSQL, will not be enough. To fulfill the user requirements of the different missions with such large amounts of data, distributed databases will be necessary. After testing other flavors of distributed PostgreSQL, such as Citus and Postgres-XL, we investigated the parallel commercial DBMS Greenplum. This talk describes a number of tests performed with some big ESA astronomical catalogs, such as Gaia (1.6B rows) and Euclid (2.7B rows), with the aim of checking the feasibility of the solution.





Greenplum and Kafka: Real time Streaming to Greenplum
Sharath Punreddy, Solutions Architect, Pivotal
Niranjan Sarvi, Sr. Data Solutions Architect, Pivotal


Real-time actionable insights have become vital for business success, and Apache Kafka is the de facto standard for near-real-time data integration at high data volumes. The Greenplum-Kafka connector is a high-speed, parallel data-transfer utility from Kafka to Greenplum. In this session, we will demonstrate real-time streaming using the connector and cover its operationalization and features.



How Baker Hughes, a GE Company, Migrated Its Data Lake to AWS and Greenplum
Jayaraman Thiagarajan, Senior Director, Data & Analytics, Baker Hughes
Venkat Gullapalli, VP Service Delivery, Wissen Infotech


Baker Hughes, a leading oil & gas company, established its big data presence by building a mission-critical data lake on AWS, consolidating and migrating enterprise data from more than 45 sources. ERP and non-ERP data flows into Greenplum Database on AWS, amounting to a petabyte-scale storage volume in a highly complex computing environment.

The challenges included ingesting a high volume of enterprise data (8.5 billion records) into the Greenplum database and creating analytical and consumption layers so business users can consume business-critical information through the Tableau visualization tool. The tech stack includes the Greenplum, HVR, Talend, and Tableau ecosystems. The ONE DataLake on AWS is a major strategic foundation for all of BHGE's future initiatives in the data & analytics space, making the best use of AWS and Greenplum's MPP capabilities.



Greenplum Expert Panel: Greenplum Operations at Scale
Ailun Qin, VP, Morgan Stanley
Dmitriy Pavlov, Product Owner, Arenadata
Eran Shaked, R&D DBA Team Lead, Verint-Systems
Scott Smith, Vice President, Data Warehouse, Conversant Media
Moderator: Greg Chase, Greenplum Business Development, Pivotal


This panel will feature four leaders from organizations whose operations teams manage Greenplum Database at large scale for production, business-critical use cases. We will dig into the pressing issues operations leaders face as they work to keep their deployments stable and orderly.



Bringing Cloud Databases On-Premises with Greenplum and Kubernetes
Oz Basarir, Staff Product Manager, Pivotal


This session will showcase how customers are using Greenplum on Kubernetes. We will start with an introduction to the product and the various partners and components that make up the ecosystem of AI, BI, ETL, data preparation and data science tools. Then, we will explain how customers can develop data-driven smart apps using this platform and operationalize AI. Finally, we will provide technical details of customer use cases.





Building Models Quickly: Addressing Housing Overflow at Purdue
Ian Pytlarz, Senior Data Scientist, Purdue University


With enrollment growing more quickly than our ability to house students, temporary housing (the kind seen in Buzzfeed articles and typical of early-year university housing) was set to grow. To reduce the need for this sub-optimal housing, Purdue set about modeling housing-contract follow-through to identify students who had no intention of showing up on campus, reassigning their housing slots to students in temporary housing, all before anyone arrived on campus.



Greenplum and the Power of The Cloud - The Marketplace Offerings across AWS, Azure, and GCP
Jon Roberts, Principal Engineer, Data Innovation Lab, Pivotal


Learn about the Pivotal Greenplum cloud marketplace products, as well as their unique, cloud-only benefits.

  • Deployment Demo
  • Use Cases
  • Cloud Features



Achieve Extreme Simplicity and Superior Price/Performance with Greenplum Building Blocks
Derek Comingore, Manager, Data Engineering, Pivotal


Appliances have been the enterprise standard for running data warehousing systems for decades. The driving force behind the appliance model's massive adoption has been simplicity, for which enterprise customers sacrificed both flexibility and openness. Pivotal has designed Greenplum Building Blocks (GBB), an open, modern reference architecture that combines aspects of the traditional appliance model with highly sought flexibility.

Advanced commodity hardware, including NVMe technology, is leveraged to achieve rapid analytics and artificial intelligence. The entire GBB stack is driven by our open source massively parallel Postgres offering, Greenplum Database. In this session, Pivotal will provide an introductory overview and a demonstration of the GBB platform.



Maximize Greenplum for Any Use Case: Decoupling Compute and Storage
Shivram Mani, Principal Sr Engineering Manager, Pivotal and Francisco Guerrero, Software Engineer, Pivotal


Traditional data warehouses are deployed with dedicated on-premises compute and storage. As a result, compute and storage must be scaled together, and clusters must stay on at all times to keep data available. In the cloud, compute and storage can be decoupled by taking advantage of on-demand infrastructure: Greenplum in Kubernetes scales compute horizontally, while S3 and Azure provide the storage layer. The two can then be scaled separately to suit data engineers' needs, separating data processing from storage.

In this presentation, we will demonstrate the ability to decouple compute and storage in the cloud using Greenplum and Platform Extension Framework (PXF). Deploying a Greenplum cluster in Kubernetes will give us an elastic MPP database engine. Moreover, PXF will allow us to access data residing in multiple clouds. As a result, we expect increased resource utilization and flexibility, while lowering infrastructure costs.



Networking Happy Hour

Wednesday, March 20 @ PostgresConf

We’ve highlighted the Greenplum and Pivotal sessions being presented at PostgresConf below. Visit the PostgresConf website to see the complete agenda for March 20-22.


Scale Matters: Massively Parallel Postgres
Jacque Istok, Head of Data, Pivotal
Ivan Novick, Product Manager, Greenplum Database, Pivotal


More than 2.5 quintillion bytes of data are created each and every day—and at that rate: Scale Matters. Database workloads at scale are driving some of the most impactful use cases in the world, helping to solve both industry and government’s most interesting problems. Join two of Pivotal’s data thought leaders to hear about how to solve these problems leveraging Postgres at scale, and learn what’s next when it comes to creating a modern data ecosystem for a cloud native world.



Learn how Dell Computer Improved Postgres/Greenplum Performance 20x with a Database Proxy
Erik Brandsberg, CTO, Heimdall Data, Inc.
Zack Odom, Field Engineer, Pivotal


Learn the techniques by which Heimdall's Database Proxy improved throughput and performance:

  1. Batch DML operations: intelligently process singleton operations as micro-batches.
  2. Intelligently route diverse workloads (i.e. OLTP, OLAP) to Postgres, utilizing Postgres's latest features, such as materialized views for analytic purposes.
  3. Auto-cache into GemFire or other cache engines for improved response times without code changes.
  4. Easy connection pooling.
  5. Automatic write-master failover.

We will be showcasing these features with a demo and follow-up Q/A discussion.
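The micro-batching technique in item 1 can be sketched generically: buffer singleton INSERTs and flush them as one multi-row statement, cutting per-statement round-trip overhead. This is an illustration of the general idea, not Heimdall's implementation:

```python
# Generic micro-batching: collect singleton INSERT values and flush
# them as one multi-row statement, reducing per-statement overhead.
class MicroBatcher:
    def __init__(self, table, batch_size=3):
        self.table = table
        self.batch_size = batch_size
        self.pending = []
        self.flushed = []          # statements "sent" to the database

    def insert(self, row):
        self.pending.append(row)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self.pending:
            return
        values = ", ".join(str(tuple(r)) for r in self.pending)
        self.flushed.append(f"INSERT INTO {self.table} VALUES {values}")
        self.pending = []

batcher = MicroBatcher("events")
for i in range(7):
    batcher.insert((i, i * 10))
batcher.flush()                    # flush the final partial batch

print(len(batcher.flushed))        # 7 singleton inserts became 3 statements
```

A real proxy would also flush on a timer so rows are never delayed indefinitely, and would route the batched statements over pooled connections (items 4 and 5 in the list above).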



Achieving Business Results under Stress/Strain of Government Regulation
Brian Doyle, Vice President and Senior Engineer, Goldman Sachs
Shaun Litt, Vice President of Data Warehousing, Conversant Media
John Knapp, Advisory Data Engineer, Federal, Pivotal
Moderator: Greg Chase, Business Development, Pivotal


Some business verticals have been subject to government regulation for a long time: health care, finance, even government itself. These days, if you are storing data about people in PostgreSQL, chances are you are subject to increasing regulations related to protecting people's privacy. In this panel, we'll talk with PostgreSQL professionals who must comply with regulatory requirements in managing their data in PostgreSQL while still meeting their business objectives in a productive and timely fashion.



Bringing DevOps to Data Science: Operationalize AI Leveraging Postgres
Sridhar Paladugu, Advisory Data Engineer, Pivotal


Successful enterprise AI applications in 2019 are ecosystems of machine learning solutions that tightly integrate a feedback loop, triggering automated updates to the underlying algorithms and creating closed-loop machine learning systems.

To efficiently build and scale these systems, enterprises need reliable, highly performant, and extensible data tools, not only to wrangle and prepare complex, disjoint structured and unstructured data, but also to build and deploy machine learning algorithms. The Postgres community's projects are diverse: PostGIS for geospatial analytics, Apache MADlib for Postgres-based machine learning, the massively parallel Postgres analytics engine Greenplum for big data, procedural-language extensions to the Python and R package ecosystems, and now MADlib Flow for containerized deployment of AI pipelines to Kubernetes. Together, they make Postgres one of the most compelling software stacks for enterprise AI available today.

During this 50-minute breakout session, the presenters will:

  • Highlight the advantages of Postgres and Postgres community projects for enterprise AI
  • Recommend a deployment template for closed loop machine learning solutions using Postgres community projects
  • Provide a pre-release preview of the MADlib Flow ML pipeline deployment project
  • Showcase the art of the possible with Postgres as an enterprise AI solution, with a demo of a real-time financial-transaction fraud prevention system (built using Greenplum, MADlib, and Kubernetes) that continuously learns new threat signatures and scales to handle high transaction throughput and low-latency response requirements



Agile Data Science on Greenplum [using Airflow]
Aditya Padhye, Data Engineer, Pivotal
Ambarish Joshi, Senior Data Scientist, Pivotal


In this demo we'll see how to build and manage data science workflows in Greenplum using Airflow. We will also showcase how to quickly iterate on, and continuously improve, data science models that have been deployed.

Technologies used:

  • Greenplum
  • Airflow
  • CircleCI (CI for deployment)
  • Astronomer (hosted Airflow service)


Thursday, March 21 @ PostgresConf

Driving Data Science at Scale Using Postgres and Greenplum with Dataiku
Nicolas Gakrelidz, Technical Alliances Manager, Dataiku


This session will give Data Professionals (Analytics Leaders, Data Engineers, Data Scientists, Data Analysts) a roadmap for navigating the path to Enterprise AI and driving data science at scale using Postgres/Greenplum with Dataiku.

Digital transformation is the operative phrase in strategic plans at enterprises across all industries. Organizations must use data to continuously develop more innovative operations, processes, and products. This means embracing the shift to Enterprise AI, using the power of machine learning to enhance, not replace, humans.

To do so effectively, organizations need to:

  1. Connect Technology and Subject Matter Experts: bring all the people, from business people to analysts to data scientists, together. This happens via horizontal (team-wide) and vertical (cross-team) collaboration.
  2. Embrace Self-Service: Enable self-service analytics by creating the tools for day-to-day analysis and agile use of data.
  3. Operationalize Machine Learning: Get models out of a sandbox environment and into production to deliver real results.
  4. Build For Tomorrow: Deliver short-term projects successfully while driving an enterprise transformation strategy.

Specifically, organizations need to be able to effectively leverage the data in Postgres and Greenplum to drive this enterprise transformation. This session will explain how to make it happen, covering topics including:

  • How data scientists work - processes, tools, languages, data types and more
  • Making Data Teams more productive
  • Defining technical requirements for doing data science at scale
  • Postgres and Greenplum key capabilities
  • Demo: Enterprise AI in action with Dataiku, Postgres and Greenplum
  • How to take advantage of key Postgres and Greenplum features including Apache MADlib, PostGIS, GPText for text analytics, and more


Friday, March 22 @ PostgresConf

Career Fair
Meet our team and learn about open positions

Pilar de Teodoro

Database Expert and Software Engineer, European Space Agency Science Data Centre

Jayaraman Thiagarajan

Senior Director, Data & Analytics, Baker Hughes

Ailun Qin

VP, Morgan Stanley

Dmitriy Pavlov

Product Owner, Arenadata

Eran Shaked

R&D DBA Team Lead, Verint-Systems

Ian Pytlarz

Senior Data Scientist, Purdue University

Shaun Litt

VP, Data Warehousing, Conversant Media

Jacque Istok

Head of Data, Pivotal

Ivan Novick

Product Manager, Greenplum Database, Pivotal

Erik Brandsberg

CTO, Heimdall Data, Inc.

Nicolas Gakrelidz

Technical Alliances Manager, Dataiku


Greenplum Summit is part of the larger PostgresConf 2019 event where you can join the Postgres community to take part in educational sessions, networking opportunities, the sponsor expo, and in-depth pre-conference training.

There is no additional cost to attend Greenplum Summit. Simply purchase a PostgresConf Platinum or Gold pass, or a Day Pass for March 19, and your ticket will give you full access to Greenplum Summit.

Get a 25% discount on PostgresConf registration with promo code 2019_GREENPLUM