Data Pipelines

Extract, transform, and load (ETL) data into VMware Tanzu Greenplum. Continuously stream data and events. Replicate data between systems, or migrate data to VMware Tanzu Greenplum.

Attach VMware Tanzu Greenplum to data pipelines for extraction, transformation, and loading (ETL) of data into Greenplum. Integrate data between different systems. Connect to any upstream data source, perform the necessary preparation steps, and load in parallel into a target Greenplum cluster. Capture historical data creation, update, and deletion events (change data capture, CDC) from upstream transactional databases, or replicate the current state of those databases into VMware Tanzu Greenplum. Enable users to run analytical queries on transactional datasets without impacting upstream database performance.
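The extract-transform-load flow above can be sketched as a few composable steps. This is a minimal illustration, not any specific product's API: the function names, the field schema, and the idea of staging CSV for a parallel loader such as gpfdist are all assumptions.

```python
import csv
import io

def extract(source_rows):
    """Pull raw records from an upstream system (stubbed here as a list)."""
    yield from source_rows

def transform(row):
    """Normalize a record before loading: cast the id, tidy the code field."""
    return {"id": int(row["id"]), "code": row["code"].strip().upper()}

def stage_for_load(rows):
    """Write transformed rows to a CSV buffer a parallel loader could serve."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["id", "code"])
    writer.writeheader()
    for row in rows:
        writer.writerow(transform(row))
    return buf.getvalue()

staged = stage_for_load(extract([{"id": "1", "code": " gp "}]))
```

In a real pipeline the staged output would be served to the Greenplum segments in parallel rather than held in memory.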

Extract from any data source

Get data out of disparate systems for analysis in VMware Tanzu Greenplum. ETL frameworks have connectors to many different kinds of data sources, including proprietary and older systems.

Standardize steps for data wrangling

Define pipelines for different steps of data transformation, mapping, cleansing, privacy protection, and augmentation in preparation for loading into VMware Tanzu Greenplum.
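A wrangling pipeline like the one described can be expressed as an ordered list of small functions. The field names, the hash-based masking rule, and the augmentation logic below are illustrative assumptions, not a fixed schema.

```python
import hashlib

def map_fields(rec):
    # Mapping: rename upstream fields to the target schema.
    return {"email": rec.get("Email", ""), "age": rec.get("Age")}

def cleanse(rec):
    # Cleansing: normalize whitespace and case.
    rec["email"] = rec["email"].strip().lower()
    return rec

def mask_pii(rec):
    # Privacy protection: pseudonymize the email with a one-way hash
    # (an assumed rule for illustration) before it leaves the pipeline.
    rec["email"] = hashlib.sha256(rec["email"].encode()).hexdigest()[:12]
    return rec

def augment(rec):
    # Augmentation: derive an analysis-friendly attribute.
    rec["age_band"] = "adult" if (rec["age"] or 0) >= 18 else "minor"
    return rec

def wrangle(rec):
    for step in (map_fields, cleanse, mask_pii, augment):
        rec = step(rec)
    return rec
```

Keeping each step as its own function is what lets the steps be standardized and reused across pipelines.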

Analyze transactional data

Send transactional data to VMware Tanzu Greenplum for reporting and analytical queries to avoid impacting the transaction processing speeds of upstream application databases.

Store entire change histories

Use VMware Tanzu Greenplum scalability to your advantage. Store entire change histories for analysis—even when they greatly exceed the data size of current transactional stores.
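To see why a full change history is more than the current state, consider replaying stored CDC events to reconstruct that state. The event shape (`op`/`key`/`data`) is an assumption for illustration.

```python
def replay(events):
    """Fold a change history into the current state of the table."""
    state = {}
    for ev in events:
        if ev["op"] in ("insert", "update"):
            state[ev["key"]] = ev["data"]
        elif ev["op"] == "delete":
            state.pop(ev["key"], None)
    return state

history = [
    {"op": "insert", "key": 1, "data": {"qty": 5}},
    {"op": "update", "key": 1, "data": {"qty": 7}},
    {"op": "delete", "key": 1, "data": None},
]
```

The current state here is empty, yet the history holds three analyzable events; that asymmetry is why change histories can greatly exceed the size of the transactional store.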

Move data at any speed

Handle different latency requirements—from bulk and batch loading to frequent updates in microbatches to continuous streaming of data. Take advantage of VMware Tanzu Greenplum’s parallel loading to speed data ingestion.
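The middle of that latency spectrum, microbatching, can be sketched as a buffer that flushes on either a row-count or a time threshold. The thresholds and the flush target are illustrative.

```python
import time

class MicroBatcher:
    """Buffer rows and flush when the batch size or time window is reached."""

    def __init__(self, flush, max_rows=3, max_age_s=5.0):
        self.flush, self.max_rows, self.max_age_s = flush, max_rows, max_age_s
        self.rows, self.started = [], time.monotonic()

    def add(self, row):
        if not self.rows:
            self.started = time.monotonic()  # window starts at first row
        self.rows.append(row)
        too_big = len(self.rows) >= self.max_rows
        too_old = time.monotonic() - self.started >= self.max_age_s
        if too_big or too_old:
            self.flush(self.rows)
            self.rows = []

batches = []
b = MicroBatcher(batches.append, max_rows=2)
for r in range(5):
    b.add(r)
```

Tuning `max_rows` and `max_age_s` moves the same pipeline along the spectrum from bulk loading toward near-continuous streaming.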

Leverage Apache Kafka pipelines

Send data event messages via Apache Kafka pipelines to scale to any velocity. Make messages available for consumption by different applications in addition to loading into VMware Tanzu Greenplum.
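A Kafka-style event message is essentially a keyed, serialized payload on a named topic. The sketch below builds such an envelope without any Kafka client; the topic name, key, and payload layout are assumptions.

```python
import json

def make_event(topic, key, payload):
    """Build a keyed event message of the kind a Kafka producer publishes."""
    return {
        "topic": topic,
        "key": str(key).encode(),          # keys route events to partitions
        "value": json.dumps(payload).encode(),
    }

event = make_event("orders", 42, {"order_id": 42, "status": "created"})
```

Because the message is self-describing, the same event can be consumed by several downstream applications in addition to being loaded into Greenplum.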


Attunity

Attunity Replicate empowers organizations to accelerate data replication, ingestion, and streaming across a wide range of heterogeneous databases, data warehouses, and big data platforms.


Confluent

Founded by the team that built Apache Kafka®, Confluent builds a streaming platform that enables companies to easily access data as real-time streams.


Gplink

Gplink makes it possible to create an external table in VMware Tanzu Greenplum that connects to any JDBC connection through a gpfdist process.
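The underlying mechanism is a Greenplum readable external table whose LOCATION points at a gpfdist endpoint. The sketch below generates such DDL as a string; the table name, columns, host, and port are placeholders, and this is not gplink's own code.

```python
def external_table_ddl(table, columns, host, port, path):
    """Render CREATE EXTERNAL TABLE DDL for a gpfdist-served source."""
    cols = ", ".join(f"{name} {typ}" for name, typ in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols})\n"
        f"LOCATION ('gpfdist://{host}:{port}/{path}')\n"
        "FORMAT 'TEXT' (DELIMITER '|');"
    )

ddl = external_table_ddl(
    "ext_orders", [("id", "int"), ("amount", "numeric")],
    "etlhost", 8081, "orders.txt",
)
```

Querying the resulting external table pulls rows through gpfdist in parallel, which is what lets a JDBC source feed Greenplum at load speed.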

HVR Software

HVR is a real-time data integration solution for VMware Tanzu Greenplum with a rich feature set, including log-based change data capture (CDC), bulk loading, and data validation.

IBM InfoSphere

IBM InfoSphere DataStage is an ETL tool and part of the IBM Information Platforms Solutions suite and IBM InfoSphere.


Informatica

Informatica is a leading provider of data integration products for ETL, data masking, data quality, data replication, data virtualization, and master data management.


Outsourcer

Outsourcer automates loading data into VMware Tanzu Greenplum from Oracle and SQL Server.


StreamSets

StreamSets is an open-source, enterprise-grade, continuous big data ingestion infrastructure that accelerates time to analysis by bringing transparency and processing to data in motion.