Dataiku & VMware Tanzu Greenplum Enable Self-Service Analytics and Data Science at Scale

Together, Dataiku and VMware Tanzu Greenplum free data teams to collaborate on petabyte-scale datasets, on-premises and on all major public clouds. Data access and governance are built into a visual development experience, so teams do not have to wait for access to data sources or for DBA assistance.

High-Performance Analytics at Petabyte Scale

Dataiku enables data science and analytics teams to load datasets into VMware Tanzu Greenplum and transform them during data preparation. Analysts and data scientists can then use Greenplum for in-database parallel processing of complex queries, data visualization, and reporting.

Simplify Collaboration across Data Teams

Dataiku and VMware Tanzu Greenplum let cross-functional teams work on the same projects, sharing datasets and code without impacting performance.

Mature Your Data Analytics Operations

Dataiku enables self-service analytics of large datasets stored in VMware Tanzu Greenplum, making it easier and faster to operationalize machine learning models and ensure a tangible business impact. The Dataiku platform also enforces data governance for user roles and teams.

Dataiku: For Everyone in the Data-Powered Organization

Dataiku Overview

Dataiku is the path to Enterprise AI that powers self-service analytics for data analysts, scientists and engineers, while ensuring the operationalization of machine learning. With Dataiku, customers can take predictive analytics and machine learning projects from inception to production at scale to realize tangible business impact.

Integration features

Load and transform petabyte-scale datasets during data preparation, model building, and visualization.

Write code recipes that create datasets from the results of a SQL query on existing SQL datasets.

Invoke in-database machine learning functions in Apache MADlib.

Perform in-database parallel processing of visual recipes and reporting.

Facilitate cross-team collaboration by sharing datasets, code, and best practices from a single repository without impacting performance.

Use the underlying MPP architecture of Greenplum for scalability and performance.
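
As a rough illustration of the second feature above, the sketch below builds the kind of statement that turns a query result into a new table inside Greenplum. It is not Dataiku's generated SQL: the helper names (`quote_ident`, `ctas_statement`) and the `orders` and `customer_spend` tables are hypothetical, and it only assumes Greenplum's `CREATE TABLE ... AS ... DISTRIBUTED BY` syntax.

```python
import re

# Accept only simple identifiers; anything else is rejected rather than escaped.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def quote_ident(name: str) -> str:
    """Validate a simple SQL identifier and return it double-quoted."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f'"{name}"'


def ctas_statement(output_table: str, select_sql: str, distribution_key: str) -> str:
    """Build a CREATE TABLE AS statement with a Greenplum distribution key."""
    return (
        f"CREATE TABLE {quote_ident(output_table)} AS\n"
        f"{select_sql.strip().rstrip(';')}\n"
        f"DISTRIBUTED BY ({quote_ident(distribution_key)});"
    )


# Hypothetical input table and columns, for illustration only.
query = """
SELECT customer_id, SUM(amount) AS total_spend
FROM orders
GROUP BY customer_id
"""
print(ctas_statement("customer_spend", query, "customer_id"))
```

Choosing the distribution key to match the grouping or join column is what lets each Greenplum segment work on its own slice of the data in parallel.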

How it works

Dataiku can be deployed on-premises or in the cloud (e.g., AWS or Azure) and connects via JDBC to VMware Tanzu Greenplum deployments. Dataiku users can then connect to, load, transform, and query tables stored in VMware Tanzu Greenplum.
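
For readers unfamiliar with JDBC, the following minimal sketch shows what a connection string for such a deployment might look like. It assumes the PostgreSQL JDBC driver is used (Greenplum speaks the PostgreSQL wire protocol) and the default port 5432; the function name, host name, and database name are hypothetical, and the page does not say which driver a given Dataiku installation uses.

```python
from urllib.parse import quote, urlencode


def greenplum_jdbc_url(host: str, database: str, port: int = 5432, **params: str) -> str:
    """Build a PostgreSQL-style JDBC URL pointing at a Greenplum coordinator."""
    if not host or not database:
        raise ValueError("host and database are required")
    url = f"jdbc:postgresql://{host}:{port}/{quote(database, safe='')}"
    if params:
        # Sort parameters so the resulting URL is deterministic.
        url += "?" + urlencode(sorted(params.items()))
    return url


# Hypothetical host and database names.
print(greenplum_jdbc_url("gp-coordinator.example.internal", "analytics", ssl="true"))
# → jdbc:postgresql://gp-coordinator.example.internal:5432/analytics?ssl=true
```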

To facilitate visual development, data engineers can create custom SQL recipes in Dataiku that invoke the in-database analytics functions of VMware Tanzu Greenplum, such as those for data preparation and machine learning in Apache MADlib, for geospatial analysis in PostGIS, and for text analytics in GPText. This allows data science teams to use the MPP architecture of VMware Tanzu Greenplum to process terabyte- and petabyte-scale datasets in parallel for faster results.
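
To make the idea of invoking an in-database function concrete, here is a small sketch that assembles a call to MADlib's `linregr_train` function, the kind of statement a custom SQL recipe could contain. The helper names and the `houses` table with its columns are hypothetical; only the `madlib.linregr_train(source_table, out_table, dependent_varname, independent_varname)` call itself comes from MADlib.

```python
def sql_literal(value: str) -> str:
    """Return value as a single-quoted SQL string literal."""
    return "'" + value.replace("'", "''") + "'"


def madlib_linregr_train_sql(source_table, output_table, target, features):
    """Build a SELECT that trains a linear regression with MADlib."""
    if not features:
        raise ValueError("at least one feature column is required")
    # The leading 1 adds an intercept term to the feature array.
    independent = "ARRAY[1, " + ", ".join(features) + "]"
    args = [source_table, output_table, target, independent]
    return "SELECT madlib.linregr_train(" + ", ".join(sql_literal(a) for a in args) + ");"


# Hypothetical table and column names.
print(madlib_linregr_train_sql("houses", "houses_linregr", "price", ["tax", "bath", "size"]))
# → SELECT madlib.linregr_train('houses', 'houses_linregr', 'price', 'ARRAY[1, tax, bath, size]');
```

Because the training runs inside Greenplum, the data stays in the database and only the fitted model table (`houses_linregr` in this sketch) is written out.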
