Informatica Joins Data Lake Ecosystem with Capgemini and Pivotal

February 24, 2015 Adam Bloom

For many, the business data lake is a vision. Today, Pivotal and Capgemini are announcing Informatica’s contribution to the data lake ecosystem: certified technologies for Data Integration, Data Quality, and Master Data Management (MDM) to help enterprises distill raw data into actionable insights. Together, we bring our collective people, process, and technology capabilities to help more companies implement data lakes, become data-driven businesses, and better compete in today’s environment.

Informatica, the industry leader in integration and data governance, is now part of the data lake ecosystem that Pivotal and Capgemini established and that Pivotal recently enhanced with the open source moves behind the Pivotal Big Data Suite and the Open Data Platform. The partners have a long history of collaboration and share a common set of goals: helping companies become more data driven, lowering corporate barriers to large-scale adoption, and reducing the cycle time for innovation.

The Motivations and Hurdles of the Data Lake

For both IT and business leaders, the motivation is clear. A key conclusion from a recent Economist Intelligence Unit Report showed “a clear link between financial performance and use of data.” In other words, higher performers were more data driven, and lower performers were less data driven. Every leader today is advocating using more or better data to improve decision-making, and even CEOs are leading the charge on big data, looking for better decision-making and ways to compete. Ultimately, business units want data to flow seamlessly into one shared location where any part of the entire collection of corporate information can be produced, analyzed, and operationalized without delay and at an improved cost.

However, companies and IT departments still face challenges when it comes to advancing their people, process, and technology towards being more data driven. For example, data integration and governance are often cited as unmet needs. Even as these requirements for more organizational control emerge, corporate data volumes continue to grow, and businesses want to use more sources of data. Complexity increases as structured and unstructured data sets are combined and as the portfolio of analytics capabilities expands to include new perspectives and approaches, such as data science and machine learning algorithms. Getting to the data lake isn’t a matter of flipping a single switch; it is a journey.

A Widened Ecosystem—Benefiting Joint Customers

Together, the three partners are helping companies reduce the cost, time, risk, and complexity of data-centric projects and operations while improving how businesses make data-driven decisions. Capgemini has a heritage of delivering in these arenas, and, in the competitive world of tomorrow, business improvements will need to include real-time, predictive analytics and machine learning algorithms on larger data sets. By using the thought leadership, industry-leading technology, reference architectures, and implementation know-how from the three companies, business units can increase the value they deliver to their organizations in shorter cycle times, knowing that a complete toolset is available to build data lakes.

As an industry leader, Informatica is bringing more power to the business data lake ecosystem, which Capgemini and Pivotal announced back in 2013. First, Informatica has been a longtime partner for the Pivotal Greenplum Database, and its technologies are integrated with Pivotal HD and Pivotal HAWQ, part of the Pivotal Big Data Suite and a foundation for data lakes. Informatica products also add data discovery, metadata management, and ingest. Its pre-built connectors and transformations offer effective, easier approaches for accessing sources of data. Its toolsets allow data quality to be managed and improved while end-to-end data lineages are produced and maintained. In addition, master data management capabilities allow data and data relationships to be brought together and managed in one location. Lastly, Informatica’s visual development environments reduce the overall effort required, helping projects deliver as much as five times faster.

Building on Last Week’s Data Lake News

In the midst of this partnership, Pivotal is bringing three new, compelling elements to help companies become more data driven, agile, and innovative:

  1. Open Sourcing the Pivotal Big Data Suite: With companies voicing their preference for open source solutions and ecosystems along with the success of Cloud Foundry, we have decided to release Pivotal GemFire, Greenplum, and HAWQ into the open source world, making data lakes an even closer reality.
  2. The Open Data Platform: Formed in collaboration with 15 industry leaders such as GE, Hortonworks, IBM, Capgemini, and Teradata, the group will support Apache Software Foundation (ASF) policies and focus on providing a standard core of compatible Apache Hadoop® projects and versions. The group will make it easier for software to run on any distribution so that all big data and data lake solution providers can rely on and test against the core baseline, minimizing fragmentation and duplication of efforts.
  3. Big Data Suite with New Application Services: Beyond the ability to store and process any real-time, advanced analytics, or big data workload, our flexible licensing model has expanded to allow for more cloud deployment options on Pivotal Cloud Foundry with built-in licensing, and it now includes important data services for things like Spring XD applications, Redis, and RabbitMQ, making it even easier for companies to run and scale analytics and apps on top of data lakes.

Editor’s Note: Apache, Apache Hadoop, Hadoop, and the yellow elephant logo are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.
