From Data Silos to Data Lakes: Realizing the Accessible Dream

January 16, 2014 Paul M. Davis

The era of data silos is nearing its end. The ongoing cycle of data science, and the rapid development of applications built on the resulting models and insights, will not wait for an IT infrastructure that stores critical information in numerous disconnected locations. The speed and scalability of Hadoop have given rise to the concept of the data lake, which is key to Pivotal’s vision of a unified PaaS. In an article for Forbes, Edd Dumbill characterizes the data lake as “a dream” given the current enterprise climate, but one that remains “an accessible dream.”

In his article, Dumbill offers a succinct and useful definition of the data lake concept:

“The data lake dream is of a place with data-centered architecture, where silos are minimized, and processing happens with little friction in a scalable, distributed environment. Applications are no longer islands, and exist within the data cloud, taking advantage of high bandwidth access to data and scalable computing resource. Data itself is no longer restrained by initial schema decisions, and can be exploited more freely by the enterprise.”

The move from data silos to data lakes will accelerate data-driven insights, app development, iteration, and time to value. But this transition doesn’t happen overnight. Dumbill describes it as a four-stage process for an enterprise.

When Hadoop first enters the picture, it serves primarily as a destination for input, with disparate applications and sources feeding it data for analysis. Over time, as more data sources are integrated into a growing Hadoop system, this one-way flow becomes an ongoing cycle of input and output: data drives insight, insight produces data-aware apps, and those apps in turn contribute back to the growing wealth of information.

The data lake’s opportunities and impacts are well-documented on this blog. It is set to transform corporate IT and security operations, require closer collaboration between data scientists and app developers, spur competition and innovation, and drive new value opportunities.

As Dumbill states in his article, many enterprises remain in the early stages of this transition, but that is quickly changing. He notes that consumer giants such as Google and Facebook already boast these capabilities, so enterprises face an imperative to catch up.

“As business is increasingly digital, access to data will become a critical priority,” Dumbill writes, “as will speed of development and deployment. The data lake is a dream that can match those demands.” Providing the knowledge and infrastructure necessary to meet this challenge and enable the “consumer-grade enterprise” is fundamental to the Pivotal One vision.
