Call it a Data River, Not a Data Lake
**This article originally appeared on Business 2 Community**
We’re drowning in an overflow of data. The exponential growth of data from the Internet of Things, social media, consumer interactions and more is leaving businesses scrambling to understand how to integrate, analyze and visualize it all to find the insights that matter to them.
Given this landscape, it’s no surprise that the big data community is embracing the concept du jour to describe the new storage reality enabled by Hadoop: the “data lake.” In 2011, Dan Woods, chief technology officer and editor of CITO Research, outlined the concept of a data lake, describing it as “a vision for a much wider, less organized form of storing and managing data for business intelligence purposes.” Edd Dumbill, a principal analyst for O’Reilly Radar, recently outlined the four levels of Hadoop maturity that will “lead us to the dream of the data lake.”
But while the term “data lake” may be all the rage, is it actually indicative of what’s happening? The problem is that a lake is stagnant and still. In today’s world, where real-time and unstructured data are constantly flowing, I believe the term “data river” is more accurate.
Much like streams feed into each other to form a fast-moving river, data arrives from multiple sources, including social and mobile data, purchase history, network logs and more. All of these streams flow together to create the Nile of data rivers, and when you’re ready to extract the insights that will fuel business success, you build a dam.
What’s more, the term “data river” properly reflects the dynamic dimensions of data, including location and time. As more data is generated, what matters is the velocity of the current and the effectiveness of the dam: how fast can the data make it from one point to another, and how quickly can its value be extracted and applied?
As unstructured and structured data continue to grow, big data vendors and thought leaders should not only shift the conversation from data lakes to data rivers but, most importantly, focus on the concept of the dam. Businesses must learn to harness the power of the data river and implement tools and strategies that tame it and let them extract the insights that matter. After all, who wants to be up a river without a paddle?
I initially had this discussion with Pivotal’s chief scientist, Milind Bhandarkar, during his visit for Big Data & Brews.