Datameer Blog

Call it a Data River, Not a Data Lake

By Stefan Groschupf on July 2, 2014

**This article originally appeared on Business 2 Community**

We’re drowning in an overflow of data. The exponential growth of data from the Internet of Things, social media, consumer interactions and more is leaving businesses scrambling to understand how to integrate, analyze and visualize it all to find the insights that matter to them.

Given this landscape, it’s no surprise that the big data community is embracing the concept du jour to describe the new storage reality enabled by Hadoop: the “data lake.” In 2011, CITO Research’s chief technology officer and editor, Dan Woods, outlined the concept of a data lake, describing it as “a vision for a much wider, less organized form of storing and managing data for business intelligence purposes.” Edd Dumbill, a principal analyst for O’Reilly Radar, recently outlined the four levels of Hadoop maturity that will “lead us to the dream of the data lake.”

But while the term “data lake” may be all the rage, is it actually indicative of what’s happening? The problem with a data lake is that it’s stagnant and still. In today’s world where real-time and unstructured data are constantly flowing, I believe the term “data river” is more accurate.

Much as streams feed into each other to form a fast-moving river, data flows in from multiple sources, including social and mobile data, purchase history, network logs and more. All of these streams flow together to create the Nile of data rivers, and when you’re ready to extract the insights that will fuel business success, you build a dam.

What’s more, the term “data river” properly reflects the dynamic dimensions of data, including location and time. As more data is generated, it becomes a matter of the velocity of the current and the effectiveness of the dam: how fast can the data make it from one point to the other, and how quickly can its value be extracted and applied?

As unstructured and structured data continue to grow, big data vendors and thought leaders should not only shift the conversation from data lakes to data rivers but, most importantly, focus on the concept of the dam. Businesses must learn how to harness the power of the data river and implement tools and strategies that tame it and let them extract the insights that matter. After all, who wants to be up a river without a paddle?


I initially had this discussion with Pivotal’s chief scientist, Milind Bhandarkar, during his visit for Big Data & Brews.

Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20-billion-dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.