Datameer Blog

Bonus #BigDataBrews: Eric Baldeschwieler Talks About The Future of Hadoop

February 18, 2014

We usually just do two 20-minute episodes of Big Data & Brews, but the conversation was so interesting with Eric that I wanted to be sure to share this too. Here are two “mega-trends” Eric said he’s paying attention to when it comes to Hadoop:


Stefan:           What’s the future? You know? Where do we go? I mean, you talked a little bit about Storm and Spark.

Eric:                 Mm-hmm (affirmative).

Stefan:           Obviously, but where do you see Hadoop maybe five years from now even?

Eric:                 Well, there’s kind of two … I’m spending a lot of time right now thinking about the future of Hadoop, and there’s two megatrends that I’m really noodling on. There’s a whole list of features that I could give you …

Stefan:           Yeah.

Eric:                 … but that’s probably another talk. One megatrend is how are Cloud and Hadoop going to converge? I think that’s … there’s a 20-minute segment right there.

Stefan:           Yeah.

Eric:                 I think that’s really interesting. If you look at it, Amazon and Google are two mature proprietary systems that show the two ways it could go. Amazon is a Cloud first, and people are having a lot of success running Hadoop on it. Google built an HPC infrastructure with a real focus on supporting things like MapReduce and that had an HDFS-like storage infrastructure first, and now they do Cloud-like things on top of it, right? They run all their services in, effectively, a Hadoop-like system. Or in at least an HPC-scheduler-like system.

So, how are these OpenStack, or how are these Open Source ecosystems going to converge: OpenStack, Hadoop, and all of the various projects in there? I think that’s really wide open.

Stefan:           Mm-hmm (affirmative).

Eric:                 Right? I mean, right now neither project does what the other set of projects need, but IT managers don’t want both.

Stefan:           Yeah.

Eric:                 Right? They want one common place to store all the data, and one common way to compute all the data. One common way to allocate resources to projects.

Stefan:           Right. They want a plug in the wall, where they just put in … this is my storage and compute, and it’s a utility.

Eric:                 Exactly. So they think that that thing is going to be called [00:02:00] OpenStack, but Hadoop is actually getting deployed in a lot more places and at a lot more scale.

Stefan:           Than OpenStack.

Eric:                 Than OpenStack, so how’s that story going to end?

Stefan:           Right.

Eric:                 I have no idea.

Stefan:           Yeah.

Eric:                 There’s a lot of speculation you can do there. The other real megatrend is when we started Hortonworks, we talked about how important it was that the community not fragment. That there be one distribution of Hadoop.

Stefan:           Yeah.

Eric:                 That’s a noble goal, but someone was following me around at a conference the other day and saying, “Admit it! Hadoop, the Hadoop community is fragmented. The Hadoop community is fragmented.” We got into this long argument and ultimately I said, “Well, so what?”

Stefan:           Yeah.

Eric:                 Right? I think, yes, in some ways the Hadoop community, we can argue about how much it’s this way, and how long it’s going to last, but I think the Hadoop community is kind of going into a Unix decade.

Stefan:           Yeah.

Eric:                 If you look at the Unix ecosystem, the Unix APIs came out pretty early. There was the AT&T Unix version and then there was the Berkeley Unix version, and then there was every vendor’s Unix version, and one can argue that this was a terrible thing. That Unix evolved much more slowly than it might have if there had been one.

Stefan:           Right. Well, it’s an evolution.

Eric:                 Yeah, you can argue that, too, and that everybody was slowed down because, as a vendor, if you wanted to write an application for Unix, you had to write it for everyone. You could look at it that way and you could look at the SQL ecosystem and say the same thing. Wouldn’t it be terrific if all the SQLs were the same because then all the people that write SQL apps would have less work to do?

Or, you could turn around and say, “Well, wait a second, look at those huge ecosystems, right?” If you look at the Unix ecosystem, Unix went from an unknown thing to the default …

Stefan:           Multi-billion market [00:04:00] and, you know, a lot of technology and innovation are in different areas.

Eric:                 … and the defaults are the ecosystems on which the systems’ infrastructures are built during that “Unix decade.”

Stefan:           Right.

Eric:                 So I think Hadoop’s going to see the same thing. I don’t know. I’m, of course, a big fan of Apache Hadoop and hope that everybody does continue to base all of their work on that, but whether or not they do, the APIs of Hadoop are being supported by more and more vendors, and more and more products, and more and more distros, be they pure or not pure, all the time and, as a result, I think what’s really interesting, over the next few years, is what are people going to do with Hadoop?

Stefan:           Right.

Eric:                 Right? What is that ecosystem that’s forming above Hadoop? If that does really well, that just drives more of all the Hadoops, and that creates more and more opportunity.

Stefan:           Great.

Eric:                 So yeah, that’s very exciting to watch and see.

Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20-billion-dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than a half decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.