Stefan's Blog

Big Data Musings From Datameer's CEO

Bonus #BigDataBrews: Eric Baldeschwieler Talks About The Future of Hadoop

February 18, 2014


We usually just do two 20-minute episodes of Big Data & Brews, but the conversation with Eric was so interesting that I wanted to be sure to share this too. Here are two “mega-trends” Eric said he’s paying attention to when it comes to Hadoop:

TRANSCRIPT:

Stefan:           What’s the future? You know? Where do we go? I mean, you talked a little bit about Storm and Spark.

Eric:                 Mm-hmm (affirmative).

Stefan:           Obviously, but where do you see Hadoop maybe five years from now even?

Eric:                 Well, there’s kind of two … I’m spending a lot of time right now thinking about the future of Hadoop, and there’s two megatrends that I’m really noodling on. There’s a whole list of features that I could give you …

Stefan:           Yeah.

Eric:                 … but that’s probably another talk. One megatrend is how are Cloud and Hadoop going to converge? I think that’s … there’s a 20-minute segment right there.

Stefan:           Yeah.

Eric:                 I think that’s really interesting. If you look at it, Amazon and Google are two mature proprietary systems that show the two ways it could go. Amazon is a Cloud first, and people are having a lot of success running Hadoop on it. Google built an HPC infrastructure with a real focus on supporting things like MapReduce and that had an HDFS-like storage infrastructure first, and now they do Cloud-like things on top of it, right? They run all their services in, effectively, a Hadoop-like system. Or in at least an HPC-scheduler-like system.

So, how are these OpenStack, or how are these open source ecosystems going to converge: OpenStack, Hadoop, and all of the various projects in there? I think that’s really wide open.

Stefan:           Mm-hmm (affirmative).

Eric:                 Right? I mean, right now neither project does what the other set of projects need, but IT managers don’t want both.

Stefan:           Yeah.

Eric:                 Right? They want one common place to store all the data, and one common way to compute all the data. One common way to allocate resources to projects.

Stefan:           Right. They want a plug in the wall, where they just put in … this is my storage and compute, and it’s a utility.

Eric:                 Exactly. So they think that that thing is going to be called [00:02:00] OpenStack, but Hadoop is actually getting deployed in a lot more places and at a lot more scale.

Stefan:           Than OpenStack.

Eric:                 Than OpenStack, so how’s that story going to end?

Stefan:           Right.

Eric:                 I have no idea.

Stefan:           Yeah.

Eric:                 There’s a lot of speculation you can do there. The other real megatrend is when we started Hortonworks, we talked about how important it was that the community not fragment. That there be one distribution of Hadoop.

Stefan:           Yeah.

Eric:                 That’s a noble goal, but someone was following me around at a conference the other day and saying, “Admit it! Hadoop, the Hadoop community is fragmented. The Hadoop community is fragmented.” We got into this long argument and ultimately I said, “Well, so what?”

Stefan:           Yeah.

Eric:                 Right? I think, yes, in some ways the Hadoop community, we can argue about how much it’s this way, and how long it’s going to last, but I think the Hadoop community is kind of going into a Unix decade.

Stefan:           Yeah.

Eric:                 If you look at the Unix ecosystem, the Unix APIs came out pretty early. There was the AT&T Unix version and then there was the Berkeley Unix version, and then there was every vendor’s Unix version, and one can argue that this was a terrible thing. That Unix evolved much more slowly than it might have if there had been one.

Stefan:           Right. Well, it’s an evolution.

Eric:                 Yeah, you can argue that, too, and that everybody was slowed down because, as a vendor, if you wanted to write an application for Unix, you had to write it for everyone. You could look at it that way and you could look at the SQL ecosystem and say the same thing. Wouldn’t it be terrific if all the SQLs were the same because then all the people that write SQL apps would have less work to do?

Or, you could turn around and say, “Well, wait a second, look at those huge ecosystems, right?” If you look at the Unix ecosystem, Unix went from an unknown thing to the default …

Stefan:           Multi-billion market [00:04:00] and, you know, a lot of technology and innovation are in different areas.

Eric:                 … and the defaults are the ecosystems on which the systems’ infrastructures are built during that “Unix decade.”

Stefan:           Right.

Eric:                 So I think Hadoop’s going to see the same thing. I don’t know. I’m, of course, a big fan of Apache Hadoop and hope that everybody does continue to base all of their work on that, but whether or not they do, the APIs of Hadoop are being supported by more and more vendors, and more and more products, and more and more distros, be they pure or not pure, all the time and, as a result, I think what’s really interesting, over the next few years, is what are people going to do with Hadoop?

Stefan:           Right.

Eric:                 Right? What is that ecosystem that’s forming above Hadoop? If that does really well, that just drives more of all the Hadoops, and that creates more and more opportunity.

Stefan:           Great.

Eric:                 So yeah, that’s very exciting to watch and see.
