In the second episode of Big Data & Brews, live from Strata New York, Ovum principal analyst Tony Baer and I discuss recent announcements around big data in the cloud. Google, Amazon, Microsoft… there was plenty to discuss there before we wrapped up with our thoughts on the Open Data Platform initiative.
Andrew: There’ve been a few announcements in the last couple of days around big data in the cloud. Google introduced something called Cloud Dataproc?
Andrew: Maybe you could talk to us a little bit about what your understanding of that is. Microsoft had a couple of announcements.
Tony: With the Azure Data Lake.
Andrew: Azure Data Lake and also their flavor of Hadoop, HDInsight on Linux-
Andrew: … Went into general availability on Monday.
Tony: This is not your father’s Microsoft.
Andrew: To that point, Microsoft and Datameer announced a tie up today.
Andrew: Primary question, what’s going on with data analysis in the cloud, and how enterprise is that, or isn’t it? Second of all, what about this not being my father’s Microsoft? What does that mean about the power of analytics, if anything?
Tony: Where do I start on that one?
Andrew: Start with the cloud.
Tony: First off, I think, traditionally, the cloud has been … The conventional wisdom is that the cloud is the option for small and mid-size enterprises that are not going to have IT organizations, or for when you’re doing proofs of concept. Essentially, what the cloud really is, it’s kind of a rent versus buy sort of proposition, or a lease versus buy.
Andrew: A lease versus buy and set up all by yourself.
Tony: Yeah. The economics are lease versus buy but also, in part of that, what goes into the equation is that … The conventional wisdom is that after a certain number of years, it’s going to be more economical to buy. On the other hand, when you start looking at, “Do I have the resources to manage, to set this up?” That then skews the economics to the point that basically buying would be unaffordable; therefore, you will still have to stay with the lease option.
I think the cloud certainly has that type of draw, no question about that. The other thing is it’s very good if you want to take a very agile type of orientation. You still may have on-premise analytics. If you now have a problem that you want to basically … You’re doing some exploratory analytics, you want to try out some new data sets or munch some different mixes of data sets, the cloud is a great place to do this. You might do that just in the beginning so you have a proof of concept. You may keep that in the cloud because maybe the administrative overhead of bringing it back on premises may be too great.
I do see that the cloud, and I have to say I have not done extensive research on this, but instinctively, it’s kind of like this is truthiness here. The cloud, ultimately, is going to be a very attractive option for organizations that want to move fast on something. It helps them keep their options open in terms of where they run it.
Andrew: I wonder about data gravity though. Getting one’s data or one’s enterprise’s data up to cloud storage in order to do the analytics there seems-
Tony: A lift?
Andrew: Seems a lift. Unless it’s that people are starting to keep their data in cloud storage as the primary source more and more. Is that what’s happening?
Tony: Put it this way: remember that lease versus buy metaphor.
Tony: You just said, “What about organizations that don’t have the resources? Where does that factor in?” That factors into the cost of buying, right? Again, if you have data that is on-prem and you want to move it to the cloud, that, in turn, is a form of cost. That is really more of a strategic decision in terms of whether this is high value data that you want to continue maintaining on premises, or lower value data that you want to put in the cloud, or whether you’ve made a strategic decision that you do not want to be in the business of running your systems on premises anymore.
Tony: There’s no single formula that’s going to apply to every organization.
Andrew: That makes sense.
Andrew: This Cloud Dataproc thing, I looked at it.
Andrew: It’s almost like this is a service for Google’s other data services, but it’s still fascinating because of what it does. It seems somewhat virtual, but it spins up a Hadoop cluster or, notably, a Spark cluster, to dial back to what we were talking about before, in about 90 seconds, and then it can take the cluster down just as fast and bring it back up.
Tony: It’s the shiny new thing, it’s a shiny new piece. The thing is that it’s the classic tortoise and the hare in that basically, you’ve got Amazon and in terms of technology, Amazon relatively speaking, and probably I’ll wash my mouth out. I’ll probably end up recanting my words next week because I’m going out to re:Invent next week. I’ll get the Amazon side.
Andrew: Have fun in Vegas.
Tony: Yeah. Amazon is not known as being a technology leader in terms of specific platform technologies. Where they are a leader is that they put together the whole cloud stack. They have it over everybody, and the thing is that they also have the huge mind share, but they also have the operational knowledge and experience. Yes, Google may have-
Andrew: They’re an integrator, they’re an SI.
Tony: Basically, yeah.
Tony: A very low-touch SI, and that has worked for the business that they’ve been searching for which is the infrastructure as a service.
Tony: Google basically has not known how to market to enterprises and is now trying to tap its expertise in terms of how it runs its own big operation, the same origin as Amazon except Amazon did this ten years ago. “Hey, we have better technology, let’s roll this out.” That’s fine. The thing is, it may be faster to deploy on Google, but does Google have the management, do they have the integration that Amazon has in its cloud in between all the different pieces of its data stack?
Andrew: Or the will to succeed in this business when their main line business is so incredibly lucrative.
Tony: It’s concentration. The same thing could have been said about Amazon as well, it’s just that Amazon has had a mission to diversify its business from … I don’t want to say from day one but maybe from about, like, day ten.
Tony: Amazon has made that a model, whereas with Google, outside of its core search engine, outside of things like YouTube, outside of Android, which is pretty impressive, their track record starting new services, whether consumer or enterprise, has been extremely spotty.
Andrew: Right. Listen, I think maybe we should close with the question about something called the Open Data Platform Initiative, or ODPi. It was sort of pioneered, if you will, by Hortonworks and IBM and Pivotal.
Andrew: It’s interesting because it seems like Hortonworks is taking less of a dominant role in it, but it’s all about defining something of a spec that various Hadoop distributions can keep to. I don’t know. I’m reminded a little bit of Unix in the eighties and the attempts to take all the fragmented versions of Unix and standardize them.
Andrew: Do you think this will be successful?
Tony: I think what it’s going to do-
Andrew: What do you think of the underlying need?
Tony: I think what it really means is that it allows IBM and Pivotal to direct their energies elsewhere. That’s really what it means.
Andrew: There’s a bunch more companies in it than those three, now.
Tony: Right, but in terms of developing … Those are the Hadoop platform providers. For them, it’s that, “We don’t have to keep up with this race with the Apache open source project, do all the certifications. This is going to simplify that.” For them, it’s a good strategy to redirect their resources. Basically, Pivotal laid off most of their Hadoop staff probably six months prior to the announcement. What does that tell you?
Andrew: Do you think Cloudera will join this initiative or do you think that’ll-
Tony: No way.
Andrew: … That’s like a hell freezing over kind of thing?
Tony: Basically, choose your metaphor. The boat has sailed, the train’s left the station.
Tony: Basically, Cloudera has its own platform. Hortonworks has its platform. There’s a certain commonality which is at the API level but otherwise … I think what it really means is that Hadoop itself is a commodity, and that all these providers don’t want to have to waste their time having to reinvent the wheel. Where you’re seeing them differentiate is in life cycle management, governance, security, different forms of querying. It’s really the higher value …
Andrew: It’s not just about support and training anymore for these guys, right?
Tony: Exactly. I don’t want to say it’s at the application level but it’s going higher up the stack because that’s where the value is. There’s only so much you can make selling a Hadoop platform.
Andrew: Especially if it’s all open-sourced.
Andrew: I tried to pour you more but the bottles are empty and I think that’s a sign that we’re-
Tony: I’m thirsty.
Andrew: … Pretty much at the end of our chat.
Tony: You have driven me to drink.
Andrew: I can do this over several beers with you, but we try to keep it succinct here on Big Data & Brews. This is Andrew Brust thanking Tony Baer for being our guest.
Tony: Good to see you, Andrew, once again.
Andrew: Come back and visit.
Tony: Will be glad to so long as the beer’s free.
Andrew: All right.