Special Big Data & Brews from Hadoop Summit: Neil Raden on all Things Hadoop
I was lucky enough to catch up with Neil Raden at Hadoop Summit and thought it would be a great way to wrap up our special Big Data & Brews segment. Neil is the founder and principal analyst at Hired Brains and I’ve known him for a few years now. He shared some thoughts around what he’s seeing in the Big Data and Hadoop market from an analyst perspective, including verticals making strides in Hadoop adoption, optimization versus innovation (one of my favorite topics) and what he thinks is in store for next year’s Hadoop Summit.
This is our last installment of our Big Data & Brews Hadoop Summit Special Edition, hope you enjoyed!
Stefan: Welcome to Big Data and Brews. Usually I ask my guest to introduce the beer and the person, but yeah, why don’t you introduce your drink and yourself?
Neil: Well you know as a pseudo-analyst, I don’t like to endorse any product in particular so, this is a delicious brown cold beverage that I’m drinking.
Stefan: Please introduce yourself.
Neil: Oh sorry. I’m Neil Raden. I’m the founder and principal analyst with Hired Brains Research. Very happy to be here and looking forward to having a chat with you Stefan.
Stefan: Great. Who’s calling you, what kind of questions have you gotten in the last two years, and how did they change over time?
Neil: Mostly they ask me why the name of my company is Hired Brains. And after they work with me they are really confused. No, enterprise clients are trying to sort out the wheat from the chaff, the noise of Big Data and Hadoop and what it means. IT people are pretty polarized. They think it’s a lot of crap or they can’t wait to get their arms around it because it’s something big to do. What I found in companies is there is some collision of thinking about this, because the people who are the most gung-ho on Open Source, Big Data and Hadoop are people who weren’t really part of formal IT.
They started in the organizations building websites, then started doing web analytics and that sort of thing, and the traditional IT people were involved with networks and security and databases, and when those need to come together there’s often some real collision. But the problem with all of this, and I’m talking about ordinary kinds of enterprises, not companies like Google or Yahoo or even small companies that are based on Internet business or Internet data, but manufacturing companies, retail companies, that kind of stuff, even health care: the problem is they are focused on technology. One of the things I found very attractive about Datameer from the beginning was that the technology was sort of de-emphasized and the ability to use it and get useful information was emphasized.
There’s not enough of that in this industry. If you look around this floor, a lot of people are showing you layer cake charts with tons and tons of pieces and arrows. I call it the mythology of the arrows, meaning they don’t tell you what the arrows do, but they’re actually the most important part of the diagram. How did you get from this module to this module to this module? Well, this is an arrow. So there’s not enough concentration on that. Over the course of my career I’ve seen this over and over again. It happened with data warehousing. Very quickly the industry and the conversation turned to databases and database optimizers and workflow management and networks and storage and security, and everybody forgot about the whole point of doing it, which was to inform people to make better decisions and do better work.
Stefan: Right. Do you see in your work the different verticals adopting this new generation of technology at different speeds? Is health care moving as fast as financial services?
Neil: I don’t know that I’m in a position to really evaluate that because the number of organizations I work with is relatively small. I don’t survey the market. So I don’t know that for a fact. The only thing I do know is what I hear anecdotally, and I think that there’s a huge surge of energy in health care, but that’s only because the market is so big. In fact it may represent a tiny fraction of health care, I don’t know. You always hear the same sort of serial vendors when a new technology comes along. It’s always telco, financial services, right? And depending on the application, maybe retail, but retailers generally are almost the last people to adopt technology because they don’t like to spend a lot of money. And the reason is the margins.
Stefan: Right. That’s their business.
Neil: That’s the kind of business that they’re in. I don’t know, I mean, you know we’ve all heard about the Internet-based businesses and about telco, maybe on health care, let’s not forget about government and then I think the biggest adopter of this technology and probably the quietest adopter of this technology is government, both in the US and abroad, and particularly in defense and intelligence business. That’s where I’ve seen a lot of it.
Stefan: Are people always running into the same mistakes, as they did historically with data warehousing?
Neil: I think it’s a little early to talk about mistakes because most companies that have adopted Hadoop, for example, have adopted it in the skunkworks and it’s in kind of the investigative stage. The ones that I’ve seen that have really, really gone full on with it are big companies, obviously very big. Now I’m working with a company right now in Singapore and they’re looking at a very interesting problem, which is reputation risk. Look at the parcel of technologies they need to bring there: real-time analytics of data from 50 to 100 different sources, from data pools, from social networks and RSS, you name it, right?
They’ve got to pull the stuff in and they’ve got to assess reputation risk on the spot. Then, they have to do something with it. They have to find a way to disseminate what they’ve discovered in a way that the people who need to make those decisions can make those decisions. This is a company that has 20 or 25 people. They’re using machine learning, they’re using all sorts of very advanced technologies.
Stefan: What do you see in your customer base being adopted from all the different specs in the Hadoop world? It’s a jungle now. What is hyped? What is solid and really going to production?
Neil: I think a lot of Hadoop is getting dumbed down in the enterprise. Once it became respectable to talk about SQL, SQL has kind of taken over Hadoop. So that at the end of the day by the time you add security and governance and profiling and masking and everything else, it doesn’t look any different than what we had before except there’s a much bigger file system down at the bottom with a lot of different kinds of data. A lot of enterprise IT organizations are just jumping on, they’re just trying to figure out which SQL tool to use with Hadoop.
Stefan: Do you lose opportunity there you think?
Neil: I don’t know yet. Intuitively I think the answer is yes. I never thought SQL was a good language for analytics to begin with.
Stefan: I’m sorry can you say it again? No I’m kidding.
Neil: You’re talking to a guy whose first programming language was APL, right? And when somebody told me I had to use SQL to do analytics, I really thought about going to, like, dental school. You know, it was awful. But a couple of BI companies came along that were clever enough to mask optimized SQL generation behind analytics that worked pretty well. I’m talking about ROLAP tools. But I still think that if those ROLAP tools had generated something other than SQL, it would’ve been even better. But the problem with Hadoop is everybody who has applications now in Hadoop, if you look carefully at them, you have people writing code in Java or Python or R or something else.
There are no tools. Well, it’s not right to say there are no tools. But there aren’t enough tools, and you and I already talked about this: with Hadoop and Big Data, the whole focus has been on data and size and architecture and everything, but what is the point of any of that if you can’t help make informed decisions? Business analysts are not going to sit here and say, “OK, so I can sift through two petabytes of data in 5 seconds, and now I’m going to spend the next day writing, intuiting a SQL query.”
Neil: That doesn’t make any sense, right? It has to be a little more automatic than that. I think that Tony Baer of Ovum said something very important today. He said, “Until business analysts can get their hands on Big Data without Java or Python or R, not much is going to happen.” I agree with that 100%.
Stefan: So the future is a visual front-end?
Neil: I guess it would be visual. I’d rather it was like, “Computer, can I change this price or am I going to get killed?” Right? It’s 2014, for God’s sake. Why don’t we have any of that?
Stefan: We talked to him.
Neil: I have a problem with visual interfaces. Sometimes visual interfaces are more complicated than writing code, depending on how they’re built. It’s like, “OK, I’m going to press this button, and am I going to get the right answer or am I going to launch an invasion of Crimea?” Right? I mean, I don’t know what’s going to happen behind this button. Visual interfaces are good if they’re good, but I would really love natural language. I hope I live long enough to see that.
Stefan: Sometimes it’s a little sad how little enterprise software applications actually invest in user experience. They’re always saying, “Well, it’s not consumer software, but they’re smart enough to figure it out.” And then you spend 3 months on another crappy product even though you already have a crappy CRM system and a crappy system.
Neil: Well, it’s hard. You can find fault with IT but their job isn’t easy either, right? They’ve got a budget this big and they think they need a budget this big. If you look at a person in IT who’s been in IT for more than 10 or 15 years, they come from a mindset and a methodology and a way of doing things I call managing scarcity. Meaning, we never had enough money, we never had enough CPU, we never had enough memory, we have to find the easiest way to do things. So we’ve got a date field here, let’s just lop off the first two characters here, right?
Stefan: Right. That’s good enough. Oh two months later, ahh, we really need them.
Neil: And then now you have the Hadoop Big Data crowd and these are the guys, I used to call them the ponytail guys but now I call them the Hoodie guys. They’re off building websites and doing web analytics and messing around with stuff and they’re not working from managing scarcity. They’re working with “I can do anything, just leave me alone and I’ll go do it.” Well, that isn’t IT. IT is not about “leave me alone, I can do it”. IT is all about control and controlled release and security and reliability and everything else, and those two cultures are going to clash. But it would seem that what Irv referred to as traditional Hadoop, the Hadoop, what was it before it was 2.0? It was 0.something or other, right?
Neil: Yeah, 0.23. 0.23 is going to exist in a million places. There are going to be clusters running 0.23 with a few guys writing programs and crunching stuff. But what we’re going to be seeing in the enterprise is going to be something else, it’s just going to have the same name. It’s going to be called Hadoop, and there’s going to be the distributed file system down below, and even MapReduce is going to disappear, it’s all going to disappear. Hopefully Hive will disappear sooner or later, but you’ve got Spark, you’ve got Tez and Stinger and all these other things coming along. So there are all these pieces being called Hadoop that are just very, very different, and they are going to be more palatable to enterprise IT because they look more like enterprise software.
Stefan: Do you buy that Hadoop is a platform that virtualizes storage, compute and memory, and that you’ll have different engines running on that?
Neil: I don’t know.
Neil: I don’t have any answer to that. I am not sure.
Stefan: So the idea would be that virtualization had one physical machine and many virtual machines on top of that, and Hadoop is upside down: you have one virtual machine, right? Virtual hard drive, virtual CPU, maybe virtual memory, Storm or whatever, and many physical machines under that. And that’s why the Hoodie crowd is saying, “I don’t care about storage and compute anymore, I just do it. I just rent five more machines on Amazon.”
Neil: Well, here’s a question I have. The opposite of managing scarcity is not managing scarcity. But when do you reach the point of diminishing returns? When does somebody say, “OK, so we solved a problem with 10,000 cores and now that I look at the utilization, we’re only utilizing 20% of them”?
Stefan: That would be Facebook. Well, that’s what they did. They built a $100 billion company with PHP and then, what was it, a year or two years before their IPO, with tens of millions of dollars in revenue, they wrote a PHP-to-C++ compiler to improve the utilization of the hardware. Which I thought was interesting.
Neil: Well, there you go. So the point is, here’s my concern with Hadoop: buy as many cheap blades as you can find and solve the problem of that later, and who cares what the utilization is, because the whole thing was cheap compared to a proprietary relational database, blah, blah, blah. When does somebody take a look at that? When do the bean counters look at that and say, “Well, we have 10,000 cores, why can’t we solve this problem with 5,000 cores?” And when do we eventually get Hadoop back to where we are now, where everything becomes optimized, optimized, optimized, instead of “forget about how many you have or how much it costs”?
Stefan: One thing I’ve observed that felt really interesting in that context is: what do you value, optimization or innovation? Right? So for the Hoodie crowd it doesn’t matter, there is really this high innovation pressure, and you even see that with banks, right? They say it doesn’t matter when they’re like, “Holy moly, they have a virtual currency and we never thought about that, how fast can we get involved there?” And then the dollars just flow when you have innovation pressure. But the moment you have, like in retail, very little margins, then you come in and you optimize where you spend. That’s kind of what I saw with different organizations: the innovation pressure in some of those organizations overcomes the bean counters that you mentioned.
Neil: Well, I think what you’re talking about is an individual organization. I am just saying in general that there is this euphoria with Hadoop that you can put together clusters as big as you like because they are so cheap. But sooner or later somebody is going to say that doesn’t really make sense. Could we have just maybe half as many? Right? So what was Hadoop, the original Hadoop, is going to disappear over time.
Stefan: It’s a brand really, it’s not a technology.
Neil: In the enterprise, sure. So the interesting question is in science and in government, in NGOs and non-profits and so forth that have analytical problems to solve, how significant is that? Because if you go to a non-profit organization that has maybe $50 million in, what do they call it? They don’t call it revenue, they call it contributions or something like that. They try to put together a database and they go to Oracle and find out it’s going to cost them a million dollars a year in Oracle licenses and they just flip out. It’s the same thing that could happen to Hadoop, something that was once very inexpensive to put together, as it becomes more enterprised. How can these companies survive on selling software at 10%-20% of enterprise software?
Stefan: Because VC money. Billions of dollars of VC money, no problem, right? Twice as much customer-acquisition cost as your has been, with a billion dollars in cash, why not? It will be very interesting though to see how that checks out. I talked to a lady from the Wall Street Journal early on and she said, “I really don’t understand how these Open Source companies want to make business in the future.” That was interesting. Great. So what do you think will happen by next year, when you come to the next summit? Prediction for one year, three years.
Neil: I’m fairly sure that Philly is going to trade away quickly. That’s about as much as I know. Get rid of that huge payroll. I don’t know, I mean, I would never have predicted a year ago what I am seeing here today. I guess I’ve never been real good at predictions like that. I guess the trend I see is that you’re right, that Hadoop has become a brand and open source products have become a huge brand. It’s crazy, isn’t it? And it’s going to become more and more enterprised because companies probably find it’s easier to sell software to enterprise IT than it is somewhere else, right?
Will all of these small companies be here next year? I don’t know. I was talking to a venture capitalist today and he said his company hasn’t made a single investment in any Hadoop-related company yet because the founder of the company, who happens to be an extremely famous venture capitalist, doesn’t believe in Hadoop. I just wrote an article called, “I’m finally starting to get used to Hadoop, sort of.” But at my age, I have the right to take my time to get used to things.
I just found it funny that last year everybody in the Hadoop market was talking about NoSQL and about, what do you call it, schema on read and so forth, and then they came and displayed their new product and they were showing me their lightning-fast queries on Hive based on a star schema. I said, “Wait a minute.”
Stefan: You saw that before, like 20+ years ago.
Neil: Yeah, you guys have done a 180 on this, right? So I don’t know. I don’t know who drives this market. Maybe you have a better idea than I do, but I think the legacy analysts have a lot to do with driving this market, like Gartner, Forrester, Ovum, IDC, and I think they were tentative about Hadoop for a while. They love to talk about Big Data but they were tentative about endorsing Hadoop, which was an Open Source product. But they’re all Hadooped up now. So that makes-
Stefan: Hadooped up, I will quote you on that.
Neil: I spent many, many years running my own systems integration business. Many times we would get the trust of our clients to do things for them and we would come in and some IT person would say, “Yeah, but I just had a call with Gartner and they say that’s not true.” So they have so much influence. I guess they have something to, but when I saw a couple of things Gartner has said recently, it’s all been pretty high-level. They haven’t got down to detail, and you and I have talked about this before, the thing that I find maddening is nobody talks about solutions, nobody talks about decisions, nobody talks about doing things that help people make up their mind about what to do and be better informed. They love to talk about how much data there is, right?
Neil: But there’s a problem with too much data. Too much data can obscure the truth. Too much data can make you overconfident. Too much data can make you over-calibrated. So you have to be able to work, I said this to somebody just the other day, I said, “You know, good models trump algorithms. All the algorithms in the world are no good if you don’t have a good model to build around them.” Nobody talks about modeling. I’ve had this conversation with the big Hadoop vendors and I said, “I haven’t heard you speak the word model yet. So you tell me what a day in the life of somebody who works with this software is like.” If they can’t build models, and they don’t know how to build models and how to test them and continuously improve them, you’ve got nothing. That’s what I like about your product. You came at me with, here we are at the other end, we’re not going to talk about the layer cake.
Stefan: Optimization program. We do have a layer cake though.
Neil: I know, I know. I know you do but I don’t care about it, right? Because, you know, you’re saying here’s data, I’m getting to manipulate this data right here and the way I look at it. I think that’s fabulous. I think that’s great.
Stefan: Well thanks for joining and then maybe next time we can have a beer together.
Neil: Later today.
Stefan: Enjoy the show.