
Datameer Blog

Special Big Data & Brews from Hadoop Summit: Ovum, Tony Baer on SQL on Hadoop

June 5, 2014

Another Hadoop Summit is coming to a close and for the second year in a row, one of the biggest trends being discussed is SQL on Hadoop. At last year’s show, I sat on a panel moderated by Ovum Analyst Tony Baer about this very topic. This year, I got to turn the tables and talk to Tony about his views, one year later, in a special Hadoop Summit Big Data & Brews episode:


Stefan:           Welcome to the special edition of Big Data and Brews with Tony at Hadoop Summit. Hey, Tony.

Tony Baer:     Hello, Stefan. Good to see you again.

Stefan:           We’re drinking today. Do you like Stella?

Tony Baer:     Stella.

Stefan:           It’s nice and cold though.

Tony Baer:     Oh yeah.

Stefan:           Do you want a sip?

Tony Baer:     I would love it.

Stefan:           Alright.

Tony Baer:     I thought you’d never ask. Thank you, sir. Is this comedians in cars driving around for coffee? The Seinfeld thing?

Stefan:           Yeah.

Tony Baer:     Okay.

Stefan:           Maybe. Jeez.

Tony Baer:     Okay.

Stefan:           You know, as CEO you can almost design your job.

Tony Baer:     Yeah.

Stefan:           I thought making a video show of drinking beer with really smart people was kind of the best thing I can do with my job.

Tony Baer:     Or with drinking beer.

Stefan:           Yeah. What are the other big new things, from your perspective, at Hadoop Summit? Anything you’re really excited about?

Tony Baer:     It’s kind of funny because in my presentation earlier today, I was kind of going through and was kind of like, “No way there’s nothing new in this industry,” but everything is new. Last year, of course, you graciously sat on my panel as basically the dissenting voice, as vendors were in a rush to plant their stakes, saying, “We do interactive SQL on Hadoop.”

Stefan:           Yeah.

Tony Baer:     As I recall, there were a couple guys sitting next to you who basically spent part of the session saying, “My interactive SQL is faster than yours.”

Stefan:           You and your – We had that a few years ago, right?

Tony Baer:     Yeah.

Stefan:           I wrote more code for Hadoop than you.

Tony Baer:     Yeah, exactly, exactly.

Stefan:           I’m more scalable than you, I’m less fault tolerant than you, you know? I’m more interactive than you.

Tony Baer:     As I get older, I have to be tolerant of more faults. It’s just one of the lessons I’ve learned, and not just in this industry. The takeaway from this event, I mean obviously beyond the expanded enterprise focus, was – Okay, last year, vendors planted their stakes in interactive SQL and this year, they’re basically painting it as the battleground. There have been a lot of questions about, basically, is SQL now, or I should say is Hadoop now, just a big SQL industry, not trying to steal our trademarks there. I actually just came out with a post earlier this morning.

Stefan:           Uh-huh. (Affirmative)

Tony Baer:     It is my fervent hope, of course I don’t want to steer any IT people down the path to addiction, but I do hope that in one way, I do, which is that I’m hoping that SQL will be the gateway drug to Hadoop.

Stefan:           That’s an interesting idea.

Tony Baer:     Yeah?

Stefan:           I definitely can see that you want to lower the barrier to entry, and people just always take the path of least resistance, right? If I know SQL, why don’t I run SQL in Hadoop?

Tony Baer:     Here’s the deal, it’s like where do you start?

Stefan:           Right.

Tony Baer:     I think there is a point to be made that if we start at some place, what’s to keep us from continually doing what’s in our comfort zone?

Stefan:           Right.

Tony Baer:     No question that there’s certainly concern about that. On one hand, you have to look at what the talent base, the skills base is out there. Yes.

Stefan:           Yep.

Tony Baer:     You have lots of computer science programs out there that are turning out data scientists, or quote-unquote “data scientists”, whatever that is. I’m still trying to meet one. I’ve met a bunch that have it on their card.

Stefan:           Yeah?

Tony Baer:     So they must be data scientists. Of course, it’s given new life, new relevance to Java developers, who can now graduate from the middle tier to actually doing something.

Stefan:           Yeah.

Tony Baer:     “Oh, boy, I’m going to get a lot of people. There’s probably going to be a contract out on my head after this one”. Okay, no, Java development is actually really important. I like Java developers.

Stefan:           Your Java development is your undergraduate and then you do statistics on top of that.

Tony Baer:     Yeah, well here’s the thing. What I’m getting at is, sure there are those folks that are coming out of computer science programs and that’s good and we want to encourage that, we want organizations to enrich themselves and get some new blood.

Stefan:           Yeah.

Tony Baer:     On the other hand, let me ask you, do you think it’s realistic to go to an IT organization and say, “Okay, we now have a new platform, new means for analytics, so should we basically not utilize the skills that your staff already has, that already exists?”

Stefan:           Yeah. Certainly that’s the challenge. You go into a big organization and you have a thousand people that know SQL.

Tony Baer:     Right, right.

Stefan:           I talked to Neil yesterday and he said, “Well, there’s the hoodies, the people that came in with Ruby on Rails before”.

Tony Baer:     Right. By the way, Neil’s going to have to answer his comments tonight in San Diego. I will see him in person.

Stefan:           He said, “Yes, the hoodies would come in and they do Ruby on Rails, they used Active Record, they never wrote an SQL query in their life,” and those guys are now doing Hadoop. Then you have the people that spent the last ten years doing data cube and this, and that. There’s a shift here, right? What I think is interesting is that the people who know all the SQL do development now and in five years they are architects and maybe in another five years, they are CIOs.

Tony Baer:     That’s being pretty optimistic. That’s a pretty fast career trajectory.

Stefan:           Yeah, but you get the idea. I think we’ll move in that direction. That’s that though. We definitely see that the lowest barrier to entry is to write some SQL scripts on Hadoop, right? The challenge with that though is that people very quickly run into a wall. They think, “Oh, Hadoop can SQL, let me use the BI tool that uses SQL on top of Hadoop,” and it generates an SQL script big, like this, with a million subqueries and combined keys, and what have you, because you’ve gone and clicked something together and you know, Hive just…

Tony Baer:     Right, right.

Stefan:           It does nothing else than just, “Oops,” and that’s it.

Tony Baer:     Yeah.

Stefan:           We very frequently see the, “Oh, it can SQL,” “Oh, actually it can’t,” and the challenge is, at what point do you figure that out? The moment you walk into production, or early enough to, you know.

Tony Baer:     You know what we should be doing right now? We should both be sitting. There’s a session going on at the moment about the difficulties of writing SQL inside Hadoop.

Stefan:           Oh. The interesting thing I think, over time, people will go for is applications on top, because nobody would write their own CRM system, everybody would buy Salesforce.

Tony Baer:     Right.

Stefan:           Those applications generate terrible SQL code that no SQL engine can handle at this point.

Tony Baer:     Right.

Stefan:           I do know it was intended to –

Tony Baer:     Right, right, but let me try and seize on that. I was at, on Monday, Hortonworks actually had a very good session. They basically had a bunch of panelists together, including those – actually, I don’t think Neil was in the room that day. We didn’t have the hoodie guy there, I’m surprised. Everybody but him. At the end of the day, they had the panel of some of their customers. Of course being reference customers, these guys are going to be basically pretty heard, by definition.

Stefan:           Yeah.

Tony Baer:     I asked them –

Stefan:           They might get a discount to be on stage.

Tony Baer:     I’m not even going there. The deal is, I even asked them a question, this is what feeds into what you’re saying, which is, what type of skills did you need? I sort of asked, did you really need data scientists, and what they answered was, “Well, we’re looking for our teams, our Hadoop teams. We have looked for Java programmers, Python programmers, statistical programmers” and I turned, actually, there was a guy next to me who was from Hortonworks from the marketing side. I turned to him, and I was just like, “Wrong”. Right for those organizations, obviously, very bleeding-edge work, but if you’re trying to build a commercial market … you cannot build a market that is reliant on customers having their own staffs programming one-off solutions. That’s just not going to scale. That goes to what you’re saying. This has to be a market that is populated very heavily with applications and tools.

Stefan:           Yeah. I think we get there, right? Datameer is maybe just the first company, there’s other companies coming.

Tony Baer:     I hope there’ll be more.

Stefan:           I’m a little surprised that we’re kind of the only one.

Tony Baer:     I’ll put this to be honest, you’re one of the very few at this point. I’m actually quite surprised. I’ve got to expect this time next year, there are going to be more. What I’m seeing, I think a lot of the emphasis at this point is a lot of buzz around data preparation which is also a very important part of it.

Stefan:           It’s just a piece of it.

Tony Baer:     I agree, but it’s a part. There’s no one thing that’s going to be the grand way to find solutions. Even in the BI and data warehousing world, there were always attempts by certain vendors saying, “We will be the all-encompassing solution”. The reality was really kind of a best-of-breed stack. To me, there was a signifying moment in that market, and I would say that history is definitely repeating itself here. I’m going to take us back; I’ve used this story a number of times, shoot me if you’ve heard it before. One of the last big DB Expos in San Francisco, at Moscone, this was back in 1996. I remember it because I had a really nice vacation before that with my wife in Sonoma. We had a wonderful time, by the way, the dress that she bought still fits her. It’s a compliment to my wife.

Stefan:           Wonderful.

Tony Baer:     Or the fact that I don’t feed her enough. No, she’s kept her figure very well, I will say. Anyway, I got a call from this PR firm, and they said, “We have this company, they’re too small to exhibit at DB Expo, but would you mind coming up to the hotel suite the day before? We’d love to show this to you”.

Stefan:           Yeah?

Tony Baer:     It was Informatica.

Stefan:           Oh.

Tony Baer:     They have to start someplace.

Stefan:           Yeah.

Tony Baer:     The deal is that at that time, what was data warehousing? Number one, we had relational databases on their way to becoming the de facto enterprise standard. They were on their way.

Stefan:           Yeah.

Tony Baer:     You had these rich client desktops, so you had this appetite for more than green screen reports. In fact, with relational databases creating the illusion of “open data”, there was a perception that this was kind of the big data of its time. We have access to all this data that’s now been liberated from jail houses.

Stefan:           Yeah.

Tony Baer:     What is a data warehouse?

Stefan:           Right.

Tony Baer:     What kind of data do you put in there? What do you have to do to it? Oh, you need to transform it? Well, I guess you have to write a script, it’s not very repeatable, and nobody ever heard of data quality. Seeing something like Informatica, it was a tool, a visual tool, that looked like a data modeling tool, one that existing database professionals could understand. You follow that with other tools that simplify the reporting instead of having to write something in a 4GL language. That was basically what paved the path for BI to come around and become an enterprise market. That’s basically the same track I’m starting to see here.

Stefan:           I think so, too. I think that history will maybe not repeat itself, but certainly rhymes – to infrastructure. Hadoop does all kind of the old databases and competing and it very quickly commoditized, more people have it. I think in the beginning people wrote stored procedures and SQLC and what have you, and more and more standard it became. ANSI SQL-92 to the blah, blah, blah, blah. Then, bring it together. I think we will see this here too. I think, for me, as a vendor, what is really interesting is that the value creation was less in infrastructure, the value creation was in the enablement of the next generation of applications.

Tony Baer:     Exactly.

Stefan:           That’s kind of what is really fascinating for us to see because we’re very close to, “Oh, people actually doing something,” and we want to focus on that.

Tony Baer:     Yeah.

Stefan:           We are out of other folks, of course, “Oh, we’re running three seconds faster,” nobody cares. You want to get an insight, you want to do something that will change the world, not just a more secure, more scalable as you go on.

Tony Baer:     You want to contribute some value.

Stefan:           Yeah, that would be nice.

Tony Baer:     One of my colleagues on the panel says there’s a lot of IT professionals that are doing this because they want to put this on their resume. There is a lot of that; I actually saw some of it at our enterprise clients. I was doing a trip for some work and they’re saying, “You should go to APAC, too, you know. We have a lot of clients in the Pacific Rim,” and I actually was in Kuala Lumpur, KL, and it’s a great chaotic city. I have some very fun memories of the place. Anyway, I was at this insurance company. Insurance companies are not exactly known for state-of-the-art technology.

Stefan:           No? I think that’s what all customers said. The love of a mainframe connector, that makes me scratch my head. That’s where I’m like, “Oh, really? And do you run my insurance? Okay”.

Tony Baer:     Here’s the deal, and unfortunately it’s going to be another Informatica story. Anyway, at one of these clients that we were visiting, their big concern was why Informatica was bolting support on AS/400, version two. What can we do? Meanwhile, the younger guys were saying like, “How can I get some experience on Hadoop?” There is that hunger out there. I told this guy, “You’ve got to have some meet-ups out there. You’ve got to have some data to play with, you know? I would love to see you basically play and maybe you have a community edition, like play with tools like yours,” like, “Hey, let’s work with this stuff, we could actually have fun and do some specific good, actually, working with a non-profit project with all this open data”.

Stefan:           Yeah.

Tony Baer:     There’s a lot of cool ways to do this stuff and eventually, this is going to percolate its way into the enterprise.

Stefan:           Yeah. We have an open trial, where you can play with your Facebook data. In fact, we’re printing our Facebook data over there.

Tony Baer:     I don’t want to play with my Facebook data, I’m sorry.

Stefan:           I think it will be really interesting to see when we bring open and internal data together.

Tony Baer:     Right, right.

Stefan:           Thank you very much for stopping by.

Tony Baer:     Thank you so much.

Stefan:           It was good to have a drink with you.

Tony Baer:     It was a pleasure. Do it again.


Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20 billion dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and is advising a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.