This week and next we’re coming to you with several special “Live from Hadoop Summit” Big Data & Brews snapshots and episodes. First up, meet Anoop Dawar, product manager from MapR. We talked about their new app gallery, the role of Big Data in the Internet of Things, and about what’s going on with Apache Drill.
Stefan: Welcome to Big Data and Brews, special edition today from Hadoop Summit in San Jose. Could you introduce yourself?
Anoop: Yeah, thanks, Stefan. I am happy to be here. My name is Anoop Dawar, and I work at MapR.
Stefan: Anoop, how long have you been at MapR?
Anoop: I have been at MapR for almost a year now.
Stefan: What is your focus at MapR?
Anoop: MapR has three editions of its Hadoop distribution. There is M3, which is a free version. There is M5, which is our enterprise-grade version, and M7, which is enterprise-grade plus an enterprise-grade NoSQL database. I look after M3 and M5.
Stefan: M7 is more kind of an HBase on top.
Anoop: M5 also has HBase, but with M7 what we do is allow you to run HBase applications without the limitations of HBase. So we get rid of the issues with compactions, I/O storms, and so on.
Stefan: Cool. What’s new?
Anoop: What’s new? There is a lot of new stuff. It’s exciting to be here at Hadoop Summit. We just announced the industry’s first app gallery, which has nearly 30 solutions.
An interesting thing is, if you look at most partner websites nowadays, they say, “Yes, MapR and Datameer, we are partners, we work together,” and that’s it. But typically a developer or a customer looks at it and says, “Okay, but I have more questions. Have you done interoperability testing? What version of Datameer works with what version of MapR? Is there a sandbox I can play with? What’s the documentation for installation steps?”
We have taken this and double-clicked it a level down and said, “Okay, here is a solution, here is how everything works together,” across these 30 solutions.
Stefan: What did you do before MapR?
Anoop: Yes, that’s a good question. I’ve been in systems software for a long time. I’ve built networking switches, then product managed networking switches, then built wireless access points and branch routers. My last job was building wireless access points, branch routers, and switches with a cloud backend. That’s where we started looking at how much data was coming in, and I became a Big Data user. Then MapR found me, and now I am a Big Data provider.
Stefan: What’s your point of view about the whole Internet of Things? Is that a big topic for you guys?
Anoop: Yeah, yeah. Actually it’s a topic close to my heart because, coming from wireless access points, we were already seeing the explosion of Internet of Things devices. Clearly the interesting thing with the Internet of Things is the amount of data that’s coming in really quickly, and the need to discover what that data is and what’s changed, and to react really quickly. It brings the ability … it’s sort of challenging Hadoop to be real time right now instead of waiting and creating other solutions.
Stefan: What’s the technical answer that you guys are providing for that?
Anoop: For us, we believe in providing a platform that gives you reliable compute, dependable storage, and an open ecosystem. As part of that we already have a strong file system that allows you to stream in data really natively and to the …
Stefan: You have a network attached storage API.
Anoop: Yeah, and an NFS API, so you can stream data in at an extremely fast rate. You can tail that data, or you can push it to any of the multiple applications we have there. Lots of customers use Apache Storm, lots of customers are looking at Spark Streaming, and we provide all these capabilities so you can pick the tool for your need.
Stefan: I can just run Spark on top of MapR? No problem?
Anoop: Yes, you can run the entire Spark stack on MapR, including Shark SQL, without any issues, and we provide very simple RPM packages, so you can install it, manage it, and monitor it through the management control system that MapR provides.
Stefan: Tell me about Drill. That’s a project you guys started. Where is Drill right now, and what’s the difference between Spark, Storm, and Drill?
Anoop: Apache Drill is a really exciting project from our perspective. It’s one of those that’s getting a lot of community support now. Lots of non-MapR contributors and committers are part of that project.
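The "stream in over NFS, then tail it" pattern Anoop describes can be sketched with ordinary file I/O, since an NFS mount looks like a local directory to the writer and the reader. The following is a minimal sketch only: the temporary directory stands in for an NFS-mounted cluster path, and the file name, record format, and `read_new_lines` helper are hypothetical names chosen for illustration, not part of any MapR API.

```python
import os
import tempfile


def read_new_lines(path, offset):
    """Return the complete lines appended after byte `offset`, plus the new offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    # Consume only up to the last newline, so a half-written record
    # is left in place for the next poll.
    end = chunk.rfind(b"\n") + 1
    lines = chunk[:end].decode("utf-8").splitlines()
    return lines, offset + end


# Assumption: this temporary directory plays the role of the NFS mount point.
with tempfile.TemporaryDirectory() as mount:
    path = os.path.join(mount, "events.log")

    # A producer appends one full record and the start of a second one.
    with open(path, "ab") as f:
        f.write(b"device=ap-1 temp=21.5\ndevice=ap-2 te")
    first, offset = read_new_lines(path, 0)

    # The producer finishes the second record; the reader picks it up next poll.
    with open(path, "ab") as f:
        f.write(b"mp=19.0\n")
    second, offset = read_new_lines(path, offset)

print(first)
print(second)
```

A real consumer would call `read_new_lines` in a polling loop and hand each batch to whatever stream processor it feeds; tracking the byte offset is what keeps records from being read twice.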
What’s really different about it? The question that always gets asked is, “Wow, yet another SQL-on-Hadoop project?” The way we look at things is, this is not about creating a regular SQL-on-Hadoop project. This is about creating a project that allows you to discover schemaless or semi-structured data directly through SQL, to allow data exploration.
Imagine a JSON object or nested JSON object, and imagine being able to just do a SQL select query on it. Imagine that while that query is running the schema might change underneath it, because there are multiple files you are going through. The files from last month of the IoT data may not have a certain timestamp or a query but now you have introduced it, so it needs to be able to look at that, respond at run time, and deliver that information.
Stefan: So it’s more a schema-on-demand kind of thing, but as a scripting language rather than Hive having tables.
Stefan: Without a static schema, you have more of a Pig-like language.
Anoop: No, there is no data definition we have to do upfront. It can connect to Drill, it can connect to Hive, it can use the Hive Metastore, but it doesn’t have to. It will discover the schema, and when it works with files that already have some schema information it can utilize it. Like if it’s looking at Parquet or Avro, it can utilize that information and help you with that.
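The idea of selecting from nested JSON whose fields change from file to file can be illustrated in a few lines. This is a rough sketch of the schema-on-read concept only, not of how Drill is implemented: the sample records, the dotted column names, and the `flatten`, `select`, and `discover_schema` helpers are all hypothetical.

```python
import json

# Hypothetical IoT data: last month's records have no "ts" field,
# this month's records add it along with a new nested "rssi" value.
LAST_MONTH = (
    '{"device": "ap-1", "reading": {"temp": 21.5}}\n'
    '{"device": "ap-2", "reading": {"temp": 19.0}}'
)
THIS_MONTH = '{"device": "ap-1", "ts": 1402000000, "reading": {"temp": 22.1, "rssi": -60}}'


def flatten(obj, prefix=""):
    """Flatten nested JSON objects into a dict keyed by dotted column names."""
    row = {}
    for key, value in obj.items():
        name = prefix + key
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))
        else:
            row[name] = value
    return row


def records(*sources):
    """Yield one flattened row per non-empty JSON line across all sources."""
    for source in sources:
        for line in source.splitlines():
            if line.strip():
                yield flatten(json.loads(line))


def discover_schema(*sources):
    """Collect column names in the order they are first seen while scanning."""
    columns = []
    for row in records(*sources):
        for name in row:
            if name not in columns:
                columns.append(name)
    return columns


def select(columns, *sources):
    """Project the requested columns; a column a record lacks comes back as None."""
    return [tuple(row.get(col) for col in columns) for row in records(*sources)]


schema = discover_schema(LAST_MONTH, THIS_MONTH)
rows = select(["device", "ts", "reading.temp"], LAST_MONTH, THIS_MONTH)
print(schema)
print(rows)
```

No table definition is declared anywhere: the column list is found while scanning, and older records that predate the timestamp simply return `None` for it rather than failing the query.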
Stefan: Thank you very much, Anoop. Nice to meet you …
Anoop: Yeah, pleasure to meet you too, sir.
Stefan: … and enjoy the show.