
Datameer Blog

Big Data & Brews: Anil Chakravarthy & How Consumer Tech Will Influence Enterprise Tech

By Stefan Groschupf on September 29, 2015

If the sky were the limit and we had unlimited storage and compute, what would the future of the data world look like? In part 4 of my interview, Informatica's acting CEO, Anil Chakravarthy, says we're already seeing a preview of it in the consumer world. What does he mean? Watch below to find out more:


Stefan: Let me switch gears here a little bit. Where do you see the future really in the data world? If the sky's the limit, and we have unlimited storage and compute and, you know, Ray Kurzweil is right and we have chips that are faster than our brains in something like five years, where is this going?

Anil: Yes, to me actually I think we already see a preview of the future. I'm talking about enterprise data right now. I think we see a preview of that future already in the consumer world. I mean, think of the Apple App Store for example – what are there, over a million apps at this point? But the apps are already separated from the data. The data that the apps operate on is kind of under your control; you may have a separate repository that you use for it, either your own or iCloud, etc., and the apps are extremely modular. The apps come and go very quickly; the data lives a lot longer.

If you contrast that with the enterprise world, the enterprise world has been one where the data has been very closely tied to the apps. You know, you have ERP apps or CRM apps or other kinds of apps, or custom apps, where the data models have been very closely tied. You still have some separation, that's why you can reuse the data, but the data and the apps have been very closely tied together. To me, that world is going to go the same way the consumer world already has gone. So if you ask me what's the future, it's this: the data models, the understanding of what the different data types are, whether it's schema-on-read or a pre-defined schema and things like that, the data will be designed for durability and will be designed essentially to be used by a variety of apps, maybe cloud-based apps, maybe on-premise apps, etc. The apps will become a lot more modular, and the apps will come and go, and maybe apps will be segmented by user base or by business need, way more so than they have been in the past. So I think, to me, that's the future of the world of data: it's very clearly defined around data as the asset, data being separate from the apps and, of course, consumable by the apps, otherwise it's useless, but a clear separation between the two.

Stefan: But there’s a data repository and we spin up apps as we need.

Anil: We spin up apps as we need them, and there are clear interfaces between them. We already see this happening if you look at the API world, etc. There's enough there to know, but I think it'll just go to the next level, where clearly there's a data architecture and it's not a subset of the application architecture; it is a separate, separate but equal, if you will, architecture.

Stefan: And will all applications just be kind of a stitched together fabric of Web APIs?

Anil: I believe so. I mean, the APIs themselves may evolve; I'm assuming they will evolve. They will also evolve in conjunction with what's going on in the application infrastructure world. You know, you see what's happening around containerization and things like that, so that will also influence the API layers. So the APIs, I'm pretty sure, will evolve, but it is essentially the idea you described. You have data, you have the availability of data, and then you have the application, and a lot of the concepts that you already have in the regular world, like bringing the application to the data, etc., will obviously be very relevant. But there will clearly be an ability to try out apps very quickly. You know, you can build an app in a week, try it out, and if it's not the right app you can work on it. All the kinds of things that you described are what's going on. Today the data is the barrier. I mean, if you want to try out an app in a week today, you wouldn't get through provisioning the data in that time.
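The separation described above, a durable data repository behind a stable interface with short-lived apps consuming it, can be sketched in a few lines of Python. This is purely an illustration of the architectural idea, not any real product API; the repository, its records, and the two toy "apps" are all hypothetical:

```python
# A minimal sketch of "data separate from apps": one durable data layer
# with a stable query interface, and short-lived apps that come and go
# without the data underneath ever changing. All names are hypothetical.

class DataRepository:
    """The durable layer: it outlives any single app."""
    def __init__(self):
        self._records = [
            {"customer": "acme", "region": "emea", "revenue": 120},
            {"customer": "globex", "region": "amer", "revenue": 95},
        ]

    def query(self, **filters):
        """Stable interface every app uses; apps never touch storage directly."""
        return [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]

# Two "apps" spun up against the same repository. Each could be built in a
# week, tried out, and thrown away, and the data layer is untouched.
def revenue_report(repo):
    return sum(r["revenue"] for r in repo.query())

def regional_customers(repo, region):
    return [r["customer"] for r in repo.query(region=region)]

repo = DataRepository()
print(revenue_report(repo))              # 215
print(regional_customers(repo, "emea"))  # ['acme']
```

The point of the sketch is that adding or deleting either app changes nothing about `DataRepository`: the data architecture stands on its own, and apps are consumers behind a clear interface.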

Stefan: What’s some of the interesting open-source technologies that you are watching?

Anil: Well, you know, Spark for sure; we are doing a lot of work with Spark. In fact, at Informatica World, we had Professor Franklin from AMPLab as the keynote speaker for us.

Stefan: Is it ready for prime time?

Anil: You know, that's a very good question. I mean, we are releasing the first version of our product later this year with Spark in it. Obviously we have a user base that is conservative, so if they start using it, that answers your question. So we should meet again and drink a Guinness, you know, in a few months. But I think it is definitely, I believe, getting there. There is a lot of attention, and there is a clear need for something like Spark, so I think architecturally it seems to be very sound. There is a lot of customer interest, and obviously not only us but a lot of companies are building on that technology base, so I think it shows a lot of promise. So to us that's one clear area where there's a lot of interest.

Stefan: Is Informatica basically then putting a layer on top of Spark?

Anil: Right. I mean, the holistic way to think about it is, you know, Informatica has basically had its own processing engine, right? That engine is really our pride and joy, because we have built it so that it can run on Teradata, it can run on Hadoop, etc. We have built this so that all the work that's been done using Informatica over the last 25 years can be re-purposed. That's the most important thing. So what we're doing now is saying, “Can we re-implement that in Spark?” Right? I mean, if we can take any work that somebody did, extracting data from whatever system, processing it and putting it to a target, if we can re-do it so that you can re-use it in Spark... and then Spark itself basically doesn't depend on Hadoop; it co-exists very nicely with Hadoop, but it can work with HDFS, it can work with other technology. It's a nice layer of abstraction. So for us the intellectual property would be: we know how customers have used data in the past, and that's embedded in the mappings and the work they have done in the past. If we can bring all of that into Spark, that's a major coup.

On a new technology, they get the benefit of the new technology, they get the benefit of all this massively parallel processing, you know, all of that. Most of all, like you said, without throwing away any of the work they have done in the past. That's our goal.

Stefan: What’s your feeling that Spark is leaning more and more on Kafka. 

Anil: Yeah, you know, I think they're different layers, in the sense that Spark is much more of a, if you will, processing layer; it's essentially a compute engine. Whereas Kafka is much more of a data ingestion layer, a data streaming layer. I believe that they will co-exist. I mean, you know, we were of course … in terms of streaming there are a lot of different protocols. I think there's a lot more to be seen as to which one kind of wins out, but it looks like for the data streaming use cases with high-volume data, Kafka definitely seems to be getting some traction there. We have some proprietary technology too, so we've got a dog in that fight. But we are very agnostic; the world moves quickly, and for us it's more important to make sure that we do what the customer wants rather than stick to some specific protocol or technology.

Stefan: That’s a great perspective. The reason I was asking is that some of the reliability technology that Spark is rolling in is heavily relying on replaying, you know, Kafka and those kind of things. So, what of course, kind of tints the directions a little bit, you know, informs the direction, which Storm etc, etc is leading on top of Spark then and maybe prefers one doc over the other. Beyond Spark, what els is exciting for you? 

Anil: We look at a lot of the container technologies. So that's very interesting. Docker is very interesting and new, I think especially because before, Symantec not only did security but also was involved in some of the storage software and server software that came from Veritas, which was the HA technologies and the grid technologies. Docker has the potential to change a lot of that, right, because instead of a traditional grid as you would think of it, Docker could make it much easier to roll out. For example, think of a cloud application architecture where five years ago multi-tenancy was the big deal, right? You had to have multi-tenancy to be scalable. Now you can think of a world where maybe Docker says you don't need multi-tenancy because …

Stefan: We solve that for you.

Anil: Exactly! To me, in this world we talked about, with this application layer and data layer, I think Docker plays a critical role. Of the container technologies, Docker seems to be the most promising, but containerization plays a critical role in enabling that interface, so that's the other big one that we're watching and working on.


Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20 billion dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker, a contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. If not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.