
Datameer Blog

Big Data & Brews: Anil Chakravarthy Diagrams the Big Data Ecosystem

By on October 19, 2015

Our last installment of Big Data & Brews with Anil touches on a cool topic. Of course, I like that we get to use the chalkboard but we also had a chance to break down how Informatica sees the ecosystem (hint, the data intelligence layer is the most promising). We also talked about what he sees happening in the next 10 years that will really accelerate change in the industry.

The full conversation is just a click away – tune in!

TRANSCRIPT:

Stefan:           What would be interesting to see is, in this ecosystem of data technologies, where you guys are sitting, and then where you see the Hadoops, Teradatas, MicroStrategies, Datameers. I kind of see you as the fabric that brings it all together. Is there a central brain of that fabric?

Anil:                Right. You know, we believe so. Let me just take a stab at how we think of the world. This is obviously a logical view and it has to be translated based on … We see the world as starting with this; think of this as data persistence. This world is obviously changing very rapidly. It was basically the databases of the world. Could be anything from mainframe database to relational database, etc. Now it's Hadoop and NoSQL, and this world could be either on premises or in the cloud or a combination.

Then we see the world of what we think of as data infrastructure. So this is the world which we have traditionally played in, and this world is also changing rapidly because, obviously, when this changes, this has to change here. You have things like data ingestion, which is changing very rapidly. Somebody once joked to me that whatever IBM worked on in the 1970s will always be useful at some point, so it's like that. Concepts like change data capture. Concepts like real time, streaming, etc., so all of those are coming back, right?

You have ingestion. You have data integration. Obviously that’s where you put it together, the aggregation etc. I think you have a lot of work around data quality, which is increasingly, “How do you do quality, especially on unstructured data” and things like that. That becomes a lot of work to be done with that. I think you have a lot of data security, as you said, and particularly in governance. Maybe we put that together and what we like to think of as the 360 degree view, which is the mastering of the data, which has to happen at this layer and especially at the variety and volume that you’re getting at this layer for it to be useful.

Then, I’ll fill out this last, this third layer here in a second. This is the layer that you just talked about, which is the apps. When I think of apps, I would even add users, because the world is blurry between apps and users, so this could be operational apps. A lot of analytical apps. A lot of, I would say, hybrid apps between the two, which is machine-assisted decision making, etc.

If you think of that world, this world is evolving rapidly and that’s where I think over time, like we said, this is where it’s going to go, right? Now, this layer is the layer we think is very promising and, I mean, I don’t know where you guys see yourself, but we call this data intelligence. If you think of this as data infrastructure, this is data intelligence. This was not really possible before. Now think of how much we’ve seen happen through things like machine learning, how much we have seen happen through things like recommendation engines. Essentially everything Amazon can do right now with the data it has. Think of inference engines and recommendation engines that are essentially taking metadata from this layer, processing it and making it available to the apps or to the users through API access.

Just if you will, our simplistic world view is this is where it’s kind of going. Does it make sense?

Stefan:           Absolutely. Yeah, I couldn’t agree more with you. What’s very interesting for us … We do have connectors, right? Maybe not the quantity that Informatica has, but as we started the company we saw a huge problem: How are people even getting data into Hadoop?

Anil:                Right.

Stefan:           We made our connectors bi-directional so we can pull in data, we can push data, and more and more people are really taking advantage of getting data out of our analytical engine, right, if you will? That’s really fascinating and I can absolutely see that here, where we see more and more use cases like a big telecommunications company significantly cutting down on their truck rolls, right? They pull data in, they do the analytics, and historically they would look at a bar chart and say, “Hm. We have to change something here.” Whereas now, with the results, it's, “Okay. We create maybe a risk score or something and push that back into the system.”

Anil:                It influences … Got it.

Stefan:           Yes. Exactly. We see more and more of those use cases where analytics isn’t done anymore to inform the user. It’s also done to inform the application.

Anil:                That’s right.

Stefan:           To make the right decisions.

Anil:                Exactly.

Stefan:           As you pointed out earlier, yeah.

Anil:                Exactly. You can, in fact, influence the data gathering at this …

Stefan:           What will happen in the next 10 years that will really accelerate all the changes? What’s putting the pressure on the pipe?

Anil:                Yeah. I think two things. I think one from a business perspective and one from a technical perspective. The business perspective is clearly that the idea of data-driven business processes is just getting embraced very quickly. I was actually talking to an insurance company earlier this week. These are non-technical people. These are people who have lived in the world of actuarial data in the past, and they realize that this can actually change their world very quickly, and many of them also realize that if they don’t do it, somebody will. It’s like … there’ll be an Uber or an Airbnb or somebody like that in their space who puts it together and makes them part of the old world.

I think that’s the business interest and the business pressure. I think that’s a very good thing because that will drive a lot of investment. From a technology perspective, I think you already hit on it: the availability of not only the storage capacity, cheap processing power and all of that, but the availability in a very easy-to-consume manner through all these concepts, for example whether it’s Amazon or Azure or Google. They are pushing the boundary of how easy it is to consume and utilize for most customers. The two together are, I think, providing a lot of impetus for these kinds of architectures to become reality.

Stefan:           Do you think that data is the competitive edge for the future?

Anil:                I believe so. I think for many companies, it already is, right? It won’t be the only competitive edge. If you think of an online manufacturing company, if their quality sucks, then ultimately they will go out of business, but those are disciplines that are becoming table stakes. The ones that couldn’t get their quality problems fixed are already out of business, so the ones that are left already have pretty good quality in their manufacturing. What next? How do they turn this into … Essentially, in economics, as things get better they lead to commodity. The only way to avoid commoditization is to have an edge, and that’s where data plays a critical role, so I think with what we are seeing in many, many industries, it will definitely be a competitive differentiator.

Stefan:           Thank you very much for joining us for Big Data & Brews.

Anil:                My pleasure, Stefan. Thank you for having me. Thank you.

Stefan:           It was fun. Thank you.




Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20 billion dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.
