
Datameer Blog

Big Data & Brews: Informatica Talks Security

By Stefan Groschupf on September 8, 2015

I’m extremely excited to return from our hiatus with a new interview with Informatica’s Acting CEO, Anil Chakravarthy. He has over 15 years of experience in security, and given the importance of big data governance, I thought he was the perfect candidate to share what he sees coming down the pipeline.

Tune in below to see the first installment.



Anil Chakravarthy, Acting CEO, Informatica

Stefan: Welcome to Big Data and Brews. It’s been a long time. I’m very excited to start off a new season of Big Data and Brews with Anil Chakravarthy from Informatica. Thanks for joining.

Anil: My pleasure.

Stefan: Usually we ask guests to introduce themselves and the brew they brought, but it’s so early in the morning that we decided to go for coffee and refreshing water. Tell me a little bit about your background. You have a very interesting background, very security-focused. How did that shape how you got to Informatica and what you’re doing there?

Anil: Yes, as you said, I’ve had a deep background in security for the last 15 years. I was at Symantec for nearly 10 years, where I ran the enterprise security business. Before that I was at VeriSign, where I was responsible for product management of the VeriSign security services. Coming to Informatica was, to me, a great way to bring that security expertise to the data layer.

As you know, a lot of the security world is still very much at the network layer. It’s creeping up into the application layer, but if you really look at where security can be most effective, it’s at the data layer. There you know what you are trying to protect, what is sensitive, what is valuable. We at Informatica are taking a new approach, based on my background but also on what we see in the industry: a data-centric approach to security.

Stefan: I think there are two topics I want to talk to you about today. One is really securing data and one is using data to secure, if that makes sense?

Anil: Yeah, yeah, it does.

Stefan: Why don’t we start with the first one? What’s your perspective about what’s going on in … Maybe we expand it from security to overall data governance. What is really the requirement of the market? Where are the products today? Where do they have to come, where are the shortcomings?

Anil: Yeah, let’s start with the state of the art in security today. There’s no shortage of spending on security. If you talk to virtually any customer and ask how much their security budget is and how much it has grown, it has grown exponentially for virtually every customer. At the same time, you hear of breaches. Every week there is a major breach, and minor breaches are disclosed every day, so something doesn’t add up.

If there’s so much spending on security and yet you hear of major breaches like the one that just happened at OPM, the Office of Personnel Management, what is really not adding up? Our view (and this goes back to your securing-data topic) is that because the current technologies are indirect ways of protecting your data, their effectiveness is limited. You have to say, “This segment of the network contains valuable data, so I’m going to protect that network,” but the data doesn’t have perimeters anymore.

Big data and the cloud are pushing those boundaries even faster, because you have valuable data everywhere now. It’s sitting in Salesforce.com, it’s sitting on Amazon, it’s sitting in any other data repository that you are building yourself. The perimeter has gone away, and therefore the technologies that were all about “Hey, I want to create a perimeter” are not as effective anymore. We believe that approach has to change, and securing the data has to start with understanding what is sensitive and what needs to be protected.

Stefan: More zooming in and protecting the data itself than just globally putting walls up around…

Anil: You cannot just put walls up anymore. It’s not that you would stop putting up walls; that would not be the right way to go. But you have to realize it takes more than that.

Stefan: You touched on cloud. What’s your perspective? Is data more secure in the cloud than on premises?

Anil: That’s actually very difficult to answer because it’s a subtle distinction. If you look at the infrastructure in the cloud, it is probably better protected than the infrastructure in a typical data center. Cloud data centers are more professionally run, better organized, automated, etc., so the infrastructure is better protected.

But, like we were just discussing, the data on top of that infrastructure is as open as it was in your data center. Whoever provides the infrastructure layer has no visibility into your data, so you could be on Amazon, and if you have poor data security practices, the infrastructure security at Amazon is not going to protect you. So in the cloud, one dimension is better protected, but the data dimension is as protected, or as unprotected, as it was in your own data center.

Stefan: Is there a particularly weak link when moving data back and forth between on-premises systems and the cloud? I think you have interesting products around that. What is your approach to closing the loopholes there?

Anil: Right, I think the more data movement and data proliferation there is, the greater the exposure in terms of who can access the data and where it could go. Our approach right now builds on understanding metadata. Because of our heritage in data integration, we’ve always had very good access to metadata. For structured data, metadata tells us the table structure, what the fields are, what type of data is in the fields, etc. That is a great way of understanding the security profile of the data.
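To make the metadata idea concrete, here is a minimal sketch in Python, assuming a SQLite database and a hypothetical list of column-name hints. `SENSITIVE_NAME_HINTS` and `scan_schema` are illustrative names, not Informatica APIs. The sketch reads each table’s schema through SQLite’s `PRAGMA table_info` and flags columns whose names suggest sensitive content.

```python
import re
import sqlite3

# Hypothetical column-name hints; a production classifier would need a far richer rule set.
SENSITIVE_NAME_HINTS = re.compile(r"ssn|social|tax_?id|birth|passport|card", re.IGNORECASE)


def scan_schema(conn: sqlite3.Connection) -> list[tuple[str, str, str]]:
    """Return (table, column, declared_type) for columns whose names suggest sensitive data."""
    flagged = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info yields one row per column: (cid, name, type, notnull, dflt_value, pk).
        for _cid, column, col_type, *_rest in conn.execute(f'PRAGMA table_info("{table}")'):
            if SENSITIVE_NAME_HINTS.search(column):
                flagged.append((table, column, col_type))
    return flagged


# Usage: the legacy customer_id column is not flagged, because its name gives nothing away.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id TEXT, name TEXT, ssn TEXT)")
conn.execute("CREATE TABLE orders (order_id INTEGER, customer_id TEXT, card_number TEXT)")
print(scan_schema(conn))  # → [('customers', 'ssn', 'TEXT'), ('orders', 'card_number', 'TEXT')]
```

Name-based checks like this are cheap, but they only see what the schema admits to. A column called `customer_id` that actually holds Social Security numbers slips straight through.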

For example, over the years lots of databases were built that used the Social Security number as the customer ID. That was very common until about 10-15 years ago, when people became more conscious that “Oh my goodness, that’s a problem.” But people still have applications running that are 30 years old and work that way. The field might be called customer ID, but in reality it’s a Social Security number. You would never know that unless you profiled the data and realized, “Wow, they’re all 9 digits and they’re all within the right range.” Once you get deep down, you realize that’s a Social Security number and therefore it needs to be protected. So the approach we are taking is to say, let’s profile the data and extract the metadata, and for customers who’ve been using our tools, the metadata already exists. This can be done for virtually any database, any structured data repository.
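The profiling step Anil describes can be sketched as a value-level check. This is a minimal illustration, not Informatica’s method: it assumes the ranges the Social Security Administration has never assigned (area numbers 000, 666 and 900 to 999, group 00, serial 0000) and a hypothetical 90% match threshold for calling a whole column an SSN column. The function names and sample values are made up for illustration.

```python
import re

# Nine digits, optionally written as AAA-GG-SSSS.
SSN_SHAPE = re.compile(r"^(\d{3})-?(\d{2})-?(\d{4})$")


def looks_like_ssn(value: str) -> bool:
    """True if value has SSN shape and avoids never-assigned ranges (area 000/666/900-999, group 00, serial 0000)."""
    match = SSN_SHAPE.match(value.strip())
    if not match:
        return False
    area, group, serial = (int(part) for part in match.groups())
    return 0 < area < 900 and area != 666 and group != 0 and serial != 0


def profile_column(values, threshold: float = 0.9) -> bool:
    """Flag a column as a likely SSN column if at least `threshold` of its non-empty values look like SSNs."""
    non_empty = [str(v) for v in values if v is not None and str(v).strip()]
    if not non_empty:
        return False
    hits = sum(looks_like_ssn(v) for v in non_empty)
    return hits / len(non_empty) >= threshold


# Usage with made-up values: a "customer_id" column that is really SSNs, and one that is not.
legacy_customer_ids = ["123456789", "234-56-7890", "345678901"]
modern_customer_ids = ["C-1001", "C-1002", "C-1003"]
print(profile_column(legacy_customer_ids))  # True
print(profile_column(modern_customer_ids))  # False
```

The threshold lets a column with a few malformed or placeholder entries still be recognized, at the cost of occasional false positives on other nine-digit identifiers.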

Of course, when it comes to semi-structured and unstructured data, it’s really about being able to understand the schema, if you will, the metadata. As you know, there’s a lot of activity in the market right now on how to do that using machine learning and a lot of new techniques. We believe this approach can be extended to many other data types.
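For semi-structured data, one simple way to extend the same idea is to walk every value in a parsed JSON document and check it regardless of the key it sits under. The sketch below swaps in a plain regular-expression rule where the interview mentions machine learning, so treat it as a stand-in rather than the technique Anil refers to. The function name `find_sensitive_paths` and the `$.a.b[0]` path notation are illustrative assumptions, and only string values are checked.

```python
import json
import re

# Rule-based stand-in for a learned classifier: nine digits, excluding ranges SSA never assigns.
SSN_SHAPE = re.compile(r"^(?!000|666|9\d\d)\d{3}-?(?!00)\d{2}-?(?!0000)\d{4}$")


def find_sensitive_paths(node, path="$"):
    """Yield the location of every string value that looks like an SSN, whatever key it sits under."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from find_sensitive_paths(child, f"{path}.{key}")
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from find_sensitive_paths(child, f"{path}[{index}]")
    elif isinstance(node, str) and SSN_SHAPE.match(node.strip()):
        yield path


# Usage with a made-up record: the SSN hides under a generic "id" key.
record = json.loads('{"customer": {"id": "234-56-7890", "name": "Ada"}, "orders": [{"ref": "A-17"}]}')
print(list(find_sensitive_paths(record)))  # ['$.customer.id']
```

Because the check looks at values rather than keys, the output effectively becomes inferred metadata for a document that has no declared schema.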


Stefan Groschupf

Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20 billion dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker and contributor to industry publications and books, holds patents, and advises several startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.