
Datameer Blog

Big Data & Brews: Part II on Data Security with Informatica

By Stefan Groschupf on September 15, 2015

Informatica’s Anil Chakravarthy and I continue our conversation around data security, this time discussing how risk management is a perfect example of a data-driven exercise. He explains that in the past it was driven either by human expertise or by process, and that increasingly, it’s becoming data-driven.

We also talk about the role Informatica plays and how cloud and data aggregation are their sweet spot.

Don’t miss it! Tune in below for part two of our Big Data & Brews with Informatica.


Stefan: But let’s talk a little bit about that “using data to secure” topic. Where do you see the opportunity in the market?

Anil: You mentioned Splunk earlier. You see a lot of companies now which have really changed the way security essentially happens. Or, I can even broaden the topic further to your earlier conversation about risk management. When you think of managing your risk, that is essentially a data-driven exercise right now. In the past, it was either human-expertise-driven or process-driven. I think increasingly we’ve seen that it is becoming data-driven. A great example is, think of just what is happening even at the network security level. In the past, it used to be that you had specific devices like routers and firewalls, etc. from which you collected logs, and you prerecorded what you were looking for and basically said, “This is what a security attack looks like.” And then, you looked for patterns that matched that prerecorded knowledge that you had.

Now that world is changing very quickly even at the network level. You basically now collect logs not only from all the network devices, but also applications, Active Directory interfaces, user access. You pretty much collect all of that information and then you use big data techniques to find the pattern rather than say, “Hey, I already know the pattern of attack and I’m just going to go look for that pattern.” I say, “I don’t know the pattern of attack.” The assumption right now is, I have all these ways in, and an attacker only needs one way to get in. Therefore, I don’t know what way they’re using to get in. So, let me get the data and see what the data tells me in terms of what looks abnormal, and then use that to find if it’s really a security vulnerability, right? That, to me, is how data is being used to change the world, and that’s happening in fraud detection. That is happening in cyber security. It is happening in, for example, insider threat detection.

So, in a variety of areas it used to be that I would define a pattern and then go look for data that fit that pattern. It’s now like, “Let me get the data and identify what the pattern might be,” because there are just way too many ways in for a malicious person, and so I can’t predict what way they will use. So, it’s the mindset … The mindset has changed and the technology is now available to make that happen.
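Anil’s point, letting the collected data reveal what is abnormal rather than matching a prerecorded attack signature, can be illustrated with a toy sketch. A minimal, hypothetical example in Python (the host names, event counts, and threshold below are illustrative, not from the interview): flag log sources whose event volume is a statistical outlier, with no predefined notion of what an attack looks like.

```python
import statistics

# Hypothetical per-host event counts aggregated from collected logs.
# All names and numbers are illustrative.
event_counts = {
    "host-01": 102, "host-02": 98, "host-03": 110,
    "host-04": 95, "host-05": 1500,  # an unusually chatty host
    "host-06": 105, "host-07": 99,
}

def flag_anomalies(counts, z_threshold=2.0):
    """Flag hosts whose event volume deviates strongly from the norm,
    without any prerecorded attack signature."""
    values = list(counts.values())
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    return [
        host for host, n in counts.items()
        if abs(n - mean) / stdev > z_threshold
    ]

print(flag_anomalies(event_counts))  # only host-05 stands out
```

Real deployments use far richer features and models, but the shape is the same: aggregate everything first, then ask the data what is anomalous.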

Stefan: Where is Informatica playing a role there? In the data aggregation part?

Anil: Yes, exactly. We are not the analytics provider. We will work with any analytics provider. We’ll work with any kind of visualization technology or user-oriented technology that the customer wants to use. We are a data infrastructure provider so it’s really data aggregation, data integration, cleansing and getting a 360 degree view of data so that if you identify, “This is the area that I need a 360 degree view of,” I can get that and then that can then be consumed by the applications. That’s really the role that we play.

Stefan: What are some of the biggest challenges to get all this data fabric into place and in big companies?

Anil: The challenges are … First, for example, there are all the different types of data. It’s not only structured data but machine data, unstructured data, semi-structured data, etc. So, there’s a lot of variety of different data types. Second is the latencies. Some data you need to get in batch, some is streaming, some real-time, etc. Third is the location of data, right? On-premise versus cloud versus some combination of the two, and making sure you get the access. And the fourth is, once you get all these different data types, how do you make sure that you process it efficiently? Because you don’t want to get data and store it all in one place, and that could become a security hole by itself. So, how do you process it most efficiently for all these different data types and keep up with different technologies? Think of NoSQL, for example, changing so rapidly. For a customer, how do you build an abstraction layer so that, while you can make use of the new technologies, you don’t go out of fashion or get stuck with the wrong technology? It’s helping the customer strike that balance. Those are really the challenges for us.

Stefan: Do you see that the data volume is a bigger problem or the data variety? Are companies challenged by the exponentially fast-growing number of data sources or is each data source just becoming bigger?

Anil: Yeah, I think it depends on the company. I’ve talked to customers where both hold. For example, we have customers who are now saying, “Look, rather than going to traditional data providers, which are not collecting as many data sources, we want to go to as many sources as possible.” I’ll give you an example. The Weather Channel, for example, is a customer. Historically, to predict the weather and to provide weather forecasts, they would go to the government sources and get some data from there, etc. But now what is happening is there is a ton of available data. For example, every town, every city, publishes certain weather forecasts and source data that they can use, right? For such companies, the volume is huge but the data types are relatively still the same.

Now on the other hand, if you think of someone like an insurance company today, they are trying to use, for example, new types of techniques to predict risk, and now suddenly they’re getting some IoT type of data from sensors that may be deployed. They may be working with the re-insurers, who are providing certain data. They have their own mainframe systems which have certain data. So, for them, variety is the problem. It’s not so much the volume; the volume is manageable. Variety is the problem. So, I think we’re still in the early days. We’re definitely seeing all these different kinds, and I think that’s part of the challenge in building … I don’t know if you can build a universal platform that can handle all of it. It’s very difficult to believe, but who knows? There may be some promising technologies that seem to be very flexible in being able to do that.

Stefan: From your perspective, where’s the biggest growth opportunity for your company?

Anil: We look at it as the intersection of what’s happening with the cloud and big data. Not only the movement of data between on-premise and cloud, and from cloud to cloud, but also just the sheer growth of data in the cloud. This is a big opportunity. And if you look at the big data world, from our perspective the value, especially for enterprise customers, comes from when they can derive insights by combining data that they have from their own systems, etc., with third-party data, customer-generated data, or machine data that they can put together. So, that intersection is good for us, and we are a data infrastructure provider, so those are the two big areas where we see opportunity.

Stefan: Yeah, couldn’t agree more with you that enriching more and more data will give you more context and, therefore, better insights.


Stefan Groschupf


Stefan Groschupf is a big data veteran and serial entrepreneur with strong roots in the open source community. He was one of the very few early contributors to Nutch, the open source project that spun out Hadoop, which, 10 years later, is considered a 20 billion dollar business. Open source technologies designed and coded by Stefan can be found running in all 20 of the Fortune 20 companies in the world, and innovative open source technologies like Kafka, Storm, Katta and Spark all rely on technology Stefan designed more than half a decade ago. In 2003, Groschupf was named one of the most innovative Germans under 30 by Stern Magazine. In 2013, Fast Company named Datameer one of the most innovative companies in the world. Stefan is currently CEO and Chairman of Datameer, the company he co-founded in 2009 after several years of architecting and implementing distributed big data analytic systems for companies like Apple, EMI Music, Hoffmann La Roche, AT&T, the European Union, and others. After two years in the market, Datameer was commercially deployed in more than 30 percent of the Fortune 20. Stefan is a frequent conference speaker, contributor to industry publications and books, holds patents, and advises a set of startups on product, scale and operations. When not working, Stefan is backpacking, sea kayaking, kite boarding or mountain biking. He lives in San Francisco, California.