The Big Data Perspective, With Shawn Rogers of Statistica [Podcast]

Today we’re featuring Shawn Rogers, Statistica’s Chief Research Officer. He’ll provide us with his perspective on what makes big data projects successful, what’s holding them back and much more.

This is part of our podcast series on big data thought leaders. Be sure to subscribe to our blog to get updates as soon as they’re published!

Transcript, lightly edited for clarity:

When it comes to big data, everyone wants a success story. Today, we’re talking with Shawn Rogers, Chief Research Officer at Statistica, about what holds people back from big data success. I’m Joanna Schloss, and this is The Big Data Perspective. Good morning, Shawn.

Shawn: Hey Joanna, nice to hear your voice.

What Holds Big Data Projects Back From Success?

Joanna: Yeah, good to chat. We both get to see how big companies adopt big data, and we often see them launching projects, only to fail to achieve the results that they’re looking for. In your perspective, what do you see as holding them back from true success?

Shawn: You know, I’ve had the pleasure and the agony of watching an awful lot of projects over the last 10 years, and it’s like anything else. Did you have good scope? Did you understand what the return on investment was going to be? Did you have the right players involved? Did you have an executive stakeholder? All of those things are necessary, but very common across the board.

I think with big data in general, people started to feel like some of the systems that were out there were a catch-all and kind of a magical solution to a lot of problems. Early on, I saw people trying to force fit big data projects into or onto platforms where they probably didn’t need to be. There was also a preoccupation with the data itself. I think there was an awful lot of excitement around collecting that data and being able to hold onto more data than we ever had before.

I give talks about some of the technologies in this space, and I’ve always said that the most dangerous thing you can encounter is a C-Level business person who just got off an airplane and just read a business magazine with a bunch of buzzwords in it. Big data and others were certainly very common early on in the space, and they would walk back into the board room or the office and declare, “Well, we need one of these, and let’s go do it,” and they never thoroughly understood the path to getting there.

I think having a strategy around big data is extremely important today. I don’t think all of our enterprise data belongs in those wonderful, traditional places that we used to try to jam it. I think smart companies are diversifying their strategies towards big data in a way that makes sense to them. I think everybody’s focus on it is going to be different, and I think it needs to be part of what you do, but I think you have to take a common-sense approach to it.

How Do You See Failed Big Data Projects Affecting the 360-Degree Customer View?

Joanna: That makes sense. Once again, you prove the “if you build it, they will come” approach is not successful. With these failures to launch on big data platform initiatives, how do you see this failure to deliver on the promise affecting analytics? You and I are both analytics enthusiasts, and analytics requires data.

With all this data being gathered but not delivering on the promise, do you see that affecting our analytic projects? Because ultimately, I think we’re in agreement that analytics drives a lot of these useful projects. How do you see this failure to launch affecting that 360-degree view of their customers or their business? Any kind of insight you might be able to give them on how they can still deliver the analytics, even though the project may have fallen somewhat short of its promise?

Shawn: Well, there’s that old phrase, “Don’t boil the ocean,” right? Make sure you’re kind of looking at things through a finely-focused filter, especially around big data. I think it’s easy for us to get over-enthusiastic on these projects, and they certainly can crawl out of scope pretty easily. I like the way you phrased the question, delivering on the value. I’ve always felt that analytics in general, and even advanced analytics, is the mechanism that does deliver the value and does give that 360-degree view or deeper insight of customers and how they’re interacting, whether it’s with your brand or your service.

I think a lot of people in the early days got a little distracted by the technology and the collection process around the big data space, but didn’t spend as much time as they could have or should have around getting the insight and taking action.

I think that if you have a mechanism in place that delivers insight, and you mentioned the 360-degree view of customers, even if it’s 290 degrees, it’s probably a huge improvement over what most companies have today. It’s the analytics that get you there. If you have the insight from the analytics meshed with or right alongside the right data, and then you’re able to ultimately take action, I think that’s the life preserver to a lot of these early and not super-well-planned-out big data projects.

If you find yourself standing inside of your company and you’re looking around and you’re marveling at all the data you’ve collected instead of marveling at the insight, the action, you’ve got a problem. You can still solve it, and I think your point is that you can still throw a life preserver around what you may have done with big data. The great part is generally, it does hold true; the more data we have, the more interesting the insights can be. The analytic platforms today enable that.

Why Has Big Data Analytics Suddenly Become Newsworthy?

Joanna: Well, it sounds like it would be wise for us to achieve small goals in reasonable time frames and, as you said, not to boil the ocean, but to take advantage of all the data that we are now able to very quickly curate and pull together. I like your life preserver analogy.

We did mention how a lot of our customers fell in love with the technology and the buzzwords. It seems like in the last couple of years the whole analytic space is experiencing a renaissance of buzzwords as well: cognitive, neural, deep learning, machine learning, culminating in advanced analytics, are all being pushed around in the marketplace today as the next hottest thing.

Given that you’ve been in this space, advanced analytics and analytics at large, for several decades, could you share with us why it has resurfaced, why it has caught fire once again, some of the impetus around this, and how our customers can potentially sift the buzzwords from the practical and the useful?

Shawn: Yeah, well playing buzzword bingo in our industry is a pretty easy thing to do if you read industry news on a daily basis. I absolutely agree, the coverage that I see in this space does make an awful lot of these things sound like they’re brand new, hot technologies. I’m not sure I agree.

I kind of think it’s that kind of old saying of “Everything old is new again”. Advanced analytics, for instance, has been in our industry for decades. Some of the market leaders have been around 20, 30 years with some of their technology, and they do a wonderful job of delivering this insight and being able to make things actionable against whatever data is available.

What’s really changed, I think, is our ability to have more data available, which adds more value to the insight, and then on the other side, these traditional technologies, like advanced analytics, have become a lot more consumable. I think all of the leaders in the market are spending a lot of time and energy and investment to kind of democratize access to these more sophisticated forms of data analysis. Everybody’s heard that term “data scientist”, the sexiest job on the planet, the “unicorn”, as they call them, but that’s already kind of come and gone.

Data scientists are still really cool, but at the same time, power users and citizen data science people, folks that are kind of living on the far edge of the BI side and kind of jumping that gap to deeper analytics, these people lack fear and they want to get their hands on the tools and the insight.

They want to infuse their business insights with things like predictive analytics instead of just historical views of their business. I think it’s a pretty exciting time. All keywords and buzzwords aside, no matter what technologies you’re looking to bring from an insight standpoint, I think they’ve become a lot more consumable and I think the old consumption model is broken.

I remember the first time I wanted some predictive insight into data I had many years ago. Frankly, I wasn’t capable of driving the platform. It was too difficult, it wasn’t user friendly. You had to go down the hallway and ask, and then wait. That is not the speed of today’s business. Now, no one wants to wait, people are interested in self-serving or at least going to get it on their own. That kind of brings up the foundational stuff here, which is how big data enables these insights and how the new technologies that have been in the marketplace for a long time are taking a new approach to power more people.

How Will Big Data Models Change?

Joanna: Right. I think it’s interesting that in our times, they’re ponying up these terms as novel and new. My favorite VP of R and D suggested to me the other day that machine learning is nothing more than just statistics on a system, on a cluster, and we’ve been doing that for decades.

I would argue that businesses don’t want to run their business on new, novel analytics that are unproven and untried. That’s pretty risky for a large company, but I like your idea that because of the democratization of data, and the accessibility and this notion of end-user access, content is now revitalizing these analytics and delivering them to a broader audience. More people can be introspective and start looking at the content, and delivering on that.

I think of it as the journey and I know we’ve talked about this before, but the journey is more important than potentially the milestones along the journey. Many of my analysts are much more interested in asking that next question, “Why?”, and the next question of, “How or what?”, and I think all these platforms deliver that ability to create that journey with these models, and the data supports it.

How do you see models changing in the next few years? Do you have some insight into sort of … In the past, we would go to the pocket-protecting data scientists/unicorns and say, “Oh guru of all things, build me a model that does mortgage optimization against customer base,” and he or she would go off and build an elegant and beautiful model, and then deliver us some metrics.

I think we were suggesting that’s changing as well. How do you see that changing, and what do you imagine emerging on the other side, with models and creativity, from that perspective?

Shawn: Well, you know I think there’s a couple of things going on. Speed and scale come to mind, but it’s not maybe the traditional speed and scale that we might think about when you hear someone utter that phrase. The speed part is, are we able to react to the business at the speed of the business? That wasn’t part of the model many years ago, especially around advanced analytics, you had to wait in that hallway after you made the request, and you didn’t get your insight, sometimes for weeks and weeks and weeks.

You and I saw many types of workloads many years back, where they would set the model in place, they’d start hitting the data, and then everyone would leave and come back a few days later.

Joanna: That’s right. Lunch time, not even coffee break, lunch time.

Shawn: And hope like heck that the model ran. Now, there’s this speed thing, and part of the speed thing is the ability to influence the insight value for iteration. It used to be you had to take a rifle shot of the data. Now you can take more of a shotgun approach and iterate over the data, and be more exploratory in nature. It’s faster, that’s the speed part. The scale part that’s interesting for me, that’s kind of different right now, isn’t about the data scale, which is generally where we go with that part of the conversation, it’s more about can you scale the actual analytic practice in your business?

Using our traditional sort of footprint, go down the hall, ask someone to help you, that was kind of small-scale. There was a handful of people that were smart enough to ask the questions, and there were a handful of models that a company was running to understand their business better.

We see customers in the marketplace now who have thousands and thousands of active models, and they’re not just in the environment safely behind the company firewall. Some of these models are sitting on IoT devices way out at the edge, or in between.

The scale issue today isn’t around … I mean, I’m sure there are some vendors that struggle with scale of data, but I think it’s more scale of management and governance. Can I optimize my analytic environment? Do I have a tool or suite of tools that allow me to make sure I am getting the very best output out of these models?

When we used to babysit just a couple of them, it wasn’t that big a deal, but now with the diversity of models that customers want to run, and the pure scale of them I think there’s kind of going to be a new shift towards centralization, better management, better optimization and so on. Because even the best models start to fall apart over time, and you have to go back and check their validity, and make sure that you’re on target with this analytic insight. That’s going to become more and more difficult as customers sort of scale up.

For me, scale and speed are kind of two of these new differentiating things, and perhaps diversity. You know I mentioned IoT and analytics outside your environment, we’re starting to see our customers wanting to do analytics just about everywhere, whether it’s inside a database, or on a gateway, or next to an IoT sensor or on an industrialized PC in a utility environment.

It’s getting kind of crazy and I think, going back to the conversation of big data, I think that’s also substantiated by what we see in the big data space, which is kind of allowing your data to live where it should and not always trying to necessarily shove and push it around.

I think data has significant gravity today, and so there are some places where it’s good to let the data be where it is. Instead of always trying to bring the data to the analytic, I think people are going to want more to bring their analytic to the data.

Joanna: That makes sense, I love that. It’s like traveling, and I know you’re a seasoned traveler. Anytime we get off a plane, it just adds complexity and risk. It sounds like potentially the way we deal with big data, analytics, and how we deliver that data, where it is, the fewer jumps we have, the more successful we may be in affecting those changes and being agile.

I know “agile” is another one of those buzzwords that everyone has adopted, but in this case it does sound like you are describing an analytic environment that is both agile, speedy and evolutionary. As you gain more insight, in theory, what I heard you say is, you can then optimize your models again wherever it is, create it, make it better, tweak it, and then run it again and ultimately have a faster, more efficient way of delivering value and increasing that insight, and increasing the why and the where of what you’re doing with more confidence.

Shawn: Yeah, well the iteration part I think is something that’s pretty important. You and I had a conversation a year or so ago about the speed gap, and when is fast fast enough, and when does the value proposition shift? I think that you had some very interesting insights around that topic, that at some point, analytics are moving fast enough that it’s not just the speed of the answer, it’s the ability to look at the problem multiple ways. If you can stand back and look at analytics and say, “Well I’m going to run these three models, and then decide which one’s giving me the best input,” and I can do that so fast that I can still keep up with the business, that’s kind of a newfound value in this. The smart companies are going to look at doing that as well.
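As a rough illustration of the “run these three models, and then decide” idea described above, here is a minimal Python sketch. The sales figures, the three toy models and every function name are assumptions invented for this example; they are not drawn from the podcast or from any Statistica product. The sketch simply scores each model on held-out data and keeps the one with the lowest error.

```python
from statistics import mean

# Hypothetical weekly sales figures, invented purely for illustration.
history = [100, 104, 108, 112, 116, 120, 124, 128]
train, holdout = history[:5], history[5:]


def last_value(train, steps):
    """Naive model: repeat the last observed value."""
    return [train[-1]] * steps


def average(train, steps):
    """Predict the mean of the training data for every step."""
    return [mean(train)] * steps


def linear_trend(train, steps):
    """Extend the average change between consecutive observations."""
    step = (train[-1] - train[0]) / (len(train) - 1)
    return [train[-1] + step * (i + 1) for i in range(steps)]


def mean_abs_error(predicted, actual):
    """Average absolute gap between predictions and actual values."""
    return mean(abs(p - a) for p, a in zip(predicted, actual))


def pick_best(models, train, holdout):
    """Score every model on the holdout data and return the best name and all scores."""
    scores = {
        name: mean_abs_error(fn(train, len(holdout)), holdout)
        for name, fn in models.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


MODELS = {
    "last_value": last_value,
    "average": average,
    "linear_trend": linear_trend,
}

best, scores = pick_best(MODELS, train, holdout)
print(best, scores)
```

Because each candidate is just a function, adding a fourth model or rerunning the comparison on fresh data is cheap, which is the kind of fast iteration the conversation is pointing at.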

What Needs to Happen for Improved Data Governance and Data Curation?

Joanna: You mentioned in passing the idea of governance and using the data where it was. I feel like we have to address, and rightly so, the idea of privacy. Customers have all sorts of different kinds of secure and private data, and I feel like some of our customers are leery of creating analytics around data that does affect privacy. Do you see privacy adoption, governance and security as a continuing barrier for big data and analytics? Depending on your opinion, how do you see consumers and customers alike taking steps to sort of overcome that barrier to entry or barrier to adoption?

Shawn: You know, I think it’s a very interesting topic, and I talk on this topic a fair amount because I’m kind of enamored by it. My slogan that goes with it is, “Just because you can doesn’t mean you should.” I think a lot of companies are struggling with that and it’s part culture, and it’s part governance, and part regulatory, when we start talking about privacy issues.

I’ve seen a lot of use cases where companies have realized that with all this great big data and this really great analytics that they can suddenly do things that they could never do before. But just because you can doesn’t necessarily mean you should.

Some companies are learning really hard lessons on that, and some of them are rather public and embarrassing, and then there are other companies that are getting pretty smart and making sure they’re aligned corporately and culturally as to what they’re doing with data. Every business will have strict issues around personally identifiable information, or PII, or in the healthcare world, HIPAA data and so on. That stuff kind of comes with governance that’s easier to understand.

In other segments or verticals within our industry, people are over-innovating and they’re finding themselves in trouble. I think at the core of this over-innovation side is a couple of things to think about. I call it PCA, and PCA for me is,

  • Do you have permission?
  • Is the context correct?
  • Are you accurate?

If you start to think about some of your interactions, maybe in online shopping, where you’re suddenly surprised that the website seems to know something about you that you don’t remember telling the website, that starts to make you think about, “Well, did I give permission for this data to be shared here, and did I authorize access to this information?”

I think customers and consumers are starting to get a little bit sensitive to that. I think we don’t get annoyed at sort of over-innovation if it’s in the correct context. If I’m shopping for baseball gloves and an ad shows up on the right side for a baseball glove and it’s a good deal, I tend not to be nearly as annoyed as I might be at some other advertisement that seems out of context.

Then the last bit is the accuracy thing. I tell a story that years ago, my wife asked me to purchase door knobs for our cabinets in the kitchen. I’m 52 years old and I’ve never bought a door knob in my life. I went online to Knobs R’ Us, or whatever the website was, and I bought some cool things that my wife liked and I installed them. Then for the next couple of weeks, everywhere I went on the internet, I saw ads for this website, and I thought, “Well, it’s not very accurate. I already made my purchase, why are you still tracking me?”

I think you have to be careful with the permission, the context of what you’re doing, and how accurate your analytics are. I don’t think that that solves all the problems, but I do think that at the tip of the spear, it’s something that needs to be considered. Then at the back side, you have to agree on what your culture is for innovation, especially when we’re using data. Joanna, you and I have talked about the cool things that you can get when you take data points A, B and C and bring them together, and you’re able to derive data.

That’s something that consumers don’t really understand, that a couple of data points mashed up from different resources can actually give me a picture or insight of a customer that you may find unexpected. You may not be comfortable with me deriving that information about you, and it could be your gender, it could be your salary range, it could be the region of the US you live in, there are all kinds of data points that can be derived. I see this with a lot of the people that I deal with and talk to about privacy: you don’t necessarily have permission to derive data. You don’t have permission to kind of pull that data from other data sources.

As big data gives us access to so much more, we have to remember that just because you can, doesn’t mean you should. Corporately, you have to have a culture in place, and some governance to make sure that people aren’t over-innovating.
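The permission, context and accuracy questions above could be turned into a simple pre-launch checklist. The Python sketch below is only one possible reading of that idea: the `ProposedUse` class, its field names and the two example records are hypothetical, and the yes/no answers would in practice come from people and policy, not from code.

```python
from dataclasses import dataclass


@dataclass
class ProposedUse:
    """A hypothetical record describing one planned use of customer data."""
    description: str
    has_permission: bool  # Did the customer agree to this use of their data?
    in_context: bool      # Does it fit what the customer is doing right now?
    is_accurate: bool     # Is the underlying insight still correct?


def pca_review(use):
    """Return the names of the checks (permission, context, accuracy) that fail."""
    failures = []
    if not use.has_permission:
        failures.append("permission")
    if not use.in_context:
        failures.append("context")
    if not use.is_accurate:
        failures.append("accuracy")
    return failures


# Assumed answers, loosely modeled on the two anecdotes in the conversation.
glove_ad = ProposedUse("baseball glove ad while shopping for gloves", True, True, True)
knob_ad = ProposedUse("cabinet knob ads weeks after the purchase", True, True, False)

for use in (glove_ad, knob_ad):
    failed = pca_review(use)
    print(use.description, "->", "ok" if not failed else "fails: " + ", ".join(failed))
```

Returning the list of failed checks, not a single pass or fail flag, keeps the reason visible, which matters when the review is as much a cultural conversation as a technical gate.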

Joanna: Okay, well certainly it sounds like we are in agreement that privacy, governance and security are not only technology issues, but much more a cultural issue and an “ought you do this?” question from a practical perspective. Your PCA acronym reminds me of the saying that I say, which is, “Would you do this with your grandmother? Would you say this to your grandmother?”, as you post things or create derived data.

“How would you make your grandmother feel?”, because we all have a grandmother that we wouldn’t want to upset. If it passes the permission, context and accuracy tests with your grandmother, it might be the way to go. If you can’t get past that test, you might want to re-think what you’re doing.

Shawn Rogers’ Big Data Predictions

Joanna: We’re coming to the end of our time together today, and given that we’re about to hit the new year, I would be remiss to not give you an opportunity to do a little bit of predictions for next year, and maybe for the future beyond. What do you see in the world of analytics and big data? What trends are you seeing, and what might you think will be different when we chat again next year at this time?

Shawn: Well, you know you led me to a couple of them, and I’ve definitely kind of touched on it, but as analytics and advanced analytics continue to mature and more people get their hands on it, I do think that some level of standardization and control and governance are going to start to play a role.

Because as I mentioned, in the early days of having a couple of models, a handful of algorithms within a business, that was easy to manage. Now, it’s going to be thousands, and as we become more prolific with our utilization of advanced analytics, you’re going to have to have better governance and control and standardization.

I also think it’s very important that that particular topic play a role around the data scientists, because data scientists are still right there at the top of the pyramid. They’re kind of the deep thinkers around analytic practices, but you can’t scale them, because most companies are lucky to have one or two, let alone a whole group of them. The data scientists will want to have a framework to work within, so that their work can be reused as a standard by others within the business, and I think that that’s going to help a lot with kind of this democratization thing that I was talking about.

I think governance is going to be a topic next year for all of us who are really getting serious around analytics. I think analytics all over the place or everywhere is going to be pretty interesting, I kind of touched on that earlier. I think the world of IoT is quickly moving away from the buzzword bingo side of things to just kind of discussing this opportunity to do analytics everywhere in your data landscape. Again, partially because of the gravity of data or the speed of data, but just basically the idea of being fast and being able to do analytics where you want.

I think analytics at the edge and everywhere are part of it, and I think that we’ll see a more sophisticated view of this, which is something I call “concentric analytics”, which is these sort of radiating waves of how analytics will be different at different circles. Picture a bullseye, at each radiating or concentric circle that goes out from the center, the workloads will be different, the data weights will be different, the data types will be different and the analytics will be different.

Orchestration across those rings is going to be very important. As data is analyzed, say, in a manufacturing process off an IoT sensor, that data will eventually move somewhere else or find a resting place somewhere else, and the work that you do on that data will look different than the work that we’re doing behind our firewall. I think you’re going to see these sort of radiating bands of sophistication coming and going towards the center. We call that concentric analytics, and the orchestration across each one of those rings is going to be very interesting next year. I think that’s a hot topic.

Joanna: That sounds really cool.

Shawn: Yeah, thanks.

Joanna: All right, well I’ll have to have a beer with you over that one, so we can noodle on that some more. Thank you so much, Shawn, for your time and your incredibly insightful thoughts. With that, this concludes today’s podcast. Join us on another podcast in the future, and have a great day.
