What’s Needed for Data Governance Today?

  • Datameer, Inc.
  • February 27, 2018

Governance in Data Warehousing and BI Versus Governance in Big Data Analytics

In the data warehouse and BI world, things were relatively simple. The barrier to entry was fairly high to get data inside the warehouse or inside OLAP cubes. So by the time it got to some tool that was doing analysis on it, most of the vetting was already done and governance really was about managing permissions by users on particular subsets of the data.

In the world of big data and data lakes, it’s a very different equation. We have the ability for end users to bring data into the lake and to leave it in a raw form. So now what we need to do with governance is to not only enforce those access controls, but also make sure that the data that’s coming in meets a certain standard.

We also have to ensure that once it’s shaped and presented, it has an authoritative structure and is representing data that’s reliable and trustworthy. Governance is really all about trust when it comes to data. And because the source of the data is more varied now, there’s a lot more work to do to assert the kind of trust that we had before.
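
As a rough illustration of checking that incoming data "meets a certain standard," here is a minimal Python sketch. The field names, the `STANDARD` mapping, and the `violations` helper are all hypothetical, invented for this example; a real data lake would lean on its platform's own schema and quality tooling.

```python
# Hypothetical standard: required field names and their expected types.
STANDARD = {"customer_id": int, "signup_date": str, "revenue": float}

def violations(record, standard):
    """List the ways a record fails the standard (an empty list means it passes)."""
    problems = []
    for field, expected in standard.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    return problems

# Illustrative records only, not real data.
good = {"customer_id": 7, "signup_date": "2018-02-27", "revenue": 19.5}
bad = {"customer_id": "7", "revenue": 19.5}

print(violations(good, STANDARD))
print(violations(bad, STANDARD))
```

The point of returning a list of problems rather than a simple pass/fail is that whoever brought the data in gets told what to fix, which keeps the check from becoming a pure barrier.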

Is Data Governance Solely About Controls on Data?

When people hear the word governance, they tend to think about government. And maybe they don’t think about benevolent government but authoritarian government. A lot of people automatically equate governance with controls and with access to the data. Certainly that’s part of it, but it’s only a part.

We need the ability to examine the lineage of that data. We need to make sure that the data is in the right structure, that the data types are correct and that we’re correlating datasets in a correct way. We don’t want to inadvertently come up with a misrepresentation, where some data that’s correct when joined with other data that’s correct somehow produces a joined dataset that really isn’t correct.
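
To make the join problem concrete, here is a small Python sketch. The tables, field names, and helper functions are hypothetical, made up for illustration. Each table is correct on its own, but because the customer key repeats in both, the join multiplies rows and the total comes out inflated.

```python
# Hypothetical data: each table is correct on its own.
orders = [
    {"order_id": 1, "customer": "acme", "amount": 100},
    {"order_id": 2, "customer": "acme", "amount": 50},
]
contacts = [
    {"customer": "acme", "email": "a@example.com"},
    {"customer": "acme", "email": "b@example.com"},
]

def inner_join(left, right, key):
    """Naive inner join: one output row per matching pair of rows."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

def is_unique(rows, key):
    """True if no two rows share the same value for the key."""
    values = [row[key] for row in rows]
    return len(values) == len(set(values))

true_total = sum(o["amount"] for o in orders)

# Every order matches both contact rows, so each amount is counted twice.
joined = inner_join(orders, contacts, "customer")
joined_total = sum(row["amount"] for row in joined)

print(true_total, joined_total)
print(is_unique(contacts, "customer"))
```

A check like `is_unique` on the join key, run before the join, is one simple guard against this kind of misrepresentation.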

A lot of this is about curation. A lot of it is about cataloging and describing the datasets, and mostly making it possible for people to come into a data lake, see what they have, and discover that it’s very accessible to them. But it’s best to do it in a way where we are implicitly giving people guidance on how to do analyses that are relevant and accurate and that produce a result with a lot of integrity.

How Can Companies Balance Between Too Much Control and Too Much Access to Data?

If democracy is a horrible form of government but the best that we have, I suppose we could say the same thing for data democratization at times. Really, though, it’s not that bad.

But we need to strike a balance. On one side, we have to assure the pedigree of the data, make sure it’s authoritative, and make sure that the access controls are correctly enforced. On the other, we really need to keep usability up. If we’re going to talk about data-driven organizations and make sure that’s more than just talk but something real, where everybody feels not just empowered to analyze data and to follow what the data tells them, but actually enthusiastic and motivated and inspired to do that, then we have to keep the usability as high as possible.

To get this done, it’s going to be very important that we have audit controls and the ability to look at the lineage of the data, because it can go through so many steps before it’s actually analyzed and visualized. We need to have confidence in the system that if we wanted to take the visualized data and trace it all the way back to its source, we could. Clearly, we’re not going to do that every time. That is just not feasible.

But we need to have the confidence that we can, because that establishes a certain deterrent against people or processes bringing in data that won’t meet that standard. As long as we have that confidence, the ability to trace lineage, and a process, standards and conventions set up for auditing that data at regular intervals, then we’ll probably hit the right balance.
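
As a sketch of what tracing visualized data back to its source can look like, the Python below walks a lineage map from a dashboard back to its root datasets. The dataset names and the `lineage` mapping are hypothetical, and a real platform would record this metadata automatically rather than by hand.

```python
# Hypothetical lineage records: each derived dataset maps to its direct inputs.
lineage = {
    "sales_dashboard": ["sales_by_region"],
    "sales_by_region": ["clean_orders", "region_lookup"],
    "clean_orders": ["raw_orders"],
}

def trace_to_sources(dataset, lineage):
    """Return the set of root datasets that the given dataset derives from."""
    sources = set()
    seen = set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against cycles
        seen.add(current)
        parents = lineage.get(current, [])
        if not parents:
            sources.add(current)  # no recorded inputs, so treat it as a source
        stack.extend(parents)
    return sources

roots = trace_to_sources("sales_dashboard", lineage)
print(sorted(roots))
```

Nobody would run this by hand for every chart, but having the lineage recorded is what makes the occasional audit possible.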