Governance Best Practices


We examined several approaches and architectures for governance. Choose the approach that fits the unique needs of your organization, data, analytics, and business teams. With Datameer, you can mix and match these models to fit the individual needs of departments or business units.

Ebook Background

About the Governance Best Practices Ebook

In the world of big data and analytics, “governance” has become a buzzword. The notion of governing data is laudable. Data is strategic, important, and, if mishandled, potentially compromising. It certainly needs to be protected.

Your data needs a custodial layer to make that happen, and governance would seem to provide that. Its name makes that almost self-explanatory.

But although governance does encompass that layer, it extends well beyond it. In its best implementations, data governance does more than establish a defensive regime around data. Instead, it creates an environment that makes data highly available, trustworthy, and easily discoverable. In general, good data governance entices people in the organization to explore, query, and contribute data, and it supports digitalization efforts and the spread of data-driven practices.

Introduction & Role of Governance

Governance of any data is of paramount importance.

DataOps Process: Data Platform Capabilities

What Does Governance Encompass?

Lineage and Impact Analysis; Audit; Security; Data Quality; Compliance; Certification; Master Data Management; Data Cataloging.

Key Governance Features

Authentication; Secure Impersonation With Kerberos; Roles, Access Control and Permissions; Obfuscation; Lineage; Data Management; Auditing (Event Bus).

Approaches to Governance

Drivers Behind Governance; What You Need to Govern; Introduction; Data-centric; User-centric; Reusability-centric; Department-centric; Lifecycle-centric.

DataOps Process: Drivers and Objectives of DataOps

Governance Reference Architectures

Datameer-centric; Data Lake-centric; Enterprise-centric.

DataOps Process: How it helps

Conclusion and Other Considerations

Scope, People, Process, and Fit.

DRIVERS BEHIND GOVERNANCE

As previously noted, governance covers many key aspects of how you want to operate your big data analytics. These include:

  • Data security: This is important, but it is not the only aspect of governance. For example, when I look at my big data analytics, I need to define how I lock down my data, provide secure views of the data, and ensure the proper access controls are in place, both to the system and to the data.

  • Optimization: Governance should also help the team run its infrastructure effectively. Optimizing big data analytics involves creating the right structure and letting team members operate and optimize what they know best.

  • Self-service: A well-aligned governance strategy enables the degree of self-service you want to provide. Controls that are too tight stifle self-service; controls that are too loose introduce risk.

  • Sharing and reusability: With all the data involved in big data analytics, sharing and reuse bring greater economies of scale. Governance needs to provide the right structure for findability and the right blend of controls for reuse and sharing.

  • Operationalization: Governance plays an important role in how your analytics are put to work. This involves a clean structure and process for promoting analytics to regularly scheduled jobs. You want to be confident that each job runs cleanly, produces the right results, and is delivered to the proper business teams.
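To make the idea of "secure views of the data" from the data security driver more concrete, here is a minimal Python sketch of role-based column masking, one common form of obfuscation. The sensitive column names, the `data_steward` role, and the masking rule are hypothetical illustrations, not Datameer features or APIs.

```python
# Hypothetical set of columns treated as sensitive; a real deployment would
# derive this from a data catalog or classification policy.
SENSITIVE_COLUMNS = {"ssn", "email"}

# Hypothetical roles allowed to see sensitive values unmasked.
PRIVILEGED_ROLES = {"data_steward"}


def mask(value: str) -> str:
    """Obfuscate a value, leaving only its last two characters visible."""
    if len(value) <= 2:
        return "*" * len(value)
    return "*" * (len(value) - 2) + value[-2:]


def secure_view(rows: list[dict[str, str]], roles: set[str]) -> list[dict[str, str]]:
    """Return copies of the rows, masking sensitive columns unless a privileged role is held."""
    if roles & PRIVILEGED_ROLES:
        # Privileged callers get unmasked copies.
        return [dict(row) for row in rows]
    return [
        {col: (mask(val) if col in SENSITIVE_COLUMNS else val) for col, val in row.items()}
        for row in rows
    ]
```

Because the function builds new rows instead of modifying its input, the same underlying data can back several views with different masking levels.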

DATAMEER-CENTRIC

The first reference architecture focuses on Datameer, with much of the governance work performed using Datameer administration features for folder organization structure, user management, and role-based security. There are two critical integration points with external services:

  • LDAP or Active Directory, possibly combined with SAML, for authentication

  • Secure Impersonation, optionally with Kerberos integration
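The role-based security on a folder structure described above can be sketched as follows, assuming users are authenticated against LDAP or Active Directory and their directory groups are then mapped to roles defined inside the application. The group names, role names, folder paths, and the rule that a grant on a folder also covers everything beneath it are hypothetical; this is not Datameer's actual permission model or API.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from directory groups (returned by LDAP/AD after
# authentication) to roles defined inside the analytics tool.
GROUP_TO_ROLE = {
    "cn=finance-analysts": "finance_analyst",
    "cn=data-admins": "admin",
}

# Hypothetical role grants on folders. Assumption: a grant on a folder
# also applies to every artifact and subfolder beneath it.
FOLDER_GRANTS: dict[str, dict[str, set[str]]] = {
    "/Finance": {"finance_analyst": {"read"}, "admin": {"read", "write"}},
    "/": {"admin": {"read", "write"}},
}


def roles_for(groups: list[str]) -> set[str]:
    """Translate a user's directory groups into application roles."""
    return {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}


def is_allowed(groups: list[str], artifact_path: str, action: str) -> bool:
    """Check whether any of the user's roles grants `action` on the artifact's folder or an ancestor."""
    roles = roles_for(groups)
    path = PurePosixPath(artifact_path)
    # Walk from the artifact's folder up to the root, checking grants at each level.
    for folder in [path.parent, *path.parent.parents]:
        grants = FOLDER_GRANTS.get(str(folder), {})
        if any(action in grants.get(role, set()) for role in roles):
            return True
    return False
```

Granting roles on folders rather than on individual artifacts follows the folder-based organization described above and keeps the number of grants to review small.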

When LDAP or Active Directory is used solely for authentication, with roles still defined inside Datameer and applied to artifacts, secure impersonation ensures that jobs run on the Hadoop cluster with the same privileges as the Datameer user, keeping security tight. If the cluster is secured with Kerberos, the Kerberos integration should also be configured.
