Datameer Blog post

GDPR Is Almost Here. Are Your Analytic Processes Ready?

by John Morrell on Apr 18, 2018

The May 25, 2018 deadline for the General Data Protection Regulation (GDPR) is almost upon us. And the question many in management are asking is: Are we ready?

In most organizations, the risk and compliance teams view GDPR as another regulatory reporting process. Data needs to be managed and aggregated, processes need to be audited, reports need to be generated and investigations need to be performed.

Yet, just as new regulatory obligations gave financial institutions new ways to look at their business, customers and operations, GDPR provides a similar opportunity to use regulatory compliance to advance a company’s analytics. Organizations can use GDPR processes to advance their analysis of customer interactions and experience, understand customer behavior and create stronger customer relationships.

With the deadline approaching quickly, many organizations need to focus on what is important about GDPR compliance and ensure specific aspects are in place. When it comes to GDPR compliance and the management of your data for analytics, there are four critical aspects that need to be addressed:

  • Is the data secure?
  • Who is using the data?
  • How is the data used?
  • Is there consent?

Compliance, governance and security are not the sole responsibility of an analytic data management platform. While the platform can supply the key capabilities, organizations must establish proper processes and policies that use these capabilities to ensure the appropriate levels of privacy, governance and compliance.

But the right processes and policies cannot be established without the critical underlying capabilities in the platform. Let’s explore what critical capabilities are required in a modern analytic data management platform to address each of the four key aspects of GDPR.

1.    Is the Data Secure?

Data security is arguably the most important aspect of GDPR. Fines for data breaches are very high under GDPR. Had GDPR been in place, Tesco Bank reportedly would have faced £1.9 billion in fines for its November 2016 data security breach.

In the world of today’s data lakes, security can become a complex relationship between the analytic data management platform and the underlying data operating system – a.k.a. Hadoop or compute resources in the cloud. The two must work in concert, along with other enterprise security standards, to ensure data privacy.

To ensure data is secure and private, a modern analytic data management platform must support a number of key features:

  • Role-based security that controls access to artifacts and data
  • Integration with enterprise security standards – LDAP and Active Directory
  • Fine-grained access controls on analytic logic, models and datasets
  • Data Encryption, both at-rest and in-motion
  • Obfuscation of fields and secure views
  • Integration with Kerberos for advanced security key management
  • Secure impersonation for integrated secure execution with Hadoop
  • Integration with Sentry and Ranger for single security models on all data access
  • Data retention policies that control how data is managed in analytic pipelines

Each of these underlying capabilities plays a critical role in securing data end-to-end and eliminating the potential for breaches and unauthorized access.
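To make two of these features concrete, here is a minimal Python sketch of field obfuscation combined with a role-based secure view. It is an illustration only, not how any particular platform implements it: the role names, the field names, the `SECRET_KEY` value and the `pseudonymize` and `secure_view` helpers are all hypothetical, and a real deployment would fetch the key from a key management service.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would come from a key management service.
SECRET_KEY = b"example-key-not-for-production"

# Hypothetical mapping of roles to the fields each role may read in clear text.
ROLE_CLEAR_FIELDS = {
    "compliance_officer": {"customer_id", "email", "country", "spend"},
    "analyst": {"country", "spend"},
}


def pseudonymize(value: str) -> str:
    """Replace a value with a keyed hash: stable for joins, unreadable to the viewer."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def secure_view(record: dict, role: str) -> dict:
    """Return a copy of the record with fields the role may not read obfuscated."""
    clear = ROLE_CLEAR_FIELDS.get(role, set())  # unknown roles see nothing in clear text
    return {
        field: value if field in clear else pseudonymize(str(value))
        for field, value in record.items()
    }


record = {"customer_id": "C-1001", "email": "ana@example.com", "country": "DE", "spend": 420}
analyst_view = secure_view(record, "analyst")
```

In this sketch an analyst still sees `country` and `spend`, while `customer_id` and `email` are replaced by keyed hashes. Because the hash is deterministic, the same customer maps to the same token across datasets, so joins and counts keep working without exposing the underlying value.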

2.    Who Is Using the Data?

GDPR compliance rules go beyond setting policies. Organizations must continuously understand and demonstrate who is using the data and for what purpose, which is one part of detecting unauthorized use.

Role-based security can determine who CAN access the data. This helps rule out specific users from unauthorized access. But, of those with access, it does not tell you who accessed different artifacts and datasets.

To ensure GDPR compliance around who is using the data, an analytic data management platform needs to support full logging and auditing services to track all activity. This includes user action and behavior logs, and security audit logs to track:

  • Who accessed various artifacts and datasets
  • What actions were performed by the users on artifacts and datasets
  • What access controls were in place at the time of the access and use
  • What jobs were run by users to produce analytic datasets

This information needs to be examined in two different ways: Identifying specific unauthorized actions, and examining behavior patterns that lead to invalid use of data by users.
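As a rough illustration of those two kinds of examination, the Python sketch below scans a list of audit events. The `AuditEvent` fields, the sample events and the export threshold are assumptions made for the example; real audit logs will have their own schema.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    user: str
    action: str    # e.g. "read", "export", "run_job"
    dataset: str
    allowed: bool  # whether the access controls in place permitted the action


def unauthorized_events(events):
    """Specific unauthorized actions: anything the access controls did not permit."""
    return [event for event in events if not event.allowed]


def heavy_exporters(events, threshold):
    """Behavior pattern: users whose permitted exports reach the threshold."""
    exports = Counter(e.user for e in events if e.action == "export" and e.allowed)
    return sorted(user for user, count in exports.items() if count >= threshold)


# Hypothetical sample of audit events.
events = [
    AuditEvent("ana", "read", "customers", True),
    AuditEvent("ben", "export", "customers", True),
    AuditEvent("ben", "export", "orders", True),
    AuditEvent("ben", "export", "payments", True),
    AuditEvent("cyd", "read", "payments", False),
]

violations = unauthorized_events(events)
suspects = heavy_exporters(events, threshold=3)
```

The first function catches individual denied actions. The second looks at permitted activity, since a user can stay within their access rights and still show a pattern, such as bulk exports, that deserves review.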

3.    How Is the Data Used?

Knowing who used the data is only half the battle; on its own, it merely points the finger. To truly determine whether access led to illicit use (and potential violations), an organization must be able to examine how the data was used.

The aforementioned logging and auditing services need to also keep trails of behavior to follow how analytic data is used. A full audit trail must include:

  • End-to-end lineage that describes where the data came from, how it is modeled and manipulated, and where the data is sent (or consumed) once complete
  • What management policies are applied to the data in the pipeline to determine what parts of the data are used
  • When analytic pipeline jobs are run, how they are run, and the end result of these jobs

These aspects of the audit trail need to be combined with the “who” information to get a full 360-degree view on usage compliance.
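One way to picture the combination of lineage and “who” information is a small graph walk. The sketch below assumes a hypothetical lineage store in which each derived dataset records its direct inputs and the user who ran the producing job; the dataset names and the `trace_upstream` helper are invented for illustration.

```python
# Hypothetical lineage metadata: each derived dataset lists its direct inputs
# and the user who ran the job that produced it.
LINEAGE = {
    "churn_report": {"inputs": ["customer_features"], "run_by": "ben"},
    "customer_features": {"inputs": ["crm_export", "web_clicks"], "run_by": "ana"},
}


def trace_upstream(dataset, lineage):
    """Return every upstream dataset and every user involved in producing `dataset`."""
    sources, users = set(), set()
    stack = [dataset]
    while stack:
        current = stack.pop()
        entry = lineage.get(current)
        if entry is None:
            continue  # raw source with no recorded producing job
        users.add(entry["run_by"])
        for parent in entry["inputs"]:
            if parent not in sources:  # visit each upstream dataset once
                sources.add(parent)
                stack.append(parent)
    return sources, users


sources, users = trace_upstream("churn_report", LINEAGE)
```

Given a finished report, the walk answers both questions at once: which raw and intermediate datasets fed it, and which users ran the jobs along the way.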

For compliance reporting, data from audit trails, artifacts and datasets across various analytic systems is often aggregated in a central compliance hub. To support this aggregation, the analytic data management system needs to provide capabilities such as an Event Bus and metadata exchange that interact easily with these repositories and streamline reporting and auditing processes.

4.    Is There Consent?

Consent for the use of a person’s data is at the heart of GDPR. The regulation is designed to protect the Personally Identifiable Information (PII) of consumers, and includes the “right to be forgotten.”

Determining consent is not a specific capability that an analytic data management platform can provide. How consent is granted and how it is managed will vary between organizations as well as different analytic processes.

Consent needs to be built into the logic of the data flow for the analytic pipelines. This requires an analytic data platform that offers sophisticated data flows that use various fields to separate usable and unusable analytic data.
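A minimal Python sketch of such a consent step might look like the following. The record layout (a `consent` field keyed by purpose, with `revoked` and `expires` entries) is purely an assumption for the example, since how consent is captured will differ from one organization to the next.

```python
from datetime import date


def has_valid_consent(record, purpose, today):
    """True if the record carries unrevoked, unexpired consent for the purpose."""
    consent = record.get("consent", {}).get(purpose)
    if consent is None or consent.get("revoked", False):
        return False
    expires = consent.get("expires")
    return expires is None or expires >= today


def split_by_consent(records, purpose, today):
    """Separate records into those usable for the purpose and those that are not."""
    usable, unusable = [], []
    for record in records:
        target = usable if has_valid_consent(record, purpose, today) else unusable
        target.append(record)
    return usable, unusable


# Hypothetical records; the consent layout is an assumption for this sketch.
records = [
    {"id": 1, "consent": {"analytics": {"revoked": False, "expires": None}}},
    {"id": 2, "consent": {"analytics": {"revoked": True, "expires": None}}},
    {"id": 3, "consent": {"marketing": {"revoked": False, "expires": None}}},
    {"id": 4, "consent": {"analytics": {"revoked": False, "expires": date(2018, 1, 1)}}},
]

usable, unusable = split_by_consent(records, "analytics", date(2018, 5, 25))
```

Only record 1 ends up in `usable` here: record 2 has revoked consent, record 3 consented to a different purpose, and record 4's consent expired before the evaluation date. Running the split as an early pipeline step keeps downstream analytics from ever seeing the unusable records.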

It also requires operationalization features that continuously flow fresh data into the analytic pipelines to maintain up-to-date usable datasets. Data retention features also need to retain various “point in time” datasets for auditing purposes.


With the deadline for GDPR looming, many organizations are wondering if they are ready and if their analytic data management is prepared to support their compliance efforts. We’ve reviewed a few critical aspects of managing your analytic data for GDPR compliance and key capabilities you need from your modern analytic data management platform.

To learn more about Datameer and producing robust, smarter analytic data pipelines, please visit our website.

Posted in How-to, Big Data Perspectives

John Morrell is Sr. Director of Product Marketing at Datameer.
