GDPR Is Almost Here. Are Your Analytic Processes Ready?

  • John Morrell
  • April 18, 2018

In most organizations, the risk and compliance teams view GDPR as another regulatory reporting process. Data needs to be managed and aggregated, processes need to be audited, reports need to be generated, and investigations need to be performed.

Yet, just as new regulatory obligations gave financial institutions new ways to look at their business, customers, and operations, GDPR provides a similar opportunity to use regulatory compliance to advance a company’s analytics. Organizations can use GDPR processes to advance their analysis of customer interactions and experience, understand customer behavior, and create stronger customer relationships.

With the deadline approaching quickly, many organizations need to focus on what is important about GDPR compliance and ensure specific aspects are in place. When it comes to GDPR compliance and the management of your data for analytics, four critical aspects need to be addressed:

  • Is the data secure?
  • Who is using the data?
  • How is the data used?
  • Is there consent?

Compliance, governance, and security are not the sole responsibility of an analytic data management platform. While the platform can supply the key capabilities, organizations must establish proper processes and policies that use these capabilities to ensure the appropriate levels of privacy, governance, and compliance.

But the right processes and policies cannot be established without the critical underlying capabilities in the platform. Let’s explore what critical capabilities are required in a modern analytic data management platform to address each of the four key aspects of GDPR.

1.    Is the Data Secure?

Data security is arguably the most important aspect of GDPR, and fines for data breaches are very high. Had GDPR been in place at the time, Tesco Bank would reportedly have faced £1.9 billion in fines for its November 2016 data security breach.

In the world of today’s data lakes, security can become a complex relationship between the analytic data management platform and the underlying data operating system – a.k.a. Hadoop or compute resources in the cloud. The two must work in concert, along with other enterprise security standards, to ensure data privacy.

To ensure data is secure and private, a modern analytic data management platform must support many key features:

  • Role-based security that controls access to artifacts and data
  • Integration with enterprise security standards – LDAP and Active Directory
  • Fine-grained access controls on analytic logic, models, and datasets
  • Data encryption, both at rest and in motion
  • Obfuscation of fields and secure views
  • Integration with Kerberos for advanced security key management
  • Secure impersonation for integrated secure execution with Hadoop
  • Integration with Sentry and Ranger for single security models on all data access
  • Data retention policies that control how data is managed in analytic pipelines

Each of these underlying capabilities plays a critical role in securing data end-to-end and eliminating the potential for breaches and unauthorized access.
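
As a rough illustration of how role-based security and field obfuscation can combine into a secure view, here is a minimal Python sketch. The role names, permission labels, and sensitive field names are hypothetical and not taken from any particular platform; a real deployment would typically resolve roles through LDAP or Active Directory and use a stronger, keyed masking scheme.

```python
import hashlib

# Hypothetical role-to-permission mapping (an assumption for this sketch).
ROLE_PERMISSIONS = {
    "analyst": {"read_masked"},
    "compliance_officer": {"read_masked", "read_clear"},
}

# Hypothetical set of fields treated as sensitive.
SENSITIVE_FIELDS = {"email", "phone"}


def mask(value) -> str:
    """Replace a value with a truncated SHA-256 digest.

    An unsalted hash is only a stand-in here; it is not strong anonymization.
    """
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]


def secure_view(record: dict, role: str) -> dict:
    """Return the record as the given role is allowed to see it."""
    perms = ROLE_PERMISSIONS.get(role, set())
    if "read_clear" in perms:
        return dict(record)
    if "read_masked" in perms:
        return {
            key: mask(value) if key in SENSITIVE_FIELDS else value
            for key, value in record.items()
        }
    raise PermissionError(f"role {role!r} may not read this dataset")


record = {"customer_id": "c-1", "email": "a@example.com", "country": "DE"}
analyst_view = secure_view(record, "analyst")  # email is masked, country is not
```

The design choice here is that unknown roles are refused outright rather than given a masked view, so access has to be granted explicitly.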

2.    Who Is Using the Data?

GDPR compliance rules go beyond setting policies. Organizations must continuously understand and demonstrate who is using the data and for what purpose, which is one part of detecting unauthorized use.

Role-based security determines who CAN access the data, which rules out access by unauthorized users. But it does not tell you which of the authorized users actually accessed which artifacts and datasets.

To ensure GDPR compliance around using the data, an analytic data management platform needs to support full logging and auditing services to track all activity. This includes user action and behavior logs and security audit logs to track:

  • Who accessed various artifacts and datasets
  • What actions were performed by the users on artifacts and datasets
  • What access controls were in place at the time of the access and use
  • What jobs were run by users to produce analytic datasets

This information needs to be examined in two ways: identifying specific unauthorized actions, and examining behavior patterns that lead to invalid use of data by users.
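
The two kinds of examination can be sketched in a few lines of Python. The `AuditEvent` fields, the action names, and the export threshold below are assumptions made for illustration; an actual platform's audit log schema will differ.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class AuditEvent:
    """Hypothetical audit log entry; field names are illustrative only."""
    user: str
    action: str      # e.g. "read", "export", "run_job"
    dataset: str
    permitted: bool  # access-control decision recorded at the time of the event


def unauthorized_actions(events):
    """First kind of check: individual actions that were not permitted."""
    return [event for event in events if not event.permitted]


def heavy_exporters(events, threshold=3):
    """Second kind of check: a simple behavior pattern.

    Flags users whose number of exports reaches the threshold, even if
    every single export was permitted.
    """
    counts = Counter(event.user for event in events if event.action == "export")
    return {user for user, count in counts.items() if count >= threshold}


events = [
    AuditEvent("ana", "read", "crm", True),
    AuditEvent("bob", "export", "crm", False),
    AuditEvent("ana", "export", "crm", True),
    AuditEvent("ana", "export", "web_events", True),
    AuditEvent("ana", "export", "crm", True),
]
violations = unauthorized_actions(events)  # bob's denied export
suspects = heavy_exporters(events)         # ana, with three permitted exports
```

The export count is only one possible pattern; the point is that pattern checks look across many permitted events, while the first check looks at each event alone.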

3.    How Is the Data Used?

Knowing who used the data is only half the battle; on its own, it merely points the finger. To truly determine whether access led to illicit use (and potential violations), an organization must examine how the data was used.

The aforementioned logging and auditing services also need to keep trails of behavior to follow how analytic data is used. A full audit trail must include:

  • End-to-end lineage: where the data came from, how it was modeled and manipulated, and where it is sent (or consumed) once complete
  • What management policies are applied to the data in the pipeline to determine what parts of the data are used
  • When analytic pipeline jobs are run, how they are run, and the result of these jobs

These aspects of the audit trail need to be combined with the “who” information to get a full 360-degree view on usage compliance.
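
One way to picture the lineage part of such an audit trail is as a graph that maps each dataset to the datasets it was derived from. The Python sketch below walks that graph back to the original sources. The dataset names and the dictionary representation are hypothetical; real platforms expose lineage through their own metadata services.

```python
# Hypothetical lineage metadata: each dataset maps to its direct parents.
LINEAGE = {
    "churn_report": ["customer_features"],
    "customer_features": ["crm_raw", "web_events_raw"],
}


def upstream_sources(dataset, lineage):
    """Return the datasets with no recorded parents that feed `dataset`."""
    sources, seen, stack = set(), set(), [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # already visited; also guards against cyclic metadata
        seen.add(node)
        parents = lineage.get(node, [])
        if parents:
            stack.extend(parents)
        else:
            sources.add(node)
    return sources


origins = upstream_sources("churn_report", LINEAGE)
```

Given the sources of a report, an auditor can then look up which users touched those sources in the access logs, which is the combination of "who" and "how" described above.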

For compliance reporting, data from the audit trails, artifacts, and datasets of various analytic systems is often centralized in a compliance hub. To feed this hub, the analytic data management system needs to provide capabilities such as an event bus and metadata exchange that interact easily with these repositories and streamline reporting and auditing processes.

4.    Is There Consent?

Consent for the use of a person’s data is at the heart of GDPR. The regulation is designed to protect the Personally Identifiable Information (PII) of consumers and includes the “right to be forgotten.”

Determining consent is not a specific capability that an analytic data management platform can provide. How consent is granted and how it is managed will vary between organizations and different analytic processes.

Consent needs to be built into the logic of the data flow for the analytic pipelines. This requires an analytic data platform that offers sophisticated data flows that use various fields to separate usable and unusable analytic data.
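
A data flow step that separates usable from unusable records might look like the following Python sketch. The field names `consent_given`, `consent_expires`, and `erasure_requested` are assumptions for illustration only; how consent is actually recorded will vary by organization.

```python
from datetime import date


def split_by_consent(records, today):
    """Split records into (usable, unusable) for analytics.

    A record is usable only if consent was explicitly given, has not
    expired, and no erasure ("right to be forgotten") request is pending.
    """
    usable, unusable = [], []
    for record in records:
        consented = record.get("consent_given") is True
        expires = record.get("consent_expires")
        expired = expires is not None and expires < today
        erased = bool(record.get("erasure_requested", False))
        if consented and not expired and not erased:
            usable.append(record)
        else:
            unusable.append(record)
    return usable, unusable


records = [
    {"id": 1, "consent_given": True, "consent_expires": date(2019, 1, 1)},
    {"id": 2, "consent_given": True, "erasure_requested": True},
    {"id": 3},  # no consent recorded, so treated as unusable
]
usable, unusable = split_by_consent(records, today=date(2018, 5, 25))
```

Treating a missing consent field as "no consent" is a deliberately conservative default in this sketch.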

It also requires operationalization features that continuously flow fresh data into the analytic pipelines to maintain up-to-date, usable data sets. Data retention features also need to retain various “point in time” datasets for auditing purposes.
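
As a small illustration of a retention rule for “point in time” datasets, the Python sketch below keeps only the snapshots that fall inside a retention window. The 365-day default and the function name are hypothetical; the actual window is a policy decision for each organization.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_dates, today, retention_days=365):
    """Return, in date order, the snapshot dates inside the retention window.

    The cutoff date itself is kept. The default window is an assumption
    for this sketch, not a value prescribed by GDPR.
    """
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in snapshot_dates if d >= cutoff)


snapshots = [date(2018, 5, 20), date(2018, 4, 1), date(2018, 4, 25)]
kept = snapshots_to_keep(snapshots, today=date(2018, 5, 25), retention_days=30)
```

Snapshots outside the window would then be handed to whatever deletion or archival process the organization's policy calls for.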


With the GDPR deadline looming, many organizations wonder if they are ready and if their analytic data management is prepared to support their compliance efforts. We’ve reviewed a few critical aspects of managing your analytic data for GDPR compliance and the key capabilities you need from your modern analytic data management platform.
