Secure Impersonation with Datameer as Supergroup User

Prerequisites and Preparation

Before you begin, configure the Datameer application with the appropriate authenticator so that only valid HDFS users and groups exist in Datameer.

Authenticator integration

A key consideration for enabling the impersonation feature is that all users and groups available in Datameer must map directly to the HDFS user/group community. This is typically done by configuring Datameer to use an LDAP authenticator and employing group filtering to ensure that only valid HDFS groups are available within Datameer. See details on configuring the Datameer LDAP Authenticator.

Set up Authenticator First

For best results, first configure the remote authenticator and then import users to ensure that the group filters work properly.

Configure secure cluster mode

At this point, the Datameer installation should be configured to run in secure cluster mode. Please ensure that secure grid mode is configured and working before continuing.

Don't enable secure impersonation yet!

HDFS group setup

Create an HDFS group containing all Datameer users, for two reasons:

To avoid having to make any directories world writable.
To tightly control which users the Datameer user can proxy.

This Datameer users group can be excluded from Datameer's LDAP authenticator if you don't want to expose it to end users.

Configuring Datameer as a super user and specifying allowed proxy users

Because secure impersonation in Datameer is based on native Hadoop mechanisms, the OS user that runs the Datameer application must be configured as an HDFS superuser (a member of the HDFS supergroup) and must be allowed to proxy Datameer users from the Datameer machine.

Add a Datameer user to the HDFS supergroup

The HDFS supergroup defaults to supergroup, but it can be changed in hdfs-site.xml with the setting:

dfs.permissions.supergroup = supergroup
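To determine which group is actually in effect, you can read the value from hdfs-site.xml. The following is a minimal sketch, not a Datameer tool. It assumes a standard Hadoop configuration file layout, and the group name hadoopadmins in the sample is hypothetical. It also checks dfs.permissions.superusergroup, the name that newer Hadoop releases use for the same setting.

```python
import xml.etree.ElementTree as ET

# Property names checked in order. The first is the name used by newer
# Hadoop releases; the second is the older name shown in this document.
SUPERGROUP_KEYS = ("dfs.permissions.superusergroup", "dfs.permissions.supergroup")


def find_supergroup(hdfs_site_xml, default="supergroup"):
    """Return the supergroup set in the given hdfs-site.xml text, or the default."""
    root = ET.fromstring(hdfs_site_xml)
    props = {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }
    for key in SUPERGROUP_KEYS:
        if props.get(key):
            return props[key]
    return default


# Hypothetical sample content; "hadoopadmins" is an illustrative group name.
SAMPLE = """<configuration>
  <property>
    <name>dfs.permissions.supergroup</name>
    <value>hadoopadmins</value>
  </property>
</configuration>"""

print(find_supergroup(SAMPLE))
```

For a real cluster, pass the contents of the hdfs-site.xml file that the NameNode uses instead of the sample string.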

Once you have determined the supergroup, add the Datameer user to this group through your normal OS user management tools.

Configure proxy user

Two configuration settings related to the proxy user capability need to be set in core-site.xml on both the NameNode and the JobTracker:

hadoop.proxyuser.<USERNAME>.groups
hadoop.proxyuser.<USERNAME>.hosts

For example, assuming the Datameer user is datameer and that a group called dasusers exists which contains all Datameer users, the groups setting is as follows:

hadoop.proxyuser.datameer.groups = dasusers

Next, assuming that the Datameer application is running on datameer.example.com, the hosts setting is as follows:

hadoop.proxyuser.datameer.hosts = datameer.example.com
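In core-site.xml, these two settings are written as property elements. The sketch below renders them from the example values above. It is an illustration only; the helper name proxyuser_properties is hypothetical. Hadoop accepts comma-separated lists for both values, so the helper joins multiple groups or hosts with commas.

```python
from xml.sax.saxutils import escape


def proxyuser_properties(user, groups, hosts):
    """Render the two hadoop.proxyuser.* settings as core-site.xml properties."""
    settings = {
        f"hadoop.proxyuser.{user}.groups": ",".join(groups),
        f"hadoop.proxyuser.{user}.hosts": ",".join(hosts),
    }
    lines = []
    for name, value in settings.items():
        lines.append("<property>")
        lines.append(f"  <name>{escape(name)}</name>")
        lines.append(f"  <value>{escape(value)}</value>")
        lines.append("</property>")
    return "\n".join(lines)


# Values taken from the example in this section.
print(proxyuser_properties("datameer", ["dasusers"], ["datameer.example.com"]))
```

The printed elements belong inside the configuration element of core-site.xml.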

If you are using Cloudera Manager, update the safety valve on the NameNode, the Secondary NameNode, and the JobTracker. You might need to reset any existing override for these settings to take effect.

If you are using a Kerberos-secured cluster with secure impersonation and HDFS transparent encryption, you also need to configure the proxy user for KMS.
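As a rough guide, Hadoop KMS uses proxy user settings that mirror the core-site.xml ones, with a hadoop.kms.proxyuser prefix, and they are usually placed in kms-site.xml. The sketch below only lists the property names for the example user and group from this section. The helper name is hypothetical, and you should confirm the exact properties against the KMS documentation for your distribution.

```python
def kms_proxyuser_settings(user, groups, hosts):
    """Build the KMS proxy user settings for the given Datameer OS user."""
    prefix = f"hadoop.kms.proxyuser.{user}"
    return {
        f"{prefix}.groups": ",".join(groups),
        f"{prefix}.hosts": ",".join(hosts),
    }


# Example values reused from the core-site.xml settings in this section.
settings = kms_proxyuser_settings("datameer", ["dasusers"], ["datameer.example.com"])
for name, value in settings.items():
    print(f"{name} = {value}")
```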

Preparing the Datameer application

Before finally enabling secure impersonation, you must prepare the Datameer application and HDFS by following the instructions here. When that task is complete, you can continue with enabling the feature.

Enabling Secure Impersonation

To enable secure impersonation, navigate to the secure grid mode settings and select Enable Impersonation.

After you enable secure impersonation, a message about cluster validation appears. Datameer can run a validation job to check that the cluster adheres to certain configuration guidelines. To run the set of assertions associated with secure impersonation, click Run Tests.

Workbook being shared with multiple groups

If the permissions of a job configuration (e.g., a workbook) are modified concurrently, the database might incorrectly record the artifact as shared with two separate groups. This isn't allowed in cluster mode, and the job can't run.

To solve this problem, run the following commands against your database:

To enable a constraint for a single group permission:

mysql -u dap -p dap < bin/create-single-group-permission-constraint.sql

The script can be executed without shutting down Datameer.
Once the constraint is enabled, trying to add more than one permission group to a file results in a browser exception.
Only one group can be added to the file.

To disable the secured permission group check constraint:

mysql -u dap -p dap < bin/remove-single-group-permission-constraint.sql

The script can be executed without shutting down Datameer.
Use this script to make it possible again to add more than one permission group to a file in the database.

Kerberos principal name rules

Depending on your naming conventions for Kerberos principal names, you might need to override the 'hadoop.security.auth_to_local' property. You might have already overridden it on the cluster. Datameer needs the rules from this property in the custom properties section of the cluster configuration. The custom properties section doesn't support property values that span multiple lines, so separate the rules with a single space. As an example, the following can be useful when not all of the principals are from the default domain:

hadoop.security.auth_to_local=RULE:[1:$1](.*) RULE:[2:$1](.*) DEFAULT
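The sketch below illustrates two points from this section: collapsing rules written one per line into the single-line form that the custom properties section needs, and how the example rules shorten principals. It is a simplified model for illustration, not Hadoop's implementation. It handles only rules of the form RULE:[n:format](regex) plus DEFAULT, and ignores sed-style substitutions. The principals and realms are hypothetical.

```python
import re

# Matches rules of the form RULE:[n:format](regex); sed-style suffixes
# such as s/pattern/replacement/ are not handled in this sketch.
RULE_PATTERN = re.compile(r"RULE:\[(\d+):([^\]]+)\]\((.*)\)")


def join_rules(multiline_rules):
    """Collapse rules written one per line into one space-separated value."""
    return " ".join(multiline_rules.split())


def short_name(principal, rules, default_realm):
    """Map a Kerberos principal to a user name using simplified rule evaluation."""
    name, _, realm = principal.partition("@")
    components = name.split("/")
    for rule in rules.split():
        if rule == "DEFAULT":
            # DEFAULT only applies to principals in the default realm.
            if realm == default_realm:
                return components[0]
            continue
        match = RULE_PATTERN.fullmatch(rule)
        if match is None:
            raise ValueError(f"unsupported rule: {rule}")
        count, fmt, regex = int(match.group(1)), match.group(2), match.group(3)
        if count != len(components):
            continue  # the rule targets principals with a different component count
        # $0 is the realm; $1 and $2 are the principal's components.
        values = [realm] + components
        candidate = re.sub(r"\$(\d)", lambda m: values[int(m.group(1))], fmt)
        if re.fullmatch(regex, candidate):
            return candidate
    raise ValueError(f"no rule applies to {principal}")


rules = join_rules("""
RULE:[1:$1](.*)
RULE:[2:$1](.*)
DEFAULT
""")
print(rules)
# Hypothetical principals: one outside the default realm, one service principal.
print(short_name("alice@CORP.EXAMPLE.COM", rules, "EXAMPLE.COM"))
print(short_name("hdfs/nn1.example.com@EXAMPLE.COM", rules, "EXAMPLE.COM"))
```

With these rules, both one-component and two-component principals are reduced to their first component regardless of realm, which is why the example helps when principals come from more than one domain.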

You can find more information about mapping Kerberos principals to user names in the following book:

Hadoop Security, 1st Ed. by Ben Spivey and Joey Echeverria, Ch. 5: Identity and Authentication - Mapping Kerberos Principals to Usernames, p. 68 ff.

To see how your AD/LDAP user names are submitted to the cluster after the rules are applied under secure impersonation, you can add additional logging.

Expected impersonation behaviors

Refer to the following table to understand how secure impersonation affects the ownership of import jobs, file uploads, data links, workbooks, and export jobs. Note that the group permissions apply to the artifact, not the folders the artifacts are in.

Scenario	Owner in HDFS	Group in HDFS	Permissions for Owner in HDFS	Permissions for Group in HDFS	Owner of YARN application (when job is triggered manually)	Owner of YARN application (when job is triggered by schedule)	Preview data accessed
Creating an artifact	Creator	Selected group, or the default Datameer group if none is selected	Read and write	Read only	n/a	n/a	n/a
Running a job	Creator	n/a	Read and write	Read only	Creator	Creator	Logged-in user
Previewing data	Creator	Selected group, or the default Datameer group if none is selected	Read and write	Read only	Creator	Creator	Logged-in user
Saving edited artifact (not as creator)	Creator	Selected group, or the default Datameer group if none is selected	Read and write	Read only	Creator	Creator	Logged-in user
Updating permissions	Creator	Newly selected group	Read and write	Read only, for the newly selected group	Creator	Creator	Logged-in user