Secure HDFS Tool Manual

Usage:

secure_hdfs_tool.sh [options] arguments
 -G (--hdfs-groups) [comma separated list of HDFS groups]
                                        : Comma separated list of applicable
                                          Datameer HDFS groups, when set extra
                                          validation is run to ensure only
                                          these groups are present in permissions
 -g (--core-group) VAL                  : Group for the core Datameer directories,
                                          this HDFS group should contain ALL
                                          Datameer users (only applies when
                                          -u,--update-core-directories is used)
 -s (--sync-hdfs)                       : Sync Datameer entity permissions down
                                          to HDFS
 -u (--update-core-directories)         : When set, create/update all core
                                          Datameer directories and set permissions
                                          according to Datameer user and
                                          optionally --group

Default Usage

By default, with no arguments, secure_hdfs_tool.sh validates that all Datameer entities conform to the requirements of secure impersonation, i.e., that each entity has exactly one group permission entry.

secure_hdfs_tool.sh

Change to HDFS permissions in Hadoop 2.3 and higher

Changes to HDFS permissions in Hadoop 2.3 and higher cause Datameer's secure_hdfs_tool.sh to fail.

Follow this workaround to set up secure impersonation in Datameer when using Hadoop 2.3 or higher.

<super user of HDFS> is the user who starts the Datameer application.

Members of the <user group> group can be impersonated by the super user, as defined by the Hadoop property hadoop.proxyuser.<super user name>.groups in core-site.xml.

The super user must now be added to the user group before the tool is run.

  1. Start the Datameer application and configure it to secure Hadoop.
  2. Stop the Datameer application.
  3. Add the <super user of HDFS> to <user group>.
    1. Execute the command: usermod -g <user group name> <super user name>.
  4. Run Datameer's secure HDFS tool.
    1. Execute the tool: bin/secure_hdfs_tool.sh -u -g <user group name> as <super user>.
  5. Start the Datameer application.

Datameer recommends removing the super user from the user group and restoring their original group once the tool has run.
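
The group changes in the workaround can be sketched as a small dry-run helper that only prints the commands instead of running them. The function name and its three arguments are hypothetical, and running the tool through sudo -u is an assumption; the manual only says the tool is run as the super user.

```shell
#!/bin/sh
# Hypothetical dry-run helper: prints the commands for steps 3 and 4 plus the
# recommended group restore. Nothing is executed.
#   $1 = super user name, $2 = user group name, $3 = super user's original group
print_workaround_commands() {
    super_user=$1
    user_group=$2
    original_group=$3
    # Step 3: put the super user into the user group (needs root).
    printf 'usermod -g %s %s\n' "$user_group" "$super_user"
    # Step 4: run the tool as the super user (sudo -u is an assumption).
    printf 'sudo -u %s bin/secure_hdfs_tool.sh -u -g %s\n' "$super_user" "$user_group"
    # Afterwards: restore the original group, as recommended.
    printf 'usermod -g %s %s\n' "$original_group" "$super_user"
}
```

For example, print_workaround_commands datameer das_users hadoop prints three lines to review before running them by hand. The original group can be looked up beforehand with id -gn <super user name>.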

If the current cluster mode for Datameer isn't "Secure", then the tool aborts. You must have a properly configured connection to a 'Secure' cluster to use this tool. To achieve this, navigate to Administration > Hadoop Cluster and configure Secure mode.

Running the command with -G (--hdfs-groups) followed by a comma separated list of group names adds extra validation and treats any group entry that isn't in the list as invalid:

secure_hdfs_tool.sh -G foo,bar,baz,das_users
# same as above, but with long argument flags
secure_hdfs_tool.sh --hdfs-groups foo,bar,baz,das_users
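
When the group names are kept as separate words, for example in a script, a small helper can build the comma separated value for -G. This is only a sketch; join_groups is a hypothetical name and not part of the tool.

```shell
#!/bin/sh
# Hypothetical helper: joins its arguments with commas and no spaces.
join_groups() {
    # "$*" joins the arguments using the first character of IFS;
    # the subshell keeps the IFS change local.
    ( IFS=,; printf '%s\n' "$*" )
}
```

It could be used as: secure_hdfs_tool.sh -G "$(join_groups foo bar baz das_users)"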

Invalid Entities

In all execution modes, the tool emits lines to STDOUT describing invalid Datameer entities which need to be fixed for secure impersonation to work properly. As you fix entities, for example when preparing for secure impersonation the first time, you can simply rerun the script to find out what is left to update. Redirecting STDOUT to a file after grepping for INVALID_ENTITY is a good way to build a work list when dealing with large numbers of entities.

secure_hdfs_tool.sh -G foo,bar,baz | grep INVALID_ENTITY > datameer_invalid.txt
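
To track progress between reruns, the remaining entries can be counted. The sketch below assumes each invalid entity is reported on its own line containing the INVALID_ENTITY marker; the exact output format is not documented here, and count_invalid is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: counts lines on stdin that contain INVALID_ENTITY.
count_invalid() {
    # grep -c prints 0 and exits non-zero when nothing matches;
    # "|| true" keeps that case from being treated as a failure.
    grep -c 'INVALID_ENTITY' || true
}
```

It could be used as: count_invalid < datameer_invalid.txt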

Updating Core Datameer Directories

Another use for the tool is to create or reset the Datameer core HDFS directories with appropriate ownership and permissions. This is mostly done when enabling or re-enabling secure impersonation mode.

secure_hdfs_tool.sh -u -g das_users
# same as above but with long argument flags
secure_hdfs_tool.sh --update-core-directories --core-group das_users

Passing the optional -g (--core-group) argument changes group ownership of the core directories to match the argument's value and sets permissions to 770. By default, with no -g, the group is inherited from the parent directory and the core directories' permissions are 777. Datameer strongly recommends using a core HDFS group containing all Datameer users to control access to these directories.
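
The practical difference between the two modes is the last octal digit, which covers users who are neither the owner nor in the group. A minimal sketch of that check, with others_have_access as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) when an octal mode such as 777
# grants any access to "other" users, and fails for modes such as 770.
others_have_access() {
    mode=$1
    # Strip everything except the last character of the mode string.
    other_digit=${mode#"${mode%?}"}
    [ "$other_digit" != "0" ]
}
```

With 777 (no -g) the check succeeds, meaning any HDFS user can reach the core directories; with 770 (set by -g) it fails, so access is limited to the owner and the core group.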

Sticky bits aren't supported by Datameer. To avoid access problems, don't use them for Datameer core directories.

Synchronizing HDFS Artifacts

The final major use for the tool is synchronizing HDFS artifacts with the Datameer entities represented in the database. There are several occasions where this might become necessary:

  • Initial setup of secure impersonation
  • Disabling and re-enabling of secure impersonation support
  • Recovering from internal corruption, e.g., a software bug
  • Recovering from external corruption, e.g., external HDFS use changed owners or permissions

Synchronization can be combined with any of the above uses and is activated with the -s (--sync-hdfs) switch:

secure_hdfs_tool.sh -s
# same as above but with long argument flags
secure_hdfs_tool.sh --sync-hdfs
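
Because -s can be combined with the other switches, a wrapper can assemble the command line from whichever options apply. The following is a sketch only; sync_command is a hypothetical name and it prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: prints a secure_hdfs_tool.sh command that always syncs.
#   $1 = core group (empty to skip -u -g), $2 = HDFS group list (empty to skip -G)
sync_command() {
    core_group=$1
    hdfs_groups=$2
    cmd="secure_hdfs_tool.sh -s"
    # Add the core directory update only when a core group is given.
    if [ -n "$core_group" ]; then cmd="$cmd -u -g $core_group"; fi
    # Add the extra group validation only when a group list is given.
    if [ -n "$hdfs_groups" ]; then cmd="$cmd -G $hdfs_groups"; fi
    printf '%s\n' "$cmd"
}
```

For example, sync_command "" foo,bar,baz prints secure_hdfs_tool.sh -s -G foo,bar,baz.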

Examples

secure_hdfs_tool.sh -u -g das_users -G foo,bar,baz

Updates the core Datameer directory ownership with the secure principal as the user and das_users as the group, setting permissions to 770. Also runs Datameer entity validation, ensuring that proper permissions exist and that each entity's single group is in the set foo, bar, baz.

secure_hdfs_tool.sh --hdfs-groups foo,bar,baz

Validates Datameer entity group permissions, emitting any invalid groups, and also checks that the groups referenced are in the set foo, bar, baz. This is an example of something you would run repeatedly while modifying Datameer entities until there are no more errors.

secure_hdfs_tool.sh -s -G foo,bar,baz

Synchronizes HDFS artifact ownership and permissions with those stored in the Datameer database. Continue to include the -G (--hdfs-groups) argument if it applies to you; this guarantees complete validation.