By default, with no arguments, secure_hdfs_tool.sh validates that all Datameer entities conform to the requirements of secure impersonation, i.e., that each entity has exactly one group permission entry.
Changes to HDFS Permissions in Hadoop 2.3 and Higher
In Hadoop 2.3 and higher, changes to HDFS permissions cause Datameer's secure_hdfs_tool.sh to fail.
Follow this workaround to set up secure impersonation in Datameer when using Hadoop 2.3 or higher. The HDFS super user must now be added to the user group before the tool is run.
- Start the Datameer application and configure it to connect to the secure Hadoop cluster.
- Stop the Datameer application.
- Add the <super user of HDFS> to the user group by executing: usermod -g <user group name> <super user name>
- Run Datameer's secure HDFS tool as <super user> by executing: bin/secure_hdfs_tool.sh -u -g <user group name>
- Start the Datameer application.
Once the tool has run, Datameer recommends removing the super user from the user group and adding them back to their original group.
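The command-line parts of the workaround can be scripted. The following Python sketch only assembles the commands from the steps above and prints them (a dry run by default); stopping and starting Datameer stay manual because the control commands depend on your installation. Running the tool through `sudo -u` to act as the super user is an assumption, the helper names are hypothetical, and usermod normally requires root.

```python
from __future__ import annotations

import shlex
import subprocess


def workaround_commands(super_user: str, user_group: str, original_group: str) -> list[list[str]]:
    """Return the commands for the Hadoop 2.3+ workaround, in execution order.

    Assumes Datameer has already been stopped; start it again afterwards.
    """
    return [
        # Make the user group the super user's primary group.
        ["usermod", "-g", user_group, super_user],
        # Run the secure HDFS tool as the super user (sudo -u is an assumption).
        ["sudo", "-u", super_user, "bin/secure_hdfs_tool.sh", "-u", "-g", user_group],
        # Recommended clean-up: restore the super user's original group.
        ["usermod", "-g", original_group, super_user],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
```

For example, `run(workaround_commands("hdfs", "das_users", "hadoop"))` prints the three commands without executing them (the user and group names here are placeholders). Pass `dry_run=False` only after checking the printed commands against your cluster.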
If Datameer's current cluster mode isn't Secure, the tool aborts. You must have a properly configured connection to a secure cluster to use this tool. To set this up, navigate to Administration > Hadoop Cluster and configure Secure mode.
Running the command with -G (--hdfs-groups) followed by a comma-separated list of group names adds extra validation: a group entry is considered invalid if the group isn't in this list.
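The validation rule described so far can be modeled in a few lines. This Python sketch represents an entity's group permission entries as a plain list of group names; that representation and the function names are assumptions for illustration, not the tool's internals. An entity passes when it has exactly one group entry and, if a -G list is given, that group appears in it.

```python
from __future__ import annotations

from typing import Iterable, Optional


def parse_group_list(value: str) -> set[str]:
    """Split a -G style comma-separated list such as "foo,bar,baz"."""
    return {name for name in value.split(",") if name}


def group_entry_problem(group_entries: list[str],
                        allowed_groups: Optional[Iterable[str]] = None) -> Optional[str]:
    """Return a description of the problem, or None if the entity is valid."""
    # Secure impersonation requires exactly one group permission entry.
    if len(group_entries) != 1:
        return f"expected exactly one group permission entry, found {len(group_entries)}"
    # With -G, the single group must also be one of the listed groups.
    if allowed_groups is not None and group_entries[0] not in set(allowed_groups):
        return f"group {group_entries[0]!r} is not in the -G list"
    return None
```

With `allowed = parse_group_list("foo,bar,baz")`, an entity whose only group is `foo` passes, while one with no group, two groups, or the single group `qux` is reported.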
In all execution modes, the tool writes lines to STDOUT describing invalid Datameer entities that must be fixed for secure impersonation to work properly. As you fix entities, for example when preparing for secure impersonation for the first time, you can simply rerun the script to see what is left to update. Filtering STDOUT for INVALID_ENTITY and redirecting the result to a file is a good way to build a work list when dealing with large numbers of entities.
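From a shell, the work list is simply `bin/secure_hdfs_tool.sh | grep INVALID_ENTITY > worklist.txt`. The Python sketch below does the same thing so the result can be processed further. It assumes the tool is run from the Datameer installation directory and relies only on the INVALID_ENTITY marker, since the rest of each line's format isn't documented here; `check=False` is a deliberate choice so a non-zero exit status doesn't discard the output.

```python
from __future__ import annotations

import subprocess

TOOL = "bin/secure_hdfs_tool.sh"  # assumed path, relative to the Datameer install


def invalid_entity_lines(output: str) -> list[str]:
    """Keep only the lines that mention INVALID_ENTITY (like grep)."""
    return [line for line in output.splitlines() if "INVALID_ENTITY" in line]


def build_work_list(args: list[str], path: str = "worklist.txt") -> int:
    """Run the tool, write its INVALID_ENTITY lines to path, return the count."""
    result = subprocess.run([TOOL, *args], capture_output=True, text=True, check=False)
    lines = invalid_entity_lines(result.stdout)
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    return len(lines)
```

Rerunning `build_work_list([])` after each round of fixes shows how many entities are left; an empty file means validation found nothing to fix.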
Updating Core Datameer Directories
Another use for the tool is to reset or create the Datameer core HDFS directories with appropriate ownership and permissions. This is typically done when enabling or re-enabling secure impersonation mode.
Passing the optional -g (--core-group) argument changes the group ownership of the core directories to the argument's value and sets their permissions to 770. By default, without -g, the group is inherited from the parent directory and the core directories' permissions are 777. Datameer strongly recommends using a core HDFS group that contains all Datameer users to control access to these directories.
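To see what the two permission settings mean, this small Python sketch renders the octal modes the way a directory listing shows them, using the standard library's `stat.filemode`. It illustrates the octal notation only and does not touch HDFS; the helper names are hypothetical.

```python
import stat


def describe_dir_mode(perm: int) -> str:
    """Render a directory permission as a listing would, e.g. drwxrwx---."""
    return stat.filemode(stat.S_IFDIR | perm)


def others_have_access(perm: int) -> bool:
    """True if users outside the owner and group get any permission bits."""
    return perm & 0o007 != 0


for label, perm in [("with -g", 0o770), ("default, no -g", 0o777)]:
    print(f"{label}: {oct(perm)} -> {describe_dir_mode(perm)}")
```

With 770 (`drwxrwx---`), only the owner and the core group can read, write, or enter the directories; with 777 (`drwxrwxrwx`), every user can, which is why a dedicated core group is the safer setup.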
Datameer doesn't support sticky bits. To avoid access problems, don't set them on Datameer core directories.
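HDFS listings show a sticky bit as `t` or `T` in the last position of the permission string (for example `drwxrwxrwt`). The following Python sketch scans the text output of `hdfs dfs -ls` for such directories. It assumes the standard eight-column listing format, strips a trailing `+` that marks ACLs, and does not handle paths containing spaces; the function name is hypothetical.

```python
from __future__ import annotations


def sticky_dirs(ls_output: str) -> list[str]:
    """Return the paths whose permission string has the sticky bit set."""
    paths = []
    for line in ls_output.splitlines():
        fields = line.split()
        # Skip the "Found N items" header and anything that isn't a listing row.
        if len(fields) < 8:
            continue
        perms = fields[0].rstrip("+")  # a trailing '+' only signals ACLs
        if perms[-1] in "tT":
            paths.append(fields[-1])
    return paths
```

Feeding it the listing of the parent of the core directories gives a quick list of directories to fix before they cause access problems.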
Synchronizing HDFS Artifacts
The final major use of the tool is synchronizing HDFS artifacts with the Datameer entities stored in the database. This can become necessary in several situations:
- Initial setup of secure impersonation
- Disabling and re-enabling of secure impersonation support
- Recovering from internal corruption, e.g., a software bug
- Recovering from external corruption, e.g., owners or permissions changed through external HDFS use
Synchronization can be combined with any of the above uses and is activated with the -s (--sync-hdfs) switch.
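Conceptually, synchronization treats the database as the source of truth and corrects HDFS wherever ownership or permissions differ. The Python sketch below models only that comparison step to make the idea concrete; it is not how secure_hdfs_tool.sh is implemented, and the `Perms` type, the `plan_sync` function, and the wording of the planned fixes are hypothetical.

```python
from __future__ import annotations

from typing import NamedTuple


class Perms(NamedTuple):
    owner: str
    group: str
    mode: int  # octal permission bits, e.g. 0o770


def plan_sync(expected: dict[str, Perms], actual: dict[str, Perms]) -> list[str]:
    """List the corrections needed to make HDFS (actual) match the database (expected)."""
    fixes = []
    for path, want in sorted(expected.items()):
        have = actual.get(path)
        if have is None:
            fixes.append(f"{path}: missing in HDFS")
            continue
        if (have.owner, have.group) != (want.owner, want.group):
            fixes.append(f"{path}: chown {want.owner}:{want.group}")
        if have.mode != want.mode:
            fixes.append(f"{path}: chmod {want.mode:o}")
    return fixes
```

An artifact whose owner or mode was changed outside Datameer shows up as a `chown` or `chmod` correction, which matches the external-corruption case in the list above.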
Updates the ownership of the core Datameer directories, using the secure principal as the user name and das_users as the group, and sets their permissions to 770. Also runs Datameer entity validation to ensure that proper permissions exist, including that each entity's single group is in the set foo, bar, baz.
Validates Datameer entity group permissions and emits any invalid groups. Also checks that the referenced groups are in the set foo, bar, baz. This is the kind of command you would rerun while modifying Datameer entities until no errors remain.
Synchronizes HDFS artifact ownership and permissions with those stored in the Datameer database. Keep including the -G (--hdfs-groups) argument if it applies to your setup; this guarantees complete validation.
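If you invoke the tool from scripts, a small helper can assemble its arguments from the options covered in this section. This Python sketch uses only -g, -G, and -s; the mapping of the three scenarios above to these options is inferred from their descriptions, the helper name is hypothetical, and setting the owner to the secure principal (mentioned in the first scenario) is not modeled.

```python
from __future__ import annotations

from typing import Optional, Sequence


def tool_args(core_group: Optional[str] = None,
              hdfs_groups: Optional[Sequence[str]] = None,
              sync: bool = False) -> list[str]:
    """Build a secure_hdfs_tool.sh argument list from the options described above."""
    args: list[str] = []
    if core_group:
        args += ["-g", core_group]  # --core-group: group-own core dirs, chmod 770
    if hdfs_groups:
        args += ["-G", ",".join(hdfs_groups)]  # --hdfs-groups: allowed groups
    if sync:
        args.append("-s")  # --sync-hdfs: sync HDFS artifacts with the database
    return args
```

For instance, `tool_args(core_group="das_users", hdfs_groups=["foo", "bar", "baz"])` yields `["-g", "das_users", "-G", "foo,bar,baz"]`, and adding `sync=True` to the -G-only call appends `-s`, which keeps the -G validation in place during synchronization as recommended.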