Behavior Change

Setting Access Permission Evaluation for Workbooks

Overview

Datameer 7.4 introduced the default property workbook.source.permission.validation to control the way permissions of data sources and external sheets are evaluated in a workbook.

Behavior change

workbook.source.permission.validation can be set to strict or loose:

  • If the property is set to loose, a source permission check is not performed.
  • If the property is set to strict, a source permission check is performed for the following:
    • access capabilities for the specific entity types
    • read permission or access to full data in the case of an underlying external source sheet

The strict/loose setting affects the access another group has to a workbook that references import jobs, data links, or other workbooks that the group does NOT have access to. Strict means the workbook will produce an error if a user from another group does not have access to one of the sources. Loose allows such users to perform functions such as viewing data.
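
The difference between the two modes can be sketched as a small model. This is an illustration only, not Datameer's implementation: the function name, the permission labels, and the data shapes are assumptions made for the example.

```python
# Illustrative model only; names and data shapes are assumptions, not Datameer code.
VALID_MODES = {"strict", "loose"}
# Permissions that satisfy the strict check on a source (per the list above).
SUFFICIENT = {"read", "full_data"}


def sources_blocking_access(mode, group_permissions):
    """Return the sources that would make the workbook fail for a group.

    mode: value of workbook.source.permission.validation.
    group_permissions: maps each referenced source (import job, data link,
        or workbook) to the set of permissions the group holds on it.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "loose":
        # Loose: no source permission check is performed.
        return []
    # Strict: every source needs read or full-data access.
    return sorted(
        source
        for source, perms in group_permissions.items()
        if not (perms & SUFFICIENT)
    )


permissions = {
    "import_job_a": {"read"},
    "data_link_b": set(),  # the group has no access to this source
}
print(sources_blocking_access("strict", permissions))  # ['data_link_b']
print(sources_blocking_access("loose", permissions))   # []
```

In this model, an empty result means the group can open the workbook; in strict mode any listed source corresponds to the error described above.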

In addition to the visibility of records in data source or external sheets, the setting affects Undo/Redo and the Workbook Overwrite API.

With Datameer v7.5, having only Full data permission on a workbook will:

  • make it visible in the File Browser (active file options are: browse full data, show dependencies, add to export job)
  • allow workbook to be exported
  • allow data to be browsed
  • allow data to be added to a new workbook in the form of an external sheet

In strict mode, a group must have either Read or Full Data access to the sources; otherwise the workbook fails at runtime or is not visible to that group.

Versions affected

  • 7.4
  • 7.5

Ticket reference

  • DAP-38178
  • DAP-38188

REST API Improvements for Variables 

Overview

Datameer 7.4 introduced the ability to set variable values via REST API calls. In Datameer 7.5, the REST calls have been updated and the original REST calls no longer function. Using scripts with the variable REST calls from 7.4 now results in an error.

The new calls are described in the variable REST API documentation.

Behavior change

The REST call URLs have changed as well as the payload type. Previously, form variables were used and this has been changed to JSON.
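
To show what the payload change means for a script, the sketch below builds the same request body once as form variables and once as JSON. It only constructs the requests and does not send them. The URL, the HTTP method, and the field names are hypothetical placeholders; take the real ones from the variable REST API documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholder, not a real Datameer endpoint.
VARIABLE_URL = "https://datameer.example.com/HYPOTHETICAL-VARIABLE-ENDPOINT"


def form_request(fields):
    """7.4-style request: values sent as form variables (no longer accepted)."""
    body = urlencode(fields).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    # POST is an assumption made for this example.
    return Request(VARIABLE_URL, data=body, headers=headers, method="POST")


def json_request(fields):
    """7.5-style request: the same values sent as a JSON document."""
    body = json.dumps(fields).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return Request(VARIABLE_URL, data=body, headers=headers, method="POST")


# Field names are hypothetical.
variable = {"name": "region", "value": "EMEA"}
print(form_request(variable).data)  # b'name=region&value=EMEA'
print(json_request(variable).data)  # b'{"name": "region", "value": "EMEA"}'
```

Scripts written for 7.4 typically need both changes: the new URL and a JSON body with a matching Content-Type header.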

Versions affected

  • 7.5

Ticket reference

  • DAP-38583

Sharing Datasets with View Full Results Selected

Overview

In order to fix a security hole in how datasets (import jobs, data links, or workbooks) were handled when being shared with various groups, we have introduced improvements on handling shared artifacts.

If the following situation occurs, current workflows might need to be adjusted:

  • A privileged user creates a dataset and doesn't share this dataset with other users.
  • The user then adds this dataset to a new workbook and shares the new workbook with a group of users.

If the above has happened, these users are unable to see non-kept sheets and cannot copy this workbook.

Best practices suggestions

  • Datasets shouldn't be directly shared with most users.
  • The datasets should be chained together with a transformation workbook where ETL, data cleansing, and/or anonymization can take place. This workbook should have View Full Results sharing for the intended groups.
  • Downstream workbooks using the transformed data will work as before.

Versions affected

  • Datameer 7.2: 7.2.5
  • Datameer 7.1: 7.1.9.1, 7.1.10
  • Datameer 6.4: 6.4.11

Kept External Worksheets Are Now Copied Rather Than Referenced - Migrating ImportSheetData to WorkbookSheetData

Previous behavior

When a worksheet was added in a different workbook as a data source, Datameer only referenced that data from the original.

New behavior

To make it clearer which data persists in HDFS, Datameer changed when and where data objects are saved. When using data from a sheet in another workbook or an import job, the data from that source sheet is copied to the new sheet if the new sheet is marked as kept. The source sheet is no longer referenced, as this led to workbook data not being deleted correctly by housekeeping.

Due to the possibility of multiple copies of a data object existing in HDFS, an increase in disk usage may be observed for heavily referenced data sources. 

Best practice suggestions

  1. An impact analysis should be performed before upgrading to Datameer 6.3 or higher to determine the best way to mitigate this issue:
    1. Overview of referenced external workbooks
      SELECT 
      	data1.effective_date date, 
      	wsd.id workbook_sheet_data_id, 
      	data1.dap_job_configuration__id workbook_id, 
      	wsd.sheet_name, 
      	CONCAT(data1.uri,'/', wsd.sheet_name) sheet_uri, 
      	ewsd.selected_partitions, 
      	data2.dap_job_configuration__id referenced_workbook_id, 
      	ewsd.referenced_workbook_sheet_data_id, 
      	CONCAT(data2.uri,'/', rwsd.sheet_name) referenced_sheet_uri 
      FROM 
      	workbook_sheet_data wsd, 
      	workbook_sheet_external_data ewsd, 
      	workbook_sheet_data rwsd, 
      	data data1, 
      	data data2 
      WHERE 
      	wsd.id = ewsd.workbook_sheet_data_id 
      	AND ewsd.referenced_workbook_sheet_data_id = rwsd.id 
      	AND data1.id = wsd.workbook_data_fk 
      	AND data2.id = rwsd.workbook_data_fk 
      	AND wsd.kept is true;
    2. Overview of referenced data sources
      SELECT
      	data1.effective_date date,
      	wsd.id workbook_sheet_data_id,
      	data1.dap_job_configuration__id workbook_id,
      	wsd.sheet_name,
      	CONCAT(data1.uri,'/', wsd.sheet_name) sheet_uri,
      	iwsd.selected_partitions,
      	data2.dap_job_configuration__id referenced_data_source_id,
      	CONCAT(data2.uri,'/import') referenced_data_source_data_uri
      FROM
      	workbook_sheet_data wsd,
      	workbook_sheet_import_data iwsd,
      	data data1,
      	data data2,
      	workbook_sheet_import_data_data_source_data mapping
      WHERE
      	wsd.id = iwsd.workbook_sheet_data_id
      	AND data1.id = wsd.workbook_data_fk
      	AND wsd.kept is true
      	AND mapping.workbook_sheet_import_data_workbook_sheet_data_id = iwsd.workbook_sheet_data_id
      	AND data2.id = mapping.data_source_datas_id;
  2. If the number of data objects is small, this behavior won't cause undue strain on the system. No additional action is needed.
  3. If the number of data objects is high, plan for the increase in consumption of space in HDFS and work with your services representative about off-loading data to other systems.
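
To judge whether the number of data objects is small or high, the rows returned by the queries above can be grouped by referenced source. The following is a minimal sketch that assumes the rows are available as dictionaries keyed by column alias; how they are fetched depends on your database client, and the sample rows are made up.

```python
from collections import Counter


def kept_copies_per_source(rows, uri_column="referenced_sheet_uri"):
    """Count how many kept sheets copy each referenced source.

    For the data source query, pass
    uri_column="referenced_data_source_data_uri" instead.
    """
    return Counter(row[uri_column] for row in rows)


# Made-up rows for illustration; real rows come from the query result.
rows = [
    {"sheet_uri": "/wb1/Sheet1", "referenced_sheet_uri": "/src/Orders"},
    {"sheet_uri": "/wb2/Sheet1", "referenced_sheet_uri": "/src/Orders"},
    {"sheet_uri": "/wb3/Joined", "referenced_sheet_uri": "/src/Customers"},
]
for uri, copies in kept_copies_per_source(rows).most_common():
    print(uri, copies)
```

Sources at the top of this list are the ones whose data is copied most often after the upgrade, so they are the first candidates to review when planning HDFS capacity.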

Versions affected

  • 6.3+

Ticket Reference

  • DAP-29391

"Save & Migrate" for Partitioned Import Jobs Only Starts the Migration Job

Overview

When migrating a partition (reconfiguring a partition's granularity), Datameer would sometimes run into an infinite loop if the Import Now checkbox was also checked. To prevent this, the checkbox is ignored whenever an import job's partitions are migrated.

Previous behavior

After reconfiguring a partitioned import job used as a data source in a workbook, clicking Save & Migrate would start a job that ran indefinitely.

New behavior

After reconfiguring an import job's partitioning and selecting Save & Migrate, Datameer now ignores the Import Now flag so that only the migration job is triggered.
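
The new rule can be summarized as a simplified decision function. The function and the job names are hypothetical and only model the behavior described above.

```python
def jobs_to_trigger(partitioning_changed, import_now):
    """Return the jobs started when an import job configuration is saved.

    Simplified model: the job names and this function are hypothetical.
    """
    if partitioning_changed:
        # Save & Migrate: Import Now is ignored, only the migration runs.
        return ["migration"]
    return ["import"] if import_now else []


print(jobs_to_trigger(partitioning_changed=True, import_now=True))   # ['migration']
print(jobs_to_trigger(partitioning_changed=False, import_now=True))  # ['import']
```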

Versions affected

  • 7.1.10
  • 7.2.5
  • 7.4.0

Ticket reference

  • DAP-38040

Users' Capabilities are Properly Respected by the REST API

Overview

In a few cases, users working with the REST API were allowed to perform actions that were unavailable to them through the UI, as not all capability checks were conducted when using the REST API. The following capability checks are now fixed:

  • Only users with proper capabilities can change the ownership of Datameer artifacts
  • Only users with proper capabilities can download data from a Workbook

Previous behavior

Datameer had a security concern regarding access via the REST API. Before, users with the role of Analyst were able to change the ownership of files within Datameer using the REST API without the correct permission settings.

New behavior

The security concern has been corrected. Because the role of Analyst doesn't have the setting "User can modify every file and folder" enabled, users operating under these permissions are unable to change the ownership of an artifact via the REST API. 

If a user doesn't have the correct permissions to change the owner of an artifact, they now receive the following error message:

File doesn’t exist or isn’t accessible with current permissions
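
The corrected behavior can be pictured as a capability check that applies regardless of whether a request arrives through the UI or the REST API. The sketch below is a toy model under stated assumptions: the Role class, the change_owner function, and the artifact dictionary are hypothetical, and ADMIN holding the capability is assumed for the example. Only the capability name and the error text are taken from this page.

```python
from dataclasses import dataclass, field

# Capability name and error text are quoted from this page.
MODIFY_ALL = "User can modify every file and folder"
ERROR = "File doesn’t exist or isn’t accessible with current permissions"


@dataclass
class Role:
    name: str
    capabilities: set = field(default_factory=set)


def change_owner(role, artifact, new_owner):
    """Hypothetical check applied to REST and UI requests alike."""
    if MODIFY_ALL not in role.capabilities:
        raise PermissionError(ERROR)
    artifact["owner"] = new_owner
    return artifact


# Assumption for the example: ADMIN holds the capability, ANALYST does not.
admin = Role("ADMIN", {MODIFY_ALL})
analyst = Role("ANALYST")
workbook = {"name": "sales.wbk", "owner": "alice"}

print(change_owner(admin, workbook, "bob")["owner"])  # bob
try:
    change_owner(analyst, workbook, "carol")
except PermissionError as err:
    print(err)  # the error message quoted above
```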

Best Practices

In the interest of security, it is critical to create user roles with the appropriate capabilities. Datameer ships with two default user roles (ADMIN and ANALYST). These can be used as examples for creating user roles that match your organization.

Versions affected

  • 6.4.9 (Ownership change)
  • 7.1.5, 7.1.7 
  • 7.2.1, 7.2.3

Ticket reference

  • DAP-32715
  • DAP-32729

The REST API Now Properly Respects the Role Permissions for Downloading Workbook Data

Previous behavior

Datameer had a security concern regarding access via the REST API. Even when the permission setting "Download - allows to download workbook results data" wasn't set in the User Role settings, registered users could still use the REST API to download data from a workbook.

New behavior

The security concern has been corrected. Now, users in roles without the setting "Download" under Workbooks aren't able to download workbook data using the REST API. 

Best Practices

Datameer strongly recommends ensuring that only the users who need it have the ability to download workbook data. If individual users need the ability to download workbook data, the best practice is to create a new role for those users.

To allow a user role to download workbook data, go to the user role settings and enable "Download" under Workbooks.

Versions affected

  • 6.4.9 
  • 7.1.5
  • 7.2.1 
  • 7.4.0

Ticket reference

  • DAP-32729

CSRF Token Lifetime is Tied to Session Length

Overview

Datameer found an error in its CSRF token handling. This has been fixed, and the CSRF token lifetime is now tied to the session length. When a user reloads a page or makes a change in the UI, both the session and the CSRF token are renewed. The default session length and CSRF token lifetime are ten minutes.

Previous behavior

Datameer's CSRF tokens weren't timing out as intended, which created a security concern.

New behavior

Datameer now creates functional CSRF tokens that are valid per HTTP session. Both the CSRF token and Datameer session have a timeout after 10 minutes of inactivity. When a timeout occurs, a new token is created with a successful login.  

There is no change when using the REST API, as CSRF tokens aren't required when an HTTP session hasn't been established.
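
The session-bound token with an inactivity timeout can be illustrated with a toy model. This is a sketch of the general technique, not Datameer's code: the Session class and its methods are hypothetical, and time is passed in as plain seconds so the behavior is easy to follow.

```python
import hmac
import secrets

TIMEOUT_SECONDS = 10 * 60  # default of ten minutes, per the text above


class Session:
    """Toy HTTP session that owns exactly one CSRF token."""

    def __init__(self, now):
        self.csrf_token = secrets.token_hex(16)
        self.last_activity = now

    def expired(self, now):
        return now - self.last_activity >= TIMEOUT_SECONDS

    def accept(self, token, now):
        """Validate a request; valid activity renews session and token lifetime."""
        if self.expired(now):
            return False  # the user must log in again and gets a new token
        if not hmac.compare_digest(token, self.csrf_token):
            return False
        self.last_activity = now  # sliding timeout shared by session and token
        return True


session = Session(now=0)
token = session.csrf_token
print(session.accept(token, now=300))   # True, and the timeout restarts
print(session.accept(token, now=800))   # True: 500 s since last activity
print(session.accept(token, now=1400))  # False: 600 s of inactivity
```

Because the token lives inside the session object, it cannot outlive the session, which is the property the fix establishes.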

Best practice suggestions

Administrators should set an appropriate session length for their organization. This can be based on user feedback with regard to security. Session length can be set in the web.xml file, found under <datameerInstallLocation>/webapps/conductor/WEB-INF/web.xml. Changing it requires a restart of the Datameer server.
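
Before editing the file, it can help to check the current value. The sketch below reads the standard servlet session-timeout element, which is expressed in minutes, from a web.xml document. It assumes Datameer's web.xml uses this standard element; the sample XML is a trimmed stand-in, not the shipped file, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for web.xml; a real file contains many more elements.
SAMPLE_WEB_XML = """<web-app xmlns="http://java.sun.com/xml/ns/javaee" version="3.0">
  <session-config>
    <session-timeout>10</session-timeout>
  </session-config>
</web-app>
"""


def session_timeout_minutes(xml_text):
    """Return the configured session timeout in minutes, or None if unset."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # Drop the "{namespace}" prefix ElementTree adds to tag names.
        if element.tag.rsplit("}", 1)[-1] == "session-timeout":
            return int(element.text.strip())
    return None


print(session_timeout_minutes(SAMPLE_WEB_XML))  # 10
```

To inspect a real installation, read the file's contents and pass them to the function in place of the sample string.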

Versions affected

  • 7.1.8
  • 7.2.2
  • 7.4.0

Ticket reference

  • DAP-36910
  • DAP-22902

The Maximum Number of Errors to Log Is Now Required for Import Jobs

Overview

Import jobs now require a value in the field Max # of errors to log in order to successfully perform features including copy/paste, duplication, and backup/restore. 

Previous behavior

A value in the field Max # of errors to log was not required for import jobs.

New behavior

A value in the field Max # of errors to log is now required for import jobs.

Best practices suggestions

If you have any scripts in use to create or update import jobs, ensure they are updated to include the Max # of errors to log field.
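
A script can guard against the new requirement with a check before it submits a job definition. The following is a minimal sketch: the key name is hypothetical (use whatever name your job definition format gives the Max # of errors to log field), and treating the value as a non-negative integer is an assumption.

```python
# Hypothetical key; substitute the name your job definition format uses
# for the "Max # of errors to log" field.
MAX_ERRORS_KEY = "maxErrorsToLog"


def check_import_job(definition):
    """Raise ValueError if the max-errors field is missing or invalid."""
    value = definition.get(MAX_ERRORS_KEY)
    # Assumption: the value is a non-negative integer (booleans are rejected).
    if isinstance(value, bool) or not isinstance(value, int) or value < 0:
        raise ValueError(f"{MAX_ERRORS_KEY} must be set to a non-negative integer")
    return definition


job = {"name": "orders_import", MAX_ERRORS_KEY: 1000}
check_import_job(job)  # passes silently

try:
    check_import_job({"name": "legacy_import"})  # field missing
except ValueError as err:
    print(err)
```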

Versions affected

  • 7.2.8
  • 7.4.2

Ticket reference

  • DAP-34795