Page tree
Skip to end of metadata
Go to start of metadata

You can create export jobs to export data from Datameer to other connections such as a database, remote file server, or export results to a third-party BI (business intelligence) software package. Exported data are the results from executing analyses in a workbook on the original data set. You can initiate a one-time manual export, or you can configure the job to run each time the workbook is updated or at a specific time interval.

Introduction

Before you can export data, you must create a workbook. Once the workbook has executed you can then export data from one of the saved sheets in that workbook.

Exporting data is only available in Datameer's Enterprise product.

File Formats Available

Choose to export a worksheet to one of the following file formats:

  • Apache AVRO
  • CSV
  • Parquet
  • Tableau (TDSX)

 

Exporting Data

Only the data from a saved workbook sheet are available for export.

To export data:

  1. Choose Export Data from the + drop down menu located in the top left corner of the screen or right click in the navigation bar are the left side of the screen and select Create New > Export Data.
  2. Select a workbook by clicking Select then click Next.
  3. Choose which sheet to export, then click Next. At the bottom of the page you see a preview of the a sheet's data.
  4. Choose which connection the data should be exported to, either an existing connection or create a new one. Choose to export the data as a CSV file, an AVRO file, a Parquet file, or Tableau (TDSX) file. Then click Next

  5. Enter the information on the data details tab which varies depending on the type of file you are creating. Click Next.
    1. Under File settings, specify the file name and file path as well as whether existing data should be replaced.
      Under Advanced Settings, specify the maximum file size and how consecutive files should be numbered. All export jobs will add a task number to the end of the filenames (before the extension) generated by the export job.
    2. If the file is being exported as a CSV there are additional settings. Indicate whether column headers should be contained in the first row, and how delimiter, quote, and escape characters should be defined.

      If you are exporting a float value, an additional trailing zero (.0) is added following the original value. Leading zeros (0.) are removed if the integer part of the float value is larger than zero.

    3. If you export to a database such as MySQL, you need to select a table and c hoose either Use Existing Table to overwrite the database table or Create new if it doesn't exist to add the table. If you use an existing table with lots of previous data, the overall export might be slow as it first needs to delete the old data. However, if you create a new table, delete statements aren't used and performance might be improved. Note: There is a hard limit of 1,000 entries.

      Exporting to a database

      When exporting to a database, the database connector will be associated with a user name. This database user must have write and/or create permissions on the database itself in order to complete the export job.

      Please note that when exporting data to DB2 or to Oracle connection, there are limitations on the number of characters contained in string data field.

      • for DB2 the number of characters is limited to 2000.
      • for Oracle the number of characters is limited to 4000.

      Attempting to export data fields containing a larger number of characters will result in dropped records. 

       

      Record Replacement Policy

      If the "Use existing" option is selected:

      Datameer does not create a primary key column with name "ID". (i.e., The workbook must be aware of the ID column. All columns from the destination table (can not be null) must be mapped to a column of the workbook.)

      Append Mode

      • The records append to the existing table.

      Overwrite Mode

      • All records from the existing table are deleted.
      • New records are written to the destination table.

      Tips:

      • Always select “Use existing” for tables created outside of Datameer (e.g., Netezza)
      • Never select “Use existing” for tables created via Datameer.

       

      If the "Create new if not exist" option is selected:

      Datameer creates an additional column with name "ID" (primary key). When the database supports auto increment, the database computes the value of this column.

      Example: Netezza does not support auto increment. In this case Datameer uses a sequence database object to generate the ID. Specific for Netezza: the name of the sequence used is <tablename>_seq.

      Append Mode

      • The table is created if it does not already exists, otherwise the existing table is re-used without changing any schema.
      • The records are appended to the existing table.

      Overwrite Mode

      • Any previous tmp table is first dropped.
      • A tmp table is created.
      • The records are written to the tmp table.
      • The entire destination table is dropped if one exists.
      • The tmp table is renamed to the destination table.
      • The tmp table is dropped.

      Tip:

      • Always select “Create new if not exist” for tables created via Datameer.
      • Never select “Create new if not exist” for tables created outside of Datameer (e.g., Netezza)

      Common problem:

      • Export a workbook into a table via "Create new if not exists".
      • A new table is created with an additional primary key column with name "ID".
      • Export a second workbook with "Use existing" into the same table.
      • This fails because the ID will not be filled. (See description above)

       

      Advanced Settings

      • Setting the number of concurrent data base connections gives control of database parallelism. The Export Job will not launch more than the set number of parallel tasks. Each task opens one database connection.
      • Setting the maximum records per transaction will export approximately this number of records within a single database transaction.
      • Setting the rows per batch will establish a limit for the number of rows inserted for each transaction within a task. Tuning this value can influence speed and avoid potential timeouts. Very large batches can overwhelm some databases and limiting the number of rows returned at a time can reduce strain.

       

  6. Specify which fields to include. Clear any fields that should be excluded. Specify whether empty fields can be accepted. 

    Optionally, if there is a date field included in the data, you can change the parse pattern format to be used during export. The default date parse pattern for an export job is [yyyy-MM-dd HH:mm:ss].

  7. Specify how to handle invalid records, and click Next.

    Exporting to Hive

     Big integers in Datameer are treated differently than in Hive and use a larger range of values, so they are written as strings into a Hive table on export.

  8. Determine the schedule for exporting the data. Choose Manually for a non-recurrent export, or you can choose either After the workbook is calculated or On a Schedule. If you choose On a Schedule, indicate the time settings or use a custom cron pattern. Under Advanced Settings, you can enter custom properties as key/value pairs. Then click Next.
  9. Enter a description if desired, click Save As, give the file a name, then click Save. You can also enter an email address to receive any error messages.

Exporting to Teradata

If you are exporting to Teradata, a row-length limit is automatically set to 64 kilobyes by Teradata. Additionally, by default for every string column is set to a limit of 8019 bytes, so exports with more than 8 string columns will reach this limit and show an error message. To fix this, manually create the table in Teradata and then set Datameer to append instead of replace the table.

 

Editing Export Job Settings

Only the data from a saved workbook sheet are available for export.

To edit export job settings:

  1. Click on Export Data in the navigation bar on the left side of the screen.
  2. Highlight by clicking on the Export Job you wish to edit.
  3. Right click on the Export Job and select edit or simply click on the edit button in the toolbar at the top of the screen.
  4. Make the necessary changes (see exporting data for more information.) Use the Next button to navigate through the individual configuration screens.
  5. Click Save after you have finished or Save Copy As if you want to save the changed settings under a new name.

Duplicate Export Jobs

To create a copy of an existing export job:

  1. Click on Export Data in the navigation bar on the left side of the screen.
  2. Highlight by clicking on the Export Job you wish to duplicate.
  3. Right click on the Export Job and select duplicate or simply click on the duplicate button in the toolbar at the top of the screen.
  4. Click the create duplicate button.

A duplicate is created and is named "copy of ..." and the name of the original export job.

Running an Export Job

To run an export job:

  1. Click on Export Data in the navigation bar on the left side of the screen.
  2. Highlight by clicking on the Export Job you wish to run.
  3. Right click on the Export Job and select run or simply click on the run button in the toolbar at the top of the screen.

Depending on the amount of data, this process might take awhile. 

Export jobs run as a single task if the source sheet is sorted. The sorted sheet can be generated in parallel in the workbook, but the export job forces the job to run as a single task. To get around this, create a filter on the primary export sheet restricting it to a specific range and select the option to create this filter in a new sheet. Do this multiple times until all of the data is filtered out into separate sheets. Once done, run export jobs against these newly created sheets.

 

Deleting an Export Job

Only the export job is deleted, not the original data.

To delete an export job:

  1. Click on Export Data in the navigation bar on the left side of the screen.
  2. Highlight by clicking on the Export Job you wish to delete.
  3. Right click on the Export Job and select delete or simply click on the delete button in the toolbar at the top of the screen.
  4. Click the Delete button and confirm that you would like to perform this action.

Editing Export Job Permissions

To edit an export job's permissions:

  1. Click on Export Data in the navigation window on the left side of the screen.
  2. Click to highlight the Export Job for which you wish to edit permissions.
  3. Right click the Export Job and select Information or simply click the Information button on the toolbar.
  4. The permissions setting is listed at the bottom of the Sharing section of the information page.
  5. To add group permissions, click Add Group and select each group you want to add. Then set the read, write, and run (execute) permissions for that group. To delete a group, click the trash can icon next to that group.
  6. Set the results sharing permissions for each group and all others.

 

 

  • No labels