
When you import data, you import it from a connection, which provides access to data from different sources such as various types of files and databases. See Configuring a Connection to learn more. Set up the connection first, then set up import jobs to import the data you want to use. You can also edit, rename, copy, run, view the full data and details of, or delete an existing import job.

Watch our data integration videos for more information about importing data.

See Types of Data Supported for information about the types of databases and files that you can import into Datameer.

See Define File Path Range to learn how to use a date range to limit which files get imported.

 


About Import Jobs

You can import each type of data from a connection and then create a workbook that uses data from multiple sources, for example, a user list from a database and an Apache error log file.

See Types of Data Supported for information about the types of files that you can import into Datameer.

As of Datameer v5.11, obfuscation is available with Datameer's Advanced Governance module.

To create an import job:

  1. Click the + drop-down in the top left corner and select Import Job, or right-click in the navigation bar and choose Create New > Import Job.
  2. Click Select Connections, choose the connection and click Select, then choose the file type and click Next. Click New Connection to add a new connection if needed.
  3. Specify the file and folder location and click Next. You can use wildcard characters. In the Encryption section, enter the names of all columns to obfuscate, separated by spaces. Note that when a column is obfuscated, its data is never pulled into Datameer. See the sections that follow for additional details about importing each of the file types.
    • Apache log: specify the file or folder and the log format. See the samples provided in the dialog box for details.
    • CSV/TSV files: specify the delimiter, such as "\t" for tab, comma ",", or semicolon ";". Specify how many of the first lines in the data to skip and whether to use the first non-ignored row as column headers. You can also provide the column headers in a separate file by selecting the option to provide a custom schema and indicating where that file is located and which delimiter it uses. The first row of the custom schema file is assumed to contain the column headers.
      In Advanced Settings, specify the escape character (the character that follows it is shown as literal text instead of being processed), set the quote character, and check Enable strict quoting if characters outside the quotes should be ignored.
    • Fixed width: specify the file or folder, how many of the first lines in the data to skip, and whether to use the first non-ignored row as column headers.
    • JSON: specify the file or folder and other parameters about what to parse within the JSON structure.
    • Mbox: specify the file or folder. Mbox is a format used for collections of email messages.
    • Regex Parsable Text files: specify the file or folder, a regex pattern for processing the data (see the note below), how many of the first lines in the data to skip, and whether to use the first non-ignored row as column headers.
    • Twitter data: specify the file or folder.
    • XML data: specify the file or folder, the root element, the container element, and XPath expressions for the fields you want to import.
  4. View a sample of the data set to confirm that this is the data source you want to use. Use the Include checkboxes to choose which columns to import into Datameer, and the Accept empty checkboxes to specify whether null and empty values are kept or dropped on import. You can also specify the format for date fields; click the Help (question mark) link to see a complete list of supported formats. Use the list box to specify the data type of each column.


    Then, specify how to handle empty fields and invalid data, and click Next.
     
  5. Define the schedule details. 

     Loading - Choose "manually" to rerun the import job yourself whenever you want to update the data, or "on a schedule" to run the update at a specified time. To learn more, watch our scheduling video.



    Data Retention Policy - Choose whether an update replaces the existing data or is appended to it (added to the existing data). If you append data with a sliding window, define when the window expires and how many results to keep.


    Adding or deleting columns

    As of Datameer v2.1.4

    An appended import job can contain more or fewer columns than the previous run of the import job. When the schema is rescanned, Datameer detects the change in columns and no data is lost.

    However, data can be lost when appending in the following cases:

    - If a column is renamed, the schema uses the column with the new name and the data under the old column name is deleted.

    - If the data type of a column is changed (for example, a column that was an integer is changed to a string).

    - If the partition schema is changed, the data is reset starting from the change.



  6. Add a description, name the import job, select the checkbox to start the import immediately if desired, and click Save. You can also specify email addresses to notify when errors occur and when a job runs successfully.

    The maximum length for a table description is 255 characters.




    Note: Sample regex pattern for importing data: (\S+) (\S+) (\S+) (\S+) (\S+)
    See Importing with Regular Expressions to learn more.
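The sample pattern has five capture groups, and each (\S+) matches one run of non-whitespace characters, so each group becomes one column. The following Python sketch shows how such a pattern splits a line into fields. It uses Python's re module as a stand-in, not Datameer's parser; the column names and sample line are hypothetical, and requiring the whole line to match is an assumption of this sketch.

```python
import re

# Five capture groups; each (\S+) matches a run of non-whitespace characters.
PATTERN = re.compile(r"(\S+) (\S+) (\S+) (\S+) (\S+)")

# Hypothetical column names, for illustration only.
COLUMNS = ["host", "ident", "user", "method", "path"]

def parse_line(line):
    """Return a dict of column -> value, or None if the line does not match."""
    # Assumption: the whole (trimmed) line must match the pattern.
    match = PATTERN.fullmatch(line.strip())
    if match is None:
        return None  # in this sketch a non-matching line counts as invalid data
    return dict(zip(COLUMNS, match.groups()))

if __name__ == "__main__":
    print(parse_line("10.0.0.1 - alice GET /index.html"))
    print(parse_line("too few fields"))
```

A line with fewer than five fields does not match, which is one way a record can end up handled as invalid data in the import wizard.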

Type Conversions

  • Integer columns can be imported as dates by interpreting the integer value as a UNIX (epoch) timestamp.
  • Date columns can be converted to integers, which are then shown as epoch timestamps.
  • Strings can be converted to Boolean, where "false", "no", "f", "n" and "0" will be converted to false and "true", "yes", "t", "y" and "1" will be converted to true.
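The conversion rules above can be sketched in Python as follows. This illustrates the described behavior and is not Datameer's code. It assumes epoch timestamps are in seconds and UTC, compares Boolean strings case-insensitively, and returns None for strings in neither list; all three are assumptions, and the function names are hypothetical.

```python
from datetime import datetime, timezone

def epoch_to_date(value: int) -> datetime:
    """Integer -> date. Assumption: seconds since 1970-01-01 UTC."""
    return datetime.fromtimestamp(value, tz=timezone.utc)

def date_to_epoch(value: datetime) -> int:
    """Date -> integer epoch seconds. Expects a timezone-aware datetime."""
    return int(value.timestamp())

# String values listed in the documentation.
FALSE_STRINGS = {"false", "no", "f", "n", "0"}
TRUE_STRINGS = {"true", "yes", "t", "y", "1"}

def string_to_bool(value: str):
    """String -> Boolean. Assumption: case-insensitive, whitespace trimmed."""
    normalized = value.strip().lower()
    if normalized in TRUE_STRINGS:
        return True
    if normalized in FALSE_STRINGS:
        return False
    return None  # assumption: unrecognized strings are treated as invalid data

if __name__ == "__main__":
    print(epoch_to_date(0))
    print(date_to_epoch(datetime(2000, 1, 1, tzinfo=timezone.utc)))
    print(string_to_bool("Yes"), string_to_bool("0"), string_to_bool("maybe"))
```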

Raw Records

Click Raw Records to view an expanded sample of the raw data not in tabular format. Click it again to hide the raw data.

Merge and Compact Data Objects

When you choose the data retention policy for data imported into Datameer, one option is to append the data. Appending keeps the existing data and adds the new data to it under a new job ID.

It is possible to merge multiple data objects together:

  1. From the Datameer browser, select the job that has data objects to be merged.
  2. Click the Details button in the toolbar, or right-click the job and select Show Details.
  3. The data objects are displayed under Current Data.
  4. Click the Merge and Compact Data button.

Available as of Datameer v2.1.2.
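To make the append and merge behavior concrete, the following Python sketch models each appended run as a data object tagged with its job ID, and Merge and Compact as combining those objects into one. The DataObject class and both functions are hypothetical, and giving the merged object the newest job ID is an assumption; how Datameer actually stores data objects is not described here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataObject:
    job_id: int          # ID of the import run that produced this data
    records: List[dict]  # imported rows

def append(objects: List[DataObject], job_id: int, new_records: List[dict]) -> List[DataObject]:
    """Append policy: keep existing objects and add the new data under a new job ID."""
    return objects + [DataObject(job_id, list(new_records))]

def merge_and_compact(objects: List[DataObject]) -> List[DataObject]:
    """Combine all data objects into one, preserving record order.
    Assumption: the merged object takes the newest job ID."""
    if not objects:
        return []
    merged = [record for obj in objects for record in obj.records]
    return [DataObject(max(obj.job_id for obj in objects), merged)]

if __name__ == "__main__":
    objs = append([], 1, [{"user": "a"}])
    objs = append(objs, 2, [{"user": "b"}])
    print(len(objs), merge_and_compact(objs))
```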

How to view and edit the job settings

Some of these settings can also be accessed through the Save Workbook settings. The Save Workbook settings let you specify when jobs run, how errors are handled and who is notified, what data is saved with the workbook, and how much historical data (if any) is kept. See Configuring Workbook Settings to learn details about each of the settings.

To view and edit the job settings through the Import Data view

  1. Click the Browser tab at the top of the page.
  2. Click on ImportJobs in the navigation window on the left side of the screen.
  3. Highlight the name of the datasource you want to view.
  4. Click the Edit button.
  5. Click the Next button to view each type of job setting. You can also make changes.
  6. The Schedule tab has settings that can also be set through Save Workbook settings.
  7. Specify whether to replace or append data and whether to append using a sliding time window. You can then specify when the data should expire and how many results you should keep.
  8. Select which groups have read, write, and run access permissions and specify what access permissions other users have.
  9. Click the Save button on the last tab to save your changes. You can also click the Rename button to rename the import.
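The sliding-window options in step 7 combine an expiry age with a maximum number of results. The following Python sketch shows one way these two limits can interact. The function name and the order in which the limits are applied (expiry first, then the count of newest runs) are assumptions, not Datameer's documented behavior.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Run = Tuple[datetime, str]  # (time the appended run was imported, data label)

def apply_sliding_window(runs: List[Run], now: datetime,
                         expires_after: timedelta, max_results: int) -> List[Run]:
    """Drop runs older than the window, then keep at most max_results newest runs."""
    fresh = [run for run in runs if now - run[0] <= expires_after]
    fresh.sort(key=lambda run: run[0], reverse=True)  # newest first
    return fresh[:max_results]

if __name__ == "__main__":
    runs = [(datetime(2024, 1, day), f"day{day}") for day in range(1, 10)]
    kept = apply_sliding_window(runs, datetime(2024, 1, 10), timedelta(days=5), 3)
    print([label for _, label in kept])
```

With a five-day window and a limit of three results, only the three newest runs inside the window remain.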

To view and edit the job settings through the workbook view

  1. Click the Finder tab at the top of the page.
  2. Highlight the name of the workbook you want to view.
  3. Click the Edit button.

Viewing dropped records

If an import job runs with dropped records, the page listing all import jobs shows a "Completed with warnings" icon for that job. You can find out what caused the problem from the job details.

To view dropped records:

  1. Right click the import job name and select Show Details to view the summary page.
  2. Click the recent listing ID in the History list.
  3. In the Job Run details page, view the Errors list to find out what happened.

 

The job details page showing Job History.


The details page showing the list of errors.

 

The error log shown when you click an entry in the list of errors.



The job run details page showing statistics and the job logfile.



Use the links in the Job Logfile section to download the logfile, download the job trace, or to report an issue to Datameer. When you click the Report an issue link, fill out the bug report and provide steps to recreate the error.
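The following Python sketch models how dropped records relate to the job status: records whose values cannot be converted are dropped, each drop is recorded as an error entry, and any drop produces the "Completed with warnings" status. It is an illustration only, not how Datameer processes records. The run_import function is hypothetical, and "Completed" as the label for a clean run is an assumption.

```python
from typing import Callable, Dict, List, Tuple

def run_import(rows: List[Dict[str, str]], column: str,
               convert: Callable[[str], object]) -> Tuple[list, List[str], str]:
    """Convert one column of every row; drop rows that fail and log why."""
    kept: list = []
    errors: List[str] = []
    for record_no, row in enumerate(rows, start=1):
        try:
            converted = dict(row)
            converted[column] = convert(row[column])
        except (KeyError, ValueError) as exc:
            # Drop the record and keep a message, like an entry in the Errors list.
            errors.append(f"record {record_no}: {exc!r}")
            continue
        kept.append(converted)
    status = "Completed with warnings" if errors else "Completed"
    return kept, errors, status

if __name__ == "__main__":
    kept, errors, status = run_import([{"id": "1"}, {"id": "x"}], "id", int)
    print(kept, errors, status)
```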

How to schedule data

You have a great deal of flexibility in choosing when jobs are run. You can choose to run them manually, when data changes, or at an interval you specify. See Configuring Workbook Settings for information on the schedule details.

  1. Click the Browser tab at the top of the page.
  2. Click workbooks from the navigation bar on the left side of the screen.
  3. Highlight the name of the workbook you want to view.
  4. Click the Edit button.
  5. Under the Calculation settings, choose one of the three choices: Manually, When new data comes in, or Scheduled.
  6. If Scheduled is chosen, specify the time settings.
  7. Click the Save button to save your changes.
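The three Calculation settings can be summarized as a single decision about whether an automatic run is due. The following Python sketch is a simplified model under stated assumptions: the mode names are hypothetical labels for the three options, and Scheduled is modeled as a fixed interval since the last run, whereas the actual time settings may be richer.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical labels for the three Calculation settings.
MANUAL = "manually"
ON_NEW_DATA = "when new data comes in"
SCHEDULED = "scheduled"

def should_run(mode: str, now: datetime, last_run: Optional[datetime] = None,
               interval: Optional[timedelta] = None, new_data: bool = False) -> bool:
    """Return True if an automatic run is due under the chosen setting."""
    if mode == MANUAL:
        return False  # never triggered automatically; a user starts the run
    if mode == ON_NEW_DATA:
        return new_data
    if mode == SCHEDULED:
        if interval is None:
            raise ValueError("a scheduled job needs an interval")
        # Run if the job has never run or the interval has elapsed.
        return last_run is None or now - last_run >= interval
    raise ValueError(f"unknown mode: {mode!r}")

if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    print(should_run(SCHEDULED, now, datetime(2024, 1, 1, 10, 0), timedelta(hours=1)))
```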

Edit an import job

To edit an import job

  1. Click the Browser tab at the top of the page. 
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. Highlight the import job you want to edit and click the Edit button.
  4. Make your changes and click the Next button to move through the screens.
  5. Click Save when you are finished.

Create a copy of an import job

To create a copy of an existing import job

  1. Click the Browser tab at the top of the page. 
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. Highlight the import job you want to copy.
  4. Click the Duplicate button.

The copy is created with the name "copy of" followed by the name of the original import job.

Run an import job

To run an import job

  1. Click the Browser tab at the top of the page. 
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. Highlight the import job you want to run and click the Run button.

Depending on the amount of data, this may take a while.

Delete an import job

Note that this deletes the import job, not the original data.

To delete an import job

  1. Click the Browser tab at the top of the page. 
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. Highlight the import job you want to delete and click the Delete button.
  4. Click OK then confirm the deletion.

Linking data to a new workbook

You can link data to a new workbook.

To link data to a new workbook:

  1. Click the Browser tab at the top of the page.
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. Double-click the name of the import job you want to link to a new workbook.
  4. Click the Link Data in New Workbook button.
  5. The data is loaded into a new workbook.

Viewing import job upload size and monthly upload sizes

You can view the number of bytes processed by each upload and the total volume that counts toward your license term.

To view the bytes processed by a single job run and the total for that job configuration:

  1. Click the Browser tab at the top of the page.
  2. Click ImportJobs from the navigation bar on the left side of the screen.
  3. The size of the last job run is displayed first, followed by the total for that job configuration in parentheses.

When a new license term starts and the import job runs again, the total starts over from zero for the new term.
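The per-run size and the per-term total described above can be modeled with a small tracker. The following Python sketch is hypothetical: the UploadTracker class, the term labels, and the plain-byte display are assumptions, and only the "last run (total)" layout and the reset at a new license term come from this page.

```python
from typing import Dict, Tuple

class UploadTracker:
    """Tracks bytes processed per import job and per license term."""

    def __init__(self) -> None:
        self.last_run_bytes: Dict[str, int] = {}          # job -> bytes of latest run
        self.term_totals: Dict[Tuple[str, str], int] = {}  # (job, term) -> total bytes

    def record_run(self, job: str, term: str, processed_bytes: int) -> None:
        self.last_run_bytes[job] = processed_bytes
        key = (job, term)
        # A new term has no entry yet, so its total starts from zero.
        self.term_totals[key] = self.term_totals.get(key, 0) + processed_bytes

    def display(self, job: str, term: str) -> str:
        """Format as 'last run (total for this term)'."""
        return f"{self.last_run_bytes.get(job, 0)} ({self.term_totals.get((job, term), 0)})"

if __name__ == "__main__":
    tracker = UploadTracker()
    tracker.record_run("weblogs", "2024", 100)
    tracker.record_run("weblogs", "2024", 250)
    print(tracker.display("weblogs", "2024"))
```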

 

Identify workbooks affected by an import job schema change

When you edit an import job, Datameer notifies you if a schema change will affect any workbooks that use the imported data.

To view which workbooks are affected by a schema change from the import data:

  1. Complete the import job configuration until reaching the Save section.
  2. Review the note box, which details how this save changes the previously saved configuration and which workbooks will be affected.
  3. Check the box to email users of the workbooks affected by the new schema changes.
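One way to reason about which workbooks a schema change affects is to diff the old and new schemas and check which workbooks read the removed or retyped columns. The following Python sketch illustrates that idea; it is not Datameer's detection logic. The function names, type labels, and workbook-to-column mapping are hypothetical. Note that a renamed column shows up here as one removal plus one addition, which matches the data-loss case for renamed columns described earlier.

```python
from typing import Dict, List, Set

def diff_schema(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare column -> type mappings and report the differences."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(col for col in set(old) & set(new) if old[col] != new[col])
    return {"added": added, "removed": removed, "retyped": retyped}

def affected_workbooks(changes: Dict[str, List[str]],
                       workbook_columns: Dict[str, Set[str]]) -> List[str]:
    """Workbooks that read any removed or retyped column."""
    touched = set(changes["removed"]) | set(changes["retyped"])
    return sorted(wb for wb, cols in workbook_columns.items() if cols & touched)

if __name__ == "__main__":
    old = {"id": "INTEGER", "name": "STRING", "ts": "DATE"}
    new = {"id": "STRING", "full_name": "STRING", "ts": "DATE"}
    changes = diff_schema(old, new)
    print(changes)
    print(affected_workbooks(changes, {"Sales": {"id", "ts"}, "Report": {"ts"}, "Users": {"name"}}))
```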

 
