A data link lets you feed data into a workbook without using an import job. The difference between the two job types is when the full data set is loaded. A data link fetches only the preview data for the workbook view; the full data set is used when you run the workbook.

An import job, by contrast, loads the full data set when the import job itself runs.

You can also edit, rename, create a copy, run, view the full data, view the details and information, or delete an existing data link.

You can create a data link from an existing connection or by specifying a new connection.

See Types of Data Supported for information about the types of files that you can link to in Datameer.

To create a data link

  1. Click the Browser tab at the top of the page. 
  2. Click the + (plus) box and select DataLink, or right-click in the navigation box on the left side and select create new > DataLink.
  3. Click Connection, choose the connection and click Select, then choose the file type and click Next. Click New to add a new connection if needed.
  4. Specify the file and folder location and click Next. You can use wildcard characters. See the sections that follow for additional details about importing each of the file types.
    • Apache log: specify the file or folder and the log format. See the samples provided in the dialog box for details.
    • CSV/TSV files: specify the delimiter, such as "\t" for tab, "," for comma, or ";" for semicolon, and specify whether the first row contains the column headers. Then click Advanced Settings. In Advanced Settings, set the escape character (the character it precedes is shown literally instead of being processed) and the quote character. If Enable strict quoting is checked, characters outside the quotes are ignored.
    • Fixed width: specify the file or folder and specify whether the first row contains the column headers.
    • Mbox: specify the file or folder. This is a format used for collections of electronic mail messages.
    • Text files: specify the file or folder, a regex pattern for processing the data (see Note below), and whether the first row contains the column headers.
    • Twitter data: specify the file or folder.
  5. View a sample of the dataset to confirm this is the data source you want to use, and use the checkboxes to select which fields to link into Datameer. (See image below.)

     Then, specify how to handle empty fields and invalid data, and click Next.
  6. Define the schedule details and click Next. See Configuring Workbook Settings for information on the schedule details.
  7. Add a description, name the file, and click Save.
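The CSV/TSV settings in step 4 (delimiter, header row, quote character, escape character) can be illustrated with a minimal sketch using Python's standard csv module. This is not Datameer's parser: the sample text, the function name read_rows, and its parameter names are assumptions made for illustration, and the Enable strict quoting option is not modeled here.

```python
import csv
import io

# Made-up sample: tab-delimited, first row is the header,
# fields are quoted with " and a backslash escapes a quote inside a field.
SAMPLE = 'name\tnote\n"Smith, Matt"\t"said \\"hi\\""\n'


def read_rows(text, delimiter="\t", quote_char='"', escape_char="\\",
              has_header=True):
    """Parse delimited text and split off the header row if present.

    Hypothetical helper; the parameters mirror the dialog settings
    described above but are not Datameer option names.
    """
    reader = csv.reader(
        io.StringIO(text),
        delimiter=delimiter,
        quotechar=quote_char,
        escapechar=escape_char,
    )
    rows = list(reader)
    if has_header and rows:
        return rows[0], rows[1:]
    return None, rows


header, records = read_rows(SAMPLE)
print(header)
print(records)
```

The escaped quote characters end up as literal quotes in the parsed field, and the comma inside the quoted name is kept as data because the delimiter is a tab.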

Note: A sample regex pattern for importing data is (\S+) (\S+) (\S+) (\S+) (\S+), which captures five fields separated by single spaces. See Importing with Regular Expressions to learn more.
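As a rough illustration of how the sample pattern splits a line into fields, here is a minimal sketch in Python. The log line is made up, parse_line is a hypothetical helper, and requiring the whole line to match is an assumption; Datameer's own regex handling may differ in detail.

```python
import re

# The sample pattern from the note: five runs of non-whitespace
# characters, each separated by exactly one space.
PATTERN = re.compile(r"(\S+) (\S+) (\S+) (\S+) (\S+)")


def parse_line(line):
    """Return the five captured fields, or None if the line does not match.

    Hypothetical helper for illustration only.
    """
    match = PATTERN.fullmatch(line)
    if match is None:
        return None
    return list(match.groups())


# Made-up sample line with five space-separated tokens.
fields = parse_line("2013-04-01 10:15:32 GET /index.html 200")
print(fields)
```

Each capture group becomes one column, so a line with fewer or more than five tokens does not match this particular pattern.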

To edit a data link

  1. Click the Browser tab at the top of the page. 
  2. Click DataLink in the navigation box on the left side.
  3. Highlight the data link you want to edit and click the Edit button.
  4. Make your changes as desired and click Next to move through the screens.
  5. Click Save when you are finished.

To create partitions when linking data

  1. Create a new data link or edit an existing data link.
  2. Go to the Data Details section.
  3. Enter the path for the files or folders and include the %pattern% placeholder at the point in the path where the partitioned folders are located. (Example file path: /Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv)

    The %pattern% placeholder stands for a folder structure and determines which files from the matching folders are included in the DataLink partition. The placeholder cannot be used in the file name itself.

    Example:

    Valid: /Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv

    Invalid: /Users/MattSmith/Desktop/Geo_coords/%pattern%.csv

  4. Scroll down to the time-based partitions setting and select ON.
  5. In the Partition Pattern box, enter a date format expression such as 'yyyy/MM/dd', which replaces the %pattern% placeholder in the file path.
  6. Click Next when you have finished and save the file.

When you create or open the linked data in a workbook, the partitioned data is displayed. You can then select the data you want to analyze.
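To make the placeholder substitution concrete, the following minimal sketch shows how a date format expression such as 'yyyy/MM/dd' could expand the %pattern% placeholder for one day. It is an illustration under stated assumptions, not Datameer's implementation: expand_partition_path is a hypothetical helper, only the yyyy, MM, and dd tokens are handled, and the check that the placeholder occupies a whole folder segment is one reading of the rule that it cannot be used in the file name.

```python
from datetime import date

PLACEHOLDER = "%pattern%"

# Only the three tokens from the example pattern are translated here.
TOKENS = {"yyyy": "%Y", "MM": "%m", "dd": "%d"}


def expand_partition_path(path_template, partition_pattern, day):
    """Replace %pattern% in the path with the formatted date.

    Hypothetical helper: raises ValueError unless the placeholder is a
    complete folder segment (not part of the file name).
    """
    folders = path_template.split("/")[:-1]  # every segment except the file name
    if PLACEHOLDER not in folders:
        raise ValueError("%pattern% must be a folder in the path")

    strftime_pattern = partition_pattern
    for token, directive in TOKENS.items():
        strftime_pattern = strftime_pattern.replace(token, directive)

    return path_template.replace(PLACEHOLDER, day.strftime(strftime_pattern))


path = expand_partition_path(
    "/Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv",
    "yyyy/MM/dd",
    date(2013, 4, 1),
)
print(path)
```

With the pattern 'yyyy/MM/dd', each day maps to its own year/month/day folder, which is what allows individual partitions to be selected in the workbook.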

          

To copy a data link

  1. Click the Browser tab at the top of the page. 
  2. Click DataLink in the navigation box on the left side.
  3. Highlight the data link you want to copy and click the Duplicate button.

The copy is created and named "copy of " followed by the name of the original data link.

To run a data link

  1. Click the Browser tab at the top of the page. 
  2. Click DataLink in the navigation box on the left side.
  3. Highlight the data link you want to run and click the Run button.
  4. Wait for the run to finish. Depending on the volume of data, this may take a while.

Note that deleting a data link deletes the link in Datameer but does not delete the actual data.

To delete a data link

  1. Click the Browser tab at the top of the page. 
  2. Click DataLink in the navigation box on the left side.
  3. Highlight the data link you want to delete from Datameer and click the Delete button.
  4. Click OK and then confirm the deletion.

Set permissions

 Only an administrator can set permissions.

To set permissions for a data link

  1. Click the Browser tab at the top of the page.
  2. Click DataLink in the navigation box on the left side.
  3. Highlight the data link for which you want to set permissions and click the information button.
  4. Optionally add one or more groups to this link. Set read, write, and run permissions for each group.
  5. Set read, write, and run permissions for all others who are not part of a group.

Viewing import job upload size and monthly upload sizes

You can view the count of processed bytes for each upload and their total volume counting towards the license term.

To view the processed bytes per single job execution and totals for that job configuration of the Data Link:

  1. Click the Browser tab at the top of the page.
  2. Click DataLinks from the navigation bar on the left side of the screen.
  3. The size of the last job run is displayed first, and the total for that job configuration is displayed to the right in parentheses.

If a new license term starts and the Data Link is processed again, the count starts with a new total processed data amount.

 
