Page tree
Skip to end of metadata
Go to start of metadata

When importing data via an import job or a file upload, you can partition your data using date parameters. When this data is loaded into a workbook, you can choose to run your calculations on all or on just a part of your data. Also if you decide to export data, you can choose to export all or just a part of your data.

Please note that if you are using Microsoft Internet Explorer 7 or 8, these do not support Scalable Vector Graphics (SVG). When using partitions in a workbook or when exporting a partition, the simple partition filter will not be available - only the Time Window and Advanced Partition Filter. Please see Supported Browsers for alternate web browsers.

 

To use partitions you must first configure your data within an import job or a file upload.

Simple Partition Filter

On the define fields step when importing your data, you can choose to partition your data. Currently, Datameer only supports time-based partitions.

Here you can choose a column that contains a date. You can also choose which scale to use. If you want to concentrate on only periods of years, months, weeks, etc.

Advanced Partition Filter

With the advanced tab, you can enter a formula to represent the partitions. This especially useful when you do not have one column containing dates, rather the date is split into more than one column.

ASDATE(CONCAT(#Year;"-";#Month;"-";#Day;" ";#Hour;":";#Minute;":";#Second);"yyyy-MM-dd HH:mm:ss")

To create partitions when linking data:

  1. Create a new datalink or choose to edit a current datalink.
  2. Go to the Data Details section.
  3. Enter the path for the files or folders and include the %pattern% to where the files are located.  (Example File Path: /Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv)

    The %pattern% specifies a folder structure and defines which files from the included folders should be included in the DataLink partition. This feature may not be used on direct filenames.

    Example:

    (tick) /Users/MattSmith/Desktop/Geo_coords/%pattern%/geodata.csv

    (error) /Users/MattSmith/Desktop/Geo_coords/%pattern%.csv)

  4. Scroll down to time based partitions and select the ON setting.
  5. In the Partition Pattern box enter a date format expression like 'yyyy/MM/dd/HH/mm/SS' which will replaces the %pattern% placeholder in the file path.

    Keep the following points in mind when determining granularity:

      • The selected granularity of a partition can affect the performance of downstream workbooks.
      • If the defined partitions contain few records, job performance might be slow in downstream workbooks.
      • Using a granularity of minutes or seconds could cause the simple partition filter to be unreadable. Datameer recommends only using these partition granularities with the Time Window and Advanced Partition Filter.
      • Using a granularity of minutes or seconds isn't available for import jobs.
  6. Click Next when you have finished and save the file.

 

Using Partitioned Data in Workbooks

After partitioning your data during an import or file upload, open this data in a workbook. (See Working with Workbooks for more details.)

Static and Dynamic Partitions

Static - The partition parameters have a set value(s) that will not change.

Dynamic - The partition parameters have the ability to change the partition values.
 

The Simple and Time Window partition filters offer static parameters. The Advanced filter offers the ability to create dynamic parameters.

Simple Partition Filter

When you first load your partitioned data into a workbook, you will see a sunburst graph representation of the partitions in your data. You must first select which partition or partitions to use before you can work with your data.

This sunburst graph shows the different periods of time in your data. In this example the outside ring represents days, the next ring represents months and so on. You can choose a ring by clicking. By double clicking you can zoom in. You can also select a range of partitions by holding down 'shift' and clicking. Select all the partitions by clicking the Select All button in the bottom right corner.  

 

Time Window Partition Filter

Another way to select which partition to use is with a calendar.

Simply choose the start and end date for the time period desired.

Advanced Partition Filter

There is also an advanced partition filter. Here just enter a formula to signify which partition should be returned.

Here is an example of a formula returning the partitions representing the past seven days along with the partition for today:

 $partition > TODAY()-7d && $partition <= TODAY()

Take note, this will not respect daylight saving changes as a day can possibly have 23 or 25 hours.

Here is an example of a formula returning the partitions representing the past seven days along with the partition for today which respect daylight saving time:

 $partition > ADDTODATE(TODAY();"-7d") && $partition <= TODAY()

And here is an example of a formula returning the partitions representing yesterday which respect daylight saving time:

 $partition == ADDTODATE(TODAY();"-1d")

To catch the last seven days of the previous month in a workbook using a formula that respects daylight saving time.

$partition < ENDOFMONTH(STARTOFMONTH(ADDTODATE(TODAY();"-1M"))) && $partition >= ENDOFMONTH(STARTOFMONTH(ADDTODATE(TODAY();"-1M")))-7d

Understanding the $partition variable for different partition resolutions

When a partition is created in Datameer for an Import Job or Data Link, there are four different partition resolutions that may be configured: YEAR, MONTH, DAY or HOUR. The $partition variable is <date type> object in Datameer. This variable is set to the start of the partition window. Depending on the resolution of the partitions, the $partition variable takes on different date field type values. 

For example, suppose that Apache log data is being imported from July 4, 2014 22:00:00 until July 4, 2014 22:59:59. The following table describes how an Import Job would partition data for each possible resolution. Additionally, it describes what the corresponding $partition variable would be for this data set:

Partition ResolutionPartition Range (Inclusive)$partition Variable Value
YEARJanuary 1, 2014 00:00:00 - December 31, 2014 23:59:59January 1, 2014 00:00:00
MONTHJuly 1, 2014 00:00:00 - July 31, 2014 23:59:59July 1, 2014 00:00:00
DAYJuly 4, 2014 00:00:00 - July 4, 2014 23:59:59July 4, 2014 00:00:00
HOURJuly 4, 2014 22:00:00 - July 4, 2014 22:59:59July 4, 2014 22:00:00

 

The $lastpartition variable

Another variable is available for use when defining an Advanced Filter for partitions in a Workbook. This variable, $latestpartition, represents the most recent partition that contains data. 

$partition == $latestpartition

 

Exporting Partitioned Data

It is also possible to export only the desired partition or partitions during export.

Simple Partition Filter

During export you can choose to export only certain partitions.

This sunburst graph shows the different periods of time in your data. In this example the outside ring represents days, the next ring represents months and so on, until you reach the center which represents all of the partitions. You can choose a ring by clicking. By double clicking you can zoom in. You can also select a range of partitions by holding down 'shift' and clicking.

Time Window Partition Filter

It is also possible to use a calendar to partition your data.

Simply choose the startn and end date to partition your data.

Advanced Partition Filter

There is also an advanced partition filter. Here just enter a formula to signify which partition should be exported.

Below is an example formula showing how to export data from 15 Sep 2010 between 5:00 AM and 8:00 AM.

$partition > ASDATE (2010-09-15 05:00;yyyy-MM-dd hh:mm) && $partition <= ASDATE (2010-09-15 08:00;yyyy-MM-dd hh:mm)

Repartitioning

After a partition has been set up it may become necessary to change the partition's resolution. It is possible to change the resolution for import jobs and file uploads. Changing the resolution requires a few simple steps.

  1. Highlight and right click the partitioned import job or file upload and click Edit.
  2. Click next until you reach define fields.
  3. Under time-based partitions you can then change the resolution, like you would upon initial import.
  4. After choosing the desired resolution, click next until you reach the final screen and save the repartitioning under the same name or under a new name.

The repartitioning process runs automatically and as long as the resolution has become finer, all workbooks are updated automatically, also. If the resolution has become less fine, then you will be prompted to choose partitions again when opening an existing workbook.

If due to connectivity problems, a running repartitioning is aborted, just process the import job or file upload again. As long as the job configuration has been saved, then you just need to re-start the repartitioning job.

  • No labels