...

If desired, you can use the Split tab in the Inspector to enter new split criteria. Click Update Sheet to apply changes.

...

Encoding Columns

Info

Column encoding performs ordinal, one-hot, or binned encoding on column data, which assigns a unique numeric value to each categorical or continuous value. Once applied, column encoding can be updated as needed until the desired results are achieved.

Columns with high cardinality are not suited for ordinal or binned encoding.


Tip

Column encoding provides a consistent view of prepared data, which is especially helpful for teams working together on model building and testing activities.

...

Ordinal Encoding

For ordinal encoding:

  1. Right-click on a column header and select "Encoding", or click on the "Encode Column" icon in the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "Ordinal Encoding" from the drop-down. The remaining options adapt to the selected type.
  4. Decide how to deal with unknown values by selecting the required option.
    INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Default Value' shows values that cannot be encoded.
  5. View the top 32 values (by count).
  6. If needed, add a new value in the blank field, change the order of the top values, or delete individual values.
  7. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Ordinal encoding is finished.
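
The idea behind the ordinal encoding steps above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name `ordinal_encode`, its parameters, and the way "drop" (empty cell) and "default" (fallback code) treat unknown values are assumptions made for the example.

```python
from collections import Counter


def ordinal_encode(values, max_values=100, unknown="drop", default_code=-1):
    """Map the most frequent values to integer codes (hypothetical helper)."""
    counts = Counter(values)
    # Most frequent first; ties are broken by value so the result is deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
    codes = {value: code for code, value in enumerate(ranked[:max_values])}

    encoded = []
    for value in values:
        if value in codes:
            encoded.append(codes[value])
        elif unknown == "default":
            # Assumption: unknown values share one fallback code.
            encoded.append(default_code)
        else:
            # Assumption: "drop" leaves the cell empty.
            encoded.append(None)
    return codes, encoded


colors = ["red", "blue", "red", "green", "red", "blue"]
codes, encoded = ordinal_encode(colors, max_values=2, unknown="default")
print(codes)
print(encoded)
```

Because every distinct value needs its own code, a column with many distinct values quickly exceeds the frequency limit, which is why high-cardinality columns are a poor fit for this encoding.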

One-hot Encoding

For one-hot encoding:

  1. Right-click on a column header and select "Encoding", or click on the "Encode Column" icon in the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "1-Hot Encoding" from the drop-down. The remaining options adapt to the selected type.
  4. Decide how to deal with unknown values by selecting the required option.
    INFO: 'Drop Value' ignores values beyond the 100 most frequent. 'Include as last column' adds a new element to the list that encodes together all values beyond the 100 most frequent.
  5. Select the output format from the 'Output' drop-down.
    INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' creates binary pairs each in their own column.
  6. View the top 32 values (by count).
  7. If needed, add a new value in the blank field, change the order of the top values, or delete individual values.
  8. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. One-hot encoding is finished.
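
The difference between the two output formats, and the effect of collecting the remaining values in a last column, may be easier to see in code. The following Python sketch is illustrative only: `one_hot_encode`, `as_columns`, the `include_other` flag, and the column name "other" are hypothetical and do not describe the product's internals.

```python
from collections import Counter


def one_hot_encode(values, max_values=100, include_other=False):
    """Return the kept categories and one 0/1 list per row (hypothetical helper)."""
    counts = Counter(values)
    # Most frequent first; ties are broken by value so the result is deterministic.
    ranked = sorted(counts, key=lambda v: (-counts[v], str(v)))
    categories = ranked[:max_values]
    index = {value: i for i, value in enumerate(categories)}
    width = len(categories) + (1 if include_other else 0)

    rows = []
    for value in values:
        row = [0] * width
        if value in index:
            row[index[value]] = 1
        elif include_other:
            # All values beyond the limit share the last slot.
            row[-1] = 1
        rows.append(row)
    return categories, rows


def as_columns(categories, rows, include_other=False):
    """Split the per-row lists into one named column per category."""
    names = [str(c) for c in categories] + (["other"] if include_other else [])
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


colors = ["red", "blue", "red", "green", "red", "blue"]
categories, rows = one_hot_encode(colors, max_values=2, include_other=True)
print(rows)                                              # list-style output
print(as_columns(categories, rows, include_other=True))  # column-style output
```

Here `rows` corresponds to a list-style output with one list per record, while `as_columns` produces one 0/1 column per category.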

Binned Encoding

For binned encoding:

  1. Right-click on a column header and select "Encoding", or click on the "Encode Column" icon in the toolbar. The 'Encode Column' dialog is displayed on the right.
  2. If needed, change the column by entering the required column name in 'Column'.
  3. Select the encoding type "Binned Encoding" from the drop-down. The remaining options adapt to the selected type.
  4. Select the output format from the 'Output' drop-down.
    INFO: 'As List' keeps all binary pairs together in a single column. 'As Columns' creates binary pairs each in their own column. 'Ordinal' encodes as ordinal numbers in a single column.
  5. View the default value distributions.
    INFO: The graph changes according to the number of dividers.
  6. Enter the required bin dividers to change the percentile size of each bin.
    INFO: Three dividers are set by default, e.g. 'Divider 1' contains 25% of the selected column values.
    INFO: The bin dividers determine the number of bins, and each divider is the highest value in its bin. You can change the highest value of a bin by selecting a different value.
    INFO: For dates, set the 'Date Bins by' option to day, hour, month, quarter, or year. For example, years between 1999 and 2006 can be grouped and encoded together, 2007 to 2011 in the next bin, 2012 to 2018 in the next, and 2019 to 2021 in the last.
    INFO: To delete a divider, click on "x" next to the required divider.
  7. If needed, click on "Add new Divider" to add a new divider.
    INFO: Clicking the button adds an additional bucket and recalculates the percentile size and the corresponding absolute values.
  8. Confirm with "Encode". The encoding result is displayed in a new encoding sheet within the workbook. Binned encoding is finished.

Once encoding is initiated, you can use the Encoding tab of the Inspector to add to or change encoding criteria. Click Update to apply changes.
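
As a rough illustration of how bin dividers turn values into bins, the sketch below derives dividers from percentiles and assigns each value a bin index. It is not the product's algorithm: `percentile_dividers`, `bin_encode`, and the nearest-rank percentile method are assumptions chosen for the example. The divider years 2006, 2011, and 2018 are inferred from the year ranges in the example above.

```python
from bisect import bisect_left


def percentile_dividers(values, percentiles=(25, 50, 75)):
    """Pick divider values at the given percentiles of a non-empty list (nearest-rank method)."""
    ordered = sorted(values)
    dividers = []
    for p in percentiles:
        # Smallest rank such that at least p percent of the values are at or below it.
        rank = max(1, (p * len(ordered) + 99) // 100)
        dividers.append(ordered[rank - 1])
    return dividers


def bin_encode(values, dividers):
    """Return the bin index of each value; a divider is the highest value of its bin."""
    # bisect_left keeps a value equal to a divider in that divider's bin.
    return [bisect_left(dividers, value) for value in values]


years = [1999, 2006, 2007, 2011, 2012, 2018, 2019, 2021]
print(bin_encode(years, [2006, 2011, 2018]))  # -> [0, 0, 1, 1, 2, 2, 3, 3]
print(percentile_dividers(range(1, 101)))     # -> [25, 50, 75]
```

Three dividers produce four bins, so adding a divider always adds one more bucket.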

Formatting columns

...