How to impute a value for missing data in Alteryx
- How-Tos FAQs
- March 3, 2021
It is common to find null or missing values within the databases during the ETL process. Null or missing values are cells containing no values. These can dirty calculations and output files. Let’s figure out how to impute a specific value to handle missing data in Alteryx.
We can use Alteryx to handle missing data.
There are three possible ways, depending on the data type of the field:
- If the field has a numeric data type, you can use an Imputation tool or a Formula tool for basic replacements and the ‘Append Fields’ technique for more complex ones;
- If the field has a non-numeric data type, you can use a Formula tool for basic replacements and the ‘Append Fields’ technique for more complex ones.
The starting field should appear as shown below.
Using an Imputation tool is the simplest way to replace null values in a numeric field with descriptive statistics (4). It also helps keep track of replaced null values (5).
1. Connect an Imputation tool directly to your workflow
2. In the first section of the configuration panel, select all the fields where you want to replace nulls
3. In the second section, choose ‘Null’
4. In the third section, choose which numeric replacement you want to use between average, median, mode, user-specified value
5. In the fourth section, you can add a field that indicates which rows have been replaced and if to create a new field with replacements instead of modifying the original one
Using a Formula tool is the simplest way to replace nulls in any field with a specific value (2) or a row calculation (3).
1. Connect a Formula tool directly to your workflow
2. Use an if-statement as shown in example 1 to replace nulls with a specific value
3. Use an if-statement as shown in example 2 to replace nulls with a row calculation
Using the Append Fields technique is helpful to replace null values with vertical calculation or with data coming from a different point of the workflow.
1. Connect your workflow to the Target (T) anchor of an Append Fields tool
2. Connect the information you want to use as a replacement to the Source (S) anchor of the Append Fields tool (it has to be a dataset made of one row; otherwise, each row of the T-anchor will be duplicated for each row of the S-anchor)
3. Use a formula tool to make the replacement
4. Use the select tool to keep only valuable fields
The ending dataset should appear as shown below.
We have seen three different techniques on how to impute value for missing data in Alteryx.
1. Imputation tool to replace nulls with average, median, mode, or user-specified value in numeric fields
2. Formula tool to replace nulls with a specific value or row calculation in any fields
3. Append Fields technique to replace nulls with vertical calculation or information coming from a different source
Check out: How to Parse fields using RegEx in Alteryx