Why is my Alteryx formula returning 0’s when I try to replace null values with the average?
- How-Tos FAQs
- August 1, 2021
In this article, let’s learn why Alteryx formula returning 0’s when replaced null values with average. While there are many ways of handling null values in an ETL process, replacing them with the average of a numerical column is the best option, especially when the data is normally distributed.
The TWO main ways of replacing null values with the average include;
- Use of Imputation Tool in the Preparation category
- Use of Summarize Tool in the Transform category
We will discuss these methods of replacing the null values with the average of the numerical columns, but first, let’s understand why is it important to replace null values with the mean of the remaining values?
Mean substitution is a simple and direct concept. It is normally a viable option whenever your values are missing completely at random, and the univariate mean of your values is your only target. In this case, your variables’ mean is not biased.
Let’s use different methods to replace all the NULL () AGES with an average of the AGE column in the data below.
Imputation Tool in the Preparation Category
This is always the simplest method of replacing specified values in one or more numeric data fields with other defined values. The overall procedure of using the Imputation Tool to replace NULL () AGES with the mean of the remaining values for the AGE Column includes;
Select the AGE field you wish to impute in the impute tool options.
Check the radio buttons for NULL and AVERAGE in the ‘incoming values to replace’ and ‘replace with value’ sections, respectively.
The default output consists of the imputed values and the original data stream. If ‘Include imputed value indicator field’ is checked, an extra column with the same prefix as the original column name and a suffix of _Indicator appears next to each field specified for imputation.
While this is the simplest and direct method of replacing the null values, it may be extremely slow when trying to substitute a large number of values.
Summarize Tool in the Transform category
With the summarize Tool, first, you have to generate an average field for the AGE column.
Select the AGE field from the field options.
Select the AGE field in the actions window. Click the Add dropdown button, and then under Numeric select Average.
The result is an average of the AGE column.
After generating the average, use the Append Tool in the Join category to join it back into the stream.
You can then use the Formula Tool in the preparation category with the function below to replace null ages with the calculated average.
If [AGE] = Null() then [Avg_AGE] else [AGE] endif
The final output is the original fields with the NULL () AGES substituted with the average of the remaining values of the AGE column.
While the use of Summarize Tool required more setup time, it was considerably more effective and faster in executing very big datasets. The imputation tool helped in imputing columns with few rows while solving the issue of why the Alteryx Formula Returning 0’s When Replaced Null Values with Average.