Why is my Alteryx formula returning 0’s when I try to replace null values with the average?

  • How-Tos FAQs
  • August 1, 2021
Faq banner

In this article, let’s learn why Alteryx formula returning 0’s when replaced null values with average. While there are many ways of handling null values in an ETL process, replacing them with the average of a numerical column is the best option, especially when the data is normally distributed.

The TWO main ways of replacing null values with the average include;

  1. Use of Imputation Tool in the Preparation category
  2. Use of Summarize Tool in the Transform category

We will discuss these methods of replacing the null values with the average of the numerical columns, but first, let’s understand why is it important to replace null values with the mean of the remaining values?

Mean substitution is a simple and direct concept. It is normally a viable option whenever your values are missing completely at random, and the univariate mean of your values is your only target. In this case, your variables’ mean is not biased.

Let’s use different methods to replace all the  NULL () AGES  with an average of the  AGE  column in the data below.

Alteryx Formula Returning 0’s When Replaced Null Values

Imputation Tool in the Preparation Category 

This is always the simplest method of replacing specified values in one or more numeric data fields with other defined values. The overall procedure of using the Imputation Tool to replace  NULL () AGES  with the mean of the remaining values for the AGE Column includes;

Select the  AGE  field you wish to impute in the impute tool options.

imputation tool

Check the radio buttons for  NULL  and  AVERAGE  in the ‘incoming values to replace’ and ‘replace with value’ sections, respectively.

imputation tools

The default output consists of the imputed values and the original data stream. If ‘Include imputed value indicator field’ is checked, an extra column with the same prefix as the original column name and a suffix of _Indicator appears next to each field specified for imputation.

Alteryx Formula Returning 0’s When Replaced Null Values with Average

While this is the simplest and direct method of replacing the null values, it may be extremely slow when trying to substitute a large number of values.

Summarize Tool in the Transform category

With the summarize Tool, first, you have to generate an average field for the AGE column.

Select the  AGE  field from the field options.

field options

Select the  AGE  field in the actions window. Click the  Add  dropdown button, and then under  Numeric  select  Average .

Null Values with Average

The result is an average of the  AGE  column.

After generating the average, use the Append Tool in the Join category to join it back into the stream.

append tool with Alteryx Formula

You can then use the Formula Tool in the preparation category with the function below to replace null ages with the calculated average.  

If [AGE] = Null() then [Avg_AGE] else [AGE] endif

The final output is the original fields with the  NULL () AGES  substituted with the average of the remaining values of the AGE column.

Replaced by Null Values with Average

While the use of Summarize Tool required more setup time, it was considerably more effective and faster in executing very big datasets. The imputation tool helped in imputing columns with few rows while solving the issue of why the Alteryx Formula Returning 0’s When Replaced Null Values with Average.

Related Posts

Top 5 Snowflake tools for Analysts- talend

Top 5 Snowflake Tools for Analysts

  • Ndz Anthony
  • February 26, 2024