How to Parse fields using RegEx in Alteryx

  • How-Tos FAQs
  • March 8, 2021
Get Started Transforming Your Data in Snowflake - feature img

While we are analyzing or extracting data from a database, it is often helpful to exploit  regular expressions to parse data from a specific field.  In Alteryx, there is a dedicated tool to write and use regular expressions to parse data. It is the  RegEx tool. Lear how to Parse fields using RegEx in Alteryx.

RegEx tool  is a powerful tool mainly used to  parse match  or  replace  data exploiting regular-expression syntax.

Our starting field should appear as shown below.

How to Parse fields using RegEx in Alteryx

We aim to get  three different fields  from the original one to parse the  initial date , the  name  in the middle, and  the VAT number  at the end.

RegEx tool

The  RegEx tool  is one of the most powerful data parsing and text recognition tools.

In Alteryx, it is located in the  Parse tab  of the  Tool Palette

  1. Connect a RegEx tool directly to your workflowHow to Parse fields using RegEx in Alteryx
  2. In the first section of the configuration, panel  select the field to parse.How to Parse fields using RegEx in Alteryx
  3. In the second section,  write the regular expressions  to parse data.How to Parse fields using RegEx in Alteryx Use () to identify the marked groups (fields we want as output) and flag the ‘Case insensitive’ option
  4. In the third section, select the Parse  methodHow to Parse fields using RegEx in Alteryx

You can also set names, data types, and sizes for any output column.

The syntax used for the marked groups:

  • First one – Birth Date

(\d{2}\-\d{2}\-\d{4})

It says we want two digits, then a dash, two other digits, another dash, and four digits at the end.
Use {} after a special character to specify how many should be
Use () to define the marked group

  • Second one – Name

(\l+)

It says we want any lowercase character (‘+’ means one or more).
It should not extract correctly what we want, but we have chosen ‘Case insensitive,’ so both upper and lowercase characters are included.
Use () to define the marked group

  • Third one – VAT

(\d+)

It says we want any remaining digit (‘+’ means one or more).
Use () to define the marked group.
The ending dataset should appear as shown below

How to Parse fields using RegEx in Alteryx

As you can see, the regular expressions used have correctly parsed all the three marked groups we have set: Birth Date, Name, and VAT.

We have seen how to use regular expression syntax to parse data from a specific field in Alteryx. RegEx tool is beneficial not only to parse data but also  to check if a string matches a specific expression  (match method),  to replace a specific string in a field  (replace method),  to perform more complex data parsing  (tokenize method).

Here the most basic syntax for regular expressions are reported:

. = any single character
\w = word character
\l = lowercase character
\u = uppercase character
\d = digit
\n = new line
\s = white space character
+ = one or more
[] = any specified character
() = marked group (i.e., the information you want to extract)

And that’s how to parse fields using RegEx in Alteryx!

Continue reading

Check out: How to impute a value for missing data in Alteryx

Related Posts

banner-data-security-alteryx-blog

Can You Trust Your Data In Alteryx?

  • John Morrell
  • March 5, 2018

Announcing the Datameer + Snowflake Partnership

  • Press Release
  • January 25, 2022