GROUPCOUNTDISTINCT

Syntax

GROUPCOUNTDISTINCT(<any>)

Description

Counts the distinct values of a group.

This is an aggregate function

Example

Given the following data:

Customer_NameItem
2535JonZamioculcas
2888MikeSpider Lily
2535JonDatura
2535JonDahlia
5788JonSnapdragon
2888MikeMarguerite
2535JonWindflower
2888MikeElephant's Ear
2535JonChinese Evergreen
5788JonBegonia
5788JonStarfish Plant
2535JonHare's Foot Fern
2535JonVenus Flytrap
3545MoniqueRed Flame Ivy

First create a group using GROUPBY(#RawData!Name)

Name
Jon
Mike
Monique


Then use the GROUPCOUNTDISTINCT(#RawData!Customer_), and the results are a count of each time an individual value appears in that column in relation to the GROUPBY() column.

NameCustomer_COUNTDISTINCT
Jon2
Mike1
Monique1

Known issue:

When working with the GROUPCOUNTDISTINCT function on a dataset with a very high number of records in a group, out of memory exceptions might occur.

Cause:

In order to determine the true distinct value, GROUPCOUNTDISTINCT shifts all mapped information to disk on one reducer. This can cause disk space and performance/memory issues on large datasets. This is a known limitation that Datameer X is working to resolve.

Solution:

In order to work around this issue, an intermediary sheet along with a combination of GROUPBY and GROUPCOUNT functions is suggested as described below.

Original:

Sheet 1: user_id, product 

Sheet 2: GROUPBY(Sheet1.product), GROUPCOUNTDISTINCT(Sheet2.user_id)

Workaround:

Sheet 1: user_id, produt 

Sheet 2: groupby(Sheet1.product), groupby(Sheet1.user_id) 

Sheet 3: group by(Sheet2.product), groupcount

This work around gives the same desired output of the GROUPCOUNTDISTINCT function.