Page tree
Skip to end of metadata
Go to start of metadata

In Datameer grouping functions are split into two distinct types of functions:

Group Series Functions
Aggregate Functions

Group series functions in Datameer are similar to aggregation functions - they group values that have the same key, but they do not aggregate values. The biggest difference between group series functions and aggregation functions is that group series functions usually have more than one result per key where as aggregation functions have only one result per key.
In order to use these functions you must first define groups within your workbook. This can be done using the GROUPBY (), GROUPBYBIN (), or the GROUPBYGAP () function.

Group Series Functions

Group series functions operate row-wise within a group, i.e. the function is applied to every row and therefore returns a value for every argument in the group.

The following functions in Datameer are group series functions:

Aggregate Functions

Aggregate functions combine and then operate on all the values in a group, i.e. the function returns one value for each group.

The following functions in Datameer are aggregate functions:

Due to the different nature of group series functions and aggregate functions, they cannot be used in the same workbook sheet.

Real World Examples of Group Series Functions

Here we show a few examples of the power of using group series functions and how they can be used in your analyses:

Click stream analysis with Session ID

Group series functions can be used for click stream analysis. To do this you must implement the following steps:

  1. Group according to the session key.
  2. Sort all values within each session by the timestamp.
  3. Generate click paths using the URLs.

For example, suppose we have the following input data:

Generating click stream information is easily done within one formula sheet:


The result of this sheet looks like this:

After that, you can do whatever analysis is required, e.g.

  1. Find the page where users spend most time on, by:
    1. Doing GROUPBY(#ClickStream!From) and GROUPMEDIAN(#ClickStream!TimeSpent).
    2. Sorting by TimeSpent.
  2. Find the page where a user most often leaves the application, by:
    1. Filtering all records, where #ClickStream!To == "external".
    2. Doing GROUPBY(#From) and GROUPCOUNT().
    3. Sorting by count in descending order.
  3. Find most often page transitions, by:
    1. Doing GROUPBY(#ClickStream!From), GROUPBY(#ClickStream!To) and GROUPCOUNT().
    2. Sorting by count in descending order.

Click stream analysis without Session ID

In many cases you might not have a session ID in your data so you are not able to group on a session quite so easily. But usually it's possible to extract a session from your data. For example, if you have IP addresses and timestamps in your server log, you could assume that for each IP address a new session begins when there is a gap of more than x minutes between one and the next log entry. This means that the request is coming from the same machine, but it's probably a different session. The GROUPBYGAP function helps you determine the session information.

For example, suppose we have the following input data:

Generating click streams can be done within one formula sheet:


The result of this sheet looks like this:

Market basket analysis

Another use case for group series analysis is determining which products have been bought together in one transaction in order to do recommendations. To do this analysis, you can easily use a GROUP_PAIR function that creates all pairs of a value within one group.

Suppose you have a data structure like this:

You can now do the following analysis:

	-The GROUPBY function groups all the transactions of the same value together.
	-The GROUP_PAIR function can be used after a GROUPBY function to return all pairs of values of each GROUPBY value in a list.
	-The LISTELEMENT functions are used to extract the first and second elements of the list into separate columns.   

The result sheet will looks like this:

Next, you can aggregate on pairs to see which pairs occur most often in one transaction.

  • No labels