Once you have decided on a focus point from your data, finding outliers is a simple and repeatable process using Datameer.
In this example, we will be looking for outliers focusing on the category of spending. Download the sample data and try it yourself!
Download Sample: CreditCardData.csv
Table of Contents
1) Set a Baseline - Sheet 1 (Baseline)
First, create a new sheet in your workbook and name it Baseline.
Because we are looking for outliers by category, we will create our first GROUPBY on the VendorCategoryCode data from our source:
Finding the Average is a vital step to determining a baseline, here’s your next column:
2) Compare Averages - using a Join Sheet
We want to compare original data with our baseline, so we have to join those sheets in order to reference information from each; here’s how:
Create a Joined Sheet
Preform an inner join based on the VendorCategoryCode from the source sheet and the and the Category grouping that we created on the new Baseline sheet.
3) Define Outliers - Advanced Filtering
Though there are many ways to do this including a new sheet with mathematical functions, using advanced filtering keeps your workbooks clean and efficient. Here’s how:
Create a filter on the join page and use the Advanced Filter setting.
Full expression: #CreditCardData.AmountUSD-#Baseline.AverageSpend>2*#Baseline.SpendStandardDeviation