Drag And Drop Data Analysis With Datameer: A Walkthrough

Towards Data Science
December 3, 2021

One of my data engineering team members was working on a new requirement from the business. And he happened to encounter a relatively new SaaS tool from an already established data player, “Datameer”.

This was originally published on Towards Data Science

The tool is fairly new and supports “drag and drop” data analysis functionality.

Anyway, let’s get started. First off, let me show you some of the sample data I am working with today. I have two tables, 1 dimension and 1 fact within my Snowflake database account.

D_Tracks:

Drag And Drop Data Analysis With Datameer — Image by author

F_Streams:

The goal today is to simply find how many times a track has been streamed every day. Technically, we just need three columns i.e., the date, the track name, and the counts. Furthermore, we want only those, which have been played more than once every day — we’re interested in the popularity of a song.

Assuming you have configured Snowflake with Datameer SaaS, let’s see how easy it is to represent/or implement a solution to the above requirement:

Step 1: Bring the datasets into Datameer SaaS tool

Step 2: Click the “New Transformation” button and select “Formula”

Drag And Drop Data Analysis — Image by author

Step 3: Convert Time_Stamp into Dates

Image by author

When prompted select “Time_Stamp” column and enter the following formula:

TO_DATE(Source.TIME_STAMP)

Step 4: Rename the new “Recipe” with “F_STREAMS_WITH_DATES”

Step 5: Click the new transformation button after “F_STREAMS_WITH_DATES” and select “Blend” this time

Please note, we are trying to perform an inner join between “F_STREAMS_WITH_DATES” and “D_TRACKS” on the “Tracks_Key” column.

In order to do so, please perform the following:

This will help us to create a new “Recipe” — “F_STREAMS_DATES_WITH_TRACKS” as in the next flow image.

Step 6: Add a new aggregate transformation now

The idea is to group the records by dates and track names. Also we will create a counts column that will help us filter out records which have been played less than two times.

For achieving the above, simply go through the following image:

It is important to note that the data looks like the following now:

Step 7: Let’s remove data where counts ≤ 1

Be sure to select the “Filter” transformation this time.

Very simply, add a new filter to the dataset and return records only for counts greater than 1.

Step 8: Order the records based on dates

Lastly, we need to select the “SQL” transformation to write standard ANSI SQL on our recipes and return results ordered by Dates in ascending order.

Our flow currently looks like the below:

This is done as below:

select * from F_STREAMS_DATES_TRACKS_AGG_G1 AS FG1
ORDER BY FG1.TIME_STAMP;

The final data set of 32 records looks as below:

Now if we wanted to write the entire SQL code, we could always do the following from within the SQL input option in the same tool:

SELECT TO_DATE(FS.Time_Stamp) as Stream_Date, DT.Track_Name, COUNT(FS.Streams_Key)
FROM EDW.FACTS.F_STREAMS as FS
INNER JOIN EDW.DIMS.D_Tracks DT ON FS.Tracks_Key = DT.Tracks_Key
GROUP BY TO_DATE(FS.Time_Stamp), DT.Track_Name
HAVING COUNT(FS.Streams_Key) > 1;

This concludes our discussion on Datameer — Simple transformations. Feel free to reach out to me if you have any questions.

Drag And Drop Data Analysis With Datameer: A Walkthrough

Related Posts

Navigating Data Privacy in the Age of AI: Strategies for ...

Top 5 Snowflake Tools for Analysts

Should You Learn to Code for Data Analytics? – Code...

Product

Company

Resources

Sign up for our newsletter

Follow us on