Drag And Drop Data Analysis With Datameer: A Walkthrough
- Towards Data Science
- December 3, 2021
One of my data engineering team members was working on a new requirement from the business. And he happened to encounter a relatively new SaaS tool from an already established data player, “Datameer”.
This was originally published on Towards Data Science
The tool is fairly new and supports “drag and drop” data analysis functionality.
Anyway, let’s get started. First off, let me show you some of the sample data I am working with today. I have two tables, 1 dimension and 1 fact within my Snowflake database account.
D_Tracks:
F_Streams:
The goal today is to simply find how many times a track has been streamed every day. Technically, we just need three columns i.e., the date, the track name, and the counts. Furthermore, we want only those, which have been played more than once every day — we’re interested in the popularity of a song.
Assuming you have configured Snowflake with Datameer SaaS, let’s see how easy it is to represent/or implement a solution to the above requirement:
Step 1: Bring the datasets into Datameer SaaS tool
Step 2: Click the “New Transformation” button and select “Formula”
Step 3: Convert Time_Stamp into Dates
When prompted select “Time_Stamp” column and enter the following formula:
TO_DATE(Source.TIME_STAMP)
Step 4: Rename the new “Recipe” with “F_STREAMS_WITH_DATES”
Step 5: Click the new transformation button after “F_STREAMS_WITH_DATES” and select “Blend” this time
Please note, we are trying to perform an inner join between “F_STREAMS_WITH_DATES” and “D_TRACKS” on the “Tracks_Key” column.
In order to do so, please perform the following:
This will help us to create a new “Recipe” — “F_STREAMS_DATES_WITH_TRACKS” as in the next flow image.
Step 6: Add a new aggregate transformation now
The idea is to group the records by dates and track names. Also we will create a counts column that will help us filter out records which have been played less than two times.
For achieving the above, simply go through the following image:
It is important to note that the data looks like the following now:
Step 7: Let’s remove data where counts ≤ 1
Be sure to select the “Filter” transformation this time.
Very simply, add a new filter to the dataset and return records only for counts greater than 1.
Step 8: Order the records based on dates
Lastly, we need to select the “SQL” transformation to write standard ANSI SQL on our recipes and return results ordered by Dates in ascending order.
Our flow currently looks like the below:
This is done as below:
select * from F_STREAMS_DATES_TRACKS_AGG_G1 AS FG1
ORDER BY FG1.TIME_STAMP;
The final data set of 32 records looks as below:
Now if we wanted to write the entire SQL code, we could always do the following from within the SQL input option in the same tool:
SELECT TO_DATE(FS.Time_Stamp) as Stream_Date, DT.Track_Name, COUNT(FS.Streams_Key)
FROM EDW.FACTS.F_STREAMS as FS
INNER JOIN EDW.DIMS.D_Tracks DT ON FS.Tracks_Key = DT.Tracks_Key
GROUP BY TO_DATE(FS.Time_Stamp), DT.Track_Name
HAVING COUNT(FS.Streams_Key) > 1;
This concludes our discussion on Datameer — Simple transformations. Feel free to reach out to me if you have any questions.