Data modeling is the process of defining data content and structure for a specific purpose.
This often happens in three phases.
First, a conceptual model is created, which defines the high-level data requirements.
Second, the conceptual model is translated into a logical model – which adds more detail and defines a destination-independent logical structure.
Finally, the logical model is translated into a physical model and can be implemented as a data source in data warehouses such as Snowflake.
The two most prominent analytics modeling approaches came from data warehouse innovators Bill Inmon and Ralph Kimball.
The former, the Inmon approach, advocates subject-oriented data modeling (data organized into subject areas), while the latter, the Kimball approach, organizes data into fact and dimension tables in what is commonly known as a star schema.
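As a minimal sketch of the Kimball star schema (table and column names here are illustrative, not taken from any particular warehouse), a central fact table holds measures and foreign keys, and queries join it to the surrounding dimension tables. SQLite stands in for a cloud data warehouse:

```python
import sqlite3

# In-memory database standing in for a data warehouse; all names are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
CREATE TABLE fact_sales   (customer_key INTEGER, date_key INTEGER, amount REAL);

INSERT INTO dim_customer VALUES (1, 'Acme', 'EMEA'), (2, 'Globex', 'AMER');
INSERT INTO dim_date     VALUES (20240101, 2024, 1), (20240201, 2024, 2);
INSERT INTO fact_sales   VALUES (1, 20240101, 100.0), (2, 20240201, 250.0);
""")

# A typical star-schema query: join the fact table to its dimensions, then aggregate.
rows = con.execute("""
    SELECT c.region, d.year, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_key = f.customer_key
    JOIN dim_date d     ON d.date_key     = f.date_key
    GROUP BY c.region, d.year
    ORDER BY c.region
""").fetchall()
print(rows)  # [('AMER', 2024, 250.0), ('EMEA', 2024, 100.0)]
```

The fact table stays narrow and fast-growing, while the dimensions carry the descriptive attributes analysts filter and group by.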
Both approaches added structure to data warehouses and models, but that structure also required lengthy change-management processes.
In both approaches, data is processed using an ETL model – extract, transform, and load.
The advent and popularity of cloud data warehouses have altered how we think about data transformation and modeling.
The new ELT model (extract, load, and transform) extracts data from the sources, loads those extracts into the CDW, and then performs transformation in the target system.
This approach lets organizations take advantage of the inexpensive, scalable compute and storage services of the CDW and creates agility by separating data loading from data transformation.
In cloud data warehouses, the EL and the T are separated: the data engineer is responsible for the EL, while the T is a shared responsibility between the data engineer and the analytics community – with much of the data modeling happening in the “T.”
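The EL/T split described above can be sketched as follows. This is a hedged illustration, not any vendor's implementation: function and table names are hypothetical, and an in-memory SQLite database stands in for the cloud data warehouse. Raw data lands untransformed, then the modeling happens inside the target system:

```python
import sqlite3

def extract():
    # In practice this would call a source system's API; here we return sample records.
    return [(1, "A@EXAMPLE.COM"), (2, "b@example.com")]

def load(cdw, records):
    # EL step (data engineer): land the raw, untransformed data in the warehouse.
    cdw.execute("CREATE TABLE IF NOT EXISTS raw_contacts (id INTEGER, email TEXT)")
    cdw.executemany("INSERT INTO raw_contacts VALUES (?, ?)", records)

def transform(cdw):
    # T step (shared with the analytics community): model and clean in-warehouse,
    # using the target system's own compute rather than an external ETL server.
    cdw.execute("""
        CREATE TABLE contacts AS
        SELECT id, lower(email) AS email
        FROM raw_contacts
    """)

cdw = sqlite3.connect(":memory:")
load(cdw, extract())
transform(cdw)
contacts = cdw.execute("SELECT id, email FROM contacts ORDER BY id").fetchall()
print(contacts)  # [(1, 'a@example.com'), (2, 'b@example.com')]
```

The key design point is that `transform` runs as SQL inside the target system, so the warehouse's scalable compute does the heavy lifting.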
In the modern data landscape, data comes from various sources. The fastest-growing sources for analytics are SaaS applications and cloud services, which have highly complex data structures and APIs.
The modern pre-built EL or “data loader” tools, such as Fivetran, Hevo, Stitch, and more, focus on eliminating complexity within source data and replicating “objects” into your cloud data warehouse in the most effective way.
Thus, the initial data model your team will work from is a set of tables in the cloud data warehouse that look like objects from your data sources, are grouped similarly, and contain all the same fields.
Once the data is in the CDW and has gone through the first phase of data wrangling, the data engineering team can transform the raw data into canonical data models representing specific subjects.
Examples would be data dimensions and facts representing user-reporting objects such as customers, contacts, leads, opportunities, activities, etc.
The primary rationale for canonical data models is to create shared, reusable components that serve multiple use cases through a common semantic layer.
To build these canonical data models, the data engineering team will gather requirements from the various business and analytics teams. The models will typically cover a superset of those requirements to maximize reuse and consumption.
The final step in data modeling is creating a dataset or datasets specific to the analytics use case. For modern data modeling in a cloud data warehouse, this task is typically done by the data analyst. Why? It boils down to roles and skills:
Data analysts have varying technical skills, but most prefer to spend their time on what they are good at – analysis – rather than on coding data transformations. This is where a low-code or no-code data transformation UI becomes essential, eliminating the need for analysts to write complex SQL or Python scripts.
Use-case data modeling and transformation typically culminates in a single flattened data structure – one huge, wide table. This, combined with materialization, eliminates the expensive JOINs that would otherwise have to be performed each time a query runs for the analysis.
The Datameer SaaS data transformation solution is the industry’s first collaborative, multi-persona data transformation platform integrated into Snowflake.
It brings together your entire team — data engineers, analytics engineers, analysts, and data scientists — on a single platform to collaboratively transform and model data directly in Snowflake, keeping data secure and using Snowflake’s scalable compute and storage.
Modern data stacks using cloud data warehouses and ELT processes have created the need for modernized data modeling within the data stack.
Are data modeling and transformation keeping you from modernizing your analytics? Are your data and analytics engineering costs skyrocketing?
Then Datameer is just the solution for you!
Hope to see you soon, cheers 🥂