What is Data Modeling and How Do You Do It?

Data modeling involves defining data content and structure for a specific purpose, and it is typically done in three phases: a conceptual model, a logical model, and a physical model. The conceptual model outlines high-level data requirements, the logical model adds detail such as entities, fields, and relationships, and the physical model is the implementation of the data model in a specific data warehouse, such as Snowflake.
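
As an illustration, a conceptual model might simply state "we track customers," the logical model would spell out the customer entity's fields and keys, and the physical model becomes actual DDL. A minimal sketch of that physical step in Snowflake SQL, with all table and column names hypothetical:

```sql
-- Physical model: an illustrative customer table in Snowflake.
-- The conceptual model said "we track customers"; the logical model
-- defined the fields and keys; this DDL implements them physically.
CREATE TABLE customers (
    customer_id   NUMBER       NOT NULL,  -- surrogate key from the logical model
    customer_name VARCHAR(255) NOT NULL,
    email         VARCHAR(255),
    created_at    TIMESTAMP_NTZ,          -- timestamp without time zone
    CONSTRAINT pk_customers PRIMARY KEY (customer_id)
);
```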


Data Modeling for Analytics


The two most prominent analytics modeling approaches came from data warehouse innovators Bill Inmon and Ralph Kimball.  

The former advocates a subject-oriented approach to data modeling (data organized into normalized subject areas), while the latter, the Kimball approach, focuses on organizing data into fact tables and dimension tables in what is commonly known as a star schema.
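
To make the star schema concrete, here is a minimal sketch in SQL: a central fact table of sales measures surrounded by the dimension tables analysts slice it by. All table and column names are illustrative:

```sql
-- A minimal star schema: a central fact table plus dimension tables.
CREATE TABLE dim_customer (
    customer_key  NUMBER PRIMARY KEY,
    customer_name VARCHAR(255),
    region        VARCHAR(100)
);

CREATE TABLE dim_date (
    date_key       NUMBER PRIMARY KEY,  -- e.g., 20240131
    full_date      DATE,
    calendar_year  NUMBER,
    calendar_month NUMBER
);

CREATE TABLE fact_sales (
    customer_key NUMBER REFERENCES dim_customer (customer_key),
    date_key     NUMBER REFERENCES dim_date (date_key),
    quantity     NUMBER,
    sales_amount NUMBER(18, 2)          -- the measures analysts aggregate
);
```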

Each approach added much-needed structure to data warehouses and models, but that structure also required lengthy change management processes whenever the models evolved.

In both approaches, data is processed using an ETL model – extract, transform, and load – meaning the data is transformed before it ever reaches the warehouse.


The Cloud Data Warehouse Era


The advent and popularity of cloud data warehouses have altered how we think about data transformation and modeling. 

The newer ELT model (extract, load, and transform) extracts data from the source, loads it into the CDW as-is, and then performs the transformation inside the target system.

This approach allows organizations to take advantage of the inexpensive, scalable compute and storage services of the CDW and creates agility by separating data loading from data transformation.

In cloud data warehouses, the EL and the T are separated: the data engineer is responsible for the EL, while the T is a shared responsibility between the data engineer and the analytics community – and much of the data modeling happens in the “T.”
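
To show what the “T” looks like once data is already sitting in the warehouse, here is a minimal sketch that assumes a hypothetical raw schema populated by the load step (all schema, table, and column names are illustrative):

```sql
-- ELT: data was extracted and loaded as-is into a raw schema; the
-- transform now runs inside the warehouse on its own scalable compute.
CREATE OR REPLACE TABLE analytics.orders_clean AS
SELECT
    order_id,
    customer_id,
    CAST(order_total AS NUMBER(18, 2)) AS order_total,  -- fix types after loading
    TO_DATE(order_date)                AS order_date
FROM raw.orders
WHERE order_id IS NOT NULL;                             -- basic cleansing in the "T"
```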


Step 1: Getting the Data Into the CDW

In the modern data landscape, data comes from various sources. The fastest-growing data sources for analytics are SaaS applications and cloud services, and these sources have highly complex data structures and APIs.

The modern pre-built EL or “data loader” tools, such as Fivetran, Hevo, Stitch, and others, focus on hiding that complexity and replicating source “objects” into your cloud data warehouse in the most effective way.

Thus, the initial data model your team will work from is a set of tables in the cloud data warehouse that look like the objects in your data sources, are grouped similarly, and contain all the same fields.
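
For example, after a loader replicates a CRM source, you would typically find one table per source object, queryable with the source's own field names intact. A minimal sketch, with the schema, table, and column names all hypothetical:

```sql
-- Loader tools typically land one table per source object. A replicated
-- CRM "opportunity" object can be queried directly, source fields intact:
SELECT
    id,          -- the source system's own identifier
    account_id,
    stage_name,
    amount,
    close_date,
    _loaded_at   -- illustrative metadata column; many loaders add one to track sync time
FROM raw_crm.opportunity
LIMIT 10;
```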


Step 2: Canonical Data Modeling

Once the data is in the CDW and has gone through the first phase of data wrangling, the data engineering team can transform the raw data into canonical data models representing specific subjects.

Examples would be dimensions and facts representing user-facing reporting objects such as customers, contacts, leads, opportunities, and activities.

The primary rationale for canonical data models is to create shared, reusable components for multiple use cases.  Some of the benefits of creating this kind of semantic layer are:

  • Creating a single version of truth for each subject and each field within that subject,
  • Providing shared, standardized definitions and documentation about the data for each subject, and
  • Offering transparency into the data models and how they are built, which builds trust within the analytics community.

To build these canonical data models, the data engineering team will gather requirements from the various business and analytics teams. These will typically be supersets of the individual teams' requirements, to maximize reuse and consumption.
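
As a sketch of what such a canonical model can look like, the view below builds a shared customer subject from raw replicated tables; the raw_crm and raw_billing sources and every column name are assumptions for illustration:

```sql
-- A canonical "customer" subject: one shared, documented definition
-- that multiple downstream use cases can reuse.
CREATE OR REPLACE VIEW canonical.dim_customers AS
SELECT
    c.id                         AS customer_id,
    c.name                       AS customer_name,
    c.created_date               AS customer_since,
    b.plan_tier,                                -- enrich from a second source
    COALESCE(b.is_active, FALSE) AS is_active   -- one agreed-upon definition of "active"
FROM raw_crm.customers AS c
LEFT JOIN raw_billing.subscriptions AS b
       ON b.customer_id = c.id;
```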


Step 3: Use Case Data Modeling

The final step in data modeling is creating a dataset or datasets specific to the analytics use case.  For modern data modeling in a cloud data warehouse, this task is typically done by the data analyst.  Why?  It boils down to roles and skills:

  • Data engineers tend to know more about the data itself – where it resides, how it is structured and formatted, and how to get it – and less about how the business uses the data.  This makes their ideal role getting the data into the CDW and performing the first-pass data modeling.
  • Data analysts know less about the raw data but completely understand how the business will use the data and how it will be incorporated into analytics.  This makes use case data modeling and transformation their ideal role.

Data analysts may have varying technical skills, but they prefer to spend more time on what they are good at – analysis – and less on coding data transformations.  This is where a low-code or no-code data transformation UI becomes essential, eliminating the need for analysts to write complex SQL or Python scripts.

Use case data modeling and transformation will typically involve:

  • Data cleansing specific to the use case, such as identifying and fixing outliers or deduplicating records,
  • Data shaping and reduction, such as sorting and organizing the data, eliminating unneeded fields, and narrowing the scope to specific time periods or dimensions, and
  • Data enrichment to add new calculated fields specific to the analysis or to upload local files specific to the use case, such as external or department-specific data.

The optimal final form of the data model is a single flattened data structure: one big, wide table.  This, along with materialization, eliminates the need for expensive JOINs to run every time an analytics query is executed.
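
A minimal sketch that ties these steps together: cleanse, shape, and enrich the canonical models, then materialize the result as one wide table so the JOINs are paid for once rather than on every query. Every schema, table, and column name here is hypothetical:

```sql
-- Use case model: cleanse, shape, enrich, and materialize one wide table.
CREATE OR REPLACE TABLE analytics.sales_pipeline_wide AS
SELECT
    o.opportunity_id,
    o.stage_name,
    o.amount,
    o.close_date,
    c.customer_name,
    c.plan_tier,
    DATEDIFF('day', o.created_date, o.close_date) AS days_to_close  -- enrichment: calculated field
FROM canonical.fact_opportunities AS o
JOIN canonical.dim_customers      AS c
  ON c.customer_id = o.customer_id
WHERE o.close_date >= DATEADD('year', -2, CURRENT_DATE)             -- shaping: narrow the time scope
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY o.opportunity_id
    ORDER BY o.close_date DESC
) = 1;                                                              -- cleansing: dedupe records
```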

The Transformation SaaS Tool for Your Snowflake CDW

The Datameer SaaS data transformation solution is the industry’s first collaborative, multi-persona data transformation platform integrated into Snowflake.

 It brings together your entire team — data engineers, analytics engineers, analysts, and data scientists — on a single platform to collaboratively transform and model data directly in Snowflake, keeping data secure and using Snowflake’s scalable compute and storage.

Modern data stacks using cloud data warehouses and ELT processes have created the need for modernized data modeling within the data stack.  

Are data modeling and transformation keeping you from modernizing your analytics?  Are your data and analytics engineering costs skyrocketing? 

Then Datameer is just the solution for you!

Hope to see you soon, cheers 🥂
