The Simplest Road to a Modern Data Stack with Snowflake
- John Morrell
- September 30, 2021
The first building block of a cloud data stack is Snowflake. Your analytics engine or cloud data warehouse is always the core component around which your data stack revolves.
The shift to cloud analytics and cloud data warehouses was supposed to simplify and modernize the data stack for analytics. Yet, many cloud journeys have done quite the opposite – the data stack has gotten more complex and expensive. In the end, this drives up data engineering costs.
When building a data stack for Snowflake, customers have a myriad of options. Many new tools in new categories can fulfill a specialized role in your data stack. But this can also create a disintegrated data stack that can increase the overall complexity to create, operationalize, and manage your data pipelines.
Beyond your Snowflake analytics engine at the end of your data stack, there is a myriad of specialty tools that can be inserted into your data stack, including:
- Data ingest and loading
- Data transformation
- Data orchestration
- Data catalogs and metadata management
- Data security and governance
- Data observability
The plethora of choices in each of these categories can be overwhelming, as the data engineering ecosystem diagram (courtesy of lakeFS) below suggests. Too many tools often create a disintegrated architecture, increasing the overall complexity of creating, operationalizing, and managing your data pipelines.
In some cases, these specialty tools may work with one or two tools in an adjacent category but do not work effectively with the rest of the ecosystem. With Snowflake’s increasing presence and growth in the market, most products will work well with it. But beyond Snowflake, assembling the rest of your data stack piecemeal requires at least five additional products.
Focus on the T in your ELT data stack
If you have been using Snowflake for more than a few months, odds are you have worked out at least the basics of the extract and load part of the ELT process – getting your data into Snowflake. This might be done with Snowflake’s utilities or any of the numerous third-party “data loader” tools and services on the market.
If you are loading data from specific operational data sources such as existing databases, SaaS applications, or cloud services, you might have chosen a tool such as Fivetran, Hevo Data, Xplenty, or Stitch. These tools are very good at making it easy to replicate data from SaaS or cloud services sources via their APIs and loading the data into Snowflake.
What these tools are not very good at is the T at the end of your ELT stack. The T, or transformation part, is where this raw data loaded into Snowflake is transformed into a form that is useful for analytics and can be directly consumed by analytics and BI tools.
Data transformation is where most of the time and effort in your ELT stack is spent, and organizations often make this process very manual. The EL tools account for a small fraction of your data and analytics engineering costs; the T makes up the vast majority. If you can make the T process and workflow far more efficient and effective, you can dramatically lower your data and analytics engineering costs.
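To make the T concrete, here is a minimal sketch of a transformation step that runs as SQL inside the warehouse after the raw data has been loaded. It is an illustration under stated assumptions, not a Snowflake or Datameer implementation: an in-memory SQLite database stands in for the warehouse so the example runs anywhere, and the table names (raw_orders, customer_revenue), columns, and sample rows are all hypothetical.

```python
import sqlite3


def run_elt_demo():
    """Load raw rows as-is, then transform them with SQL inside the database."""
    # Assumption: in-memory SQLite stands in for the cloud data warehouse.
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # E and L: raw records land untouched (text types, stray spaces,
    # inconsistent casing, and a duplicate row).
    cur.execute(
        "CREATE TABLE raw_orders "
        "(order_id TEXT, customer TEXT, amount TEXT, status TEXT)"
    )
    cur.executemany(
        "INSERT INTO raw_orders VALUES (?, ?, ?, ?)",
        [
            ("1001", " Acme ", "120.50", "COMPLETE"),
            ("1002", "acme", "79.50", "complete"),
            ("1003", "Globex", "200.00", "cancelled"),
            ("1003", "Globex", "200.00", "cancelled"),  # duplicate load
            ("1004", "Globex", "50.00", "Complete"),
        ],
    )

    # T: clean, deduplicate, filter, and aggregate into an
    # analytics-ready table that a BI tool could query directly.
    cur.execute(
        """
        CREATE TABLE customer_revenue AS
        SELECT customer,
               COUNT(*) AS orders,
               ROUND(SUM(amount), 2) AS revenue
        FROM (
            SELECT DISTINCT
                   order_id,
                   LOWER(TRIM(customer)) AS customer,
                   CAST(amount AS REAL) AS amount
            FROM raw_orders
            WHERE LOWER(status) = 'complete'
        )
        GROUP BY customer
        """
    )

    rows = cur.execute(
        "SELECT customer, orders, revenue FROM customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_elt_demo():
        print(row)
```

The raw table is never modified; the transformation produces a separate, cleaned table. That separation is what lets the load step stay simple while the modeling work happens where the compute is.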
Who does the T for Snowflake?
As data platforms have become more complex, new roles and responsibilities have emerged. In the past, data transformation was purely the domain of technical ETL developers. With data lakes, a new role called the data engineer emerged. The ELT model for cloud data warehouses such as Snowflake still often involves data engineers. But it has allowed the analytics community – data analysts and scientists – to step up and take a bigger role in data transformation. In some cases, a new role has emerged – the analytics engineer.
Each of these personas often has:
- Different skill sets ranging from being highly technical to less technical,
- Different knowledge and understanding of how data is used,
- Different focal points on their role and where they can best use their time.
Data engineers tend to know more about the data itself – where it resides, how it is structured and formatted, and how to get it – and less about how the business uses the data. Analytics engineers, data analysts, and data scientists know less about the data itself but have a complete understanding of how the business would use the data and how it would be incorporated into analytics.
What are the keys to optimizing my T for Snowflake?
Datameer is a powerful SaaS data transformation platform that runs in Snowflake, your modern, scalable cloud data warehouse. Together they provide a highly scalable and flexible environment to transform your data into meaningful analytics. With Datameer, you can:
- Allow your non-technical analytics team members to work with your complex data without the need to write code using Datameer’s no-code and low-code data transformation interfaces,
- Easily combine large volumes of captured data with master and other data to create context-rich, meaningful datasets for analysis,
- Fully enrich analytics datasets to add even more flavor to your analysis using the diverse array of graphical formulas and functions,
- Generate rich documentation and add user-supplied attributes, comments, tags, and more to share searchable knowledge about your data across the entire analytics community,
- Use the catalog-like documentation features to crowd-source your data governance processes for greater data democratization and data literacy,
- Maintain full audit trails of how data is transformed and used by the community to further enable your governance and compliance processes,
- Deploy and execute data transformation models directly in Snowflake to gain the scalability you need over your large volumes of data while keeping compute and storage costs low.
With Snowflake providing highly optimized execution under the covers, you don’t need highly technical tuning of your data transformation models inside of Snowflake. This allows less technical personnel, who are unlikely to be deep database experts, to get involved with the data transformation and modeling process. As such, the keys to optimizing data transformation for Snowflake come down to four areas:
- Making effective use of virtual warehouses
- How you execute your data transformation queries
- Employing specific data transformation techniques
- Having searchable, rich data documentation
Read our complete guide – How to Optimize Your Data Transformation for Snowflake – to learn more about having highly optimized and effective data transformation.