What are the options when it comes to handling data lineage in Snowflake?

  • How-Tos FAQs
  • June 7, 2021
datameer banner faq

While Snowflake is a great high-performance data warehouse, its main focus is providing data warehouse infrastructure-as-a-service. Snowflake has partners specializing in ETL or ELT, but they don’t provide robust tools for extracting, transforming, and loading data in a traditional sense. So how do you go about handling data lineage in Snowflake?

There are ways to build your solution in Snowflake for data lineage, using SQL queries against your information_schema tables and maybe some Python code or another visualization and interaction layer.

But, there is no native tool (yet) within Snowflake that provides an excellent out-of-the-box solution for data lineage.

Datameer

Datameer, could be the best fit for less technical data teams or for the end-user use in self-service data environments. Datameer makes data transformation easy, without code, without SQL. Datameer automates data lineage so you can always track where your data came from in an excellent graphical tool.

By building models within Datameer, as long as you accurately build them, which Datameer makes easy, you’ll always have up-to-date data lineage for the data sets that you build or create in Datameer.

Snowflake Native Solution

Suppose you have a hard requirement to deliver a data lineage solution without adding other tools to your data stack, or you want to use Snowflake. In that case, there is a way to get data lineage data out of Snowflake without using any other tools.

The Snowflake.account_usage.access_history table has all the data you need to build a data lineage graph.

You must have and use the accountadmin role to access this table in your Snowflake account, or you need a role that has usage for Snowflake’s shared database enabled. Instructions can be found in Snowflake documentation.

The above table will only show the data you need to build a lineage graph. The data would need to be pulled into another tool to visualize and allow end-users to work with it. Otherwise, it wouldn’t be that helpful. You could build a data application on top of this data in Snowpark, which would keep the solution within the Snowflake platform.

dbt (data build tool)

dbt is a relatively new option. dbt stands for “data build tool.” dbt is built in Python to let data and analytics engineers build production-grade data pipelines. Although dbt is written in Python, to use DBT in practice, you mainly only need to know SQL. The way DBT is designed, if you follow best practices, data lineage is entirely automated.

dbt also provides a nice-looking lineage graph. dbt creates automated documentation that enables the management of table and field descriptions to create rich documentation for end-users. dbt does have a reasonably steep learning curve, as it’s designed for analysts that already have advanced SQL skills. You absolutely need to know SQL to use dbt.

So for some organizations, dbt is not the best tool.

 

Related Posts

Top 5 Snowflake tools for Analysts- talend

Top 5 Snowflake Tools for Analysts

  • Ndz Anthony
  • February 26, 2024