How to parse a variable in Snowflake as a source reference in dbt

  • How-Tos FAQs
  • March 29, 2021

There are a number of reasons why you might want to have a dynamic source in dbt. Variables are one way to achieve this. But, there are issues with using dynamic sources instead of a model, so dynamic sources aren’t best practice. Let’s explore how to parse a variable in Snowflake as a source reference in dbt.

Variables in dbt

There are two ways to use variables in dbt.

The first is by using Jinja. Jinja allows for the use of variables. The first line in the example below shows a variable being set in Jinja, then being used later in a query.

Jinja Example

{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %} 

select
    order_id,
    {% for payment_method in payment_methods %}
    sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount,
    {% endfor %}
    sum(amount) as total_amount
from app_data.payments
group by 1

dbt itself also has a way to use variables as well. Variables in dbt can be used in two places:

  • the dbt_project.yml file in a dbt projectHow to parse a variable in Snowflake as a source reference in dbt
  • the command lineHow to parse a variable in Snowflake as a source reference in dbt

Sources in dbt

Sources in dbt are intended to be static sources that are physical objects in your data warehouse. A source is green in the DAG because that data is simply an input to the dbt project and is not created or modified by the dbt project. A “dynamic” source goes against this premise. A dynamic source is essentially a model and best practice would be to use a model, rather than a source.

Because each source has to be defined in a .yml file, you would have to define each possible source within that file, effectively limiting dynamic sources to automatically pick up new sources. Each new source would need to be added to a source .yml file. Sources should also have a description, which is static text in the source file, so the source might be dynamic, but it is not possible to make the description dynamic.

Sources are defined in .yml files nested under a sources: key:

version: 2

sources:

  - name: jaffle_shop
    tables:
      - name: orders
      - name: customers

  - name: stripe
    tables:
      - name: payments

dbt compilation happens before query execution

In addition to the issues mentioned above, there can be no query execution in order to dynamically select a source, because dbt compilation happens first, without even connecting to your data warehouse. If you need to use a query to dynamically select your source, dbt will not populate your source until after the DAG has already been created, so the dynamic source will be missing from the DAG

Using a model instead of a dynamic source

Using a model instead of a dynamic source makes much more sense. If there is a limited set of tables or views that the dynamic model selects from, then those views and tables could be set up as sources, so that the DAG is complete from source to analytics-ready.

Using a dynamic view in Snowflake

Another option is to make a dynamic view in Snowflake. This view can be used as a source in dbt. This would hide that the source is dynamic from the end-user.

This can be implemented by creating a dataset that has all the data needed for the model and using a variable in dbt in a where clause as a dynamic filter, rather than a dynamic table selection.

Conclusion

It’s possible to have a dynamic source in dbt, but there are limitations and it’s not best practice. But, a model or dynamic Snowflake view can solve the same problem. Now we understand how to parse a variable in Snowflake as a source reference in dbt.

Continue reading:

Check out: How to create a Pivot Table in Alteryx

Related Posts

Data Collection: A Definitive Guide

  • Jeffrey Agadumo
  • February 8, 2023