How to parse a variable in Snowflake as a source reference in dbt
- How-Tos FAQs
- March 29, 2021
There are a number of reasons why you might want to have a dynamic source in dbt. Variables are one way to achieve this. But, there are issues with using dynamic sources instead of a model, so dynamic sources aren’t best practice. Let’s explore how to parse a variable in Snowflake as a source reference in dbt.
Variables in dbt
There are two ways to use variables in dbt.
The first is by using Jinja. Jinja allows for the use of variables. The first line in the example below shows a variable being set in Jinja, then being used later in a query.
Jinja Example
{% set payment_methods = ["bank_transfer", "credit_card", "gift_card"] %}
select
order_id,
{% for payment_method in payment_methods %}
sum(case when payment_method = '{{payment_method}}' then amount end) as {{payment_method}}_amount,
{% endfor %}
sum(amount) as total_amount
from app_data.payments
group by 1
dbt itself also has a way to use variables as well. Variables in dbt can be used in two places:
- the dbt_project.yml file in a dbt project
- the command line
Sources in dbt
Sources in dbt are intended to be static sources that are physical objects in your data warehouse. A source is green in the DAG because that data is simply an input to the dbt project and is not created or modified by the dbt project. A “dynamic” source goes against this premise. A dynamic source is essentially a model and best practice would be to use a model, rather than a source.
Because each source has to be defined in a .yml file, you would have to define each possible source within that file, effectively limiting dynamic sources to automatically pick up new sources. Each new source would need to be added to a source .yml file. Sources should also have a description, which is static text in the source file, so the source might be dynamic, but it is not possible to make the description dynamic.
Sources are defined in .yml files nested under a sources: key:
version: 2
sources:
- name: jaffle_shop
tables:
- name: orders
- name: customers
- name: stripe
tables:
- name: payments
dbt compilation happens before query execution
In addition to the issues mentioned above, there can be no query execution in order to dynamically select a source, because dbt compilation happens first, without even connecting to your data warehouse. If you need to use a query to dynamically select your source, dbt will not populate your source until after the DAG has already been created, so the dynamic source will be missing from the DAG
Using a model instead of a dynamic source
Using a model instead of a dynamic source makes much more sense. If there is a limited set of tables or views that the dynamic model selects from, then those views and tables could be set up as sources, so that the DAG is complete from source to analytics-ready.
Using a dynamic view in Snowflake
Another option is to make a dynamic view in Snowflake. This view can be used as a source in dbt. This would hide that the source is dynamic from the end-user.
This can be implemented by creating a dataset that has all the data needed for the model and using a variable in dbt in a where clause as a dynamic filter, rather than a dynamic table selection.
Conclusion
It’s possible to have a dynamic source in dbt, but there are limitations and it’s not best practice. But, a model or dynamic Snowflake view can solve the same problem. Now we understand how to parse a variable in Snowflake as a source reference in dbt.
Continue reading:
Check out: How to create a Pivot Table in Alteryx