How to connect DBT to multiple databases in the same project
- How-Tos FAQs
- July 15, 2021
Let’s look at how you can connect DBT to multiple databases in the same project. While this isn’t a native function to DBT, there are some workarounds to help make this connection possible.
DBT can establish connections to several databases within a single project using Trino.io. There’s an available GitHub example project that provides a complete working configuration that you can reproduce and customize to your needs.
What does Trino do?
Trino, previously PrestoSQL, is a powerful and fast distributed SQL query engine for big data analytics that combines and analyzes data from various sources.
Trino connects to multiple data sources (connectors) via a single connection and processes SQL queries at scale and speed. This tutorial will show you how to:
- Initiate a Trino server.
- Link a Trino instance to a Google BigQuery data set and a localhost instance of PostgreSQL.
- Join two databases using a DBT project: one in BigQuery and the other in an on-premises PostgreSQL instance.
To begin, you have to make a few installations first.
According to the technical guide as illustrated by Trino here, you need to follow these steps to set up a Trino server.
- Set your config and property files in the /etc directory.
- Connect your data sources to your catalogs and define them.
To complete this project, please use the conf, properties, and catalog files located in the trino_etc folder. All you need to do is paste the contents of this /trino etc folder into an existing /etc folder on your Trino server.
You must specify the data sources to which you wish to link Trino, and that’s why for this tutorial, we will connect to a dataset hosted on Google BigQuery and a PostgreSQL database hosted locally.
The postgres.properties file needs to be transferred into the etc/catalog folder you have for Trino. To specify the hostname, database, and credentials of your PostgreSQL database (Trino manual), you must perform the following:
connector.name=postgresql connection-url=jdbc:postgresql://example.net:5432/database connection-user=root connection-password=secret allow-drop-table=true
Note that you need to add the allow-drop-table=true so DBT can delete the table via Trino.
Regarding BigQuery, connect to the DBT public project known as dbt-tutorial. Copy the bigquery.properties file into the etc/catalog folder of your Trino installation, and you need to populate either the bigquery.credentials-file or the bigquery.credentials-key (Trino doc) variable with the project key for your Google Cloud account.
GitHub will handle the rest of the configuration and linking. DBT projects that link to Snowflake or BigQuery can connect to many databases.
There you have it, a way to connect DBT to multiple databases. If you are interested in a one-stop shop for your Snowflake data transformation needs, give Datameer a try with a free trial.