A Review of Dbt (Data Build Tool) and the Top Missing Features
- John Morrell
- September 21, 2021
A Review of Dbt and the Top Missing Features
Dbt has become a market darling for data transformation tools – the T in your ELT data stack and processes. The simplistic approach of dbt – SQL coding – is both its biggest strength and weakness. I have for you a review of Dbt and the top missing features.
We took a detailed look at dbt examining how it works, exploring how companies use it, and its overall ability to solve the data transformation problem. We published a more comprehensive review of dbt here and compared it to our offering, Datameer.
At the very simplest level, dbt is really an Interactive Development Environment (IDE) for SQL data modeling. Just one look at the dbt UI, and you’ll see an IDE, with items arranged into projects (folders) and workspaces where you write your code. And almost every operation you perform outside of coding is done via their Command Line Interface (CLI), just like in your typical IDE. Integration with GitHub also adds to the IDE look-and-feel.
Even though dbt claims to try to deliver the task of “analytics engineering” to everyday data analysts, the need for deep SQL expertise and mild to average Python skills to use the Jinja templating language really prevents dbt from reaching this goal. The company articulates that vision for a coding-centric approach in this quote from their website:
At dbt Labs, we have developed strong opinions on how companies should practice analytics. Specifically, we believe that code, not graphical user interfaces, is the best abstraction to express complex analytic logic.
In our review of dbt, we found five specific features the tool was missing to better faciliate the data transformation process. Let’s explore these five missing features.
You Have to Code
With dbt, you have to code your data models in SQL and write your reusable components (macros) in Jinja, a Python-like templating language. The entire user experience is an IDE focused on coding and managing code.
Most data analysts will have some SQL skills and potentially some limited Python skills. Data scientists will typically be strong in both SQL and Python. But with dbt requiring strong SQL and Python skills, this will present a barrier for data analysts to use the product. This will often leave dbt to be used by data engineers who have strong SQL and Python skills.
It’s Command Line
Although dbt provides a graphical IDE, many of the operations users perform, such as compiling and running SQL, running projects, and version control, are done via a command-line interface. This is typical for an IDE and would be good for more technical data engineers who are used to using CLI tools, especially for databases and data warehouses. But in modern analytics tools such as Tableau and Looker, which analysts use, don’t typically have a CLI – most operations are performed via the UI. Using a CLI will present another barrier to adoption by data analysts.
No Data Profiling
When you open a SQL file in dbt, it only offers a simple snapshot of the data from the last time the project was run. Dbt does not maintain detailed data profiles, with users having to blindly write SQL with limited visibility into the data. A good data transformation tool maintains a deep data profile that is expressed visually to users so they can see the full shape and contents of the data as they transform it. This easily allows users to do things like identifying invalid, missing, or outlying fields and values, which is data cleansing, an essential part of data transformation.
Documentation = Metadata + Comments in Code
Dbt claims to aid the process of documenting data to spread knowledge about the data. It claims to auto-document the information about a data model. In reality, the data documentation is a wiki-like page attached to a SQL model, and the automated documentation only captures technical metadata and information in comments left in the code by the SQL developers. This documentation does not meet the goal of sharing real information about the data among users. In addition, the data documentation is not searchable.
Complex Data Enrichment
Data enrichment is the process by which data engineers or analysts add new columns to the data to transform from one format to another or to include new calculated fields as values or dimensions. This is an essential part of the data transformation process to make data more valuable for analysis.
Since all data transformation in dbt is performed via SQL coding, any data enrichment must be written in SQL functions. This dramatically complicates the process of enriching the data for analysis.
While it is a market darling at the moment, dbt has numerous gotchas that prevent it from reaching its’ stated goal of letting data analysts perform their own data transformations without the need of data engineering teams. The fact that the product embraces software development metaphors, languages, and IDEs makes it a tool that would be highly foreign to the data analyst community.
Datameer explicitly embraces both the data analyst and data engineering communities with its’ new SaaS data transformation solution. Its hybrid code/low-code-no-code user interface, catalog-like data documentation and information sharing, and easy data enrichment deliver an easy-to-use, familiar toolset for data analysts, give data engineers the control they need, and foster collaboration between the communities.
Are you interested in seeing Datameer in action? Sign up for your free trial today!