You may be a data engineer or data quality engineer who uses the dynamic ETL tool dbt from Fishtown Analytics. If so, it's possible that you've incorporated a few automated schema tests into your dbt project. dbt's built-in testing framework is a powerful yet light way to validate data transformations, sources, and the data itself.
As you likely know, you can do even more with dbt and its related tools. Packages in dbt are a nice way to add useful and more complex functionality to your data transformation project with what are essentially dependencies.
In this case, we'll focus on the dbt_utils package, which is an aptly named utilities package officially made and maintained by the same folks who develop dbt. I'll show you how to add the package if you don't already know how, and then share some examples of dbt_utils automated schema tests I find useful.
Note: One great advantage of dbt_utils schema tests is that they can test at the model or column level.
4 Easy Steps to Install
Let's do this!
1. Add Packages.yml to your root dbt folder
Create a new file at the root level in your dbt project, and name it packages.yml.
2. Add dbt_utils to the packages
Get the latest package code from this official page, then copy and paste it into your packages.yml file.
Today the code is:
packages: - package: fishtown-analytics/dbt_utils version: 0.7.0
Save your packages.yml file. It should look something like this:
3. Run dbt deps locally
Open your Command Line or Terminal wherever you control your dbt project and send commands. Run the following command to install the dbt_utils package on your machine so you can use it.
You should see something like this:
If so, success! The package is installed and you can go about using your new tests.
4. Use Your New Tests
Much like dbt's built-in testing, dbt_utils automated schema tests should be placed under a column or model being tested under the "tests" configuration. That's essentially how they are integrated!
dbt_utils Schema Tests
Now you can use a whole host of new dbt automated schema tests and even more in the dbt_utils package. For now I'll show you a couple of my favorite dbt_utils schema tests.
Recency - Model Level Test
Recency is a configurable schema test that checks to see if the timestamp column referenced was changed within the timeframe. I.E. you can see if the model loads every day, hour, week, etc.
models: - name: model_name tests: - dbt_utils.recency: datepart: day field: created_at interval: 1
Equal_Rowcount - Model Level Test
I find equal_rowcount to be a fantastic test for checking to make sure nothing changed in my sources during ingest or transformation. If I expect two models to contain the same number of rows, it's an easy reference to both from the target model.
models: - name: model_name tests: - dbt_utils.equal_rowcount: compare_model: ref('other_table_name')
dbt_utils is a truly powerful package that can add a lot of functionality to your project, and it costs nothing to install. Other schema tests include validation for cardinality, unique with SQL conditions, values not accepted, row equality, row count, and even more.
Thanks for Reading!
I hope this was useful for you, and that you are excited by the prospect of adding more cool functions to your dbt project like I am! If you'd like to share more of your experience with dbt or data engineering, please feel free to comment or get in touch with me at any time. I have a dbtTestExamples repository on GitHub that is open for anyone to try their hand at dbt tests using Google BigQuery.
Top comments (0)