Skip to main content
Uncategorized

ABC of DBT tests

By november 19, 2023maart 5th, 2024No Comments

What is a test in DBT?

Tests are assertions that you make about your models and other resources in your DBT project, such as sources, seeds and snapshots. Defining tests in your project allows you to ensure data quality. Running the commands ‘dbt test’ or ‘dbt build’ will execute the tests, informing you of whether each of your assertions is true or false of the data it refers to; that is, whether each test has passed or failed.

Like almost everything in DBT, tests are SQL queries. They are select statements that seek to grab records that disprove your assertion. For example, if you assert that a column is unique in a model, the test query selects for duplicates; if you assert that a column is never null, the test seeks after nulls. A test passes when it returns zero failing rows, in which case your assertion has been validated. 

In a word, tests ensure that your code continues to work as expected, maintains data accuracy and saves time. 

Kinds of tests:

Singular tests: 

Singular tests are .sql files that live in the ’tests’ subfolder. If, when executed, the SQL code returns rows, then the test fails. The tests will act on any models specified using the ‘ref’ function — which is what makes them ‘singular’ as opposed to generic. Using the ref function ensures that the tests will run after the models are built, when running ‘dbt build’. These tests can apply to a single column, to multiple columns or to a whole table. They can be used to test both raw data and models.

Generic tests:

Just like specific tests, generic tests are SQL code; but, unlike them, they are reusable: the same generic test can be executed on more than one model. The user simply needs to define them in a test block before referencing them in a .yml file located in any folder throughout the project. And, like their singular counterparts, they can apply to a single column, to multiple columns or to a whole table, as well as be used to test both raw data and models.

Dbt provides users with four pre-defined generic tests:

  • unique: fails if there are duplicate values in a column.
  • not_null: fails if there are any null values in a column.
  • accepted_values: takes as input a list and fails if any of the values in a column are not contained in the list.
  • relationships: takes as input two columns and fails if any of the values in one column is not contained in the other.

There are tons of DBT packages with diverse and useful pre-defined tests. It is always worth doing a quick search before going and writing your code.

Source freshness tests: 

Freshness tests only run on sources. Hence, they can only be defined in your staging .yml files. They test when was the last time that the data was updated. They act on a declared column in the underlying raw data. This column is typically called ‘lodaded_at’ and takes the form of a timestamp, indicating when the data was loaded. 

To test freshness, use the ‘dbt source freshness’ command rather than the ‘dbt tests command’. This command takes the highest timestamp on the table — executing max(loaded_at) — and then compares that watermark against your configuration in the .yml file. 

You can set up dbt source freshness to execute at the top of your builds, so that if the test fails no downstream models will be built. This prevents DBT from rerunning a bunch of models even though the data has not changed. 

Project tests:

Project tests test the project as a whole. They are defined on the project.yml file. Contrary to other tests, project tests act on your .yml files. They test whether or not you have tests or documentation defined on .yml files for specific models. Hence, they are useful to ensure that you are actually testing everything you’d like to.

To run project tests you will usually have a package installed that will include a macro specifically designed for this purpose. Then you can run the macro citing it in the ‘dbt run-operation’ command. The macro will iterate over your project and ensure that the tests exist or not as specified. They can also be set up to run as a result of a github action, such as a merge. 

Sources:

https://docs.getdbt.com/docs/build/tests

https://docs.getdbt.com/reference/commands/test

https://docs.getdbt.com/docs/deploy/source-freshness

https://docs.getdbt.com/reference/test-configs

Auteur

Leave a Reply