Skip to main content

Writing a custom generic test in dbt allows you to test your data on your own assumptions. For a project that we worked on, I have written a test to check a column that has integers that represent a ranking. The project included many data transformations, and thus I wanted to check that in my end table the rank values made sense (i.e. no duplicates or unexpected values).

Custom generic test

Writing a test in dbt is similar to writing a macro. Since Jinja is based on Python, it also has a familiar structure if you have written Python functions before. To make sure that dbt knows we are writing a test, we specify a test block. Start your block with {% test, followed by your test name and at least one of the parameters model or column_name.

{% test my_test(model, column_name) %}

Why do we need at least one of these parameters? It is to point dbt in the right direction. This makes sense when you look at how to apply tests. In the .yml file the test is specified within the model or the column context. A test can either be only for one specific column, or for the whole model. For example, a test for a whole model could compare two columns with each other.

version: 2

models:
  - name: my_model
    tests:
      - my_test
    columns:
      - name: my_column
        tests:
          - my_test

Within the test block you can write your code. The way dbt test work is that it is expecting the test code to return no rows. If any rows are returned, dbt will count them and display them as errors. Always check that your test code returns what you expect it to with some mock data. It should not return any of the rows that are valid according to your assumption. After this check you can paste in your code and end the test block with {% endtest %}.

Check rank test

The logic that I used for my custom generic test in dbt is build up by three smaller checks. Firstly, I check that the number of distinct values in the rank column is equal to the last number of my rank. Secondly, I check that the minimum value of the column is equal to the start of my rank. Usually this would be one, but I could think of examples where you would want to start at a different number. For example, if I want to compare two different sets of data where the first set is the top 10 and the second set is the last 10. Lastly, I check that the maximum value is equal to the last number of my rank.

As I stated, dbt expects no values to be returned if my data is valid. Look at my code below. The get_metrics CTE gathers the metrics used in the logic. That CTE is then called and I filter the rows such that they are not equal to their expected value.

{% test check_rank(model, column_name, min, max) %}
    WITH get_metrics AS (
        SELECT COUNT( DISTINCT {{column_name}})  AS dist_count_rank
                , MIN({{column_name}})    AS min_rank
                , MAX({{column_name}})    AS max_rank
        FROM {{ model }}
    ) 

    SELECT * 
    FROM get_metrics
    WHERE dist_count_rank != {{max}}
        AND min_rank != {{min}}
        AND max_rank != {{max}}

{% endtest %}

Note that we do not hard code any model or column name, but instead use the model and column_name parameters. This is what makes the test generic.

Generic?

The test is generic because it is not specific for only one column in one model. If I want to check the rank in any other column or model, I can use the same test. Furthermore, I gave two additional parameters to the test. In this way, I cannot only use the test for any model or column but also for any rank. The model and column name are automatically passed along by dbt, but how do we specify these two additional parameters? We do this in the .yml file underneath the test, as can be seen below.

version: 2

models:
  - name: fct_model
    columns:
      - name: rank
        tests:
          - check_rank:
              min: 1
              max: 200

Your turn!

That was not so difficult, right? Check out the documentation on writing a custom generic test in dbt, and try to write one yourself for your next project.

Auteur