
Documenting a database structure is as vital as it is often overlooked. Extensive documentation frequently ends up as an afterthought, which means catching up once the project is finished to write down where everything is and what it does. Documentation in dbt offers an alternative: documentation that is generated automatically over the entire course of the project.

Documentation in dbt: DAG and Descriptions

Generating documentation in dbt is as straightforward as running the command dbt docs generate, but what does it actually do? First of all, it generates a browsable overview of all models built as part of the project. On top of that, it visualizes the dependencies between models as a directed acyclic graph, or DAG. Models appear in the graph automatically, but sources (the green blocks) must be declared explicitly.
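
As an illustration of what declaring a source explicitly can look like, here is a minimal sketch of a source definition in a YAML configuration file. The source name youtube, the schema raw_youtube and the table videos are hypothetical and only serve as an example; they do not come from the project described here.

```yaml
version: 2

sources:
  - name: youtube                # hypothetical source name
    description: Raw data as loaded from the video platform export
    schema: raw_youtube          # hypothetical schema holding the raw tables
    tables:
      - name: videos             # hypothetical raw table
        description: One row per uploaded video
```

A model can then select from this table with {{ source('youtube', 'videos') }} instead of a hard-coded table name, which is what lets dbt draw the source as a node in the DAG.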

Documentation in dbt: The DAG

In addition to the automatically generated structure, dbt integrates all descriptions declared throughout the project into its documentation. You can define descriptions in the same YAML file that holds all other configuration, such as tests. Simply use the description field on a model or column, and the docs generate command will make sure it ends up in the right place.

models:
  - name: top_creators
    description: This table provides a creator-centric view on the metrics of their videos
    columns:
      - name: CHANNEL_ID
        description: Identification string of the creator's channel
        tests:
          - unique
Generated descriptions: the generated documentation, with model and column descriptions pulled from the YAML configuration.

Docs blocks and markup

Not all descriptions fit neatly on a single line, and some (such as those for columns with a discrete set of values) may be needed in more than one place. For these situations dbt offers the docs block, a place to define multi-line, marked-up descriptions. YAML configuration files can refer to a docs block in much the same way that models refer to each other. Let’s take the following docs block, defined in a centralized Markdown description file:

{% docs like_sequence %}

Contains an array with the number of likes, from the startdate to the enddate of the streak.

{% enddocs %}

We can now include this description in any model or column configuration with a doc reference.

    columns:
      - name: like_sequence
        description: '{{ doc("like_sequence") }}'
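
Because the description now lives in one place, the same docs block can be referenced from several models. The sketch below assumes a second model, video_streaks, that also exposes a like_sequence column; that model name is hypothetical and used purely for illustration.

```yaml
models:
  - name: top_creators
    columns:
      - name: like_sequence
        description: '{{ doc("like_sequence") }}'

  - name: video_streaks          # hypothetical second model
    columns:
      - name: like_sequence
        # Same docs block, so both columns share one description
        description: '{{ doc("like_sequence") }}'
```

Editing the docs block once then updates the description of both columns the next time the documentation is generated.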

Descriptions are not limited to plain text either. Using Markdown image syntax, a description can include images stored either externally or within the project:

![image title](image-url)

Note that to refer to images stored within the project, you must first define an asset path (the asset-paths setting) in the main project YAML file, dbt_project.yml, so that dbt knows where to look for them.
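
A minimal sketch of such an asset path declaration is shown below. The directory name assets is an assumption for this example; any directory inside the project could be listed instead.

```yaml
# dbt_project.yml (fragment)
# Directories listed here are copied along with the generated docs,
# so images inside them can be referenced from descriptions.
asset-paths: ["assets"]          # "assets" is a hypothetical directory name
```

With this in place, a description could reference a file in that directory, for example ![lineage overview](assets/lineage.png), where lineage.png is again a hypothetical file name.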

Author

  • Chris Verweij

    Molecular biologist turned analytics engineer, taking both her knowledge and frustrations about data management in a laboratory environment to the traineeship. She lives with two black cats and a thousand books in Wageningen.
