Documenting a database structure is as vital as it is often overlooked. Too often, extensive documentation ends up as an afterthought, forcing a round of catching up after the project is finished to write out where everything is and what it does. Documentation in dbt aims to offer an alternative: documentation that is generated automatically over the entire course of the project.
Documentation in dbt: DAG and Descriptions
Generating documentation in dbt is as straightforward as executing the command dbt docs generate, but what does it actually do? First of all, it generates a browsable structure of all models built as part of the project. On top of that, it creates a visualization of the dependencies between models as a directed acyclic graph, or DAG. While models appear in the graph automatically, sources (the green blocks) need to be explicitly declared.
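To make the point about sources concrete, here is a minimal sketch of a source declaration. The source name youtube, the schema raw, and the table videos are hypothetical placeholders chosen to match the examples in this post, not names from an actual project.

```yaml
# Hypothetical source declaration; all names below are illustrative assumptions.
sources:
  - name: youtube            # logical name used when referencing the source
    schema: raw              # assumed schema where the raw table lives
    description: Raw video data loaded from the source system
    tables:
      - name: videos
        description: One row per uploaded video
```

A model that selects from {{ source('youtube', 'videos') }} instead of a hard-coded table name should then show up in the DAG with the declared source as its upstream node.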
In addition to the automatically generated structure, dbt will integrate all declared descriptions throughout the project into its documentation. You can define descriptions in the same YAML file as all other configurations, such as tests. Simply use the description field on a model or column, and the docs generate command will make sure it ends up in the right place.
models:
  - name: top_creators
    description: This table provides a creator-centric view on the metrics of their videos
    columns:
      - name: CHANNEL_ID
        description: Identification string of the creator's channel
        tests:
          - unique
Docs blocks and markup
Not all descriptions fit neatly in a single line, and some (such as for columns with a discrete number of options) might have more than one use case. For these situations dbt offers the docs block, a place to define multi-line, marked-up descriptions. YAML configuration files can refer to a docs block in much the same way as models reference each other. Let’s take the following docs block defined in a centralized description file:
{% docs like_sequence %}
Contains an array with the number of likes, from the start date to the end date of the streak.
{% enddocs %}
We can now include this description in any model or column configuration with a doc reference.
columns:
  - name: like_sequence
    description: '{{ doc("like_sequence") }}'
Descriptions are not limited to plain text either. Using Markdown image syntax, dbt can include images stored both externally and internally with:
![image title](image-url)
Note that to refer to internally stored images you must first define an asset path (the asset-paths setting) in the main project YAML, dbt_project.yml, so that dbt knows where to look for images.
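As a rough sketch of how the two pieces fit together, the example below registers an asset directory and then references an image from a description. The directory name assets, the image file creator_metrics.png, and the properties file location are assumptions made for illustration only.

```yaml
# dbt_project.yml (excerpt): tell dbt which directories hold documentation assets.
# The directory name "assets" is an assumption for this example.
asset-paths: ["assets"]

---
# models/schema.yml (excerpt; hypothetical file location and image name)
models:
  - name: top_creators
    # Markdown image syntax pointing at a file inside the declared asset path
    description: "![Creator metrics overview](assets/creator_metrics.png)"
```

Externally hosted images need no asset path; in that case the part in parentheses is simply the full image URL.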