As your organization grows, so will your tables.
A full rebuild of your models might become too time-consuming for business needs. To help keep large databases running efficiently, dbt provides the incremental model materialization, that allows you to update slow models instead of fully rebuilding them.
Let’s take a look.
Basic Configuration
Basic configuration of an incremental model is easy as adding a few lines of code to a regular model.
First, update the model materialization to “incremental,” either in dbt_project.yml
or in the config block at the top of the model. Then, use a block of jinja to handle the incremental behavior.
{{
config( materialized='incremental')
}}
select ...
from ...
{% if is_incremental() %}
where event_time > (select max(event_time) from {{ this }})
{% endif %}
The is_incremental()
function returns true if the model already exists in the database the model is configured as incremental, and the --full-refresh
tag has not been passed to dbt run
.
The where
clause specifies to select only those rows that have an event_time
greater than the maximum of what it already finds in the database.
What is selected will be appended to the model that already exists in the database.
{{ this }}
is a tricky keyword that dbt uses to allow you to reference the current model without creating cycles in your DAG
.
Not Just Appending – Updating!
Optionally, incremental models allow you to define a unique key, specified in the config block of the model, which allows dbt to check replace rows that are already defined in the cached model.
{{
config(
materialized='incremental',
unique_key='date_day'
)
}}
If you need to use a composite key, dbt recommends passing the columns as a list, or defining a surrogate key using dbt utils.
When to Use an Incremental Model
dbt suggests using incremental models when source data has millions of rows or more, or when transformations are computationally expensive, such as complex regex functions, or UDFs.
Schema Changes
If the columns of an incremental model change, these materializations have default and configurable behavior. By default, the on_schema_change
parameter is set to “ignore
,” meaning that when new columns are added to a model, and only execute dbt run
, these columns won’t appear in the target table. Similarly, dropped columns won’t be automatically removed.
The on_schema_change
parameter can be configured to instead ‘fail’ and trigger an error message; to ‘append_new_columns
‘ to the existing table; and to ‘sync_all_columns
‘ and update the columns including removing newly missing columns.