Skip to main content

As your organization grows, so will your tables.

A full rebuild of your models might become too time-consuming for business needs. To help keep large databases running efficiently, dbt provides the incremental model materialization, that allows you to update slow models instead of fully rebuilding them.

Let’s take a look.

Basic Configuration

Basic configuration of an incremental model is easy as adding a few lines of code to a regular model.

First, update the model materialization to “incremental,” either in dbt_project.yml or in the config block at the top of the model. Then, use a block of jinja to handle the incremental behavior.

{{
    config( materialized='incremental')
}}
select ...
from ...
{% if is_incremental() %}
  where event_time > (select max(event_time) from {{ this }})
{% endif %}

The is_incremental() function returns true if the model already exists in the database the model is configured as incremental, and the --full-refresh tag has not been passed to dbt run.

The where clause specifies to select only those rows that have an event_time greater than the maximum of what it already finds in the database.

What is selected will be appended to the model that already exists in the database.

{{ this }} is a tricky keyword that dbt uses to allow you to reference the current model without creating cycles in your DAG.

Not Just Appending – Updating!

Optionally, incremental models allow you to define a unique key, specified in the config block of the model, which allows dbt to check replace rows that are already defined in the cached model.

{{
    config(
        materialized='incremental',
        unique_key='date_day'
    )
}}

If you need to use a composite key, dbt recommends passing the columns as a list, or defining a surrogate key using dbt utils.

When to Use an Incremental Model

dbt suggests using incremental models when source data has millions of rows or more, or when transformations are computationally expensive, such as complex regex functions, or UDFs.

Schema Changes

If the columns of an incremental model change, these materializations have default and configurable behavior. By default, the on_schema_change parameter is set to “ignore,” meaning that when new columns are added to a model, and only execute dbt run, these columns won’t appear in the target table. Similarly, dropped columns won’t be automatically removed.

The on_schema_change parameter can be configured to instead ‘fail’ and trigger an error message; to ‘append_new_columns‘ to the existing table; and to ‘sync_all_columns‘ and update the columns including removing newly missing columns.

Auteur

  • Edoardo Draetta

    Dedicated Analytics Engineer with a strong background in physics, bringing a unique blend of analytical rigor and scientific acumen to the world of data-driven decision-making. With a passion for unraveling complex phenomena and a keen eye for patterns, my solid foundation in physics equips me with a deep understanding of quantitative analysis, statistical modeling, and computational techniques, enabling me to develop innovative solutions to intricate problems.

Edoardo Draetta

Dedicated Analytics Engineer with a strong background in physics, bringing a unique blend of analytical rigor and scientific acumen to the world of data-driven decision-making. With a passion for unraveling complex phenomena and a keen eye for patterns, my solid foundation in physics equips me with a deep understanding of quantitative analysis, statistical modeling, and computational techniques, enabling me to develop innovative solutions to intricate problems.

Leave a Reply