Writing YAML files for dbt can be repetitive, time-consuming, and error-prone, especially when you’re defining sources, staging models, or initial column metadata. Much of it is boilerplate, and while it’s necessary, writing it by hand adds little value.
That’s where the dbt-codegen package comes in. It helps automate the tedious parts of your workflow, so you can focus on what actually matters: generating value for stakeholders, of course!
Introducing: dbt-codegen
The dbt-codegen package contains a set of useful macros designed to automatically generate dbt code, dramatically reducing manual effort. Later in this post, we’ll dive into a specific use case, but for now, just know that this package is a fantastic tool for eliminating tedious coding tasks.
For a full overview, check out the dbt-codegen documentation.
Installing and setting up dbt-codegen
Installing dbt packages works a little differently from installing traditional Python packages (e.g., using pip). Just follow these simple steps:
First, create a file called either packages.yml or dependencies.yml in your dbt project folder. There are a few differences between the two; for a fuller explanation, check out the dbt documentation on the subject. For now, let’s assume a packages.yml file will suffice. Make sure this file is in the same folder as your dbt_project.yml file.
Then, paste the following code into your packages.yml file.
packages:
  - package: dbt-labs/codegen
    version: 0.13.1
Note: If you are reading this from the future, make sure to replace 0.13.1 with the latest version.
Finally, run dbt deps in your terminal to install the package. If all goes well, it ends up in the dbt_packages folder of your dbt project (which is git-ignored by default).
Have a look inside that folder! You should find the entire codegen project there. As you can see, the ‘package’ looks very similar to your own dbt project. It even has its own dbt_project.yml file. More interestingly, the dbt-codegen package has its own packages.yml file and has installed the dbt-utils package! How’s that for package-ception?
Working with the codegen dbt package
Now that you have installed the dbt-codegen package, we can finally get some real work done. You can use the codegen package in a few ways: (1) run a macro from the terminal/command line and print the output there, (2) run a macro from the terminal/command line and write the output directly to a file, and (3) write the macro call in a file and compile it. In this post, I will demonstrate each approach using the generate_source macro.
The generate_source macro generates the YAML for a specific source. For example, if your data warehouse has a schema where raw data is ingested, you can simply pass the schema name and codegen will generate the source definition for all the tables in it.
Option 1: Generate code in the command line
To run a macro from the codegen package in your terminal, you use the following syntax:
dbt run-operation generate_source --args 'schema_name: <schema_name>'
Note: If you get this scary-looking error: Error: Invalid value for '--args': String ''schema_name:' is not valid YAML
Just swap the quotation marks around the argument (single quotes for double quotes, or the other way around) and this should fix it!
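If the quoting keeps tripping you up, one way to sidestep it is to not go through a shell at all. The snippet below is a minimal Python sketch, not something that ships with dbt-codegen: the helper names build_generate_source_command and run_generate_source are hypothetical. It serializes the arguments with json.dumps (the same dict style used for --args further down in this post) and hands dbt a list of arguments, so no shell ever gets to reinterpret the quotes. It assumes dbt is on your PATH and that you run it from your dbt project folder.

```python
import json
import subprocess


def build_generate_source_command(schema_name, generate_columns=False):
    """Return the dbt command as a list of arguments (hypothetical helper)."""
    args = {"schema_name": schema_name}
    if generate_columns:
        args["generate_columns"] = True
    # json.dumps produces the dict-style string that --args accepts.
    return ["dbt", "run-operation", "generate_source", "--args", json.dumps(args)]


def run_generate_source(schema_name, generate_columns=False):
    """Run the command without a shell and return what dbt printed to stdout."""
    command = build_generate_source_command(schema_name, generate_columns)
    # Passing a list (and no shell=True) means the quotes inside the JSON
    # string reach dbt untouched.
    # Assumption: dbt is on PATH and the working directory is the dbt project.
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout
```

Calling run_generate_source("raw", generate_columns=True) should then behave like the terminal command, just without the quoting headaches.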
Example: Let’s say you have a schema called RAW with three tables. Running the code above will generate the following code in your terminal:
version: 2
sources:
  - name: raw
    tables:
      - name: employees
      - name: online_orders
      - name: store_sales
Alternatively, you can pass lots of different arguments. For this, you have to pass a dict instead of the single key-value pair above. See the dbt-codegen documentation for the full list of arguments you can use, but for now we want to include the column names. Running the code below will include them:
dbt run-operation generate_source --args '{"schema_name": "<schema_name>", "generate_columns": true}'
Have a look at part of the output (I’m only displaying one table here):
version: 2
sources:
  - name: raw
    tables:
      - name: employees
        columns:
          - name: id
            data_type: number
          - name: name
            data_type: varchar
          - name: last_name
            data_type: varchar
          - name: contact_number
            data_type: varchar
          - name: date_of_birth
            data_type: date
          - name: hire_date
            data_type: date
          - name: role
            data_type: varchar
          - name: hourly_rate
            data_type: number
Option 2: Command line to file
In the method above, the YAML is printed to the command line. But what if we want to write it directly to a file, saving us lots of manual copying and pasting? You simply add the --quiet flag and redirect the output of the terminal command above to a file location. It’s that simple!
dbt --quiet run-operation generate_source --args 'schema_name: <schema_name>' > models/staging/<your_folder>/_sources.yml
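If you would rather script this step than rely on shell redirection, here is a minimal Python sketch of the same idea. It is an illustration only: save_generated_source and its runner parameter are hypothetical names, and it assumes dbt is on your PATH and that you run it from your dbt project folder.

```python
import subprocess
from pathlib import Path


def save_generated_source(schema_name, target_path, runner=subprocess.run):
    """Run generate_source quietly and write its output to target_path.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be tried out without a real dbt installation.
    """
    command = [
        "dbt", "--quiet", "run-operation", "generate_source",
        "--args", f"schema_name: {schema_name}",
    ]
    # --quiet keeps dbt's regular log lines out of the captured output.
    # check=True raises an error if dbt exits with a failure code.
    result = runner(command, capture_output=True, text=True, check=True)

    path = Path(target_path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    path.write_text(result.stdout, encoding="utf-8")
    return path
```

One small difference from the plain redirect: because of check=True, a failing dbt run raises an error before anything is written, so an existing _sources.yml is not overwritten with empty output.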
Option 3: Copy the macro in a file and compile your code
This method lets you type the macro call in a file; after compiling it, you get beautifully generated code in return.
For example, type or copy the following macro call into a .sql file in VS Code:
{{
codegen.generate_source('raw', generate_columns = true)
}}
Then simply open the compile view in the top right, and you should see your code being generated at the speed of light, saving you valuable time.
The dbt-codegen package is a small but mighty addition to your dbt toolkit. By automating repetitive tasks, it helps you move faster and focus on the fun part—building better models.
Try it in your next project and let me know what you think!