Macros in dbt are reusable pieces of code that ensure that the code is ‘DRY’ (don’t repeat yourself), making it easier to understand and quicker to write. If you’re interested in learning more about macros, I recommend checking out my previous blog post on Jinja.
Now let’s see what possibilities you have with macros in dbt.
Macros for DRY Code
What does it mean to be DRY? DRY stands for Don’t Repeat Yourself. In dbt, this means try to write reusable code whenever possible in order to make it more readable. But most of all, try to write reusable code to make it easy to maintain.
For instance, consider two tables with temperature information from two sensors, one inside the kitchen and the other outside. These sensors get the temperature in Kelvin but you want to have it in Celsius. Here is a possible query, where I join the two tables on the time and I convert Kelvin to Celsius:
</p>
SELECT
sensor_1.time
, (sensor_1.temp_kelvin - 273,15) as kitchen_temp_celsius
, (sensor_2.temp_kelvin - 273,15) as outside_temp_celsius
from sensor_1
join sensor_2
on sensor_1.time = sensor_2.time
<p>
As you can see I wrote the same piece of code twice, which is not that bad. But what if I add another sensor to my analyses? And what if at some point in time I want to round those results to one decimal point. I’ll have to change manually every line of code where the conversion is made.
This seems the right time to write a macro to handle the problem:
</p>
{% macro kelvin_to_celsius(column_name, round_digits = 2) %}
round({{ column_name }} - 273,15, {{ round_digits }})
{% endmacro %}
<p>
With that we can rewrite the sample code as:
</p>
SELECT
sensor_1.time
, {{ kelvin_to_celsius('sensor_1.temp_kelvin') }} as kitchen_temp_celsius
, {{ kelvin_to_celsius('sensor_1.temp_kelvin') }} as outside_temp_celsius
from sensor_1
join sensor_2
on sensor_1.time = sensor_2.time
<p>
Macros for Scheduled dbt Jobs
Now that we understood how to exploit macros to achieve DRY code, let’s see how macros can be used to execute repeated actions. In particular, macros can be used to run specific commands in your database. These command can be of any kind: DDLs, permission setting, etc.
Let’s imagine for example that we want to delete records from a table that satisfy certain conditions, and we want to do that periodically with a scheduled job. In the following example I will use again the sensor_1 table and my objective is to delete records older than 5 years.
In order to accomplish my goal, I require a custom macro that utilizes multi-line variable setting and the run_query macro to execute the command in my database:
</p>
{% macro delete_old_records(years=5) %}
{% set query %}
delete from sensor_1 where datediff(year, time, current_timestamp()) > {{ years }};
{% endset %}
{{ run_query(query) }}
{% endmacro %}
<p>
After doing that I could simply call this macro in a scheduled job. I can also set a different value for years:
</p>
dbt run_operation delete_old_records --args '{years: 3}'
<p>
As you can see, you can pass arguments to macros, whether they are positional or named, using the –args keyword and then passing a dictionary-like structure with the name of the variable and the value. Keep in mind to enclose this in single quotes and to add a space between the colon and the value!