Don’t test with dbt?

Piotr Sieminski · 2025

Controversy

We should NOT test everything with dbt's out-of-the-box tests without any additional configuration. Doing so does not help our data warehouse in the long run.

Rationale

We should be testing the data to improve its quality, right? Well, that's correct. But we should be doing it with intention, not for the mere sake of doing so.

It is easy to fall into the trap of adding the same simple tests everywhere, just as we fall into the trap of ingesting absolutely everything and then never using it to produce customer-centric insights for the business.

Out-of-the-box dbt tests are simply SQL queries executed against your warehouse. Imagine a dataset with billions of rows and more than 50 columns being tested for accepted_values on every daily run. Even if the model itself is incremental, the test still scans the whole table unless you restrict it, and scanning the full dataset just to retrieve the results is a costly operation.
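To make this concrete, here is roughly what an accepted_values test on a single column compiles to. This is a sketch, not the exact output of any particular dbt version; the table analytics.ga4_events, the column platform and the value list are hypothetical.

```sql
-- Approximate shape of a compiled accepted_values test.
-- Note there is no date filter: every run groups over the full table.
with all_values as (

    select
        platform as value_field,
        count(*) as n_records
    from analytics.ga4_events
    group by platform

)

-- Any row returned here is a value outside the accepted list,
-- and dbt reports the test as failed.
select *
from all_values
where value_field not in ('WEB', 'IOS', 'ANDROID')
```

Multiply a query like this by every tested column and every scheduled run, and the cost of untargeted testing becomes visible.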

Alternative

Within dbt

First and foremost, if we want to stick to dbt tests, we have to:

  1. Carefully select the models and columns we want to test, so that we do not duplicate effort or test unnecessarily.
  2. Add filters, configurations or incremental scanning. dbt lets you pass extra test configuration, such as a where clause, which is a good way to start limiting what gets tested in a model.
  3. Consider adding custom tests that run only on the increment of the model that was processed that day.
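The second and third points can be combined in a custom generic test that only looks at recently processed rows. The following is a minimal sketch under stated assumptions: the test name not_null_recent, the file path and the arguments date_column and lookback_days are hypothetical, and dateadd with current_date assumes a warehouse such as Snowflake or Redshift.

```sql
{# tests/generic/not_null_recent.sql (hypothetical file and test name) #}
{% test not_null_recent(model, column_name, date_column, lookback_days=1) %}

-- Return offending rows only; dbt fails the test when any row comes back.
-- The date filter keeps the scan limited to the most recent increment.
select *
from {{ model }}
where {{ date_column }} >= dateadd(day, -{{ lookback_days }}, current_date)
  and {{ column_name }} is null

{% endtest %}
```

A test like this is attached to a column in the model's properties file like any built-in generic test, with date_column and lookback_days passed as arguments. On a partitioned or clustered table, the date filter is what allows the warehouse to skip most of the data.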

The third point is the most interesting, so let me show you an example of how we limited the scanning costs of very large GA4 datasets with custom testing and automated record deletion.

Goal: test each increment as it is processed (new events are processed once per day with a 1-day lookback window; see here for custom lookback windows in dbt).

Steps:

  • Load the next increment.
-- setting incremental keys

{%- set lookback_window_in_days = config.get('lookback_window_in_days', default='1') -%}

-- starting table processing

with ga_4_source as (

{{ ga_incremental_source_loading('source', 'ga4', this, lookback_window_in_days) }}

)

-- ga_incremental_source_loading macro is quite complex in our use case
-- please assume we select from the source and add a lookback window
  • Delete events older than 5 days (can be adjusted to your needs).
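-- part of the model definition

-- Conditionally delete old records if this is an incremental run.
-- 'model_name', 'schema_name' and 'delete_cutoff_in_days' are placeholders:
-- pass your own model name, schema name and cutoff in days (5 in our case).
{% if is_incremental() %}
{{ ga_testing_automated_cleanup('model_name', 'schema_name', 'delete_cutoff_in_days') }}
{% endif %}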

-- macro which is called above

{% macro ga_testing_automated_cleanup(model_name, schema_name, delete_cutoff_in_days) %}

    {# Run only at execution time (not while parsing) and only in prod #}
    {% if execute and target.name == 'prod' %}

        {% set delete_statement %}
            delete from {{ schema_name }}.{{ model_name }}
            where event_date_converted < (
                select date_trunc(
                    'day',
                    dateadd(day, -{{ delete_cutoff_in_days }}::int, max(t.event_date_converted))
                )
                from {{ schema_name }}.{{ model_name }} t
            );
        {% endset %}

        {{ log("Executing: " ~ delete_statement, info=True) }}

        {% do run_query(delete_statement) %}

    {% endif %}

{% endmacro %}
  • Test the increment with 5 days of history (for anomaly detection I highly recommend the open-source elementary dbt package).

I leave this step for you to define. It can be any custom or generic test you want. In our case, we mostly tested for nulls, for anomalies (with the elementary dbt package) and for uniqueness of the primary key we created.
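As one illustration of this last step, a uniqueness check can be written as a singular test against the trimmed model. This is only a sketch: the model name ga4_events_testing, the key column event_key and the file name are hypothetical, not the names from our project.

```sql
-- tests/assert_event_key_is_unique.sql (hypothetical file name)
-- The model only holds the retained window (about 5 days of events),
-- so this group-by scans a small table instead of the full GA4 history.
select
    event_key,
    count(*) as n_records
from {{ ref('ga4_events_testing') }}
group by event_key
having count(*) > 1
```

Because the cleanup step has already removed older events, no extra date filter is needed here; the size of the table itself bounds the cost of the test.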

Outside of dbt

Monitoring tools! Yes, I know, they cost money. But hey, they are a great alternative, and everything depends on your predefined testing needs.

The tools I have used are Sifflet and Monte Carlo. They come packed with testing scenarios, but you are free to use your own custom SQL as well.

Why would you choose them over dbt testing?

Well, for starters, they are easier to maintain for less technical people. They also integrate smoothly with Slack, Teams or other communication tools you may use. They employ ML models to help you uncover anomalies in your datasets that you did not know you had. Last but not least, they are quick to deliver first insights.

Enhanced testing inside or outside of dbt?

Inside

Improving your dbt testing with a custom setup and packages suits more mature teams, with people who can clearly define the needs and implement custom settings, for example for anomaly detection tests.

This helps with customisation of your projects and gives you much more freedom, which comes at the cost of having the team spend their time on custom definitions and maintenance.

Outside

External tools are a great choice for less mature teams or teams with a lower headcount. They simplify deployment. Time-to-test from scratch is reduced to a minimum: within a few simple clicks you can have advanced ML models working for you. They also require little maintenance and offer custom views of how your tables are used, as well as other features you might want to employ in your data observability journey.

This allows for fast implementation and requires little technical proficiency to make good use of it. However, it comes at the cost of the licence, an additional tool to set up and potentially some team training.

Summary

The conclusion is clear. Do test your data, but do it with intention, not for the mere sake of testing. If you have a smaller and/or less mature team (but budget available), external testing tools are likely the option for you. If you have a larger and/or more mature team that needs everything maintained in code and a high degree of customisation, go for parameterised dbt testing enhanced with additional packages.

Contact Me

Thanks for reading. Did you find this useful but lack the time or the skill set to get your analytics engineering sorted? Check out my contact details.


Don’t test with dbt? was originally published in Lortech Solutions Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.
