Automating Salesforce → dbt Models: Dynamic Metadata-Driven Modeling

The Task

Client had a simple requirement: ingest Salesforce data into dbt. We asked how many objects. They sent a list of 20+.

We knew right away we didn’t want to spend the whole day writing repetitive SQL models, copying field names, and maintaining YAML files for each one. Every time Salesforce changed a field, we’d have to fix everything manually. Not an option.

The Approach

We use Airflow as orchestrator, so we started by adding a DAG responsible for schema extraction and data sync. The goal was to fetch Salesforce metadata dynamically and push it into Snowflake for dbt to consume.

Step 1 — Fetching Salesforce Schema

Salesforce provides an API endpoint that describes object metadata:

GET /services/data/v59.0/sobjects/{OBJECT_NAME}/describe/

Docs: https://developer.salesforce.com/docs/atlas.en-us.api_rest.meta/api_rest/resources_sobject_describe.htm

Response snippet:

{
  "fields": [
    {"name": "Id", "type": "id"},
    {"name": "Name", "type": "string"},
    {"name": "OwnerId", "type": "reference"},
    {"name": "CreatedDate", "type": "datetime"}
  ]
}

In Airflow, we wrapped it as a simple Python task: every DAG run iterates through the list of objects and refreshes metadata for each of the one from predefined list. Everything is then saved into sf_metadata_fields table in Snowflake database.

Step 2— Storing Raw Objects

Each Salesforce object was also stored separately in Snowflake. We didn’t want to merge all data into a single dataset, because every object had a completely different structure and it was easier to find them in snowflake this way.

During ingestion, we flattened the raw API response and stored it in per-object tables under raw_data schema. Each record contained a single column record VARIANT, holding the entire JSON payload from Salesforce:

create or replace table raw_data.account as
select
    metadata$filename as file_name,
    metadata$file_row_number as row_number,
    to_variant($1) as record,
    current_timestamp() as _load_time
from @salesforce_stage/account/*.json.gz;

That approach allowed us to keep ingestion generic — one ingestion operator, multiple objects — without defining column mappings upfront. dbt later read metadata from sf_meta_fields and casted record fields dynamically inside generated staging models.

Step 3— Dynamic dbt Model Generation

Once schema metadata is in Snowflake, dbt macros take over. We wrote a macro that generates staging models dynamically. We start with fetching metadata from the Snowflake table using run_query() inside a helper macro:

{% macro sf_get_object_fields_with_types(object_name) %} -- salesforce object name
    {% set query %}
        select field_name, field_type
        from {{ source('salesforce', 'sf_meta_fields') }}
        where object_name = '{{ object_name }}'
    {% endset %}
    {% set results = run_query(query) %} -- run_query 
    {% if execute %} -- ensure macro only runs in "execute" mode (not during dbt compile dry-run)
        {% set fields = [] %} -- iterate through each row returned by run_query()
        {% for row in results.rows %}
            -- append each (field_name, field_type) pair to the fields list
            {% do fields.append({
                "field_name": row[0],
                "field_type": row[1]
            }) %}
        {% endfor %}
        -- return the full list of dictionaries to the parent macro
        {% do return(fields) %}
    {% endif %}
{% endmacro %}

This macro runs at compile time — run_query() executes a SQL statement against Snowflake during dbt compilation, returning the list of columns and their types from our metadata table.

That result is then consumed by the generator macro to produce a fully typed model automatically:

{% macro generate_stg_model(object_name) %}
  {% set fields = sf_get_object_fields_with_types(object_name) %}
  select
    {% for f in fields %}
      record:"{{ f.field_name }}"::{{ f.field_type }} as {{ f.field_name }}{% if not loop.last %},{% endif %}
    {% endfor %}
  from {{ source('salesforce', object_name) }}
{% endmacro %}

That’s it — dbt executes run_query() inside the macro at compile time, pulling metadata directly from Snowflake and generating a full SQL model dynamically.

Example command:

{{ generate_stg_model("Account") }}

which outputs:

select
  record:"Id"::varchar as Id,
  record:"Name"::varchar as Name,
  record:"OwnerId"::varchar as OwnerId,
  record:"CreatedDate"::timestamp_ntz as CreatedDate,
  ...
from {{ source('salesforce', 'Account') }}

Step 4— Generating YAML Automatically

We later automated schema documentation using dbt-codegen:

dbt run-operation generate_model_yaml --args '{"model_name": "stg_salesforce_account"}'

and merged field descriptions from sf_meta_fields to enrich the YAMLs with Salesforce field metadata.

Keep in mind you have to add dbt-codegen to packages.yml and use dbt deps before using this command, otherwise dbt won’t recognize the macro.

The Result

20+ models built and documented automatically
Schema drift handled automatically on next Airflow run
No manual typing, no copy-paste errors, no maintenance overhead

Lesson Learned

If your task involves modelling data from external CRMs like Salesforce, HubSpot, or Dynamics — always check for “describe” or metadata endpoints. They’re not popular, but they can turn a full day of manual work into a fully automated workflow that self-updates and self-documents.

Joachim Hodana - Software & Data Engineer

Automating Salesforce → dbt Models: Dynamic Metadata-Driven Modeling was originally published in Lortech Solutions Blog on Medium, where people are continuing the conversation by highlighting and responding to this story.