{"id":3245,"date":"2026-05-05T17:54:05","date_gmt":"2026-05-05T09:54:05","guid":{"rendered":"https:\/\/www.dpriver.com\/blog\/?p=3245"},"modified":"2026-05-05T18:50:41","modified_gmt":"2026-05-05T10:50:41","slug":"what-is-dbt-column-level-lineage","status":"publish","type":"post","link":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/","title":{"rendered":"What Is dbt Column-Level Lineage?"},"content":{"rendered":"<p><strong>Length:<\/strong> About 3,100 words \u00b7 <strong>Reading time:<\/strong> about 14\u201317 minutes<\/p>\n<h2>Short Answer<\/h2>\n<p>dbt column-level lineage maps each output column in a dbt model back to the upstream columns, expressions, filters, joins, and transformations that produced it.<\/p>\n<p>dbt already gives teams a model graph: <code>source \u2192 staging model \u2192 intermediate model \u2192 mart model<\/code>. That graph is useful, but it usually answers lineage at the model or table level. Column-level lineage answers a more precise question:<\/p>\n<blockquote>\n<p>If this dbt model outputs <code>customer_lifetime_value<\/code>, which source columns, intermediate model columns, filters, joins, and aggregations contributed to that value?<\/p>\n<\/blockquote>\n<p>This matters for impact analysis, PII tracking, metric debugging, data catalog accuracy, governance, and AI data workflows. To recover accurate dbt column-level lineage, a system usually needs more than dbt refs and sources. It needs dbt artifacts, compiled SQL, catalog context, and SQL semantic analysis.<\/p>\n<h2>Key Takeaways<\/h2>\n<ul>\n<li><strong>dbt model-level lineage<\/strong> shows how models depend on other models, sources, and exposures.<\/li>\n<li><strong>dbt column-level lineage<\/strong> shows how each output column depends on upstream columns and expressions.<\/li>\n<li>Model-level lineage can tell you that <code>fct_orders<\/code> depends on <code>stg_orders<\/code>; column-level lineage should tell you that <code>fct_orders.gross_revenue<\/code> depends on <code>stg_orders.amount<\/code>, <code>stg_orders.status<\/code>, and possibly exchange-rate or tax fields.<\/li>\n<li>Accurate dbt column lineage usually requires analyzing <strong>compiled SQL<\/strong>, not raw Jinja SQL.<\/li>\n<li>Data catalogs such as DataHub or OpenMetadata can display lineage, but complex SQL still needs a semantic lineage engine to recover field-level dependencies.<\/li>\n<li>dbt column-level lineage is useful for impact analysis, PII flow tracing, metric explanation, CI checks, and SQL governance.<\/li>\n<li>A practical sidecar approach is to read dbt artifacts, analyze compiled SQL, generate <code>column_lineage.json<\/code>, and emit lineage into DataHub, OpenMetadata, SQLFlow, or CI workflows.<\/li>\n<\/ul>\n<hr \/>\n<h2>Why dbt Column-Level Lineage Matters<\/h2>\n<p>dbt gives teams a useful model graph, but many data teams need a more precise view of how individual columns move through models, joins, filters, aggregations, and transformations.<\/p>\n<p>This matters when you need to answer questions such as:<\/p>\n<ul>\n<li>If a source column changes, which downstream dbt model columns are affected?<\/li>\n<li>Which columns contribute to a critical metric?<\/li>\n<li>Where does sensitive data flow across staging, intermediate, and mart models?<\/li>\n<li>Can a data catalog such as DataHub or OpenMetadata show field-level impact, not only model-level dependencies?<\/li>\n<\/ul>\n<p>To answer those questions, teams usually need to combine dbt artifacts, compiled SQL, catalog context, and SQL semantic analysis.<\/p>\n<hr \/>\n<h2>dbt Already Has Lineage \u2014 So What Is Missing?<\/h2>\n<p>dbt projects are built around dependencies. A typical dbt project contains sources, staging models, intermediate models, marts, tests, exposures, and documentation. dbt understands references such as:<\/p>\n<pre><code class=\"language-sql\">select *\nfrom {{ ref('stg_orders') }}\n<\/code><\/pre>\n<p>and sources such as:<\/p>\n<pre><code class=\"language-sql\">select *\nfrom {{ source('raw', 'orders') }}\n<\/code><\/pre>\n<p>From those references, dbt can build a model graph:<\/p>\n<pre><code class=\"language-text\">raw.orders\n  \u2192 stg_orders\n  \u2192 int_order_revenue\n  \u2192 fct_customer_revenue\n  \u2192 dashboard \/ exposure\n<\/code><\/pre>\n<p>That graph is valuable. It helps analytics engineers understand build order, dependency structure, and high-level impact. If <code>stg_orders<\/code> changes, the team can see which downstream models may be affected.<\/p>\n<p>But this is usually not enough for column-level questions.<\/p>\n<p>For example, suppose <code>raw.orders.discount_amount<\/code> changes type. A model graph can show that several downstream models depend on <code>raw.orders<\/code>, but it may not tell you exactly which downstream columns depend on <code>discount_amount<\/code>.<\/p>\n<p>That is the difference:<\/p>\n<ul>\n<li>Model-level lineage answers: <strong>Which dbt models depend on this model or source?<\/strong><\/li>\n<li>Column-level lineage answers: <strong>Which output columns depend on this specific input column, and how?<\/strong><\/li>\n<\/ul>\n<p>Both are useful. They solve different problems.<\/p>\n<hr \/>\n<h2>dbt Model-Level Lineage vs dbt Column-Level Lineage<\/h2>\n<table>\n<thead>\n<tr>\n<th>Question<\/th>\n<th style=\"text-align: right;\">Model-level lineage<\/th>\n<th style=\"text-align: right;\">Column-level lineage<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Which models depend on <code>stg_orders<\/code>?<\/td>\n<td style=\"text-align: right;\">Yes<\/td>\n<td style=\"text-align: right;\">Yes, indirectly<\/td>\n<\/tr>\n<tr>\n<td>Which mart columns depend on <code>orders.amount<\/code>?<\/td>\n<td style=\"text-align: right;\">No<\/td>\n<td style=\"text-align: right;\">Yes<\/td>\n<\/tr>\n<tr>\n<td>Which metrics are affected if <code>customer_email<\/code> is removed?<\/td>\n<td style=\"text-align: right;\">Usually no<\/td>\n<td style=\"text-align: right;\">Yes, if lineage is accurate<\/td>\n<\/tr>\n<tr>\n<td>Does a PII field flow into a BI-facing model?<\/td>\n<td style=\"text-align: right;\">Limited<\/td>\n<td style=\"text-align: right;\">Yes<\/td>\n<\/tr>\n<tr>\n<td>Does <code>gross_revenue<\/code> depend on a filter, join, or aggregation?<\/td>\n<td style=\"text-align: right;\">No<\/td>\n<td style=\"text-align: right;\">Yes, if influence is modeled<\/td>\n<\/tr>\n<tr>\n<td>Can a PR show which columns lost upstream lineage?<\/td>\n<td style=\"text-align: right;\">Not usually<\/td>\n<td style=\"text-align: right;\">Yes, with column-level diff support<\/td>\n<\/tr>\n<tr>\n<td>Can a catalog explain how a field was derived?<\/td>\n<td style=\"text-align: right;\">Limited<\/td>\n<td style=\"text-align: right;\">Yes<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Model-level lineage is like a map of roads between cities. Column-level lineage is like knowing which pipes, valves, and meters carry a specific flow inside each building.<\/p>\n<p>For governance, debugging, and impact analysis, teams usually need both.<\/p>\n<hr \/>\n<h2>A Simple dbt Example<\/h2>\n<p>Consider a simplified dbt model:<\/p>\n<pre><code class=\"language-sql\">-- models\/marts\/fct_customer_revenue.sql\n\nwith orders as (\n\n    select\n        order_id,\n        customer_id,\n        amount,\n        status,\n        created_at\n    from {{ ref('stg_orders') }}\n\n), customers as (\n\n    select\n        customer_id,\n        country,\n        segment\n    from {{ ref('stg_customers') }}\n\n)\n\nselect\n    c.customer_id,\n    c.country,\n    c.segment,\n    date_trunc('month', o.created_at) as revenue_month,\n    sum(o.amount) as gross_revenue,\n    count(distinct o.order_id) as order_count\nfrom orders o\njoin customers c\n    on o.customer_id = c.customer_id\nwhere o.status = 'paid'\ngroup by\n    c.customer_id,\n    c.country,\n    c.segment,\n    date_trunc('month', o.created_at)\n<\/code><\/pre>\n<p>At the model level, dbt can show:<\/p>\n<pre><code class=\"language-text\">fct_customer_revenue\n  depends on stg_orders\n  depends on stg_customers\n<\/code><\/pre>\n<p>That is correct, but incomplete.<\/p>\n<p>Column-level lineage should go further:<\/p>\n<pre><code class=\"language-text\">fct_customer_revenue.customer_id\n  &lt;- stg_customers.customer_id\n  &lt;- joined through stg_orders.customer_id = stg_customers.customer_id\n\nfct_customer_revenue.country\n  &lt;- stg_customers.country\n\nfct_customer_revenue.segment\n  &lt;- stg_customers.segment\n\nfct_customer_revenue.revenue_month\n  &lt;- stg_orders.created_at\n  &lt;- transformed by date_trunc('month', ...)\n  &lt;- filtered by stg_orders.status = 'paid'\n\nfct_customer_revenue.gross_revenue\n  &lt;- stg_orders.amount\n  &lt;- transformed by sum(...)\n  &lt;- filtered by stg_orders.status = 'paid'\n  &lt;- joined through stg_orders.customer_id = stg_customers.customer_id\n\nfct_customer_revenue.order_count\n  &lt;- stg_orders.order_id\n  &lt;- transformed by count(distinct ...)\n  &lt;- filtered by stg_orders.status = 'paid'\n<\/code><\/pre>\n<p>This output is more useful because it tells the team which columns matter, how they are transformed, and which conditions influence them.<\/p>\n<p>For a data catalog, this can improve field-level documentation. For a governance team, it can identify sensitive-field propagation. For an analytics engineer, it can explain which metrics are affected by a source change.<\/p>\n<hr \/>\n<h2>Why Compiled SQL Matters<\/h2>\n<p>A dbt model file is often not pure SQL. It may contain Jinja, macros, variables, adapter dispatch, conditional logic, and references:<\/p>\n<pre><code class=\"language-sql\">select\n    {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_id']) }} as order_key,\n    amount\nfrom {{ ref('stg_orders') }}\nwhere status = '{{ var(&quot;paid_status&quot;) }}'\n<\/code><\/pre>\n<p>A normal SQL parser should not be expected to fully understand dbt Jinja. The safer path is to let dbt do what dbt does best: compile the project.<\/p>\n<p>After <code>dbt compile<\/code> or <code>dbt build<\/code>, dbt produces artifacts such as:<\/p>\n<pre><code class=\"language-text\">target\/manifest.json\ntarget\/catalog.json\ntarget\/run_results.json\ntarget\/compiled\/...\/*.sql\n<\/code><\/pre>\n<p>The compiled SQL is closer to what the warehouse will actually execute. The manifest also contains dbt metadata such as model IDs, refs, sources, resource types, adapter information, and compiled code.<\/p>\n<p>That is why a practical dbt column-lineage workflow often looks like this:<\/p>\n<pre><code class=\"language-text\">dbt build \/ dbt compile\n  \u2193\ntarget\/manifest.json + compiled SQL\n  \u2193\nSQL semantic lineage analyzer\n  \u2193\ncolumn_lineage.json\n  \u2193\nDataHub \/ OpenMetadata \/ SQLFlow \/ CI\n<\/code><\/pre>\n<p>This approach avoids pretending that raw Jinja is ordinary SQL. It also avoids replacing dbt. dbt remains responsible for compilation and project metadata. The lineage engine analyzes the resulting SQL and maps column dependencies.<\/p>\n<hr \/>\n<h2>Why Column-Level Lineage Is Harder Than Finding Column Names<\/h2>\n<p>A basic extractor can list the columns that appear in a SQL statement. Column-level lineage requires more.<\/p>\n<p>It needs to understand:<\/p>\n<ul>\n<li>aliases;<\/li>\n<li>nested CTE scopes;<\/li>\n<li>subqueries;<\/li>\n<li>joins;<\/li>\n<li>filters;<\/li>\n<li>aggregations;<\/li>\n<li>window functions;<\/li>\n<li><code>case when<\/code> expressions;<\/li>\n<li><code>union<\/code> and set operations;<\/li>\n<li>dialect-specific syntax;<\/li>\n<li>catalog metadata;<\/li>\n<li>compiled SQL generated by macros.<\/li>\n<\/ul>\n<p>For example:<\/p>\n<pre><code class=\"language-sql\">select\n    customer_id,\n    sum(case when status = 'paid' then amount else 0 end) as paid_revenue\nfrom {{ ref('stg_orders') }}\ngroup by customer_id\n<\/code><\/pre>\n<p>The output column <code>paid_revenue<\/code> does not merely depend on <code>amount<\/code>. It is also influenced by <code>status<\/code>, because <code>status<\/code> determines which rows contribute to the sum.<\/p>\n<p>A useful lineage system should distinguish at least these kinds of relationships:<\/p>\n<table>\n<thead>\n<tr>\n<th>Relationship<\/th>\n<th>Example<\/th>\n<th>Why it matters<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>Projection<\/td>\n<td><code>email as customer_email<\/code><\/td>\n<td>Direct field dependency<\/td>\n<\/tr>\n<tr>\n<td>Transformation<\/td>\n<td><code>lower(email)<\/code><\/td>\n<td>Derived field tracking<\/td>\n<\/tr>\n<tr>\n<td>Aggregation<\/td>\n<td><code>sum(amount)<\/code><\/td>\n<td>Metric explanation<\/td>\n<\/tr>\n<tr>\n<td>Filter influence<\/td>\n<td><code>where status = 'paid'<\/code><\/td>\n<td>The output depends on row selection<\/td>\n<\/tr>\n<tr>\n<td>Join influence<\/td>\n<td><code>join customers on orders.customer_id = customers.customer_id<\/code><\/td>\n<td>Output rows depend on join keys<\/td>\n<\/tr>\n<tr>\n<td>Case condition<\/td>\n<td><code>case when risk_score &gt; 80 then ...<\/code><\/td>\n<td>Conditional logic affects derived value<\/td>\n<\/tr>\n<tr>\n<td>Window dependency<\/td>\n<td><code>row_number() over (partition by customer_id order by created_at)<\/code><\/td>\n<td>Ranking and deduplication depend on partition\/order fields<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This is why dbt column-level lineage is not just a metadata problem. It is a SQL semantics problem.<\/p>\n<hr \/>\n<h2>How dbt Column-Level Lineage Supports Impact Analysis<\/h2>\n<p>Impact analysis is one of the clearest reasons to care about dbt column-level lineage.<\/p>\n<p>Suppose a data platform team plans to rename or remove a source column:<\/p>\n<pre><code class=\"language-text\">raw.customers.email \u2192 raw.customers.email_address\n<\/code><\/pre>\n<p>Model-level lineage can show which models depend on <code>raw.customers<\/code>. But a large project may have hundreds of models downstream of that source. Not every model depends on <code>email<\/code>.<\/p>\n<p>Column-level lineage can answer more specific questions:<\/p>\n<ul>\n<li>Which dbt models use <code>raw.customers.email<\/code>?<\/li>\n<li>Which downstream columns expose or transform that field?<\/li>\n<li>Does <code>email<\/code> flow into a BI-facing mart?<\/li>\n<li>Does it flow into a hashed key, masked field, or derived segment?<\/li>\n<li>Which tests, metrics, or exposures may need review?<\/li>\n<\/ul>\n<p>This can reduce noisy reviews. Instead of telling every model owner that a source table changed, the team can identify the specific downstream columns that depend on the changed field.<\/p>\n<p>That is the difference between broad lineage and actionable lineage.<\/p>\n<hr \/>\n<h2>How dbt Column-Level Lineage Supports PII and Governance<\/h2>\n<p>Column-level lineage is also important for sensitive data governance.<\/p>\n<p>A source field may be classified as sensitive:<\/p>\n<pre><code class=\"language-text\">raw.customers.email: pii.email\nraw.customers.ssn: pii.national_id\nraw.payments.card_last4: pii.payment\n<\/code><\/pre>\n<p>The governance question is not only whether a dbt model depends on <code>raw.customers<\/code>. The real question is:<\/p>\n<blockquote>\n<p>Where do these sensitive fields flow, and are they still protected in downstream models?<\/p>\n<\/blockquote>\n<p>Column-level lineage helps answer:<\/p>\n<ul>\n<li>Does <code>email<\/code> flow into a marketing mart?<\/li>\n<li>Is <code>ssn<\/code> projected directly, hashed, masked, or excluded?<\/li>\n<li>Does a sensitive field appear only in a join or filter, or is it exposed as an output column?<\/li>\n<li>Which BI-facing models contain derived sensitive fields?<\/li>\n<li>Which role or policy should be required before a generated SQL query can access those columns?<\/li>\n<\/ul>\n<p>This is also where dbt lineage connects to SQL governance and Text-to-SQL safety. If an AI assistant generates a query against a dbt-backed semantic environment, field-level access checks need to know which columns are sensitive and where they flow.<\/p>\n<p>A model graph alone is not precise enough for that.<\/p>\n<hr \/>\n<h2>Where DataHub and OpenMetadata Fit<\/h2>\n<p>DataHub, OpenMetadata, and similar catalogs are valuable because they give teams a place to search, browse, document, and govern data assets. They can display datasets, schemas, owners, tags, glossary terms, lineage graphs, and usage metadata.<\/p>\n<p>But catalogs are only as accurate as the lineage they receive.<\/p>\n<p>For dbt projects, a catalog may ingest dbt artifacts and show model-level dependencies. It may also attempt column-level lineage extraction. The hard part is complex SQL semantics, especially when compiled SQL contains nested CTEs, macro-generated SQL, dialect-specific constructs, stored procedure patterns, or advanced warehouse syntax.<\/p>\n<p>A practical architecture is:<\/p>\n<pre><code class=\"language-text\">dbt artifacts + compiled SQL\n  \u2193\nSQL semantic lineage sidecar\n  \u2193\ncolumn_lineage.json\n  \u2193\nDataHub \/ OpenMetadata \/ SQLFlow \/ CI\n<\/code><\/pre>\n<p>In this model, the catalog remains the discovery and governance surface. The sidecar improves the quality of the SQL-derived column lineage before that lineage reaches the catalog.<\/p>\n<p>This is not about replacing DataHub or OpenMetadata. It is about giving them better column-level facts.<\/p>\n<hr \/>\n<h2>What Should a dbt Column-Lineage Output Contain?<\/h2>\n<p>A useful dbt column-lineage output should be machine-readable and honest about confidence.<\/p>\n<p>At minimum, it should include:<\/p>\n<table>\n<thead>\n<tr>\n<th>Field<\/th>\n<th>Purpose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>dbt node ID<\/td>\n<td>Connect lineage to the dbt model<\/td>\n<\/tr>\n<tr>\n<td>output column<\/td>\n<td>The column produced by the model<\/td>\n<\/tr>\n<tr>\n<td>upstream model or source<\/td>\n<td>The upstream dbt object or warehouse table<\/td>\n<\/tr>\n<tr>\n<td>upstream column<\/td>\n<td>The source field that contributes to the output<\/td>\n<\/tr>\n<tr>\n<td>transformation type<\/td>\n<td>Projection, expression, aggregation, window, case, etc.<\/td>\n<\/tr>\n<tr>\n<td>influence type<\/td>\n<td>Projection, filter, join, control, or unknown<\/td>\n<\/tr>\n<tr>\n<td>confidence<\/td>\n<td>Whether the mapping is high-confidence or partial<\/td>\n<\/tr>\n<tr>\n<td>unresolved items<\/td>\n<td>Ambiguous columns, parser errors, missing catalog metadata, unsupported syntax<\/td>\n<\/tr>\n<tr>\n<td>evidence<\/td>\n<td>SQL fragments, line numbers, parser diagnostics, or backend notes where available<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>A simplified JSON shape might look like this:<\/p>\n<pre><code class=\"language-json\">{\n  &quot;node_id&quot;: &quot;model.analytics.fct_customer_revenue&quot;,\n  &quot;output_column&quot;: &quot;gross_revenue&quot;,\n  &quot;upstream&quot;: [\n    {\n      &quot;node_id&quot;: &quot;model.analytics.stg_orders&quot;,\n      &quot;column&quot;: &quot;amount&quot;,\n      &quot;influence&quot;: &quot;projection&quot;,\n      &quot;transformation&quot;: &quot;sum&quot;,\n      &quot;confidence&quot;: &quot;high&quot;\n    },\n    {\n      &quot;node_id&quot;: &quot;model.analytics.stg_orders&quot;,\n      &quot;column&quot;: &quot;status&quot;,\n      &quot;influence&quot;: &quot;filter&quot;,\n      &quot;transformation&quot;: &quot;where status = 'paid'&quot;,\n      &quot;confidence&quot;: &quot;high&quot;\n    }\n  ],\n  &quot;unresolved&quot;: []\n}\n<\/code><\/pre>\n<p>The exact schema can vary by implementation. The important point is that column-level lineage should not be a screenshot or a vague graph only. It should produce facts that downstream systems can consume.<\/p>\n<hr \/>\n<h2>What dbt Column-Level Lineage Should Not Promise<\/h2>\n<p>It is important to set realistic expectations.<\/p>\n<p>A dbt column-lineage system should not claim that it can perfectly understand every macro, every dynamic SQL pattern, every warehouse-specific construct, and every runtime behavior automatically.<\/p>\n<p>Common limitations include:<\/p>\n<ul>\n<li>raw Jinja that has not been compiled;<\/li>\n<li>dynamic SQL generated outside dbt;<\/li>\n<li>macros whose semantics are not visible in the compiled SQL;<\/li>\n<li>missing catalog metadata;<\/li>\n<li>ambiguous unqualified column names;<\/li>\n<li>unsupported dialect features;<\/li>\n<li>runtime behavior that cannot be inferred from static SQL alone;<\/li>\n<li>partial parser failures;<\/li>\n<li>incomplete mapping through complex procedural logic.<\/li>\n<\/ul>\n<p>A trustworthy lineage system should expose these limitations instead of hiding them. It should report unresolved columns, parser diagnostics, low-confidence edges, and unsupported constructs.<\/p>\n<p>For governance and CI, an honest partial result is often better than a polished but false graph.<\/p>\n<hr \/>\n<h2>Common Use Cases<\/h2>\n<h3>1. Source column impact analysis<\/h3>\n<p>A source column is renamed, removed, or changes type. Column-level lineage identifies the downstream dbt model columns that may be affected.<\/p>\n<h3>2. PII and sensitive-field tracing<\/h3>\n<p>A sensitive field appears in a source table. Column-level lineage shows whether it is projected, transformed, hashed, joined, filtered, or exposed downstream.<\/p>\n<h3>3. Metric explanation<\/h3>\n<p>A finance or operations team asks how <code>gross_revenue<\/code>, <code>net_revenue<\/code>, or <code>active_customer_count<\/code> is calculated. Column-level lineage shows the source fields and transformations behind the metric.<\/p>\n<h3>4. Data catalog accuracy<\/h3>\n<p>A catalog displays model-level lineage, but column-level panels are empty or incomplete for complex models. A SQL semantic lineage sidecar can enrich the catalog with better field-level facts.<\/p>\n<h3>5. CI and pull-request review<\/h3>\n<p>A dbt PR changes a model. Column-level lineage can help answer whether the change removed upstream dependencies, introduced sensitive-field propagation, or affected critical downstream outputs.<\/p>\n<h3>6. AI and Text-to-SQL governance<\/h3>\n<p>An AI assistant generates SQL over governed datasets. Field-level permission checks and policy decisions need accurate column facts, including where sensitive fields flow through dbt models.<\/p>\n<hr \/>\n<h2>Quick Reference<\/h2>\n<table>\n<thead>\n<tr>\n<th>Concept<\/th>\n<th>Short definition<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>dbt model graph<\/td>\n<td>The dependency graph built from refs, sources, models, tests, and exposures<\/td>\n<\/tr>\n<tr>\n<td>Model-level lineage<\/td>\n<td>Lineage between dbt models, sources, and downstream assets<\/td>\n<\/tr>\n<tr>\n<td>Column-level lineage<\/td>\n<td>Lineage from each output column to upstream columns and expressions<\/td>\n<\/tr>\n<tr>\n<td>Compiled SQL<\/td>\n<td>SQL produced after dbt resolves Jinja, refs, sources, vars, and macros<\/td>\n<\/tr>\n<tr>\n<td>dbt artifact<\/td>\n<td>Files such as <code>manifest.json<\/code>, <code>catalog.json<\/code>, and <code>run_results.json<\/code><\/td>\n<\/tr>\n<tr>\n<td>SQL semantic analysis<\/td>\n<td>Name binding, scope resolution, expression analysis, and dialect-aware interpretation of SQL<\/td>\n<\/tr>\n<tr>\n<td>Sidecar lineage<\/td>\n<td>A separate tool that reads dbt artifacts and outputs lineage without replacing dbt<\/td>\n<\/tr>\n<tr>\n<td>DataHub \/ OpenMetadata emitter<\/td>\n<td>A connector that sends lineage facts into a catalog<\/td>\n<\/tr>\n<tr>\n<td>Column-level diff<\/td>\n<td>A CI check that compares which upstream columns were added or lost<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<hr \/>\n<h2>How This Connects to SQLFlow and GSP<\/h2>\n<p>SQLFlow and GSP are useful in this workflow because the hard part of dbt column-level lineage is SQL semantic analysis.<\/p>\n<p>A practical pattern is:<\/p>\n<pre><code class=\"language-text\">dbt project\n  \u2193\ndbt build \/ compile\n  \u2193\nmanifest.json + compiled SQL\n  \u2193\nGSP \/ SQLFlow semantic lineage engine\n  \u2193\ncolumn_lineage.json\n  \u2193\nDataHub \/ OpenMetadata \/ SQLFlow \/ CI\n<\/code><\/pre>\n<p>For dbt users, this should not feel like replacing dbt Docs, dbt Cloud Explorer, or their catalog. It should feel like a lineage enhancement layer for cases where model-level lineage is not enough.<\/p>\n<p>For data platform teams, the useful question is not \u201cwhich parser is better?\u201d The useful question is:<\/p>\n<blockquote>\n<p>Can we recover trustworthy column-level facts from the SQL our dbt project actually runs?<\/p>\n<\/blockquote>\n<p>That is the problem a SQL semantic lineage engine is designed to solve.<\/p>\n<hr \/>\n<h2>Common Questions<\/h2>\n<h3>Is dbt column-level lineage the same as dbt model lineage?<\/h3>\n<p>No. dbt model lineage shows dependencies between models, sources, and exposures. Column-level lineage shows dependencies between specific fields, such as <code>fct_orders.gross_revenue<\/code> depending on <code>stg_orders.amount<\/code> and <code>stg_orders.status<\/code>.<\/p>\n<h3>Why not analyze raw dbt model files directly?<\/h3>\n<p>Raw dbt model files often contain Jinja, macros, variables, and refs. A SQL parser is designed for SQL, not raw Jinja. The more reliable path is to analyze compiled SQL plus dbt artifacts.<\/p>\n<h3>Does dbt already provide column-level lineage?<\/h3>\n<p>dbt and related ecosystem tools can provide lineage signals, but many teams still need more precise field-level dependencies for complex SQL, macro-generated SQL, catalog integration, CI checks, and governance workflows.<\/p>\n<h3>Do DataHub or OpenMetadata solve this automatically?<\/h3>\n<p>They can display and ingest lineage, and they are valuable catalog surfaces. But complex SQL-derived column lineage still depends on the quality of the underlying SQL analysis. A sidecar can enrich those catalogs with more precise column-level facts.<\/p>\n<h3>Is column-level lineage only about selected columns?<\/h3>\n<p>No. Useful lineage may include projection dependencies, expression dependencies, filter influence, join influence, aggregation, window functions, and conditional logic. For governance, knowing that a sensitive field influenced a filter or join can matter even if it is not directly selected.<\/p>\n<h3>Can dbt column-level lineage support CI checks?<\/h3>\n<p>Yes, if the lineage output is machine-readable. A CI workflow can compare current lineage to a baseline and flag lost upstream columns, new sensitive-field flows, or unsupported SQL patterns. Per-column semantic diff is more useful than only comparing model-level edge counts.<\/p>\n<h3>How does this relate to Text-to-SQL governance?<\/h3>\n<p>Text-to-SQL systems need field-level facts to decide whether generated SQL should be allowed, denied, warned, or routed for approval. dbt column-level lineage helps explain where governed fields come from and how they flow through modeled datasets.<\/p>\n<hr \/>\n<h2>Summary<\/h2>\n<p>dbt column-level lineage is the field-level map behind a dbt project. It answers which upstream columns, expressions, filters, joins, and transformations produced each downstream model column.<\/p>\n<p>This is different from dbt model-level lineage. Model-level lineage tells you which models depend on each other. Column-level lineage tells you which fields depend on which fields.<\/p>\n<p>The practical path is to combine dbt artifacts, compiled SQL, catalog context, and SQL semantic analysis. That gives data teams a more reliable foundation for impact analysis, sensitive-field tracing, catalog enrichment, CI checks, and SQL governance.<\/p>\n<p>If your dbt project already has model lineage but still cannot explain which source columns feed critical metrics or sensitive downstream fields, the next step is to evaluate dbt column-level lineage on your compiled SQL.<\/p>\n<p><a href=\"https:\/\/www.dpriver.com\/sqlflow\/?utm_source=dpriver_blog&amp;utm_medium=blog_cta&amp;utm_campaign=dbt_column_lineage&amp;utm_content=sqlflow_lineage_demo\">Try SQLFlow&#8217;s SQL lineage demo<\/a>, <a href=\"https:\/\/www.dpriver.com\/contact\/?utm_source=dpriver_blog&amp;utm_medium=blog_cta&amp;utm_campaign=dbt_column_lineage&amp;utm_content=dbt_lineage_review\">contact DPRiver to review a representative dbt compiled SQL model<\/a>, or <a href=\"https:\/\/www.dpriver.com\/pp\/sqlformat.htm?utm_source=dpriver_blog&amp;utm_medium=blog_cta&amp;utm_campaign=llm_sql_guard&amp;utm_content=sqlguard_test\">try SQL Guard-style validation with your SQL<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>dbt column-level lineage maps each dbt model column back to the source columns, expressions, filters, joins, and transformations that produced it. Learn why compiled SQL matters and where catalogs need semantic lineage.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[172,176,93],"tags":[159,132,120,187,163,140,188,29],"blocksy_meta":{"styles_descriptor":{"styles":{"desktop":"","tablet":"","mobile":""},"google_fonts":[],"version":5}},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>What Is dbt Column-Level Lineage?<\/title>\n<meta name=\"description\" content=\"What Is dbt Column-Level Lineage?\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"What Is dbt Column-Level Lineage?\" \/>\n<meta property=\"og:description\" content=\"What Is dbt Column-Level Lineage?\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\" \/>\n<meta property=\"og:site_name\" content=\"SQL and Data Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-05T09:54:05+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-05T10:50:41+00:00\" \/>\n<meta name=\"author\" content=\"James\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"James\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"16 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\",\"name\":\"SQL and Data Blog\",\"url\":\"https:\/\/www.dpriver.com\/blog\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png\",\"contentUrl\":\"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png\",\"width\":251,\"height\":72,\"caption\":\"SQL and Data Blog\"},\"image\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#website\",\"url\":\"https:\/\/www.dpriver.com\/blog\/\",\"name\":\"SQL and Data Blog\",\"description\":\"SQL related blog for database professional\",\"publisher\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dpriver.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\",\"url\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\",\"name\":\"What Is dbt Column-Level Lineage?\",\"isPartOf\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#website\"},\"datePublished\":\"2026-05-05T09:54:05+00:00\",\"dateModified\":\"2026-05-05T10:50:41+00:00\",\"description\":\"What Is dbt Column-Level Lineage?\",\"breadcrumb\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.dpriver.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"What Is dbt Column-Level Lineage?\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\"},\"author\":{\"name\":\"James\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04\"},\"headline\":\"What Is dbt Column-Level Lineage?\",\"datePublished\":\"2026-05-05T09:54:05+00:00\",\"dateModified\":\"2026-05-05T10:50:41+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/\"},\"wordCount\":2589,\"publisher\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\"},\"keywords\":[\"column-level-lineage\",\"datahub\",\"dbt\",\"dbt-lineage\",\"field-level-lineage\",\"openmetadata\",\"sql-semantic-analysis\",\"SQLFlow\"],\"articleSection\":[\"Data Lineage\",\"General SQL Parser\",\"SQLFlow\"],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04\",\"name\":\"James\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g\",\"caption\":\"James\"},\"sameAs\":[\"http:\/\/www.dpriver.com\"],\"url\":\"https:\/\/www.dpriver.com\/blog\/author\/james\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"What Is dbt Column-Level Lineage?","description":"What Is dbt Column-Level Lineage?","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/","og_locale":"en_US","og_type":"article","og_title":"What Is dbt Column-Level Lineage?","og_description":"What Is dbt Column-Level Lineage?","og_url":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/","og_site_name":"SQL and Data Blog","article_published_time":"2026-05-05T09:54:05+00:00","article_modified_time":"2026-05-05T10:50:41+00:00","author":"James","twitter_card":"summary_large_image","twitter_misc":{"Written by":"James","Est. reading time":"16 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/www.dpriver.com\/blog\/#organization","name":"SQL and Data Blog","url":"https:\/\/www.dpriver.com\/blog\/","sameAs":[],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png","contentUrl":"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png","width":251,"height":72,"caption":"SQL and Data Blog"},"image":{"@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/www.dpriver.com\/blog\/#website","url":"https:\/\/www.dpriver.com\/blog\/","name":"SQL and Data Blog","description":"SQL related blog for database professional","publisher":{"@id":"https:\/\/www.dpriver.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dpriver.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/","url":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/","name":"What Is dbt Column-Level Lineage?","isPartOf":{"@id":"https:\/\/www.dpriver.com\/blog\/#website"},"datePublished":"2026-05-05T09:54:05+00:00","dateModified":"2026-05-05T10:50:41+00:00","description":"What Is dbt Column-Level Lineage?","breadcrumb":{"@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.dpriver.com\/blog\/"},{"@type":"ListItem","position":2,"name":"What Is dbt Column-Level Lineage?"}]},{"@type":"Article","@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/#article","isPartOf":{"@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/"},"author":{"name":"James","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04"},"headline":"What Is dbt Column-Level Lineage?","datePublished":"2026-05-05T09:54:05+00:00","dateModified":"2026-05-05T10:50:41+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dpriver.com\/blog\/what-is-dbt-column-level-lineage\/"},"wordCount":2589,"publisher":{"@id":"https:\/\/www.dpriver.com\/blog\/#organization"},"keywords":["column-level-lineage","datahub","dbt","dbt-lineage","field-level-lineage","openmetadata","sql-semantic-analysis","SQLFlow"],"articleSection":["Data Lineage","General SQL Parser","SQLFlow"],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04","name":"James","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g","caption":"James"},"sameAs":["http:\/\/www.dpriver.com"],"url":"https:\/\/www.dpriver.com\/blog\/author\/james\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3245"}],"collection":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/comments?post=3245"}],"version-history":[{"count":1,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3245\/revisions"}],"predecessor-version":[{"id":3247,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3245\/revisions\/3247"}],"wp:attachment":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/media?parent=3245"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/categories?post=3245"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/tags?post=3245"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}