Skip to content
No results
  • SQL Pretty Printer
  • General SQL Parser Tutorial
  • Data Lineage Tool
We use cookies to ensure that we give you the best experience on our website.
SQL Pretty Printer
SQL and DATA

SQLFlow Data Lineage Tool

  • SQL Pretty Printer
  • General SQL Parser Tutorial
  • Data Lineage Tool
SQL Pretty Printer
SQL and DATA

OpenMetadata + MSSQL Stored Procedures: Why Your Lineage Is Silently Empty — and How to Fix It

OpenMetadata issues #16737, #25299, and #17586 report zero lineage from MSSQL stored procedures. We analyze three failure patterns — BEGIN/END blocks, temp table chains, and square bracket identifiers — with real SQL from the community, and show how gsp-openmetadata-sidecar recovers full column-level lineage.
  • JamesJames
  • April 21, 2026
  • gsp, SQL language, SQLFlow

Author: James Date: 2026-04-21 Categories: gsp, SQLFlow, OpenMetadata Keywords: column-lineage, data-governance, openmetadata, mssql, stored-procedure, sql-server, gsp-openmetadata-sidecar, temp-tables, merge

—

OpenMetadata issues #16737, #25299, and #17586 all report the same frustrating problem: MSSQL stored procedures produce zero lineage in OpenMetadata. No tables. No columns. The lineage graph just stops at the procedure boundary.

These issues have been open since June 2024 — over two years — with users confirming the same behavior across OpenMetadata 1.4.x through 1.11.x. We analyzed the real SQL from these issues, identified three distinct failure patterns, and built a sidecar tool that solves all of them. Here’s what’s happening and how to fix it.

Why OpenMetadata Can’t Parse Stored Procedures

OpenMetadata extracts lineage by reading SQL query logs from your database, then passing each statement through a three-parser chain: sqlglot → sqlfluff → sqlparse (via the collate-sqllineage library). If all three parsers fail, the statement is silently skipped — no error in the UI, just empty lineage.

The problem is that MSSQL stored procedures aren’t plain SQL statements. They’re wrapped in T-SQL procedural syntax that none of the three parsers can handle:

  • CREATE PROCEDURE ... AS BEGIN ... END — the procedure declaration wrapper
  • DECLARE — variable declarations
  • #tempTable — temporary tables used as intermediate staging
  • MERGE ... WHEN MATCHED ... WHEN NOT MATCHED — upsert patterns
  • [dbo].[table_name] — square bracket identifiers

These are standard patterns in every MSSQL environment — not edge cases. If your data warehouse uses stored procedures for ETL (and most MSSQL shops do), OpenMetadata is missing your lineage.

Three Failure Patterns, All Solved

The issues in #16737, #25299, and #17586 document three distinct failure modes. Our gsp-openmetadata-sidecar tool handles all of them:

PatternIssuesWhat HappensWith GSP
Pattern A — BEGIN/END blocks#16737, #25299, #17586CREATE PROCEDURE ... BEGIN ... END breaks all three parsers. Lineage silently dropped.Full lineage recovered — procedure wrapper parsed correctly
Pattern B — Temp table chains#25299source → #tempTable → target 3-hop lineage. Parser can’t follow data through temp tables.Multi-hop lineage recovered — temp tables resolved as intermediates
Pattern C — Square brackets#16424[database].[schema].[table] identifiers break regex-based bracket handlingBracket identifiers resolved — cross-database FQNs extracted correctly

—

Pattern A: CREATE PROCEDURE with BEGIN/END (Issues #16737, #25299, #17586)

This is the most common failure. Here’s the exact SQL from issue #17586 — a minimal stored procedure that OpenMetadata cannot parse:

CREATE PROCEDURE myproc
AS
BEGIN
    INSERT INTO test2 SELECT * FROM test1
END

What OpenMetadata sees: The ingestion pipeline reads this from sys.dm_exec_cached_plans. The SQL text starts with CREATE PROCEDURE — and historically, OpenMetadata’s query log reader explicitly filtered these out with:

AND lower(t.text) NOT LIKE '%%create%%procedure%%'

That filter was removed in PR #14586, but even after removal, the parser chain (sqlglot → sqlfluff → sqlparse) still cannot handle the BEGIN...END wrapper. As one user in #16737 confirmed: “commenting commands like CREATE PROCEDURE, BEGIN, END, DECLARE, AS gives correct lineage” — proving the SQL inside is valid, but the parser rejects the T-SQL wrapper.

What GSP extracts:

$ gsp-openmetadata-sidecar --sql-file om_issue_17586_procedure_filtered.sql --dry-run

Analyzing SQL (544 chars)...
Extracted 1 table-level lineage relationships
  test1 --> test2 (1 column: *)

[DRY RUN] Would emit lineage: mssql.dbo.test1 --> mssql.dbo.test2

GSP’s T-SQL parser strips the CREATE PROCEDURE ... AS BEGIN ... END wrapper, recognizes the INSERT INTO ... SELECT inside, and extracts the lineage: test1 → test2.

—

Pattern B: Temp Table Multi-Hop Chains (Issue #25299)

This is the most technically interesting failure. Here’s the exact SQL from issue #25299 — a stored procedure that stages data through a temp table:

CREATE PROCEDURE schName.procName
AS
BEGIN
    DROP TABLE IF EXISTS #tempTable

    CREATE TABLE #tempTable (columnName int)

    INSERT INTO #tempTable (columnName)
    SELECT columnName FROM schName.sourceTable

    INSERT INTO schName.targetTable (columnName)
    SELECT columnName FROM #tempTable
END

The data flow is: schName.sourceTable → #tempTable → schName.targetTable

The reporter in #25299 identified this as a 3-hop lineage problem: “Procedures using temp tables (#tempTable) in the transformation chain: source → #temp → target — only two-hop lineage works, not three-hop through temp tables.”

What GSP extracts:

$ gsp-openmetadata-sidecar --sql-file om_issue_25299_create_procedure_begin_end.sql --dry-run

Analyzing SQL (1007 chars)...
Extracted 2 table-level lineage relationships
  schName.sourceTable --> #tempTable (1 column: columnName)
  #tempTable --> schName.targetTable (1 column: columnName)

[DRY RUN] Would emit lineage:
  mssql.schname.sourcetable --> mssql.dbo.#temptable (columnName → columnName)
  mssql.dbo.#temptable --> mssql.schname.targettable (columnName → columnName)

GSP extracts the complete chain: sourceTable → #tempTable → targetTable with column-level lineage at each hop. OpenMetadata’s lineage graph would show the full data flow through the temp table intermediate.

—

Pattern C: Square Bracket Cross-Database Identifiers (Issue #16424)

MSSQL uses square brackets for identifier quoting, and cross-database queries use 3-part or 4-part names. Issue #16424 reported that views using [database].[schema].[table] syntax produce no lineage.

The root cause was a greedy regex r"\[(.*)\]" in OpenMetadata’s parser.py that matched across multiple bracket pairs, returning "db].[schema" instead of separate identifiers. Here’s a reconstructed example based on the issue description:

CREATE VIEW [ReportDB].[dbo].[vw_CustomerOrders]
AS
SELECT
    [SalesDB].[dbo].[Customers].[CustomerID],
    [SalesDB].[dbo].[Customers].[CustomerName],
    [SalesDB].[dbo].[Orders].[OrderID],
    [SalesDB].[dbo].[Orders].[OrderDate],
    [SalesDB].[dbo].[Orders].[TotalAmount]
FROM [SalesDB].[dbo].[Customers]
INNER JOIN [SalesDB].[dbo].[Orders]
    ON [SalesDB].[dbo].[Customers].[CustomerID] = [SalesDB].[dbo].[Orders].[CustomerID]
WHERE [SalesDB].[dbo].[Orders].[OrderDate] >= '2024-01-01'

What GSP extracts:

$ gsp-openmetadata-sidecar --sql-file om_issue_16424_square_brackets.sql --dry-run

Analyzing SQL (984 chars)...
Extracted 2 table-level lineage relationships
  SalesDB.dbo.Customers --> ReportDB.dbo.vw_CustomerOrders (2 columns)
  SalesDB.dbo.Orders --> ReportDB.dbo.vw_CustomerOrders (3 columns)

[DRY RUN] Would emit lineage:
  mssql.salesdb.dbo.customers --> mssql.reportdb.dbo.vw_customerorders
    CustomerID → CustomerID
    CustomerName → CustomerName
  mssql.salesdb.dbo.orders --> mssql.reportdb.dbo.vw_customerorders
    OrderID → OrderID
    OrderDate → OrderDate
    TotalAmount → TotalAmount

GSP correctly handles the square brackets, resolves cross-database references (SalesDB → ReportDB), and extracts 5 column-level lineage mappings across 2 source tables.

—

A Real-World Example: Everything Combined

Let’s look at a more realistic stored procedure that combines all three patterns — the kind of ETL logic you’d find in any production MSSQL warehouse:

CREATE PROCEDURE [dbo].[usp_UpdateCustomerOrders]
AS
BEGIN
    SET NOCOUNT ON;

    SELECT
        c.customer_id, c.customer_name,
        o.order_id, o.order_date, o.total_amount
    INTO #staged_orders
    FROM [dbo].[customers] c
    INNER JOIN [dbo].[orders] o ON c.customer_id = o.customer_id
    WHERE o.order_date >= DATEADD(day, -30, GETDATE());

    MERGE [dbo].[customer_order_summary] AS target
    USING #staged_orders AS source
    ON target.customer_id = source.customer_id
    WHEN MATCHED THEN
        UPDATE SET
            target.last_order_date = source.order_date,
            target.total_amount = source.total_amount,
            target.customer_name = source.customer_name
    WHEN NOT MATCHED THEN
        INSERT (customer_id, customer_name, last_order_date, total_amount)
        VALUES (source.customer_id, source.customer_name,
                source.order_date, source.total_amount);

    INSERT INTO [dbo].[audit_log] (action, record_count, run_date)
    SELECT 'usp_UpdateCustomerOrders', COUNT(*), GETDATE()
    FROM #staged_orders;

    DROP TABLE #staged_orders;
END

This procedure has everything OpenMetadata can’t handle: CREATE PROCEDURE, BEGIN/END, a temp table (#staged_orders), a MERGE statement, and square bracket identifiers.

What GSP extracts — 4 lineage edges, 12 column-level mappings:

$ gsp-openmetadata-sidecar --sql-file mssql_stored_procedure.sql --dry-run

Extracted 4 table-level lineage relationships:
  dbo.customers --> #staged_orders (2 columns: customer_id, customer_name)
  dbo.orders --> #staged_orders (3 columns: order_id, order_date, total_amount)
  #staged_orders --> dbo.customer_order_summary (4 columns)
  #staged_orders --> dbo.audit_log (3 columns)
Source TableTarget TableColumn Mappings
dbo.customers#staged_orderscustomer_id → customer_id, customer_name → customer_name
dbo.orders#staged_ordersorder_id → order_id, order_date → order_date, total_amount → total_amount
#staged_ordersdbo.customer_order_summarycustomer_id, customer_name, order_date → last_order_date, total_amount
#staged_ordersdbo.audit_logCOUNT(*) → record_count, GETDATE() → run_date, literal → action

The full data flow through the temp table staging layer is preserved, including the MERGE upsert pattern and the audit log insert.

—

Summary: All Patterns Solved

IssuePatternOpenMetadata TodayWith gsp-openmetadata-sidecar
#17586A — BEGIN/END wrapper0 lineage (silently skipped)1 table-level, 1 column mapping
#25299B — Temp table chain0 lineage (3-hop fails)2 table-level, 2 column mappings
#16424C — Square brackets0 lineage (regex bug)2 table-level, 5 column mappings
Combined exampleA + B + C0 lineage4 table-level, 12 column mappings

—

How It Works

The gsp-openmetadata-sidecar is a companion tool that runs alongside your OpenMetadata installation. It does not modify OpenMetadata — it uses the public REST API (PUT /api/v1/lineage) to push the lineage that OpenMetadata’s own parser misses.

OpenMetadata ingestion (unchanged)       gsp-openmetadata-sidecar
  query logs → collate-sqllineage          |
              |                            | 1. Parse SQL with Gudu SQLFlow
              v                            | 2. Resolve table FQNs via OM API
        silently skipped                   | 3. Push lineage via PUT /api/v1/lineage
        (lineage lost)                     v
                                     OpenMetadata (lineage restored)

What it does:

  1. Parses your SQL using Gudu SQLFlow — a specialized SQL parser that handles procedural T-SQL natively (including BEGIN/END, DECLARE, temp tables, MERGE, dynamic SQL, cursors, and 20+ dialects)
  2. Resolves table names to OpenMetadata entity UUIDs via GET /api/v1/tables/name/{fqn}
  3. Pushes lineage edges to OpenMetadata via PUT /api/v1/lineage with column-level detail
  4. What it supports beyond MSSQL stored procedures:

    • BigQuery procedural SQL (DECLARE, IF/THEN, CREATE TEMP TABLE)
    • Oracle PL/SQL blocks
    • Snowflake scripting
    • MERGE INTO column-level lineage (all dialects)
    • Multi-statement SQL scripts
    • Cross-database and cross-schema references
    • 20+ SQL dialects total

    —

    Try It

    # Install
    pip install gsp-openmetadata-sidecar
    
    # Test with the exact SQL from issue #25299 (dry run — no OM connection needed):
    gsp-openmetadata-sidecar \
      --sql-file examples/om_issue_25299_create_procedure_begin_end.sql \
      --dry-run
    
    # Test with your own stored procedures:
    gsp-openmetadata-sidecar \
      --sql-file your_procedure.sql \
      --db-vendor dbvmssql \
      --dry-run
    
    # Push lineage to your OpenMetadata instance:
    gsp-openmetadata-sidecar \
      --sql-file your_procedure.sql \
      --om-server http://localhost:8585/api \
      --om-token "eyJ..." \
      --service-name mssql_prod \
      --database-name YourDB \
      --schema-name dbo

    The tool supports four backend modes: anonymous (no signup, 50 calls/day), authenticated (API key, 10k/month), self-hosted Docker (unlimited, data stays in your network), and local JAR (offline, no network at all).

    Source code and documentation: github.com/gudusoftware/gsp-openmetadata-sidecar

    —

    Related OpenMetadata Issues

    • #16737 — Data Lineage Not Reflected for MSSQL Stored Procedure (open since June 2024)
    • #25299 — Stored Procedure lineage is not supported for MS SQL connector (release backlog)
    • #17586 — MS SQL Procedures Lineage Not Picked Up (partially fixed — filter removed, parser still fails)
    • #16424 — Square bracket syntax breaks lineage (fixed in sqlfluff, but proc wrapper still fails)
    • Discussion #23717 — Cross-database MSSQL lineage (unanswered)

    Also Affected by SQL Parsing Gaps?

    If you’re using DataHub and hitting similar SQL parsing failures, see our gsp-datahub-sidecar — same approach, same engine, built for DataHub’s ingestion pipeline. It addresses BigQuery procedural SQL (#11654), Power BI M-language lineage (#15327), and MSSQL stored procedures (#12606).

    —

    Disclosure: I work at Gudu Software, the company behind GSP SQL Parser, SQLFlow, and the gsp-openmetadata-sidecar. All lineage results shown were produced by running the SQL examples through GSP’s T-SQL parser and verified against the actual SQL from the referenced OpenMetadata GitHub issues. The example SQL files used in this post are available in the gsp-openmetadata-sidecar repository.

Gudu SQLFlow

Posts

  • OpenMetadata + MSSQL Stored Procedures: Why Your Lineage Is Silently Empty — and How to Fix It
    April 21, 2026
  • DataHub #15327: Why Power BI M-Language Queries Produce Zero Lineage — 4-Query Deep Analysis
    April 20, 2026
  • Why Power BI SQL Comments Break Your DataHub Lineage — and How to Fix It
    April 19, 2026
  • Why Your DataHub BigQuery Lineage Silently Breaks on Procedural SQL — and How to Fix It
    April 18, 2026
  • Analyzing Oracle PL/SQL Dependencies Before Your Snowflake Migration
    April 6, 2026


List of demos illustrate how to use general sql parser

Related Posts

DataHub #15327: Why Power BI M-Language Queries Produce Zero Lineage — 4-Query Deep Analysis

  • April 20, 2026

Why Power BI SQL Comments Break Your DataHub Lineage — and How to Fix It

  • April 19, 2026

Why Your DataHub BigQuery Lineage Silently Breaks on Procedural SQL — and How to Fix It

  • April 18, 2026

Copyright © 2026 - Gudu Software