OpenMetadata + MSSQL Stored Procedures: Why Your Lineage Is Silently Empty — and How to Fix It
Author: James Date: 2026-04-21 Categories: gsp, SQLFlow, OpenMetadata Keywords: column-lineage, data-governance, openmetadata, mssql, stored-procedure, sql-server, gsp-openmetadata-sidecar, temp-tables, merge
—
OpenMetadata issues #16737, #25299, and #17586 all report the same frustrating problem: MSSQL stored procedures produce zero lineage in OpenMetadata. No tables. No columns. The lineage graph just stops at the procedure boundary.
These issues have been open since June 2024 — over two years — with users confirming the same behavior across OpenMetadata 1.4.x through 1.11.x. We analyzed the real SQL from these issues, identified three distinct failure patterns, and built a sidecar tool that solves all of them. Here’s what’s happening and how to fix it.
Why OpenMetadata Can’t Parse Stored Procedures
OpenMetadata extracts lineage by reading SQL query logs from your database, then passing each statement through a three-parser chain: sqlglot → sqlfluff → sqlparse (via the collate-sqllineage library). If all three parsers fail, the statement is silently skipped — no error in the UI, just empty lineage.
The problem is that MSSQL stored procedures aren’t plain SQL statements. They’re wrapped in T-SQL procedural syntax that none of the three parsers can handle:
CREATE PROCEDURE ... AS BEGIN ... END— the procedure declaration wrapperDECLARE— variable declarations#tempTable— temporary tables used as intermediate stagingMERGE ... WHEN MATCHED ... WHEN NOT MATCHED— upsert patterns[dbo].[table_name]— square bracket identifiers
These are standard patterns in every MSSQL environment — not edge cases. If your data warehouse uses stored procedures for ETL (and most MSSQL shops do), OpenMetadata is missing your lineage.
Three Failure Patterns, All Solved
The issues in #16737, #25299, and #17586 document three distinct failure modes. Our gsp-openmetadata-sidecar tool handles all of them:
| Pattern | Issues | What Happens | With GSP |
|---|---|---|---|
| Pattern A — BEGIN/END blocks | #16737, #25299, #17586 | CREATE PROCEDURE ... BEGIN ... END breaks all three parsers. Lineage silently dropped. | Full lineage recovered — procedure wrapper parsed correctly |
| Pattern B — Temp table chains | #25299 | source → #tempTable → target 3-hop lineage. Parser can’t follow data through temp tables. | Multi-hop lineage recovered — temp tables resolved as intermediates |
| Pattern C — Square brackets | #16424 | [database].[schema].[table] identifiers break regex-based bracket handling | Bracket identifiers resolved — cross-database FQNs extracted correctly |
—
Pattern A: CREATE PROCEDURE with BEGIN/END (Issues #16737, #25299, #17586)
This is the most common failure. Here’s the exact SQL from issue #17586 — a minimal stored procedure that OpenMetadata cannot parse:
CREATE PROCEDURE myproc
AS
BEGIN
INSERT INTO test2 SELECT * FROM test1
END
What OpenMetadata sees: The ingestion pipeline reads this from sys.dm_exec_cached_plans. The SQL text starts with CREATE PROCEDURE — and historically, OpenMetadata’s query log reader explicitly filtered these out with:
AND lower(t.text) NOT LIKE '%%create%%procedure%%'
That filter was removed in PR #14586, but even after removal, the parser chain (sqlglot → sqlfluff → sqlparse) still cannot handle the BEGIN...END wrapper. As one user in #16737 confirmed: “commenting commands like CREATE PROCEDURE, BEGIN, END, DECLARE, AS gives correct lineage” — proving the SQL inside is valid, but the parser rejects the T-SQL wrapper.
What GSP extracts:
$ gsp-openmetadata-sidecar --sql-file om_issue_17586_procedure_filtered.sql --dry-run
Analyzing SQL (544 chars)...
Extracted 1 table-level lineage relationships
test1 --> test2 (1 column: *)
[DRY RUN] Would emit lineage: mssql.dbo.test1 --> mssql.dbo.test2
GSP’s T-SQL parser strips the CREATE PROCEDURE ... AS BEGIN ... END wrapper, recognizes the INSERT INTO ... SELECT inside, and extracts the lineage: test1 → test2.
—
Pattern B: Temp Table Multi-Hop Chains (Issue #25299)
This is the most technically interesting failure. Here’s the exact SQL from issue #25299 — a stored procedure that stages data through a temp table:
CREATE PROCEDURE schName.procName
AS
BEGIN
DROP TABLE IF EXISTS #tempTable
CREATE TABLE #tempTable (columnName int)
INSERT INTO #tempTable (columnName)
SELECT columnName FROM schName.sourceTable
INSERT INTO schName.targetTable (columnName)
SELECT columnName FROM #tempTable
END
The data flow is: schName.sourceTable → #tempTable → schName.targetTable
The reporter in #25299 identified this as a 3-hop lineage problem: “Procedures using temp tables (#tempTable) in the transformation chain: source → #temp → target — only two-hop lineage works, not three-hop through temp tables.”
What GSP extracts:
$ gsp-openmetadata-sidecar --sql-file om_issue_25299_create_procedure_begin_end.sql --dry-run
Analyzing SQL (1007 chars)...
Extracted 2 table-level lineage relationships
schName.sourceTable --> #tempTable (1 column: columnName)
#tempTable --> schName.targetTable (1 column: columnName)
[DRY RUN] Would emit lineage:
mssql.schname.sourcetable --> mssql.dbo.#temptable (columnName → columnName)
mssql.dbo.#temptable --> mssql.schname.targettable (columnName → columnName)
GSP extracts the complete chain: sourceTable → #tempTable → targetTable with column-level lineage at each hop. OpenMetadata’s lineage graph would show the full data flow through the temp table intermediate.
—
Pattern C: Square Bracket Cross-Database Identifiers (Issue #16424)
MSSQL uses square brackets for identifier quoting, and cross-database queries use 3-part or 4-part names. Issue #16424 reported that views using [database].[schema].[table] syntax produce no lineage.
The root cause was a greedy regex r"\[(.*)\]" in OpenMetadata’s parser.py that matched across multiple bracket pairs, returning "db].[schema" instead of separate identifiers. Here’s a reconstructed example based on the issue description:
CREATE VIEW [ReportDB].[dbo].[vw_CustomerOrders]
AS
SELECT
[SalesDB].[dbo].[Customers].[CustomerID],
[SalesDB].[dbo].[Customers].[CustomerName],
[SalesDB].[dbo].[Orders].[OrderID],
[SalesDB].[dbo].[Orders].[OrderDate],
[SalesDB].[dbo].[Orders].[TotalAmount]
FROM [SalesDB].[dbo].[Customers]
INNER JOIN [SalesDB].[dbo].[Orders]
ON [SalesDB].[dbo].[Customers].[CustomerID] = [SalesDB].[dbo].[Orders].[CustomerID]
WHERE [SalesDB].[dbo].[Orders].[OrderDate] >= '2024-01-01'
What GSP extracts:
$ gsp-openmetadata-sidecar --sql-file om_issue_16424_square_brackets.sql --dry-run
Analyzing SQL (984 chars)...
Extracted 2 table-level lineage relationships
SalesDB.dbo.Customers --> ReportDB.dbo.vw_CustomerOrders (2 columns)
SalesDB.dbo.Orders --> ReportDB.dbo.vw_CustomerOrders (3 columns)
[DRY RUN] Would emit lineage:
mssql.salesdb.dbo.customers --> mssql.reportdb.dbo.vw_customerorders
CustomerID → CustomerID
CustomerName → CustomerName
mssql.salesdb.dbo.orders --> mssql.reportdb.dbo.vw_customerorders
OrderID → OrderID
OrderDate → OrderDate
TotalAmount → TotalAmount
GSP correctly handles the square brackets, resolves cross-database references (SalesDB → ReportDB), and extracts 5 column-level lineage mappings across 2 source tables.
—
A Real-World Example: Everything Combined
Let’s look at a more realistic stored procedure that combines all three patterns — the kind of ETL logic you’d find in any production MSSQL warehouse:
CREATE PROCEDURE [dbo].[usp_UpdateCustomerOrders]
AS
BEGIN
SET NOCOUNT ON;
SELECT
c.customer_id, c.customer_name,
o.order_id, o.order_date, o.total_amount
INTO #staged_orders
FROM [dbo].[customers] c
INNER JOIN [dbo].[orders] o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATEADD(day, -30, GETDATE());
MERGE [dbo].[customer_order_summary] AS target
USING #staged_orders AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
UPDATE SET
target.last_order_date = source.order_date,
target.total_amount = source.total_amount,
target.customer_name = source.customer_name
WHEN NOT MATCHED THEN
INSERT (customer_id, customer_name, last_order_date, total_amount)
VALUES (source.customer_id, source.customer_name,
source.order_date, source.total_amount);
INSERT INTO [dbo].[audit_log] (action, record_count, run_date)
SELECT 'usp_UpdateCustomerOrders', COUNT(*), GETDATE()
FROM #staged_orders;
DROP TABLE #staged_orders;
END
This procedure has everything OpenMetadata can’t handle: CREATE PROCEDURE, BEGIN/END, a temp table (#staged_orders), a MERGE statement, and square bracket identifiers.
What GSP extracts — 4 lineage edges, 12 column-level mappings:
$ gsp-openmetadata-sidecar --sql-file mssql_stored_procedure.sql --dry-run
Extracted 4 table-level lineage relationships:
dbo.customers --> #staged_orders (2 columns: customer_id, customer_name)
dbo.orders --> #staged_orders (3 columns: order_id, order_date, total_amount)
#staged_orders --> dbo.customer_order_summary (4 columns)
#staged_orders --> dbo.audit_log (3 columns)
| Source Table | Target Table | Column Mappings |
|---|---|---|
dbo.customers | #staged_orders | customer_id → customer_id, customer_name → customer_name |
dbo.orders | #staged_orders | order_id → order_id, order_date → order_date, total_amount → total_amount |
#staged_orders | dbo.customer_order_summary | customer_id, customer_name, order_date → last_order_date, total_amount |
#staged_orders | dbo.audit_log | COUNT(*) → record_count, GETDATE() → run_date, literal → action |
The full data flow through the temp table staging layer is preserved, including the MERGE upsert pattern and the audit log insert.
—
Summary: All Patterns Solved
| Issue | Pattern | OpenMetadata Today | With gsp-openmetadata-sidecar |
|---|---|---|---|
| #17586 | A — BEGIN/END wrapper | 0 lineage (silently skipped) | 1 table-level, 1 column mapping |
| #25299 | B — Temp table chain | 0 lineage (3-hop fails) | 2 table-level, 2 column mappings |
| #16424 | C — Square brackets | 0 lineage (regex bug) | 2 table-level, 5 column mappings |
| Combined example | A + B + C | 0 lineage | 4 table-level, 12 column mappings |
—
How It Works
The gsp-openmetadata-sidecar is a companion tool that runs alongside your OpenMetadata installation. It does not modify OpenMetadata — it uses the public REST API (PUT /api/v1/lineage) to push the lineage that OpenMetadata’s own parser misses.
OpenMetadata ingestion (unchanged) gsp-openmetadata-sidecar
query logs → collate-sqllineage |
| | 1. Parse SQL with Gudu SQLFlow
v | 2. Resolve table FQNs via OM API
silently skipped | 3. Push lineage via PUT /api/v1/lineage
(lineage lost) v
OpenMetadata (lineage restored)
What it does:
- Parses your SQL using Gudu SQLFlow — a specialized SQL parser that handles procedural T-SQL natively (including
BEGIN/END,DECLARE, temp tables,MERGE, dynamic SQL, cursors, and 20+ dialects) - Resolves table names to OpenMetadata entity UUIDs via
GET /api/v1/tables/name/{fqn} - Pushes lineage edges to OpenMetadata via
PUT /api/v1/lineagewith column-level detail - BigQuery procedural SQL (
DECLARE,IF/THEN,CREATE TEMP TABLE) - Oracle PL/SQL blocks
- Snowflake scripting
MERGE INTOcolumn-level lineage (all dialects)- Multi-statement SQL scripts
- Cross-database and cross-schema references
- 20+ SQL dialects total
- #16737 — Data Lineage Not Reflected for MSSQL Stored Procedure (open since June 2024)
- #25299 — Stored Procedure lineage is not supported for MS SQL connector (release backlog)
- #17586 — MS SQL Procedures Lineage Not Picked Up (partially fixed — filter removed, parser still fails)
- #16424 — Square bracket syntax breaks lineage (fixed in sqlfluff, but proc wrapper still fails)
- Discussion #23717 — Cross-database MSSQL lineage (unanswered)
What it supports beyond MSSQL stored procedures:
—
Try It
# Install
pip install gsp-openmetadata-sidecar
# Test with the exact SQL from issue #25299 (dry run — no OM connection needed):
gsp-openmetadata-sidecar \
--sql-file examples/om_issue_25299_create_procedure_begin_end.sql \
--dry-run
# Test with your own stored procedures:
gsp-openmetadata-sidecar \
--sql-file your_procedure.sql \
--db-vendor dbvmssql \
--dry-run
# Push lineage to your OpenMetadata instance:
gsp-openmetadata-sidecar \
--sql-file your_procedure.sql \
--om-server http://localhost:8585/api \
--om-token "eyJ..." \
--service-name mssql_prod \
--database-name YourDB \
--schema-name dbo
The tool supports four backend modes: anonymous (no signup, 50 calls/day), authenticated (API key, 10k/month), self-hosted Docker (unlimited, data stays in your network), and local JAR (offline, no network at all).
Source code and documentation: github.com/gudusoftware/gsp-openmetadata-sidecar
—
Related OpenMetadata Issues
Also Affected by SQL Parsing Gaps?
If you’re using DataHub and hitting similar SQL parsing failures, see our gsp-datahub-sidecar — same approach, same engine, built for DataHub’s ingestion pipeline. It addresses BigQuery procedural SQL (#11654), Power BI M-language lineage (#15327), and MSSQL stored procedures (#12606).
—
Disclosure: I work at Gudu Software, the company behind GSP SQL Parser, SQLFlow, and the gsp-openmetadata-sidecar. All lineage results shown were produced by running the SQL examples through GSP’s T-SQL parser and verified against the actual SQL from the referenced OpenMetadata GitHub issues. The example SQL files used in this post are available in the gsp-openmetadata-sidecar repository.

