Why Most SQL Lineage Tools Skip Stored Procedures (And What You Can Do About It)

Most SQL lineage tools silently skip stored procedures. Learn why PL/SQL, T-SQL, and PL/pgSQL procedural code breaks typical parsers, and how General SQL Parser extracts column-level lineage through stored procedures, not just around them.

If you work with enterprise databases, you already know: stored procedures are everywhere. They hold critical business logic, enforce data transformations, and move data between tables in ways that no view or SELECT statement can capture. PL/SQL packages in Oracle, T-SQL procedures in SQL Server, PL/pgSQL functions in PostgreSQL — these are the workhorses of data processing in large organizations.

And yet, when you fire up a SQL lineage tool and point it at your database, those stored procedures are often quietly ignored. The lineage diagram shows tables and views connected by SELECT and INSERT statements, but the procedural code that actually drives data flow? It is either missing entirely or represented as a black box with no column-level detail.

This is not a minor gap. In regulated industries, auditors want to know exactly how data moves from source to target. In migration projects, you need to understand what a stored procedure does before you can rewrite it. And for impact analysis, a schema change that breaks a stored procedure can cascade into production failures that no lineage tool warned you about.

Let's look at why this happens, what the evidence looks like in practice, and what a proper solution requires.

Why Tools Skip Stored Procedures

The core challenge is architectural. Most SQL parsers are built to handle declarative SQL — SELECT, INSERT, UPDATE, DELETE, CREATE VIEW. These statements describe what data to retrieve or modify. Stored procedures, on the other hand, contain procedural logic — IF/ELSE branches, loops, exception handlers, cursor operations, dynamic SQL construction, and vendor-specific control flow.

There are three main reasons tools struggle:

1. Preprocessing vs. Native Parsing

Many lineage tools use a preprocessing approach: they extract the SQL statements embedded inside a procedure body and feed them into a standard SQL parser one at a time. This loses context. A variable assigned in one statement and used three lines later in an INSERT becomes invisible. A cursor that iterates over a result set and inserts rows conditionally cannot be resolved without understanding the control flow.
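To see what the preprocessing shortcut loses, here is a minimal Python sketch. The procedure body, regex, and names are invented for illustration; the point is that once the embedded statements are pulled out one at a time, the binding between the INTO variable and its source column disappears along with the surrounding context.

```python
import re

# A toy PL/SQL body: a variable is filled from one table, then written
# to another. The data flow is orders.amount -> v_total -> daily_totals.total.
PROC_BODY = """
DECLARE
  v_total NUMBER;
BEGIN
  SELECT SUM(amount) INTO v_total FROM orders;
  INSERT INTO daily_totals (total) VALUES (v_total);
END;
"""

def extract_statements(body: str) -> list[str]:
    """Naive preprocessing: pull out anything that looks like a standalone
    SQL statement and discard the procedural context around it."""
    return re.findall(r"(?:SELECT|INSERT)[^;]+;", body, flags=re.IGNORECASE)

statements = extract_statements(PROC_BODY)
# Each statement is now parsed in isolation. The INSERT references v_total,
# but nothing links it back to SUM(orders.amount): the DECLARE block and the
# INTO binding were thrown away with the context, so the lineage chain breaks.
for stmt in statements:
    print(stmt.strip())
```

A statement-at-a-time parser sees two valid statements here and still cannot connect them, which is exactly the failure mode described above.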

Native parsing means the parser understands the procedural language itself — DECLARE blocks, BEGIN/END scoping, cursor declarations, loop constructs, exception handlers — as first-class syntax. This is significantly harder to build, which is why most tools take the shortcut.

2. Control Flow Complexity

A stored procedure is not a single data flow — it is a program with branches. Consider a PL/SQL procedure that uses BULK COLLECT to fetch 10,000 rows into a collection, then uses FORALL to insert them into a target table with an exception handler that logs failures to an error table. The lineage from source columns to target columns passes through collection variables, loop indices, and conditional branches. Tracking column-level lineage through this requires something closer to program analysis than SQL parsing.

3. Vendor-Specific Syntax

Each database vendor has its own procedural language with unique syntax. Oracle’s PL/SQL has packages, nested procedures, autonomous transactions, and pipelined table functions. SQL Server’s T-SQL has TRY/CATCH, table variables, the GO batch separator, and MERGE with OUTPUT. PostgreSQL’s PL/pgSQL has PERFORM (execute and discard), RAISE (logging), RETURN QUERY, and dollar-quoted strings. Supporting all of these at production quality means maintaining separate, deep grammars for each dialect.

Evidence From the Field

The pattern is consistent across the ecosystem. Without naming and shaming, here is what we see:

Open-source metadata platforms have acknowledged procedural SQL gaps. One major platform has had open issues for over five months related to stored procedure lineage extraction, with community members requesting support for PL/SQL and T-SQL procedure parsing that the platform’s SQL parser cannot handle.

Python-based SQL transpilers that are popular in the dbt ecosystem work well for SELECT statement analysis but struggle with procedural constructs. T-SQL’s GO batch delimiter, complex PL/SQL blocks with nested BEGIN/END, and vendor-specific syntax like Oracle’s EXECUTE IMMEDIATE are known limitations. The transpiler approach works for the 80% case (queries and views) but hits a wall with procedural code.

Large-scale migration tools at Fortune 500 companies have encountered this directly. One major technology company’s open-source migration utility (issue #2098) documented how their SQL parser could not propagate column-level lineage tokens through procedural logic, and the team discussed replacing their parser entirely due to dialect limitations in handling stored procedures.

Database migration tools — both commercial (cloud provider schema conversion tools) and open-source (Oracle-to-PostgreSQL converters) — frequently fail on specific PL/SQL patterns. BULK COLLECT combined with FORALL, which is one of the most common performance optimization patterns in Oracle, is a known pain point. The migration tool either errors out or produces incorrect target code because it does not fully parse the procedural context.

Data catalog platforms have added stored procedure support in their UI — you can view and edit procedure code — but the actual parsing for lineage extraction is not there. Showing the code is not the same as understanding the data flow within it.

What Good Stored Procedure Parsing Looks Like

To extract accurate column-level lineage from stored procedures, a parser needs to handle the full procedural language, not just the SQL statements embedded in it. Here are three real-world examples that illustrate the complexity.

PL/SQL: Package With Cursor Loops and BULK COLLECT

CREATE OR REPLACE PACKAGE BODY etl_pkg AS
  PROCEDURE load_customer_summary IS
    TYPE t_cust_rec IS RECORD (
      customer_id   customers.customer_id%TYPE,
      total_amount  NUMBER,
      order_count   NUMBER
    );
    TYPE t_cust_tab IS TABLE OF t_cust_rec;
    l_batch t_cust_tab;

    CURSOR c_active IS
      SELECT c.customer_id,
             SUM(o.amount) AS total_amount,
             COUNT(o.order_id) AS order_count
      FROM customers c
      JOIN orders o ON o.customer_id = c.customer_id
      WHERE c.status = 'ACTIVE'
      GROUP BY c.customer_id;
  BEGIN
    OPEN c_active;
    LOOP
      FETCH c_active BULK COLLECT INTO l_batch LIMIT 5000;
      EXIT WHEN l_batch.COUNT = 0;

      FORALL i IN 1..l_batch.COUNT
        MERGE INTO customer_summary cs
        USING (SELECT l_batch(i).customer_id AS cid,
                      l_batch(i).total_amount AS amt,
                      l_batch(i).order_count AS cnt FROM dual) src
        ON (cs.customer_id = src.cid)
        WHEN MATCHED THEN
          UPDATE SET cs.total_amount = src.amt,
                     cs.order_count = src.cnt,
                     cs.updated_at = SYSDATE
        WHEN NOT MATCHED THEN
          INSERT (customer_id, total_amount, order_count, updated_at)
          VALUES (src.cid, src.amt, src.cnt, SYSDATE);

      COMMIT;
    END LOOP;
    CLOSE c_active;
  EXCEPTION
    WHEN OTHERS THEN
      ROLLBACK;
      INSERT INTO etl_error_log (procedure_name, error_msg, error_time)
      VALUES ('load_customer_summary', SQLERRM, SYSDATE);
      COMMIT;
      RAISE;
  END load_customer_summary;
END etl_pkg;

The lineage here flows from customers.customer_id, orders.amount, and orders.order_id through a cursor, into a PL/SQL collection variable, through a FORALL + MERGE statement, and finally into customer_summary.customer_id, customer_summary.total_amount, and customer_summary.order_count. The error path also creates lineage into etl_error_log. A parser that only sees the MERGE statement without understanding the cursor and collection context will miss the source columns entirely.
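One way to picture the result is as an edge list. The Python sketch below is an illustrative rendering of the edges a procedure-aware parser should recover from this example; the edge format is invented for illustration and is not General SQL Parser's actual output. The cursor and the l_batch collection appear as intermediate hops.

```python
# Column-level edges through load_customer_summary, with the cursor
# (c_active) and the collection variable (l_batch) as intermediate hops.
EDGES = [
    ("customers.customer_id", "c_active.customer_id"),
    ("orders.amount",         "c_active.total_amount"),
    ("orders.order_id",       "c_active.order_count"),
    ("c_active.customer_id",  "l_batch.customer_id"),
    ("c_active.total_amount", "l_batch.total_amount"),
    ("c_active.order_count",  "l_batch.order_count"),
    ("l_batch.customer_id",   "customer_summary.customer_id"),
    ("l_batch.total_amount",  "customer_summary.total_amount"),
    ("l_batch.order_count",   "customer_summary.order_count"),
]

def upstream_sources(column, edges):
    """Walk the edge list backwards to find the base table columns
    that ultimately feed a given target column."""
    parents = [s for s, t in edges if t == column]
    if not parents:
        return {column}        # no incoming edge: a base column
    result = set()
    for p in parents:
        result |= upstream_sources(p, edges)
    return result

print(upstream_sources("customer_summary.total_amount", EDGES))
# traces back through l_batch and the cursor to {'orders.amount'}
```

A parser that drops the cursor and collection hops has no path from customer_summary back to orders, which is why the source columns go missing.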

General SQL Parser handles this by parsing the full PL/SQL package body, resolving variable types through %TYPE and record declarations, tracking data flow through BULK COLLECT into collection variables, and connecting those variables through the FORALL loop to the final MERGE target.

T-SQL: Procedure With Temp Tables, Dynamic SQL, and TRY/CATCH

CREATE PROCEDURE dbo.usp_refresh_product_metrics
    @CategoryID INT,
    @StartDate DATE
AS
BEGIN
    SET NOCOUNT ON;
    BEGIN TRY
        -- Stage data in temp table
        SELECT p.product_id,
               p.product_name,
               p.category_id,
               SUM(s.quantity) AS total_qty,
               SUM(s.quantity * s.unit_price) AS total_revenue
        INTO #product_staging
        FROM products p
        INNER JOIN sales s ON s.product_id = p.product_id
        WHERE p.category_id = @CategoryID
          AND s.sale_date >= @StartDate
        GROUP BY p.product_id, p.product_name, p.category_id;

        -- Apply business rules
        UPDATE #product_staging
        SET total_revenue = total_revenue * 0.9
        WHERE total_qty > 1000;

        -- Merge into target
        MERGE product_metrics AS tgt
        USING #product_staging AS src
        ON tgt.product_id = src.product_id
        WHEN MATCHED THEN
            UPDATE SET tgt.product_name = src.product_name,
                       tgt.total_qty = src.total_qty,
                       tgt.total_revenue = src.total_revenue,
                       tgt.last_updated = GETDATE()
        WHEN NOT MATCHED THEN
            INSERT (product_id, product_name, category_id,
                    total_qty, total_revenue, last_updated)
            VALUES (src.product_id, src.product_name, src.category_id,
                    src.total_qty, src.total_revenue, GETDATE());

        DROP TABLE #product_staging;
    END TRY
    BEGIN CATCH
        IF OBJECT_ID('tempdb..#product_staging') IS NOT NULL
            DROP TABLE #product_staging;

        INSERT INTO dbo.error_log (proc_name, error_message, error_time)
        VALUES ('usp_refresh_product_metrics', ERROR_MESSAGE(), GETDATE());

        THROW;
    END CATCH
END
GO

This procedure creates a temp table via SELECT INTO, modifies it with an UPDATE, then merges into the final target. The column-level lineage must trace products.product_name and sales.quantity through #product_staging and into product_metrics. The GO batch delimiter, TRY/CATCH error handling, and temp table lifecycle all need to be understood. General SQL Parser resolves the temp table schema from the SELECT INTO statement, tracks the UPDATE transformation, and maps columns through the MERGE into the final target — including the error path to error_log.
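The temp-table hop can be sketched the same way. The Python fragment below (an illustrative data model, not tool output) handles one level of temp-table indirection: edges into #product_staging are inferred from the SELECT INTO, then collapsed so base columns connect directly to product_metrics.

```python
# Edges through the T-SQL procedure: #product_staging only has a schema
# because of the SELECT INTO, so the parser must infer it before it can
# connect products/sales columns to product_metrics.
EDGES = [
    ("products.product_name", "#product_staging.product_name"),
    ("sales.quantity",        "#product_staging.total_qty"),
    ("sales.quantity",        "#product_staging.total_revenue"),
    ("sales.unit_price",      "#product_staging.total_revenue"),
    ("#product_staging.product_name",  "product_metrics.product_name"),
    ("#product_staging.total_qty",     "product_metrics.total_qty"),
    ("#product_staging.total_revenue", "product_metrics.total_revenue"),
]

def collapse_temp_tables(edges):
    """Rewrite the edge list so single-level temp-table hops (# prefix)
    disappear and base sources connect directly to final targets."""
    collapsed = []
    for src, tgt in edges:
        if tgt.startswith("#"):
            continue                      # drop edges INTO temp tables
        if src.startswith("#"):
            # splice in every base column that fed this temp column
            for s2, t2 in edges:
                if t2 == src:
                    collapsed.append((s2, tgt))
        else:
            collapsed.append((src, tgt))
    return collapsed

for edge in collapse_temp_tables(EDGES):
    print(edge)
```

Note how sales.quantity feeds product_metrics.total_revenue twice removed (through the SUM expression and the temp table); a black-box approach reports only "reads products and sales, writes product_metrics."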

PL/pgSQL: Function With PERFORM, RAISE, and RETURN QUERY

CREATE OR REPLACE FUNCTION analytics.refresh_user_engagement(
    p_days_back INTEGER DEFAULT 30
)
RETURNS TABLE (
    user_id BIGINT,
    engagement_score NUMERIC,
    segment TEXT
)
LANGUAGE plpgsql
AS $$
DECLARE
    v_threshold NUMERIC;
    v_count INTEGER;
BEGIN
    -- Cleanup old data
    PERFORM analytics.archive_old_engagement(p_days_back * 2);

    SELECT AVG(login_count)::NUMERIC INTO v_threshold
    FROM analytics.user_activity
    WHERE activity_date >= CURRENT_DATE - p_days_back;

    INSERT INTO analytics.engagement_daily (
        user_id, engagement_score, segment, calc_date
    )
    SELECT ua.user_id,
           (ua.login_count * 0.4 + ua.page_views * 0.3
            + ua.actions_taken * 0.3)::NUMERIC AS engagement_score,
           CASE
             WHEN (ua.login_count * 0.4 + ua.page_views * 0.3
                   + ua.actions_taken * 0.3) > v_threshold THEN 'high'
             ELSE 'standard'
           END AS segment,
           CURRENT_DATE
    FROM analytics.user_activity ua
    WHERE ua.activity_date >= CURRENT_DATE - p_days_back
    ON CONFLICT (user_id, calc_date) DO UPDATE
    SET engagement_score = EXCLUDED.engagement_score,
        segment = EXCLUDED.segment;

    GET DIAGNOSTICS v_count = ROW_COUNT;
    RAISE NOTICE 'Processed % user engagement records', v_count;

    RETURN QUERY
    SELECT ed.user_id, ed.engagement_score, ed.segment
    FROM analytics.engagement_daily ed
    WHERE ed.calc_date = CURRENT_DATE;
END;
$$;

Here, PERFORM calls another function (with side effects), RAISE NOTICE is PostgreSQL-specific logging, and RETURN QUERY makes this a table-returning function whose output can feed into other queries. The lineage traces user_activity.login_count, user_activity.page_views, and user_activity.actions_taken through computed expressions into engagement_daily.engagement_score and engagement_daily.segment, and also through the function’s return type. General SQL Parser understands all of these PL/pgSQL constructs natively.
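Because the function is table-returning, its lineage composes with whatever query calls it. The sketch below shows that composition; the weekly_report caller and the edge format are hypothetical, invented to illustrate how lineage crosses a function boundary.

```python
# Edges inside the function, ending at its RETURN QUERY output columns.
INSIDE = [
    ("analytics.user_activity.login_count",   "refresh_user_engagement.engagement_score"),
    ("analytics.user_activity.page_views",    "refresh_user_engagement.engagement_score"),
    ("analytics.user_activity.actions_taken", "refresh_user_engagement.engagement_score"),
]

# A hypothetical caller: INSERT INTO weekly_report (score)
#                        SELECT engagement_score FROM analytics.refresh_user_engagement();
OUTSIDE = [
    ("refresh_user_engagement.engagement_score", "weekly_report.score"),
]

def compose(inner, outer):
    """Join inner edges to outer edges on the shared function-output column,
    so base sources connect directly to the caller's target."""
    return [(src, tgt2) for src, tgt1 in inner
                        for tgt_src, tgt2 in outer if tgt1 == tgt_src]

for edge in compose(INSIDE, OUTSIDE):
    print(edge)
```

Without parsing the function body, the caller's lineage dead-ends at the function name; with it, all three user_activity source columns reach weekly_report.score.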

Column-Level Lineage Through Stored Procedures, Not Around Them

The key distinction is whether a tool extracts lineage through stored procedures or just around them. Working around them means treating the procedure as a black box — you know it reads from table A and writes to table B, but you do not know which specific columns are involved or how they are transformed. Working through them means the parser understands the procedural logic well enough to trace individual columns from source to target, including through variables, cursors, collections, temp tables, and control flow branches.

General SQL Parser takes the “through” approach. It parses the full procedural language for each supported dialect, builds an internal representation of variable declarations, assignments, and data flow, and resolves column-level lineage across procedural boundaries. When you feed it a PL/SQL package or a T-SQL stored procedure, the output is not just “this procedure touches these tables” — it is a complete column-to-column lineage map that shows exactly how data flows from source to target.

You can see this in action with SQLFlow, which visualizes the lineage extracted by General SQL Parser. Upload a stored procedure and you will see the column-level connections rendered as a data flow diagram, with each transformation step visible.

Try It Yourself

If stored procedure lineage is a gap in your current tooling, here is what you can do:

  • Visualize your procedures: Upload a stored procedure to the SQLFlow online demo and see the column-level lineage diagram it produces.
  • Test your edge cases: Try the patterns that break other tools — BULK COLLECT + FORALL, temp tables in T-SQL, PL/pgSQL RETURN QUERY. These are the cases where parser quality shows.
  • Analyze in your IDE: If you use VS Code, Gudu SQL Omni brings column-level lineage, ER diagrams, and impact analysis directly into your editor — 100% offline, 34 dialects, free trial. Install from the VS Code Marketplace.
  • Evaluate against your codebase: General SQL Parser supports Oracle PL/SQL, SQL Server T-SQL, PostgreSQL PL/pgSQL, MySQL stored procedures, and DB2 SQL PL. Details and downloads are at gudusoft.com.

Stored procedures are not going away. If anything, the push toward database-side logic for performance and security is growing. A lineage solution that skips them is a lineage solution with blind spots — and in an era of data governance and regulatory compliance, blind spots are not acceptable.