Enterprises should not let LLMs execute SQL directly because generated queries need deterministic validation, permission checks, risk scoring, and auditing before they reach a database.
A deep technical guide to SQL parser ASTs, bound ASTs, Logical IR, Semantic IR, logical plans, and relational algebra for parser, lineage, and governance engineers.
DataHub silently drops column-level lineage on the dbt-utils deduplicate macro because of how sqlglot's column resolver handles ARRAY_AGG combined with struct unpacking. Here's why it happens, plus an open-source post-processor that recovers the missing lineage.
How a BigQuery-aware SQL parser can extract column-level lineage from ARRAY<STRUCT>, SELECT AS STRUCT, and array_agg(row) patterns without any catalog metadata — and where generic parsers silently fail.
OpenMetadata issues #16737, #25299, and #17586 report zero lineage from MSSQL stored procedures. We analyze three failure patterns — BEGIN/END blocks, temp table chains, and square bracket identifiers — with real SQL from the community, and show how gsp-openmetadata-sidecar recovers full column-level lineage.
A deep analysis of four Power BI M-language queries from DataHub issue #15327. Both patterns are solved: Pattern A (M navigation chains) produces table-level lineage; Pattern B (Value.NativeQuery with embedded SQL) produces column-level lineage with 6 column mappings traced through expressions.
Power BI encodes newlines as #(lf) in M-language SQL. When DataHub parses queries with -- comments, it silently drops all subsequent JOINs from lineage. The gsp-datahub-sidecar recovers every missing relationship.
DataHub's BigQuery ingestion silently loses lineage on procedural SQL (DECLARE, IF/THEN, CALL). This post explains why, and shows how to recover the missing lineage using an open-source sidecar tool.
Oracle-to-Snowflake conversion tools handle syntax translation, but skip dependency analysis. Learn how to use GSP and SQLFlow to map PL/SQL call graphs, column-level lineage, and vendor-specific constructs before you start converting.
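To make the Power BI `#(lf)` item above concrete, here is a minimal Python sketch of why an undecoded newline escape lets a `--` comment swallow a later JOIN. It is an illustration under stated assumptions, not DataHub's or the sidecar's actual code: the helper names `decode_m_escapes` and `strip_line_comments`, the sample query, and the naive comment stripper are all hypothetical, and only a subset of M escape sequences is handled.

```python
import re

# Subset of M-language escape sequences (assumption: only these four are handled here).
M_ESCAPES = {
    "#(cr,lf)": "\r\n",
    "#(lf)": "\n",
    "#(cr)": "\r",
    "#(tab)": "\t",
}

def decode_m_escapes(text: str) -> str:
    """Replace M-language escapes such as #(lf) with the characters they stand for."""
    return re.sub(r"#\((?:cr,lf|lf|cr|tab)\)", lambda m: M_ESCAPES[m.group(0)], text)

def strip_line_comments(sql: str) -> str:
    """Naively drop everything from '--' to the end of each line.

    Hypothetical stand-in for a real SQL tokenizer; it ignores string literals.
    """
    return "\n".join(line.split("--", 1)[0].rstrip() for line in sql.splitlines())

# Hypothetical SQL as it might appear inside an M expression: one physical line.
raw = (
    "SELECT o.id, c.name#(lf)"
    "FROM orders o -- base table#(lf)"
    "JOIN customers c ON c.id = o.customer_id"
)

# Without decoding, the comment runs to the end of the single line and hides the JOIN.
without_decoding = strip_line_comments(raw)

# With decoding, the comment ends at the real newline and the JOIN survives.
with_decoding = strip_line_comments(decode_m_escapes(raw))

print("JOIN" in without_decoding)
print("JOIN" in with_decoding)
```

The design point is ordering: the escapes have to be decoded before any comment handling or parsing, otherwise the whole query is seen as a single line.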