DataHub #15327: Why Power BI M-Language Queries Produce Zero Lineage — 4-Query Deep Analysis

Deep analysis of 4 Power BI M-language queries from DataHub issue #15327. Both patterns solved: Pattern A (M navigation chains) produces table-level lineage; Pattern B (Value.NativeQuery with embedded SQL) produces column-level lineage with 6 column mappings traced through expressions.

DataHub issue #15327 reports a common Power BI + DataHub problem: four Snowflake-connected Power BI datasets produce zero upstream lineage. No tables. No columns. The lineage graph just stops at the Power BI boundary.

We analyzed all four M-language queries from the issue, identified two distinct failure patterns, and ran real lineage extraction against each one using GSP’s Power Query M-language parser. Here’s what we found — and how we solve both patterns.

Why DataHub Can’t Parse These Queries

The root cause is that Power BI doesn’t generate SQL — it generates M-language (Power Query Formula Language). M is a full functional programming language (closer to F# than SQL) that Power BI uses behind the scenes for all “Get Data” and “Transform Data” operations.

DataHub’s ingestion pipeline expects SQL. When it receives M-language, no SQL parser (sqlglot, sqllineage, or any other) can extract lineage from it because it’s not SQL.
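
To make the failure mode concrete, here is a small heuristic (our own illustration, not DataHub or GSP code) that distinguishes an M document from a SQL statement before any SQL parser is invoked — the marker names and function are hypothetical:

```python
import re

def looks_like_m_language(text: str) -> bool:
    """Heuristic sketch: Power Query M documents are `let ... in ...`
    expressions, while SQL statements open with keywords like SELECT."""
    stripped = text.strip()
    # M documents open with a `let` (or `section`) keyword.
    if re.match(r"(?is)^(let|section)\b", stripped):
        return True
    # M-specific constructs that never appear in plain SQL.
    m_markers = ("Value.NativeQuery(", "#(lf)", "Snowflake.Databases(")
    return any(marker in text for marker in m_markers)

m_query = 'let Source = Snowflake.Databases(srv, wh) in Source'
sql_query = 'SELECT * FROM PROD.CONSUMPTION.details'
print(looks_like_m_language(m_query))    # True
print(looks_like_m_language(sql_query))  # False
```

Feeding the first string to sqlglot or sqllineage fails for exactly this reason: the tokenizer meets `let` and record syntax it has no grammar for.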

Two Patterns, Both Solved

The four queries in #15327 fall into two fundamentally different categories. GSP’s Power Query M-language extractor (dbvpowerquery) handles both:

| Pattern | Queries | What Happens | GSP Result |
| --- | --- | --- | --- |
| Pattern A — M Navigation | Q1, Q3 | Table reference encoded in M’s record-access syntax: {[Name="X", Kind="Database"]}[Data]. No SQL anywhere. | Table-level lineage recovered via navigation-chain parsing |
| Pattern B — Value.NativeQuery() | Q2, Q4 | M wraps actual SQL inside a Value.NativeQuery() call. The SQL carries M-encoded escapes (#(lf), ""). | Column-level lineage recovered via SQL extraction + parsing |

Pattern A: Pure M Navigation (Queries 1 & 3)

These queries use M-language’s navigation syntax to reach Snowflake objects — no SQL at all. GSP’s M-language parser walks the navigation chain {[Name="X", Kind="Y"]}[Data] to extract the full database.schema.table path.

Query 1: Navigation to a Snowflake View

Original M-Language:

let
    Source = Snowflake.Databases(SnowFlakeConnector, SnowflakeWarehouse),
    my_Database = Source{[Name="PROD201_DB_redacted", Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION", Kind="Schema"]}[Data],
    Details_View = CONSUMPTION_Schema{[Name="details", Kind="View"]}[Data],
    #"Changed Type1" = Table.TransformColumnTypes(Details_View, ...),
    #"Filtered Rows" = Table.SelectRows(#"Changed Type1", ...)
in #"Filtered Rows"

GSP analysis: The parser identifies Snowflake.Databases as the connector, then walks the navigation chain through three steps:

  • my_Database: {[Name="PROD201_DB_redacted", Kind="Database"]} → Database level
  • CONSUMPTION_Schema: {[Name="CONSUMPTION", Kind="Schema"]} → Schema level
  • Details_View: {[Name="details", Kind="View"]} → View level

GSP result:

Navigation resolved:
  Step: Details_View | Vendor: dbvsnowflake
  Path: PROD201_DB_redacted.CONSUMPTION.details
  Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."details"

Upstream table recovered: PROD201_DB_redacted.CONSUMPTION.details (table-level lineage)
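
The navigation-chain walk can be sketched in a few lines; the regex and function names below are our illustration of the idea, not GSP’s actual API:

```python
import re

# Each hop looks like {[Name="X", Kind="Y"]}[Data] and contributes one
# level of the database.schema.table path.
HOP = re.compile(r'\{\[Name="([^"]+)",\s*Kind="([^"]+)"\]\}\[Data\]')

def resolve_navigation(m_text: str):
    """Collect Name/Kind hops in order, then build the qualified path
    plus a synthetic SELECT that a downstream SQL parser can consume."""
    hops = HOP.findall(m_text)  # [(name, kind), ...]
    path = [name for name, kind in hops
            if kind in ("Database", "Schema", "Table", "View")]
    if len(path) < 3:
        return None  # incomplete chain: warn, emit no lineage
    qualified = ".".join(path[:3])
    synthetic_sql = "SELECT * FROM " + ".".join(f'"{p}"' for p in path[:3])
    return qualified, synthetic_sql

m = '''
my_Database = Source{[Name="PROD201_DB_redacted", Kind="Database"]}[Data],
CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION", Kind="Schema"]}[Data],
Details_View = CONSUMPTION_Schema{[Name="details", Kind="View"]}[Data]
'''
print(resolve_navigation(m))
# ('PROD201_DB_redacted.CONSUMPTION.details',
#  'SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."details"')
```

The synthetic SELECT is the trick that lets the rest of the pipeline stay SQL-only: once the chain is resolved, ordinary SQL lineage machinery takes over.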

Query 3: Navigation to a Snowflake Table + Column Renaming

Original M-Language:

let
    Source = Snowflake.Databases("redacted.snowflakecomputing.com", #"Warehouse name"),
    my_Database = Source{[Name="PROD201_DB_redacted", Kind="Database"]}[Data],
    CONSUMPTION_Schema = my_Database{[Name="CONSUMPTION", Kind="Schema"]}[Data],
    MY_View = CONSUMPTION_Schema{[Name="MESSAGES", Kind="Table"]}[Data],
    #"Renamed Columns" = Table.RenameColumns(MY_View, {{"DATE","Date"}, ...}),
    #"Added Custom" = Table.AddColumn(#"Renamed Columns", "Message Type", ...)
in #"Added Custom"

GSP result:

Navigation resolved:
  Step: MY_View | Vendor: dbvsnowflake
  Path: PROD201_DB_redacted.CONSUMPTION.MESSAGES
  Synthetic SQL: SELECT * FROM "PROD201_DB_redacted"."CONSUMPTION"."MESSAGES"

Upstream table recovered: PROD201_DB_redacted.CONSUMPTION.MESSAGES (table-level lineage). The Table.RenameColumns and Table.AddColumn steps are M-level transformations that don’t change the upstream source.

Pattern B: Embedded SQL (Queries 2 & 4)

These queries wrap real Snowflake SQL inside M’s Value.NativeQuery() function. GSP extracts the SQL string, decodes M-specific escapes, infers the SQL dialect from the inline connector call, and parses the SQL for full column-level lineage.

Query 2: Simple SELECT with #(lf) escape

Original M-Language:

let
    Source = Value.NativeQuery(
        Snowflake.Databases(SnowFlakeServer, SnowFlakeDWH)
            {[Name="PROD201_redacted"]}[Data],
        "SELECT *#(lf)FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY",
        null, [EnableFolding=true])
in Source

Step 1 — Decode M escapes: #(lf) → newline. Step 2 — Infer vendor: The inline Snowflake.Databases() tells GSP this is Snowflake SQL. Step 3 — Parse SQL: SELECT * produces a wildcard column mapping.
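
Step 1 is mechanical string decoding. A minimal sketch (GSP’s actual decoder covers more escape forms than shown here):

```python
def decode_m_string(sql: str) -> str:
    """Decode the M-language escapes commonly seen inside
    Value.NativeQuery() payloads: #(lf), #(cr), #(tab), and the
    doubled quote "" that stands for a single " in an M text literal."""
    replacements = {
        "#(lf)": "\n",
        "#(cr)": "\r",
        "#(tab)": "\t",
        '""': '"',
    }
    for escape, char in replacements.items():
        sql = sql.replace(escape, char)
    return sql

raw = 'SELECT *#(lf)FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY'
print(decode_m_string(raw))
# SELECT *
# FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY
```

Only after this pass is the payload valid Snowflake SQL that a dialect-aware parser can consume.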

GSP result:

NativeQuery resolved:
  Step: Source | Vendor: dbvsnowflake | Parse RC: 0
  Decoded SQL: SELECT *
               FROM PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY
  Column lineage: * → * (all columns flow through)

Upstream: PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY — 1 table-level + 1 column-level (* → *) lineage.

Query 4: Complex Snowflake SQL with GROUP BY ALL, quoted aliases, expressions

Original M-Language (abbreviated):

let
    Source = Value.NativeQuery(
        Snowflake.Databases("redacted.snowflakecomputing.com",
            WAREHOUSE, [Implementation="2.0"])
            {[Name="PROD201_DB_redacted"]}[Data],
        "
SELECT
  SERVICE_START_DATE AS ""Service Date"",
  TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || '_'
    || VENDOR_CODE || '_' || TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD')
    AS OFFICE_STATEMENT_VENDOR_DATE,
  VENDOR_CODE
FROM PROD201_DB_redacted_DATAMARTS.SALES_MARKETING
     .CV_COMMDX_SALES_TRANSFERS
WHERE SERVICE_START_DATE >= '2023-10-01' ...
GROUP BY ALL
HAVING SUM(COST_OF_SALES_CUR) = 0 AND COUNT(*) > 1
        ", null, [EnableFolding=true]),
    #"TimeFilter" = Table.SelectRows(Source, ...)
in #"TimeFilter"

Step 1 — Decode M escapes: ""Service Date"" → "Service Date" (Snowflake quoted identifier). Step 2 — Infer vendor: Inline Snowflake.Databases() → Snowflake. Step 3 — Analyze SQL:

  • Column 1: SERVICE_START_DATE AS "Service Date" — simple alias (1 mapping)
  • Column 2: TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || '_' || VENDOR_CODE || '_' || TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD') AS OFFICE_STATEMENT_VENDOR_DATE — concatenation with 4 source columns traced through TO_NUMBER() and TO_CHAR() (4 mappings)
  • Column 3: VENDOR_CODE — direct pass-through (1 mapping)
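
The idea behind tracing Column 2 can be approximated with a naive text pass: strip string literals, drop function names, and treat the remaining identifiers as source columns. This is a hedged sketch only — GSP parses the expression into an AST rather than using regexes — and the helper name is ours:

```python
import re

SQL_KEYWORDS = {"AS", "AND", "OR", "NOT", "NULL"}

def source_columns(expr: str) -> list[str]:
    """Approximate source-column extraction from one SELECT-list
    expression (illustration only; a real parser walks the AST)."""
    expr = re.sub(r"'[^']*'", "", expr)                   # drop string literals
    expr = re.sub(r"\b[A-Za-z_]\w*\s*\(", "(", expr)      # drop function names
    idents = re.findall(r"\b[A-Za-z_]\w*\b", expr)
    seen, out = set(), []
    for ident in idents:
        up = ident.upper()  # Snowflake folds unquoted identifiers to upper case
        if up not in SQL_KEYWORDS and up not in seen:
            seen.add(up)
            out.append(up)
    return out

expr = ("TO_NUMBER(OPERATIVE_OFFICE) || '_' || statement || '_' "
        "|| VENDOR_CODE || '_' || TO_CHAR(SERVICE_START_DATE, 'YYYYMMDD')")
print(source_columns(expr))
# ['OPERATIVE_OFFICE', 'STATEMENT', 'VENDOR_CODE', 'SERVICE_START_DATE']
```

Four source columns feeding one target column, plus the two simple mappings, accounts for the 6 column-level lineages below.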

GSP result — 6 column-level lineage mappings:

| Source Column | Target Column | Why |
| --- | --- | --- |
| SERVICE_START_DATE | Service Date | Direct alias: AS "Service Date" |
| OPERATIVE_OFFICE | OFFICE_STATEMENT_VENDOR_DATE | Via TO_NUMBER() in the concatenation |
| STATEMENT | OFFICE_STATEMENT_VENDOR_DATE | Direct operand in the concatenation |
| VENDOR_CODE | OFFICE_STATEMENT_VENDOR_DATE | Direct operand in the concatenation |
| SERVICE_START_DATE | OFFICE_STATEMENT_VENDOR_DATE | Via TO_CHAR() in the concatenation |
| VENDOR_CODE | VENDOR_CODE | Direct pass-through |

Upstream: PROD201_DB_redacted_DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS — 1 table-level + 6 column-level lineages.

Summary: All 4 Queries Solved

| Query | Pattern | Upstream Table | DataHub Today | With GSP |
| --- | --- | --- | --- | --- |
| Q1 | A (M Navigation) | PROD201_DB_redacted.CONSUMPTION.details | 0 lineage | Table-level lineage |
| Q2 | B (NativeQuery) | PROD201_DB_redacted.CONSUMPTION.CV_OFFER_redacted_CITY | 0 lineage | 1 table + 1 column (*) |
| Q3 | A (M Navigation) | PROD201_DB_redacted.CONSUMPTION.MESSAGES | 0 lineage | Table-level lineage |
| Q4 | B (NativeQuery) | ...DATAMARTS.SALES_MARKETING.CV_COMMDX_SALES_TRANSFERS | 0 lineage | 1 table + 6 columns |

How It Works

GSP’s Power Query module (dbvpowerquery) is a lightweight M-language extractor — not a full M parser. It answers exactly 6 questions about any M document:

  1. What are the let bindings?
  2. Which binding is returned by in?
  3. Does a binding contain Value.NativeQuery()? → Extract SQL, decode escapes, delegate to vendor SQL parser
  4. Does a binding represent a navigation chain? → Extract database.schema.table
  5. What connector function is at the root? → Infer SQL dialect (Snowflake, SQL Server, Oracle, PostgreSQL, MySQL, BigQuery, Redshift, Databricks, Hive)
  6. Which bindings reference other bindings? → Follow step references

Everything else in M (closures, each, Table.* transforms, the type system) is handled permissively — unknown constructs produce warnings, never wrong lineage.
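
Questions 1 and 2 can be illustrated with a toy binding walk. This is our sketch of the idea under simplifying assumptions (the `Binding` dataclass and naive split are hypothetical; real M needs a tokenizer that respects nesting, which is exactly why dbvpowerquery parses permissively):

```python
import re
from dataclasses import dataclass

@dataclass
class Binding:
    name: str
    expr: str

def split_let_bindings(m_doc: str):
    """Question 1: list the `let` bindings.
    Question 2: find the binding returned by `in`."""
    match = re.search(r"(?s)\blet\b(.*)\bin\b\s*(.+)$", m_doc.strip())
    if not match:
        return [], None
    body, returned = match.group(1), match.group(2).strip()
    bindings = []
    # Naive split on ",\n": good enough for flat documents like these.
    for part in body.split(",\n"):
        name, _, expr = part.partition("=")
        if expr:
            bindings.append(Binding(name.strip(), expr.strip()))
    return bindings, returned

doc = '''let
    Source = Snowflake.Databases(Server, Warehouse),
    DB = Source{[Name="PROD", Kind="Database"]}[Data]
in DB'''
bindings, returned = split_let_bindings(doc)
print([b.name for b in bindings], returned)
# ['Source', 'DB'] DB
```

With the bindings and the returned step known, questions 3–6 reduce to pattern checks on each binding’s expression.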

Try It

# Install the sidecar
pip install git+https://github.com/gudusoftware/gsp-datahub-sidecar.git

# Test with your Power BI queries
gsp-datahub-sidecar --mode authenticated \
  --user-id YOUR_USER_ID \
  --secret-key YOUR_SECRET_KEY \
  --sql-file your_query.sql \
  --db-vendor dbvsnowflake \
  --dry-run

Sign up for a free API key at docs.gudusoft.com/sign-up. For a full interactive analysis, see our dedicated analysis page.

Related Issues

  • #15327 — No upstream lineage tracked for specific Power BI tables (this analysis)
  • #11251 — Power BI SQL comments break lineage (#(lf) encoding) — solved
  • #11654 — BigQuery procedural SQL drops lineage — solved

Disclosure: I work at Gudu Software, the company behind GSP SQL Parser, SQLFlow, and the gsp-datahub-sidecar. All lineage results shown were produced by running the M scripts through GSP’s Power Query parser (dbvpowerquery) and verified against the actual M-language queries from DataHub issue #15327.