{"id":3208,"date":"2026-05-01T10:19:12","date_gmt":"2026-05-01T02:19:12","guid":{"rendered":"https:\/\/www.dpriver.com\/blog\/?p=3208"},"modified":"2026-05-01T10:39:48","modified_gmt":"2026-05-01T02:39:48","slug":"bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types","status":"publish","type":"post","link":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/","title":{"rendered":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone"},"content":{"rendered":"\n<p>If you are searching for a <strong>column-level lineage solution for BigQuery<\/strong>, you have probably already discovered the part of the problem nobody talks about up front: BigQuery is a <em>typed<\/em>, <em>nested<\/em> SQL dialect. The moment your warehouse adopts <code>STRUCT<\/code>, <code>ARRAY&lt;STRUCT&gt;<\/code>, <code>SELECT AS STRUCT<\/code>, or <code>array_agg(table_alias)<\/code>, a parser that does not know the schema can stop dead, give up on the column, or \u2014 worse \u2014 silently produce wrong lineage.<\/p>\n\n\n\n<p>This post walks through two real BigQuery patterns and shows what a SQL parser can actually infer from the SQL text alone, with <strong>no access to the BigQuery catalog, no INFORMATION_SCHEMA, no dbt manifest<\/strong>. We will use it to explain why generic SQL parsers struggle with BigQuery lineage, and what it takes to get column-level lineage right when metadata is missing.<\/p>\n\n\n\n<p>The two patterns we will analyze:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n\n<li><code>ARRAY(SELECT AS STRUCT ...)<\/code> \u2014 building a nested <code>ARRAY&lt;STRUCT&gt;<\/code> column from a flat input row.<\/li>\n\n<li><code>array_agg(original ORDER BY ... LIMIT 1)[OFFSET(0)] unique<\/code> followed by <code>SELECT unique.*<\/code> \u2014 the dbt-utils-style deduplication pattern that flattens a row STRUCT back into top-level columns.<\/li>\n<\/ol>\n\n\n\n\n<p>Both produce <em>correct<\/em> BigQuery lineage even without metadata \u2014 but only if the parser understands BigQuery semantics, not just BigQuery grammar.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why BigQuery lineage is harder than &#8220;normal&#8221; SQL lineage<\/h2>\n\n\n\n<p>A column-level lineage extractor for plain ANSI SQL has one job per output column: walk the <code>SELECT<\/code> list, resolve each expression back to a base table column. That breaks down in BigQuery for three reasons:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>Output columns are not always scalar.<\/strong> A <code>SELECT AS STRUCT<\/code> expression produces a single output column whose type is itself a record. A downstream view&#8217;s <code>column_name<\/code> may actually expand into a dozen physical fields.<\/li>\n\n<li><strong>Row variables behave like STRUCTs.<\/strong> BigQuery lets you write <code>array_agg(table_alias)<\/code> and get back <code>ARRAY&lt;STRUCT&gt;<\/code> \u2014 without ever naming the columns in the SQL.<\/li>\n\n<li><strong><code>STRUCT.<em><\/code> is field expansion, not column projection.<\/strong> The same syntax (<code>alias.<\/em><\/code>) means very different things depending on whether <code>alias<\/code> is a table or a STRUCT-typed expression.<\/li>\n<\/ul>\n\n\n\n\n<p>If your lineage tool was originally built for Postgres or MySQL, every one of those is a hole. Most popular open-source parsers \u2014 JSQLParser, sqlglot, sqlparse \u2014 handle two of the three at best, and even then only with metadata. Without metadata, generic parsers either drop the column from lineage entirely or attribute it to the wrong source.<\/p>\n\n\n\n<p>The good news: <strong>a lot more is recoverable from the SQL alone than people assume<\/strong>, as long as the parser knows BigQuery&#8217;s nested-type rules.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pattern 1: <code>ARRAY(SELECT AS STRUCT ...)<\/code> builds a nested column<\/h2>\n\n\n\n<p>Here is a typical BigQuery view that flattens an <code>orders_raw<\/code> table into a single nested <code>orders<\/code> column:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE VIEW orders_view AS\nSELECT\n    ARRAY(SELECT AS STRUCT\n        customer.name AS customer_name,\n        order_id      AS order_id\n    ) AS orders\nFROM\n    orders_raw;<\/code><\/pre>\n\n\n\n<p>Read by a metadata-free parser, this SQL says quite a lot.<\/p>\n\n\n\n<p><strong>Step 1 \u2014 the inner query is a STRUCT constructor.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT AS STRUCT\n    customer.name AS customer_name,\n    order_id      AS order_id<\/code><\/pre>\n\n\n\n<p><code>SELECT AS STRUCT<\/code> is a BigQuery-specific keyword. It is <em>not<\/em> &#8220;select two columns&#8221; \u2014 it is &#8220;build one row of type <code>STRUCT<\/code>&#8220;. The parser does not need to know whether <code>customer.name<\/code> is <code>STRING<\/code> or <code>BYTES<\/code>; it knows for certain that <code>customer_name<\/code> and <code>order_id<\/code> are <em>fields of a STRUCT<\/em>. That&#8217;s a semantic guarantee from the grammar itself.<\/p>\n\n\n\n<p><strong>Step 2 \u2014 <code>ARRAY(...)<\/code> wraps the STRUCT in an ARRAY.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>ARRAY(SELECT AS STRUCT ...)<\/code><\/pre>\n\n\n\n<p>The result type is <code>ARRAY&lt;STRUCT&gt;<\/code>. Again, the leaf scalar types are unknown without metadata, but the <em>shape<\/em> is fully determined by the SQL.<\/p>\n\n\n\n<p><strong>Step 3 \u2014 the outer alias names a single column.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>) AS orders<\/code><\/pre>\n\n\n\n<p>So <code>orders_view<\/code> has exactly one top-level column named <code>orders<\/code>. Its type form is:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>orders ARRAY&lt;STRUCT&lt;customer_name ?, order_id ?&gt;&gt;<\/code><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\">What lineage can we extract without metadata?<\/h3>\n\n\n\n<p>A BigQuery-aware parser can produce this column-level lineage:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>orders_raw.customer.name -&gt; orders_view.orders.customer_name\norders_raw.order_id      -&gt; orders_view.orders.order_id<\/code><\/pre>\n\n\n\n<p>That is <em>real<\/em>, <em>navigable<\/em>, <em>nested<\/em> column-level lineage from the SQL text \u2014 no schema lookup required. The only thing the parser cannot prove from SQL alone is the leaf scalar type (<code>STRING<\/code>? <code>INT64<\/code>?) and whether <code>orders_raw.customer<\/code> is itself a <code>STRUCT<\/code> in the source table. Those need either metadata or BigQuery&#8217;s own type checker at execution time.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What a generic parser typically gets wrong<\/h3>\n\n\n\n<p>Without BigQuery semantics, a generic parser tends to do one of these:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>Treat <code>ARRAY(SELECT AS STRUCT ...)<\/code> as an unknown function and drop the lineage.<\/li>\n\n<li>Emit a single edge from <code>orders_raw -&gt; orders_view.orders<\/code> with no internal field structure, hiding the fact that <code>customer_name<\/code> and <code>order_id<\/code> are <em>separate<\/em> fields downstream.<\/li>\n\n<li>Flatten the STRUCT incorrectly, producing two top-level columns <code>customer_name<\/code> and <code>order_id<\/code> on <code>orders_view<\/code> \u2014 which is wrong: there is only one top-level column called <code>orders<\/code>.<\/li>\n<\/ul>\n\n\n\n\n<p>If your data catalog tells you <code>orders_view<\/code> has columns <code>customer_name<\/code> and <code>order_id<\/code>, the parser feeding it has misread the SQL.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pattern 2: <code>array_agg(row)[OFFSET(0)] unique<\/code> followed by <code>unique.*<\/code><\/h2>\n\n\n\n<p>This is the canonical dbt-utils deduplication pattern, which is one of the most common SQL shapes in modern BigQuery warehouses:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>CREATE TABLE deduplicated_articles AS\n  SELECT unique.*\n  FROM (\n       SELECT\n           array_agg(\n               original\n                   ORDER BY article_name DESC\n                   LIMIT 1\n           )[OFFSET(0)] unique\n       FROM all_articles original\n       GROUP BY id\n  );<\/code><\/pre>\n\n\n\n<p>This SQL is harder for a generic parser because <em>no individual columns are ever named<\/em> in the projection. And yet, a BigQuery-aware parser can still extract correct lineage from the structure.<\/p>\n\n\n\n<p><strong>Step 1 \u2014 <code>original<\/code> is a row variable, not a column.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>FROM all_articles original<\/code><\/pre>\n\n\n\n<p><code>original<\/code> is the table alias for <code>all_articles<\/code>. When passed as a value expression (instead of with a <code>.column<\/code> access), it represents the entire row, which BigQuery treats as a STRUCT containing every column of <code>all_articles<\/code>.<\/p>\n\n\n\n<p><strong>Step 2 \u2014 <code>array_agg(original)<\/code> aggregates whole rows.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>array_agg(original ORDER BY article_name DESC LIMIT 1)<\/code><\/pre>\n\n\n\n<p>This produces <code>ARRAY&lt;STRUCT&gt;<\/code>. The <code>ORDER BY ... LIMIT 1<\/code> keeps the top row per group.<\/p>\n\n\n\n<p><strong>Step 3 \u2014 <code>[OFFSET(0)]<\/code> extracts a STRUCT.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>array_agg(...)[OFFSET(0)] unique<\/code><\/pre>\n\n\n\n<p>Now <code>unique<\/code> is a single STRUCT containing every column of <code>all_articles<\/code>. Crucially, the parser knows this <em>without<\/em> knowing what those columns are.<\/p>\n\n\n\n<p><strong>Step 4 \u2014 <code>unique.*<\/code> flattens the STRUCT into top-level columns.<\/strong><\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>SELECT unique.*<\/code><\/pre>\n\n\n\n<p>This is the syntactic twist that breaks naive parsers. <code>unique.<em><\/code> does <\/em>not* mean &#8220;select the column named <code>unique<\/code>&#8220;. It means &#8220;expand every field of the STRUCT <code>unique<\/code> into its own top-level column&#8221;. So <code>deduplicated_articles<\/code> ends up with the same top-level column set as <code>all_articles<\/code>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What lineage can we extract without metadata?<\/h3>\n\n\n\n<p>The parser can confidently emit:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>all_articles.* -&gt; deduplicated_articles.*<\/code><\/pre>\n\n\n\n<p>That is, &#8220;every column of <code>all_articles<\/code> flows into <code>deduplicated_articles<\/code>, picking the row with the largest <code>article_name<\/code> per <code>id<\/code>&#8220;. The parser can also note that <code>article_name<\/code> participates in the row-selection logic (for impact analysis), even though it does not change the <em>value<\/em> lineage.<\/p>\n\n\n\n<p>What the parser <em>cannot<\/em> do without metadata is enumerate the exact column names. If you need <code>deduplicated_articles.article_name -&gt; all_articles.article_name<\/code> as a concrete edge, you need either the schema or a metadata-aware second pass.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Why this pattern defeats most generic parsers<\/h3>\n\n\n\n<p>Three reasons:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>They treat <code>original<\/code> (a bare table alias used as a value) as an error or as an unresolved column.<\/li>\n\n<li>They treat <code>[OFFSET(0)]<\/code> after a function call as unknown syntax.<\/li>\n\n<li>They treat <code>unique.<em><\/code> like <code>table_alias.<\/em><\/code> and look for a <em>table<\/em> called <code>unique<\/code>, fail to find one, and drop the projection.<\/li>\n<\/ul>\n\n\n\n\n<p>Any one of those failures collapses the entire lineage edge.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What a BigQuery-aware parser needs to know<\/h2>\n\n\n\n<p>Stepping back, here is the irreducible set of BigQuery semantics a metadata-free lineage tool must implement:<\/p>\n\n\n\n<p><figure class=\"wp-block-table\"><table><thead><tr><th style=\"padding:8px 12px;border:1px solid #ddd;background:#f0f4f8;text-align:left\">Syntax<\/th><th style=\"padding:8px 12px;border:1px solid #ddd;background:#f0f4f8;text-align:left\">Semantic<\/th><th style=\"padding:8px 12px;border:1px solid #ddd;background:#f0f4f8;text-align:left\">What lineage gains<\/th><\/tr><\/thead><tbody><tr><td style=\"padding:8px 12px;border:1px solid #ddd\"><code>SELECT AS STRUCT<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Build a single STRUCT row, not multiple columns<\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Output is one column whose fields are named in the SELECT list<\/td><\/tr><tr><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\"><code>ARRAY(SELECT AS STRUCT ...)<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Wrap STRUCT rows into ARRAY&lt;STRUCT&lt;&#8230;&gt;&gt;<\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Output column type form is fully recoverable<\/td><\/tr><tr><td style=\"padding:8px 12px;border:1px solid #ddd\"><code>array_agg(table_alias)<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Aggregate whole rows; result is ARRAY&lt;row STRUCT&gt;<\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Whole-row flow is detectable even without column enumeration<\/td><\/tr><tr><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\"><code>arr[OFFSET(n)]<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Extract one element of an ARRAY<\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Result type is the array element type (often STRUCT)<\/td><\/tr><tr><td style=\"padding:8px 12px;border:1px solid #ddd\"><code>struct_alias.*<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Expand STRUCT fields into top-level columns<\/td><td style=\"padding:8px 12px;border:1px solid #ddd\">Disambiguates &#8220;select all from table&#8221; vs &#8220;explode this STRUCT&#8221;<\/td><\/tr><tr><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\"><code>nested.path.access<\/code><\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Walk into nested STRUCT fields<\/td><td style=\"padding:8px 12px;border:1px solid #ddd;background:#f9fafb\">Edges target the specific leaf field, not the parent column<\/td><\/tr><\/tbody><\/table><\/figure><\/p>\n\n\n\n<p>If a parser does not implement at least these six semantics, BigQuery column-level lineage will be either incomplete or misleading whenever your team uses nested types \u2014 which, in modern BigQuery warehouses, is most of the time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">A simple memory aid for reading BigQuery SQL<\/h2>\n\n\n\n<p>When eyeballing BigQuery SQL for lineage, three patterns are worth memorizing:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>See <code>SELECT AS STRUCT<\/code> \u2192 the SQL is <em>constructing<\/em> a STRUCT.<\/li>\n\n<li>See <code>ARRAY(SELECT AS STRUCT ...)<\/code> \u2192 the result column is <code>ARRAY&lt;STRUCT&gt;<\/code>.<\/li>\n\n<li>See <code>table_alias<\/code> used as a <em>value<\/em> (e.g. <code>array_agg(original)<\/code>) \u2192 the value is the entire row as a STRUCT.<\/li>\n\n<li>See <code>alias.<em><\/code> where <code>alias<\/code> is a STRUCT-typed expression \u2192 fields are being <\/em>flattened* into top-level columns, not selecting from a table.<\/li>\n<\/ul>\n\n\n\n\n<p>Anywhere those patterns appear, a metadata-free parser can still recover the lineage <em>shape<\/em>. What it cannot recover is the leaf scalar types, the full column list of a wildcard-flattened STRUCT, or the schema of the underlying table \u2014 those still need metadata or a second resolution pass.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How Gudu&#8217;s SQL engine handles it<\/h2>\n\n\n\n<p>All three of Gudu Software&#8217;s products \u2014 <a href=\"https:\/\/www.sqlparser.com\/?utm_source=dpriver_blog&amp;utm_medium=referral&amp;utm_campaign=bigquery_struct_lineage\">General SQL Parser (GSP)<\/a>, <a href=\"https:\/\/www.gudusoft.com\/?utm_source=dpriver_blog&amp;utm_medium=referral&amp;utm_campaign=bigquery_struct_lineage\">SQLFlow<\/a>, and <a href=\"https:\/\/gudu-sql-omni.gudusoft.com\/?utm_source=dpriver_blog&amp;utm_medium=referral&amp;utm_campaign=bigquery_struct_lineage\">SQL Omni<\/a> \u2014 share one column-level lineage engine. They differ in how you use it:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li><strong>GSP<\/strong> is the SDK: embed lineage extraction in a Java application or data platform.<\/li>\n\n<li><strong>SQLFlow<\/strong> is the hosted lineage platform with REST APIs and visualization.<\/li>\n\n<li><strong>SQL Omni<\/strong> is the VS Code extension for individual developers, 100% offline.<\/li>\n<\/ul>\n\n\n\n\n<p>The engine implements BigQuery&#8217;s nested-type semantics \u2014 <code>SELECT AS STRUCT<\/code>, <code>ARRAY(...)<\/code>, row-as-STRUCT in <code>array_agg<\/code>, <code>[OFFSET(n)]<\/code> array indexing, and <code>STRUCT.<em><\/code> field expansion \u2014 so that the two patterns in this post produce correct lineage with or without metadata. When metadata <\/em>is<em> available (via SQLFlow&#8217;s database integrations or a supplied schema), the engine fills in concrete column names where wildcards appeared. When metadata is <\/em>not* available, you still get the structural lineage shown above, rather than silent gaps.<\/p>\n\n\n\n<p>Disclosure: I work at Gudu Software.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Try it yourself<\/h2>\n\n\n\n<p>Paste either of the SQL samples in this post into the <a href=\"https:\/\/sqlflow.gudusoft.com\/?utm_source=dpriver_blog&amp;utm_medium=referral&amp;utm_campaign=bigquery_struct_lineage\">SQLFlow online demo<\/a> and pick <em>BigQuery<\/em> as the dialect. You&#8217;ll see column-level lineage rendered as a graph, including the nested <code>orders.customer_name<\/code> and <code>orders.order_id<\/code> edges from Pattern 1, and the <code>all_articles.<em> -&gt; deduplicated_articles.<\/em><\/code> whole-row flow from Pattern 2.<\/p>\n\n\n\n<p>If you&#8217;d rather embed the engine in your own pipeline, the <a href=\"https:\/\/www.sqlparser.com\/sql-parser-java.php?utm_source=dpriver_blog&amp;utm_medium=referral&amp;utm_campaign=bigquery_struct_lineage\">GSP BigQuery quickstart<\/a> shows how to extract the same lineage as JSON in Java. For VS Code users on regulated stacks, <a href=\"https:\/\/marketplace.visualstudio.com\/items?itemName=gudusoftware.gudu-sql-omni\">SQL Omni<\/a> runs the same analysis 100% offline.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Takeaways<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n\n<li>BigQuery column-level lineage is harder than ANSI lineage because of <code>STRUCT<\/code>, <code>ARRAY&lt;STRUCT&gt;<\/code>, row-as-STRUCT, and <code>STRUCT.*<\/code> field expansion.<\/li>\n\n<li>A <em>lot<\/em> is recoverable from SQL text alone, including the type <em>shape<\/em> of nested output columns, the field names of <code>SELECT AS STRUCT<\/code>, and whole-row flows from <code>array_agg(row)<\/code>.<\/li>\n\n<li>A <em>little<\/em> is genuinely impossible without metadata: leaf scalar types and the column list of any wildcard-flattened STRUCT.<\/li>\n\n<li>A generic SQL parser that does not implement these BigQuery semantics will produce either silent gaps or wrong attributions on any modern BigQuery warehouse \u2014 which is why a BigQuery-specialized parser matters when you&#8217;re choosing a column-level lineage solution.<\/li>\n<\/ul>\n\n","protected":false},"excerpt":{"rendered":"<p>How a BigQuery-aware SQL parser can extract column-level lineage from ARRAY&lt;STRUCT&gt;, SELECT AS STRUCT, and array_agg(row) patterns without any catalog metadata \u2014 and where generic parsers silently fail.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[66,14,8,10,93],"tags":[148,117,152,119,151,150,149,29,147],"blocksy_meta":{"styles_descriptor":{"styles":{"desktop":"","tablet":"","mobile":""},"google_fonts":[],"version":5}},"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v19.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone<\/title>\n<meta name=\"description\" content=\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\" \/>\n<meta property=\"og:description\" content=\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\" \/>\n<meta property=\"og:site_name\" content=\"SQL and Data Blog\" \/>\n<meta property=\"article:published_time\" content=\"2026-05-01T02:19:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-05-01T02:39:48+00:00\" \/>\n<meta name=\"author\" content=\"James\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"James\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\",\"name\":\"SQL and Data Blog\",\"url\":\"https:\/\/www.dpriver.com\/blog\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png\",\"contentUrl\":\"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png\",\"width\":251,\"height\":72,\"caption\":\"SQL and Data Blog\"},\"image\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#website\",\"url\":\"https:\/\/www.dpriver.com\/blog\/\",\"name\":\"SQL and Data Blog\",\"description\":\"SQL related blog for database professional\",\"publisher\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.dpriver.com\/blog\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\",\"url\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\",\"name\":\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\",\"isPartOf\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#website\"},\"datePublished\":\"2026-05-01T02:19:12+00:00\",\"dateModified\":\"2026-05-01T02:39:48+00:00\",\"description\":\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\",\"breadcrumb\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.dpriver.com\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\"},\"author\":{\"name\":\"James\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04\"},\"headline\":\"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone\",\"datePublished\":\"2026-05-01T02:19:12+00:00\",\"dateModified\":\"2026-05-01T02:39:48+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/\"},\"wordCount\":1567,\"publisher\":{\"@id\":\"https:\/\/www.dpriver.com\/blog\/#organization\"},\"keywords\":[\"array\",\"bigquery\",\"bigquery-lineage\",\"column-lineage\",\"data-lineage\",\"gsp\",\"nested-types\",\"SQLFlow\",\"struct\"],\"articleSection\":[\"Data Governance\",\"gsp\",\"SQL language\",\"SQL skills\",\"SQLFlow\"],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04\",\"name\":\"James\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g\",\"caption\":\"James\"},\"sameAs\":[\"http:\/\/www.dpriver.com\"],\"url\":\"https:\/\/www.dpriver.com\/blog\/author\/james\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","description":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/","og_locale":"en_US","og_type":"article","og_title":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","og_description":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","og_url":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/","og_site_name":"SQL and Data Blog","article_published_time":"2026-05-01T02:19:12+00:00","article_modified_time":"2026-05-01T02:39:48+00:00","author":"James","twitter_card":"summary_large_image","twitter_misc":{"Written by":"James","Est. reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/www.dpriver.com\/blog\/#organization","name":"SQL and Data Blog","url":"https:\/\/www.dpriver.com\/blog\/","sameAs":[],"logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png","contentUrl":"https:\/\/www.dpriver.com\/blog\/wp-content\/uploads\/2022\/07\/sqlpp-character.png","width":251,"height":72,"caption":"SQL and Data Blog"},"image":{"@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/logo\/image\/"}},{"@type":"WebSite","@id":"https:\/\/www.dpriver.com\/blog\/#website","url":"https:\/\/www.dpriver.com\/blog\/","name":"SQL and Data Blog","description":"SQL related blog for database professional","publisher":{"@id":"https:\/\/www.dpriver.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.dpriver.com\/blog\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/","url":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/","name":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","isPartOf":{"@id":"https:\/\/www.dpriver.com\/blog\/#website"},"datePublished":"2026-05-01T02:19:12+00:00","dateModified":"2026-05-01T02:39:48+00:00","description":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","breadcrumb":{"@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.dpriver.com\/blog\/"},{"@type":"ListItem","position":2,"name":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone"}]},{"@type":"Article","@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/#article","isPartOf":{"@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/"},"author":{"name":"James","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04"},"headline":"BigQuery Column-Level Lineage Without Metadata: Inferring STRUCT and ARRAY Types from SQL Alone","datePublished":"2026-05-01T02:19:12+00:00","dateModified":"2026-05-01T02:39:48+00:00","mainEntityOfPage":{"@id":"https:\/\/www.dpriver.com\/blog\/2026\/05\/bigquery-column-level-lineage-without-metadata-inferring-struct-and-array-types\/"},"wordCount":1567,"publisher":{"@id":"https:\/\/www.dpriver.com\/blog\/#organization"},"keywords":["array","bigquery","bigquery-lineage","column-lineage","data-lineage","gsp","nested-types","SQLFlow","struct"],"articleSection":["Data Governance","gsp","SQL language","SQL skills","SQLFlow"],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/7bbdbb6e79c5dd9747d08c59d5992b04","name":"James","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.dpriver.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/eeddf4ca7bdafa37ab025068efdc7302?s=96&d=mm&r=g","caption":"James"},"sameAs":["http:\/\/www.dpriver.com"],"url":"https:\/\/www.dpriver.com\/blog\/author\/james\/"}]}},"_links":{"self":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3208"}],"collection":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/comments?post=3208"}],"version-history":[{"count":1,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3208\/revisions"}],"predecessor-version":[{"id":3210,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/posts\/3208\/revisions\/3210"}],"wp:attachment":[{"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/media?parent=3208"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/categories?post=3208"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dpriver.com\/blog\/wp-json\/wp\/v2\/tags?post=3208"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}