Understanding SQLFlow Job: Automate Your Data Lineage Analysis at Scale

When dealing with large-scale data environments, analyzing data lineage manually becomes impractical. You might have hundreds of SQL files across different repositories, databases, and ETL processes. This is where SQLFlow Job comes into play – a powerful feature that enables batch processing of SQL files for automated data lineage discovery.

In this article, we’ll explore two ways to create and manage SQLFlow Jobs: through the intuitive Web UI and programmatically via the REST API.

Part 1: Creating Jobs Through the SQLFlow UI

The SQLFlow web interface provides a user-friendly way to create and manage jobs without writing any code. This is perfect for ad-hoc analysis, exploration, and users who prefer a visual approach.

Step 1: Access the Job Creation Panel

After logging into SQLFlow, navigate to the Job List section and click the Job Creation button.

Step 2: Choose Your Job Source

SQLFlow supports multiple data sources for job creation:

  • Upload File: Upload SQL files or a ZIP archive containing multiple SQL files (up to 200MB). This is ideal for analyzing DDL scripts, stored procedures, or any SQL codebase (see the packaging sketch after this list).
  • From Database: Connect directly to your database server. SQLFlow will read metadata and SQL objects to generate lineage automatically.
  • Upload File + Database Metadata: Combine uploaded SQL files with database metadata to resolve column ambiguities for more accurate lineage.
  • dbt: Import lineage from your dbt transformation projects using manifest.json and catalog.json files.
  • Snowflake Query History: Analyze lineage from Snowflake query history automatically.
  • Redshift Log: Parse Amazon Redshift logs (.gz or .zip format) for lineage discovery.
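
If you take the ZIP route, any standard archiver will do. Here is a minimal shell sketch (the sql/ directory and archive name are placeholders, not anything SQLFlow requires) that collects only .sql files into one archive for upload:

# Package every .sql file under sql/ into a single archive for the Upload File option
zip -r lineage_input.zip sql/ -i '*.sql'

# Sanity check: the archive must stay under the 200MB upload limit
ls -lh lineage_input.zip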

Step 3: Configure Job Parameters

Set up your job with these options:

  • Database Vendor: Select your SQL dialect (Oracle, SQL Server, MySQL, PostgreSQL, Snowflake, etc.)
  • Default Server/Database/Schema: Set default values for unqualified object references
  • Job Type: Choose between regular job or summary mode for large datasets
  • Advanced Settings: Filter specific schemas, stored procedures, or views to include/exclude

Step 4: View and Manage Results

Once the job completes, you can:

  • Lineage Overview: View table-level lineage across your entire database
  • Lineage Detail: Drill down into column-level relationships
  • Job Info: Check metadata like creation time, execution time, and job status
  • Export: Download results in JSON, CSV, or GraphML formats

Part 2: Creating Jobs Through the REST API

For automation, CI/CD integration, and programmatic access, SQLFlow provides a comprehensive REST API. This approach is ideal for scheduled lineage scans, integration with data pipelines, and enterprise automation scenarios.

API Authentication

First, obtain your API credentials (userId and token) from your SQLFlow account settings. See the prerequisites documentation for details.
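
The examples below pass credentials on every request. One way to avoid hard-coding them, shown here as a sketch, is to export them as environment variables and reference those variables in your curl calls (the variable names are my own convention, not part of the API):

# Store SQLFlow credentials once per shell session
export SQLFLOW_USER_ID="YOUR_USER_ID"
export SQLFLOW_TOKEN="YOUR_TOKEN"

# Later calls can then use -F "userId=$SQLFLOW_USER_ID" -F "token=$SQLFLOW_TOKEN"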

Submit a Job

Use the /submitUserJob endpoint to upload SQL files:

curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" \
  -H "Content-Type:multipart/form-data" \
  -F "userId=YOUR_USER_ID" \
  -F "token=YOUR_TOKEN" \
  -F "sqlfiles=@/path/to/queries.sql" \
  -F "sqlfiles=@/path/to/procedures.sql" \
  -F "dbvendor=dbvoracle" \
  -F "jobName=my-lineage-analysis"

The API returns a jobId for tracking:

{
  "code": 200,
  "data": {
    "jobId": "c359aef4bd9641d697732422debd8055",
    "jobName": "my-lineage-analysis",
    "status": "create"
  }
}
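
For scripting, you will usually want to capture that jobId rather than copy it by hand. A minimal sketch using the credential variables from earlier, assuming jq is installed and the response shape matches the example above (data.jobId):

# Submit the job and keep the response
RESPONSE=$(curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" \
  -F "userId=$SQLFLOW_USER_ID" \
  -F "token=$SQLFLOW_TOKEN" \
  -F "sqlfiles=@/path/to/queries.sql" \
  -F "dbvendor=dbvoracle" \
  -F "jobName=my-lineage-analysis")

# Extract the job identifier from data.jobId
JOB_ID=$(echo "$RESPONSE" | jq -r '.data.jobId')
echo "Submitted job: $JOB_ID"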

Check Job Status

Monitor progress with /displayUserJobSummary:

curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary" \
  -F "jobId=c359aef4bd9641d697732422debd8055" \
  -F "userId=YOUR_USER_ID" \
  -F "token=YOUR_TOKEN"

When the job's status changes to success, your lineage is ready.
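
In an automated pipeline you would poll this endpoint until the job finishes. A rough polling loop, assuming the summary response exposes the status under data.status and that a failed job reports fail (check the Job API documentation for the exact field names and values):

# Poll the job summary every 10 seconds until the job succeeds or fails
while true; do
  STATUS=$(curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary" \
    -F "jobId=$JOB_ID" \
    -F "userId=$SQLFLOW_USER_ID" \
    -F "token=$SQLFLOW_TOKEN" | jq -r '.data.status')
  echo "Job status: $STATUS"
  if [ "$STATUS" = "success" ]; then
    break
  elif [ "$STATUS" = "fail" ]; then   # assumed failure value; verify against the API docs
    echo "Lineage job failed" >&2
    exit 1
  fi
  sleep 10
done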

Export Lineage Results

Download results in your preferred format:

# JSON format
curl -X POST ".../exportLineageAsJson" -F "jobId=..." --output lineage.json

# CSV format  
curl -X POST ".../exportFullLineageAsCsv" -F "jobId=..." --output lineage.csv

# GraphML format (for yEd visualization)
curl -X POST ".../exportLineageAsGraphml" -F "jobId=..." --output lineage.graphml
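
Spelling out one of those abbreviated calls end to end, and assuming the export endpoints accept the same userId/token form fields as the other job endpoints, a complete JSON export might look like this:

# Export the finished job's lineage as JSON
curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/exportLineageAsJson" \
  -F "jobId=$JOB_ID" \
  -F "userId=$SQLFLOW_USER_ID" \
  -F "token=$SQLFLOW_TOKEN" \
  --output lineage.json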

Incremental Jobs

For evolving codebases, use incremental analysis with /submitPersistJob:

curl -X POST ".../submitPersistJob" \
  -F "incremental=true" \
  -F "jobId=EXISTING_JOB_ID" \
  -F "sqlfiles=@/path/to/updated_queries.sql" \
  ...

This updates the existing lineage graph without re-analyzing everything.
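
Written out with the credential variables from earlier, and assuming /submitPersistJob takes the same userId, token, and dbvendor fields as /submitUserJob, an incremental update might look like this sketch:

# Re-analyze only the changed SQL against an existing job's lineage graph
curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitPersistJob" \
  -F "userId=$SQLFLOW_USER_ID" \
  -F "token=$SQLFLOW_TOKEN" \
  -F "incremental=true" \
  -F "jobId=$JOB_ID" \
  -F "dbvendor=dbvoracle" \
  -F "sqlfiles=@/path/to/updated_queries.sql"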

When to Use UI vs API

Scenario and recommended approach:

  • Ad-hoc analysis and exploration: UI
  • One-time database lineage scan: UI
  • CI/CD pipeline integration: API
  • Scheduled nightly lineage scans: API
  • Integration with data catalog tools: API
  • Quick visualization needs: UI

Real-World Use Cases

  1. Data Governance Automation: Schedule nightly API jobs to scan your data warehouse SQL and detect lineage changes
  2. CI/CD Integration: Trigger lineage analysis when SQL changes are committed to your repository
  3. Migration Projects: Use the UI to batch analyze legacy stored procedures before modernization
  4. Compliance Audits: Generate lineage reports in CSV format for regulatory requirements
  5. Impact Analysis: Before modifying a table, understand all downstream dependencies

Getting Started

Ready to try SQLFlow Job? Here’s how to get started:

  1. Sign up at sqlflow.gudusoft.com for a free trial
  2. Try the UI: Upload a SQL file and explore the interactive lineage visualization
  3. Explore the API: Check the Job API documentation for code samples

Whether you prefer the visual approach or need full automation, SQLFlow Job provides the flexibility to handle data lineage analysis at any scale.

Try SQLFlow today – your data lineage journey starts here.