Understanding SQLFlow Job: Automate Your Data Lineage Analysis at Scale
When dealing with large-scale data environments, analyzing data lineage manually becomes impractical. You might have hundreds of SQL files across different repositories, databases, and ETL processes. This is where SQLFlow Job comes into play – a powerful feature that enables batch processing of SQL files for automated data lineage discovery.
In this article, we’ll explore two ways to create and manage SQLFlow Jobs: through the intuitive Web UI and programmatically via the REST API.
Part 1: Creating Jobs Through the SQLFlow UI
The SQLFlow web interface provides a user-friendly way to create and manage jobs without writing any code. This is perfect for ad-hoc analysis and exploration, and for users who prefer a visual approach.
Step 1: Access the Job Creation Panel
After logging into SQLFlow, navigate to the Job List section and click the Job Creation button.
Step 2: Choose Your Job Source
SQLFlow supports multiple data sources for job creation:
- Upload File: Upload SQL files or a ZIP archive containing multiple SQL files (up to 200MB). This is ideal for analyzing DDL scripts, stored procedures, or any SQL codebase; a packaging sketch follows this list.
- From Database: Connect directly to your database server. SQLFlow will read metadata and SQL objects to generate lineage automatically.
- Upload File + Database Metadata: Combine uploaded SQL files with database metadata to resolve column ambiguities for more accurate lineage.
- dbt: Import lineage from your dbt transformation projects using manifest.json and catalog.json files.
- Snowflake Query History: Analyze lineage from Snowflake query history automatically.
- Redshift Log: Parse Amazon Redshift logs (.gz or .zip format) for lineage discovery.
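If you plan to use the Upload File source for a whole codebase, it helps to package the scripts into a single ZIP before uploading. A minimal shell sketch, assuming your scripts sit under a local sql/ directory (the directory and archive names are illustrative):
# Bundle every .sql file under sql/ into one archive for upload
zip -r lineage_input.zip sql/ -i '*.sql'
# Confirm the archive stays under the 200MB upload limit
du -h lineage_input.zip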
Step 3: Configure Job Parameters
Set up your job with these options:
- Database Vendor: Select your SQL dialect (Oracle, SQL Server, MySQL, PostgreSQL, Snowflake, etc.)
- Default Server/Database/Schema: Set default values for unqualified object references
- Job Type: Choose between a regular job and summary mode for large datasets
- Advanced Settings: Filter specific schemas, stored procedures, or views to include/exclude
Step 4: View and Manage Results
Once the job completes, you can work with the results in several ways:
- Lineage Overview: View table-level lineage across your entire database
- Lineage Detail: Drill down into column-level relationships
- Job Info: Check metadata like creation time, execution time, and job status
- Export: Download results in JSON, CSV, or GraphML formats
Part 2: Creating Jobs Through the REST API
For automation, CI/CD integration, and programmatic access, SQLFlow provides a comprehensive REST API. This approach is ideal for scheduled lineage scans, integration with data pipelines, and enterprise automation scenarios.
API Authentication
First, obtain your API credentials (userId and token) from your SQLFlow account settings. See the prerequisites documentation for details.
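A convenient pattern is to keep those credentials out of your scripts and shell history by exporting them as environment variables once, then referencing them in every call. A minimal sketch (the variable names are just a convention, not something the API requires):
# Store credentials once per shell session, or inject them as CI secrets
export SQLFLOW_USER_ID="YOUR_USER_ID"
export SQLFLOW_TOKEN="YOUR_TOKEN"
# Then pass them to any job endpoint:
#   -F "userId=$SQLFLOW_USER_ID" -F "token=$SQLFLOW_TOKEN"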
Submit a Job
Use the /submitUserJob endpoint to upload SQL files:
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" \
-H "Content-Type:multipart/form-data" \
-F "userId=YOUR_USER_ID" \
-F "token=YOUR_TOKEN" \
-F "sqlfiles=@/path/to/queries.sql" \
-F "sqlfiles=@/path/to/procedures.sql" \
-F "dbvendor=dbvoracle" \
-F "jobName=my-lineage-analysis"
The API returns a jobId for tracking:
{
  "code": 200,
  "data": {
    "jobId": "c359aef4bd9641d697732422debd8055",
    "jobName": "my-lineage-analysis",
    "status": "create"
  }
}
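In a script you will usually want to capture that jobId for the follow-up calls. A minimal sketch using jq to parse the response (jq is an assumption here, not a SQLFlow requirement; the field path matches the example response above):
# Submit the job and keep the JSON response
RESPONSE=$(curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitUserJob" \
  -F "userId=$SQLFLOW_USER_ID" \
  -F "token=$SQLFLOW_TOKEN" \
  -F "sqlfiles=@/path/to/queries.sql" \
  -F "dbvendor=dbvoracle" \
  -F "jobName=my-lineage-analysis")
# Extract the job identifier for later status and export calls
JOB_ID=$(echo "$RESPONSE" | jq -r '.data.jobId')
echo "Submitted job: $JOB_ID"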
Check Job Status
Monitor progress with /displayUserJobSummary:
curl -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary" \
-F "jobId=c359aef4bd9641d697732422debd8055" \
-F "userId=YOUR_USER_ID" \
-F "token=YOUR_TOKEN"
When the status field reports success, your lineage is ready.
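In automation you will normally poll this endpoint rather than check it by hand. A minimal polling sketch, assuming the summary response exposes the status under data.status (as the submission response does) and reusing the JOB_ID captured above; the 15-second interval and 40-attempt cap are illustrative:
# Poll the job summary until it reports success, or give up after ~10 minutes
for attempt in $(seq 1 40); do
  STATUS=$(curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/displayUserJobSummary" \
    -F "jobId=$JOB_ID" \
    -F "userId=$SQLFLOW_USER_ID" \
    -F "token=$SQLFLOW_TOKEN" | jq -r '.data.status')
  echo "Job status: $STATUS"
  [ "$STATUS" = "success" ] && break
  sleep 15
done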
Export Lineage Results
Download results in your preferred format:
# JSON format
curl -X POST ".../exportLineageAsJson" -F "jobId=..." --output lineage.json
# CSV format
curl -X POST ".../exportFullLineageAsCsv" -F "jobId=..." --output lineage.csv
# GraphML format (for yEd visualization)
curl -X POST ".../exportLineageAsGraphml" -F "jobId=..." --output lineage.graphml
Incremental Jobs
For evolving codebases, use incremental analysis with /submitPersistJob:
curl -X POST ".../submitPersistJob" \
-F "incremental=true" \
-F "jobId=EXISTING_JOB_ID" \
-F "sqlfiles=@/path/to/updated_queries.sql" \
...
This updates the existing lineage graph without re-analyzing everything.
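In a CI pipeline, the incremental flow pairs naturally with a diff of the SQL files touched by the latest commit. A rough sketch, assuming submitPersistJob lives under the same base path as the other job endpoints, a Git checkout, and file paths without spaces (the diff range is illustrative):
# Collect the SQL files changed by the last commit
CHANGED_SQL=$(git diff --name-only HEAD~1 HEAD -- '*.sql')
# Turn each changed file into an upload argument
FILE_ARGS=""
for f in $CHANGED_SQL; do
  FILE_ARGS="$FILE_ARGS -F sqlfiles=@$f"
done
# Update the existing lineage graph with only the changed files
curl -s -X POST "https://api.gudusoft.com/gspLive_backend/sqlflow/job/submitPersistJob" \
  -F "incremental=true" \
  -F "jobId=$EXISTING_JOB_ID" \
  -F "userId=$SQLFLOW_USER_ID" \
  -F "token=$SQLFLOW_TOKEN" \
  $FILE_ARGS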
When to Use UI vs API
| Scenario | Recommended Approach |
|---|---|
| Ad-hoc analysis and exploration | UI |
| One-time database lineage scan | UI |
| CI/CD pipeline integration | API |
| Scheduled nightly lineage scans | API |
| Integration with data catalog tools | API |
| Quick visualization needs | UI |
Real-World Use Cases
- Data Governance Automation: Schedule nightly API jobs to scan your data warehouse SQL and detect lineage changes (see the cron sketch after this list)
- CI/CD Integration: Trigger lineage analysis when SQL changes are committed to your repository
- Migration Projects: Use the UI to batch analyze legacy stored procedures before modernization
- Compliance Audits: Generate lineage reports in CSV format for regulatory requirements
- Impact Analysis: Before modifying a table, understand all downstream dependencies
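For the nightly governance scan mentioned above, the submit/poll/export calls can be wrapped in a script and driven by any scheduler. A minimal cron sketch (the script path, schedule, and log location are illustrative):
# Run the lineage scan script every night at 02:00 and append its output to a log
0 2 * * * /opt/sqlflow/run_lineage_scan.sh >> /var/log/sqlflow_scan.log 2>&1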
Getting Started
Ready to try SQLFlow Job? Here’s how to get started:
- Sign up at sqlflow.gudusoft.com for a free trial
- Try the UI: Upload a SQL file and explore the interactive lineage visualization
- Explore the API: Check the Job API documentation for code samples
Whether you prefer the visual approach or need full automation, SQLFlow Job provides the flexibility to handle data lineage analysis at any scale.
Try SQLFlow today – your data lineage journey starts here.

