Gudu SQLFlow supports two kinds of files as input:
- SQL files, including comments. A DDL file is one example: SQL such as create table statements can be used as metadata.
- JSON files that contain the database metadata.
All other kinds of input are no longer supported; users must convert their inputs to one of the two formats above. As a result, an out-of-the-box tool is needed to perform that conversion and automatically submit SQLFlow jobs.
SQLFlow-Ingester is a tool that extracts metadata from various databases and creates SQLFlow jobs based on your database. The extracted metadata file can also be used by the dlineage tool to generate data lineage.
SQLFlow-Ingester download address: https://github.com/sqlparser/sqlflow_public/releases
SQLFlow-Ingester consists of three parts:
- sqlflow-exporter: exports metadata from the database.
- sqlflow-extractor: processes raw data files such as log files, various script files (from which the SQL statements and metadata to be processed are extracted), CSV files containing SQL statements, etc.
- sqlflow-submitter: submits the SQL and metadata to the SQLFlow server, creates jobs, generates data lineage, and makes the results available in the UI.
SQLFlow-exporter provides the main functionality of SQLFlow-Ingester: it retrieves metadata from different databases.
Under Linux & Mac:
./exporter.sh -host 127.0.0.1 -port 1521 -db orcl -user scott -pwd tiger -save /tmp/sqlflow-ingester -dbVendor dbvoracle
Under Windows:
exporter.bat -host 127.0.0.1 -port 1521 -db orcl -user scott -pwd tiger -save c:\tmp\sqlflow-ingester -dbVendor dbvoracle
Or you can execute the .jar package directly; the jar packages are located under the
java -jar sqlflow-exporter-1.0.jar -host 106.54.xx.xx -port 1521 -db orcl -user username -pwd password -save d:/ -dbVendor dbvoracle
A metadata.json file will be generated after the metadata is successfully exported from the database.
Example of a successful output:
exporter metadata success: <-save>/metadata.json
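The success line above can also be checked mechanically. Below is a minimal, hedged sketch of a post-export sanity check; it is not part of the official tooling, and SAVE_DIR is a placeholder that should match whatever you passed to -save.

```shell
# Hedged sketch: verify that the exporter actually produced a metadata file
# before handing it to the submitter or the dlineage tool.
# SAVE_DIR is a placeholder; use the same path you passed to -save.
SAVE_DIR=/tmp/sqlflow-ingester
META="$SAVE_DIR/metadata.json"

if [ -s "$META" ]; then
    echo "metadata ok: $META"
else
    echo "metadata missing or empty: $META" >&2
fi
```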
-dbVendor: Database type. Use a colon to separate dbVendor and the version if a specific version is required (such as dbvmysql:5.7).
-host: Database host name (ip address or domain name)
-port: Port number
-db: Database name
-user: User name
-pwd: User password
-save: Destination folder for the exported metadata JSON file. The exported file is named metadata.json.
-extractedDbsSchemas: Export metadata only from the specified schemas. Use a comma to separate multiple schemas. This flag can be used to improve export performance.
-excludedDbsSchemas: Exclude metadata from the specified schemas during the export. Use a comma to separate multiple schemas. This flag can be used to improve export performance.
-extractedViews: Export metadata only for the specified views. Use a comma to separate multiple views. This flag can be used to improve export performance.
-merge: Merge metadata results exported by different processes. Use a comma to separate the files to merge.
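To avoid retyping long command lines, the flags above can be composed from variables. This is a hedged sketch, not part of the shipped scripts; all values below are placeholders.

```shell
# Hedged sketch: build the exporter command line from variables so the same
# script can serve several databases. All values below are placeholders.
DB_VENDOR=dbvoracle
DB_HOST=127.0.0.1
DB_PORT=1521
DB_NAME=orcl
DB_USER=scott
DB_PWD=tiger
SAVE_DIR=/tmp/sqlflow-ingester
SCHEMAS="orcl.scott"   # forwarded to -extractedDbsSchemas to speed up export

CMD="./exporter.sh -host $DB_HOST -port $DB_PORT -db $DB_NAME \
-user $DB_USER -pwd $DB_PWD -save $SAVE_DIR \
-dbVendor $DB_VENDOR -extractedDbsSchemas $SCHEMAS"

# Print the command instead of running it, so the sketch is safe to execute.
echo "$CMD"
```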
Improve Export Performance
Exporting from the database can take a long time when the data volume is huge. The following measures improve the Ingester's export performance:
- fetchSize is set to 1000 for tables and views; for the SQL of views and procedures, fetchSize is set to 500. (See the Oracle documentation for what JDBC fetchSize is.)
- -extractedViews is optimized for Oracle and PostgreSQL. A plan has already been made to improve the implementation of these parameters for other databases.
-merge can be used to merge the metadata results. Use a comma to separate the files to merge. If the given value is a folder path, all files under that folder will be merged.
exporter.bat -dbVendor dbvpostgresql -extractedDbsSchemas kingland.dbt%,kingland.pub% -host 184.108.40.206 -port 5432 -db kingland -user bigking -pwd Cat_*** -save D:/out/1.json
exporter.bat -dbVendor dbvpostgresql -extractedDbsSchemas kingland.sqlflow -host 220.127.116.11 -port 5432 -db kingland -user bigking -pwd Cat_*** -save D:/out/2.json
exporter.bat -merge D:/out/1.json,D:/out/2.json -save D:/out/merge.json
or
exporter.bat -merge D:/out/ -save D:/out/merge.json
After decompressing the package, you will find submitter.bat for Windows and submitter.sh for Linux & Mac.
bash submitter.sh -f <path_to_config_file>
Note: path_to_config_file is the full path to the config file, e.g.:
bash submitter.sh -f /home/workspace/gudu_ingester/submitter_config.json
submitter.bat -f D:/mssql-winuser-config.json
After the Submitter executes successfully, check the Job list in the UI and you will find the job submitted by the Submitter, under the job name configured in the configuration file.
If you are on Linux or Mac, you can schedule the submitter with crontab to create a cron job.
Run crontab -e and, in the editor it opens, add the following line to schedule a daily cron job:
0 0 * * * bash submitter.sh -f <path_to_config_file> <lib_path>
Note: path_to_config_file is the config file path; lib_path is the absolute path of the lib directory.
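If you also want each scheduled run logged, the cron entry can point at a small wrapper script instead of calling submitter.sh directly. The sketch below is an assumption, not part of the shipped package; the log path, config path, and the commented-out submitter call are all placeholders.

```shell
# Hedged sketch: a wrapper for the cron job that timestamps each submitter
# run and appends all output to a log file. Paths are placeholders; the real
# submitter call is commented out so the sketch runs standalone.
LOG=/tmp/sqlflow-submitter.log
CONFIG=/home/workspace/gudu_ingester/submitter_config.json

{
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] starting submitter with $CONFIG"
    # bash submitter.sh -f "$CONFIG"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] submitter run finished"
} >> "$LOG" 2>&1
```

The crontab line would then be, for example, 0 0 * * * sh /path/to/submitter_wrapper.sh (the wrapper file name is hypothetical).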
The logs of the submitter are persisted under