Integrate SQLFlow into DataHub

DataHub is an open-source metadata platform for the modern data stack. We have integrated SQLFlow into DataHub so that SQLFlow data lineage is available in the DataHub UI. This integration targets DataHub v0.10.4. You will need to replace the datahub-web-react-datahub-web-react-assets.jar in DataHub with the SQLFlow-adapted jar file. The file to replace may change between DataHub versions, so check this blog regularly for the file name that corresponds to the latest DataHub release.

Contact us to get the corresponding SQLFlow-adapted jar file.

1. Install SQLFlow

The Regular version of SQLFlow is required for the DataHub integration. You can either install it directly on your server or launch a container from the Docker image.

a. Install Directly

Check our product documentation for direct installation instructions.

b. Using Docker Image

Pull the image:

docker pull gudusqlflow/sqlflow-regular-trial:5.7.5

Launch a container from the image:

docker run -itd -p 7090:80 --name mysqlflow gudusqlflow/sqlflow-regular-trial:5.7.5

In the command above, 7090 is the host port used to access the SQLFlow UI. You can choose a different port if 7090 is already in use on your machine.

mysqlflow is the name of the container. For more information on creating containers, see the official Docker documentation.
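
As a minimal sketch of the port note above, the launch command can be wrapped in a small shell function that takes the host port and the image tag as arguments. The function name start_sqlflow, the run helper, and the DRY_RUN variable are illustrative and not part of SQLFlow or Docker; the container port 80 and the container name mysqlflow come from the command above.

```shell
#!/bin/sh
# Sketch only: start SQLFlow on a host port of your choice.

run() {
  # Print the command; execute it unless DRY_RUN=1 (hypothetical switch).
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

start_sqlflow() {
  host_port="$1"   # host port used to reach the SQLFlow UI, e.g. 7091
  image_tag="$2"   # the image tag you pulled
  run docker run -itd -p "${host_port}:80" --name mysqlflow \
    "gudusqlflow/sqlflow-regular-trial:${image_tag}"
}

# Example (uncomment and fill in the tag you pulled):
# start_sqlflow 7091 <tag>
```

Setting DRY_RUN=1 before calling start_sqlflow prints the docker command without running it, which is a convenient way to check the port mapping first.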

2. Update the API URL

The first thing to do is to update the GuduSqlFlowUrl in the config.js of datahub-web-react-datahub-web-react-assets.jar:

a. Open datahub-web-react-datahub-web-react-assets.jar. Do not extract the jar file and re-compress it, as this may cause encoding issues.

b. Find public/sqlflow/config.js and copy the file.

c. Create a new config.js from the copy and set GuduSqlFlowUrl to your actual SQLFlow API URL.

d. Replace the old public/sqlflow/config.js in datahub-web-react-datahub-web-react-assets.jar with your new config.js.

Hint: Do not use localhost as the GuduSqlFlowUrl. The page is loaded in the user's browser, so localhost refers to the user's own machine, not the SQLFlow server.
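
The hint and the in-place replacement can be sketched in shell as follows. This is one possible approach, not the vendor's documented procedure: it assumes the JDK jar tool is on your PATH and uses its update mode (jar uf) to replace a single entry without unpacking the rest of the archive. Whether this avoids the encoding issues mentioned above is an assumption you should verify on a copy of the jar. The function names and the JAR_TOOL override are hypothetical, and the entry path is passed as an argument rather than hard-coded.

```shell
#!/bin/sh
# Sketch only: validate the URL, then replace one entry inside the jar.

is_usable_sqlflow_url() {
  # Reject addresses that point at the browser's own machine.
  case "$1" in
    *://localhost|*://localhost[:/]*) return 1 ;;
    *://127.0.0.1|*://127.0.0.1[:/]*) return 1 ;;
    http://*|https://*) return 0 ;;
    *) return 1 ;;
  esac
}

update_config_in_jar() {
  jar_file="$1"     # path to the assets jar
  new_config="$2"   # your edited config.js
  entry="$3"        # path of config.js inside the jar
  staging="$(mktemp -d)" || return 1
  # Recreate the entry's directory layout so `jar uf -C` stores it at the same path.
  mkdir -p "$staging/$(dirname "$entry")" &&
    cp "$new_config" "$staging/$entry" &&
    "${JAR_TOOL:-jar}" uf "$jar_file" -C "$staging" "$entry"
  status=$?
  rm -rf "$staging"
  return "$status"
}

# Example (uncomment to use):
# is_usable_sqlflow_url "http://sqlflow.example.com" &&
#   update_config_in_jar datahub-web-react-datahub-web-react-assets.jar ./config.js public/sqlflow/config.js
```

Work on a copy of the jar and keep the original until you have confirmed that the DataHub UI loads correctly.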

3. Replace the jar file

a. Upload the updated datahub-web-react-datahub-web-react-assets.jar to your server.

b. Stop the datahub-frontend-react container

Find the container ID:

docker ps -a

Stop the container:

docker stop <container ID>

c. Back up the old datahub-web-react-datahub-web-react-assets.jar

docker cp datahub-frontend-react:/datahub-frontend/lib/datahub-web-react-datahub-web-react-assets.jar <Backup Address>

d. Replace the jar file in the datahub-frontend-react container

docker cp <Your_Updated_datahub-web-react-datahub-web-react-assets.jar> datahub-frontend-react:/datahub-frontend/lib/

Start the datahub-frontend-react container:

docker start <container ID>
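
The stop, back up, replace, and start steps can also be combined into one script. The following is a sketch, assuming the container is named datahub-frontend-react (the name used in the docker cp commands above) and that the backup directory already exists on the host. The names replace_assets_jar, run, and DRY_RUN are hypothetical. Unlike the command in step d, the sketch copies onto the full jar path, so the file is overwritten even if your local copy has a different file name.

```shell
#!/bin/sh
# Sketch only: swap the assets jar inside the DataHub frontend container.

run() {
  # Print the command; execute it unless DRY_RUN=1 (hypothetical switch).
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

replace_assets_jar() {
  new_jar="$1"      # updated jar uploaded to the server
  backup_dir="$2"   # existing host directory that receives the backup
  container="datahub-frontend-react"   # assumed container name
  jar_path="/datahub-frontend/lib/datahub-web-react-datahub-web-react-assets.jar"

  # Each step runs only if the previous one succeeded.
  run docker stop "$container" &&
    run docker cp "$container:$jar_path" "$backup_dir/" &&
    run docker cp "$new_jar" "$container:$jar_path" &&
    run docker start "$container"
}

# Example (uncomment to use):
# replace_assets_jar ./datahub-web-react-datahub-web-react-assets.jar /path/to/backup
```

Running it with DRY_RUN=1 prints the four docker commands without executing them, so you can review the paths before touching the container.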

4. Check your DataHub

Open your DataHub UI again. SQLFlow features are now enabled for table-level and field-level data.

a. The SQLFlow tab of the table-level view shows the upstream and downstream lineage.

b. Click the Schema tab. A Gudu SQLFlow column has been added to the field information list. Click the Lineage entry for a field to view that field's data lineage.

Check this document for more details on how to create a DataHub task with the SQLFlow plugin.
