SQL Lineage Tools for Apache Spark

Dataedo

Dataedo allows you to extract lineage automatically or design flows manually and visualize how data moves through the system with interactive diagrams.

Automatic discovery: Yes
Data flow visualization: Yes
Environment: On-premises
Free edition: No
Metadata management: Yes
Version control integration: Yes

Monte Carlo

Monte Carlo is a data observability platform that offers field-level data lineage, which makes root cause and impact analysis for critical data issues faster and easier. Because field-level lineage is fully automated, data engineers and analysts can change tables without losing trust in, or visibility into, their data at each stage of its life cycle.

Automatic discovery: Yes
Data flow visualization: Yes
Environment: Online
Free edition: No
Metadata management: Yes
Version control integration: No

Informatica Data Lineage

The Informatica Data Lineage tool provides automated end-to-end data lineage with detailed and summary views of data movement across data pipelines. With Informatica, you can derive lineage from code in SQL scripts, stored procedures, and AI/ML code. It streamlines tracking data flow from the system level down to the column level for detailed impact analysis.

Automatic discovery: Yes
Data flow visualization: Yes
Environment: Online
Free edition: No
Metadata management: Yes
Version control integration: Yes
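
To make "deriving lineage from code in SQL scripts" concrete, here is a toy sketch in Python. It is not how Informatica or any other tool listed here works internally; it only illustrates the idea of reading a statement and recording which tables feed which target. The function name `table_lineage` and the sample table names are hypothetical, and the regular expressions handle only simple INSERT INTO ... SELECT and CREATE TABLE ... AS SELECT statements.

```python
import re

def table_lineage(sql: str) -> dict:
    """Return the target table and source tables of one simple SQL statement.

    Toy illustration only: no handling of CTEs, quoted identifiers,
    comments, or subqueries in the FROM clause.
    """
    # The table being written: INSERT INTO <name> or CREATE TABLE <name>
    target = re.search(
        r"\b(?:insert\s+into|create\s+table)\s+([\w.]+)", sql, re.IGNORECASE
    )
    # The tables being read: anything directly after FROM or JOIN
    sources = re.findall(r"\b(?:from|join)\s+([\w.]+)", sql, re.IGNORECASE)
    return {
        "target": target.group(1) if target else None,
        "sources": sorted(set(sources)),
    }

sql = """
INSERT INTO sales.daily
SELECT o.id, c.name
FROM raw.orders o
JOIN raw.customers c ON o.cid = c.id
"""
print(table_lineage(sql))
```

Production lineage tools rely on full SQL parsers and engine metadata rather than pattern matching, which is what makes column-level lineage possible.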

OpenLineage

OpenLineage is an open platform for collecting and analyzing data lineage. It tracks metadata about datasets, jobs, and runs. Pipeline components such as schedulers, warehouses, analysis tools, and SQL engines can use a standard API to capture lineage events and send data about runs, jobs, and datasets to a compatible OpenLineage backend for further analysis.

Automatic discovery: Yes
Data flow visualization: No
Environment: On-premises
Free edition: Yes
Metadata management: Yes
Version control integration: No
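
The run, job, and dataset model described above can be sketched as a plain JSON event. This is a minimal illustration that uses only the Python standard library rather than the official OpenLineage client. The field names follow my reading of the OpenLineage run event format and should be checked against the spec version in use. The helper `build_run_event`, the namespaces, the table names, and the `producer` and `schemaURL` values are hypothetical placeholders.

```python
import json
import uuid
from datetime import datetime, timezone

def build_run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build a lineage event as a dict: one run of one job, with its datasets.

    inputs and outputs are lists of (namespace, name) tuples.
    """
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": name} for ns, name in inputs],
        "outputs": [{"namespace": ns, "name": name} for ns, name in outputs],
        # Placeholder values (assumptions): replace with the real producer URI
        # and the schema URL of the spec version your backend expects.
        "producer": "https://example.com/hypothetical-producer",
        "schemaURL": "https://example.com/hypothetical-schema#RunEvent",
    }

event = build_run_event(
    "COMPLETE",
    job_namespace="spark_jobs",
    job_name="daily_sales",
    inputs=[("warehouse", "raw.orders")],
    outputs=[("warehouse", "sales.daily")],
)
# A real integration would POST this JSON to the backend's lineage endpoint.
print(json.dumps(event, indent=2))
```

Building the event as a plain dictionary keeps the sketch free of dependencies. In practice the official client libraries and engine integrations assemble and send these events for you.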