Data lineage tools for Apache Hive

Data lineage tools are software that lets you extract, view, and analyze data lineage. Data lineage is the process of understanding and visualizing how data flows from its source to different destinations. It lets you create a map of the data's journey through the entire ecosystem.
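To make the idea of a lineage map concrete, here is a minimal Python sketch. It assumes lineage is stored as a dictionary that maps each dataset to its direct upstream sources. The Hive table names (`raw_sales`, `sales_clean`, and so on) and the helpers `trace_origins` and `downstream_of` are hypothetical and do not come from any of the tools below. Walking the map backwards answers "where did this data come from?", and walking it forwards answers "what would a change here affect?".

```python
from collections import defaultdict

# Hypothetical Hive tables: each key is a dataset, each value lists the
# datasets it is built from (its direct upstream sources).
UPSTREAM = {
    "sales_report": ["sales_clean", "dim_customer"],
    "sales_clean": ["raw_sales"],
    "dim_customer": ["raw_customers"],
    "raw_sales": [],
    "raw_customers": [],
}


def trace_origins(table, upstream):
    """Return the root sources (datasets with no upstream) feeding `table`.

    Assumes the lineage graph is acyclic.
    """
    parents = upstream.get(table, [])
    if not parents:
        return {table}
    roots = set()
    for parent in parents:
        roots |= trace_origins(parent, upstream)
    return roots


def downstream_of(table, upstream):
    """Return every dataset that directly or indirectly depends on `table`."""
    # Invert the map so we can walk from a source towards its consumers.
    children = defaultdict(list)
    for child, parents in upstream.items():
        for parent in parents:
            children[parent].append(child)
    affected, stack = set(), [table]
    while stack:
        for child in children[stack.pop()]:
            if child not in affected:
                affected.add(child)
                stack.append(child)
    return affected


print(sorted(trace_origins("sales_report", UPSTREAM)))  # ['raw_customers', 'raw_sales']
print(sorted(downstream_of("raw_sales", UPSTREAM)))  # ['sales_clean', 'sales_report']
```

The recursive `trace_origins` assumes the graph has no cycles, which usually holds for batch pipelines but is worth checking before relying on it.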

Truedat

Truedat is an open-source data governance business solution that gives you an end-to-end view of your data from both a business and a technical point of view. Truedat's data lineage module visualizes the information life cycle and the interconnections between the organization's systems. This provides complete traceability of the data, as well as impact analysis in the event of changes to data structures or processes.

BI Tools lineage: Yes
Commercial: Free
Data migration tools lineage: No
Data warehouses lineage: No
ETLs: No
Free edition: Yes
Hadoop: Yes
NoSQL: No
Pipelines lineage: No
RDBMS: Yes

ER/Studio

ER/Studio is an enterprise data modeling, architecture, and governance tool. It includes a data lineage tab that is used primarily to document ETL processes from scratch. With its visual data lineage support, you can document source/target mappings and sourcing rules for data movement across systems.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

Talend Data Catalog

Talend Data Catalog gives your organization a single, secure point of control for your data. Its data flow lineage feature allows you to narrow in on specific objects and shows you how these objects are related to each other, within a model, an external metadata repository, or a configuration. The data flow lineage is based upon connection definitions to data stores and physical transformation rules which transform and move the data.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: No
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Dremio

Dremio is a SQL Lakehouse Platform built from the ground up to deliver high-performing BI dashboards and interactive analytics directly on the data lake. It supports data lineage by maintaining the relationships between your data sources, virtual datasets, and queries in Dremio’s data graph, which shows exactly where each dataset came from.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: No
ETLs: No
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Secoda

Secoda is an intuitive, collaborative, and easy-to-implement data discovery tool. It automatically extracts queries to generate data lineage. At the time of writing, it supports table lineage for Snowflake, dbt, Redshift, and BigQuery, with support for Postgres, MySQL, and Microsoft SQL Server announced as coming soon. Secoda's data lineage helps data teams easily identify the upstream and downstream dependencies of a table. For each dependency, you can see how many levels away a particular table is; the ability to view the lineage in visual form is also announced as coming soon.
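Secoda's internals are not described here, but the general idea of deriving table lineage from queries and counting how many levels away a dependency is can be sketched in Python. This sketch assumes HiveQL `INSERT OVERWRITE TABLE` / `INSERT INTO` statements and uses deliberately naive regular expressions; the patterns and helper names (`edges_from_queries`, `levels_away`) are hypothetical. A real tool would use a proper SQL parser, since regexes miss CTEs and subqueries and can misread `FROM` inside expressions such as `EXTRACT(YEAR FROM ...)`. The level count is a breadth-first search, which yields the shortest hop distance to each dependency.

```python
import re
from collections import deque

# Deliberately naive patterns (hypothetical, for illustration only): the target
# follows INSERT OVERWRITE TABLE / INSERT INTO [TABLE]; sources follow FROM/JOIN.
TARGET_RE = re.compile(r"INSERT\s+(?:OVERWRITE\s+TABLE|INTO(?:\s+TABLE)?)\s+([\w.]+)", re.I)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.I)


def edges_from_queries(queries):
    """Return (source, target) table pairs found in INSERT ... SELECT queries."""
    edges = set()
    for sql in queries:
        target = TARGET_RE.search(sql)
        if target is None:
            continue  # read-only query: it adds no lineage edge
        for source in SOURCE_RE.findall(sql):
            edges.add((source, target.group(1)))
    return edges


def levels_away(table, edges, direction="downstream"):
    """Map each dependency of `table` to its shortest hop distance (BFS)."""
    neighbours = {}
    for src, dst in edges:
        a, b = (src, dst) if direction == "downstream" else (dst, src)
        neighbours.setdefault(a, []).append(b)
    distance = {table: 0}
    queue = deque([table])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, []):
            if nxt not in distance:  # first visit in BFS = fewest hops
                distance[nxt] = distance[current] + 1
                queue.append(nxt)
    del distance[table]  # report dependencies only, not the table itself
    return distance


QUERIES = [
    "INSERT OVERWRITE TABLE sales_clean SELECT * FROM raw_sales WHERE amount > 0",
    "INSERT INTO TABLE sales_report SELECT s.*, c.region FROM sales_clean s "
    "JOIN dim_customer c ON s.customer_id = c.id",
    "SELECT COUNT(*) FROM sales_report",
]
edges = edges_from_queries(QUERIES)
print(levels_away("raw_sales", edges))
# {'sales_clean': 1, 'sales_report': 2}
print(sorted(levels_away("sales_report", edges, direction="upstream").items()))
# [('dim_customer', 1), ('raw_sales', 2), ('sales_clean', 1)]
```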

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: No
Free edition: No
Hadoop: No
NoSQL: No
Pipelines lineage: No
RDBMS: Yes

Cloudera Navigator

Cloudera Navigator is the complete data governance solution for Hadoop. Cloudera Navigator automatically collects audit logs from across the entire platform and maintains a full history, with a unified, searchable audit dashboard for simple, point-in-time visibility. With automatic collection and visualization of column-level lineage, users can also quickly identify the origin, usage, and impact of a dataset.
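Column-level lineage narrows impact analysis compared with table-level lineage. The Python sketch below is a hypothetical illustration, not Cloudera Navigator's data model or API: it assumes column lineage is available as a mapping from each `table.column` to the columns it is computed from, and it finds every column derived from a changed source column. All table and column names are made up.

```python
# Hypothetical column-level lineage: each "table.column" maps to the columns
# it is computed from. All table and column names are illustrative.
COLUMN_LINEAGE = {
    "sales_report.revenue_eur": ["sales_clean.amount", "fx_rates.eur_rate"],
    "sales_report.region": ["dim_customer.region"],
    "sales_clean.amount": ["raw_sales.amount"],
    "dim_customer.region": ["raw_customers.country"],
}


def affected_columns(changed, lineage):
    """Return every column derived, directly or indirectly, from `changed`."""
    affected, frontier = set(), {changed}
    while frontier:
        # Next wave: columns not yet seen that read from the current wave.
        frontier = {
            target
            for target, sources in lineage.items()
            if target not in affected and frontier.intersection(sources)
        }
        affected |= frontier
    return affected


impacted = affected_columns("raw_customers.country", COLUMN_LINEAGE)
print(sorted(impacted))  # ['dim_customer.region', 'sales_report.region']
print("sales_report.revenue_eur" in impacted)  # False
```

Table-level lineage would only report that `sales_report` is affected; the column-level view also shows that `sales_report.revenue_eur` does not depend on the changed column.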

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: No
Free edition: No
Hadoop: Yes
NoSQL: No
Pipelines lineage: Yes
RDBMS: No

erwin Data Catalog

erwin Data Catalog automates enterprise metadata management, data mapping, code generation, and data lineage for faster time to value and greater accuracy for data movement and deployment projects. It lets you generate on-demand lineage down to the column level and visualize data flows from source systems all the way to the reporting layers, including all transformations. Fully configurable and navigable lineage diagrams provide high-level business views as well as detailed technical depictions.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: Yes
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: No
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

IBM InfoSphere Information Governance Catalog

IBM InfoSphere Information Governance Catalog is a web-based tool that allows you to explore, understand, and analyze information. It lets you run data lineage to create trusted information that supports data governance and compliance efforts. You can perform lineage analysis to understand where data comes from or goes to by using shared table information, job design information, or operational metadata from job runs.
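Operational metadata from job runs can be turned into lineage by recording each run's inputs and outputs. The Python sketch below assumes a hypothetical `JobRun` record (job name, run ID, finish time, input tables, output tables); it is not IBM's schema or API. It treats every input of a run as feeding every output, which is a coarse, job-level approximation, and it also reports which run last wrote a given table.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple


@dataclass(frozen=True)
class JobRun:
    """Hypothetical operational metadata captured for one job run."""
    job: str
    run_id: str
    finished_at: datetime
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def lineage_edges(runs):
    """Return (source, target, job) edges, assuming every input feeds every output."""
    return {
        (src, dst, run.job)
        for run in runs
        for src in run.inputs
        for dst in run.outputs
    }


def last_writer(table, runs) -> Optional[JobRun]:
    """Return the most recent run that wrote to `table`, or None if none did."""
    writers = [run for run in runs if table in run.outputs]
    return max(writers, key=lambda run: run.finished_at, default=None)


# Illustrative run history; job names, run IDs, and timestamps are made up.
RUNS = [
    JobRun("load_sales", "r1", datetime(2024, 1, 1, 2, 0), ("raw_sales",), ("sales_clean",)),
    JobRun("build_report", "r2", datetime(2024, 1, 1, 3, 0),
           ("sales_clean", "dim_customer"), ("sales_report",)),
    JobRun("load_sales", "r3", datetime(2024, 1, 2, 2, 0), ("raw_sales",), ("sales_clean",)),
]

for edge in sorted(lineage_edges(RUNS)):
    print(edge)
print(last_writer("sales_clean", RUNS).run_id)  # r3
```

Repeated runs of the same job collapse into one edge because the edges are collected in a set, while `last_writer` keeps the run-level detail needed to see when data last moved.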

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: No
Pipelines lineage: No
RDBMS: Yes

Kylo

Kylo is an open-source, enterprise-ready data lake management software platform. It lets you search and explore data and metadata, view lineage, and profile statistics. Visual process lineage and provenance provide confidence in the origin of data. Automatic data profiling gives data scientists insight into the data and provides assurance of data quality.

BI Tools lineage: No
Commercial: Free
Data migration tools lineage: No
Data warehouses lineage: No
ETLs: No
Free edition: Yes
Hadoop: Yes
NoSQL: No
Pipelines lineage: No
RDBMS: Yes

Alation Data Catalog

Alation Data Catalog provides data lineage capabilities that help organizations visualize and understand the flow of data across various systems and processes. It automatically captures metadata, tracks how data moves and is transformed from source to destination, and offers intuitive visualizations that make it easier for users to trace data origins, transformations, and usage.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: Yes
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Data lineage forms the foundation for accurate data analytics and management. The core features of data lineage focus on:

• Identifying data quality issues.
• Performing root cause analysis.
• Understanding which data sources are outdated or which datasets are relevant.
• Minimizing the risk of migration projects.
• Providing transparency over the life cycle of data.

Data lineage tools map the data flow and help you understand where data originates, how it flows, and how it is transformed. To help you find the right tool for your company, we prepared the list above of some of the best data lineage tools.