Data lineage tools for Apache Spark
Data lineage tools are software that lets you extract, view, and analyze data lineage. Data lineage is the practice of understanding and visualizing how data flows from its source to its various destinations. It lets you build a map of the data's journey through the entire ecosystem.
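As a rough illustration of the "map" idea, lineage can be modeled as a directed graph whose edges point from a source dataset to the dataset derived from it. The sketch below is a minimal, tool-independent example: the dataset names and the `build_graph` and `downstream` helpers are hypothetical and are not taken from any product listed here.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source dataset, derived dataset).
EDGES = [
    ("raw.orders", "staging.orders"),
    ("raw.customers", "staging.customers"),
    ("staging.orders", "mart.sales"),
    ("staging.customers", "mart.sales"),
    ("mart.sales", "report.revenue"),
]


def build_graph(edges):
    """Map each dataset to the list of datasets derived directly from it."""
    graph = defaultdict(list)
    for source, target in edges:
        graph[source].append(target)
    return graph


def downstream(graph, start):
    """Return every dataset reachable from `start`, in breadth-first order."""
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # Everything that would be affected by a change to raw.orders.
    print(downstream(build_graph(EDGES), "raw.orders"))
    # → ['staging.orders', 'mart.sales', 'report.revenue']
```

Walking the graph in this direction answers the impact question ("what depends on this source?"); real tools typically store far richer metadata on each node and edge.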
Dataedo
Dataedo allows you to extract lineage automatically or design flows manually and visualize how data moves through the system with interactive diagrams. Dataedo supports object and column-level data lineage.
License: Commercial
Cloudera Navigator
Cloudera Navigator is the complete data governance solution for Hadoop. Cloudera Navigator automatically collects audit logs from across the entire platform and maintains a full history, with a unified, searchable audit dashboard for simple, point-in-time visibility. With automatic collection and visualization of column-level lineage, users can also quickly identify the origin, usage, and impact of a dataset.
License: Commercial
Kylo
Kylo is an open source enterprise-ready data lake management software platform. It lets you search and explore data and metadata, view lineage, and profile statistics. Visual process lineage and provenance provide confidence in the origin of data. Automatic data profiling provides capabilities for data scientists and assurance in data quality.
License: Free
IBM Watson Knowledge Catalog
IBM Watson Knowledge Catalog is an open and intelligent data catalog for managing enterprise data that also helps you keep data lineage well structured and maintained. It lets you track where data originated and how it’s consumed, increasing trust when accessing data across many sources and destinations.
License: Commercial
Talend Data Catalog
Talend Data Catalog gives your organization a single, secure point of control for your data. Its data flow lineage feature allows you to narrow in on specific objects and shows you how these objects are related to each other, within a model, an external metadata repository, or a configuration. The data flow lineage is based upon connection definitions to data stores and physical transformation rules which transform and move the data.
License: Commercial
Alteryx Connect
Alteryx Connect uses powerful search capabilities to find and reuse information contained in data files, databases, visualizations, dashboards, workflows, analytic apps, and more. It lets you automatically capture and visualize data lineage between assets, improving the overall quality and reliability of shared information between data, process, and people. You can get technical data lineage by loading metadata from source and target systems and interpreting Alteryx workflows.
License: Commercial
OpenLineage
OpenLineage is an open platform for collection and analysis of data lineage. It tracks metadata about datasets, jobs, and runs, giving users the information required to identify the root cause of complex issues and understand the impact of changes. OpenLineage contains an open standard for lineage data collection, a metadata repository reference implementation (Marquez), libraries for common languages, and integrations with data pipeline tools.
License: Free
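To make the dataset, job, and run model more concrete, the sketch below builds a lineage event as a plain Python dictionary, loosely following the field names of an OpenLineage run event (`eventType`, `eventTime`, `run`, `job`, `inputs`, `outputs`, `producer`). It deliberately avoids the OpenLineage client library. The namespaces, dataset names, producer URL, and the `make_run_event` helper are assumptions made for illustration, and the official specification should be consulted for the exact schema.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_name, inputs, outputs, run_id=None):
    """Build a dict shaped like an OpenLineage run event (illustrative only)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # One run is one execution of a job; a fresh UUID is used if none is given.
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": "example-spark", "name": job_name},  # hypothetical namespace
        "inputs": [{"namespace": "example-lake", "name": n} for n in inputs],
        "outputs": [{"namespace": "example-lake", "name": n} for n in outputs],
        "producer": "https://example.com/lineage-sketch",  # placeholder URL
    }


if __name__ == "__main__":
    event = make_run_event(
        "COMPLETE", "daily_sales", ["raw.orders"], ["mart.sales"]
    )
    print(json.dumps(event, indent=2))
```

A collector that receives a stream of such events can connect each job's inputs to its outputs and so reconstruct the lineage graph across runs.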
Informatica Metadata Management
Informatica Metadata Manager is a web-based metadata management tool. You can view data lineage for objects in the Metadata Manager warehouse. Data lineage shows the origin of the data, describes the path, and shows how it arrives at the target. Use data lineage to analyze data flow and troubleshoot data transformation errors.
License: Commercial
Data lineage forms the foundation for accurate data analytics and management. Its core capabilities focus on:
• Identifying data quality issues.
• Performing root cause analysis.
• Understanding which data sources are outdated and which datasets are relevant.
• Minimizing the risk of migration projects.
• Providing transparency over the life cycle of data.
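Root cause analysis, for example, amounts to walking the lineage graph upstream from the dataset where a problem was noticed. The following is a minimal sketch, assuming lineage is available as a mapping from each dataset to its direct inputs; the `UPSTREAM` mapping, the dataset names, and the `root_sources` helper are hypothetical.

```python
# Hypothetical lineage: each dataset mapped to the datasets it is built from.
UPSTREAM = {
    "report.revenue": ["mart.sales"],
    "mart.sales": ["staging.orders", "staging.customers"],
    "staging.orders": ["raw.orders"],
    "staging.customers": ["raw.customers"],
}


def root_sources(upstream, dataset):
    """Return the original sources (datasets with no inputs) feeding `dataset`."""
    roots, seen, stack = set(), set(), [dataset]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = upstream.get(node, [])
        # A node with no recorded inputs is treated as an original source.
        if not parents and node != dataset:
            roots.add(node)
        stack.extend(parents)
    return sorted(roots)


if __name__ == "__main__":
    # Where could a wrong number in the revenue report have come from?
    print(root_sources(UPSTREAM, "report.revenue"))
    # → ['raw.customers', 'raw.orders']
```

The same traversal narrows a data quality investigation to the handful of sources that can actually influence the affected dataset.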
Data lineage tools map the data flow and help you understand where the data originated, how it moves, and how it is transformed. To help you find the right tool for your company, we prepared this list of some of the best data lineage tools.