Data lineage tools for Cloudera Impala

Data lineage tools are software for extracting, viewing, and analyzing data lineage. Data lineage is the process of understanding and visualizing how data flows from its source to its various destinations. It produces a map of the data's journey through the entire ecosystem.
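One way to picture this map is as a directed graph in which each dataset points to the datasets it feeds. The sketch below is only an illustration of that idea, not how any tool listed here works internally; the dataset names and the `downstream` helper are hypothetical.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the datasets listed in its value.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["reports.churn_dashboard", "reports.revenue"],
}

def downstream(graph, source):
    """Return every dataset reachable from `source`, in breadth-first order."""
    seen = []
    queue = deque(graph.get(source, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(graph.get(node, []))
    return seen

if __name__ == "__main__":
    # Impact analysis: what is affected if staging.customers changes?
    for dataset in downstream(LINEAGE, "staging.customers"):
        print(dataset)
```

Walking the same graph with the edges reversed would answer the opposite question: where a given report's data originally came from.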

MetaCenter

MetaCenter automates data lineage analysis across Databases, ETL, Business Intelligence, Cloud, and Hadoop environments. It lets you reduce data management costs by automating data lineage and impact analysis documentation.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

Keboola

Keboola is a cloud-based data integration platform that helps clients combine, enhance, and publish crucial information for their internal analytics projects and data products quickly and easily. It collects all kinds of operational metadata describing user activity, job activity, data flow, schema evolution, data pipeline performance, compliance with a client’s security rules, and more. Based on this metadata, Keboola builds data lineage automatically and on the fly. This makes it possible to understand where the data comes from and how it is used, both for analytical and regulatory purposes.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: Yes
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Informatica Metadata Management

Informatica Metadata Manager is a web-based metadata management tool. You can view data lineage for objects in the Metadata Manager warehouse. Data lineage shows the origin of the data, describes the path, and shows how it arrives at the target. Use data lineage to analyze data flow and troubleshoot data transformation errors.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: Yes
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

IBM Watson Knowledge Catalog

IBM Watson Knowledge Catalog is an open and intelligent data catalog for managing enterprise data that also helps you keep data lineage well structured and maintained. It lets you track where data originated and how it’s consumed, increasing trust when accessing data across many sources and destinations.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Alation Data Catalog

Alation is a powerful data lineage tool that helps organizations visualize and understand the flow of data across various systems and processes. It automatically captures metadata, tracks how data moves and transforms from source to destination, and offers intuitive visualizations that make it easier for users to trace data origins, transformations, and usage.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: Yes
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

SQLFlow

SQLFlow is an online SQL data lineage tool that visually represents the overall flow of data. It offers automated SQL data lineage analysis across databases, ETL, business intelligence, cloud, and Hadoop environments by parsing SQL scripts and stored procedures. It enables impact analysis at a granular level, drilling down into table-, column-, and query-level lineage.
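To give a feel for what deriving lineage from SQL means, here is a deliberately simplified sketch that pulls table-level lineage out of a single INSERT ... SELECT statement. It uses regular expressions rather than a real SQL parser, so it is not SQLFlow's method and will miss subqueries, CTEs, and quoted identifiers; the `table_lineage` function and the table names are hypothetical.

```python
import re

# Target table of an INSERT INTO / INSERT OVERWRITE [TABLE] statement.
TARGET_RE = re.compile(
    r"\binsert\s+(?:into|overwrite)\s+(?:table\s+)?([\w.]+)", re.IGNORECASE
)
# Plain table names that follow FROM or JOIN (subqueries are not handled).
SOURCE_RE = re.compile(r"\b(?:from|join)\s+([\w.]+)", re.IGNORECASE)

def table_lineage(sql):
    """Return (target, sorted sources) for one INSERT ... SELECT statement."""
    target = TARGET_RE.search(sql)
    if target is None:
        raise ValueError("no INSERT target found")
    sources = sorted({name.lower() for name in SOURCE_RE.findall(sql)})
    return target.group(1).lower(), sources

EXAMPLE_SQL = """
INSERT INTO sales.daily_totals
SELECT o.order_date, SUM(o.amount)
FROM raw.orders o
JOIN raw.customers c ON o.customer_id = c.id
GROUP BY o.order_date
"""

if __name__ == "__main__":
    target, sources = table_lineage(EXAMPLE_SQL)
    print(target, "<-", ", ".join(sources))
```

Column-level lineage needs a full parse of the SELECT list and its expressions, which is where dedicated parsers earn their keep.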

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: No
Pipelines lineage: No
RDBMS: Yes

OvalEdge

OvalEdge offers a comprehensive lineage solution that shows the complete data cycle. OvalEdge algorithms parse various kinds of source code to build the lineage automatically, and experts then enhance it with proper descriptions.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

Alteryx Connect

Alteryx Connect uses powerful search capabilities to find and reuse information contained in data files, databases, visualizations, dashboards, workflows, analytic apps, and more. It lets you automatically capture and visualize data lineage between assets, improving the overall quality and reliability of shared information between data, process, and people. You can get technical data lineage by loading metadata from source and target systems and interpreting Alteryx workflows.

BI Tools lineage: Yes
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

Talend Data Catalog

Talend Data Catalog gives your organization a single, secure point of control for your data. Its data flow lineage feature allows you to narrow in on specific objects and shows you how these objects are related to each other, within a model, an external metadata repository, or a configuration. The data flow lineage is based upon connection definitions to data stores and physical transformation rules which transform and move the data.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: No
ETLs: Yes
Free edition: No
Hadoop: Yes
NoSQL: Yes
Pipelines lineage: Yes
RDBMS: Yes

erwin Data Catalog

erwin Data Catalog automates enterprise metadata management, data mapping, code generation, and data lineage for faster time to value and greater accuracy for data movement and deployment projects. It lets you generate on-demand lineage down to the column level and visualize data flows from source systems all the way to the reporting layers, including all transformations. Fully configurable and navigable lineage diagrams provide high-level business views as well as detailed technical depictions.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: Yes
Data warehouses lineage: Yes
ETLs: Yes
Free edition: No
Hadoop: No
NoSQL: Yes
Pipelines lineage: No
RDBMS: Yes

Cloudera Navigator

Cloudera Navigator is the complete data governance solution for Hadoop. Cloudera Navigator automatically collects audit logs from across the entire platform and maintains a full history, with a unified, searchable audit dashboard for simple, point-in-time visibility. With automatic collection and visualization of column-level lineage, users can also quickly identify the origin, usage, and impact of a dataset.

BI Tools lineage: No
Commercial: Commercial
Data migration tools lineage: No
Data warehouses lineage: Yes
ETLs: No
Free edition: No
Hadoop: Yes
NoSQL: No
Pipelines lineage: Yes
RDBMS: No

Data lineage forms the foundation for accurate data analytics and management. The core features of data lineage focus on:

• Identifying data quality issues.
• Performing root cause analysis.
• Helping you understand which data sources are outdated and which datasets are relevant.
• Minimizing the risk of migration projects.
• Providing transparency over the life cycle of data.

Data lineage tools map the data flow and help you understand where the data originated and how it flows and transforms. To help you find the right tool for your company, we prepared this list of some of the best data lineage tools.