Data catalogs

List of data catalog tools

A data catalog is a structured inventory of the data used by an organization. It works like a library for data: assets are indexed, well organized, and securely stored. Most data catalog tools record where the data comes from, how it is used, the relationships between entities, and data lineage. Lineage describes where the data originates and tracks the changes it undergoes on the way to its final form.
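Lineage can be pictured as a graph in which each asset points to the assets it was derived from. The Python sketch below assumes a hypothetical in-memory catalog (the `Asset` record, the asset names, and their sources are invented for illustration) and walks that graph breadth-first to list everything upstream of a given asset. Real catalog tools store and populate lineage in their own ways.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Asset:
    """Hypothetical catalog entry: a data asset, its source, and its inputs."""
    name: str
    source: str
    upstream: list = field(default_factory=list)  # names of assets this one is derived from


def trace_lineage(catalog, name):
    """Return all upstream assets of `name`, nearest first (breadth-first search)."""
    seen = []
    queue = deque(catalog[name].upstream)
    while queue:
        current = queue.popleft()
        if current in seen:  # skip assets reached through more than one path
            continue
        seen.append(current)
        queue.extend(catalog[current].upstream)
    return seen


# Invented example assets; the sources are placeholders, not real systems.
catalog = {
    "raw_orders": Asset("raw_orders", "postgres: sales.orders"),
    "raw_customers": Asset("raw_customers", "csv: customers.csv"),
    "clean_orders": Asset("clean_orders", "warehouse", ["raw_orders"]),
    "revenue_report": Asset("revenue_report", "warehouse", ["clean_orders", "raw_customers"]),
}

print(trace_lineage(catalog, "revenue_report"))
# ['clean_orders', 'raw_customers', 'raw_orders']
```

Breadth-first order lists direct inputs before their own sources, which mirrors how lineage views usually read from the final report back toward the raw data.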

Google Cloud Data Catalog

Google Cloud Data Catalog is a fully managed, highly scalable data discovery and metadata management service that helps organizations quickly discover, manage, and understand all their data in Google Cloud. It offers a simple search interface for data discovery and a flexible cataloging system for capturing both technical and business metadata.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: No
Commercial: Commercial
Data Classification: Yes
Data Lineage: No
Data Profiling: No
Export: -
Free edition: Yes
Rating of assets: No

Octopai

Octopai provides data catalog functionality that significantly reduces the manual overhead of cataloging. It automates metadata discovery and data lineage, and quickly assembles the metadata needed to create an effective data catalog.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: No
Commercial: Commercial
Data Classification: No
Data Lineage: Yes
Data Profiling: No
Export: -
Free edition: No
Rating of assets: No

Azure Data Catalog

Azure Data Catalog is an enterprise-wide metadata catalog that makes data asset discovery straightforward. It’s a fully-managed cloud service that lets any user (analyst, data scientist, or developer) register, enrich, discover, understand, and consume data sources.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: No
Data Lineage: No
Data Profiling: Yes
Export: JSON
Free edition: Yes
Rating of assets: No

Qlik Data Catalyst

Qlik Data Catalyst is a metadata-driven data catalog with technical and business descriptions, data profiles, data lineage, and data tags that make data search and delivery simple. It builds a secure, enterprise-scale catalog of all the data your organization has available for analytics, no matter where it is stored.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: Yes
Data Lineage: Yes
Data Profiling: Yes
Export: CSV, JSON, XML
Free edition: No
Rating of assets: Yes

erwin Data Catalog

erwin Data Catalog (erwin DC) automates the processes involved in harvesting, integrating, activating, and governing enterprise data according to business requirements. In short, erwin DC creates and maintains a sustainable metadata foundation for data preparation, management, governance, and consumption, automating manual tasks to improve efficiency and quality and to shorten time to value for data development and deployment.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: Yes
Data Lineage: Yes
Data Profiling: Yes
Export: MS Excel, PDF
Free edition: No
Rating of assets: No

Tree Schema

The Tree Schema data catalog provides all the essential catalog capabilities, including rich-text documentation, data lineage, assigning data stewards and technical owners to your data assets, tagging your assets, and much more. You can point Tree Schema to your database and fully populate your catalog in under 5 minutes. Tree Schema also supports non-traditional data sources, including S3, Kafka, and DynamoDB.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: Yes
Data Lineage: Yes
Data Profiling: Yes
Export: Plain text
Free edition: Yes
Rating of assets: No

Atlan

Atlan is a modern, cloud-native data catalog. Its ease of use and intuitive interface enable diverse personas, including engineers, data stewards, and business users, to discover, understand, and trust data. Atlan uses machine learning and an ecosystem of bots to automate documentation and stewardship tasks such as data profiling, data quality alerts, and glossary tagging. It is built on an Open API architecture and has a pay-as-you-go pricing model, making it a good fit for teams of all sizes.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: Yes
Data Lineage: Yes
Data Profiling: Yes
Export: -
Free edition: Yes
Rating of assets: Yes

Sidecar

Sidecar data catalog is a modern, intuitive, and comprehensive tool for discovering, classifying, and enriching all your data assets. It automatically indexes all the metadata in a centralized repository and gives data consumers a clear view of complex, fragmented data management environments. With the Business Glossary capabilities, you can enrich your data assets with business context, clear definitions, rules, and transformations. The collaboration tool fosters teamwork by letting the user community exchange ideas about data assets. You can add custom tags and classifications to organize your assets and manage sensitive data in compliance with GDPR, HIPAA, CPRA, etc. As a result, data consumers spend less time searching for the right data and can focus on delivering insights faster.

Automated Cataloging: Yes
Business Glossary: Yes
Commenting/Community: Yes
Commercial: Commercial
Data Classification: Yes
Data Lineage: Yes
Data Profiling: No
Export: CSV, XML
Free edition: No
Rating of assets: Yes

Data catalogs are one category of data management tools. They automate metadata management and present metadata in a user-friendly form that makes data easy to understand, even for non-IT members of the organization.

The key function of a data catalog is to provide metadata context in a way that lets different teams within the organization, both IT and non-IT, discover and understand relevant data.

From the organization's perspective, other important functions of data catalog tools include:
• storage of data resources from different repositories and engine systems, which requires compatibility with multiple connectors,
• automation of data management processes,
• advanced resource search by name, type, date of change, owner, etc.,
• data lineage,
• automated data classification,
• discovery of data relationships and dependencies between objects,
• a business glossary that unifies nomenclature and definitions of terms,
• data profiling.
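Advanced resource search amounts to filtering catalog entries on their metadata. The minimal Python sketch below assumes a hypothetical `CatalogEntry` record with name, type, owner, and last-modified fields (the entries themselves are invented); every filter is optional, and only entries matching all supplied filters are returned. Commercial tools typically back this with a search index rather than a linear scan.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical metadata record for one data asset."""
    name: str
    asset_type: str  # e.g. "table", "view", "file"
    owner: str
    modified: date   # date of the last change


def search(entries: List[CatalogEntry],
           name_contains: Optional[str] = None,
           asset_type: Optional[str] = None,
           owner: Optional[str] = None,
           modified_after: Optional[date] = None) -> List[CatalogEntry]:
    """Return the entries that match every filter that was supplied."""
    results = []
    for entry in entries:
        if name_contains is not None and name_contains.lower() not in entry.name.lower():
            continue
        if asset_type is not None and entry.asset_type != asset_type:
            continue
        if owner is not None and entry.owner != owner:
            continue
        if modified_after is not None and entry.modified <= modified_after:
            continue
        results.append(entry)
    return results


# Invented example entries.
entries = [
    CatalogEntry("sales_orders", "table", "alice", date(2023, 5, 2)),
    CatalogEntry("sales_summary", "view", "bob", date(2023, 6, 10)),
    CatalogEntry("customer_export", "file", "alice", date(2022, 11, 20)),
]

recent_sales = search(entries, name_contains="sales", modified_after=date(2023, 6, 1))
print([e.name for e in recent_sales])  # ['sales_summary']
```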

Data stewards, business teams, and data analysts often struggle to determine what specific data means, where it comes from, and which elements it is directly related to. Data catalog tools were created to solve these and similar problems. Based on the imported repositories, data catalogs automate the cataloging and organization of data, removing the need for time-consuming manual querying of the resources.
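Automated cataloging starts by reading metadata directly from a connected repository instead of documenting it by hand. As a minimal sketch, the Python code below uses the standard-library `sqlite3` module to harvest table and column names from a SQLite database. The choice of SQLite, the `harvest_metadata` helper, and the example tables are assumptions for illustration; catalog products use their own connectors for each source system.

```python
import sqlite3


def harvest_metadata(conn):
    """Return {table_name: [(column_name, declared_type), ...]} for every table."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Quote the identifier, doubling any embedded double quotes.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        catalog[table] = [(col[1], col[2]) for col in columns]
    return catalog


# Invented example schema in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")
print(harvest_metadata(conn))
```

Running the harvester on a schedule is one simple way to keep such a catalog in sync as tables are added or altered.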

To avoid misunderstandings, data catalog tools provide a business glossary that systematizes the nomenclature. It contains business terms along with their definitions, their relationships to each other, and their location in the hierarchy of all data assets.
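A glossary entry of this kind can be modeled as a term with a definition, a link to a broader parent term, links to related terms, and links to the data assets it describes. The Python sketch below is a hypothetical structure, not any vendor's schema; the terms and definitions are invented, and the hierarchy is assumed to be a tree with no cycles.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class GlossaryTerm:
    """Hypothetical business glossary entry."""
    name: str
    definition: str
    parent: Optional[str] = None                      # broader term one level up
    related: List[str] = field(default_factory=list)  # related business terms
    assets: List[str] = field(default_factory=list)   # data assets described by this term


def hierarchy_path(glossary: Dict[str, GlossaryTerm], term: str) -> str:
    """Return the term's position in the hierarchy, e.g. 'Finance > Revenue'.

    Assumes the parent links form a tree (no cycles).
    """
    path = []
    current: Optional[str] = term
    while current is not None:
        path.append(current)
        current = glossary[current].parent
    return " > ".join(reversed(path))


# Invented example terms.
glossary = {
    "Finance": GlossaryTerm("Finance", "Terms used by the finance department."),
    "Revenue": GlossaryTerm("Revenue", "Income from sales of goods and services.", parent="Finance"),
    "Net Revenue": GlossaryTerm(
        "Net Revenue",
        "Revenue after returns, discounts, and allowances.",
        parent="Revenue",
        related=["Gross Revenue"],
        assets=["revenue_report"],
    ),
}

print(hierarchy_path(glossary, "Net Revenue"))  # Finance > Revenue > Net Revenue
```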

There are many data catalog applications on the market. We have listed comprehensive data cataloging software that can also handle data profiling, data lineage, and data classification, as well as open-source data catalog tools.