Data Governance tools for Apache Hive

List of data governance tools

Data governance is an organization's strategy for handling its data. It is a set of rules, policies, standards, and practices whose main purpose is to ensure that data is of high quality and integrity, is stored safely, and that common terms have no ambiguity in meaning. Applying this strategy is a long process that engages the whole organization, especially IT and data-consuming departments. Data governance tools help put these theoretical plans into practice.

Dataedo

Dataedo is a solution for supporting data governance initiatives within organizations. With its advanced features, Dataedo allows you to create comprehensive data catalogs, analyze data lineage, and address metadata management and regulatory compliance. Dataedo enables your organization to manage data assets efficiently and maintain trustworthiness across all data processes.

Access control: Yes
Business Glossary: Yes
Change history: Yes
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: No
Web Access: Yes

Collibra Data Governance

Collibra Data Governance helps organizations understand their ever-growing amounts of data in a way that scales with growth and change so that teams can trust and use their data to improve their business. Through automating many governance and stewardship tasks, Collibra Governance enables businesses to establish a true governance foundation and trust their data while they grow.

Access control: Yes
Business Glossary: Yes
Change history: No
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

Alation Data Catalog

Alation’s Active data governance is a people-first approach that focuses on business needs, delivering trusted data, and driving adoption by embedding governance in the users’ day-to-day activities — effectively facilitating compliance and managing risks. By inventorying, classifying, and curating data and knowledge, Alation provides unparalleled visibility into enterprise data assets. It combines machine learning and crowdsourcing to automate and accelerate data stewardship, data classification, business glossary, and data quality documentation.

Access control: Yes
Business Glossary: Yes
Change history: No
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: No
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

IBM InfoSphere Information Governance Catalog

IBM InfoSphere Information Governance Catalog provides the entry point for an organization to understand and govern its information. It is a web-based tool that allows you to discover, manage and analyze assets in your enterprise. Together with other components of IBM InfoSphere Information Server, it delivers unified governance by enabling you to explore, cleanse, analyze, and govern your data.

Access control: Yes
Business Glossary: Yes
Change history: Yes
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

Cloudera Navigator

Cloudera Navigator is the only integrated data management and governance solution for big data and Apache Hadoop, offering critical capabilities such as data discovery, continuous optimization, audit, lineage, metadata management, and policy enforcement. As part of Cloudera Enterprise, Cloudera Navigator is critical to enabling high-performance agile analytics, supporting continuous data architecture optimization, and meeting regulatory compliance requirements.

Access control: Yes
Business Glossary: No
Change history: No
Commenting/Community: Yes
Data Catalog: No
Data Classification: Yes
Data Lineage: Yes
Data Mapping: No
Data Profiling: No
Data Quality Management: No
Support for workflow: No
Web Access: Yes

OneTrust

OneTrust DataGovernance is built on a unified platform that helps organizations transform existing compliance initiatives into data intelligence. It creates business value with a central solution for establishing a trusted data foundation and empowering insights-led decision-making in your business. OneTrust supports data governance through a combination of its products, so organizations can pick the ones they need to build comprehensive data governance.

Access control: Yes
Business Glossary: Yes
Change history: No
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

Io-Tahoe

Io-Tahoe is an enterprise smart data discovery and AI-driven data catalog product that enables enterprises to accelerate to next-generation data management practices, radically improving data governance and regulatory compliance. It ensures that business and technology understand critical data elements and have control over the enterprise data landscape.

Access control: Yes
Business Glossary: Yes
Change history: Yes
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

Talend Data Fabric

Talend Data Fabric combines data integration, integrity, and governance in a single, unified platform. It simplifies data quality and security with built-in functionality for making sure your insights are trusted, governed, and actionable.

Access control: No
Business Glossary: Yes
Change history: No
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: Yes

Truedat

Truedat is an open-source data governance tool that lets you organize & enrich information through configurable workflows, data quality controls & execution, business glossary, semantic mapping, data catalog & profiling, lineage & impact analysis, and similar functionality.

Access control: No
Business Glossary: Yes
Change history: No
Commenting/Community: Yes
Data Catalog: Yes
Data Classification: Yes
Data Lineage: Yes
Data Mapping: Yes
Data Profiling: Yes
Data Quality Management: Yes
Support for workflow: Yes
Web Access: No

Data governance software is a broad category of solutions whose main purpose is to:

• Develop and maintain common data language,
• Ensure compliance with legal regulations,
• Track data lineage,
• Assign roles and responsibilities,
• Improve data quality.

Because data governance is a framework that should be adjusted to each organization individually, no single tool suits all companies. Some may require a business glossary or data catalog, while others will benefit more from advanced data lineage. Although some vendors provide all-in-one data governance platforms, in some cases it may be better to select more specialized software that covers only part of the requirements mentioned above. Some examples of such tools are:

• Data dictionaries,
• Data modelers,
• Business glossaries,
• Data policies managers,
• Sensitive data discovery tools,
• Legal compliance software,
• Data quality tools.

Implementing a data governance framework is not an easy process, yet the benefits make it a good investment. A common understanding of data is one of the greatest outcomes: it improves communication between departments and leads to more precise analytics. It is not uncommon for IT data teams not to understand what they are working on, what the KPIs are, or what some terms mean. With data governance software, business users can easily share their knowledge with engineering teams.

Better data quality and clearly assigned responsibility for each piece of information are another game-changer for many organizations: any data discrepancy can be quickly identified and reported to the right person. From a formal point of view, data governance tools help with compliance with legal regulations such as the European GDPR or the Indian PDPB.
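
As a rough illustration of how a discrepancy can be routed to the right person, here is a minimal Python sketch. It is not taken from any of the tools listed above; the ownership register, the dataset name, the e-mail address, and the function name are all hypothetical, and the only check performed is for missing values in required columns.

```python
from dataclasses import dataclass


@dataclass
class QualityIssue:
    dataset: str
    column: str
    problem: str
    owner: str  # the person or team the issue should be reported to


# Hypothetical ownership register: dataset name -> responsible steward.
DATA_OWNERS = {"sales.orders": "sales-data-steward@example.com"}


def find_missing_value_issues(dataset, rows, required_columns, owners=DATA_OWNERS):
    """Return one issue per required column that has missing values."""
    issues = []
    for column in required_columns:
        missing = sum(1 for row in rows if row.get(column) is None)
        if missing:
            issues.append(
                QualityIssue(
                    dataset=dataset,
                    column=column,
                    problem=f"{missing} missing value(s)",
                    # Fall back to "unassigned" when nobody owns the dataset.
                    owner=owners.get(dataset, "unassigned"),
                )
            )
    return issues


if __name__ == "__main__":
    sample_rows = [{"id": 1, "amount": None}, {"id": 2, "amount": 5}]
    for issue in find_missing_value_issues("sales.orders", sample_rows, ["id", "amount"]):
        print(f"{issue.dataset}.{issue.column}: {issue.problem} -> {issue.owner}")
```

A dataset without an entry in the register is reported as "unassigned", which makes gaps in ownership visible too.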

Finally, a striking example of poor data governance. In December 1998, NASA launched the Mars Climate Orbiter, a spacecraft worth $125 million. In September 1999, as it approached Mars orbit, the team on Earth lost contact with it. The reason was mundane: one team used metric units while another used imperial units (pound-force rather than newtons), which led to incorrect calculations of the spacecraft's trajectory. Although most organizations will not send a spacecraft to Mars, it is still very important to ensure that common practices and terminology are followed within an organization, and the data governance software listed above can help with that.
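
One way to guard against a unit mismatch like the one in the Mars Climate Orbiter story is to never pass bare numbers between teams, but values that carry their unit. The Python sketch below is only an illustration of that idea, not a reconstruction of NASA's software; the class name, the unit labels, and the choice of impulse (newton-seconds versus pound-force seconds) as the quantity are assumptions made for the example.

```python
from dataclasses import dataclass

# 1 pound-force second expressed in newton-seconds.
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class Impulse:
    """An impulse value that always travels together with its unit."""
    value: float
    unit: str  # "N*s" (metric) or "lbf*s" (imperial); labels are illustrative

    def to_newton_seconds(self) -> float:
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        # An unknown unit is rejected instead of being silently treated as metric.
        raise ValueError(f"unknown unit: {self.unit!r}")


def total_impulse_newton_seconds(readings):
    """Sum readings from different sources after converting them to N*s."""
    return sum(r.to_newton_seconds() for r in readings)


if __name__ == "__main__":
    readings = [Impulse(10.0, "N*s"), Impulse(2.0, "lbf*s")]
    print(total_impulse_newton_seconds(readings))
```

The same principle applies to a business glossary: a figure is only meaningful once everyone agrees on, and records, what it measures and in which unit.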