Data profiling tools for Apache ORC
Data profiling tools analyze, monitor, and review data from existing databases to provide critical insights. Data profiling can help organizations improve data quality and decision-making by identifying problems early and addressing them before they escalate.
Dataedo
Dataedo is a metadata management and data catalog tool with a data profiling feature. It lets you use sample data to learn what is stored in your data assets. You can browse minimum, maximum, average and median values, see top values, and view value and row distributions to understand the data better before using it.
| Feature | Dataedo |
|---|---|
| Access control | |
| Commercial | Commercial |
| Desktop/Cloud | Desktop |
| Excel workbooks | |
| Flat files | |
| Free edition | |
| Metadata identification | |
| NoSQL sources | |
| Runs on (desktop) | Windows |
| Sensitive data discovery | |
| SQL sources | |
| Statistics of data | Avg, Max, Min, Stdev |
| Tagging data | - |
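The statistics mentioned for Dataedo (minimum, maximum, average, median, top values and value distribution) are standard column-level aggregates. The sketch below computes them for one numeric column in plain Python. It is not Dataedo's implementation, and the function name `profile_column` is hypothetical. It assumes the column has already been loaded into a list with missing values as `None`; with an ORC file, one option is `pyarrow.orc.read_table(path).column(name).to_pylist()`.

```python
from collections import Counter
from statistics import mean, median


def profile_column(values, top_n=3):
    """Compute basic descriptive statistics for one numeric column.

    Hypothetical helper: missing values (None) are counted separately
    and excluded from min/max/avg/median.
    """
    present = [v for v in values if v is not None]
    counts = Counter(present)
    total = len(values)
    return {
        "rows": total,
        "nulls": total - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "avg": mean(present) if present else None,
        "median": median(present) if present else None,
        # Most frequent values with their counts, e.g. [(10, 3), ...]
        "top_values": counts.most_common(top_n),
        # Share of all rows (nulls included) holding each value
        "distribution": {v: c / total for v, c in counts.items()} if total else {},
    }


if __name__ == "__main__":
    sample = [10, 20, 10, None, 30, 10]
    for key, value in profile_column(sample).items():
        print(f"{key}: {value}")
```

Computing the distribution over all rows, rather than only non-null ones, keeps the null share visible; a real profiling tool may choose differently.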
Talend Data Fabric
Talend Data Fabric combines data integration, integrity, and governance in a single, unified platform. Its capabilities let you extract data from virtually any source, process and profile it, and deliver it to your data warehouse. Data profiling lets you quickly identify data quality issues, discover hidden patterns, and spot anomalies through summary statistics and graphical representations.
| Feature | Talend Data Fabric |
|---|---|
| Access control | |
| Commercial | Commercial |
| Desktop/Cloud | Desktop |
| Excel workbooks | |
| Flat files | |
| Free edition | |
| Metadata identification | |
| NoSQL sources | |
| Runs on (desktop) | Mac OS, Windows |
| Sensitive data discovery | |
| SQL sources | |
| Statistics of data | - |
| Tagging data | |
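How Talend detects anomalies internally is not described here. As a generic illustration of spotting anomalies through summary statistics, the sketch below flags values that lie more than a chosen number of standard deviations from the column mean (a z-score rule). The function name `flag_outliers` and the default threshold are assumptions for illustration only.

```python
from statistics import mean, pstdev


def flag_outliers(values, threshold=3.0):
    """Return values whose z-score exceeds the threshold.

    Hypothetical helper: the z-score is the distance from the mean
    measured in population standard deviations; None is ignored.
    """
    present = [v for v in values if v is not None]
    if len(present) < 2:
        return []  # too few values to judge spread
    mu = mean(present)
    sigma = pstdev(present)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in present if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    amounts = [12, 11, 13, 12, 11, 13, 12, 950]
    print(flag_outliers(amounts, threshold=2.0))
```

On small samples a single extreme value inflates the standard deviation itself (with 8 values no z-score can exceed about 2.65), so the example lowers the threshold; a robust rule such as the interquartile range may suit small columns better.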
Data profiling tools can lead to higher-quality, more reliable data and help eliminate errors that add costs to data-driven projects. Eliminating these costly errors involves processes such as:
• Collecting descriptive statistics.
• Collecting data types, length and recurring patterns.
• Tagging data with keywords, descriptions or categories.
• Performing data quality assessment.
• Discovering metadata and assessing its accuracy.
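Collecting data types, lengths and recurring patterns is often done by reducing each value to a "shape" mask and counting how often each shape occurs. The plain-Python sketch below illustrates this; the helper names and the A/a/9 masking convention are assumptions chosen for the example, not the behavior of any tool listed above.

```python
from collections import Counter


def value_pattern(text):
    """Map a string to its shape: uppercase -> 'A', lowercase -> 'a',
    digit -> '9'; any other character is kept as-is."""
    out = []
    for ch in text:
        if ch.isdigit():
            out.append("9")
        elif ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


def infer_type(text):
    """Guess a simple type for a string value: int, float, or str."""
    for cast, name in ((int, "int"), (float, "float")):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "str"


def profile_strings(values):
    """Collect inferred types, min/max length and recurring patterns
    for a text column (hypothetical helper; None values are skipped)."""
    present = [v for v in values if v is not None]
    lengths = [len(v) for v in present]
    return {
        "types": Counter(infer_type(v) for v in present),
        "min_length": min(lengths, default=None),
        "max_length": max(lengths, default=None),
        "patterns": Counter(value_pattern(v) for v in present).most_common(),
    }


if __name__ == "__main__":
    codes = ["AB-1234", "CD-5678", "ab-12", "42", None]
    print(profile_strings(codes))
```

A dominant pattern such as `AA-9999` suggests the expected format, while rare patterns (here `aa-99` and `99`) point to values worth reviewing during a data quality assessment.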
The most efficient way to handle data profiling is to automate it with a data management solution. We prepared a list of data profiling tools that help you analyze your data and identify issues.