Data profiling tools
Data profiling tools analyze, monitor, and review data from existing databases to provide critical insights. Data profiling can help organizations improve data quality and decision-making by identifying problems early and addressing them before they affect downstream work.
iWay Data Quality Suite
iWay Data Profiler (iDP) is a tool for business users, designed as a Web 2.0 application that leverages the Profiler functionality in iWay Data Quality Server (iWay DQS) and the WebFOCUS business intelligence capabilities. It allows business users to view, analyze, distribute, and monitor how much of their data follows the rules defined by their company. iDP also supports steady improvement in data quality, bringing data closer to the defined business rules and standards.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Max,Min |
Tagging data: |
OpenDQ
OpenDQ integrates data profiling, standardization, enhancement, fuzzy matching, and de-duplication components with enterprise-class data extraction, transformation, and loading software to create a comprehensive view of enterprise data. It lets you identify your data’s current state, resolve missing or erroneous values, discover formats and patterns, reveal hidden business rules, report on column minimums, averages, and maximums, measure business rule compliance across data sets, and keep a point-in-time data profiling history.
Access control: | |
---|---|
Commercial: | Free |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min |
Tagging data: |
MIOvantage
MIOvantage is a single solution platform that lets you profile data, run rules, deduplicate data, identify entities, generate reports, and more. From entity resolution to complex deduplication, MIOvantage builds a better, clearer picture from your data.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: |
Trifacta
Trifacta is an open and interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning. For ease of data profiling, Trifacta automatically identifies dataset formats, schemas, specific attributes, and relationships across attributes and datasets, along with associated metadata for each dataset.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: |
DQLabs
The DQLabs platform includes an AI-driven data profiling component that accepts data from multiple sources in different formats. Its user interface lets the user track the data profiling process and make adjustments where necessary. The platform's algorithms surface insights into the source data and help increase the quality of the profiled data.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: |
SAS Data Quality
SAS Data Quality gives you a single interface to manage the entire data quality life cycle: profiling, standardizing, matching, and monitoring. It lets you validate data against standard measures and customized business rules. Uncover relationships across tables, databases, and source applications. Verify that the data in your tables matches the appropriate description. Establish trends and commonalities in business information and examine numerical trends via mean, median, mode, and standard deviation.
It makes it easy to profile and identify problems, preview data, and set up repeatable processes to maintain a high level of data quality.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Stdev |
Tagging data: |
StarDQ
StarDQ is a powerful enterprise solution for profiling, cleansing, augmenting, and standardizing the data to significantly improve returns on corporate intelligence initiatives.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: |
Alation Data Catalog
Alation’s data profiling capabilities help reduce the time spent in the data exploration phase. With Alation’s data profile, data consumers have the metrics they need to discern the quality of any data object. Alation displays important characteristics, statistics, and numerical graphs about the data, enabling data scientists and data engineers to take action quickly. The data profiling feature now also includes new charts and customizations.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: |
The use of data profiling tools can lead to higher-quality, more reliable data and can eliminate errors that add costs to data-driven projects. Eliminating these costly errors involves processes such as:
• Collecting descriptive statistics.
• Collecting data types, length and recurring patterns.
• Tagging data with keywords, descriptions or categories.
• Performing data quality assessment.
• Discovering metadata and assessing its accuracy.
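To make the first two items in the list concrete, here is a minimal sketch of column profiling in plain Python. It is not how any of the tools above work internally. The function names (`value_pattern`, `infer_type`, `profile_column`), the pattern notation (digits become `9`, letters become `A`), and the sample values are all assumptions made for illustration.

```python
import re
import statistics
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a string to its shape: digits become 9, ASCII letters become A."""
    shape = re.sub(r"[0-9]", "9", value)
    return re.sub(r"[A-Za-z]", "A", shape)


def infer_type(value: str) -> str:
    """Guess a coarse type for a raw string value (illustrative heuristic)."""
    if value == "":
        return "missing"
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"


def profile_column(values: list[str]) -> dict:
    """Collect basic statistics, types, lengths, and patterns for one column."""
    present = [v for v in values if v != ""]
    numeric = [float(v) for v in present if infer_type(v) in ("integer", "float")]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
        "types": dict(Counter(infer_type(v) for v in values)),
        "min_length": min((len(v) for v in present), default=0),
        "max_length": max((len(v) for v in present), default=0),
        "top_patterns": Counter(value_pattern(v) for v in present).most_common(3),
    }
    if numeric:
        # Descriptive statistics are only meaningful for the numeric values.
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
        profile["mean"] = statistics.mean(numeric)
    return profile


if __name__ == "__main__":
    # Made-up sample column: an empty string stands for a missing value.
    sample = ["10", "20", "", "abc", "7.5"]
    for key, value in profile_column(sample).items():
        print(f"{key}: {value}")
```

A real profiler would also handle dates, locale-specific number formats, and special values such as `nan`, which this sketch does not attempt. The point is only that statistics, type and length summaries, and recurring patterns can all be derived in a single pass over a column.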
The most efficient way of handling the data profiling process is to automate it with a data management solution. We prepared a list of open-source data profiling tools that help you carry out the analysis of your data and identify the issues.