Data profiling tools

Data profiling tools analyze, monitor, and review data from existing databases to provide critical insights. Data profiling can help organizations improve data quality and decision-making by identifying problems early and addressing them before they affect downstream work.

Talend Data Fabric

Talend Data Fabric combines data integration, integrity, and governance in a single, unified platform. Talend Data Fabric's capabilities allow you to extract, process, and profile data from virtually any source to your data warehouse. Data profiling lets you quickly identify data quality issues, discover hidden patterns, and spot anomalies through summary statistics and graphical representations.

Access control: No
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: Yes
Runs on: (for desktop): Mac OS,Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: Yes

Toad Data Point

Toad Data Point is a multi-platform database query, data prep, and reporting tool. It lets you visually profile and sample database tables and data sets for patterns, unique values, duplicates, missing information, min./max. values and more.

Access control: No
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: Yes
Runs on: (for desktop): Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min,Stdev
Tagging data: Yes

Informatica Data Profiling

Informatica’s data profiling solution, Data Explorer, is available in two editions, Standard and Advanced. Both scan every data record from any source to find anomalies and hidden relationships, regardless of how complex the data is or how the data sources relate to one another.

Access control: No
Commercial: Commercial
Desktop/Cloud: Cloud
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: Yes
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min,Stdev
Tagging data: Yes

TIBCO Clarity

TIBCO Clarity is a data preparation, profiling, and cleansing tool. It detects data patterns and data types for auto-metadata generation. You can profile row and column data for completeness, uniqueness, and variation. Predefined facets categorize data based on text occurrences and text patterns. You can use the numeric distributions to identify variations and outliers in the data.

Access control: No
Commercial: Commercial
Desktop/Cloud: Cloud
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: No
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min,Stdev
Tagging data: No
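
To make the completeness, uniqueness, and outlier checks mentioned for TIBCO Clarity more concrete, here is a minimal Python sketch of what such column measures can look like. It is not TIBCO Clarity's implementation: the function names, the sample column, and the choice of the 1.5 x IQR rule for outliers are all assumptions made for illustration.

```python
from statistics import quantiles

def _filled(values):
    """Keep only values that are neither None nor an empty string."""
    return [v for v in values if v is not None and v != ""]

def completeness(values):
    """Share of values in the column that are present."""
    return len(_filled(values)) / len(values) if values else 0.0

def uniqueness(values):
    """Share of distinct values among the present ones."""
    present = _filled(values)
    return len(set(present)) / len(present) if present else 0.0

def iqr_outliers(numbers, k=1.5):
    """Values more than k * IQR below the first or above the third quartile.

    The IQR rule is an assumed choice; other outlier rules work as well.
    Needs at least two numbers, as required by statistics.quantiles.
    """
    q1, _, q3 = quantiles(numbers, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in numbers if x < low or x > high]

# Hypothetical sample column with two missing entries and one suspicious value.
ages = [34, 29, None, 41, 38, 29, "", 250]
numeric = [a for a in ages if isinstance(a, int)]

print(completeness(ages))     # 0.75
print(uniqueness(ages))       # 5 distinct values out of 6 present
print(iqr_outliers(numeric))  # [250]
```

Low completeness points to missing data, low uniqueness to duplicates, and the outlier list to values worth a manual check.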

dbForge Event Profiler for SQL Server

dbForge Event Profiler for SQL Server is a free tool that lets you capture and analyze SQL Server events. The events and data columns are stored in a physical trace file for later examination. You can use this information to identify and troubleshoot many SQL Server-related problems. Whether the issue is high server load or particular queries affecting database stability, Event Profiler is a handy way to inspect and analyze SQL Server performance.

Access control: No
Commercial: Free
Desktop/Cloud: Desktop
Excel workbooks: No
Flat files: No
Free edition: Yes
Metadata identification: No
NoSQL sources: No
Runs on: (for desktop): Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: No

Idera SQL Data Profiler

SQL Data Profiler analyzes and summarizes data to produce valuable insights into data patterns. It lets you profile data in SQL Server tables, analyze subsets of data types at a time, adjust profiling thresholds to customize the analysis, display a summary of the data in a selected table and its columns, receive recommendations based on the data in each column, and view a summary of the value distribution per column, among other functions.

Access control: No
Commercial: Free
Desktop/Cloud: Desktop
Excel workbooks: No
Flat files: No
Free edition: Yes
Metadata identification: No
NoSQL sources: No
Runs on: (for desktop): Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: No

WinPure Clean & Match

The Data Profiling / Statistics module within WinPure Clean & Match is a user-friendly and powerful data profiling tool that can help your business to discover patterns and meaning in your data and to check the quality of your data by analyzing formats, types, completeness, and value counts. It presents you with a complete set of statistics that you can use to help clean and correct your data, and to prepare it better for data matching.

Access control: No
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: Yes
Metadata identification: No
NoSQL sources: No
Runs on: (for desktop): Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: No

Trillium Discovery

Trillium Discovery provides industry-leading data profiling at scale, designed specifically to meet the challenges presented by today’s data environments, with native connectivity to cloud and big data sources to execute data profiling tasks. It lets you visually assess the quality of your data and support data governance with comprehensive profiling, customized to your business.

Access control: No
Commercial: Commercial
Desktop/Cloud: Cloud
Excel workbooks: No
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: Yes
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: No

Synchronos MDM

Synchronos MDM provides business-friendly profiling and robust data discovery functionality that enables your organization to access and analyze 100% of your data set. It provides organizations with the ability to comprehensively assess data content, structure, relationships, and quality. You can discover complex relationships and dependencies between data records across disparate systems.

Access control: No
Commercial: Commercial
Desktop/Cloud: Cloud
Excel workbooks: No
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: No
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: No
Statistics of data: -
Tagging data: No

Kylo

Kylo is an open source enterprise-ready data lake management software platform. It lets you search and explore data and metadata, view lineage, and profile statistics. In addition, it offers self-service data ingest with data cleansing, validation, and automatic profiling.

Access control: No
Commercial: Free
Desktop/Cloud: Cloud
Excel workbooks: No
Flat files: Yes
Free edition: Yes
Metadata identification: Yes
NoSQL sources: No
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min,Stdev
Tagging data: Yes

Astera Centerprise

Astera Centerprise is end-to-end data integration software that enables you to integrate, cleanse, and transform data in a code-free environment. Its built-in data profiling feature lets you easily examine your source data and get detailed information about its structure, quality, and integrity. You can also define custom data integration and quality rules to validate incoming data and identify missing or invalid records.

Access control: Yes
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: No
NoSQL sources: No
Runs on: (for desktop): Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min
Tagging data: No
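
Astera Centerprise is code-free, so the following is not how you would define rules in that product. It is only a minimal Python sketch of the general idea behind custom quality rules that flag missing or invalid records. The rule set, field names, and sample records are hypothetical.

```python
import re

# Hypothetical quality rules: each field maps to a predicate on its value.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
RULES = {
    "email": lambda v: isinstance(v, str) and EMAIL_RE.fullmatch(v) is not None,
    "age": lambda v: isinstance(v, int) and 0 <= v <= 120,
}

def validate(record, rules=RULES):
    """Return {field: 'missing' | 'invalid'} for every rule the record breaks.

    An empty dict means the record passed all rules.
    """
    problems = {}
    for field, is_valid in rules.items():
        value = record.get(field)
        if value is None or value == "":
            problems[field] = "missing"
        elif not is_valid(value):
            problems[field] = "invalid"
    return problems

# Hypothetical incoming records.
records = [
    {"email": "ann@example.com", "age": 31},
    {"email": "bob-at-example.com", "age": 45},
    {"email": "", "age": 300},
]

for record in records:
    print(validate(record))
# {}
# {'email': 'invalid'}
# {'email': 'missing', 'age': 'invalid'}
```

Keeping the rules in a plain dictionary separates the checks from the loop that applies them, so adding a rule does not require changing `validate`.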

Datameer

Datameer is a SaaS solution for data transformation in Snowflake. It provides a rich array of data profiling features that give users a comprehensive view of their data: automated visual data profiling, system-generated recommendations, and system- and user-generated data profile information. That information includes documentation, properties, comments, tags, and more, and adds further context to the data.

Access control: No
Commercial: Commercial
Desktop/Cloud: Cloud
Excel workbooks: No
Flat files: No
Free edition: No
Metadata identification: Yes
NoSQL sources: No
Runs on: (for desktop): -
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: Yes

DataCleaner

The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values.

Access control: No
Commercial: Free
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: Yes
Metadata identification: No
NoSQL sources: Yes
Runs on: (for desktop): Linux,Mac OS,Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: -
Tagging data: No

JProfiler

JProfiler is a simple and powerful database profiling tool for JDBC, JPA, and NoSQL. JProfiler's JDBC and JPA/Hibernate probes as well as the NoSQL probes for MongoDB, Cassandra, and HBase show the reasons for slow database access and how slow statements are called by your code. From the JDBC timeline view that shows you all JDBC connections with their activities, through the hot spots view that shows you slow statements to various telemetry views and a list of single events, the database probes are an essential tool for getting insight into your database layer.

Access control: No
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: No
Flat files: No
Free edition: No
Metadata identification: No
NoSQL sources: Yes
Runs on: (for desktop): Linux,Mac OS,Windows
Sensitive data discovery: No
SQL sources: No
Statistics of data: Avg,Max,Min,Stdev
Tagging data: No

CloverDX

CloverDX Data Profiler is a CloverDX module that lets you perform various analyses of your data. It is part of CloverDX Designer and handles profiling tasks such as finding the maximum value, the median, the most unique value, and many others.

Access control: No
Commercial: Commercial
Desktop/Cloud: Desktop
Excel workbooks: Yes
Flat files: Yes
Free edition: No
Metadata identification: Yes
NoSQL sources: Yes
Runs on: (for desktop): Linux,Mac OS,Windows
Sensitive data discovery: No
SQL sources: Yes
Statistics of data: Avg,Max,Min,Stdev
Tagging data: No

The use of data profiling tools can lead to higher-quality, more reliable data and eliminate errors that add costs to data-driven projects. Eliminating these costly errors involves processes such as:

• Collecting descriptive statistics.
• Collecting data types, length and recurring patterns.
• Tagging data with keywords, descriptions or categories.
• Performing data quality assessment.
• Discovering metadata and assessing its accuracy.
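
The first two processes in the list above can be sketched in a few lines of Python. This is an illustrative sketch only, not the method used by any tool listed here. The function names, the letter/digit masking scheme for patterns, and the sample columns are assumptions, and the input is assumed to be a non-empty list of raw strings.

```python
import re
from collections import Counter
from statistics import mean, stdev

def infer_type(text):
    """Classify a raw string as 'int', 'float', or 'str'."""
    for name, cast in (("int", int), ("float", float)):
        try:
            cast(text)
            return name
        except ValueError:
            pass
    return "str"

def pattern_of(text):
    """Mask letters as 'A' and digits as '9'; keep other characters."""
    return re.sub(r"\d", "9", re.sub(r"[A-Za-z]", "A", text))

def profile_column(values):
    """Summarize types, lengths, patterns and, for numeric data, basic stats."""
    lengths = [len(v) for v in values]
    summary = {
        "types": dict(Counter(infer_type(v) for v in values)),
        "min_length": min(lengths),
        "max_length": max(lengths),
        "top_patterns": Counter(pattern_of(v) for v in values).most_common(2),
    }
    numbers = [float(v) for v in values if infer_type(v) != "str"]
    if len(numbers) >= 2:  # stdev needs at least two data points
        summary["stats"] = {
            "avg": mean(numbers),
            "min": min(numbers),
            "max": max(numbers),
            "stdev": stdev(numbers),
        }
    return summary

# Hypothetical columns: one of postcodes, one of prices.
print(profile_column(["AB12 3CD", "XY98 7ZT", "12345"]))
print(profile_column(["10", "12.5", "11"]))
```

In the postcode column, the odd entry stands out because its pattern (`99999`) differs from the dominant one (`AA99 9AA`). This is the kind of signal a pattern report is meant to surface.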

The most efficient way to handle data profiling is to automate it with a data management solution. The list above covers commercial and free data profiling tools that help you analyze your data and identify issues.