Data profiling tools for MongoDB
Data profiling tools let you analyze, monitor, and review data from existing databases to gain critical insights. Data profiling can help organizations improve data quality and decision-making by identifying problems and addressing them before they escalate.
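As a rough illustration of the kind of summary such tools produce, the sketch below profiles one numeric field and lists the most frequent values of another in plain Python. The sample documents, the field names, and the `profile_numeric` and `top_values` helpers are assumptions made up for this example; they do not come from any tool listed here.

```python
from collections import Counter
from statistics import mean, median, pstdev

# Hypothetical sample documents, shaped like records from a MongoDB collection.
docs = [
    {"name": "Ann", "age": 31, "city": "Oslo"},
    {"name": "Bob", "age": 45, "city": "Oslo"},
    {"name": "Cid", "age": None, "city": "Rome"},
    {"name": "Dee", "age": 27},
]


def is_number(value):
    """True for int/float values; bool is excluded although it subclasses int."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def profile_numeric(documents, field):
    """Return basic statistics for a field; absent, null, or non-numeric values count as missing."""
    values = [d[field] for d in documents if is_number(d.get(field))]
    missing = len(documents) - len(values)
    if not values:
        return {"count": 0, "missing": missing}
    return {
        "count": len(values),
        "missing": missing,
        "min": min(values),
        "max": max(values),
        "avg": mean(values),
        "median": median(values),
        "stdev": pstdev(values),  # population standard deviation
    }


def top_values(documents, field, n=3):
    """Return the n most frequent non-null values of a field with their counts."""
    counts = Counter(d[field] for d in documents if d.get(field) is not None)
    return counts.most_common(n)


age_profile = profile_numeric(docs, "age")
city_top = top_values(docs, "city")
print(age_profile)
print(city_top)
```

Against a real MongoDB collection, the same functions could be fed documents read through a driver instead of the in-memory list used here.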
Dataedo
Dataedo is a metadata management & data catalog tool with a data profiling feature. It allows you to use sample data to learn what data is stored in your data assets. You can browse min, max, average and median values, see top values, as well as value and row distribution to understand the data better before using it.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | - |
Global IDs Data Profiling Suite
Global IDs Data Profiling Suite is a data discovery and profiling tool that automates the discovery of data assets, automates data profiling, and provides an active inventory of all data assets.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: | |
IBM InfoSphere Information Analyzer
IBM InfoSphere Information Analyzer provides data profiling and analysis to accurately evaluate the content and structure of your data for consistency and quality. It utilizes a reusable rules library and supports multi-level evaluations by rule, record, and pattern. It also facilitates the management of exceptions to established rules to help identify data inconsistencies, redundancies, and anomalies, and make inferences about the best choices for structure.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Stdev |
Tagging data: | |
Data Ladder
Data Ladder’s DataMatch Enterprise offers one of the easiest-to-use data profiling tools on the market. It quickly provides enough metadata to construct a cogent profile analysis of data quality and quantifies the scope and depth of the add-ons needed to make the project successful. After profiling, it performs data matching, cleansing, deduplication, and standardization, and finally data validation.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min |
Tagging data: | |
Aperture Data Studio
Aperture Data Studio is a powerful, easy-to-use data management suite that helps you profile data quickly to understand its deficiencies, an essential first step before cleansing, joining, and validating data. It profiles the complete data set and audits every step in readiness for statutory reporting and enhanced transparency of data and processes, de-risking compliance initiatives.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Stdev |
Tagging data: | |
DataRobot Data Prep
DataRobot Data Prep enables both novice and expert users to quickly and interactively explore, profile, clean, enrich and shape diverse data into AI assets ready for machine learning model development and deployment. It offers a visually interactive user interface that presents data in familiar tabular or spreadsheet style with no coding required. DataRobot provides profiles for every record and feature, including how many values are unique or missing and the statistical mean, standard deviation, median, minimum value, and maximum value.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | |
Talend Data Fabric
Talend Data Fabric combines data integration, integrity, and governance in a single, unified platform. Talend Data Fabric's capabilities allow you to extract, process, and profile data from virtually any source to your data warehouse. Data profiling lets you quickly identify data quality issues, discover hidden patterns, and spot anomalies through summary statistics and graphical representations.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: | |
Toad Data Point
Toad Data Point is a multi-platform database query, data prep, and reporting tool. It lets you visually profile and sample database tables and data sets for patterns, unique values, duplicates, missing information, min./max. values and more.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | |
DataCleaner
The heart of DataCleaner is a strong data profiling engine for discovering and analyzing the quality of your data. Find the patterns, missing values, character sets and other characteristics of your data values.
Access control: | |
---|---|
Commercial: | Free |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: | |
JProfiler
JProfiler is a simple and powerful database profiling tool for JDBC, JPA, and NoSQL. JProfiler's JDBC and JPA/Hibernate probes as well as the NoSQL probes for MongoDB, Cassandra, and HBase show the reasons for slow database access and how slow statements are called by your code. From the JDBC timeline view that shows you all JDBC connections with their activities, through the hot spots view that shows you slow statements to various telemetry views and a list of single events, the database probes are an essential tool for getting insight into your database layer.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | |
CloverDX
CloverDX Data Profiler is a CloverDX module that lets you perform various analyses of your data. It is a part of CloverDX Designer and helps to do various profiling tasks, such as finding the maximum value, median, the most unique value, and many others.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | |
OpenDQ
OpenDQ integrates data profiling, standardization, enhancement, fuzzy matching, and de-duplication components with enterprise-class data extraction, transformation, and loading software to create a comprehensive view of enterprise data. It lets you identify your data’s current state, resolve missing or erroneous values, discover formats and patterns, reveal hidden business rules, report on column minimums, averages, and maximums, measure business rule compliance across data sets, and keep a point-in-time data profiling history.
Access control: | |
---|---|
Commercial: | Free |
Desktop/Cloud: | Desktop |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | Linux,Mac OS,Windows |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min |
Tagging data: | |
Trifacta
Trifacta is an open and interactive cloud platform for data engineers and analysts to collaboratively profile, prepare, and pipeline data for analytics and machine learning. For ease of data profiling, Trifacta automatically identifies dataset formats, schemas, specific attributes, and relationships across attributes and datasets, along with associated metadata for each dataset.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | Avg,Max,Min,Stdev |
Tagging data: | |
DQLabs
The DQLabs platform includes an AI-driven data profiling module that accepts data from multiple sources in different formats. Its user interface lets you track the data profiling process and make adjustments where needed. The platform's algorithms surface insights into the source data and help improve the quality of the profiled data.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: | |
Alation Data Catalog
Alation’s data profiling capabilities help reduce the time spent in the data exploration phase. With Alation’s data profile, data consumers have the metrics they need to easily discern the quality of any data object. Alation displays important characteristics, statistics, and numerical graphs about the data, enabling data scientists and data engineers to take action quickly. Data profiling now also includes new charts and customizations.
Access control: | |
---|---|
Commercial: | Commercial |
Desktop/Cloud: | Cloud |
Excel workbooks: | |
Flat files: | |
Free edition: | |
Metadata identification: | |
NoSQL sources: | |
Runs on: (for desktop): | - |
Sensitive data discovery: | |
SQL sources: | |
Statistics of data: | - |
Tagging data: | |
Using data profiling tools can lead to higher-quality, more reliable data and eliminate errors that add cost to data-driven projects. Eliminating these costly errors involves processes such as:
• Collecting descriptive statistics.
• Collecting data types, length and recurring patterns.
• Tagging data with keywords, descriptions or categories.
• Performing data quality assessment.
• Discovering metadata and assessing its accuracy.
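The first two steps in the list above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any listed tool works: the sample documents, the `sku` and `qty` fields, and the `value_pattern` and `profile_field` helpers are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical sample documents with an inconsistent schema,
# as can happen in a schemaless store such as MongoDB.
docs = [
    {"sku": "AB-1234", "qty": 3},
    {"sku": "CD-5678", "qty": "7"},
    {"sku": "ef-90", "qty": 1},
    {"qty": 2},
]


def value_pattern(text):
    """Map uppercase letters to 'A', lowercase to 'a', digits to '9'; keep other characters."""
    text = re.sub(r"[A-Z]", "A", text)
    text = re.sub(r"[a-z]", "a", text)
    return re.sub(r"[0-9]", "9", text)


def profile_field(documents, field):
    """Collect missing count, observed types, string lengths, and string patterns for a field."""
    present = [d[field] for d in documents if field in d]
    strings = [v for v in present if isinstance(v, str)]
    return {
        "missing": len(documents) - len(present),
        "types": Counter(type(v).__name__ for v in present),
        "min_length": min((len(s) for s in strings), default=None),
        "max_length": max((len(s) for s in strings), default=None),
        "patterns": Counter(value_pattern(s) for s in strings),
    }


sku_profile = profile_field(docs, "sku")
qty_profile = profile_field(docs, "qty")
print(sku_profile)
print(qty_profile)
```

In this sample, the mixed `int` and `str` types reported for `qty` and the two distinct patterns reported for `sku` are the kind of inconsistency a data quality assessment would flag.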
The most efficient way of handling the data profiling process is to automate it with a data management solution. The commercial and free tools listed above can help you analyze your data and identify issues.