
Data Profiling and Mapping Suite : Feature List


The Global IDs Data Profiling and Mapping Suite provides extensive features and functionality for:
  • Data Profiling
  • Data Classification
  • Data Mapping

Data Profiling : Features and Functionality

Data Profiling provides the ability to automatically scan all the data values associated with each attribute in the data landscape. Analyzing these values produces a large number of metrics, which collectively lead to a "semantic understanding" of each attribute. The Data Profiling and Mapping Suite offers the following features to support the Data Profiling process:

#   Requirement             Available  Comments
1   Statistical Profiling   Yes
2   Pattern Profiling       Yes
3   Domain Profiling        Yes
4   Relationship Profiling  Yes
5   Sub Table Profiling     Yes
6   ID Profiling            Yes
7   File Profiling          Yes        Flat Files, Delimited Files, Microsoft Excel Files
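To make the statistical, pattern, and domain profiling rows above more concrete, here is a minimal Python sketch of the kind of metrics a profiler might compute for a single attribute. It is an illustration only, not the suite's implementation: the function names to_pattern and profile_column are hypothetical, and the pattern convention (letters become A, digits become 9, other characters are kept) is a common profiling convention assumed here.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Pattern profiling (assumed convention): letters -> 'A', digits -> '9', other chars kept."""
    return "".join("A" if c.isalpha() else "9" if c.isdigit() else c for c in value)


def profile_column(values, top_n=3):
    """Hypothetical helper: simple statistical, pattern, and domain metrics for one attribute."""
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(v) for v in non_null]
    return {
        # Statistical profiling
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        # Pattern profiling: most frequent value shapes
        "top_patterns": Counter(to_pattern(v) for v in non_null).most_common(top_n),
        # Domain profiling: most frequent actual values
        "top_values": Counter(non_null).most_common(top_n),
    }


sample = ["90210", "10001", "SW1A 1AA", None, "90210"]
print(profile_column(sample))
```

For this sample, the dominant pattern is 99999 (three of the four non-null values), which hints at a five-digit postal code, while the single AA9A 9AA value stands out as a different format.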


Data Classification : Features and Functionality

Data Classification provides the ability to automatically identify information domains and group them together on the basis of similarity.

For example, if an organization had to search for Credit Card Numbers across its data landscape, it could pursue either of the following manual approaches:

  1. Create a complete metadata repository and search for terms like "CREDIT", "CCN", or "CREDIT_CARD_NUM". Such searches would not be thorough, because the data modeler who designed the database may have named the field "NUMBER" or "NUM" to keep Credit Card Numbers from being detected.
  2. Manually inspect each attribute in each entity across the data landscape to see where Credit Card Numbers are present. This approach is not feasible for large data landscapes.

The Data Classification software automates the solution to this problem by systematically scanning the data associated with EACH attribute in the data landscape and detecting patterns within that data. Because Credit Card Numbers have distinct patterns, any attribute containing such values can be found, regardless of the column name. The approach is comprehensive because the software can scan millions of attributes to detect the data domains of interest.
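The pattern-based detection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how the Taxonomy Managers actually work: a value is treated as a candidate card number if it has 13 to 19 digits (optionally separated by spaces or dashes) and passes the Luhn check-digit test used by payment card numbers, and a column is flagged when at least a hypothetical 80% of its non-empty values match. The helper names and the threshold are assumptions. Only the data is examined; the column name plays no part, so a field called NUM is found just as easily as one called CREDIT_CARD_NUM.

```python
import re

# Assumed candidate format: 13-19 digits, optionally separated by spaces or dashes.
CARD_RE = re.compile(r"\d(?:[ -]?\d){12,18}")


def luhn_valid(digits: str) -> bool:
    """Luhn check: double every second digit from the right; total must be divisible by 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card(value: str) -> bool:
    """True if the value has a card-number shape and a valid Luhn check digit."""
    if not CARD_RE.fullmatch(value):
        return False
    return luhn_valid(re.sub(r"[ -]", "", value))


def find_card_columns(table, threshold=0.8):
    """Return names of columns whose non-empty values mostly look like card numbers.

    `table` maps column name -> list of string values (hypothetical input shape).
    Column names are never inspected; detection uses the data only.
    """
    hits = []
    for column, values in table.items():
        sample = [v.strip() for v in values if v and v.strip()]
        if not sample:
            continue
        ratio = sum(looks_like_card(v) for v in sample) / len(sample)
        if ratio >= threshold:
            hits.append(column)
    return hits


# Illustrative data using well-known test card numbers.
table = {
    "NUM": ["4111 1111 1111 1111", "5500-0000-0000-0004", "4012888888881881", ""],
    "PHONE": ["555-0100", "555-0199"],
    "CUST_NAME": ["Ada", "Grace"],
}
print(find_card_columns(table))  # ['NUM']
```

Requiring a high match ratio rather than a single match keeps a free-text column that happens to contain one card-like value from being classified as a card-number column; such stray values are better treated as outliers.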

The Data Classification modules (called Taxonomy Managers) can identify data domains in the following areas:

Global Domains
  • Names of People
  • Locations / Addresses
  • Names of Organizations
  • IDs for People (National Identifiers)
  • IDs for Organizations (e.g. DUNS)
  • IDs for Products (EAN/UPC, GS1, ...)
  • IDs for Locations (e.g. Zip+4)
  • IDs for Books (e.g. ISBNs)
  • IDs for Countries (e.g. ISO_3A)
  • Currencies
  • Credit Card Numbers
  • Units of Measure

Industry Domains
  • CounterParty IDs (Financial Services)
  • Securities IDs (Financial Services)
  • ICD9-CM Codes (Healthcare)
  • Provider IDs (Healthcare)
  • SIC / NAICS Codes (Cross Industry)
  • ISBN Codes (Publishing)

Business Domains
  • Customer IDs
  • Product IDs
  • Location IDs
  • Employee IDs
  • Legal Entities
  • Business Unit Names
  • Account IDs
  • Subtypes
  • Codes
  • Hierarchy Levels
  • Usernames / Passwords
  • Unencrypted Fields (Identifiers)
  • ERP / CRM Identifiers

...and any custom domain*

* Custom domains must have a distinct pattern that is discernible by humans. Some exceptions apply.

As a result of the classification process, the software can identify the distribution of any data domain within the data landscape. Consequently, it can apply both data quality rules and business rules comprehensively across the entire systems environment.

The following requirements are supported by the Data Classification process.

#   Requirement                                                      Available  Comments
1   Classify all "important" data assets on the basis of similarity  Yes
2   Group attributes into semantically equivalent domains            Yes
3   Create maps showing the distribution of semantic domains         Yes
4   Auto-detect quality rules that apply to each semantic domain     Yes
5   Find outliers within semantic domains                            Yes
6   Populate the metadata repository with taxonomies                 Yes
7   Provide Data Stewards with web access to domains                 Yes
8   Support domain searches across the entire data landscape         Yes        e.g. search for Credit Card Numbers
9   Group Semantic Domains into Semantic Objects                     Yes        Requires some degree of manual input
10  Continuously monitor the data domains for changes                Yes
11  Create classification reports                                    Yes
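Several of the requirements above (grouping attributes into semantic domains, mapping where each domain occurs, and finding outliers within a domain) can be illustrated together with a small Python sketch. The domain rules here are hypothetical regular expressions chosen for the example (a US ZIP / ZIP+4 code, and a three-letter uppercase code in the style of ISO 3166 alpha-3); the suite's own rule detection is not described in this document. In this sketch, a column is assigned to the first rule that matches at least a hypothetical 80% of its non-empty values, and the values that fail that rule are reported as outliers.

```python
import re
from collections import defaultdict

# Hypothetical domain rules: domain name -> regex a value must fully match.
# Rules are tried in order; the first one that dominates a column wins.
DOMAIN_RULES = {
    "US_ZIP": re.compile(r"\d{5}(?:-\d{4})?"),
    "COUNTRY_ISO3": re.compile(r"[A-Z]{3}"),
}


def classify(values, min_ratio=0.8):
    """Return (domain, outliers) for one column, or (None, []) if no rule dominates."""
    sample = [v for v in values if v]
    if not sample:
        return None, []
    for domain, rule in DOMAIN_RULES.items():
        matches = [bool(rule.fullmatch(v)) for v in sample]
        if sum(matches) / len(sample) >= min_ratio:
            outliers = [v for v, ok in zip(sample, matches) if not ok]
            return domain, outliers
    return None, []


def build_domain_map(landscape, min_ratio=0.8):
    """Group (table, column) attributes by detected domain and collect outliers.

    `landscape` maps (table, column) -> list of string values (hypothetical shape).
    Returns ({domain: ["TABLE.COLUMN", ...]}, {"TABLE.COLUMN": [outlier values]}).
    """
    domain_map = defaultdict(list)
    outliers = {}
    for (table, column), values in landscape.items():
        domain, bad = classify(values, min_ratio)
        if domain is None:
            continue
        name = f"{table}.{column}"
        domain_map[domain].append(name)
        if bad:
            outliers[name] = bad
    return dict(domain_map), outliers


landscape = {
    ("CUSTOMER", "POSTAL"): ["90210", "10001-1234", "02139", "ABCDE", "94105"],
    ("VENDOR", "ZIP_CD"): ["60601", "73301"],
    ("VENDOR", "CTRY"): ["USA", "CAN", "MEX"],
    ("ORDERS", "NOTE"): ["rush", "gift"],
}
domains, outliers = build_domain_map(landscape)
print(domains)
print(outliers)
```

Here CUSTOMER.POSTAL and VENDOR.ZIP_CD land in the same ZIP-code domain despite their different column names, and the single ABCDE value is surfaced as an outlier within that domain; ORDERS.NOTE matches no rule and stays unclassified.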