- Data Profiling
- Data Classification
- Data Mapping
Data Profiling : Features and Functionality
Data Profiling provides the ability to automatically scan all the data values associated with each attribute in the data landscape. Analysis of these data values can create a large number of metrics, which can collectively lead to a "semantic understanding" of each data attribute. The Data Profiling and Mapping Suite offers the following list of features to aid the Data Profiling Process.
| # | Requirement | Available | Comments |
|---|---|---|---|
| 1 | Statistical Profiling | Yes | |
| 2 | Pattern Profiling | Yes | |
| 3 | Domain Profiling | Yes | |
| 4 | Relationship Profiling | Yes | |
| 5 | Sub Table Profiling | Yes | |
| 6 | ID Profiling | Yes | |
| 7 | File Profiling (Flat Files, Delimited Files, Microsoft Excel Files) | Yes | |
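
To make the idea of statistical and pattern profiling concrete, here is a minimal sketch in Python. It is not the suite's implementation: the function names `value_pattern` and `profile_column`, the choice of metrics, and the character-class encoding (digits become `9`, letters become `A`) are all assumptions made for illustration.

```python
from collections import Counter


def value_pattern(value: str) -> str:
    """Reduce a value to its shape: digits -> '9', letters -> 'A', other characters kept."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A")
        else:
            out.append(ch)
    return "".join(out)


def profile_column(values):
    """Compute a few statistical and pattern metrics for one attribute (hypothetical metric set)."""
    non_null = [v for v in values if v is not None and v != ""]
    lengths = [len(v) for v in non_null]
    patterns = Counter(value_pattern(v) for v in non_null)
    return {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "min_length": min(lengths) if lengths else 0,
        "max_length": max(lengths) if lengths else 0,
        "top_patterns": patterns.most_common(3),
    }


# Illustrative sample data, not taken from any real system.
sample = ["10001-1234", "94105-0001", "60614", None, "ABCDE"]
profile = profile_column(sample)
```

In this sample the dominant pattern is `99999-9999`, which hints that the attribute mostly holds Zip+4 style values, while the `AAAAA` value stands out as a candidate anomaly.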
Data Classification : Features and Functionality
Data Classification provides the ability to automatically identify information domains and group them together on the basis of similarity.
For example, if an organization had to search for Credit Card Numbers across its data landscape, it could pursue the following manual approaches.
- Create a complete metadata repository, and search for terms like "CREDIT" or "CCN" or "CREDIT_CARD_NUM". These types of searches would not be thorough, since the data modeler who designed the database may have called the field "NUMBER" or "NUM" to prevent Credit Card Numbers from being detected.
- Manually look at each attribute in each entity across the data landscape, to see where Credit Card Numbers are present. This approach is not feasible for large data landscapes.
The Data Classification software automates the solution to this problem by systematically going through the data associated with EACH attribute in the data landscape and detecting patterns within the data. Since Credit Card Numbers have distinct patterns, any attribute containing this pattern can be found, regardless of the column name. The approach is comprehensive since the software can scan millions of attributes to detect data domains of interest.
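
The value-based detection described above can be sketched as follows. This is an illustrative approximation, not the product's algorithm: it assumes card numbers are 13 to 19 digits long, uses the Luhn checksum as the "distinct pattern", and flags a column when at least 90% of its non-empty values pass. The function names, the threshold, and the sample columns are hypothetical.

```python
import re


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right, sum, check mod 10."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def looks_like_card_number(value: str) -> bool:
    """True if the value, ignoring spaces and dashes, is 13-19 digits and passes Luhn."""
    compact = re.sub(r"[ -]", "", value)
    return bool(re.fullmatch(r"[0-9]{13,19}", compact)) and luhn_valid(compact)


def classify_column(values, threshold=0.9):
    """Flag a column when the share of card-like values meets the (assumed) threshold."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    hits = sum(looks_like_card_number(v) for v in non_empty)
    return hits / len(non_empty) >= threshold


# Column names are deliberately uninformative; values are well-known test numbers.
columns = {
    "NUM": ["4111 1111 1111 1111", "5500-0000-0000-0004", "340000000000009"],
    "PHONE": ["415-555-0100", "212-555-0199", "312-555-0147"],
}
flagged = [name for name, vals in columns.items() if classify_column(vals)]
```

Here the column named `NUM` is flagged purely from its values, while `PHONE` is not, which mirrors the point that detection does not depend on the column name.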
The Data Classification Modules (called Taxonomy Managers) can identify data domains in the following areas.
| Global Domains | Industry Domains | Business Domains |
|---|---|---|
| Names of People | CounterParty IDs (Financial Services) | Customer IDs |
| Locations / Addresses | Securities IDs (Financial Services) | Product IDs |
| Names of Organizations | ICD9-CM Codes (Healthcare) | Location IDs |
| IDs for People (National Identifiers) | Provider IDs (Healthcare) | Employee IDs |
| IDs for Organizations (e.g. DUNS) | SIC / NAICS Codes (Cross Industry) | Legal Entities |
| IDs for Products (EAN/UPC, GS1, ...) | ISBN Codes (Publishing) | Business Unit Names |
| IDs for Locations (e.g. Zip+4) | Account IDs | |
| IDs for Books (e.g. ISBNs) | Subtypes | |
| IDs for Countries (e.g. ISO_3A) | Codes | |
| Currencies | Hierarchy Levels | |
| Credit Card Numbers | Usernames / Passwords | |
| Units of Measure | Unencrypted fields (Identifiers) | |
| ...and any custom domain* | ERP / CRM Identifiers | |

* Must have a distinct pattern that is discernible by humans. Some exceptions apply.
As a result of the classification process, the software is able to identify the distribution of any data domain within the data landscape. Consequently, the software can apply both data quality rules and business rules in a comprehensive manner across the entire systems environment.
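
As a rough illustration of what a domain distribution looks like once attributes have been classified, the sketch below groups classified attributes by domain and supports a simple domain search. The record layout `(system, table, column, domain)`, the function names, and the sample rows are assumptions for this example and do not reflect the product's actual data model.

```python
from collections import defaultdict

# Hypothetical classification results: (system, table, column, domain).
classified = [
    ("billing_db", "payments", "NUM", "Credit Card Numbers"),
    ("crm", "contacts", "full_name", "Names of People"),
    ("crm", "orders", "card_no", "Credit Card Numbers"),
    ("hr", "staff", "emp_name", "Names of People"),
    ("hr", "staff", "country", "IDs for Countries"),
]


def domain_distribution(rows):
    """Map each domain to the fully qualified attributes where it was found."""
    dist = defaultdict(list)
    for system, table, column, domain in rows:
        dist[domain].append(f"{system}.{table}.{column}")
    return dict(dist)


def search_domain(rows, domain):
    """Return every attribute classified under the given domain, sorted by name."""
    return sorted(domain_distribution(rows).get(domain, []))


card_locations = search_domain(classified, "Credit Card Numbers")
```

With a map like this, a quality rule or business rule defined once for a domain could be applied to every attribute listed under it, instead of being configured column by column.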
The Data Classification process supports the following requirements.
| # | Requirement | Available | Comments |
|---|---|---|---|
| 1 | Classify all "important" data assets on the basis of similarity | Yes | |
| 2 | Group attributes into semantically-equivalent domains | Yes | |
| 3 | Create maps, showing distribution of semantic domains | Yes | |
| 4 | Auto-detect quality rules that apply to each semantic domain | Yes | |
| 5 | Find outliers within semantic domains | Yes | |
| 6 | Populate the metadata repository with taxonomies | Yes | |
| 7 | Provide Data Stewards with web-access to domains | Yes | |
| 8 | Support domain searches across entire data landscape | Yes | e.g. Search of Credit Card Numbers |
| 9 | Group Semantic Domains to Semantic Objects | Yes | Requires some degree of manual input |
| 10 | Continuously monitor the data domains for changes | Yes | |
| 11 | Create classification reports | Yes | |