Using our Reference Data Integration Methodology, we can integrate the reference data from the enterprise data landscape. We follow a repeatable process for integrating reference data. The steps are:
We employ the following steps to integrate the reference data.
STEP I:SCAN
We use our DBCrawler™ application to first scan the enterprise database landscape to find all the data objects that are of interest. This means that we can retrieve
- Data
- MetaData
- Keys and IDs
- Relationships
from all the databases in the global data landscape (provided that we have the appropriate access information)
STEP II:DISCOVER
We use the metadata catalog created by the DBCrawler™ application to discover all the reference data objects that are of interest. We can scan the key relationships, to understand which database tables that contain
- Reference Data
- Master Data
- Dimensional Data
- Key Data
- ID Data
STEP III:PROFILE
We run a large number of data profiling agents on the data that has been discovered. The objective of these profiling agents is to understand the semantic meaning of the reference data and all associated attributes. The profiling exercise leads to an a complete understanding of
- Physical Metadata
- Semantic Metadata Data
- Attribute Relationships
- Descriptive Metadata (if available)
STEP IV:IDENTIFY
The semantic metadata profiles obtained from the previous step allow us to identify with a high level of certainty the reference objects that are present in a global data landscape.
STEP V:CLASSIFY
The reference data objects that have been identified in the previous step are classified into the appropriate categories. We are able to classify reference data in approximately 200 categories, the important ones being
- Customers
- Products
- Vendors
- Employees
- Assets
- Facilities
- Shareholders
- Investors
- Prospects
- Competitors
STEP VI:EXTRACT
Once the reference data (that is of interest) has been identified, we use the Global Data Store™ application to extract the reference data into a central Reference Data repository.
STEP VII:TRANSFORM
In many cases the reference data objects need to be transformed, before further integration can be performed. Each transformation results in additional data fields that enhance the quality of the reference data.
An example of a simple transformation: A date-format transformation agent can change DD-MMM-YY format to YYYY-MM-DD format.
An example of a complex transformation: A transformation agent is built with custom code provided by the client.
STEP VIII:PARSE
In many cases the reference data objects need to be parsed, before further integration can be performed. Each transformation results in additional data fields that enhance the quality of the reference data.
An example of simple parsing: A date parser agent can parse DD-MMM-YY format to three fields YYYY, MM, DD.
An example of a complex parsing: A parser agent is built with custom code provided by the client.
STEP IX:COMPUTE QUALITY METRICS
For the reference data that has been extracted from the source, a large number of data quality agents are used to compute metrics that assess the quality of the data. Some of these metrics relate to:
- Level of record duplication
- Level of record completeness
- Level of record integrity
- Data Quality Experts can used to provide additional metrics related to
- Level of record accuracy
- Level of record concurrency
- Level of record conformity (with business rules)
STEP X: ANALYZE
Analysis of reference data allows the determination of the causes of low quality. This analysis leads to a correction mechanism for improving the quality of the reference data:
- Determination of data standards that the reference data must conform to
- Determination of data validation rules
- Assignment of reference data stewards
STEP XI: STANDARDIZE
The reference data is made to conform to data standards. Examples are :
- Global standards
- Country-specific standards
- Company-specific standards
STEP XII: CORRECT
In this step, all the data that has been determined to be incorrect is manually corrected, such that the reference data has
- Value Integrity
- Structure Integrity
- Retention Integrity
- Derivation Integrity
STEP XIII:DE DUPE
Each reference data object is searched for duplicate records. All the duplicate records are removed using a de-duping process, using 5 specialized algorithms.
STEP XIV:ENHANCE
The accuracy and quality of the reference data is enhanced by including additional fields, from 3rd parties or through foreign key relationships. Also, versioning information is added at this stage.
For example: Company information is often enhanced by the inclusion of the DUNS™ numbers (from Dun and Bradstreet)
STEP XV:AGGREGATE
Reference Data of the same type from multiple data sources are aggregated in the Reference Data Repository.
For example: If there are 20 customer reference data tables that have been cleansed and enhanced, they are aggregated in the Customer Reference Data Repository.
STEP XVI:MATCH
Multiple tables containing the Reference Data of the same type from multiple data sources are matched to create
- Integrated Reference Data containing a comprehensive list of all records.
- A Cross Reference Table showing how each customer is related
For example: All the customer records in the 20 customer reference data tables are matched to each other to create a single Customer Reference Data Table, containing all the customers across the enterprise.
STEP XVII:REPORT
Reports are generated from the Enterprise reference data with
- Flexible Hierarchies
- Multiple Taxonomies
- Sorting and Filtering capabilities
- Export capability
For example: All the customer records in the enterprise can view according to a large number of dimensions, some of which are shown below
- Time
- Geography
- Demographic
- Profitability Segment
- Revenue Segment
STEP XVIII:SYNCHRONIZE
The Integrated Reference Data Table containing a comprehensive list of all records and the Cross Reference Table are synchronized with the database tables, such that all the records are automatically kept up to date.
For example: If one of customer records is changed (in a particular table/schema /instance/database/system), then the Integrated Customer Reference Table becomes aware of the change, and updates corresponding record within the table.
Thus the Reference Table is always evolving as the customer information within the database environment changes.
STEP XVIII:ADD/MODIFY/DELETE
If any reference data record is added to one of the database tables in the enterprise landscape, the Data Steward can be immediately aware of the new record, and can make sure that all processes related to creating new records have been complied to.
For example: If a customer record is added to a particular table/schema /instance/database/system, then the Data Steward who is responsible for the Integrated Customer Reference Table becomes aware of the addition. This ensures that the Customer Reference Table always up to date and compliant with the data quality processes that are in place.
STEP XX:ARCHIVE
All the Reference Data Tables must be continuously archived to ensure that it can be quickly recovered, if required.
For example: If a customer record is added to a particular table/schema /instance/database/system, then the Data Steward who is responsible for the Integrated Customer Reference Table becomes aware of the addition. This ensures that the Customer Reference Table always up to date and compliant with the data quality processes that are in place.