Integrating Data from Outside the Enterprise

In business-to-business relationships, partner data becomes your data. Today, integrating and leveraging external data sources is a painful manual process, but it is ripe for automation and an ideal use case for data classification.

March 11, 2020

Categorized in: Automation, Data Classification, Data Ingestion



In a previous post, we wrote of data classification as key to making sense of the increasing volumes of business data inside today’s enterprises. But we also live in a connected world of business-to-business relationships, which today means data-driven partnerships in which the data of partners, suppliers, clients, group policyholders and so forth becomes your data. For the life of each such relationship, beginning with the initial intake, the intelligent ingestion of data from external sources is a problem that your enterprise must solve in one way or another.

Classification, as we have written, reduces the many to one in terms of logical business concepts. Another way to put it is that classification helps discover the connections between many disparate items that share the same meaning. The problem of intelligent ingestion and incorporation of data into your business from external sources is a perfect use case for sense-making through data classification.

In this problem, the “many” are represented by the number of external sources, the differing field structures and names of fields for the same type of business object, the lack of standardization of data values, the diversity of file formats, and so on. All this makes for a daunting (and ongoing) process of reconciling the structure and content of each and every external data source with those of the target databases in your own organization.
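To make the “many to one” idea concrete, here is a minimal sketch in Python of how differently named fields from different partners might be reduced to a single business concept. It is only an illustration: the concept vocabulary (`CONCEPT_KEYWORDS`) and the function names are hypothetical, and simple keyword matching on field names stands in for the advanced algorithms and machine learning a real classifier would use on both data and metadata.

```python
import re

# Hypothetical vocabulary: each core business concept lists name fragments
# that partners commonly use for it. A real classifier would also look at
# the data values, not just the field names.
CONCEPT_KEYWORDS = {
    "postal_code": {"zip", "zipcode", "postal", "postcode"},
    "date_of_birth": {"dob", "birth", "birthdate"},
    "last_name": {"surname", "lname", "lastname"},
}


def tokenize(field_name):
    """Split a field name on camelCase boundaries and non-alphanumeric separators."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", field_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]


def classify_field(field_name):
    """Return the concept a field name appears to represent, or None if unknown."""
    tokens = tokenize(field_name)
    joined = "".join(tokens)
    for concept, keywords in CONCEPT_KEYWORDS.items():
        if keywords.intersection(tokens) or joined in keywords:
            return concept
    return None


# Three partners, three spellings, one concept.
for name in ["ZIP_CODE", "PostalCd", "zipcode"]:
    print(name, "->", classify_field(name))
```

Each of the three spellings resolves to `postal_code`, while a field the vocabulary does not cover returns `None` and would be left for human review.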

This reconciliation is most often carried out manually from spreadsheets or semi-structured data files (generally the output of database dumps) provided by the IT departments of your business partners. An automated solution is highly desirable, since it relieves the burden on human resources and reduces the potential for human error inherent in a manual process.

Classification is the key here, because the logical drivers of the reconciliation required for ingesting external data are precisely the core business concepts that are common to both sides of the data-driven partnership.

It follows that data on both sides must be classified in terms of these core concepts. The good news is that on the target side (that is, your side), the sense-making process needs to be performed only once. For this purpose and for plenty of others, you need to make logical sense of your own data through classification: using advanced algorithms (including machine learning) to analyze your own data and metadata, so as to discover which core business concepts are present in which databases and tables, and how these concepts are physically represented there.

A similar process must then be put in place for classifying each partner data source (typically, the content of semi-structured files) in terms of these same concepts, using the same advanced algorithms (and ML). Once both sides are classified, the mapping between source and target field structures can be computed directly, by pairing the fields that represent the same concept. This mapping then governs the automated ingestion of partner data into the appropriate systems in your enterprise.
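As a rough sketch of that last step, suppose classification has already labeled every source and target field with a concept. The mapping then falls out of a join on the concept, as in the Python below. All field names, concept labels and function names here are hypothetical, and fields with no counterpart or several candidate counterparts are set aside as unresolved rather than guessed at.

```python
def compute_mapping(source_concepts, target_concepts):
    """Pair source fields with target fields that share the same concept.

    Both arguments map a field name to its concept label (None if the
    field could not be classified). Returns (mapping, unresolved).
    """
    # Index the target side by concept.
    target_by_concept = {}
    for field, concept in target_concepts.items():
        if concept is not None:
            target_by_concept.setdefault(concept, []).append(field)

    mapping, unresolved = {}, []
    for field, concept in source_concepts.items():
        candidates = target_by_concept.get(concept, [])
        if len(candidates) == 1:
            mapping[field] = candidates[0]
        else:
            # No match, or an ambiguous one: leave it for a person to decide.
            unresolved.append(field)
    return mapping, unresolved


def ingest(record, mapping):
    """Rename a partner record's fields to the target names, dropping unmapped ones."""
    return {mapping[name]: value for name, value in record.items() if name in mapping}


# Hypothetical classification results for one partner file and one target table.
source = {"ZIP_CODE": "postal_code", "Surname": "last_name", "Notes": None}
target = {"customer.postal_code": "postal_code", "customer.last_name": "last_name"}

mapping, unresolved = compute_mapping(source, target)
row = ingest({"ZIP_CODE": "02139", "Surname": "Ng", "Notes": "call back"}, mapping)
```

Keeping an explicit `unresolved` list is a deliberate choice in this sketch: automation handles the fields where the concepts line up, and only the leftovers need manual attention.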

In summary, the ingestion and incorporation of partner data, just as much as the management of internal data, requires an automated process that makes sense of the data being incorporated. The linchpin of this automated process is data classification, which builds the intelligible “bridges” that enable the large and increasing volumes of data on either side of the business-to-business relationship to truly serve the needs and interests of both parties.