The Importance of Enterprise Data Classification

Classification is a foundational requirement for all enterprise data ecosystems. Without classification, the ability of enterprise applications to deliver results is severely constrained.

March 6, 2020

The philosophers tell us that unity is the beginning of intelligibility. That is, in order to make sense of the world in its vastness and variety, we must seek for the unifying forms or patterns that are realized in the many things we observe. The world becomes more logical to us when the many can be grasped in terms of the one.

Does this deep truth apply to enterprise data management? Indeed it does. At the level of the physical data (values, columns, tables, databases, systems) in an enterprise, our observation encounters “the many” in its vastness and variety — unto the millions, the tens or hundreds of millions, or even higher orders of magnitude. And yet, if a single human intelligence were able to attain a synoptic or comprehensive grasp of this vastness, it might discover only a hundred or so unifying forms or patterns.

A concept is such a unifying form or pattern. Business concepts are forms or patterns that unify business data, making sense of it by reducing the vastness to a comprehensible, much smaller network of logical entities and their interrelationships. (Such a network of logical entities is sometimes called an “ontology”.) Examples of business concepts include a Customer, an Order, a Subscription, a Trade. Concepts are complex: their constituents can overlap, one concept can be an attribute of another, and so forth.

Concepts are classes — they classify (and thus unify) their many physical instances under one meaning, despite the innumerable variations that occur amongst the many.
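To make the idea of classifying many physical instances under one concept concrete, here is a minimal sketch in Python. It is not a description of any particular product or method: the keyword rules, the function names (`tokens`, `classify`, `group_by_concept`), and the sample column names are all hypothetical, chosen only for illustration. Real classification would draw on far richer evidence (data values, relationships, lineage) than column names alone.

```python
import re
from collections import defaultdict

# Hypothetical rules: each business concept is recognized by name
# fragments that tend to prefix a word in its physical column names.
CONCEPT_KEYWORDS = {
    "Customer": ("cust", "client"),
    "Order": ("ord",),
    "Subscription": ("subscr",),
    "Trade": ("trade", "trd"),
}

def tokens(column_name):
    """Split a physical column name into lowercase word tokens."""
    # Insert a space at camelCase boundaries, then split on separators.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", column_name)
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", spaced) if t]

def classify(column_name):
    """Return the first concept whose keyword prefixes a token, else None."""
    for tok in tokens(column_name):
        for concept, keywords in CONCEPT_KEYWORDS.items():
            if any(tok.startswith(k) for k in keywords):
                return concept
    return None

def group_by_concept(columns):
    """Reduce many physical columns to a few concept-level groups."""
    groups = defaultdict(list)
    for col in columns:
        groups[classify(col) or "Unclassified"].append(col)
    return dict(groups)

if __name__ == "__main__":
    # Assumed sample of physical column names from different systems.
    sample = ["CUST_ID", "customerName", "client_no",
              "ORD_DT", "trade_ts", "subscrPlan", "zip_code"]
    for concept, cols in group_by_concept(sample).items():
        print(concept, cols)
```

Even this crude prefix matching shows the reduction at work: three differently named columns collapse into the single concept Customer. It also shows why simple rules do not scale, since a prefix such as `ord` would just as readily match an unrelated column like `ordinal_rank`.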

Classification, the grasping of the many in terms of one concept, is a primary exercise and a marker of intelligence. As data becomes “big” (and it is getting bigger as we speak), it becomes ever more unmanageable unless it can be classified intelligently and dealt with according to the meanings it instantiates and that connect it with other data.

The available human resources at any enterprise are insufficient to make intelligent sense of all the data, for the resources needed scale with the size of the data environment, and the commitment of those resources must be ongoing as that environment evolves. The future of data management hinges in significant part on the ability of machines to emulate human intelligence in its capacity to classify and reduce the many to one.

In future posts, as we continue to explore key aspects of enterprise data management (data lineage, data governance, data quality, data privacy, cloud migration, machine learning, analytics, data science, and others), we will see time and again the pivotal role that sense-making through data classification plays in making enterprise data manageable.