Understanding the Difference Between Data Governance and Data Quality

Data governance and data quality are closely related, but they differ in an important way: perspective. Understanding this difference brings to light a crucial prerequisite for both: greater transparency of the data ecosystem.

March 25, 2020

Categorized in: Data Quality, Data Governance



It is commonly said, and true, that data governance is an imperative for today’s data-driven enterprises. Data quality is spoken of in similar terms, often in the same breath and the same context, as if the two terms were interchangeable. We are not the first to ask: Is there a difference between data governance and data quality?

We hold that there is indeed a difference between data governance and data quality, though we would also insist that understanding the difference between them is inseparable from understanding how the two are related.

Data governance and data quality both involve putting various kinds of data management processes in place, and, in fact, many of the processes data governance puts in place are those that help ensure the quality of data. It is even possible to define the “quality of data” broadly enough to make quality appear to coincide with governance — for example, if one thinks of the security and availability of data, as well as processes like master data management, as falling under the “quality” umbrella. On the one hand, then, it cannot be denied that there is at least a substantial overlap, in terms of the data management processes involved, between data governance and data quality. On the other hand, if we simply went along with an identification of the two concepts, we would be overlooking an important difference in perspective.

The concern of data quality is with the concrete usability of particular data. In other words, data quality asks whether this data fulfills the criteria of business utility specified for data of this kind. Is the data accurate? Is it consistent? Is it in a standard form? Profiling, cleansing, deduplication and standardization are some of the ongoing processes that address data quality concerns directly.
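As a rough illustration of the processes just named, the following minimal Python sketch applies standardization, deduplication and a very simple form of profiling to a handful of records. The sample records, the field names (`name`, `email`, `country`) and the helper functions are hypothetical, chosen only for this example; real data quality tooling is far more elaborate.

```python
from collections import Counter

# Hypothetical customer records; field names and values are illustrative only.
records = [
    {"name": "  Ada Lovelace ", "email": "ADA@Example.com", "country": "uk"},
    {"name": "Ada Lovelace", "email": "ada@example.com", "country": "UK"},
    {"name": "Alan Turing", "email": None, "country": "UK"},
]


def standardize(record):
    """Bring each field into a standard form (trimmed, consistent case)."""
    email = record["email"]
    return {
        "name": " ".join(record["name"].split()),
        "email": email.strip().lower() if email else None,
        "country": record["country"].strip().upper(),
    }


def deduplicate(rows):
    """Keep only the first row seen for each (name, email) pair."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["name"], row["email"])
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def profile(rows):
    """Count missing (None) values per field."""
    missing = Counter()
    for row in rows:
        for field, value in row.items():
            if value is None:
                missing[field] += 1
    return dict(missing)


# Standardize first, so that duplicates differing only in form are caught.
clean = deduplicate([standardize(r) for r in records])
report = profile(clean)
```

Standardizing before deduplicating matters here: the first two records only become recognizable as duplicates once whitespace and letter case have been normalized. The profile then shows which fields still have gaps.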

The concern of data governance is with managing data as a (or even the) primary asset of a line of business, or of a whole enterprise. Clearly, data quality processes such as those described above are important tools of data governance at the lowest level. However, it would be a mistake simply to tackle “data quality” head-on without first defining the overall governance strategy and identifying the most crucial issues. As anyone who has been involved in such efforts knows, there is no limit to the resources that can be spent in the attempt to cleanse data and render it “perfect”.

Data governance asks questions such as: Where should data quality efforts be concentrated? Where are the critical flow points in the ecosystem where quality controls should be placed? Where does specific, key data originate? Where is key data getting lost? Where is unused (that is, useless) data accumulating? In other words, the judicious use of data quality tools, as well as other tools, belongs to data governance.

Data governance, after all, is concerned with all aspects of managing data as a primary asset. It is concerned with making the data better integrated. It is concerned with the smooth functioning and efficiency of data-driven, high-volume business processes. It is concerned with compliance and the management of sensitive data. It is concerned with the impact of changes in the data environment. One can construe all of these concerns as bearing on the quality of data, but the perspective involved is clearly the strategic perspective of the enterprise or line of business.


All of these concerns of data governance, whether closely or more distantly related to data quality concerns, share a critical foundational requirement. If data governance is to be exercised in an objective manner, it must be grounded in the truth about the data; otherwise, it is blind. But how much of the relevant truth about the data is actually visible? When enterprise data is large and complex, poorly integrated, and fragmented across many silos, it becomes increasingly opaque. A prerequisite for rational data governance is, therefore, organizing the data ecosystem so that it becomes less opaque and more transparent, in other words, governable. We will explore this subject further in an upcoming post.