What Are Data Quality Dimensions and Why Do They Matter
A data quality dimension is a measurable feature or characteristic of data. The term “dimension” draws an analogy with the dimensions used to measure physical objects (e.g., length, width, height). Data quality dimensions provide a vocabulary for defining data quality requirements. Those requirements can then be used to report the results of an initial data quality assessment as well as ongoing measurement.
Why There’s No One “Right” Set of Dimensions
While there is no single, agreed-upon set of data quality dimensions, the formulations that follow share common ideas.
Ensuring Completeness: Capturing 100% of Your Data Potential
Completeness: The proportion of data actually stored, measured against the full amount that could be stored (100%).
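As a rough illustration, completeness for a single field can be computed as the share of records that hold a usable value. The sketch below is only one way to do it: the `completeness` function, the `customers` sample, and the rule that `None` and blank strings count as missing are all assumptions made for the example.

```python
from typing import Any, Iterable, Mapping, Optional


def completeness(records: Iterable[Mapping[str, Any]], field: str) -> Optional[float]:
    """Return the share of records in which `field` holds a non-missing value."""
    rows = list(records)
    if not rows:
        return None  # the ratio is undefined for an empty data set
    # Assumption: None and blank strings are treated as missing.
    filled = sum(
        1
        for row in rows
        if row.get(field) is not None and str(row.get(field)).strip() != ""
    )
    return filled / len(rows)


# Hypothetical sample data.
customers = [
    {"id": 1, "email": "ana@example.com"},
    {"id": 2, "email": ""},
    {"id": 3, "email": None},
    {"id": 4, "email": "li@example.com"},
]

email_completeness = completeness(customers, "email")  # 2 of 4 filled -> 0.5
```

What counts as “missing” is itself a requirement to agree on: placeholder values such as "N/A" or "unknown" may need to be treated as gaps as well.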
Achieving Uniqueness: Avoiding Duplicate Records
Uniqueness: No entity instance (thing) will be recorded more than once based upon how that thing is identified.
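Because uniqueness depends on how a thing is identified, a check has to start from an identifying key. The following sketch assumes, purely for illustration, that a person is identified by the combination of `name` and `dob`; the function names and sample records are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Mapping, Optional, Sequence, Tuple


def duplicate_keys(
    records: Sequence[Mapping[str, Any]], key_fields: Sequence[str]
) -> Dict[Tuple[Any, ...], int]:
    """Return every identifying key that occurs more than once, with its count."""
    counts = Counter(tuple(r[f] for f in key_fields) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


def uniqueness(
    records: Sequence[Mapping[str, Any]], key_fields: Sequence[str]
) -> Optional[float]:
    """Return distinct identifying keys divided by the number of records."""
    if not records:
        return None
    distinct = {tuple(r[f] for f in key_fields) for r in records}
    return len(distinct) / len(records)


# Hypothetical sample data: the first and third records describe the same person.
people = [
    {"name": "Ana Ruiz", "dob": "1990-04-01"},
    {"name": "Li Wei", "dob": "1985-11-23"},
    {"name": "Ana Ruiz", "dob": "1990-04-01"},
]

dupes = duplicate_keys(people, ["name", "dob"])
ratio = uniqueness(people, ["name", "dob"])  # 2 distinct keys across 3 records
```

Exact matching on the key only catches identical duplicates. Records that differ in spelling or formatting would need normalization or fuzzy matching first, which this sketch does not attempt.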
Maintaining Timeliness: Keeping Data Up-to-Date
Timeliness: The degree to which data represent reality from the required point in time.
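One common way to approximate timeliness is to ask how many records were refreshed recently enough relative to the point in time that matters. The sketch below takes that approach under stated assumptions: each record carries a hypothetical `updated_at` timestamp, and the `as_of` date and 30-day window are example values rather than recommended thresholds.

```python
from datetime import datetime, timedelta
from typing import Any, Mapping, Optional, Sequence


def timeliness(
    records: Sequence[Mapping[str, Any]], as_of: datetime, max_age: timedelta
) -> Optional[float]:
    """Return the share of records updated within `max_age` before `as_of`."""
    if not records:
        return None
    # Records stamped after `as_of` are not counted as timely, because they
    # do not describe the state at the required point in time.
    fresh = sum(
        1
        for r in records
        if timedelta(0) <= as_of - r["updated_at"] <= max_age
    )
    return fresh / len(records)


# Hypothetical sample data.
prices = [
    {"sku": "A", "updated_at": datetime(2024, 1, 15)},  # 16 days old
    {"sku": "B", "updated_at": datetime(2023, 11, 1)},  # 91 days old
    {"sku": "C", "updated_at": datetime(2024, 1, 1)},   # exactly 30 days old
]

score = timeliness(prices, as_of=datetime(2024, 1, 31), max_age=timedelta(days=30))
```

Here two of the three records fall inside the window. The acceptable age depends entirely on the use case: a price list and a customer address tolerate very different delays.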
Validity Checks: Conforming to Syntax and Format Requirements
Validity: Data is valid if it conforms to the syntax (format, type, range) of its definition.
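The three aspects of syntax named in the definition (format, type, range) translate directly into rule checks. The sketch below is a minimal example with made-up rules: an `age` field that must be an integer between 0 and 120, and a `postcode` field that must be exactly five digits. Both rules and the `violations` helper are assumptions for illustration only.

```python
import re
from typing import Any, List, Mapping

# Hypothetical format rule: exactly five digits.
POSTCODE_PATTERN = re.compile(r"\d{5}")


def violations(record: Mapping[str, Any]) -> List[str]:
    """Return the list of validity rules that the record breaks."""
    problems: List[str] = []

    age = record.get("age")
    # Type check: an int, but not a bool (bool is a subclass of int in Python).
    if not isinstance(age, int) or isinstance(age, bool):
        problems.append("age: type")
    # Range check, only meaningful once the type is right.
    elif not 0 <= age <= 120:
        problems.append("age: range")

    postcode = record.get("postcode")
    # Format check: the whole string must match the pattern.
    if not isinstance(postcode, str) or POSTCODE_PATTERN.fullmatch(postcode) is None:
        problems.append("postcode: format")

    return problems


ok = violations({"age": 34, "postcode": "90210"})
bad = violations({"age": 150, "postcode": "9021"})
```

Note that a record can be valid without being accurate: "90210" passes the format rule whether or not the person actually lives there.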
Accuracy: Reflecting the Real-World Truth
Accuracy: The degree to which data correctly describes the ‘real world’ object or event being described.
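Accuracy cannot be checked from the data alone; it needs something that stands in for the real world, such as a manually verified sample. The sketch below assumes such a sample exists and measures how often the recorded value agrees with it. The `accuracy` function and both data sets are hypothetical.

```python
from typing import Any, Hashable, Mapping, Optional


def accuracy(
    recorded: Mapping[Hashable, Mapping[str, Any]],
    verified: Mapping[Hashable, Mapping[str, Any]],
    field: str,
) -> Optional[float]:
    """Return the share of verified entities whose recorded `field` matches."""
    # Only entities present in both sets can be checked.
    checked = [key for key in verified if key in recorded]
    if not checked:
        return None
    correct = sum(
        1 for key in checked if recorded[key].get(field) == verified[key].get(field)
    )
    return correct / len(checked)


# Hypothetical data: what the system holds, keyed by customer id.
recorded = {1: {"city": "Leeds"}, 2: {"city": "York"}, 3: {"city": "Bath"}}
# Hypothetical verified sample, e.g. confirmed by contacting the customer.
verified = {1: {"city": "Leeds"}, 2: {"city": "Hull"}}

city_accuracy = accuracy(recorded, verified, "city")  # 1 of 2 checked -> 0.5
```

The result only describes the sample that was verified, so how that sample is drawn matters as much as the calculation.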
Consistency: Ensuring Data Aligns Across Systems
Consistency: The absence of difference, when comparing two or more representations of a thing against a definition.
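A consistency check compares representations of the same thing after applying the agreed definition. In the sketch below, the definition is assumed to be “country is an upper-case code with no surrounding whitespace”; the two systems (`crm`, `billing`), the `normalize_country` rule, and the `inconsistencies` helper are invented for the example.

```python
from typing import Any, Callable, List, Mapping


def normalize_country(value: str) -> str:
    """Apply the assumed definition: trimmed, upper-case country code."""
    return value.strip().upper()


def inconsistencies(
    system_a: Mapping[str, Mapping[str, Any]],
    system_b: Mapping[str, Mapping[str, Any]],
    field: str,
    normalize: Callable[[Any], Any],
) -> List[str]:
    """Return ids present in both systems whose normalized `field` values differ."""
    shared = system_a.keys() & system_b.keys()
    return sorted(
        key
        for key in shared
        if normalize(system_a[key][field]) != normalize(system_b[key][field])
    )


# Hypothetical data from two systems, keyed by customer id.
crm = {"c1": {"country": "de"}, "c2": {"country": "FR"}, "c3": {"country": "US"}}
billing = {"c1": {"country": "DE "}, "c2": {"country": "ES"}}

mismatches = inconsistencies(crm, billing, "country", normalize_country)
```

Customer "c1" is not reported, because "de" and "DE " are the same value under the definition; only "c2" differs. Customer "c3" exists in one system only, which is a completeness question rather than a consistency one.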
Usability: Making Data Understandable and Actionable
Usability: Is the data understandable, simple, relevant, accessible, maintainable, and at the right level of precision?
Handling Timing Issues Beyond Basic Timeliness
Timing Issues (beyond timeliness itself): Is the data stable yet responsive to legitimate change requests?
Flexibility: Adapting Data for Multiple Uses
Flexibility: Is the data comparable and compatible with other data? Does it have useful groupings and classifications? Can it be repurposed? Is it easy to manipulate?
Building Confidence Through Governance and Security
Confidence: Are Data Governance, Data Protection, and Data Security processes in place? What is the reputation of the data, and is it verified or verifiable?
Measuring Value: Balancing Cost, Benefit, and Risk
Value: Is there a good cost/benefit case for the data? Is it being optimally used? Does it endanger people’s safety or privacy, or the legal responsibilities of the enterprise? Does it support or contradict the corporate image or the corporate message?
Objective vs. Subjective Dimensions: What to Measure and What to Interpret
Some dimensions can be measured objectively (Completeness, Validity, Format Conformity), while others depend heavily on context or on subjective interpretation (Usability, Reliability, Reputation).
Summing Up: The Big Picture of Data Quality Dimensions
Whatever names are used, dimensions focus on whether there is enough data (Completeness), whether it is right (Accuracy, Validity), how well it fits together (Consistency, Integrity, Uniqueness), whether it is up-to-date (Timeliness), accessible, usable, and secure.