A Data Quality dimension is a measurable feature or characteristic of data. The term ‘dimension’ is used to draw a connection to the dimensions used in measuring physical objects (e.g., length, width, height). Data quality dimensions provide a vocabulary for defining data quality requirements. They can then be used to define the results of an initial data quality assessment as well as of ongoing measurement.
While there is no single, agreed-upon set of Data Quality Dimensions, the following formulations share common ideas.
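To make the idea concrete, here is a minimal Python sketch. It assumes that a requirement is stated as a minimum score for one dimension of one field. The `Requirement` class, the `evaluate` function, the dataset and field names, the thresholds, and the scores are all hypothetical. The point is that the same dimension vocabulary can phrase a requirement, report an initial assessment, and be re-checked during ongoing measurement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """A data quality requirement expressed against one dimension (hypothetical structure)."""
    dataset: str
    field: str
    dimension: str
    threshold: float  # minimum acceptable score, on a 0..1 scale


def evaluate(requirements, scores):
    """Return (requirement, score, met) for each requirement.

    `scores` maps (dataset, field, dimension) to a measured value.
    """
    results = []
    for req in requirements:
        score = scores[(req.dataset, req.field, req.dimension)]
        results.append((req, score, score >= req.threshold))
    return results


# Illustrative requirements; names and thresholds are assumptions.
REQUIREMENTS = [
    Requirement("customer", "email", "Completeness", 0.95),
    Requirement("customer", "id", "Uniqueness", 1.0),
]

if __name__ == "__main__":
    # Illustrative scores from an initial assessment and a later measurement run.
    runs = {
        "initial assessment": {("customer", "email", "Completeness"): 0.90,
                               ("customer", "id", "Uniqueness"): 1.0},
        "monthly check": {("customer", "email", "Completeness"): 0.97,
                          ("customer", "id", "Uniqueness"): 0.99},
    }
    for run, scores in runs.items():
        for req, score, met in evaluate(REQUIREMENTS, scores):
            status = "met" if met else "NOT met"
            print(f"{run}: {req.dimension} of {req.dataset}.{req.field} = {score:.2f} ({status})")
```

Keeping the requirements separate from the scores lets the same definitions be reused unchanged for each measurement run.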
- Completeness: The proportion of data stored, measured against the potential for 100% completeness.
- Uniqueness: No entity instance (thing) will be recorded more than once based upon how that thing is identified.
- Timeliness: The degree to which data represent reality from the required point in time.
- Validity: Data is valid if it conforms to the syntax (format, type, range) of its definition.
- Accuracy: The degree to which data correctly describes the ‘real world’ object or event being described.
- Consistency: The absence of difference, when comparing two or more representations of a thing against a definition.
- Usability: Is the data understandable, simple, relevant, accessible, maintainable and at the right level of precision?
- Timing Issues (beyond timeliness itself): Is it stable yet responsive to legitimate change requests?
- Flexibility: Is the data comparable and compatible with other data? Does it have useful groupings and classifications? Can it be repurposed? Is it easy to manipulate?
- Confidence: Are Data Governance, Data Protection, and Data Security processes in place? What is the reputation of the data, and is it verified or verifiable?
- Value: Is there a good cost/benefit case for the data? Is it being optimally used? Does it endanger people’s safety or privacy, or the legal responsibilities of the enterprise? Does it support or contradict the corporate image or the corporate message?
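Several of these dimensions can be scored directly from the data. Below is a minimal Python sketch of one simple ratio-based way to do this for Completeness, Uniqueness, Validity, Consistency, and Timeliness. The customer records, the second "billing" representation, the simplified email syntax rule, and the 30-day freshness window are all hypothetical assumptions. Accuracy and the more subjective dimensions are left out because they need an external reference or human judgment rather than a rule applied to the data alone.

```python
import math
import re
from datetime import datetime, timedelta

# Hypothetical customer records; field names, values, and rules are illustrative only.
RECORDS = [
    {"id": "C1", "email": "ann@example.com", "country": "GB", "updated": datetime(2024, 5, 1)},
    {"id": "C2", "email": None, "country": "FR", "updated": datetime(2024, 5, 20)},
    {"id": "C3", "email": "bob@example", "country": "DE", "updated": datetime(2023, 1, 9)},
    {"id": "C3", "email": "bob@example.org", "country": "DE", "updated": datetime(2024, 5, 22)},
]
# A second, hypothetical representation of the same customers (e.g. a billing system).
BILLING_COUNTRY = {"C1": "GB", "C2": "FR", "C3": "AT"}

# Simplified syntax rule for emails (an assumption, not a full RFC 5322 check).
EMAIL_SYNTAX = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def _ratio(hits, total):
    """Return hits/total, or NaN when there is nothing to measure."""
    return hits / total if total else math.nan


def completeness(records, field):
    """Share of records in which the field is populated."""
    return _ratio(sum(r.get(field) not in (None, "") for r in records), len(records))


def uniqueness(records, key):
    """Distinct identifiers per record; 1.0 means no entity is recorded twice."""
    return _ratio(len({r[key] for r in records}), len(records))


def validity(records, field, pattern):
    """Share of populated values that conform to the field's syntax rule."""
    values = [r[field] for r in records if r.get(field)]
    return _ratio(sum(bool(pattern.fullmatch(v)) for v in values), len(values))


def consistency(records, reference, key, field):
    """Share of records whose value agrees with another representation of the same entity."""
    shared = [r for r in records if r[key] in reference]
    return _ratio(sum(r[field] == reference[r[key]] for r in shared), len(shared))


def timeliness(records, field, as_of, max_age):
    """Share of records refreshed within max_age of the reference time."""
    return _ratio(sum(as_of - r[field] <= max_age for r in records), len(records))


if __name__ == "__main__":
    as_of = datetime(2024, 6, 1)
    scores = {
        "Completeness (email)": completeness(RECORDS, "email"),
        "Uniqueness (id)": uniqueness(RECORDS, "id"),
        "Validity (email)": validity(RECORDS, "email", EMAIL_SYNTAX),
        "Consistency (country)": consistency(RECORDS, BILLING_COUNTRY, "id", "country"),
        "Timeliness (30 days)": timeliness(RECORDS, "updated", as_of, timedelta(days=30)),
    }
    for name, score in scores.items():
        print(f"{name}: {score:.2f}")
```

With the sample records, the script reports 0.75 for completeness and uniqueness, 0.67 for validity, and 0.50 for consistency and timeliness. Each score depends on the rule chosen. A stricter email pattern or a different freshness window would change the result, which is why the rule belongs in the requirement's definition.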
Dimensions include some characteristics that can be measured objectively (Completeness, Validity, Format Conformity) and others that depend heavily on context or on subjective interpretation (Usability, Reliability, Reputation).
Whatever names are used, dimensions focus on whether there is enough data (Completeness), whether it is right (Accuracy, Validity), how well it fits together (Consistency, Integrity, Uniqueness), whether it is up to date (Timeliness), and whether it is accessible, usable, and secure.