Data Quality – Cleansing and Enrichment

Data Cleansing / Scrubbing

Data cleansing, or scrubbing, transforms data so that it conforms to data standards and domain rules. Cleansing includes detecting and correcting data errors to bring the quality of data to an acceptable level. Continuously remediating data through cleansing costs money and introduces risk. Ideally, the need for data cleansing should decrease over time as the root causes of data issues are resolved. The need for data cleansing can be reduced by:

  • Implementing Controls to Prevent Data Entry Errors
  • Correcting the Data in the Source System
  • Improving the Business Processes that Create the Data
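To make the idea of conforming data to standards and domain rules concrete, here is a minimal Python sketch of a cleansing step. The field names (`name`, `country`, `phone`), the alias table, and the 10-digit phone rule are all assumptions chosen for illustration; real standards would come from the organization's own data rules.

```python
import re

# Hypothetical domain rule: country values must map to one of these codes.
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "gb": "GB", "uk": "GB", "great britain": "GB",
}


def cleanse_record(record):
    """Return (cleaned_record, issues) for one customer record."""
    cleaned = dict(record)
    issues = []

    # Assumed standard: names are trimmed, with internal whitespace collapsed.
    name = re.sub(r"\s+", " ", record.get("name") or "").strip()
    if not name:
        issues.append("name: missing")
    cleaned["name"] = name

    # Assumed domain rule: country must resolve to a known code.
    raw_country = (record.get("country") or "").strip().lower()
    code = COUNTRY_ALIASES.get(raw_country)
    if code is None:
        issues.append(f"country: unrecognized value {record.get('country')!r}")
    cleaned["country"] = code

    # Assumed standard: phone numbers are stored as exactly 10 digits.
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if len(digits) != 10:
        issues.append("phone: expected 10 digits")
        digits = None
    cleaned["phone"] = digits

    return cleaned, issues


cleaned, issues = cleanse_record(
    {"name": "  Ada   Lovelace ", "country": "United States", "phone": "(555) 123-4567"}
)
```

Returning the list of issues alongside the corrected record is a deliberate choice: the issues can be fed back to the source system or the business process owner, which supports the goal of reducing the need for cleansing over time rather than only patching data downstream.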

Data Enhancement / Enrichment

Data enhancement or enrichment is the process of adding attributes to a data set to increase its quality and usability. Some enhancements are gained by integrating data sets internal to an organization. External data can also be purchased to enhance organizational data. Data enhancement may include:

  • Time/Date Stamps: One way to improve data is to record the time and date at which data items are created, modified, or retired, which helps to track historical data events. If issues are detected with the data, timestamps are valuable in root cause analysis because they enable analysts to isolate the time frame of the issue.
  • Audit Data: Auditing can document data lineage, which is important for historical tracking as well as validation.
  • Reference Vocabularies: Business-specific terminology, ontologies, and glossaries enhance understanding and control while adding customized business context.
  • Contextual Information: Data can be enhanced by adding context such as location, environment, or access methods, and by tagging data for review and analysis.
  • Geographic Information: Geographic information can be enhanced through address standardization and geocoding, which includes regional coding, municipality, neighborhood mapping, latitude/longitude pairs, or other kinds of location-based data.
  • Demographic Information: Customer data can be enhanced through demographic information, such as age, marital status, gender, income, or ethnic coding. Business entity data can be associated with annual revenue, number of employees, size of occupied space, etc.
  • Psychographic Information: Data used to segment the target populations by specific behaviors, habits, or preferences, such as product and brand preferences, organization memberships, leisure activities, commuting transportation style, shopping time preferences, etc.
  • Valuation Information: Use this kind of enhancement for asset valuation, inventory, and sale.
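Several of the enhancements listed above (time/date stamps, audit data, and geographic information) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `GEO_REFERENCE` table stands in for a purchased or internal reference data set, its coordinates are approximate illustrative values, and the field names (`created_at`, `modified_at`, `source_system`, `geo_matched`) are hypothetical.

```python
from datetime import datetime, timezone

# Hypothetical reference data keyed by postal code (illustrative values only).
GEO_REFERENCE = {
    "10001": {"city": "New York", "latitude": 40.75, "longitude": -73.99},
}


def enrich_record(record, source_system, now=None):
    """Return a copy of the record with timestamp, audit, and geographic attributes."""
    now = now or datetime.now(timezone.utc)
    enriched = dict(record)

    # Time/date stamps: keep an existing creation time, always refresh the modification time.
    enriched.setdefault("created_at", now.isoformat())
    enriched["modified_at"] = now.isoformat()

    # Audit data: record where the record came from to support lineage tracking.
    enriched["source_system"] = source_system

    # Geographic information: look up location attributes by postal code.
    geo = GEO_REFERENCE.get(record.get("postal_code"))
    if geo is not None:
        enriched.update(geo)
    enriched["geo_matched"] = geo is not None

    return enriched


example = enrich_record({"id": 1, "postal_code": "10001"}, source_system="crm")
```

The `geo_matched` flag records whether the lookup succeeded, so records that could not be enriched remain easy to find and review later.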
