
Data Management – Data Profiling

Understanding data content and structure is essential for Data Governance, Data Architecture, Data Modeling and Design, Data Storage and Operations, Data Security, Data Quality, and Data Integration and Interoperability, and data profiling contributes to that understanding. Actual data structure and contents always differ from what is assumed. Sometimes the differences are small; other times they are large enough to derail the entire effort. Profiling helps teams discover these differences and use that knowledge to make better decisions. If data profiling is skipped, information that should influence design will not be discovered until testing or operations.

Basic profiling involves analysis of:

  • Data Format as defined in the data structures and inferred from the actual data
  • Data Population, including the levels of null, blank, or defaulted data
  • Data Values and how closely they correspond to a defined set of valid values
  • Patterns and relationships internal to the data set, such as related fields and cardinality rules
  • Relationships to other data sets
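The basic analyses listed above can be illustrated with a small, standard-library-only Python sketch. It assumes the data set is already loaded as a list of dictionaries, one per row. The function names `infer_format`, `profile_column`, `dependency_violations`, and `orphan_keys`, and the sample `customers` and `orders` rows, are hypothetical illustrations, not the API of any particular profiling tool.

```python
import re
from collections import Counter

# Hypothetical helpers for illustration only; each row is assumed to be a dict.


def infer_format(value):
    """Reduce a value to a character pattern: digits -> '9', letters -> 'A'."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"[0-9]", "9", str(value)))


def is_blank(value):
    """True for strings that are empty or contain only whitespace."""
    return isinstance(value, str) and value.strip() == ""


def profile_column(rows, column, valid_values=None, default_values=()):
    """Format, population, and value-domain statistics for one column."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None and not is_blank(v)]
    profile = {
        "total": len(values),
        "null": sum(1 for v in values if v is None),
        "blank": sum(1 for v in values if is_blank(v)),
        "defaulted": sum(1 for v in present if v in default_values),
        "distinct": len(set(present)),
        "formats": dict(Counter(infer_format(v) for v in present)),
    }
    if valid_values is not None:
        # Count populated values that fall outside the defined valid set.
        profile["invalid"] = sum(1 for v in present if v not in valid_values)
    return profile


def dependency_violations(rows, determinant, dependent):
    """Determinant values that map to more than one dependent value
    (for example, one postal code associated with several city names)."""
    seen = {}
    for row in rows:
        seen.setdefault(row.get(determinant), set()).add(row.get(dependent))
    return {key: deps for key, deps in seen.items() if len(deps) > 1}


def orphan_keys(child_rows, foreign_key, parent_rows, primary_key):
    """Foreign-key values in one data set that have no match in another."""
    parent_keys = {row.get(primary_key) for row in parent_rows}
    return [
        row[foreign_key]
        for row in child_rows
        if row.get(foreign_key) is not None and row[foreign_key] not in parent_keys
    ]


# Hypothetical sample data.
customers = [
    {"id": 1, "state": "NY", "zip": "10001", "city": "New York"},
    {"id": 2, "state": "ny", "zip": "10001", "city": "NYC"},
    {"id": 3, "state": "", "zip": "9021", "city": "Beverly Hills"},
    {"id": 4, "state": None, "zip": "90210", "city": "Beverly Hills"},
]
orders = [{"order_id": 10, "customer_id": 1}, {"order_id": 11, "customer_id": 7}]

state_profile = profile_column(customers, "state", valid_values={"NY", "CA"})
zip_profile = profile_column(customers, "zip")
zip_city_conflicts = dependency_violations(customers, "zip", "city")
missing_customers = orphan_keys(orders, "customer_id", customers, "id")

for name, result in [("state", state_profile), ("zip", zip_profile),
                     ("zip -> city", zip_city_conflicts),
                     ("orphan orders", missing_customers)]:
    print(name, result)
```

In this sample, the pattern counts expose a four-digit ZIP code, the population counts separate nulls from blanks, the valid-value check flags the lowercase `ny`, the dependency check finds one postal code mapped to two city spellings, and the orphan check finds an order whose customer does not exist.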

One goal of profiling is to assess the quality of the data. Assessing the fitness of the data for a particular use requires documenting business rules and measuring how well the data meets them. Assessing accuracy requires comparing the data against a definitive data set that has been determined to be correct. Such data sets are not always available, so measuring accuracy may not be possible, especially as part of a profiling effort.
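Measuring fitness for use and accuracy, as described above, can be sketched as follows. This is a minimal illustration that assumes business rules are expressed as Python predicates over a row and that a trusted reference data set exists for only some records. `rule_conformance`, `accuracy_against_reference`, and the sample rules and data are hypothetical.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]


def rule_conformance(rows: List[Row],
                     rules: Dict[str, Callable[[Row], bool]]) -> Dict[str, Optional[float]]:
    """Fraction of rows that satisfy each named business rule."""
    if not rows:
        return {name: None for name in rules}
    return {name: sum(1 for row in rows if rule(row)) / len(rows)
            for name, rule in rules.items()}


def accuracy_against_reference(rows: List[Row], reference: List[Row],
                               key: str, column: str) -> Optional[float]:
    """Share of rows whose value equals the reference value, computed only over
    rows that can be matched to the reference; None if nothing matches."""
    ref_values = {r[key]: r[column] for r in reference}
    matched = [row for row in rows if row.get(key) in ref_values]
    if not matched:
        return None
    correct = sum(1 for row in matched if row.get(column) == ref_values[row[key]])
    return correct / len(matched)


# Hypothetical data and business rules.
orders = [
    {"id": 1, "quantity": 5, "status": "SHIPPED", "ship_date": "2024-01-03"},
    {"id": 2, "quantity": 0, "status": "OPEN", "ship_date": None},
    {"id": 3, "quantity": 2, "status": "SHIPPED", "ship_date": None},
    {"id": 4, "quantity": -1, "status": "OPEN", "ship_date": None},
]
rules = {
    "quantity_positive": lambda r: isinstance(r.get("quantity"), int) and r["quantity"] > 0,
    "shipped_has_date": lambda r: r.get("status") != "SHIPPED" or r.get("ship_date") is not None,
}
# A definitive source covering only part of the data (hypothetical).
reference = [{"id": 1, "status": "SHIPPED"}, {"id": 3, "status": "OPEN"}]

conformance = rule_conformance(orders, rules)
status_accuracy = accuracy_against_reference(orders, reference, "id", "status")
print(conformance, status_accuracy)
```

Returning `None` rather than a score when no record can be matched reflects the point above: without a definitive data set, accuracy cannot be measured, only conformance to the documented rules.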

As with high-level data discovery, data profiling includes verifying assumptions about the data against the actual data. Capture the results of data profiling in a metadata repository for use on later projects, and use what is learned from the process to improve the accuracy of existing metadata (Olson, 2003).
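Checking documented assumptions against the actual data, and keeping the results for later projects, might look like the sketch below. The `assumed` dictionary stands in for existing metadata, and a plain Python list stands in for a metadata repository; both, along with `observe`, `check_assumptions`, `record_profile`, and the data set name `crm.customers`, are hypothetical simplifications.

```python
import json


def observe(rows, column):
    """What the data actually contains for one column."""
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None]
    return {
        "types": sorted({type(v).__name__ for v in non_null}),
        "null_count": len(values) - len(non_null),
        "max_length": max((len(str(v)) for v in non_null), default=0),
    }


def check_assumptions(observed, assumed):
    """List discrepancies between documented metadata and observed data."""
    findings = []
    for column, spec in assumed.items():
        obs = observed[column]
        if obs["types"] and obs["types"] != [spec["type"]]:
            findings.append(f"{column}: documented as {spec['type']}, found {obs['types']}")
        if not spec["nullable"] and obs["null_count"] > 0:
            findings.append(f"{column}: documented as NOT NULL, found {obs['null_count']} null(s)")
        if obs["max_length"] > spec["max_length"]:
            findings.append(f"{column}: documented max length {spec['max_length']}, "
                            f"found {obs['max_length']}")
    return findings


def record_profile(repository, dataset, rows, assumed):
    """Profile the columns named in the metadata and store the results."""
    observed = {column: observe(rows, column) for column in assumed}
    entry = {"dataset": dataset, "columns": observed,
             "findings": check_assumptions(observed, assumed)}
    repository.append(entry)
    return entry


metadata_repository = []  # hypothetical stand-in for a real metadata repository
assumed = {
    "customer_id": {"type": "int", "nullable": False, "max_length": 10},
    "phone": {"type": "str", "nullable": False, "max_length": 12},
}
rows = [
    {"customer_id": 1, "phone": "555-0100"},
    {"customer_id": "C2", "phone": None},
    {"customer_id": 3, "phone": "+1 555 0100 ext 9"},
]

entry = record_profile(metadata_repository, "crm.customers", rows, assumed)
print(json.dumps(entry, indent=2))
```

Each finding is a candidate correction to the existing metadata (here, the identifier is not purely numeric and the phone column is neither always populated nor limited to 12 characters), which is the feedback loop the paragraph above describes.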

The requirement to profile data must be balanced against an organization’s security and privacy regulations.
