Data Profiling - Data Quality-Explained Clearly

Data Quality – Data Profiling

Posted on February 24, 2022 by Muhammad Rawish Siddiqui

What Is Data Profiling in Data Quality?

Data Profiling is a form of data analysis used to inspect data and assess quality. Data profiling uses statistical techniques to discover the true structure, content, and quality of a collection of data (Olson, 2003). A profiling engine produces statistics that analysts can use to identify patterns in data content and structure. Such as:

Counts of Nulls: Identifies nulls exist and allows for inspection of whether they are allowable or not
Max/Min Value: Identifies outliers, like negatives
Max/Min Length: Identifies outliers or invalids for fields with specific length requirements
Frequency Distribution of Values for Individual Columns: Enables assessment of reason-ability (e.g., distribution of country codes for transactions, inspection of frequently or infrequently occurring values, as well as the percentage of the records populated with defaulted values)
Data Type and Format: Identifies level of non-conformance to format requirements, as well as identification of unexpected formats (e.g., number of decimals, embedded spaces, sample values)

Cross-Column and Inter-Table Data Analysis

Profiling also includes cross-column analysis, which can identify overlapping or duplicate columns and expose embedded value dependencies. Inter-table analysis explores overlapping values sets and helps identify foreign key relationships. Most tools allow for drilling down into the analyzed data for further investigation.

Data Profiling as a Foundation for Data Quality

D.P is an effective way to understand, it is just a first step to data quality improvement. Solving problems requires other forms of analysis, including business process analysis, analysis of data lineage, and deeper data analysis that can help isolate root causes of problems.
Results of D.P can be used to identify opportunities to improve the quality of both Data and Metadata. Results from the profiling engine must be assessed by an analyst to determine whether data conforms to rules and other requirements. A good analyst can use profiling results to confirm known relationships and uncover hidden characteristics and patterns within and between data sets, including business rules, and validity constraints. Profiling is usually used as part of data discovery for projects (especially data integration projects; or to assess the current state of data that is targeted for improvement.

Data Quality – Data Profiling

What Is Data Profiling in Data Quality?

Cross-Column and Inter-Table Data Analysis

Data Profiling as a Foundation for Data Quality

Leave a Reply Cancel reply

Recent Posts

Archives

Categories