De-normalization is the deliberate transformation of normalized logical data model entities into physical tables with redundant or duplicate data structures. In other words, de-normalization intentionally puts one attribute in multiple places. In some circumstances, this added redundancy can improve performance enough to outweigh the cost of the duplicate storage. Dimensional structures are the main means of de-normalization. There are several reasons to de-normalize data. The first is to improve performance, by:
- Combining data from multiple other tables in advance to avoid costly run-time joins, as in the sketch after this list
- Creating smaller, pre-filtered copies of data to reduce costly run-time calculations and/or table scans of large tables
- Pre-calculating and storing costly data calculations to avoid run-time system resource competition
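As a rough illustration of the first technique, the sketch below pre-joins two hypothetical normalized tables into a single de-normalized reporting table so that queries avoid the run-time join. PostgreSQL-style SQL and all table and column names (customer, sale, sale_reporting) are assumptions for illustration only.

```sql
-- Hypothetical normalized source tables (PostgreSQL-style syntax).
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL,
    region        TEXT NOT NULL
);

CREATE TABLE sale (
    sale_id     INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
    sale_date   DATE NOT NULL,
    amount      NUMERIC(12,2) NOT NULL
);

-- De-normalized reporting table: customer attributes are copied onto each
-- sale row in advance, so reporting queries avoid the run-time join.
CREATE TABLE sale_reporting (
    sale_id       INTEGER PRIMARY KEY,
    sale_date     DATE NOT NULL,
    amount        NUMERIC(12,2) NOT NULL,
    customer_id   INTEGER NOT NULL,
    customer_name TEXT NOT NULL,   -- duplicated from customer
    region        TEXT NOT NULL    -- duplicated from customer
);

-- Periodic load that performs the join once, at load time.
INSERT INTO sale_reporting (sale_id, sale_date, amount, customer_id, customer_name, region)
SELECT s.sale_id, s.sale_date, s.amount, c.customer_id, c.customer_name, c.region
FROM sale s
JOIN customer c ON c.customer_id = s.customer_id;
```

The cost of this structure is the duplicated customer_name and region values, which must be reloaded or reconciled whenever the source customer rows change.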
De-normalization can also be used to enforce user security by segregating data into multiple views or copies of tables according to access needs. This process introduces a risk of data errors due to duplication. Therefore, de-normalization is frequently chosen only when structures such as views and partitions fall short of producing an efficient physical design. It is good practice to implement data quality checks that ensure the copies of the attributes are correctly stored. In general, de-normalize only to improve database query performance or to facilitate enforcement of user security.
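A minimal sketch of such a check, reusing the hypothetical sale_reporting and customer tables from the previous example (PostgreSQL-style SQL; IS DISTINCT FROM is used so that NULLs are compared as values):

```sql
-- Data quality check: find reporting rows whose duplicated customer
-- attributes have drifted from the system-of-record customer table.
SELECT r.sale_id,
       r.customer_id,
       r.customer_name AS reporting_name,
       c.customer_name AS source_name,
       r.region        AS reporting_region,
       c.region        AS source_region
FROM sale_reporting r
JOIN customer c ON c.customer_id = r.customer_id
WHERE r.customer_name IS DISTINCT FROM c.customer_name
   OR r.region        IS DISTINCT FROM c.region;
```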
In dimensional data modeling, de-normalization is called collapsing or combining. If each dimension is collapsed into a single structure, the resulting data model is called a star schema. If the dimensions are not collapsed, the resulting data model is called a snowflake.
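A minimal sketch of the difference, using a hypothetical product dimension in PostgreSQL-style SQL (all names are illustrative assumptions): the snowflake form keeps the category attributes in their own normalized table, while the star form collapses them into a single dimension structure.

```sql
-- Snowflake form: the product dimension remains normalized, so queries
-- must join product_dim_snowflake to category_dim.
CREATE TABLE category_dim (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);

CREATE TABLE product_dim_snowflake (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES category_dim (category_id)
);

-- Star form: the same dimension collapsed into a single structure,
-- with the category attributes duplicated onto each product row.
CREATE TABLE product_dim_star (
    product_id    INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    category_id   INTEGER NOT NULL,
    category_name TEXT NOT NULL   -- collapsed (duplicated) from category_dim
);

-- The fact table references the dimension either way; with the star form,
-- a query touches only the fact table and one dimension table.
CREATE TABLE sales_fact (
    product_id INTEGER NOT NULL REFERENCES product_dim_star (product_id),
    sale_date  DATE NOT NULL,
    amount     NUMERIC(12,2) NOT NULL
);
```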
Canonical Model: A variant of a physical schema is a canonical model, used for data in motion between systems. This model describes the structure of data being passed between systems as packets or messages. When sending data through web services, an Enterprise Service Bus (ESB), or Enterprise Application Integration (EAI), the canonical model describes what data structure the sending service and any receiving services should use. These structures should be designed to be as generic as possible to enable re-use and simplify interface requirements. This structure may only be instantiated as a buffer or queue structure on an intermediary messaging system (middleware) to hold message contents temporarily.
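The canonical structure itself is often expressed as an XML or JSON message schema rather than a table; purely as a rough sketch, the following shows how such a structure might be instantiated as a temporary buffer table on the middleware. The table, its columns, and the 'party' message are hypothetical assumptions, not a prescribed design.

```sql
-- Hypothetical middleware buffer table instantiating a canonical 'party'
-- message. Column names are kept generic so any sending or receiving
-- service can map its own structures onto them.
CREATE TABLE canonical_party_message (
    message_id    INTEGER PRIMARY KEY,
    source_system TEXT NOT NULL,       -- which service sent the packet
    received_at   TIMESTAMP NOT NULL,  -- when the middleware received it
    party_id      TEXT NOT NULL,       -- canonical identifier, not a system-specific key
    party_name    TEXT NOT NULL,
    party_type    TEXT NOT NULL,       -- e.g. 'CUSTOMER', 'SUPPLIER'
    payload       TEXT                 -- full message contents, held temporarily
);
```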
Partitioning refers to the process of splitting a table. It is performed to facilitate archiving and to improve retrieval performance. Partitioning can be either vertical (separating groups of columns) or horizontal (separating groups of rows), as in the sketch after the list below.
- Vertically split: To reduce query sets, create subset tables that contain subsets of columns. For example, split a customer table in two based on whether the fields are mostly static or mostly volatile (to improve load / index performance), or based on whether the fields are commonly or uncommonly included in queries (to improve table scan performance).
- Horizontally split: To reduce query sets, create subset tables using the value of a column as the differentiator. For example, create regional customer tables that contain only customers in a specific region.
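A minimal sketch of both approaches in PostgreSQL-style SQL (table names, columns, and region values are illustrative assumptions; horizontal splitting uses declarative list partitioning here, but separate physical tables work as well):

```sql
-- Vertical split: mostly-static and mostly-volatile columns are stored
-- in separate tables that share the customer key.
CREATE TABLE customer_static (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    birth_date  DATE
);

CREATE TABLE customer_volatile (
    customer_id    INTEGER PRIMARY KEY REFERENCES customer_static (customer_id),
    last_login     TIMESTAMP,
    loyalty_points INTEGER
);

-- Horizontal split: rows are separated by the value of the region column,
-- so a regional query scans only its own partition.
CREATE TABLE customer_by_region (
    customer_id INTEGER NOT NULL,
    region      TEXT NOT NULL,
    name        TEXT NOT NULL
) PARTITION BY LIST (region);

CREATE TABLE customer_emea PARTITION OF customer_by_region FOR VALUES IN ('EMEA');
CREATE TABLE customer_apac PARTITION OF customer_by_region FOR VALUES IN ('APAC');
```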
Views: A view is a virtual table. Views provide a means to look at data from one or many tables that contain or reference the actual attributes. A standard view runs its SQL to retrieve data at the point when an attribute in the view is requested. An instantiated (often called ‘materialized’) view runs and stores its results at a predetermined time. Views are used to simplify queries, control data access, and rename columns, without the redundancy and the associated risk of data errors introduced by de-normalization.
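A minimal sketch of both kinds of view, reusing the hypothetical sale and customer tables from the first example (the materialized view syntax shown is PostgreSQL-style and varies by DBMS):

```sql
-- Standard view: its SQL runs each time the view is queried.
CREATE VIEW regional_sales_v AS
SELECT c.region,
       s.sale_date,
       s.amount
FROM sale s
JOIN customer c ON c.customer_id = s.customer_id;

-- Instantiated (materialized) view: the query runs when the view is
-- created or refreshed, and the results are stored.
CREATE MATERIALIZED VIEW regional_sales_mv AS
SELECT region, SUM(amount) AS total_amount
FROM regional_sales_v
GROUP BY region;

-- Typically refreshed at a predetermined time by a scheduled job.
REFRESH MATERIALIZED VIEW regional_sales_mv;
```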