Sign in
Log inSign up

Dimensions & Data Warehouse Part 1

Zaida Souhil's photo
Zaida Souhil
·Jan 14, 2022·

5 min read

Dimensions & Data Warehouse Part 1

Dimension Modeling

The data warehouse schema adheres to a specific model known as dimension modeling, which consists of data structure techniques for data storage and retrieval optimization. With dimensions surrounding the fact table, the model is divided into two schema: start-like schema and snowflake schema.

Star schema

In this schema, we have a fact table in the center that is surrounded by dimensions. Dimensions are simply analysis axes, such as the time dimension, which allows us to analyze our data based on time. This schema is known as a star schema because its structure is shaped like a star and it is designed for querying large data sets.

Star-Schema.png

Characteristics of Star Schema:

  • Every dimension in a star schema is represented with the only one-dimension table.
  • The dimension table is joined to the fact table using a foreign key
  • The dimension table are not joined to each other
  • Fact table would contain keys and measures
  • The Star schema is easy to understand and provides optimal disk usage.
  • The dimension tables are not normalized.
  • The schema is widely supported by BI Tools.

Snowflake Schema

A Snowflake Schema is an extension of a Star Schema that adds dimension. The dimension tables have been normalized, resulting in the data being split into additional tables.

Snow-flake-schema.png

Characteristics of Snowflake Schema:

  • The main benefit of the snowflake schema it uses smaller disk space.
  • Easier to implement a dimension is added to the Schema
  • Due to multiple tables query performance is reduced
  • The primary challenge that you will face while using the snowflake Schema is that you need to perform more maintenance efforts because of the more lookup tables.

Type of dimensions

Conformed dimension

A conformed dimension is a dimension that is linked to multiple fact tables or, in other words, different business areas.

What does it mean to have a conformed dimension? This has to do with the term "integrated" in the data warehouse definition. Even if a particular entity had different meanings and properties in the source systems, there must be a single version of this object once the data flows into the data warehouse.

Conform-dim.png The time dimension is a common conforming dimension in an organization. The time dimension isn't the only easy-to-create conforming dimension. One example is the customer dimension. Different customer databases are likely to exist in different parts of the organization in every company with a history. Data must be compared to other data, rules must be established, and data must be cleansed in order to obtain a conformed customer dimension. Furthermore, we'll need to apply the same filters to the new values when loading incremental data into the data warehouse to ensure that we're only adding genuine new customers to the customer dimension.

Role Playing dimension

A role-playing dimension is a dimension that appears multiple times in a fact table, such as the time dimension, which may have various date attributes such as Order Date, Delivered Data, and Shipped Date. In this case, we can create different dimensions based on the date type, but maintaining them will be a nightmare, and the star schema only has one relationship with the dimension table. Instead, we can use a single dimension table and create views based on it, with each view having its own type of date, as shown in this example:

Dim-Role-Playing.png

Junk Dimension

A junk dimension is a set of random transactional codes, flags, and text attributes associated with a specific dimension.

This type of dimension is used to reduce the number of unnecessary dimensions (columns with low cardinality) and the number of columns in the fact table. Let see an example to understand more, Let see an example to understand more:

  • Date
  • Product
  • Store
  • Order No
  • Payment Method
  • Store Type
  • Customer Assistance
  • Quantity
  • Unit Price

So our fact table we’ll have :

  • Date_Key
  • Product
  • Sotre_Key
  • Quantity
  • Unit Price

The issue is that if we want to add payment mode, store type, and customer assistance to our fact table, we'll have to create three more dimensions, each with low cardinality. We can pack these tables into one or more junk dimensions. For example, let's say our payment method has three attributes, store type has two, and customer assistance has two. As a result of combining these tables, we'll have (3x2 x2=12 rows) of unique combinations, and our junk dim will have a surrogate key that connects to our fact table.

Junk-dimen.png

Degenerate Dimensions

It's a dimension key in a fact table that doesn't have its own dimension table and is needed as a dimension for analysis or drill-down. This degenerate dimension is also known as fact dimension because it is built from a fact table item and placed in the fact table. By putting a high cardinality dimension key in the fact table, it helps to reduce duplicate data.

Degenerate dimension keys are extremely unique and nearly as large as fact tables. Moving this attribute in a single column dimension table necessitates a significantly large dimension table, and you would also need to join the fact table's surrogate key to the dimension table, which causes performance issues, especially when the fact table is reasonably large.

Order number in a purchase order fact table with multiple line items, for example, could be a good candidate for degenerate dimension. The fact table for purchase orders can be grouped by order number to get a list of all associated items.

Outrigger Dimensions

It's a dimension that's linked to another dimension in order to normalize the star schema, and it's shared by multiple dimensions.

Our schema becomes a snowflake schema when we use outrigger dimensions.

Snow-flake-schema.png