Star and Snowflake Schema in Data Warehouse with Model Examples
The modelling of data warehouse systems is an area of expertise for the Multidimensional Schema language. The schemas have been developed to cater to the one-of-a-kind requirements of very large databases that have been developed for analytical purposes (OLAP).
Different kinds of data warehouse schema include:
The following are the three primary kinds of multidimensional schemas, each of which has its own set of benefits.
In this tutorial, we will explain the Star schema, as well as the Snowflake schema, Galaxy schema, and Star Cluster schema.
What is a Star Schema?
The Star Schema is a type of data warehouse organisation in which the centre of the star can contain one fact table and several dimension tables associated with it. Because its structure is so similar to that of a star, it has been given the name “star schema.” The Star Schema data model is the simplest type of data model that can be used in a Data Warehouse. In some circles, it is also referred to as the Star Join Schema, and it is designed specifically with querying large data sets in mind.
In the following illustration of a Star Schema, the fact table occupies the central position. This table contains the keys that are needed to access each dimension table, such as Dealer ID, Model ID, Date ID, Product ID, Branch ID, and other attributes such as Units sold and Revenue.
Characteristics of Star Schema:
Diagram of a Star, for Illustration Purposes
Star Schema possesses the following characteristics:
In a star schema, each dimension is represented by a single table that only contains one dimension of data.
It is expected that the set of attributes will be included in the dimension table.
The fact table and the dimension table are joined together with the help of a foreign key.
There is no connection between the different dimension tables.
The key and measures would be included in the fact table.
The Star schema is straightforward to comprehend and ensures efficient use of disc space.
There is no attempt at normalisation made with the dimension tables. For instance, in the figure that was just presented, the Country ID column does not contain a lookup table for the country, as it would in an OLTP design.
There is widespread support for the schema among BI Tools.
What is a Snowflake Schema?
In the context of data warehouses, the term “snowflake schema” refers to a logical arrangement of tables in a multidimensional database in such a way that the entity relationship diagram takes the form of a snowflake. A Star Schema can be extended into a Snowflake Schema, which adds on additional dimensions to the original model. Normalization of the dimension tables results in the splitting of the data into additional tables.
In the subsequent illustration of the Snowflake Schema, Country is further normalised into its own individual table.
Characteristics of Snowflake Schema:
An Illustration of the Snowflake Schema
An Illustration of the Snowflake Schema
Schema of Snowflakes Possessing These Characteristics
The primary advantage of using the snowflake schema is that it requires less space on the disc.
A dimension that is easier to implement has been added to the Schema.
The performance of queries suffers because there are multiple tables.
Star Schema Vs Snowflake Schema: Key Differences
Following is a key difference between Snowflake schema vs Star schema:
|Hierarchies for the dimensions are stored in the dimensional table.
|Hierarchies are divided into separate tables.
|It contains a fact table surrounded by dimension tables.
|One fact table surrounded by dimension table which are in turn surrounded by dimension table
|In a star schema, only single join creates the relationship between the fact table and any dimension tables.
|A snowflake schema requires many joins to fetch the data.
|Simple DB Design.
|Very Complex DB Design.
|Denormalized Data structure and query also run faster.
|Normalized Data Structure.
|High level of Data redundancy
|Very low-level data redundancy
|Single Dimension table contains aggregated data.
|Data Split into different Dimension Tables.
|Cube processing is faster.
|Cube processing might be slow because of the complex join.
|Offers higher performing queries using Star Join Query Optimization.
Tables may be connected with multiple dimensions.
|The Snowflake schema is represented by centralized fact table which unlikely connected with multiple dimensions.
Using the Snowflake Schema will present you with a number of challenges, the most significant of which is the requirement that you increase the amount of maintenance work you do due to the increased number of lookup tables.
A Galaxy Schema is a diagram of a galaxy.
Two fact tables are contained within a Galaxy Schema, and these fact tables share dimension tables with one another. A different name for it is the Fact Constellation Schema. Because the schema is conceptualised as a constellation of stars, we refer to it as the Galaxy Schema.
As is evident from the preceding illustration, there are two fact tables.
In Galaxy, dimensions that are shared between schemas are referred to as Conformed Dimensions.
Galaxy Schema Possesses the Following Traits:
This schema’s dimensions have been broken up into their own distinct dimensions, each of which is based on a different level of the hierarchy.
For instance, if there are four tiers of hierarchy in geography, such as regions, countries, states, and cities, then the Galaxy schema ought to have four dimensions.
In addition, it is feasible to construct this kind of schema by subdividing the single-star schema into multiple star schemes.
This schema has very large dimensions, which are required in order to construct based on the different levels of the hierarchy.
Utilizing this schema to aggregate fact tables results in a deeper level of comprehension for the reader.
What is Star Cluster Schema?
The snowflake schema includes hierarchies that have been fully expanded. On the other hand, this might make the Schema more complicated and call for additional joins. The star schema, on the other hand, has fully collapsed hierarchies, which can result in redundant information being stored. Therefore, the Star Cluster Schema design may be the optimal solution as it strikes a balance between the two different schemas.
Forks in hierarchies can sometimes be found in the form of overlapping dimensions. A fork occurs when the same entity serves as a parent in two distinct dimensional hierarchies at the same time. Following this, the entities associated with the fork were classified as having one-to-many relationships.