Data Warehousing - Multidimensional OLAP
Data Warehousing -
Multidimensional OLAP
Introduction
Multidimensional OLAP (MOLAP)
uses the array-based multidimensional storage engines for multidimensional
views of data. With multidimensional data stores, the storage utilization may
be low if the data set is sparse. Therefore many MOLAP Server uses the two
level of data storage representation to handle dense and sparse data sets.
Points to
remember:
·
MOLAP tools need to process information with consistent response
time regardless of level of summarizing or calculations selected.
·
The MOLAP tools need to avoid many of the complexities of creating
a relational database to store data for analysis.
·
The MOLAP tools need fastest possible performance.
·
MOLAP Server adopts two level of storage representation to handle
dense and sparse data sets.
·
Denser subcubes are identified and stored as array structure.
·
Sparse subcubes employs compression technology.
MOLAP
Architecture
MOLAP includes the following
components.
·
Database server
·
MOLAP server
·
Front end tool
Advantages
Here is the list of
advantages of Multidimensional OLAP
·
MOLAP allows fastest indexing to the precomputed summarized data.
·
Helps the user who are connected to a network and need to analyze
larger, less defined data.
·
Easier to use therefore MOLAP is best suitable for inexperienced
user.
Disadvantages
·
MOLAP are not capable of containing detailed data.
·
The storage utilization may be low if the data set is sparse.
MOLAP vs
ROLAP
SN
|
MOLAP
|
ROLAP
|
1
|
The information retrieval is fast.
|
Information retrieval is comparatively
slow.
|
2
|
It uses the sparse array to store the
data sets.
|
It uses relational table.
|
3
|
MOLAP is best suited for inexperienced
users since it is very easy to use.
|
ROLAP is best suited for experienced
users.
|
4
|
The separate database for data cube.
|
It may not require space other than
available in Data warehouse.
|
5
|
DBMS facility is weak.
|
DBMS facility is strong.
|
Data Warehousing -
Schemas
Introduction
The schema is a logical
description of the entire database. The schema includes the name and
description of records of all record types including all associated data-items
and aggregates. Likewise the database the data warehouse also require the
schema. The database uses the relational model on the other hand the data
warehouse uses the Stars, snowflake and fact constellation schema. In this
chapter we will discuss the schemas used in data warehouse.
Star
Schema
·
In star schema each dimension is represented with only one
dimension table.
·
This dimension table contains the set of attributes.
·
In the following diagram we have shown the sales data of a company
with respect to the four dimensions namely, time, item, branch and location.
·
There is a fact table at the centre. This fact table contains the
keys to each of four dimensions.
·
The fact table also contain the attributes namely, dollars sold
and units sold.
Note: Each dimension has only one dimension table and each table holds a
set of attributes. For example the location dimension table contains the attribute
set {location_key,street,city,province_or_state,country}. This constraint may
cause data redundancy. For example the "Vancouver" and
"Victoria" both cities are both in Canadian province of British
Columbia. The entries for such cities may cause data redundancy along the
attributes province_or_state and country.
Snowflake
Schema
·
In Snowflake schema some dimension tables are normalized.
·
The normalization split up the data into additional tables.
·
Unlike Star schema the dimensions table in snowflake schema are
normalized for example the item dimension table in star schema is normalized
and split into two dimension tables namely, item and supplier table.
·
Therefore now the item dimension table contains the attributes
item_key, item_name, type, brand, and supplier-key.
·
The supplier key is linked to supplier dimension table. The
supplier dimension table contains the attributes supplier_key, and
supplier_type.
<b<>Note: Due to
normalization in Snowflake schema the redundancy is reduced therefore it
becomes easy to maintain and save storage space.</b<>
Fact
Constellation Schema
·
In fact Constellation there are multiple fact tables. This schema
is also known as galaxy schema.
·
In the following diagram we have two fact tables namely, sales and
shipping.
·
The sale fact table is same as that in star schema.
·
The shipping fact table has the five dimensions namely, item_key,
time_key, shipper-key, from-location.
·
The shipping fact table also contains two measures namely, dollars
sold and units sold.
·
It is also possible for dimension table to share between fact
tables. For example time, item and location dimension tables are shared between
sales and shipping fact table.
Schema
Definition
The Multidimensional schema
is defined using Data Mining Query Language( DMQL). the two primitives namely,
cube definition and dimension definition can be used for defining the Data
warehouses and data marts.
SYNTAX FOR CUBE DEFINITION
define cube < cube_name > [ < dimension-list > }: < measure_list >
SYNTAX FOR DIMENSION DEFINITION
define dimension < dimension_name > as ( < attribute_or_dimension_list > )
Star
Schema Definition
The star schema that we have
discussed can be defined using the Data Mining Query Language (DMQL) as
follows:
define cube sales star [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state, country)
Snowflake
Schema Definition
The Snowflake schema that we
have discussed can be defined using the Data Mining Query Language (DMQL) as
follows:
define cube sales snowflake [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier
(supplier key, supplier type))
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city
(city key, city, province or state, country))
Fact
Constellation Schema Definition
The Snowflake schema that we
have discussed can be defined using the Data Mining Query Language (DMQL) as
follows:
define cube sales [time, item, branch, location]:
dollars sold = sum(sales in dollars), units sold = count(*)
define dimension time as (time key, day, day of week, month, quarter, year)
define dimension item as (item key, item name, brand, type, supplier type)
define dimension branch as (branch key, branch name, branch type)
define dimension location as (location key, street, city, province or state,country)
define cube shipping [time, item, shipper, from location, to location]:
dollars cost = sum(cost in dollars), units shipped = count(*)
define dimension time as time in cube sales
define dimension item as item in cube sales
define dimension shipper as (shipper key, shipper name, location as
location in cube sales, shipper type)
define dimension from location as location in cube sales
define dimension to location as location in cube sales
Comments
Post a Comment