Data Warehousing - Concepts
Data Warehousing - Overview
The term "Data Warehouse" was first coined by Bill Inmon in 1990. He defined a data warehouse as a subject-oriented, integrated, time-variant and nonvolatile collection of data. This data helps analysts in an organization make informed decisions.
An operational database processes day-to-day transactions, which change its data frequently. Suppose a business executive later wants to analyse previous feedback on some data, such as a product, a supplier, or consumer data. The analyst will have no data available to analyse, because the previous data has already been overwritten by those transactions.
Data warehouses provide generalized and consolidated data in a multidimensional view. Along with this generalized and consolidated view of data, data warehouses also provide Online Analytical Processing (OLAP) tools. These tools support interactive and effective analysis of data in multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification and prediction can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for data analysis and online analytical processing.
Understanding Data Warehouse
·
A data warehouse is a database that is kept separate from the organization's operational database.
·
A data warehouse is not updated frequently.
·
A data warehouse holds consolidated historical data, which helps the organization analyse its business.
·
A data warehouse helps executives organize, understand and use their data to take strategic decisions.
·
Data warehouse systems help integrate a diversity of application systems.
·
A data warehouse system allows analysis of consolidated historical data.
Definition
A data warehouse is a subject-oriented, integrated, time-variant and nonvolatile collection of data that supports management's decision-making process.
Why a Data Warehouse Is Kept Separate from Operational Databases
The following are the reasons why data warehouses are kept separate from operational databases:
·
An operational database is constructed for well-known tasks and workloads, such as searching for particular records and indexing, whereas data warehouse queries are often complex and present a general form of data.
·
Operational databases support the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms are required for operational databases to ensure the robustness and consistency of the database.
·
Operational database queries allow both read and modify operations, while OLAP queries need only read-only access to the stored data.
·
An operational database maintains current data; a data warehouse, on the other hand, maintains historical data.
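Here is a minimal sketch that makes the difference in workload concrete. It uses Python's built-in sqlite3 module on a hypothetical in-memory orders table. Both workloads run against one database here purely for brevity, and the table and its columns are illustrative assumptions. The operational (OLTP) step is a short transaction that modifies a single record located by its key. The analytical (OLAP) step is a read-only query that scans many rows and returns a summary.

```python
import sqlite3

# Hypothetical table, used only to contrast the two kinds of workload.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, "
    "product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, product, amount) VALUES (?, ?, ?)",
    [
        ("alice", "laptop", 1200.0),
        ("bob", "phone", 800.0),
        ("alice", "phone", 750.0),
        ("carol", "laptop", 1100.0),
    ],
)
conn.commit()

# OLTP-style work: a short read/modify transaction on one record,
# located through the primary key. The connection context manager
# commits on success and rolls back on an exception.
with conn:
    conn.execute("UPDATE orders SET amount = ? WHERE id = ?", (780.0, 2))

# OLAP-style work: a read-only query that scans many rows and
# returns a consolidated view grouped by subject (product).
summary = conn.execute(
    "SELECT product, COUNT(*), SUM(amount) FROM orders "
    "GROUP BY product ORDER BY product"
).fetchall()
print(summary)  # [('laptop', 2, 2300.0), ('phone', 2, 1530.0)]
```

In a real deployment the analytical query would typically run against the separate warehouse, so long scans do not compete with transactions for locks and resources.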
Data Warehouse Features
The key features of a data warehouse, namely Subject Oriented, Integrated, Nonvolatile and Time-Variant, are discussed below:
·
Subject Oriented - A data warehouse is subject oriented because it provides information around a subject rather than around the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. The data warehouse does not focus on ongoing operations; instead, it focuses on the modelling and analysis of data for decision making.
·
Integrated - A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases and flat files. This integration enhances the effective analysis of data.
·
Time-Variant - The data in a data warehouse is identified with a particular time period. The data in a data warehouse provides information from a historical point of view.
·
Non Volatile - Nonvolatile means that previous data is not removed when new data is added. The data warehouse is kept separate from the operational database, so frequent changes in the operational database are not reflected in the data warehouse.
Note: A data warehouse does not require transaction processing, recovery and concurrency control, because it is physically stored separately from the operational database.
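The Time-Variant and Nonvolatile features can be illustrated with a minimal sketch based on a hypothetical product-price feed. The operational store overwrites the current value in place. The warehouse appends a dated snapshot on each load and never updates or deletes rows. The names `load_snapshot` and `price_as_of` are illustrative only and do not come from any library.

```python
from datetime import date
from typing import Optional

# Hypothetical product-price feed.
# Operational store: the current price is overwritten in place.
operational = {"P1": 10.0}

# Warehouse store: each load appends dated snapshot rows and never
# updates or deletes them (nonvolatile); every row carries a time
# period (time-variant).
warehouse: list[tuple[date, str, float]] = []


def load_snapshot(load_date: date) -> None:
    """Append the current operational values, stamped with the load date."""
    for product_id, price in operational.items():
        warehouse.append((load_date, product_id, price))


def price_as_of(product_id: str, as_of: date) -> Optional[float]:
    """Latest warehouse price recorded on or before `as_of`."""
    rows = [(d, p) for d, pid, p in warehouse if pid == product_id and d <= as_of]
    return max(rows)[1] if rows else None


load_snapshot(date(2024, 1, 31))
operational["P1"] = 12.5  # a transaction overwrites the current price
load_snapshot(date(2024, 2, 29))

print(operational["P1"])                     # 12.5 (old value is gone)
print(price_as_of("P1", date(2024, 2, 15)))  # 10.0 (history preserved)
```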
Data Warehouse Applications
As discussed before, a data warehouse helps business executives organize, analyse and use their data for decision making. A data warehouse serves as a core part of a plan-execute-assess "closed-loop" feedback system for enterprise management. Data warehouses are widely used in the following fields:
·
Financial services
·
Banking services
·
Consumer goods
·
Retail sectors
·
Controlled manufacturing
Data Warehouse Types
Information processing, analytical processing and data mining are the three types of data warehouse applications, discussed below:
·
Information Processing - A data warehouse allows us to process the information stored in it. The information can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
·
Analytical Processing - A data warehouse supports analytical processing of the information stored in it. The data can be analysed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
·
Data Mining - Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. These mining results can be presented using visualization tools.
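The basic OLAP operations listed above can be sketched in plain Python over a handful of hypothetical sales facts. This toy, in-memory illustration shows what the operations mean; it is not how an OLAP server implements them. Roll-up (drill up) aggregates to a coarser level of a hierarchy (city to region). Drill down moves to finer detail (here by adding the quarter dimension). Slice fixes one dimension, dice restricts several dimensions at once, and pivot swaps the axes of a two-dimensional view. All function names are illustrative.

```python
from collections import defaultdict

# Hypothetical sales facts: (region, city, quarter, product, units).
FACTS = [
    ("East", "Boston", "Q1", "phone", 10),
    ("East", "Boston", "Q2", "phone", 12),
    ("East", "NYC", "Q1", "laptop", 5),
    ("West", "Seattle", "Q1", "phone", 7),
    ("West", "Seattle", "Q2", "laptop", 3),
]
DIMS = {"region": 0, "city": 1, "quarter": 2, "product": 3}


def aggregate(facts, dims):
    """Group by the listed dimensions and sum the measure (units)."""
    cube = defaultdict(int)
    for row in facts:
        key = tuple(row[DIMS[d]] for d in dims)
        cube[key] += row[-1]
    return dict(cube)


def slice_(facts, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in facts if r[DIMS[dim]] == value]


def dice(facts, **allowed):
    """Dice: keep rows whose values lie in the given sets for several dimensions."""
    return [r for r in facts
            if all(r[DIMS[d]] in values for d, values in allowed.items())]


def pivot(view):
    """Pivot: swap the two axes of a two-dimensional aggregate."""
    return {(b, a): v for (a, b), v in view.items()}


by_city = aggregate(FACTS, ["city"])                         # detailed level
by_region = aggregate(FACTS, ["region"])                     # roll-up: city -> region
by_region_quarter = aggregate(FACTS, ["region", "quarter"])  # drill down by time
q1_only = slice_(FACTS, "quarter", "Q1")
east_phones = dice(FACTS, region={"East"}, product={"phone"})
quarter_by_product = pivot(aggregate(FACTS, ["product", "quarter"]))

print(by_region)                            # {('East',): 27, ('West',): 10}
print(quarter_by_product[("Q1", "phone")])  # 17
```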
The following table compares a data warehouse (OLAP) with an operational database (OLTP):
SN | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyse the business. | It is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | It is based on the Star Schema, Snowflake Schema and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in the hundreds. | The number of users is in the thousands.
11 | The number of records accessed is in the millions. | The number of records accessed is in the tens.
12 | The database size ranges from 100 GB to TB. | The database size ranges from 100 MB to GB.
13 | It is highly flexible. | It provides high performance.
Data Warehousing - Concepts
What is Data Warehousing?
Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration and data consolidation.
Using Data Warehouse Information
Decision support technologies help put the data warehouse to use. These technologies help executives use the warehouse quickly and effectively. They can gather the data, analyse it and take decisions based on the information in the warehouse. The information gathered from the warehouse can be used in any of the following domains:
·
Tuning Production Strategies - Product strategies can be well tuned by repositioning products and managing product portfolios, comparing sales quarterly or yearly.
·
Customer Analysis - Customer analysis is done by analysing the customer's buying preferences, buying time, budget cycles, etc.
·
Operations Analysis - Data warehousing also helps in customer relationship management and in making environmental corrections. The information also allows us to analyse business operations.
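As a small illustration of the quarterly comparison used to tune product strategies, the sketch below computes the quarter-over-quarter change in sales for each product and flags products whose sales fell. The figures and the function name `quarter_change` are made up for illustration. A real analysis would read these totals from the warehouse rather than from a literal dictionary.

```python
# Hypothetical quarterly unit sales per product, as a warehouse query
# might return them.
quarterly_sales = {
    "phone": {"Q1": 17, "Q2": 12},
    "laptop": {"Q1": 5, "Q2": 8},
}


def quarter_change(sales, product, prev, curr):
    """Percentage change in a product's sales between two quarters."""
    before, after = sales[product][prev], sales[product][curr]
    return (after - before) / before * 100


# Products whose sales fell are candidates for repositioning.
declining = [p for p in quarterly_sales
             if quarter_change(quarterly_sales, p, "Q1", "Q2") < 0]

for product in quarterly_sales:
    change = quarter_change(quarterly_sales, product, "Q1", "Q2")
    print(f"{product}: {change:+.1f}%")  # phone: -29.4%, then laptop: +60.0%
print(declining)  # ['phone']
```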
Integrating Heterogeneous Databases
To integrate heterogeneous databases, we have the following two approaches:
·
Query Driven Approach
·
Update Driven Approach
Query Driven Approach
This is the traditional approach to integrating heterogeneous databases. This approach was used to build wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.
PROCESS OF QUERY DRIVEN APPROACH:
·
When a query is issued on the client side, a metadata dictionary translates the query into queries appropriate for each individual heterogeneous site involved.
·
These queries are then mapped and sent to the local query processors.
·
The results from the heterogeneous sites are integrated into a global answer set.
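The process above can be sketched as a toy mediator. Assume, for illustration only, that two hypothetical sites store the same subject under different schemas, with site B storing amounts in cents. Assume also that a metadata dictionary records how each site names the global attributes. The mediator translates the global query for each site, sends it to that site's local query processor, and merges the partial results into a global answer set. All names here are hypothetical.

```python
# Two hypothetical sites storing the same subject (customer spending)
# with different schemas; site B records amounts in cents.
SITE_A = [{"cust_name": "alice", "amt_usd": 120.0},
          {"cust_name": "bob", "amt_usd": 80.0}]
SITE_B = [("carol", 9500), ("alice", 3000)]
SOURCES = {"A": SITE_A, "B": SITE_B}

# Metadata dictionary: where each global attribute lives at each site,
# plus the divisor that converts the local amount to dollars.
METADATA = {
    "A": {"customer": "cust_name", "amount": "amt_usd", "divisor": 1},
    "B": {"customer": 0, "amount": 1, "divisor": 100},
}


def translate(site, customer):
    """Rewrite the global query 'spending of <customer>' for one site."""
    m = METADATA[site]
    return {"key": m["customer"], "value": customer,
            "amount": m["amount"], "divisor": m["divisor"]}


def local_query_processor(site, query):
    """Run a translated query against one site's own data."""
    return [(row[query["key"]], row[query["amount"]] / query["divisor"])
            for row in SOURCES[site] if row[query["key"]] == query["value"]]


def mediator(customer):
    """Translate, dispatch to every site, and merge into a global answer set."""
    answer = []
    for site in METADATA:
        answer.extend(local_query_processor(site, translate(site, customer)))
    return answer


print(mediator("alice"))  # [('alice', 120.0), ('alice', 30.0)]
```

Every call to `mediator` goes back to the sources at query time. This is why the approach becomes expensive for frequent queries and for aggregations.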
DISADVANTAGES
·
The Query Driven Approach needs complex integration and filtering
processes.
·
This approach is inefficient, because every query must be translated and executed at the remote sites when it is issued.
·
This approach is very expensive for frequent queries.
·
This approach is also very expensive for queries that require aggregations.
Update Driven Approach
This is an alternative to the traditional approach. Today's data warehouse systems follow the update driven approach rather than the traditional approach discussed earlier. In the update driven approach, the information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is available for direct querying and analysis.
ADVANTAGES
This approach has the following advantages:
·
This approach provides high performance.
·
The data are copied, processed, integrated, annotated, summarized and restructured in a semantic data store in advance.
·
Query processing does not require an interface with the processing at local sources.
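By contrast with the query driven approach, a minimal sketch of the update driven approach integrates hypothetical source data once, at load time, and also precomputes a summary. Afterwards a query such as the total spent by a customer is answered from the warehouse alone, without contacting the sources. The source schemas and function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical source extracts with different schemas;
# site B records amounts in cents.
SITE_A = [{"cust_name": "alice", "amt_usd": 120.0},
          {"cust_name": "bob", "amt_usd": 80.0}]
SITE_B = [("carol", 9500), ("alice", 3000)]


def integrate_in_advance():
    """Copy, convert and summarize the source data once, at load time."""
    rows = [(r["cust_name"], r["amt_usd"]) for r in SITE_A]
    rows += [(name, cents / 100) for name, cents in SITE_B]
    totals = defaultdict(float)
    for name, amount in rows:
        totals[name] += amount
    return {"rows": rows, "total_by_customer": dict(totals)}


WAREHOUSE = integrate_in_advance()


def total_spent(customer):
    """Answered from the warehouse alone; no source is contacted."""
    return WAREHOUSE["total_by_customer"].get(customer, 0.0)


print(total_spent("alice"))  # 150.0
```

The cost of integration is paid once per load instead of once per query. In exchange, the warehouse is only as current as its last refresh.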
Functions of Data Warehouse Tools and Utilities
The following are the functions of data warehouse tools and utilities:
·
Data Extraction - Data extraction involves gathering data from multiple heterogeneous sources.
·
Data Cleaning - Data cleaning involves finding and correcting errors in the data.
·
Data Transformation - Data transformation involves converting data from legacy formats to the warehouse format.
·
Data Loading - Data loading involves sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
·
Refreshing - Refreshing involves propagating updates from the data sources to the warehouse.
Note: Data cleaning and data transformation are important steps in improving the quality of data and data mining results.
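The functions above can be chained into a toy extract, clean, transform, load and refresh pipeline. Everything in it is hypothetical: two legacy text exports with different delimiters and date formats, a plain dictionary standing in for the warehouse, and simple rules for what counts as dirty data. Partitioning is omitted for brevity.

```python
from collections import defaultdict

# Hypothetical legacy exports: the CRM uses ';' and DD/MM/YYYY dates,
# the point-of-sale system uses ',' and ISO dates.
CRM_EXPORT = ["01/03/2024;alice;120.50", "02/03/2024;Bob ;80", "bad line"]
POS_EXPORT = ["2024-03-01,carol,95.00", "2024-03-02,alice,"]


def extract():
    """Extraction: gather raw records from every source, tagged by origin."""
    return ([("crm", line) for line in CRM_EXPORT]
            + [("pos", line) for line in POS_EXPORT])


def clean(raw):
    """Cleaning: trim fields and drop malformed or incomplete records."""
    good = []
    for source, line in raw:
        fields = [f.strip() for f in line.split(";" if source == "crm" else ",")]
        if len(fields) == 3 and all(fields):
            good.append((source, fields))
    return good


def transform(records):
    """Transformation: legacy formats -> (ISO date, customer, amount)."""
    rows = []
    for source, (day, customer, amount) in records:
        if source == "crm":  # DD/MM/YYYY -> YYYY-MM-DD
            d, m, y = day.split("/")
            day = f"{y}-{m}-{d}"
        rows.append((day, customer.lower(), float(amount)))
    return rows


def load(rows, warehouse):
    """Loading: sort, check integrity, append, summarize and index."""
    rows = sorted(rows)
    if any(amount < 0 for _, _, amount in rows):
        raise ValueError("integrity check failed: negative amount")
    facts = warehouse.setdefault("facts", [])
    facts.extend(rows)
    daily = defaultdict(float)
    for day, _, amount in facts:
        daily[day] += amount
    warehouse["daily_totals"] = dict(daily)
    index = defaultdict(list)
    for pos, (_, customer, _) in enumerate(facts):
        index[customer].append(pos)
    warehouse["by_customer"] = dict(index)


def refresh(warehouse, new_crm_lines):
    """Refreshing: push new source records through the same pipeline."""
    load(transform(clean([("crm", line) for line in new_crm_lines])), warehouse)


warehouse = {}
load(transform(clean(extract())), warehouse)
refresh(warehouse, ["03/03/2024;alice;10"])
print(warehouse["daily_totals"])
```

Cleaning runs before transformation here, so malformed lines never reach the format conversion. This ordering is one reason the note above singles out both steps for data quality.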