Sunday, November 2, 2014

Data Warehousing - Concepts

Data Warehousing - Overview

The term "Data Warehouse" was coined by Bill Inmon in 1990. He defined a data warehouse as a subject-oriented, integrated, time-variant, and nonvolatile collection of data. This data supports the decision-making process of analysts in an organization.
An operational database undergoes day-to-day transactions, which cause frequent changes to its data. Suppose a business executive later wants to analyse earlier feedback on some data, such as product, supplier, or consumer data. The analyst will have no data to analyse, because the earlier data has been overwritten by those transactions.
A data warehouse provides generalized and consolidated data in a multidimensional view. Along with this view, a data warehouse also provides Online Analytical Processing (OLAP) tools. These tools support interactive and effective analysis of data in multidimensional space. This analysis results in data generalization and data mining.
Data mining functions such as association, clustering, classification, and prediction can be integrated with OLAP operations to enhance interactive mining of knowledge at multiple levels of abstraction. That is why the data warehouse has become an important platform for data analysis and online analytical processing.

Understanding Data Warehouse

·         A data warehouse is a database that is kept separate from the organization's operational database.
·         A data warehouse is not updated frequently.
·         A data warehouse holds consolidated historical data, which helps the organization analyse its business.
·         A data warehouse helps executives organize, understand, and use their data to make strategic decisions.
·         Data warehouse systems help integrate a diversity of application systems.
·         A data warehouse system allows analysis of consolidated historical data.

Definition

A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data that supports management's decision-making process.

Why a Data Warehouse Is Separated from Operational Databases

Data warehouses are kept separate from operational databases for the following reasons:
·         An operational database is built for well-known tasks and workloads, such as searching for particular records and indexing, whereas data warehouse queries are often complex and present a general form of the data.
·         Operational databases support the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms are required for operational databases to ensure the robustness and consistency of the database.
·         Operational database queries both read and modify data, whereas an OLAP query needs only read-only access to stored data.
·         An operational database maintains current data, whereas a data warehouse maintains historical data.

Data Warehouse Features

The key features of a data warehouse, namely Subject Oriented, Integrated, Time-Variant, and Nonvolatile, are discussed below:
·         Subject Oriented - A data warehouse is subject oriented because it provides information around a subject rather than around the organization's ongoing operations. These subjects can be products, customers, suppliers, sales, revenue, etc. A data warehouse does not focus on ongoing operations; it focuses on the modelling and analysis of data for decision making.
·         Integrated - A data warehouse is constructed by integrating data from heterogeneous sources such as relational databases and flat files. This integration enables effective analysis of the data.
·         Time-Variant - The data in a data warehouse is identified with a particular time period. It provides information from a historical point of view.
·         Nonvolatile - Nonvolatile means that previous data is not removed when new data is added. The data warehouse is kept separate from the operational database, so frequent changes in the operational database are not reflected in the data warehouse.
Note: A data warehouse does not require transaction processing, recovery, or concurrency control, because it is physically stored separately from the operational database.
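
The Time-Variant and Nonvolatile features can be illustrated with a minimal sketch. Everything in it (the `operational` dictionary, the `warehouse` list, the `snapshot` and `price_history` helpers) is a hypothetical layout made up for illustration, not a real warehouse API. It only shows that an update overwrites the operational value, while the warehouse keeps every dated copy.

```python
from datetime import date

# Operational store (assumed layout): one current row per product.
# Updates overwrite the value in place.
operational = {"P1": {"price": 10.0}}

# Warehouse (assumed layout): an append-only list of dated snapshots.
warehouse = []

def snapshot(as_of):
    """Copy the current operational state into the warehouse, stamped with a date."""
    for product_id, row in operational.items():
        warehouse.append({"product_id": product_id, "as_of": as_of, **row})

def price_history(product_id):
    """Return (date, price) pairs for one product, oldest first."""
    rows = [r for r in warehouse if r["product_id"] == product_id]
    return [(r["as_of"], r["price"]) for r in sorted(rows, key=lambda r: r["as_of"])]

snapshot(date(2014, 1, 1))
operational["P1"]["price"] = 12.0  # the old price is now gone from the operational store
snapshot(date(2014, 2, 1))         # earlier snapshots are kept, not replaced
```

After these steps the operational store knows only the current price, while `price_history("P1")` still returns both dated prices.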

Data Warehouse Applications

As discussed before, a data warehouse helps business executives organize, analyse, and use their data for decision making. A data warehouse serves as an integral part of a plan-execute-assess "closed-loop" feedback system for enterprise management. Data warehouses are widely used in the following fields:
·         Financial services
·         Banking services
·         Consumer goods
·         Retail sectors
·         Controlled manufacturing

Data Warehouse Types

Information processing, analytical processing, and data mining are the three types of data warehouse applications. They are discussed below:
·         Information Processing - A data warehouse allows us to process the information stored in it. The information can be processed by means of querying, basic statistical analysis, and reporting using crosstabs, tables, charts, or graphs.
·         Analytical Processing - A data warehouse supports analytical processing of the information stored in it. The data can be analysed by means of basic OLAP operations, including slice-and-dice, drill down, drill up, and pivoting.
·         Data Mining - Data mining supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, and performing classification and prediction. The mining results can be presented using visualization tools.
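
The basic OLAP operations named under Analytical Processing can be sketched on a tiny in-memory data set. This is a rough illustration only: the `SALES` facts, the dimension names in `DIMS`, and all four helper functions are assumptions made up for this example, and real OLAP tools work on far larger cubes with their own query languages.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, region, product, amount).
SALES = [
    (2013, "East", "Pen", 100),
    (2013, "West", "Pen", 150),
    (2013, "East", "Ink", 40),
    (2014, "East", "Pen", 120),
    (2014, "West", "Ink", 60),
]
DIMS = ("year", "region", "product")  # positions of the dimensions in each fact

def slice_(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]

def dice(facts, **allowed):
    """Dice: keep facts whose dimension values fall in the given sets."""
    return [f for f in facts
            if all(f[DIMS.index(d)] in values for d, values in allowed.items())]

def roll_up(facts, *keep):
    """Drill up: sum amounts over every dimension that is not in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    totals = defaultdict(int)
    for f in facts:
        totals[tuple(f[i] for i in idx)] += f[3]
    return dict(totals)

def pivot(facts, row_dim, col_dim):
    """Pivot: arrange totals as a row-by-column crosstab."""
    table = defaultdict(dict)
    for (r, c), total in roll_up(facts, row_dim, col_dim).items():
        table[r][c] = total
    return dict(table)
```

In this sketch, drilling down is simply rolling up by more dimensions: `roll_up(SALES, "year")` gives yearly totals, and `roll_up(SALES, "year", "region")` breaks each year down by region.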

The following table compares a data warehouse (OLAP) with an operational database (OLTP):

SN | Data Warehouse (OLAP) | Operational Database (OLTP)
1 | It involves historical processing of information. | It involves day-to-day processing.
2 | OLAP systems are used by knowledge workers such as executives, managers, and analysts. | OLTP systems are used by clerks, DBAs, or database professionals.
3 | It is used to analyse the business. | It is used to run the business.
4 | It focuses on information out. | It focuses on data in.
5 | It is based on the Star Schema, Snowflake Schema, and Fact Constellation Schema. | It is based on the Entity Relationship Model.
6 | It is subject oriented. | It is application oriented.
7 | It contains historical data. | It contains current data.
8 | It provides summarized and consolidated data. | It provides primitive and highly detailed data.
9 | It provides a summarized and multidimensional view of data. | It provides a detailed and flat relational view of data.
10 | The number of users is in the hundreds. | The number of users is in the thousands.
11 | The number of records accessed is in the millions. | The number of records accessed is in the tens.
12 | The database size ranges from 100 GB to terabytes. | The database size ranges from 100 MB to gigabytes.
13 | It is highly flexible. | It provides high performance.
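
The Star Schema mentioned in the comparison can be sketched with Python's built-in `sqlite3` module. The table names (`dim_product`, `dim_date`, `fact_sales`), the columns, and the sample rows are all assumptions made up for this example. A real warehouse would hold far more rows and usually more dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables describe the subjects (assumed names and columns).
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    -- The central fact table holds the measure, keyed by the dimensions.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Ink');
    INSERT INTO dim_date VALUES (1, 2013, 4), (2, 2014, 1);
    INSERT INTO fact_sales VALUES (1, 1, 100), (1, 2, 120), (2, 1, 40), (2, 2, 60);
""")

# OLAP-style query: read many fact rows and return a summarized view.
summary = conn.execute("""
    SELECT d.year, p.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.year, p.name
    ORDER BY d.year, p.name
""").fetchall()
```

The query joins the fact table to its dimensions and aggregates, which matches the "summarized and consolidated" side of the comparison. An OLTP query would more typically read or update a handful of detailed rows.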

Data Warehousing - Concepts

What is Data Warehousing?

Data warehousing is the process of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources. It supports analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.

Using Data Warehouse Information

Decision support technologies are available that help in utilizing the data warehouse. These technologies help executives use the warehouse quickly and effectively. Executives can gather data, analyse it, and make decisions based on the information in the warehouse. The information gathered from the warehouse can be used in any of the following domains:
·         Tuning Production Strategies - Production strategies can be tuned by repositioning products and managing product portfolios, based on a comparison of sales by quarter or by year.
·         Customer Analysis - Customer analysis is done by analysing customers' buying preferences, buying times, budget cycles, etc.
·         Operations Analysis - Data warehousing also helps in customer relationship management and in making environmental corrections. The information also allows us to analyse business operations.

Integrating Heterogeneous Databases

There are two approaches to integrating heterogeneous databases:
·         Query-Driven Approach
·         Update-Driven Approach

Query Driven Approach

This is the traditional approach to integrating heterogeneous databases. It builds wrappers and integrators on top of multiple heterogeneous databases. These integrators are also known as mediators.

PROCESS OF QUERY DRIVEN APPROACH:

·         When a query is issued at the client side, a metadata dictionary translates it into queries appropriate for each heterogeneous site involved.
·         These queries are then mapped and sent to the local query processors.
·         The results from the heterogeneous sites are integrated into a global answer set.
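
The process above can be sketched as follows. The two sites, their differing local schemas, the `METADATA` dictionary, and the `local_query` and `mediator` functions are all hypothetical, chosen only to show the translate, dispatch, and integrate steps.

```python
# Two hypothetical sites holding the same kind of data in different local schemas.
SITE_A = [{"cust": "Ann", "total": 50}, {"cust": "Bob", "total": 20}]
SITE_B = [("Ann", 30), ("Cid", 70)]
SITES = {"site_a": SITE_A, "site_b": SITE_B}

# Metadata dictionary: maps each global field to the local field of every site.
METADATA = {
    "site_a": {"customer": "cust", "amount": "total"},  # dictionary keys
    "site_b": {"customer": 0, "amount": 1},             # tuple positions
}

def local_query(site, min_amount):
    """Wrapper: run the translated query against one site's local schema."""
    fields = METADATA[site]
    return [(row[fields["customer"]], row[fields["amount"]])
            for row in SITES[site]
            if row[fields["amount"]] >= min_amount]

def mediator(min_amount):
    """Send the query to every site at query time and merge the answers."""
    answer = []
    for site in SITES:
        answer.extend(local_query(site, min_amount))
    return sorted(answer)
```

Every call to `mediator` contacts every site again, which is where the cost of this approach for frequent queries comes from.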

DISADVANTAGES

·         The Query Driven Approach needs complex integration and filtering processes.
·         This approach is very inefficient.
·         This approach is very expensive for frequent queries.
·         This approach is also very expensive for queries that require aggregations.

Update Driven Approach

The update-driven approach is an alternative to the traditional approach. Today's data warehouse systems follow the update-driven approach rather than the traditional approach discussed earlier. In the update-driven approach, information from multiple heterogeneous sources is integrated in advance and stored in a warehouse. This information is then available for direct querying and analysis.

ADVANTAGES

This approach has the following advantages:
·         This approach provides high performance.
·         The data is copied, processed, integrated, annotated, summarized, and restructured in a semantic data store in advance.
·         Query processing does not require interfacing with the processing at local sources.
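
A minimal sketch of the update-driven idea, under assumed names: the two sources and the `build_warehouse` and `total_for` functions are hypothetical. Integration and summarization happen once, ahead of time, and queries read only the stored result.

```python
from collections import defaultdict

# Two hypothetical sources with different local layouts.
SITE_A = [{"cust": "Ann", "total": 50}, {"cust": "Bob", "total": 20}]
SITE_B = [("Ann", 30), ("Cid", 70)]

def build_warehouse():
    """Integrate and summarize both sources in advance (for example, in a nightly job)."""
    totals = defaultdict(int)
    for row in SITE_A:
        totals[row["cust"]] += row["total"]
    for cust, total in SITE_B:
        totals[cust] += total
    return dict(totals)

# Built once, before any query arrives.
WAREHOUSE = build_warehouse()

def total_for(customer):
    """Answer from the warehouse alone; the sources are not contacted."""
    return WAREHOUSE.get(customer, 0)
```

The trade-off in this sketch is freshness: a change made to a source after `build_warehouse()` has run is not visible to `total_for` until the warehouse is rebuilt.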

Data Warehouse Tools and Utilities Functions

The functions of data warehouse tools and utilities are as follows:
·         Data Extraction - Gathering data from multiple heterogeneous sources.
·         Data Cleaning - Finding and correcting errors in the data.
·         Data Transformation - Converting data from a legacy format to the warehouse format.
·         Data Loading - Sorting, summarizing, consolidating, checking integrity, and building indices and partitions.
·         Refreshing - Propagating updates from the data sources to the warehouse.

Note: Data Cleaning and Data Transformation are important steps in improving the quality of data and data mining results.
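
The five functions can be sketched as a small pipeline. The function names, the record layout, and the cleaning rules (drop rows with a missing amount, normalize customer names) are assumptions chosen for illustration. Real tools apply far richer rules at each step.

```python
def extract(sources):
    """Data Extraction: gather raw rows from every source."""
    return [row for source in sources for row in source]

def clean(rows):
    """Data Cleaning: drop rows with a missing amount and normalize names."""
    return [{"customer": row["customer"].strip().title(), "amount": row["amount"]}
            for row in rows if row.get("amount") is not None]

def transform(rows):
    """Data Transformation: convert the (assumed) legacy string amounts to numbers."""
    return [{**row, "amount": float(row["amount"])} for row in rows]

def load(warehouse, rows):
    """Data Loading: sort the rows and summarize them per customer."""
    for row in sorted(rows, key=lambda r: r["customer"]):
        warehouse[row["customer"]] = warehouse.get(row["customer"], 0.0) + row["amount"]
    return warehouse

def refresh(warehouse, sources):
    """Refreshing: rebuild the warehouse from the current state of the sources."""
    warehouse.clear()
    return load(warehouse, transform(clean(extract(sources))))

# Hypothetical source systems with inconsistent names and a missing value.
crm = [{"customer": " ann ", "amount": "50"}, {"customer": "bob", "amount": None}]
billing = [{"customer": "ANN", "amount": "30.5"}, {"customer": "cid", "amount": "70"}]

warehouse = refresh({}, [crm, billing])
```

Here the cleaning step merges " ann " and "ANN" into one customer and discards the row without an amount. Calling `refresh` again after a source changes brings the warehouse up to date.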
