Sunday, November 2, 2014

Data Warehousing - Process Managers & Security

Data Warehousing - Process Managers

Data Warehouse Load Manager


  • This Component performs the operations required to extract and load process.
  • The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse.

LOAD MANAGER ARCHITECTURE

The load manager does the following functions.

  • Extract the data from source system.
  • Fast Load the extracted data into temporary data store.
  • Perform simple transformations into structure similar to the one in the data warehouse.

EXTRACT DATA FROM SOURCE

The data is extracted from the operational databases or the external information providers. Gateways is the application programs that are used to extract data. It is supported by underlying DBMS and allows client program to generate SQL to be executed at a server. Open Database Connection( ODBC), Java Database Connection (JDBC), are examples of gateway.

FAST LOAD

·           In order to minimize the total load window the data need to be loaded into the warehouse in the fastest possible time.
·           The transformations affects the speed of data processing.
·           It is more effective to load the data into relational database prior to applying transformations and checks.
·           Gateway technology proves to be not suitable, since they tend not be performant when large data volumes are involved.

SIMPLE TRANSFORMATIONS

While loading it may be required to perform simple transformations. After this has been completed we are in position to do the complex checks. Suppose we are loading the EPOS sales transaction we need to perform the following checks.
·           Strip out all the columns that are not required within the warehouse.
·           Convert all the values to required data types.

Warehouse Manager

·           Warehouse manager is responsible for the warehouse management process.
·           The warehouse manager consist of third party system software, C programs and shell scripts.
·           The size and complexity of warehouse manager varies between specific solutions.

WAREHOUSE MANAGER ARCHITECTURE

The warehouse manager includes the following.
·           The Controlling process
·           Stored procedures or C with SQL
·           Backup/Recovery tool
·           SQL Scripts

OPERATIONS PERFORMED BY WAREHOUSE MANAGER

·           Warehouse manager analyses the data to perform consistency and referential integrity checks.
·           Creates the indexes, business views, partition views against the base data.
·           Generates the new aggregations and also updates the existing aggregation
·           Generates the normalizations.
·           Warehouse manager Warehouse manager transforms and merge the source data into the temporary store into the published data warehouse.
·           Backup the data in the data warehouse.
·           Warehouse Manager archives the data that has reached the end of its captured life.
Note: Warehouse Manager also analyses query profiles to determine index and aggregations are appropriate.

Query Manager

·           Query Manager is responsible for directing the queries to the suitable tables.
·           By directing the queries to appropriate table the query request and response process is speed up.
·           Query Manager is responsible for scheduling the execution of the queries posed by the user.

QUERY MANAGER ARCHITECTURE

Query Manager includes the following.
·           The query redirection via C tool or RDBMS.
·           Stored procedures.
·           Query Management tool.
·           Query Scheduling via C tool or RDBMS.
·           Query Scheduling via third party Software.

OPERATIONS PERFORMED BY QUERY MANAGER

·           Query Manager direct to the appropriate tables.
·           Query Manager schedule the execution of the queries posed by the end user.
·           Query Manager stores query profiles to allow the warehouse manager to determine which indexes and aggregations are appropriate.

Data Warehousing - Security

Introduction

The objective data warehouse is to allow large amount of data to be easily accessible by the users. Hence allowing user to extract the information about the business as a whole. But we know that there could be some security restrictions applied on the data which can prove an obstacle for accessing the information. If the analyst has the restricted view of data then it is impossible to capture a complete picture of the trends within the business.
The data from each analyst can be summarised and passed onto management where the different summarise can be created. As the aggregations of summaries cannot be same as that of aggregation as a whole so It is possible to miss some information trends in the data unless someone is analysing the data as a whole.

Requirements

Adding the security will affect the performance of the data warehouse, therefore it is worth determining the security requirements early as possible. Adding the security after the data warehouse has gone live, is very difficult.
During the design phase of data warehouse we should keep in mind that what data sources may be added later and what would be the impact of adding those data sources. We should consider the following possibilities during the design phase.
·           Whether the new data sources will require new security and/or audit restrictions to be implemented?
·           Whether the new users added who have restricted access to data that is already generally available?
This situation arises when the future users and the data sources are not well known. In such a situation we need to use the knowledge of business and the objective of data warehouse to know likely requirements.

Factor to Consider for Security requirements

The following are the parts that are affected by the security hence it is worth consider these factors.
·           User Access
·           Data Load
·           Data Movement
·           Query Generation

USER ACCESS

We need to classify the data first and then the users by what data they can access.In other word the users are classified according to the data, they can access.

Data Classification

The following are the two approaches that can be used to classify the data:
·           The data can be classified according to its sensitivity. The highly sensitive data is classified as highly restricted and less sensitive data is classified as less restrictive.
·           The data can also be classified according to the job function. This restriction allows only the specific users to view particular data. In this we restrict the users to view only that that in which they are interested and are responsible for.
There are some issues in the second approach. To understand let's have an example, suppose you are building the data warehouse for a bank. suppose further that data being stored in the data warehouse is the transaction data for all the accounts. The question here is who is allowed to see the transaction data. The solution lies in classifying the data according to the function.

User classification

The following are the approaches that can be used to classify the users.
·           The users can be classified as per the hierarchy of users in an organisation i.e. users can be classified by department, section, group, and so on.
·           The user can also be classified according to their role, with people grouped across departments based on their role.

Classification on basis of Department

Let's have an example of a data warehouse where the users are from sales and marketing department. we can design the security by topdown company view, with access centered around the different departments. But they could be some restrictions on users at different level. This structure is shown in the following diagram.


But if each department accesses the different data then we should design the security access for each department separately. This can be achieved by the departmental data marts. Since these data marts are separated from the data warehouse hence we can enforce the separate security restrictions on each data mart. This approach is shown in the following figure.

Classification on basis of Role

If the data is generally available to all the departments.The it is worth to follow the role access hierarchy. In other words if the data is generally accessed by all the departments the apply the security restrictions as per the role of the user. The role accesshierarchy is shown in the following figure.

AUDIT REQUIREMENTS

The auditing is a subset of security. The auditing is a costly activity therefore it is worth understanding the audit requirements and reason for each audit requirement. The auditing can cause the heavy overheads on the system. To complete auditing in time we require the more hardware therefore it is recommended that where possible, auditing should be switch off. Audit requirements can be categorized into the following:
·           Connections
·           Disconnections
·           Data access
·           Data change
Note: For each of the above mentioned categories it is necessary to audit success, failure or both. From the perspective of security reasons the auditing of failures are very important. The auditing of failure are important because they can highlight the unauthorised or fraudulent access.

NETWORK REQUIREMENTS

The Network security is as important as other securities. We can not ignore the network security requirement. We need to consider the following issues.
·           Is it necessary to encrypt data before transferring it to the data warehouse machine?
·           Are there restrictions on which network routes the data can take?
·           These restrictions need to be considered carefully. Following are the points to remember.
·           The process of encryption and decryption will increase the overheads.It would require more processing power and processing time.
·           The cost of encryption can be high if the system is already a loaded system because the encryption is borne by the source system.

DATA MOVEMENT

There exist potential security implications while moving the data. Suppose we need to transfer some restricted data as a flat file to be loaded. When the data is loaded into the data warehouse the following questions are raised?
·           Where is the flat file stored?
·           Who has access to that disk space?
·           If we talk about the backup of these flat files the following questions are raised?
·           Do you backup encrypted or decrypted versions?
·           Do these backup needs to be made to special tapes that are stored separately?
·           Who has access to these tapes?
·           Some other form of data movement like query result sets also need to be considered. The question here are raised when creating the temporary table are as follows.
·           Where is that temporary table to be held?
·           How do you make such table visible?
We should avoid the accidental flouting of security restrictions. If a user with access to the restricted data can generate accessible temporary tables, data can be made visible to nonauthorized users. We can overcome it by having separate temporary area for users with access to restricted data.

Documentation

The audit and security requirements need to be properly documented. This will be treated as part of justification. This document can contain all the information gathered on the following.
·           Data classification
·           User classification
·           Network requirements
·           Data movement and storage requirements
·           All auditable actions

Impact of Security on Design

The security affects the application code and the development timescales. The Security affects the following.
·           Application development
·           Database design
·           Testing

APPLICATION DEVELOPMENT

The security affect the overall application development and it also affect the design of the important components of the data warehouse such as load manager, warehouse manager and the query manager. The load manager may require checking code to filter record and place them in different locations. The more transformation rule may also be required to hide certain data . Also there may be requirement of extra metadata to handle any extra objects.
To create and maintain the extra vies the warehouse manager may require extra code to enforce the security. There may be the requirement of the extra checks coded into the data warehouse to prevent it from being fooled into moving data into location where it should not be available. The query manager require the changes to handle any access restrictions. The query manager will need to be aware of all extra views and aggregations.

DATABASE DESIGN

The database layout is also affected because when the security is added there is increase in number of views and tables. Adding security adds the size to the database and hence increase the complexity of the database design and management. it will also add complexity to the backup management and recovery plan.

TESTING

The testing of the data warehouse is very complex and a lengthy process. Adding security to the data warehouse also affect the testing time complexity. It affects the testing in the following two ways.
·           It will increase the time required for integration and system testing.
There is added functionality to be tested which will cause increase in the size of the testing suite.

No comments:

Post a Comment