Data Warehousing - Process Managers & Security
Data Warehousing -
Process Managers
Data
Warehouse Load Manager
- This Component performs the operations required to extract and load process.
- The size and complexity of load manager varies between specific solutions from data warehouse to data warehouse.
LOAD MANAGER ARCHITECTURE
The
load manager does the following functions.
- Extract the data from source system.
- Fast Load the extracted data into temporary data store.
- Perform simple transformations into structure similar to the one in the data warehouse.
EXTRACT DATA FROM SOURCE
The
data is extracted from the operational databases or the external information
providers. Gateways is the application programs that are used to extract data.
It is supported by underlying DBMS and allows client program to generate SQL to
be executed at a server. Open Database Connection( ODBC), Java Database
Connection (JDBC), are examples of gateway.
FAST LOAD
·
In order to minimize the total load window the data need to be
loaded into the warehouse in the fastest possible time.
·
The transformations affects the speed of data processing.
·
It is more effective to load the data into relational database
prior to applying transformations and checks.
·
Gateway technology proves to be not suitable, since they tend not
be performant when large data volumes are involved.
SIMPLE TRANSFORMATIONS
While
loading it may be required to perform simple transformations. After this has
been completed we are in position to do the complex checks. Suppose we are
loading the EPOS sales transaction we need to perform the following checks.
·
Strip out all the columns that are not required within the
warehouse.
·
Convert all the values to required data types.
Warehouse Manager
·
Warehouse manager is responsible for the warehouse management
process.
·
The warehouse manager consist of third party system software, C
programs and shell scripts.
·
The size and complexity of warehouse manager varies between
specific solutions.
WAREHOUSE MANAGER ARCHITECTURE
The
warehouse manager includes the following.
·
The Controlling process
·
Stored procedures or C with SQL
·
Backup/Recovery tool
·
SQL Scripts
OPERATIONS PERFORMED BY WAREHOUSE MANAGER
·
Warehouse manager analyses the data to perform consistency and
referential integrity checks.
·
Creates the indexes, business views, partition views against the
base data.
·
Generates the new aggregations and also updates the existing
aggregation
·
Generates the normalizations.
·
Warehouse manager Warehouse manager transforms and merge the
source data into the temporary store into the published data warehouse.
·
Backup the data in the data warehouse.
·
Warehouse Manager archives the data that has reached the end of
its captured life.
Note: Warehouse
Manager also analyses query profiles to determine index and aggregations are
appropriate.
Query Manager
·
Query Manager is responsible for directing the queries to the
suitable tables.
·
By directing the queries to appropriate table the query request
and response process is speed up.
·
Query Manager is responsible for scheduling the execution of the
queries posed by the user.
QUERY MANAGER ARCHITECTURE
Query
Manager includes the following.
·
The query redirection via C tool or RDBMS.
·
Stored procedures.
·
Query Management tool.
·
Query Scheduling via C tool or RDBMS.
·
Query Scheduling via third party Software.
OPERATIONS PERFORMED BY QUERY MANAGER
·
Query Manager direct to the appropriate tables.
·
Query Manager schedule the execution of the queries posed by the
end user.
·
Query Manager stores query profiles to allow the warehouse manager
to determine which indexes and aggregations are appropriate.
Data Warehousing - Security
Introduction
The
objective data warehouse is to allow large amount of data to be easily
accessible by the users. Hence allowing user to extract the information about
the business as a whole. But we know that there could be some security
restrictions applied on the data which can prove an obstacle for accessing the
information. If the analyst has the restricted view of data then it is
impossible to capture a complete picture of the trends within the business.
The
data from each analyst can be summarised and passed onto management where the
different summarise can be created. As the aggregations of summaries cannot be
same as that of aggregation as a whole so It is possible to miss some
information trends in the data unless someone is analysing the data as a whole.
Requirements
Adding
the security will affect the performance of the data warehouse, therefore it is
worth determining the security requirements early as possible. Adding the
security after the data warehouse has gone live, is very difficult.
During
the design phase of data warehouse we should keep in mind that what data
sources may be added later and what would be the impact of adding those data
sources. We should consider the following possibilities during the design phase.
·
Whether the new data sources will require new security and/or
audit restrictions to be implemented?
·
Whether the new users added who have restricted access to data
that is already generally available?
This
situation arises when the future users and the data sources are not well known.
In such a situation we need to use the knowledge of business and the objective
of data warehouse to know likely requirements.
Factor to Consider for Security requirements
The
following are the parts that are affected by the security hence it is worth
consider these factors.
·
User Access
·
Data Load
·
Data Movement
·
Query Generation
USER ACCESS
We
need to classify the data first and then the users by what data they can
access.In other word the users are classified according to the data, they can
access.
Data
Classification
The
following are the two approaches that can be used to classify the data:
·
The data can be classified according to its sensitivity. The
highly sensitive data is classified as highly restricted and less sensitive
data is classified as less restrictive.
·
The data can also be classified according to the job function.
This restriction allows only the specific users to view particular data. In
this we restrict the users to view only that that in which they are interested
and are responsible for.
There
are some issues in the second approach. To understand let's have an example,
suppose you are building the data warehouse for a bank. suppose further that
data being stored in the data warehouse is the transaction data for all the
accounts. The question here is who is allowed to see the transaction data. The
solution lies in classifying the data according to the function.
User
classification
The
following are the approaches that can be used to classify the users.
·
The users can be classified as per the hierarchy of users in an
organisation i.e. users can be classified by department, section, group, and so
on.
·
The user can also be classified according to their role, with
people grouped across departments based on their role.
Classification
on basis of Department
Let's
have an example of a data warehouse where the users are from sales and
marketing department. we can design the security by topdown company view, with
access centered around the different departments. But they could be some
restrictions on users at different level. This structure is shown in the following
diagram.
But if each department
accesses the different data then we should design the security access for each
department separately. This can be achieved by the departmental data marts.
Since these data marts are separated from the data warehouse hence we can
enforce the separate security restrictions on each data mart. This approach is
shown in the following figure.
Classification
on basis of Role
If the data is generally
available to all the departments.The it is worth to follow the role access
hierarchy. In other words if the data is generally accessed by all the departments
the apply the security restrictions as per the role of the user. The role
accesshierarchy is shown in the following figure.
AUDIT REQUIREMENTS
The
auditing is a subset of security. The auditing is a costly activity therefore
it is worth understanding the audit requirements and reason for each audit
requirement. The auditing can cause the heavy overheads on the system. To
complete auditing in time we require the more hardware therefore it is
recommended that where possible, auditing should be switch off. Audit
requirements can be categorized into the following:
·
Connections
·
Disconnections
·
Data access
·
Data change
Note: For
each of the above mentioned categories it is necessary to audit success,
failure or both. From the perspective of security reasons the auditing of
failures are very important. The auditing of failure are important because they
can highlight the unauthorised or fraudulent access.
NETWORK REQUIREMENTS
The
Network security is as important as other securities. We can not ignore the
network security requirement. We need to consider the following issues.
·
Is it necessary to encrypt data before transferring it to the data
warehouse machine?
·
Are there restrictions on which network routes the data can take?
·
These restrictions need to be considered carefully. Following are
the points to remember.
·
The process of encryption and decryption will increase the
overheads.It would require more processing power and processing time.
·
The cost of encryption can be high if the system is already a
loaded system because the encryption is borne by the source system.
DATA MOVEMENT
There
exist potential security implications while moving the data. Suppose we need to
transfer some restricted data as a flat file to be loaded. When the data is
loaded into the data warehouse the following questions are raised?
·
Where is the flat file stored?
·
Who has access to that disk space?
·
If we talk about the backup of these flat files the following
questions are raised?
·
Do you backup encrypted or decrypted versions?
·
Do these backup needs to be made to special tapes that are stored
separately?
·
Who has access to these tapes?
·
Some other form of data movement like query result sets also need
to be considered. The question here are raised when creating the temporary
table are as follows.
·
Where is that temporary table to be held?
·
How do you make such table visible?
We
should avoid the accidental flouting of security restrictions. If a user with
access to the restricted data can generate accessible temporary tables, data
can be made visible to nonauthorized users. We can overcome it by having
separate temporary area for users with access to restricted data.
Documentation
The
audit and security requirements need to be properly documented. This will be
treated as part of justification. This document can contain all the information
gathered on the following.
·
Data classification
·
User classification
·
Network requirements
·
Data movement and storage requirements
·
All auditable actions
Impact of Security on Design
The
security affects the application code and the development timescales. The
Security affects the following.
·
Application development
·
Database design
·
Testing
APPLICATION DEVELOPMENT
The
security affect the overall application development and it also affect the
design of the important components of the data warehouse such as load manager,
warehouse manager and the query manager. The load manager may require checking
code to filter record and place them in different locations. The more
transformation rule may also be required to hide certain data . Also there may
be requirement of extra metadata to handle any extra objects.
To
create and maintain the extra vies the warehouse manager may require extra code
to enforce the security. There may be the requirement of the extra checks coded
into the data warehouse to prevent it from being fooled into moving data into
location where it should not be available. The query manager require the
changes to handle any access restrictions. The query manager will need to be
aware of all extra views and aggregations.
DATABASE DESIGN
The
database layout is also affected because when the security is added there is
increase in number of views and tables. Adding security adds the size to the
database and hence increase the complexity of the database design and
management. it will also add complexity to the backup management and recovery
plan.
TESTING
The
testing of the data warehouse is very complex and a lengthy process. Adding
security to the data warehouse also affect the testing time complexity. It
affects the testing in the following two ways.
·
It will increase the time required for integration and system
testing.
There is added functionality
to be tested which will cause increase in the size of the testing suite.
Comments
Post a Comment