Sunday, November 2, 2014

Data Warehousing - Testing

Introduction

Testing is very important for data warehouse systems to make them work correctly and efficiently. There are three basic level of testing that are listed below:

  •  Unit Testing
  •  Integration Testing
  • System testing

UNIT TESTING


  • In the Unit Testing each component is separately tested.
  • In this kind of testing each module i.e. procedure, program, SQL Script, Unix shell is tested.
  • This tested is performed by the developer.

INTEGRATION TESTING


  •  In this kind of testing the various modules of the application are brought together and then tested against number of inputs.
  • It is performed to test whether the various components do well after integration.

SYSTEM TESTING


  • In this kind of testing the whole data warehouse application is tested together.
  • The purpose of this testing is to check whether the entire system work correctly together or not.
  • This testing is performed by the testing team.
  • Since the size of the whole data warehouse is very large so it is usually possible to perform minimal system testing before the test plan proper can be enacted.

Test Schedule


  • First of all the Test Schedule is created in process of development of Test Plan.
  • In this we predict the estimated time required for the testing of entire data warehouse system.

DIFFICULTIES IN SCHEDULING THE TESTING


  • There are different methodologies available but none of them is perfect because the data warehouse is very complex and large. Also the data warehouse system is evolving in nature.
  • A simple problem may have large size of query which can take a day or more to complete i.e. the query does not complete in desired time scale.
  • There may be the hardware failure such as losing a disk, or the human error such as accidentally deleting the table or overwriting a large table.
Note: Due to the above mentioned difficulties it is recommended that always double the amount of time you would normally allow for testing.

Testing the backup recovery

This is very important testing that need to be performed. Here is the list of scenarios for which this testing is needed.

  • Media failure.
  • Loss or damage of table space or data file
  • Loss or damage of redo log file.
  • Loss or damage of control file
  • Instance failure.
  • Loss or damage of archive file.
  • Loss or damage of table.
  • Failure during data failure.

Testing Operational Environment

There are number of aspects that need to be tested. These aspects are listed below.

  • Security - A separate security document is required for security testing. This document contain the list of disallowed operations and devising test for each.
  • Scheduler - Scheduling software is required to control the daily operations of data warehouse. This need to be tested during the system testing. The scheduling software require interface with the data warehouse, which will need the scheduler to control the overnight processing and the management of aggregations.
  • Disk Configuration. - The Disk configuration also need to be tested to identify the I/O bottlenecks. The test should be performed with multiple times with different settings.
  • Management Tools. - It is needed to test all the management tools during system testing. Here is the list of tools that need to be tested.
o    Event manager
o    system Manager.
o    Database Manager.
o    Configuration Manager
o    Backup recovery manager.

Testing the Database

There are three set of tests that are listed below:
  • Testing the database manager and monitoring tools. - To test the database manager and the monitoring tools they should be used in the creation, running and management of test database.
  • Testing database features. - Here is the list of features that we have to test:
o    Querying in parallel
o    Create index in parallel
o    Data load in parallel
  • Testing database performance. - Query execution plays a very important role in data warehouse performance measures. There are set of fixed queries that need to be run regularly and they should be tested. To test ad hoc queries one should go through the user requirement document and understand the business completely. Take the time to test the most awkward queries that the business is likely to ask against different index and aggregation strategies.

Testing The Application


  •  All the managers should be integrated correctly and work in order to ensure that the end-to-end load, index, aggregate and queries work as per the expectations.
  • Each function of each manager should work in correct manner.
  • It is also necessary to test the application over a period of time.
  • The week-end and month-end task should also be tested.

Logistic of the Test

There is a question that What you are really testing? The answer to this question is that you are testing a suite of data warehouse application code.
The aim of system test is to test all of the following areas.

  • Scheduling Software
  • Day-to Day operational procedures.
  • Backup recovery strategy.
  • Management and scheduling tools.
  • Overnight processing
  • Query Performance
Note: The most important point is to test the scalability. Failure to do so will leave us a system design that does not work when the system grow.

Data Warehousing - Future Aspects

Following are the future aspects of Data Warehousing.

  • As we have seen that the size of the open database has grown approximately double the magnitude in last few years. This change in magnitude is of greater significance.
  • As the size of the databases grow , the estimates of what constitutes a very large database continues to grow.
  • The Hardware and software that are available today do not allow to keep a large amount of data online. For example a Telco call record require 10TB of data to be kept online which is just a size of one month record. If It require to keep record of sales, marketing customer, employee etc. then the size will be more than 100 TB.
  • The record not only contain the textual information but also contain some multimedia data. Multimedia data cannot be easily manipulated as text data. Searching the multimedia data is not an easy task whereas the textual information can be retrieved by the relational software available today.
  •  Apart from size planning, building and running ever-larger data warehouse systems are very complex. As the number of users increases the size of the data warehouse also increases. These users will also require to access to the system.

  • · With growth of internet there is requirement of users to access data online.

No comments:

Post a Comment