Data Warehousing - Backup
Introduction
A data warehouse holds a large volume of data, and the data warehouse system is very complex. It is therefore important to back up all of the data so that it is available for recovery whenever it is needed. In this chapter I will discuss the issues involved in designing a backup strategy.
Backup Terminologies
Before proceeding further, we should know the backup terms discussed below.
· Complete backup - In a complete backup, the entire database is backed up at the same time. This backup includes all the database files, control files and journal files.
· Partial backup - A partial backup is not a complete backup of the database. Partial backups are very useful in large databases because they allow a strategy whereby various parts of the database are backed up in a round-robin fashion on a day-by-day basis, so that the whole database is effectively backed up once a week.
· Cold backup - A cold backup is taken while the database is completely shut down. In a multi-instance environment, all the instances should be shut down.
· Hot backup - A hot backup is taken while the database engine is up and running. The requirements that need to be considered for hot backups vary from RDBMS to RDBMS. Hot backups are extremely useful because the database stays available while the backup is taken.
· Online backup - This is the same as a hot backup.
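The round-robin idea behind partial backups can be sketched in a few lines of Python. This is a minimal sketch, assuming the database can be divided into parts that can be backed up independently (for example, tablespaces or partitions); the function name `round_robin_schedule` and the part names are hypothetical.

```python
from typing import Dict, List


def round_robin_schedule(parts: List[str], days: int = 7) -> Dict[int, List[str]]:
    """Assign each database part to one day of a `days`-day cycle.

    Every part appears exactly once per cycle, so the whole database is
    backed up once per cycle (once a week when days=7).
    """
    schedule: Dict[int, List[str]] = {day: [] for day in range(days)}
    for i, part in enumerate(parts):
        # Part i is backed up on day (i mod days), spreading parts over the cycle.
        schedule[i % days].append(part)
    return schedule


if __name__ == "__main__":
    # Hypothetical names for 10 parts of a large database.
    parts = [f"part{n:02d}" for n in range(10)]
    for day, todays_parts in round_robin_schedule(parts).items():
        print(f"day {day}: {todays_parts}")
```

With 10 parts and a 7-day cycle, days 0 to 2 each back up two parts and the remaining days one part each. A real scheme would more likely balance the days by part size than by part count.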
Hardware Backup
It is important to decide which hardware to use for the backup. We have to establish the upper bound on the speed at which a backup can be processed. The speed of backup and restore depends not only on the hardware being used but also on how the hardware is connected, the bandwidth of the network, the backup software and the speed of the server's I/O system. Here I will discuss some of the hardware choices that are available, along with their pros and cons. These choices are as follows.
· Tape Technology
· Disk Backups
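The upper bound on backup speed can be estimated by treating the backup path as a chain in which the slowest component limits the whole. The sketch below is a simplification, assuming every component streams at a steady rate and that 1 GB = 1000 MB; the function name `backup_estimate`, the component names and the rates are hypothetical figures, not measurements.

```python
from typing import Dict, Tuple


def backup_estimate(data_gb: float, rates_mb_s: Dict[str, float]) -> Tuple[str, float]:
    """Return (bottleneck component, estimated hours) for backing up data_gb.

    The end-to-end rate can be no faster than the slowest component, so
    that component's rate gives the upper bound on backup speed.
    """
    bottleneck = min(rates_mb_s, key=rates_mb_s.get)
    seconds = data_gb * 1000 / rates_mb_s[bottleneck]  # GB -> MB, then MB / (MB/s)
    return bottleneck, seconds / 3600


if __name__ == "__main__":
    # Hypothetical aggregate rates in MB/s for one backup path.
    rates = {
        "server_io": 100.0,
        "network": 10.0,
        "tape_drives": 12.0,  # e.g. four drives at 3 MB/s each
    }
    component, hours = backup_estimate(36, rates)
    print(f"bottleneck: {component}, estimated time: {hours:.1f} h")
```

In this example the network, at 10 MB/s, limits a 36 GB backup to about one hour, so adding more tape drives would not speed things up until the network is upgraded.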
TAPE TECHNOLOGY
The tape choices can be categorized as follows.
· Tape media
· Standalone tape drives
· Tape stackers
· Tape silos
Tape Media
There are several varieties of tape media. Some tape media standards are listed in the table below:
Tape Media | Capacity | I/O Rate
DLT | 40 GB | 3 MB/s
3490e | 1.6 GB | 3 MB/s
8 mm | 14 GB | 1 MB/s
Other factors that need to be considered are the following:
· Reliability of the tape medium.
· Cost of the tape medium per unit.
· Scalability.
· Cost of upgrades to the tape system.
· Shelf life of the tape medium.
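Using the capacities and I/O rates from the tape media table, a rough sizing calculation shows how many cartridges a full backup needs and how many drives must run in parallel to finish within a backup window. This is a sketch under simplifying assumptions: no compression, each drive writes at its rated speed for the whole window, and 1 GB = 1000 MB. The function name `tape_sizing` and the 400 GB / 8-hour figures are hypothetical.

```python
import math
from typing import Tuple

TAPE_MEDIA = {
    # medium: (capacity in MB, I/O rate in MB/s), from the tape media table
    "DLT": (40_000, 3),
    "3490e": (1_600, 3),
    "8 mm": (14_000, 1),
}


def tape_sizing(data_gb: float, window_hours: float, media: str) -> Tuple[int, int]:
    """Return (cartridges needed, drives needed) for one full backup."""
    capacity_mb, rate_mb_s = TAPE_MEDIA[media]
    data_mb = data_gb * 1000
    cartridges = math.ceil(data_mb / capacity_mb)
    # One drive can write rate * window-seconds worth of data in the window.
    per_drive_mb = rate_mb_s * window_hours * 3600
    drives = math.ceil(data_mb / per_drive_mb)
    return cartridges, drives


if __name__ == "__main__":
    for media in TAPE_MEDIA:
        cartridges, drives = tape_sizing(400, 8, media)
        print(f"{media}: {cartridges} cartridges, {drives} drives")
```

For 400 GB in 8 hours, 3490e needs 25 times as many cartridges as DLT, and the slower 8 mm drives need nearly three times as many drives to meet the same window.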
Standalone tape drives
Tape drives can be connected in the following ways:
· Directly to the server.
· As network-available devices.
· Remotely to another machine.
Issues in connecting tape drives
· Suppose the server is a 48-node MPP machine. Which node do you connect the tape drive to, and how do you spread the drives over the server nodes to get optimal performance with the least disruption to the server and the least internal I/O latency?
· Connecting a tape drive as a network-available device requires the network to be up to the job of the huge data transfer rates needed. Make sure that sufficient bandwidth is available during the time you require it.
· Connecting the tape drives remotely also requires high bandwidth.
TAPE STACKERS
A tape stacker is a device that loads multiple tapes into a single tape drive. The stacker dismounts the current tape when it has finished with it and loads the next one, so only one tape is available to be accessed at a time. Price and capabilities vary, but the common ability is that stackers can perform unattended backups.
TAPE SILOS
Tape silos provide large storage capacities. A tape silo can store and manage thousands of tapes and can integrate multiple tape drives. Silos have the software and hardware to label and store the tapes they hold. It is very common for a silo to be connected remotely over a network or a dedicated link, so we should ensure that the bandwidth of that connection is up to the job.
Other Technologies
The technologies other than tape are mentioned below.
· Disk Backups
· Optical jukeboxes
DISK BACKUPS
The methods of disk backup are listed below.
· Disk-to-disk backups
· Mirror breaking
These methods are used in OLTP systems. They minimize database downtime and maximize availability.
Disk-to-disk backups
In this kind of backup, the backup is taken onto disk rather than onto tape. The reasons for doing disk-to-disk backups are:
· Speed of initial backups
· Speed of restore
Backing up data from disk to disk is much faster than backing up to tape. However, it is an intermediate step; the data is later backed up to tape. The other advantage of disk-to-disk backups is that they give you an online copy of the latest backup.
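The two-stage flow described above, a fast copy to disk that stays online followed later by a copy to tape, can be sketched with the Python standard library. This is a minimal illustration, not a production backup tool: the function names `disk_to_disk_backup` and `stage_to_tape` are hypothetical, a tar file stands in for the tape device, and it assumes the source files are not changing during the copy (for example, the database is shut down), since copying live database files does not by itself give a consistent backup.

```python
import shutil
import tarfile
from pathlib import Path


def disk_to_disk_backup(source_dir: Path, staging_dir: Path) -> Path:
    """Stage 1: copy the database files to a disk staging area.

    The staged copy doubles as an online copy of the latest backup.
    """
    if staging_dir.exists():
        shutil.rmtree(staging_dir)  # replace the previous staged backup
    shutil.copytree(source_dir, staging_dir)
    return staging_dir


def stage_to_tape(staging_dir: Path, archive_path: Path) -> Path:
    """Stage 2: stream the staged copy into a tar archive (stand-in for tape)."""
    with tarfile.open(archive_path, "w") as tar:
        tar.add(str(staging_dir), arcname=staging_dir.name)
    return archive_path


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        db = root / "db"
        db.mkdir()
        (db / "data01.dbf").write_text("example rows")  # hypothetical data file
        staged = disk_to_disk_backup(db, root / "staging")
        archive = stage_to_tape(staged, root / "backup.tar")
        with tarfile.open(archive) as tar:
            print(tar.getnames())
```

Restores of the most recent backup can be served from the staging directory at disk speed, while the tar step can run later without holding up the database.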
Mirror Breaking
The idea is to have disks mirrored for resilience during the working day. When a backup is required, one of the mirror sets can be broken out. This technique is a variant of disk-to-disk backups.
Note: The database may need to be shut down to guarantee the consistency of the backup.
OPTICAL JUKEBOXES
Optical jukeboxes allow data to be stored near-line. This technique allows a large number of optical disks to be managed in the same way as a tape stacker or tape silo. The drawback of this technique is that its write speed is slower than that of disks. However, the long life and reliability of optical media make them a good choice of medium for archiving.
Software Backups
There are software tools available that help with the backup process. These tools come as packages. They not only take backups but also effectively manage and control backup strategies. There are many software packages available on the market; some of them are listed in the following table.
Package Name | Vendor
Networker | Legato
ADSM | IBM
Epoch | Epoch Systems
Omniback II | HP
Alexandria | Sequent
CRITERIA FOR CHOOSING SOFTWARE PACKAGES
The criteria for choosing the best software package are listed below:
· How scalable is the product as tape drives are added?
· Does the package have a client-server option, or must it run on the database server itself?
· Will it work in cluster and MPP environments?
· What degree of parallelism is required?
· What platforms are supported by the package?
· Does the package support easy access to information about tape contents?
· Is the package database-aware?
· What tape drives and tape media are supported by the package?