
Oracle Automated Recovery

Evan Jenkins and Cameron Fischer

An Oracle database represents a significant investment for any company, and often the data supports mission-critical applications. Unix systems administrators must ensure that data is protected and recoverable. In this article, we will describe a simple yet effective way to harden your Oracle database backup process.

Business Justification

The backup of our Oracle implementation, which is now about 3 years old, recently underwent a complete overhaul. It was converted from a file system-based cold copy of the database to a SAN-based, low-level synchronize and fracture, otherwise known as LUN cloning. A Storage Area Network (SAN) is a Fibre Channel network of attached mass storage devices. LUNs (Logical Unit Numbers) are roughly equivalent to single logical disks presented to Unix hosts.

With the new-style backup, the Unix administrators gained about 4 hours in their backup window, because the clone-fracture process of backing up LUNs is nearly instantaneous, especially when compared to the laboriously slow, cpio-based file system backup. We'll provide a detailed explanation of the SAN-style backup later in this article.

Meanwhile, IT management had concerns about the SAN backup. How reliable was it? Could it match the previous, traditional backup's track record? What assurance was there that the new way of copying data was foolproof? These are all perfectly reasonable questions to ask, and the extra time spent ensuring that management's concerns were addressed was time well spent.

We set out to answer the age-old question that Unix administrators face -- how do you know your backup worked? We had the standard tools and vendor-supplied solutions, such as multi-streaming I/O to two tape drives and write verification. Although these tools are useful in verifying the final resting place of backups, they all fail to perform the one acid test that answers the question above: proving that the backup can recover the database for each and every backup instance. Our solution was exactly that -- an automated recovery of the production data files. In fact, the solution solved three important problems for the business:

  • It relieved IT management's concern by demonstrating that the SAN-based backup was at least as reliable as the traditional file-based backup.
  • It provided an additional fail-safe point for the backup process. Auto-recovery is run before disk images are written to tape. If auto-recovery fails, there is an excellent chance that the disk image is somehow corrupt and will fail to recover the production database. In this case, the DBAs can be notified and correct the problem before it becomes much more serious.
  • It allowed the Unix systems administrators to get a decent night's sleep.

These are all excellent points (especially the last one).

Figure 1 shows the mirror LUN continuously syncing with the production database during the daily online window. Figure 2 shows the beginning of the backup window when the database is shut down and the mirror LUN is fractured, thus providing an exact copy of the production data files. Figure 3 shows that auto-recovery begins by creating a snapshot LUN of the mirror LUN. This third copy of the data files is mounted on an alternate server, and a temporary database is brought online, thus validating the copy of data files.

For the auto-recovery mechanism to work, we needed a convenient and reliable way to present copies of LUNs to a host that could initialize a database instance. Our idea for auto-recovery was inspired by the impressive features of a typical network storage device and its cloning capabilities. This technique is often referred to as using a Business Continuance Volume (BCV).

SAN-Based BCV Framework

In a conventional SAN configuration, a server is attached to a SAN, either directly or via Fibre Channel switches. Host Bus Adapters (HBAs) connect the host to the Fibre Channel network, or SAN. The SAN contains many logical devices that share a single Fibre Channel bus. LUNs are used to differentiate each unique device.

In a SAN, disks are organized into RAID groups, which are simply collections of disks. Logical disks are then carved from the total disk space allocated to a RAID group. The RAID type and other specific configuration items are defined when the RAID group and LUN are created.

Once a LUN has been assigned to a host, and the appropriate host configuration is complete, the LUN is used in the same manner as a local SCSI disk. The LUNs are set up as typical Unix mount points that can be used for the Oracle database.
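
For example, once presented, a LUN behaves like any other disk. A Solaris-style sketch with illustrative device names:

newfs /dev/rdsk/c2t0d3s0       # build a file system on the LUN
mount /dev/dsk/c2t0d3s0 /u03   # mount it like any local disk
chown oracle:dba /u03          # hand the mount point to the Oracle owner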

Creating a cold backup of a several-hundred-gigabyte database can be time consuming and cause extended database downtime, resulting in lost productivity. LUN cloning is a SAN feature that creates a block-for-block copy of a LUN. A clone requires the creation of a mirror LUN to hold the copy image; it must be exactly the same size (in blocks) as the original LUN (known as the source LUN).

The BCV Process

In the first step of the BCV process, the source LUN is synchronized with the mirror LUN. To speed up synchronization time, only the changed blocks since the last synchronization are copied to the mirror. These changes are applied in the background without end-user impact. Before the database shutdown, the source and mirror LUNs contain exactly the same data.

The second step of the BCV process is to perform an orderly shutdown of the database, ensuring it is in a consistent state.
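
This step reduces to a scripted sqlplus call; a minimal sketch, run as the Oracle owner with ORACLE_SID pointing at the production instance:

sqlplus -s "/ as sysdba" <<EOF
shutdown immediate;
exit;
EOF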

The third step is to disconnect or fracture the source LUN from the mirror LUN. Fracturing allows each LUN to operate independently. Any blocks written to the source LUN will not be written to the mirror at this point. However, all updated blocks on the source have their block location recorded in the Clone Private LUN (CPL), which is the feature used to speed up the synchronization in step one. Once the fracture of the LUNs is complete, the production database is free to restart. However, it is prudent to check the backup before restarting the database, because the backup is now located on the mirror LUNs. The total backup time is 14 minutes.
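
The fracture itself is issued through the array's command-line interface. The exact syntax is vendor-specific, so san_cli and the clone-group name below are placeholders standing in for your SAN's CLI:

san_cli fracture --clonegroup PROD1_CG    # split the mirror from the source
sqlplus -s "/ as sysdba" <<EOF            # production is now free to restart
startup;
exit;
EOF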

The fourth step in the BCV process is to make the mirror LUNs visible on a host so they can be manipulated for the auto-recovery test and copied to tape. The SAN offers another feature called snapshotting. A snapshot is a representation of the mirror LUNs, constructed from a private area in the SAN. This third copy will be used for the recovery operation, and it is safe to write to the snapshots; the mirror LUNs remain in a pristine state.

The snapshot LUNs are assigned, configured, imported, checked, and mounted on an alternate host. This is achieved with a Command Line Interpreter (CLI) in the SAN service processor and a series of custom scripts. The alternate host now contains a copy of the database as it was when the fracture occurred. From this point forward, the mirror LUNs are mounted on the backup server in a read-only file system, awaiting the go-ahead to write to tape.
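
On the alternate host, the presentation steps look roughly like this; again, san_cli is a placeholder for your SAN's CLI, and the device names are illustrative:

san_cli snapshot --create --source mirror_03 --name snap_03   # carve the snapshot
san_cli snapshot --assign --name snap_03 --host althost       # present it to the host
fsck -y /dev/dsk/c3t0d3s0                                     # check the file system
mount /dev/dsk/c3t0d3s0 /u03prod                              # mount it for the recovery test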

We extended our BCV snapshot scripts so they detect and report errors and accept parameters. Now they can create a snapshot of any LUN and present it to any host. Unix SSH plays an important role here, and what was once a modest BCV series of scripts has become a BCV-distributed framework.
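
For example, the backup host can drive the whole cycle remotely. The script name and flags below are ours and are shown only for illustration:

ssh oracle@althost /usr/local/bin/bcv_snap.sh -l mirror_03 -m /u03prod \
    || echo "bcv_snap.sh failed on althost"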

Auto-Recovery

To perform the recovery operation, three basic ingredients are required:

1. Cut the Oracle control file. A valid copy of the control file from the production database is required. The control file must be obtained before the system is taken down for backup, because it contains essential information about the database file layout. It must match the database image you are attempting to recover, or the database will refuse to mount and open. Oracle is notoriously picky about the control file when bringing up the database. For example, the data files are all synchronized via an internal header field: every data file must carry exactly the same value in that timestamp field, or the recovery will fail.

2. You need a mounted copy of the source LUNs on the server where the recovery operation will take place. The Oracle owner must have read-write access to the data files. Altering our database image is fine because our image is the disposable "third" copy. Our original copy (production) is still down, and our mirror copy is waiting to be written to tape.

3. An installation of the same version of the database is required on the alternate server to perform the database recovery (in our case, Oracle 9.2.0.3); the Oracle binaries and associated toolset are used to perform the recovery operation. Another technique is to include the Oracle binaries in the production instance snapshot and use those instead. A quick sanity check of these three ingredients is sketched after this list.
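
Such a check might look like this (all paths are illustrative):

# 1: Has the control file trace arrived?
test -s /tmp/controlFile.trc || echo "missing control file trace"
# 2: Can the Oracle owner write to the data files?
su - oracle -c "test -w /u03prod/oradata/system01.dbf" || echo "data files not writable"
# 3: Are matching Oracle binaries installed on this host?
test -x $ORACLE_HOME/bin/sqlplus || echo "no Oracle binaries on this host"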

An easily overlooked feature of our solution is the fact that we used some of the oldest and well-known tools that come with Unix. If I trawl through the scripts we wrote, I see nothing more than the use of ssh, scp, ksh, sh, sed, uniq, grep, and sqlplus (the command-line interface into the Oracle database). The use of these tools means that future maintainers will have no problem reading our scripts and modifying as necessary, and no third-party vendors can render our work obsolete. It also reflects one of the main reasons why Unix has fared so well in the IT industry over the years -- small scripts are pieced together using small yet powerful operating system components.

These components (like sed, the Unix stream editor) are themselves built from the judicious use of standard operating system calls. This pattern holds from the outer command interface right through to the kernel core; Unix is, in effect, built from itself from the ground up.

Let's go through the process to illustrate some of the details of auto-recovery, beginning with the basic ingredients mentioned above.

To get the control file from the running database, we create a file containing a single sqlplus command to tell the database to dump the control file to disk:

alter database backup controlfile to trace;
The command leaves an Oracle trace file in the Oracle-defined dump destination area. We use the secure shell copy command scp to copy the file over to our alternate server for later use.
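
Putting those two pieces together, the dump-and-copy step looks something like this; backupControlfile.sql holds the single command above, and the trace file location and host name are illustrative:

sqlplus -s "/ as sysdba" @backupControlfile.sql
TRC=`ls -t /u01/app/oracle/admin/PROD1/udump/*.trc | head -1`   # newest trace file
scp $TRC oracle@althost:/tmp/controlFile.trc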

Now we can perform the recovery operation. To begin, we manipulate the control file so we can start a new Oracle instance. We wrote a file containing some sed commands:

1    s/^#/--/1
2    s/^CREATE CONTROLFILE REUSE DATABASE "PROD1" RESETLOGS  ARCHIVELOG/CREATE CONTROLFILE REUSE SET DATABASE "REC1" RESETLOGS  NOARCHIVELOG/
3    s/\/u03\//\/u03prod\//g
4    s/\/u05\//\/u05prod\//g
5    s/\/u06\//\/u06prod\//g
6    s/\/u07\//\/u07prod\//g
7    s/\/u08\//\/u08prod\//g
8    /^VARIABLE RECNO NUMBER;/d
9    /^EXECUTE :RECNO :=SYS.DBMS_BACKUP_RESTORE.SETCONFIG ('CONTROLFILE AUTOBACKUP','ON');/d
10   /^EXECUTE :RECNO := SYS.DBMS_BACKUP_RESTORE.SETCONFIG('BACKUP OPTIMIZATION','ON');/d
11   /^RECOVER DATABASE USING BACKUP CONTROLFILE/d
Line 1 rewrites the trace file's # comments as SQL-style -- comments, so sqlplus ignores them. Line 2 will seek the line beginning with:

CREATE CONTROLFILE REUSE DATABASE "PROD1" RESETLOGS  ARCHIVELOG
and replace it with:

CREATE CONTROLFILE REUSE SET DATABASE "REC1" RESETLOGS  NOARCHIVELOG
This effectively renames the new Oracle instance to REC1. Lines 3 to 7 replace references to the production mount points with the snapshot mount points. Lines 8 to 11 remove references to Oracle backup utilities configured on the production instance. We apply the sed commands as follows:

sed -f controlFile.sed controlFile.trc > createREC1.sql
Next, we create an Oracle initialization file using the production initialization file, which was transferred with the copies of production data files. We can find the file in the $ORACLE_HOME/dbs location. It is usually named initSID.ora, where "SID" is the Oracle instance name. We apply another sed script:

1    s/^db_name = PROD1/db_name = REC1/1
2    s/\/u01\//\/u01prod\//g
3    s/\/u03\//\/u03prod\//g
4    s/\/u05\//\/u05prod\//g
5    s/\/u06\//\/u06prod\//g
6    s/\/u07\//\/u07prod\//g
7    s/\/u08\//\/u08prod\//g
8    /^LOG_ARCHIVE_START=TRUE/d
9    /^LOG_ARCHIVE_FORMAT=%t_%s.arc/d
10   /^LOG_ARCHIVE_DEST=\/oraarch\/PROD1\/arch/d
11   /^RECOVER DATABASE USING BACKUP CONTROLFILE/d
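
As with the control file script, we apply it with sed -f; the input file name here is illustrative:

sed -f initFile.sed initPROD1.ora > $ORACLE_HOME/dbs/initREC1.ora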
These sed commands alter the file so that, when Oracle is started, it will be content to call itself REC1 instead of PROD1. You can also see how we reference the snapshot mount points, as above, and remove some unnecessary lines from the file. The new file, initREC1.ora, lives in the $ORACLE_HOME/dbs location. Now we are ready to do the recovery:

sqlplus "/ as sysdba" @createREC1.sql > createREC1.log
If everything works, we will have a brand-new Oracle instance on the host called REC1, with the exact contents of the production system just before it was brought down. The true value, of course, is the act of recovering the database and the logs and errors that may be produced. The most important file produced by the recovery is the Oracle alert log, which can be found in $ORACLE_HOME/bdump/alert_REC1.log. In this file, you'll find all kinds of interesting information about your new instance; most importantly, you can find out why it failed to initialize. You will also find clues in the file createREC1.log. Process them both with grep, like so:

grep ORA createREC1.log | uniq
The use of uniq takes care of repeated errors. The above command provides telling details about why the recovery might have failed; otherwise, it provides a crystal-clear sign that the recovery was successful and that the backup has an excellent chance of rescuing the production system if necessary. The entire recovery operation takes only four to five minutes to execute, but by the time it is finished, there can be no question about the validity of the backup image. We run auto-recovery as an integrated component of the full system backup every single time it runs.
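
Because auto-recovery runs unattended, the same grep can gate the tape run and notify the DBAs. A sketch, with an illustrative address (ORA-00000 is Oracle's "successful completion" code):

grep ORA- createREC1.log | uniq | grep -v "ORA-00000" > recovery.err
if [ -s recovery.err ]
then
    mailx -s "Auto-recovery FAILED -- hold the tape run" dba@example.com < recovery.err
    exit 1
fi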

Backup solutions have a tendency to work fine for the first few months or even years, and then fail at seemingly the most inopportune moment. Auto-recovery is an independent verification run every single time a backup is taken. This can potentially save your company from catastrophic failure and data loss. And Unix systems administrators can get a good night's sleep.

Evan Jenkins, Unix Administrator and IT Consultant, lives in Edmonton, Alberta, Canada with Dianne, Max, and Jack. He can be reached at: evan.jenkins@telus.net.

Cameron Fischer, SAN and Unix Administrator, lives in Edmonton, Alberta, Canada with Heather. He can be reached at: cam@fischer.ca.