Restoring the Sun
Chet Gebhards
There have been a number of articles in this magazine and others dealing with backups, but most of these have said little about restoring data. Perhaps it seems a trivial task to drop a tape into a tape drive and issue a few commands to restore the data. Ho hum. But when the system disk crashes at 2:00 a.m. during the most critical period in your company's business cycle, and the CIO (you know, the guy you talked into spending big bucks on a tape library and backup tools) is looking over your shoulder, you had better be able to pull the data out of that fancy backup system in a hurry!
Consider these true stories (the names have been omitted to protect the embarrassed). One site had their backups down to a science. They used a tape silo and sophisticated software to do nightly backups of their Sun servers. And since they had FDDI between the silo and the servers, the backups were very quick. Everything was great until they lost a system disk on one of their servers. No problem, boot from CD-ROM, configure the network interface, and...uh-oh, the Solaris CD-ROM doesn't support FDDI interfaces.
Another site had a single system administrator, who did backups using cron and scripts to a local tape drive. A small system, but critical in the company's business. When that SA got married and went on his honeymoon, he gave one other person the root password, "just in case..." and told him to put new tapes in the drive every night. You guessed it; the system disk crashed the second night of the SA's two-week honeymoon. The SA, of course, had the system configuration all in his head, but he was not checking his voice mail, and he definitely wasn't thinking about UNIX. Nowhere was the system configuration documented on paper. After several hours of hunting around and experimenting, we managed to restore the system to something similar to its previous state.
I hope these stories have made some of you think about your backup procedures and what it would take to restore your system. In the rest of this article, I will explain the steps involved in restoring the Operating System Disk on a Sun Microsystems workstation under both Solaris 2.5.1 and SunOS 4.1.3. Much of the information will apply to other systems as well. For this discussion, I will assume a fairly simple system - a single disk drive holds the OS with no mirroring, the backup tapes are in (ufs) dump format, and the tape drive is local to the system.
Important System Information Assuming you have a good set of dump tapes for your system, what do you need to know in order to restore the system disk in the event of a crash? First, you must know which drive is your system drive. For this article for Solaris 2.X, I will assume that the disk drive is c0t0d0. For SunOS, I assume it is sd0. You should use the appropriate disk for your system in any of the commands presented here. If you do not know which disk is your system disk, check the file /etc/vfstab for Solaris or /etc/fstab for SunOS. Look for the device that is mounted as /. This is your boot disk. For Solaris, the entry will look like this:
/dev/dsk/c0t0d0s0 /dev/rdsk/c0t0d0s0 / ufs 1 no -
Next, you need to know how the disk is sliced or partitioned. One good way to find out is to use the format command to print the partition table. As root, do the following commands:
# format c0t0d0
(You will see the FORMAT MENU)
format> partition
(You will see the PARTITION MENU)
partition> print
(Disk partitioning information will be displayed)
(Copy this information to a file for printing)
partition> quit
format> quit
#
I strongly recommend printing both the contents of /etc/vfstab and the partitioning information from above and storing this information somewhere that can be easily accessed in event of a system failure. Keeping the information online on another system may be an option, but paper copies are better. Partitioning information is crucial to restoring the system.
You should also have a good diagram of what is connected where in your system. Make sure that all cables attached to the system are accurately and plainly marked. This may not be an issue if you administer a SPARCstation 20 with no external peripherals, but larger systems, such as Sun Enterprise servers, can get quite complicated. How long would it take you to reconnect 20 SPARCstorage arrays, two token rings, four Ethernets, and eight tape drives on two SCSI chains if somebody accidentally took apart the wrong system? You may know that your boot drive is c0t0d0s0, but if it is in a SPARCstorage array, which array is it in, and where is that array attached to the system? If you move the connection, it will no longer be c0t0d0s0!
So, you have the necessary information, and you have just replaced the system disk. Now, you need to boot the system from CD-ROM. Since processes differ somewhat from SunOS to Solaris, I will go through both scenarios, first for Solaris then for SunOS.
Restoring Solaris From the boot prompt (the 'OK' prompt) type:
OK boot cdrom -s
When the system has booted up, you should be at a shell prompt. You'll need to partition the disk using format.
# format c0t0d0
You will see the format menu. It is usually not necessary to actually format the disk, since almost all SCSI disks are preformatted at the factory. We simply need to repartition the disk. Select the partition option:
format> partition
You will see the partitioning menu. Enter the following commands, using the printout of your previous partition layout in that machine's logbook as a guide. (See comments about partition sizes later in this article if you don't have such a record.)
partition> 0
Enter partition id tag[root]: <RETURN>
Enter partition permission flags[wm]: <RETURN>
Enter new starting cyl[0]: <RETURN>
Enter partition size[??b, ??c, ??mb, ??gb]: <number of cylinders>
Repeat this process for all the other slices on the disk. Be sure to zero out any unused slices on the disk to avoid problems later. When you have finished partitioning the disk, you must label it.
partition> label
Ready to label disk, continue? Yes
You may now quit format.
partition> quit
format> quit
The next step is to create a new file system on the partitions you have just set up. You do not need to make a file system on the swap partition.
# newfs /dev/rdsk/c0t0d0s0
Note that you use the raw device, /dev/rdsk, for this. Repeat this step for any other slices on the drive, except slice 1. Slice 1 is usually used as swap space and does not need to be restored, nor does it require a file system. Next, we mount the root partition so that we can begin to restore to it.
# mount /dev/dsk/c0t0d0s0 /a
Change directories to /a:
# cd /a
Begin the restore process.
# ufsrestore xvf /dev/rmt/0n
You will see lots of messages from ufsrestore, followed by a list of files that are being extracted from the tape and put on the disk. When ufsrestore has read all the files from tape, it will ask you if you want to set the owner and mode on the files. You should answer "yes" to all these questions. ufsrestore has a number of options. The ones used here have the following meaning:
x = extract all files from the ufsdump
v = verbose mode; list the files as the come from the tape
f = file or device to use as input /dev/rmt/0n = do not rewind the tape.
This assumes that you have several filesystems dumped onto a single tape.
Once you have restored the root file system, continue restoring any other filesystems on the replaced disk. For example, suppose that you had made /var a separate file system on c0t0d0s3, and /var is the next ufsdump file on the tape. Then you would do the following:
# mount /dev/dsk/c0t0d0s3 /a/var
# cd /a/var
# ufsrestore xvf /dev/rmt/0n
Again, you will see a lot of messages from ufsrestore followed by a list of files that are being extracted. Answer "yes" to the questions asked by ufsrestore at the end of the procedure. When you have finished restoring all the disk slices on the replaced disk, unmount the partitions, making sure to start at the leaves of the hierarchy tree.
# umount /a/var
# umount /a
After you have restored all the files to the replaced disk, you must install the bootblock before you can boot the system from the new disk. In order for the system to boot, the boot(1M) program, called ufsboot, must be loaded on the boot disk by the bootblock program. This program must be placed in the boot area of the disk partition that will be booted. This bootblock program is different for each Sun platform. That is, an Ultra 1 will have a different bootblock program than a SPARCstation 20 or an Ultra Enterprise 5000. Copies of the program for the particular system can be found in /usr/ \ platform/<platform-name>/lib/fs/ufs, where <platform-name> can be found using the uname -i command.
To install the bootblock, perform the following commands:
# cd /usr/platform/'uname -i'/lib/fs/ufs
# /usr/sbin/installboot bootblk /dev/rdsk/c0t0d0s0
Note that the back tics around the uname -i command are very important. They will cause the output of uname -i to be substituted in the command. If you use the wrong tics, this command will fail. Once you have completed these steps, you may halt the system and reboot from the new disk. The system should boot up completely and be in the same state as when it was dumped.
Restoring SunOS The steps for restoring a SunOS system disk are the same as those for restoring a Solaris system disk:
- Replace the drive
- Partition the drive
- Create new file system
- Restore the data
- Install the boot block
The specific commands used are, however, a little different. Booting the SunOS (aka Solaris 1.1.1) disk is a two-step process. First, you boot from CD-ROM:
OK boot cdrom
You will be asked to select the boot disk; choose your system disk, sd0. Next, you will be asked if you want to format the system disk. Answer "Yes" and partition the disk according to the partition information you have for the system disk. You do not need to format the disk, only partition it.
Next, you will be asked if you want to install the OS on the swap partition of the newly partitioned disk; answer "Yes". Finally, you will be asked if you want to reboot using the just-installed copy of SunOS; answer "Yes" again, and the system will boot up to a shell prompt. When you see the shell prompt, you will need to create a file system on the drive in question. You do this in nearly the same way you would in Solaris:
# newfs /dev/rsd0a
Next, mount the new file system on /a of the file system:
# mount /dev/sd0a /a
Now, we will restore the tape.
# cd /a
# restore -xvf /dev/rst0n
You will see a message "Creating nodes" and a list of directories on the tape as these directories are created on the disk. Then, you will see a list of the files as they are extracted. Finally, you will see the question "Set owner and mode?" Answer yes to this question. Repeat this process for any other partitions that were on the failed disk. For instance, if /usr was on sd0g, then you would:
# newfs /dev/rsd0g
# mount /dev/sd0g /a/usr
# cd /a/usr
# restore -xvf /dev/rst0n
The boot(8S) program is loaded from disk by bootblock code from the bootblock region of the disk partition. It is necessary for the bootblock program to know the block numbers on the disk that are occupied by the boot program. The installboot program copies the block numbers of the boot program into a table in the bootblock code, then writes the modified bootblock code onto the disk. Since the blocks where the boot program resides will probably change when you restore from tape, you must run installboot to put the correct blocks into the bootblock program.
Use these commands:
# cd /usr/kvm/mdec
# installboot -v /a/boot bootsd /dev/rsd0a
# umount /a/usr
# umount /a
Now, you can reboot your system with the new drive, and the system should be in the same state as when you dumped it.
Restoring with No Configuration Information Now, you've read this far, and you've vowed to get all that information about your systems written down. In the meantime, one of your systems is down and won't reboot because the system disk has failed. Don't panic. You can still get some information from your dump tapes, and with a couple of educated guesses, you can get the system up and running. There are a couple of ways to proceed. If you have another system, and you have some spare disk space, you can start to gather some information while the hardware guy is replacing the disk drive. On the second system, create a directory on a file system that has at least 15 MB free. You probably won't use that much, but a little extra space never hurts. In a crunch, you could probably get by with 5 MB or less. Let's call the directory /hope (because we hope we can get some information out of this). We are going to use the interactive option to ufsrestore (this option is also in restore under SunOS, but I will use the Solaris versions here) to extract the vfstab from the dump tape. Use the following commands:
# cd /hope
# ufsrestore -ivf /dev/rmt/0
Verify volume and initialize maps
More messages concerning block size, date of the dump, etc
ufsrestore > cd etc
ufsrestore > add vfstab
Make node ./etc
ufsrestore > extract
Extract requested files You have not read any volumes yet.
Unless you know which volume your file(s) are on you should start \
with the last volume and work toward the first.
Specify next volume #: 1
extract file ./etc/vfstab
Add links
Set directory mode, owner, and times.
set owner/mode for '.'? [yn] n
ufsrestore > quit
#
You can now use your favorite editor to look at /hope/etc/vfstab to determine which partitions from your system disk were being used for file systems. Unfortunately, this will not give you any information about the size of the partitions.
If you do not have a system with some spare disk space for extracting the vfstab file from the tape, you can wait until you get the disk replaced, then use that disk. Simply boot from CD-ROM as instructed above. You should use the default partition on the disk, then newfs one of the slices, mount it as above, and use the ufsrestore -ivf command to extract /etc/vfstab onto that disk.
Next, we need to determine the size of the slices. If you have some mail or other output from your ufsdump jobs, it will tell you how much space was used for each partition. You can then use that information to estimate the sizes of the various partitions. This will not tell you how much swap space you had. Under SunOS 4.1.x, you need at least as much swap as physical memory, and the rule of thumb is two or three times more swap than physical memory. Under Solaris, you do not need as much swap, especially with the larger systems that can have several gigabytes of memory.
If you do not have this information, you will have to guess. With the myriad of disks out there, it is impossible to make specific recommendations about the size of the slices. However, I can give you some recommended minimums. Under SunOS, the root slice should be at least 32 MB, and /usr should be about 300 MB. Under Solaris, a single root slice should be at least 700 MB. If you separate root and /usr, then root should be about 150 MB and /usr should be about 550 MB. These are ballpark numbers, and may or may not work for your system, depending on what you have installed.
If you have the luxury of time, you can always restore the data from the dump tape to a partition, then see how big it is and partition your disk accordingly. Another tactic under Solaris is to rearrange how things are laid out on the disk. Simply make two or three partitions: one for swap, one for root (/), and optionally one for /var. Make /var about 100 MB, size swap according to the amount of memory in your system, (the more memory you have, the less swap you need), and put the rest in root (/). You can do this even if your original disk was sliced with a separate /usr partition. Boot from CD-ROM as above. After you have sliced the disk, newfs your root and /var partitions, then mount them as follows:
# mount /dev/dsk/c0t0d0s0 /a
# mkdir /a/var
# mount /dev/dsk/c0t0d0s3 /a/var
Next, restore the files:
# cd /a
# ufsrestore -xvf /dev/rmt/0n
# cd /a/usr (the /a/usr directory was created when you \ restored from tape)
# ufsrestore -xvf /dev/rmt/0n
# cd /a/var
# ufsrestore -xvf /dev/rmt0n
After you have restored all the data, proceed with the installboot sequence above. If you do not have the partitioning data and you have to estimate the size of the various partitions, it is probably better to overestimate the size of the file systems and underestimate the size of swap. You can always increase swap size with the mkfile and swapon commands, but it is very difficult to increase the size of a file system.
Rewind Perhaps you haven't had to restore a system disk yet, but sooner or later every sys admin faces this challenge. If you prepare for it and have the right information available, it will be easy. If not, you may have a little more work in store, but you can still do it. I hope this article has given you some idea of what to expect, along with some insight to help you avoid pitfalls.n
About the Author
Chet Gebhards has worked for Sun Microsystems for the past 10 years. He is currently a Software Systems Engineer with SunService. He holds a BSEE from the University of Texas and an MSCS from University of Nebraska at Omaha. The views and opinions expressed in this article are his, and do not necessarily reflect the views, opinions, or policies of Sun Microsystems, Inc or any of its subsidiaries. Chet can be reached by email at: chet.gebhards@central.sun.com.
|