Cover V14, i02

Article

feb2005.tar

Using Dirvish for Disk-to-Disk Backups: Part II

Keith Lofstrom

Dirvish is a disk-to-disk backup program, a Perl wrapper around the rsync file system network copy program. Using Unix-style hard links, dirvish and rsync allow successive backup images to share identical data files and occupy the same disk space, greatly reducing backup time and backup disk usage.

My first article about dirvish in the January issue of Sys Admin (http://www.samag.com/documents/s=9464/sam0501b/) described how I inherited this open source program and where I plan to take it. In this article, I'll describe how I use dirvish for my Linux systems. If you are running Windows or Mac clients, you will need to follow the directions at the rsync site for configuring rsync and ssh on those platforms. However, the setup for the dirvish backup server is almost exactly the same.

The Backup Server

Once your clients are provisioned with rsync and configured for ssh, they can be backed up by dirvish, unmodified and without further configuration. The dirvish software and configuration resides completely on backup server. For your backup server, you will need:

  • A Linux backup server with two (and preferably three) half-height, 5.25-inch drive bays available and a fairly recent kernel. OpenBSD will also work well, but the instructions here require some adaptation.
  • An IDE main hard drive.
  • An IDE backup drive (two or three are recommended, 250 GB recommended).
  • A duplicate IDE main hard drive, for full restore tests.
  • A ViPower IDE UDMA 133 Super Rack (VP-10LSFU-133) for the main drive.
  • A ViPower IDE to USB 2.0 SwapRACK (VP-1028LSF) for the backup drive.
  • Extra ViPower IDE drive trays (VP-15F-100/133).
  • A third rack (IDE or USB2) if you will be using your server for client rebuilds.
  • A USB2 card or internal USB2 connection. Older USB-1 connections are far too slow.
  • Ssh for the server and all clients, set up for private-keyed, password-free communication. The latest ssh is available from: http://www.openssh.org/.
  • Rsync for the server and all clients. Since September 2004, I have been using rsync 2.6.3. The latest version (for Unix, Linux, Windows, and Mac platforms) can be downloaded from: http://samba.anu.edu.au/rsync/.
  • Perl, with modules POSIX, Getopt::Long, Time::ParseDate, and Time::Period. I am using Perl version 5.8.0. The most recent Perl may be obtained at: http://www.perl.org/.

Hardware Setup

The hardware setup is easy. I suggest the ViPower swap system because they are commonly available and work well. A USB2-based swap cage permits hot-swapping of hard drives without a reboot. Direct IDE hotswap was added to the later 2.4.X series of Linux kernels by Alan Cox and removed from the 2.6.X kernels (through 2.6.7) for reasons unknown. Until this reappears in 2.6.X, use the USB2 interface for hot swap.

However, be very careful which USB2 cage you use. Numerous external drive boxes, as well as the SanMax Inclose internal USB2 cage, use the Cypress C7Y68013 chipset, and these have an incompatibility with current Linux kernels that causes them to hang during large data transfers. The kernel code will eventually get fixed, but meanwhile you should use a USB2 cage with known good behavior.

SCSI and SATA drives and cages may also work with hot swap; I have not tried them. If you have experience with this, please let me know at:

http://wiki.dirvish.org/
Parallel IDE drive controllers come in two flavors: the older LBA-28, and the newer LBA-48. LBA-28 is limited to hard drives with less than 228 512-byte sectors, or 137 GB, while LBA-48 permits up to 248 sectors, or 141 million GB. Some external USB2 enclosures will not work with the larger drives needed for backup, and some older motherboards will not either. An LBA-48 IDE drive controller card, such as the Promise TX2-133, will work with the larger drives and run faster as well.

The main drive and the backup drive in the backup server should be in identical removable trays. This permits rapid swapping for restores. I like the version of SuperRack with the slide switch, rather than the key, because it speeds up backup drive swapping. Every second shaved off the backup drive swap makes it more likely to get done.

Partitions

With the hardware set up and the drives installed, format the backup drive with three partitions: a 5-GB bootable partition, a 2-GB swap partition, and the remainder for the backup images. Just about any Linux distro will work in the 5-GB partition, as long as it is reasonably secure. I use Fedora Core 1, with all services except ssh turned off. I might want to access the Web, recompile a program, print something out, or add other features in the future, so a large boot partition allows for a lot of future flexibility. You can either do a from-CD install for the boot partition or copy the images from another machine. The main thing is to make sure that there are suitable X drivers, USB drivers, and Ethernet drivers for all the different hardware setups that the bootable partition might be expected to work with in the future.

The backup partition will be huge and will have enormous numbers of inodes and directories. When it is populated with images, it will take a very long time to fsck. If you are using Linux, you may have the best luck with version 3 of reiserfs (version 4 is still experimental). Reiser makes much more efficient use of space for directories and inodes. However, if you would rather use a traditional file system like ext3, with a fixed inode count, be sure to build the partition with more than the default number of inodes. One inode per 8 KB data is the default, while 1 inode per 4 KB is better. Otherwise, you will run out of inodes long before you have used up your data space. For a Linux ext3 partition, you can set the number of inodes per data byte with the -i 4096 option to mkfs.

Once created and populated, copying the files in a dirvish backup partition to add space or inodes is impractical. While a "dd" or other direct partition copy is reasonably fast, a file-level copy above the virtualization layer can take a week or more, due to all the chasing of hard links. Such file-level copies (such as those performed by tar, dump, cp, rsync, or cpio) are the equivalent of reading and writing every overlaid image, separately and in sequence. Thus, a 250-GB hard disk containing 200 100-GB image sets will read and write 20,000 GB while copying. So, it is critical to create the right kind of backup filesystem the first time.

After you have built the backup drive (and you can always tweak it while booted from your main drive), it is time to prepare for dirvish. Your distribution probably came with an older version of rsync, but it is important to download and install the latest version from the rsync Web site. This will take care of the most recent security problems. You should also make sure your ssh and sshd packages are up to date.

Setting Up the Software

Let's assume we are preparing one backup server named "Mom" and two backup client machines named "Dick" and "Jane". Mom can be almost any flavor of Unix, while Dick and Jane can be Linux, Unix, Windows, or OS X Macintosh -- anything that will run rsync through an ssh connection. For simplicity, we will assume Mom, Dick, and Jane are all Linux machines. Once you have ssh and rsync running on other types of machines, backing them up with dirvish works almost identically.

Mom will have a main drive, /dev/hda, and a backup drive, /dev/sda. For convenience, we will symbolically link /dev/backup to the backup partition on the backup drive, for example /dev/backup -> /dev/sda3. The backup drive will get nightly backup images from Jane and Dick, and Mom as well, at 4 a.m. every night. Two 250-GB backup drives, labeled A250 and B250, will be alternately rotated to a fire-resistant safe.

We have set up basic networking between the machines, and Mom must be able to ping Dick and Jane by name. Note that Mom does not need to be ping-able by Dick and Jane. Our backup server may be more secure if it is firewalled to ignore incoming traffic. However, all machines must have running sshd daemons, even if firewall rules permit only Mom to ssh to herself.

Why does Mom ssh to herself? Rsync is designed to copy between networked machines and uses ssh/sshd for everything. Although a special case could be created for copying between partitions on the same machine, the bottleneck is usually disk speed. The overhead of ssh/sshd is small by comparison.

Preparing ssh

Setting up ssh and sshd is beyond the scope of this article. For key-authenticated (no password) ssh to work, there must be copies of the host keys for Mom, Jane, and Dick in the /root/.ssh/known_hosts file on Mom. These are created the first time you do an ssh from Mom to Jane and Dick. No-password authentication is more difficult; there must be a copy of Mom's root key on Mom, Jane and Dick in /root/.ssh/authorized_keys2. The original key is in Mom's file /root/.ssh/id_rsa.pub or /root/.ssh/id_dsa.pub, generated by the ssh-keygen command.

When ssh and sshd are set up correctly, you should be able to ssh from Mom into any machine (including Mom) without supplying a password.

Preparing Perl

Most distros include Perl and install it as part of the base system. Dirvish is a collection of Perl scripts, and we will use the CPAN archive to download needed modules to the local Perl library:

root@mom# perl -MCPAN -e shell  
cpan> install POSIX
cpan> install Getopt::Long
cpan> install Time::ParseDate
cpan> install Time::Period
cpan> quit
        (much verbiage omitted)
Any or all of these modules may already be installed in our Perl libraries. If so, we will be informed, and the CPAN install will continue. We now have all the Perl modules needed need to operate dirvish.

Installing and Configuring Dirvish

Download the gzipped tar file for dirvish itself, from the dirvish site:

http://www.dirvish.org/
Put it into /tmp. This article assumes dirvish version 1.2, but later versions will be back-compatible. Untar the tar file into /usr/local/lib, then install it:

root@mom#  cd /usr/local/lib
root@mom#  tar -zxvf /tmp/dirvish_1.2.orig.tar.gz
root@mom#  cd Dirvish-1.2
root@mom#  sh install.sh
 perl to use (/usr/bin/perl)
 What installation prefix should be used? () /usr/local
 Directory to install executables? (/usr/local/sbin)
 Directory to install MANPAGES? (/usr/local/man)
 Configuration directory (/evc/dirvish)

 Perl executable to use is /usr/bin/perl
 Dirvish executables to be installed in /usr/local/sbin
 Dirvish manpages to be installed in /usr/local/man
 Dirvish will expect its configuration files in /etc/dirvish

 Is this correct? (no/yes/quit) yes

 Executables created.

 Install executables and manpages? (no/yes) yes

 installing /usr/local/sbin/dirvish
 installing /usr/local/sbin/dirvish-runall
 installing /usr/local/sbin/dirvish-expire
 installing /usr/local/sbin/dirvish-locate
 installing /usr/local/man/man8/dirvish.8
 installing /usr/local/man/man8/dirvish-runall.8
 installing /usr/local/man/man8/dirvish-expire.8
 installing /usr/local/man/man8/dirvish-locate.8
 installing /usr/local/man/man5/dirvish.conf.5

 Installation complete
 Clean installation directory? (no/yes) no
Now we need to configure dirvish and create some cron scripts, etc. Our three machines have the following partitions, which we will back up into separate dirvish "vaults":

Mom:  /boot  /usr  /home  / (containing the rest of the root directories)
Dick: /                     (one main partition)
Jane: /  /home              (two partitions)
We will only back up the normal OS directories on Mom's main disk. The backup directory and the redundant boot directory on the second (backup) hard drive should not be backed up, of course.

Mount /dev/backup to /backup. In the /backup partition, we will create the following directory tree with seven vaults:

  /backup/dirvish/mom/mom-boot
              .../mom/mom-usr
                  .../mom-home
                  .../mom-root
      .../dirvish/dick/dick-root
              .../jane/jane-root
                   .../jane-home
Each directory will contain a "dirvish" directory, and all directories will be permission 700 (root only). The following commands do this:

root@mom# cd /backup/dirvish
root@mom# for i in  boot usr home root
> do
> mkdir -p mom/mom-$i/dirvish
> done
root@mom# mkdir -p dick/dick-root/dirvish
root@mom# mkdir -p jane/jane-root/dirvish
root@mom# mkdir -p jane/jane-home/dirvish
root@mom# chmod -R 700 /backup/  
Note: Do NOT do this chmod after populating with data!

Next, we populate each of those directories with a default.conf file. Here's the contents of /backup/dirvish/jane/jane-root/dirvish/default.conf:

client: jane
tree: /
xdev: true
index: gzip
image-default: %Y-%m%d-%H%M
exclude:
        /proc/
        /mnt/
        isoimage
        *.iso
        /var**/tmp/
        /var/*/*.oldlog
This tells dirvish Jane's Internet address, where to start the backup tree, not to cross mount points, use gzip to store the daily backup indexes, use YYYY-MMDD-HHMM to label the images, and to exclude the proc directory from the image, as well as any subdirectories or files named "isoimage". If we wanted to combine both the / and /home partitions into one backup image, we would permit dirvish to cross mount points with "xdev: false".

The exclude directories are listed one per line on indented, separate lines from the exclude: statement itself. While the "list" format reflects dirvish's Perl heritage, the entries are not Perl regular expressions but rsync exclude patterns. If preceded by a slash, these are matched at the root of the vault. If not, these match the end. So the /proc/ line would match the /proc directory in the root file system or the /home/proc/ directory in the /home file system. The asterisk pattern match is a little strange -- a single asterisk matches any string of characters not containing a /, while the double asterisk matches strings that can contain zero or more /.

So, the string isoimage will match all files named isoimage, while *.iso will match all filenames with an .iso at the end. /var**/tmp/ will match any directory named tmp that is a subdirectory of the rooted directory /var, such as /var/tmp/, /var/foo/tmp, /var/foo/fum/tmp, etc. /var/*/*.oldlog will match and exclude files ending in .oldlog in the directories /var/foo/ and /var/fie/, but not in /var/foo/fum/. There are more complicated features; see the rsync man page.

This command shows dirvish's Perl heritage; we are constructing a "list" instead of a "scalar", and this simple approach to constructing a Perl list is easy to write code for. I hope to make future versions of dirvish more forgiving of configuration style.

Similar default.conf files are constructed for all seven vaults. It is all right to exclude non-existent directories, so we can leave /proc/ and isoimage in all of them if convenient, or we can add other excluded files or directories as needed. However, always test your excludes, and keep the names in mind when you create new files and directories.

The backup disk is complete and ready to roll. Next, we prepare the other big configuration file, /etc/dirvish/master.conf. Here is a typical master configuration file:

bank:
        /backup/dirvish/mom
        /backup/dirvish/dick
        /backup/dirvish/jane
exclude:
        lost+found/
        core
Runall:
        jane-home      03:00
        jane-root      03:00
        dick-root      03:00
        mom-home       03:00
        mom-usr        03:00
        mom-root       03:00
        mom-boot       03:00

expire-default:  never

# keep the Sunday backups forever, the dailies for 3 months
expire-rule:
#       MIN     HR      DOM     MON     DOW     STRFTIME_FMT
        *       *       *       *       *       +3 months
        *       *       *       *       1       never

pre-server:  /usr/local/sbin/dirvish-pre
post-server: /usr/local/sbin/dirvish-post
There are three banks, the same as our three machines, but we can put the vaults into one bank, or many, dividing them as needed to fit on multiple backup disk media.

We can also exclude files and directories from backup in the master.conf file as well; these are added to the per-vault excludes in our default.conf files.

Runall lists the vaults by name, in the order that they are processed, and the second column gives the hour used to timestamp the daily image name. This is not the run time for dirvish but is a reference time a little in advance of it. Vaults should be sequenced so the most important ones are backed up first, and so that the machines needed earliest are finished first. If Jane is your laptop machine, you want backups for Jane finished early, so you can take her on that early-morning flight.

The expire-default specifies what to do if we do not match the expire-rule just below. It is set to "never", because we only expire if the expire rule tells us to. Expire is performed by the /usr/local/sbin/dirvish-expire program, and you may choose not to schedule that in cron, since dirvish-expire is time consuming and disk space is cheaper than compute time.

The expire-rule list shown above is a bit complicated. It tells us to expire all images older than three months, made at any time, except for images made at any time on day 1, Sunday. Again, this is interpreted by the dirvish-expire program only when that is run; dirvish-runall ignores it.

The last two commands, pre-server and post-server, call user scripts on the server, before and after every vault update. There are two similar commands called pre-client and post-client, run for each client. These scripts are called in this order: pre-server, pre-client, rsync, post-client, post-server.

In the following example, dirvish-pre does nothing except return with an exit 0, but in dirvish-post we will run a shell script to gather some useful information:

#!/bin/bash
# /usr/local/sbin/dirvish-post

SFDISK='/sbin/sfdisk -d /dev/hdmain '
DF='/bin/df '
SSH='/usr/bin/ssh'

# variables
# DIRVISH_CLIENT provided from dirvish
# DIRVISH_DEST   provided from dirvish

$SSH $DIRVISH_CLIENT $DF     > $DIRVISH_DEST/../df.out
$SSH $DIRVISH_CLIENT $SFDISK > $DIRVISH_DEST/../sfdisk.out

exit 0  
This saves the output of df and sfdisk along with the daily image, which will help us rebuild the client disk if necessary.

I run dirvish from a master shell script, /usr/local/sbin/dirvish-daily, called in the early morning hours by cron. Here is one way to write it:

#!/bin/bash
# /usr/local/sbin/dirvish-daily
# this is called by /etc/cron.daily/backup

PATH=/sbin:/usr/sbin:/bin:/usr/bin:/usr/local/sbin

/sbin/hdparm  -zb 1 /dev/backup
/bin/mount          /dev/backup  /backup

/usr/local/sbin/dirvish-runall

/bin/umount       /dev/backup
/sbin/hdparm -b 0 /dev/backup

exit 0
The dirvish-daily shell script runs the actual dirvish Perl program dirvish-runall and specifies its runtime environment. The daily script takes the disk online and offline with hdparm to make sure it is ready to be physically swapped during the day. The hdparm command may not be necessary with some kernels and hot-plug configurations. Disconnecting the disk also helps protect it from rogue software and hides it from system crackers.

The previous two scripts have been simplified from my actual configuration. I keep track of every machine that has been successfully backed up, as well as which backup drive was used (A250 or B250), and append that information to a file in /var/log. This helps me determine which disk contains which evening's backups.

If Mom is a Red Hat system, there is a cron script that runs updatedb, which constructs the database used by slocate. You do not want to add all the zillions of files on the backup drive to the slocate database, so you should add /backup to the -e directories:

#!/bin/sh
# /usr/cron.daily/slocate.cron

renice +19 -p $$ >/dev/null 2>&1
/usr/bin/updatedb -f "nfs,smbfs,ncpfs,proc,devpts"  \
                  -e "/tmp,/var/tmp,/usr/tmp,/afs,/net,/backup"
The runtime software is complete. Now we can make the initial backup images. This will take hours (we are copying every single byte on our small network), so we initialize all the vaults from one last script:

#!/bin/bash
# /usr/local/sbin/dirvish-init

DIRV="/usr/local/sbin/dirvish --vault"

$DIRV   mom-root
$DIRV   mom-boot
$DIRV   mom-home
$DIRV   mom-usr
$DIRV   dick-root
$DIRV   jane-root
$DIRV   jane-home

exit 0
Be sure to start this with plenty of time to complete; for 100 GB, this initial rsync copy can take about 8 hours and keep your network very busy. After a dirvish backup, most processes on Mom, Dick, and Jane will be swapped out to disk, so you will notice some temporary delay in your desktop responsiveness.

We are just about done with configuration! We need a script to schedule dirvish in the crontab. If Mom is a Red Hat-style Linux system, we create another script, /etc/cron.daily/backup, which is run at 4 a.m. with the rest of the daily scripts. This file is very simple:

#!/bin/bash
# /etc/cron.daily/backup, run dirvish

/usr/local/sbin/dirvish-daily
Voila! You are done setting up backups. Your first images will appear in the vault-level directories, timestamped with directory names like 2005-0215-0300. In each image, there will be five files and a directory:

   sfdisk.out   # made by dirvish-post, contains partition map
   df.out       # made by dirvish-post, contains partition usage
   log          # a list of every file looked at by dirvish
   index.gz     # a list of all the files that changed
   summary      # describes what dirvish did with this image

   tree         # contains the actual directory tree.
Tree will contain an identical image of the client filesystem, except for the excluded directories and files. One small discrepancy will be the modification dates for the symbolic links; rsync (like cp and tar) cannot modify permissions or modification times for the symbolic links it creates.

Our example used seven vaults; you will probably want to do your first experiments with just one vault backing up one small image, then expand it to your full network, which might use hundreds of vaults. If, as part of your experimentation, you call dirvish-runall twice in one day, it will refuse to write over an identically named image. You can delete the target image manually, however, and it will act just like an expire, allowing you to repeat your experiments.

You can look in the tree directory and find all the files in your original image. Unlike tape, a dump image, or a tar file, these files are honest-to-goodness data files and directories, and you can copy them back out of the image using normal Linux tools. The downside is that you can write over them, delete them (or at least one hard-link to them), or move them, and destroy the integrity of the backup. This is the downside of the hard-linking; you really only have one copy of each file. This is another good reason for the swap-to-the-safe rotation scheme, and you may decide to use three or even four rotating backup drives if you are ultra-careful.

Restore

Ah, but we are not REALLY done, are we? We do not have a backup system until we verify that we can completely restore a disk from our backups. I will discuss restoring files and whole drives from backups in the third and final article on dirvish, in an upcoming issue of Sys Admin. So collect the hardware and the software, get the system set up, try running a few dirvish backups, and keep making tapes or CDs until you learn how to get your drives and files rebuilt!

References

Dirvish -- http://www.dirvish.org

Rsync -- http://rsync.samba.org

Perl CPAN -- http://www.cpan.org

Vipower -- http://www.vipower.com

Keith Lofstrom (http://www.keithl.com) owns an integrated circuit design consultancy in Beaverton, Oregon. His specialty is mixed-signal and statistical design for deep submicron processes, as well as design for testability using the IEEE 1149.x standards. Keith has been using some flavor of Unix since 1980 and although he admits to a brief flirtation with DOS and Windows, he has seen the error of his ways. He is currently managing the dirvish backup program until a better leader comes along.