Cover V12, I04

Amanda Backup Enhanced with SolarisTM Snapshots

Julian Briggs

This article discusses the development of a system to improve the reliability of backups, using Amanda backup software (see "Configuring Amanda", Sys Admin, April 2002, http://www.samag.com/documents/s=7033/sam0204a/sam0204a.htm), enhanced with Solaris filesystem snapshots (see "Free Snapshots?", Sys Admin, January 2002, http://www.samag.com/documents/s=1824/sam0201j/0201j.htm).

In my group, we manage a medium-sized (about 400 hosts), heterogeneous (Linux, Solaris, Windows) academic network. Four Solaris servers provide central compute power, data storage, and network services. We back up about 400 Gb of data on a 7-day cycle to an LTO 200 tape drive (nominally 200 Gb using hardware compression, as we do), with about a 150-Gb holding disk, using Amanda (http://www.amanda.org), an excellent, mature, free-software backup utility. We suffered increasing dump errors and restore failures on our large (more than 50 Gb) filesystems, which are used overnight by write-intensive research applications.

Backing up active filesystems is dangerous because inodes may change during the dump. This makes restores unreliable because the restored filesystem may be corrupt, or the restore may fail, typically with this error:

changing volumes on pipe input

abort? [yn]

This problem is exacerbated by ufsdump dumping all the directories first, then (perhaps several hours later on a large filesystem) dumping the last of the files. Meanwhile, some of those files may have been deleted so the directory inode on tape has entries for non-existent files.

The standard recommendation to avoid this is to unmount the filesystem, or to do the backup in single-user mode. Neither method is practical on a large, heavily used server. Many sites simply run backups at quiet times (overnight or weekends) and tolerate occasional dump errors and restore failures.

The Solution

The solution we adopted was to create a snapshot of the filesystem, which preserves a static view that we can then dump reliably. Sun introduced a snapshot utility, fssnap, as a patch in Solaris 8, integrated into Solaris 9. The challenge is to make this work with Amanda. We tried three approaches:

1. Create snapshots of all filesystems, run Amanda, then delete the snapshots.
2. Use an executable automount map to create snapshots on demand.
3. Use a wrapper to ufsdump to create, dump, and delete snapshots.

In this article, I'll describe each of these approaches, addressing both issues arising during implementation and issues remaining unresolved. I'll discuss these as they emerge, so issues in an early approach are relevant to later ones, and I'll show the code for each approach.

Create Snapshots, then Run Amanda

This is a simple approach. We start our Amanda run at midnight, so prior to that, we run a cron job on each server to create snapshots of all filesystems and mount these under /snap/ (e.g., /snap/_var_mail). We found the following issues arising during implementation.

Bugs

There are several known bugs in fssnap. /usr/sbin/fssnap behaves very differently from the man page description (in Solaris 8 and Solaris 9). A workaround is to use /usr/lib/fs/ufs/fssnap (Sun Bug ID: 4446301. 17 Apr 2001).

fssnap fails to create snapshots of / and /var if xntpd (the Network Time Protocol daemon) is running. The problems are that xntpd runs in real-time mode and uses the / and /var filesystems; fssnap temporarily locks a filesystem with fslock when creating a snapshot; and filesystems in use by real-time processes cannot be locked. A workaround is to stop xntpd, run fssnap, then start xntpd (Sun Bug ID: 4699740. 12 Jun 2002). (fssnap can happily delete snapshots while xntpd is running.)
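The workaround can be sketched as a small shell function. This is illustrative only: XNTPD and FSSNAP are variables standing in for the Solaris paths, and error handling is trimmed (the production code is in Listing 4):

```shell
#!/bin/sh
# Sketch of the Bug 4699740 workaround: pause xntpd around the snapshot.
# XNTPD and FSSNAP are stand-ins for the real Solaris paths (assumptions).
XNTPD=${XNTPD:-/etc/init.d/xntpd}
FSSNAP=${FSSNAP:-/usr/lib/fs/ufs/fssnap}

snap_with_xntpd_paused() {
    fs=$1 bs=$2
    "$XNTPD" stop               # fssnap cannot lock a filesystem in use
                                # by a real-time process
    dev=$("$FSSNAP" -o bs="$bs",unlink "$fs")
    rc=$?
    "$XNTPD" start              # deleting snapshots needs no such pause
    [ "$rc" -eq 0 ] && printf '%s\n' "$dev"
    return "$rc"
}
```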

Snapshots in use can produce fssnap errors in /var/adm/messages, such as:

fssnap: [ID 964769 kern.warning] WARNING: snap_strategy: error calling snap_getchunk, chunk = 611890, offset = 24576, len = 196, resid = 196, error = 5

This is Sun Bug ID 4769472 (27 Nov 2002), "File offset of /dev/fssnap too large from ufsdump read causes panic." There is no workaround, but Sun advises, "Only use the block device, not the raw device (/dev/fssnap/* vs /dev/rfssnap/*)." Unfortunately, ufsdump dumps the raw device even if given the block device.

Backing Store

The backing store for a snapshot must be on a separate filesystem, which cannot be "snapshotted", but can be NFS-mounted. We use an automounted directory /share/backingstore/<hostname>. We manually create and share the hostname directory using a netgroup of our servers, servers:

mkdir /share/backingstore/ivy
share -F nfs -o rw=servers,root=servers /export0/backingstore

We use the fssnap unlink option, which creates a backing store file (e.g., /share/backingstore/ivy/0), opens it, then unlinks it, so the backing store is deleted when the snapshot is deleted.
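The unlink option relies on a standard UNIX idiom, sketched here with an ordinary file (the path is illustrative): a file that is open stays allocated after its name is removed, and the kernel frees the space only when the last descriptor closes.

```shell
#!/bin/sh
# Demonstrate the create-open-unlink idiom behind fssnap's unlink option.
bs=/tmp/backing.demo       # illustrative path, not the real backing store
: > "$bs"                  # create the file
exec 3<> "$bs"             # hold it open on file descriptor 3
rm "$bs"                   # unlink: the name disappears immediately...
echo "copy-on-write data" >&3   # ...but writes through fd 3 still succeed
exec 3>&-                  # close: the kernel now reclaims the blocks
```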

Mount Points and Path to Snapshot

We create a snapshot of a front filesystem (e.g., /var/mail), which gives a snapshot device (e.g., /dev/fssnap/3), which we mount under /snap/ (e.g., /snap/_var_mail). Amanda lists filesystems to dump, by host, in a file (disklist). We cannot populate disklist with snapshot devices because the snapshot device for a given front filesystem may change. For example, today:

fssnap -o bs=/share/backingstore/ivy,unlink /export0
/dev/fssnap/0

Tomorrow:

fssnap -o bs=/share/backingstore/ivy,unlink /export0
/dev/fssnap/1

We "flatten" the mount point paths by replacing / with _ (e.g., /var/mail becomes _var_mail) to avoid hierarchical mount point issues (e.g., having to create snapshots of, and mount, / before /var before /var/mail). We then populate disklist with the snapshot mount points corresponding to the front filesystems to be dumped:

ivy  /snap/_
ivy  /snap/_export0
ivy  /snap/_var
ivy  /snap/_var_mail
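The flattening itself is a one-liner; here is a sketch of how a script such as fssnap.sh might derive the snapshot mount point (the real code is in Listing 1; this helper is an assumption):

```shell
#!/bin/sh
# Map a front filesystem path to its flattened snapshot mount point.
flatten() { echo "/snap/$(echo "$1" | tr / _)"; }

flatten /var/mail    # -> /snap/_var_mail
flatten /export0     # -> /snap/_export0
flatten /            # -> /snap/_
```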

Delete Old Snapshots

Snapshots degrade performance, so we only want them around while dumping, especially on write-intensive filesystems: every write to the front filesystem entails reading the old block from the front filesystem, writing it to backing store, then writing the new block to the front filesystem. Furthermore, before creating a snapshot, we check for and delete any existing snapshot of the front filesystem; otherwise the create fails, and we might back up a stale snapshot of the front filesystem.
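The check-and-delete step might look like the following sketch. FSSNAP is a variable standing in for the Solaris path; the -i and -d usages follow fssnap(1M), but the helper itself is an assumption:

```shell
#!/bin/sh
# Remove any leftover snapshot of a front filesystem before creating a
# new one, so the create cannot fail or capture stale data.
FSSNAP=${FSSNAP:-/usr/lib/fs/ufs/fssnap}

clear_snapshot() {
    fs=$1
    # fssnap -i <fs> prints a line for an existing snapshot, nothing otherwise
    if [ -n "$("$FSSNAP" -i "$fs")" ]; then
        "$FSSNAP" -d "$fs"
    fi
}
```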

Implementation

The prototype script, fssnap.sh (Listing 1), creates snapshots of all ufs filesystems on a host (excluding those we never back up) and mounts them. Running fssnap.sh on a host with two snap-able filesystems (we exclude /export/swap):

ivy# df -k
/dev/dsk/c1t0d0s0   4133838 1781118 2311382     44%  /
/dev/dsk/c1t0d0s6   41311843 3397124 37501601    9%  /export
/dev/dsk/c1t0d0s7   16526762 2245857 14115638   14%  /export/swap

creates two snapshots and mounts them:

ivy# df -k
/dev/dsk/c1t0d0s0   4133838 1781118 2311382     44%  /
/dev/dsk/c1t0d0s6   41311843 3397124 37501601    9%  /export
/dev/dsk/c1t0d0s7   16526762 2245857 14115638   14%  /export/swap
/dev/fssnap/1       4133838 1781108 2311392     44%  /snap/_
/dev/fssnap/0       41311843 3397124 37501601    9%  /snap/_export

We delete snapshots on each host after the Amanda run by running fssnap.sh -d. We considered several options for launching this:

  • ssh from the dumphost to each dumpclient -- This introduces a security risk because the dumphost runs a command as root on each dumpclient.
  • Run a single cron job on each dumpclient -- But when to run it? We have conflicting requirements: we want to delete snapshots as soon as the Amanda dump run finishes, typically 4-8 am, but Amanda occasionally overruns, perhaps until noon. So we must run the job late.
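For reference, the late cleanup as a crontab entry might look like this (noon is the compromise described above; the script path is our install location):

```crontab
# run on each dumpclient, well after Amanda normally finishes
0 12 * * * /usr/local/etc/fssnap.sh -d
```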

Unresolved Issues

When using amrestore, the operator must specify the filesystem as /snap/_var_mail rather than the expected /var/mail. This implementation works, but snapshots exist for much longer than needed. To avoid this drawback, we next tried an executable automount map.

Executable automount Map

Here the automounter manages the mounts under /snap/ using an executable, indirect map, auto_fssnap. When we access a directory here (e.g., /snap/_var_mail), the executable map creates a snapshot and the automounter mounts it. Thus, a snapshot is only created when it is needed. However, as with the first approach, we still have difficulty deleting snapshots promptly after use. Several issues arose during implementation.

Amanda runs ufsdump S on each filesystem to estimate the size of the dump. Usually it does this several times to get estimates for several dump levels. This triggers creation and mounting of snapshots early in an Amanda run.

Knowing when we can delete a snapshot is the main difficulty. Some options we have explored are:

  • Use the automounter to delete the snapshot after umounting it. Unfortunately, the automounter does not support this, and executable maps are not run when a filesystem is umounted.
  • Find recently unmounted snapshots, either by watching automounter logs (in verbose mode) for umounts of snapshot mounts or with a recurrent cron job, and delete them. This fails because Amanda accesses a filesystem by its mount point (e.g., /snap/_var_mail, as given in disklist) only for the first size estimate. That access triggers the automounter. Thereafter, Amanda directly accesses the raw device associated with that mount point (e.g., /dev/rdsk/c0t0d0s4) for size estimates and dumps, and these later accesses do not trigger the automounter. Thus, we may delete snapshots prematurely, before (or while) they are ufsdumped, causing the dump to fail.
  • Watch Amanda logs for ufsdump "DUMP DONE" entries using a cron job. This works but is superseded by the next method.
  • Launch a single, background process (fssnapdel, Listing 2) from the automount map for each snapshot to watch the Amanda log files (every five minutes) and delete the snapshot when the dump is done.
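The last option can be sketched as a polling loop. The log path, the log-line format, and the FSSNAP variable are illustrative assumptions; Listing 2 has the real fssnapdel:

```shell
#!/bin/sh
# Poll a log until the dump of one filesystem reports done, then delete
# its snapshot. Message format and paths are assumptions for illustration.
FSSNAP=${FSSNAP:-/usr/lib/fs/ufs/fssnap}

fssnapdel() {
    log=$1 fs=$2 interval=${3:-300}   # real interval: five minutes
    until grep -q "${fs}.*DONE" "$log" 2>/dev/null; do
        sleep "$interval"
    done
    "$FSSNAP" -d "$fs"
}
```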

Implementation

We create an executable automount map auto_fssnap (Listing 3) referenced from the NIS auto.master map:

/snap -ro /usr/local/etc/auto_fssnap

Unresolved Issues

If several fssnap commands run concurrently, only one succeeds. (I have logged an RFE with Sun on this.) One of our hosts has 12 filesystems to dump, so occasionally we saw failures as Amanda triggered the automounter to run several instances of auto_fssnap, and hence fssnap, concurrently.

fssnapdel could conceivably delete a snapshot just created by the automount map, before it is mounted. The fssnapdel processes are vulnerable to being killed, in which case, a snapshot may not be deleted.

Use a Wrapper to ufsdump

We also explored the use of a wrapper to ufsdump (Listing 4) to create, dump, and delete a snapshot of the front filesystem. (Early concerns about signal and stream handling turned out to be largely unfounded.) We encountered the following issues during implementation.

We built Amanda (amadmin, amandad, amgetconf, amrecover, amverify) with the ufsdump wrapper (amufsdump) by making global substitutions in the Amanda source between configure and make:

./configure ...
perl -pi.bak -e 's!/usr/sbin/ufsdump!/usr/local/etc/amufsdump!g' \
  config/config.h Makefile */Makefile */*.sh
make ...

The wrapper is suid root because fssnap must be run as root, which introduces potential security vulnerabilities. To reduce vulnerabilities, the script does the following:

  • Runs under Solaris, which fixes a generic vulnerability of suid scripts: a race condition in which the script may change between the time the kernel opens it to identify which interpreter to run and the time the interpreter reopens it to interpret it.
  • Is executable only by root and users in group sys:
ls -l amufsdump
-rwsr-x--- 1 root sys 2822 Oct 30 11:01 amufsdump

  • Dies unless it is run by the backup user dumpman.
  • Runs with the lower privileges of the calling user, dumpman, except where it must run as root (calling /etc/init.d/xntpd, fssnap, ufsdump).
  • Uses perl's taint mode to check all input to the script. It sets an empty PATH, sets IFS to a single space, and ensures the script is called with four well-formed arguments, thus:
($OPTS, $SIZE, $TAPEDEV, $RAWDEV) =
    ("@ARGV" =~ m!^(\w+) (\d+) (-) (/dev/rdsk/c\d+t\d+d\d+s\d+)$!) or
    die "@ARGV.  Usage eg: amufsdump 0usf 1048576 - /dev/rdsk/c0t0d0s0";
It references external commands by absolute pathnames and validates the values captured by the match.

  • Avoids creating a snapshot when amufsdump is called with the S option, i.e., when Amanda only wants an estimate of the dump size:

amufsdump 0Ssf 1048576 -  /dev/rdsk/c0t0d0s3

  • Uses locking to avoid running several instances of fssnap concurrently; otherwise, all but one will fail.
  • Ensures xntpd is stopped while creating snapshots of / and /var. Earlier approaches restarted xntpd after every snapshot; now we restart it only if it was already running.
  • Deletes the snapshot immediately after the dump is done.
  • Ensures that ufsdump records the front filesystem raw device (/dev/rdsk/c0t0d0s0) in /etc/dumpdates, rather than the snapshot device (e.g., /dev/fssnap/3). We use the N flag to ufsdump (introduced in Solaris 8) to specify the device to record in /etc/dumpdates. amandad calls amufsdump as follows:
amufsdump 0usf 1048576 - /dev/rdsk/c0t0d0s0

amufsdump calls ufsdump:

ufsdump N0usf /dev/rdsk/c0t0d0s0 1048576 - /dev/fssnap/0
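The fssnap serialization in the list above can be sketched with an atomic mkdir (the real code is in Listing 4; the lock path and retry limits are assumptions):

```shell
#!/bin/sh
# Serialize fssnap runs: mkdir is atomic, so only one caller can hold
# the lock. Bounded retries keep a stale lock from blocking dumps forever.
LOCKDIR=${LOCKDIR:-/tmp/fssnap.lock}

acquire_lock() {
    max=${1:-60} delay=${2:-5} tries=0
    until mkdir "$LOCKDIR" 2>/dev/null; do
        tries=$((tries + 1))
        [ "$tries" -ge "$max" ] && return 1
        sleep "$delay"
    done
}
release_lock() { rmdir "$LOCKDIR"; }
```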

fssnap ignores kill -15, but kill -9, which it cannot ignore, hangs the operating system with / and /var locked. amufsdump therefore terminates cleanly on kill -15, leaving fssnap to complete; killing amufsdump with kill -9, which cannot be trapped, passes the kill to fssnap, with the risk of hanging the system. Note also that /etc/init.d/xntpd needs /usr/bin in its PATH to call sleep.

Overall Evaluation

Using Amanda with a ufsdump wrapper to create, dump, and delete snapshots works excellently. Dumps written by amufsdump-enhanced Amanda cannot be read by vanilla Amanda, and vice versa, because amdump encodes the UNIX dump program used (ufsdump, amufsdump, or tar) in the header of the dump file on tape. Both amrestore and amrecover look for this and report an error if they do not find a program they recognize. This could be fixed if support for snapshots were integrated into Amanda. The performance impact is low: the snapshot backing store of our most write-intensive filesystem grows to only about 1 Gb during a full dump (about three hours). This represents reading, writing over NFS, and writing locally 1 Gb of data, a small overhead on a 50-Gb dump. To see the size of the backing store, run fssnap -i -o backing-store-len.

We now have a very reliable backup system. We dump and restore without errors. We found no scalability issues in integrating snapshots into Amanda. The system introduces three new potential vulnerabilities:

1. The suid, taint perl script amufsdump.
2. The snapshot devices themselves. fssnap creates these with permissions:

brw-r----- 1 root sys 199, 0 Oct 9 15:05 /devices/pseudo/fssnap@0:0

Thus, they are no more vulnerable than their corresponding raw devices.

3. The snapshot-backing store is NFS shared, and read-write with root access to each dump client. fssnap unlinks this file immediately after creating it, which reduces the risk of cracking. For tighter security, you can modify amufsdump to use local backing store, which requires a dedicated local file system.

The enhanced system is transparent to the operator. However, if an amufsdump process dies, it may leave an unwanted snapshot.

To maintain confidence in our backups, we evaluated several methods for verifying them. We run amverify after each dump. This lists the contents of each dump file on tape and gives some confidence in the readability of the tape; however, it takes several hours and does not identify bad dump files (such as those written before we introduced snapshots). We prefer to restore a full or partial dump regularly; this works well and is often covered by the steady trickle of requests from users to recover lost files.

Conclusion

Amanda enhanced with Solaris fssnap snapshots provides an excellent backup system. The system should port simply to other OSes that support filesystem snapshots. Built-in support for snapshots in Amanda would further enhance transparency. These developments are left as an exercise for the reader.

Julian Briggs is Director of IT, Department of Computer Science, University of Sheffield, UK. He has practiced UNIX systems administration since the mid 1990s and Buddhist meditation (exploring, debugging, and enhancing the OS of his "neck-top" computer) since the early 1980s. He enjoys hill walking and is single with no children.