Article

oct2004.tar

Windows Client Backups with rsync and FreeBSD

Geoff Breach

We've all heard of those new "fast and unreliable" hard disks that have replaced the "slow and unreliable" ones that we used in the past. Well, during this past year, it seemed that every unreliable disk around was failing on me, and so I needed a solution to keep data loss to a minimum. My environment is somewhat unusual -- most of the computers involved are laptops, and some of them never appear on my LAN. Also, I'm not in a position to demand that my customers make backups; some may choose not to, for their own reasons. My backup solution must let the customer choose if and when to back up.

Generally speaking, the only way to ensure that regular backups (whether they're your own or someone else's) actually happen is to make them either automatic or effortless! My customers are quite competent at managing versions of their own work, so my primary requirement is for a backup system that guards against total loss -- by theft, fire, or failure -- rather than an ability to recover old versions of files.

A solution in this environment has a number of unusual requirements:

Must be client-initiated -- The system must accept backup data from client machines at any time, on no set schedule.
Must be portable -- While Win32 clients comprise the majority, client machines may also be Unix, Unix-like, or Mac/OSX.
Must be easy to operate -- one click. The end customer must not be required to enter passwords or otherwise answer questions in the course of the backup.
Must be efficient -- End customers are not always located in their offices on a fast corporate network. The system must be operable over the corporate network but also via other public networks such as broadband, public wireless, and even dial-up.
Must be secure -- If the system is to be operated over public networks, then it must transfer data and communications in a secure manner.
Return of archived data to end customers must be on their terms, in a format that they can readily recover on their own, without technical assistance, and without special media or equipment.
Must meet the usual requirements of a backup system -- indefinite permanent archives, in full and incremental form as required.
Must be inexpensive -- We have no budget for this!

The Solution

My solution uses rsync over an encrypted ssh connection to synchronize a subset of the files on the customer machines (we don't need to archive applications and operating systems, just working files) with a live copy on a FreeBSD server. The server keeps the live copy on a RAID-5 array of disks. The RAID-5 is designed to protect the server copy from the very failures from which we are trying to protect the clients. Customers have a single "Backup Now" icon on their Windows Start Menu, and they initiate backups at will, background the process, then go right back to work.

This isn't a high-performance solution, but it doesn't need to be. The main bottleneck tends to be the client, which must compare, compress, and encrypt the data. The server merely needs to hold a large filesystem and keep it live.

Storage

In pilot testing, the average disk space requirement was around 1.5 gigabytes per individual customer. Your mileage will vary depending on the nature of your customers' work and the decisions you make about which file types to archive.

In the following example, three 200-GB disks are placed in a software RAID-5 array using FreeBSD's native Vinum Volume Manager. Vinum requires dedicated disk partitions with filesystem type "vinum". Using RAID-5 will provide on the order of 400 gigabytes of usable disk space.
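
The 400-gigabyte figure follows from the RAID-5 layout: one disk's worth of space, spread across all members, holds parity, so n disks of size s give roughly (n - 1) x s of usable space. A minimal sketch of that arithmetic, using the 190651 MB partition size that appears in the listings below (the function name is hypothetical):

```shell
#!/bin/sh
# Hypothetical helper: usable RAID-5 capacity, given the number of disks
# and the size of each vinum partition in MB. One disk's worth of space
# goes to parity (spread across all disks), so usable = (n - 1) * size.
raid5_usable_mb() {
    disks=$1
    size_mb=$2
    if [ "$disks" -lt 3 ]; then
        echo "RAID-5 needs at least 3 disks" >&2
        return 1
    fi
    echo $(( (disks - 1) * size_mb ))
}

usable=$(raid5_usable_mb 3 190651)
echo "usable: ${usable} MB (about $(( usable / 1024 )) GB in 1024-MB units)"
```

With three 190651 MB partitions this prints 381302 MB, or about 372 GB, which matches the plex size vinum reports later.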

Setting up vinum

Use the FreeBSD installation and configuration tool to access fdisk and the Disklabel Editor via the Custom -> Partition and Custom -> Label menu options:

lucy# /stand/sysinstall

Use fdisk to allocate to FreeBSD the portion of each physical disk that you want to use; remember to use the "W" (Write Changes) command if you are not allocating the space as part of the initial installation. In most cases, three keystrokes, "A" (Use Entire Disk), "W" (Write Changes), and "Q" (Quit), are all that are required for each disk.

Partition the FreeBSD slices using the FreeBSD Disklabel Editor. The following example shows the three disks, each with a 128-MB swap partition (optional) and the remainder of the disk allocated to dummy mount points. Because the filesystem type will be changed manually in the next step, use the "T" (Toggle Newfs) option to disable newfs execution, and then "W" (Write) to write the new disklabels:

                  FreeBSD Disklabel Editor

Disk: ad5       Partition name: ad5s1   Free: 0 blocks (0MB)
Disk: ad6       Partition name: ad6s1   Free: 0 blocks (0MB)
Disk: ad7       Partition name: ad7s1   Free: 0 blocks (0MB)

Part      Mount          Size  Newfs
----      -----          ----  -----
ad5s1b    swap          128MB  SWAP
ad5s1e    /mnt/x0    190651MB  UFS+S N
ad6s1b    swap          128MB  SWAP
ad6s1e    /mnt/x1    190651MB  UFS+S N
ad7s1b    swap          128MB  SWAP
ad7s1e    /mnt/x2    190651MB  UFS+S N

Manually edit the disklabels on each disk to change the partition type to "vinum":

lucy# disklabel -r -e ad5

The disklabel command provides the disk label for editing in your default text editor. Locate the partition you want to use for vinum, usually the very last line in the file:

8 partitions:
#        size   offset    fstype [fsize bsize bps/cpg]
  b:    262144        0      swap              # (Cyl.    0 - 16*)
  c: 390716802        0    unused  0     0     # (Cyl.    0 - 24320*)
  e: 390454658   262144    4.2BSD              # (Cyl.   16*- 24320*)

Note that the "c:" partition represents the whole disk and must not be changed. Edit the filesystem type of the backup partition, in this case "e:", to be "vinum", and save the file. The disklabel command then writes the new label to the disk and updates the kernel's in-memory copy of it.

After this edit, all three disks have the following layout:

8 partitions:
#        size   offset    fstype [fsize bsize bps/cpg]
  b:    262144        0      swap              # (Cyl.    0 - 16*)
  c: 390716802        0    unused  0     0     # (Cyl.    0 - 24320*)
  e: 390454658   262144     vinum              # (Cyl.   16*- 24320*)

Note that this example includes a small swap partition at the beginning of each physical disk. This is an artifact from an old habit of mine and is by no means a requirement for this system. In fact, if you plan to experiment with hot-swapping of disks, you will find that the system will probably be more stable without swap partitions on the disks you plan to hot swap.

Vinum requires a configuration file to initialize a new virtual volume. This configuration names the three drives "d5", "d6", and "d7" and defines a virtual volume in line 4 named "backups0". (This volume will later be presented to the operating system as a device called "/dev/vinum/backups0".) Line 5 specifies a RAID-5 plex with a 400-kB stripe size. Vinum performs best with a stripe size between 256 kB and 512 kB, but powers of 2 tend to cause all of the filesystem's superblocks to be placed on the first disk and so should be avoided. Finally, all of the previously named drives are added to the plex as subdisks, using the whole of each named drive:

drive d5 device /dev/ad5s1e
drive d6 device /dev/ad6s1e
drive d7 device /dev/ad7s1e
volume backups0 setupstate
   plex org raid5 400k
    sd length 0 drive d5
    sd length 0 drive d6
    sd length 0 drive d7
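
The stripe-size guidance above comes down to two checks: the size should fall between 256 kB and 512 kB, and it should not be an exact power of two. A small sketch of those checks (the function names are hypothetical; sizes are in kB):

```shell
#!/bin/sh
# Hypothetical helpers encoding the stripe-size guidance from the text:
# prefer 256-512 kB, but avoid exact powers of two.
is_power_of_two() {
    n=$1
    # A positive power of two has exactly one bit set, so n & (n - 1) is 0.
    [ "$n" -gt 0 ] && [ $(( n & (n - 1) )) -eq 0 ]
}

stripe_ok() {
    kb=$1
    [ "$kb" -ge 256 ] && [ "$kb" -le 512 ] && ! is_power_of_two "$kb"
}

for kb in 256 300 400 512; do
    if stripe_ok "$kb"; then echo "${kb}k: ok"; else echo "${kb}k: avoid"; fi
done
```

The 400k used in this configuration passes both checks, while the round numbers 256k and 512k at either end of the range are rejected.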

Save the configuration to a file (vinum.backup0.conf here) and execute the vinum create command to load it. Vinum responds with a summary of the newly created configuration:

lucy# vinum create -f vinum.backup0.conf
3 drives:
D d5          State: up  Device /dev/ad5s1e    Avail: 0/190651 MB (0%)
D d6          State: up  Device /dev/ad6s1e    Avail: 0/190651 MB (0%)
D d7          State: up  Device /dev/ad7s1e    Avail: 0/190651 MB (0%)

1 volumes:
V backups0    State: up       Plexes:       1 Size: 372 GB

1 plexes:
P backups0.p0  R5 State: up    Subdisks:     3 Size: 372 GB

3 subdisks:
S backups0.p0.s0  State: up    PO:        0  B Size: 186 GB
S backups0.p0.s1  State: up    PO:      400 kB Size: 186 GB
S backups0.p0.s2  State: up    PO:      800 kB Size: 186 GB
lucy#

The "setupstate" keyword in the vinum configuration file causes the volume and its components to be created in an "up" state. Vinum RAID-5 volumes must be formally initialized before use, however, so you should start the vinum control program and issue the init instruction. The list command will allow you to see the state of each component of the array:

lucy# vinum
vinum -> init backups0.p0
vinum -> vinum[309]: initializing subdisk /dev/vinum/sd/backups0.p0.s1
vinum[308]: initializing subdisk /dev/vinum/sd/backups0.p0.s0
vinum[310]: initializing subdisk /dev/vinum/sd/backups0.p0.s2
vinum -> list
3 drives:
D d5           State: up  Device /dev/ad5s1e  Avail: 0/190651 MB (0%)
D d6           State: up  Device /dev/ad6s1e  Avail: 0/190651 MB (0%)
D d7           State: up  Device /dev/ad7s1e  Avail: 0/190651 MB (0%)

1 volumes:
V backups0     State: down Plexes: 1 Size: 372 GB

1 plexes:
P backups0.p0  R5 State: initializing   Subdisks: 3 Size: 372 GB

3 subdisks:
S backups0.p0.s0  State: I 15%  PO:   0  B Size: 186 GB
S backups0.p0.s1  State: I 12%  PO: 400 kB Size: 186 GB
S backups0.p0.s2  State: I 12%  PO: 800 kB Size: 186 GB
vinum ->

If you are a coffee drinker, now would be a good time to get some; the initialization of the plex will take quite a while. Once the state of each of the subdisks and the backups0.p0 plex returns to "up", you can treat the new device as a regular filesystem. So, newfs it, mount it, link it to its "public" top-level directory, and add a suitable entry to /etc/fstab. Finally, add a line that reads start_vinum="YES" to /etc/rc.conf to ensure that vinum loads at boot time:

lucy# newfs -v /dev/vinum/backups0
lucy# mkdir -p /mnt/backups0
lucy# mount /dev/vinum/backups0 /mnt/backups0
lucy# ln -s /mnt/backups0 /backups
lucy# mount -p
/dev/vinum/backups0      /mnt/backups0   ufs rw  2 2
lucy#
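
The last two steps in that paragraph, the /etc/fstab entry and the start_vinum line in /etc/rc.conf, can be made safe to repeat with a helper that appends a line only if it is not already present. A sketch; the function names and the FSTAB/RCCONF overrides are hypothetical, and the fstab line is the one printed by mount -p above:

```shell
#!/bin/sh
# Hypothetical helper: append LINE to FILE unless an identical line exists,
# so the setup can be re-run without creating duplicate entries.
append_once() {
    line=$1
    file=$2
    if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Targets default to the real files but can be overridden for a dry run.
FSTAB=${FSTAB:-/etc/fstab}
RCCONF=${RCCONF:-/etc/rc.conf}

setup_boot_entries() {
    append_once '/dev/vinum/backups0 /mnt/backups0 ufs rw 2 2' "$FSTAB"
    append_once 'start_vinum="YES"' "$RCCONF"
}
```

Run setup_boot_entries as root; a second run leaves both files unchanged.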

Install rsync

rsync is not included in the base FreeBSD distribution but is, of course, available from the FreeBSD ports collection and as a precompiled package on the release CD image. The following series of commands may be used to install rsync from the FreeBSD 4.10-RELEASE CD-ROM or directly from ftp.freebsd.org:

lucy# mount /cdrom
lucy# cd /cdrom/packages/All
lucy# pkg_add rsync-2.6.1.tgz

Or:

lucy# pkg_add ftp://ftp.freebsd.org/pub/FreeBSD/ports/i386/packages-4-stable/All/rsync-2.6.2_1.tgz

rsync Wrapper Script

If your customers will access the backup server only to make backups, then you might choose to limit their access to a wrapper script that allows them to execute only the commands needed for backup. This sample wrapper is placed in /backups/bin/rsync-wrapper.sh; later, we use the OpenSSH forced-command feature to make sure it is executed:

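#!/bin/sh
# Log every request, then allow only rsync or scp command lines.
/usr/bin/logger -p local0.notice "rsync-wrapper $SSH_CONNECTION $SSH_ORIGINAL_COMMAND"
set -f   # no filename globbing when the command line is split below
if echo "$SSH_ORIGINAL_COMMAND" | grep -e "^rsync " >/dev/null 2>&1; then
   # Deliberately unquoted so the command line is split into words.
   exec $SSH_ORIGINAL_COMMAND
elif echo "$SSH_ORIGINAL_COMMAND" | grep -e "^scp " >/dev/null 2>&1; then
   exec $SSH_ORIGINAL_COMMAND
else
   /usr/bin/logger -p security.warn "rsync-wrapper Denied $SSH_CONNECTION $SSH_ORIGINAL_COMMAND"
   echo "Access denied." >&2
   exit 1
fi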
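
Because the wrapper's decision depends only on the text of $SSH_ORIGINAL_COMMAND, that rule can be pulled out and exercised locally without an ssh session. A sketch using a shell case statement in place of grep; the function name is hypothetical, and the sample command lines are illustrative rather than captured from a real session:

```shell
#!/bin/sh
# Hypothetical, testable version of the wrapper's allow/deny rule:
# accept only command lines that start with "rsync " or "scp ".
wrapper_allows() {
    case $1 in
        "rsync "*|"scp "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Simulate a few command lines a client might send.
for cmd in "rsync --server . c/" "scp -f rsync-include.txt" "sh -i"; do
    if wrapper_allows "$cmd"; then echo "allow: $cmd"; else echo "deny:  $cmd"; fi
done
```

Like the wrapper itself, this checks only how the command line begins, not which paths or options follow.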

Security Considerations

To ensure that customers will never be asked to type a password or otherwise interact with the backup process, this implementation places a passphrase-less private key on the client machine's local hard disk, and ssh uses that key for authentication. This is a trade-off between security and convenience. Anyone who can read the private key must already have compromised the client machine, and with a suitably secured backup server, the key should give that intruder no more access than they already have. If this calculated risk is one that you are not prepared to take, you should consider protecting the private key with a passphrase, storing the key on a portable device, or even using a different form of authentication.

Configuring OpenSSH

On the server, the OpenSSH configuration is altered to disallow password authentication, accept only version 2 of the SSH protocol, and disable port forwarding to and from clients. You should carefully examine the /etc/ssh/sshd_config file in its entirety to confirm that the settings are suitable for your security requirements. Here is a subset of /etc/ssh/sshd_config:

Protocol 2
PermitRootLogin no
PasswordAuthentication no
ChallengeResponseAuthentication no
AllowTcpForwarding no
GatewayPorts no

FreeBSD Login Classes

FreeBSD login classes may be used to further restrict access. In this example, settings are appended to /etc/login.conf to limit the path and umask for members of the "backupclients" login class. Further limits (for example, to process count or memory usage) might also be applied in this way. Be aware that FreeBSD users have the option to override some login class settings with a .login_conf file in their home directory. So, if you rely on login classes for security, you must also take steps to prevent unwanted overrides. Use cap_mkdb /etc/login.conf to rebuild the system database from the config file:

backupclients|Backup Clients:\
         :path=/backups/bin:\
         :umask=077:\
         :tc=default:

Creating FreeBSD User Accounts

The quickest and easiest one-line way to add a new user account to FreeBSD is with the "pw" utility. The install.sh script assembles a pw command ready to be pasted onto a command line on the server. It uses the following options:

-n -- Username
-d -- Home directory
-L -- FreeBSD login class, set to backupclients
-g -- Group, also set to backupclients
-c -- Comment, the customer's name
-m -- Instructs pw to create the home directory
-s -- Sets the shell to /backups/bin/rsync-wrapper.sh

Because the backupclients accounts have their path set to /backups/bin, you must place copies of (or links to) rsync, scp, grep, logger, and rsync-wrapper.sh in that directory on the server. The backupclients group must also exist before the first account is created (pw groupadd backupclients). Setting the user account's login shell to the rsync wrapper script is really just a second level of safety. In practice, the wrapper runs because it is a forced command in the authorized_keys2 file. sshd executes that forced command through the account's login shell, which here is the wrapper itself; since the wrapper ignores its own arguments and acts only on $SSH_ORIGINAL_COMMAND, the result is the same.
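
Populating /backups/bin can be scripted. A minimal sketch that symlinks each required program into the directory, locating it with command -v on the administrator's normal PATH (the rsync package installs under /usr/local/bin rather than /usr/bin); the function name and the BINDIR override are hypothetical, and rsync-wrapper.sh is assumed to be placed there by hand:

```shell
#!/bin/sh
# Hypothetical helper: link the programs the backupclients login class
# needs into its restricted PATH directory.
BINDIR=${BINDIR:-/backups/bin}

populate_bindir() {
    mkdir -p "$BINDIR" || return 1
    status=0
    for prog in rsync scp grep logger; do
        if path=$(command -v "$prog"); then
            ln -sf "$path" "$BINDIR/$prog"
        else
            echo "warning: $prog not found on PATH" >&2
            status=1
        fi
    done
    return $status
}
```

A non-zero return means at least one program was missing and is named in the warning output.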

rsync Include and Exclude Files

If configuring vinum was the most time-consuming part of this odyssey, then tinkering with rsync's include and exclude functionality will be the trickiest. In this implementation, I use separate include and exclude files, although rsync is capable of drawing all of its include/exclude information from one file.

When copying in --archive mode, rsync includes all files that it has not been specifically told to exclude. It processes include and exclude rules a bit like a packet-filtering firewall -- searching from the top down and aborting the search at the first match. So, if rsync copies everything that isn't specifically excluded, why bother with an include list? When in --archive mode, --recursive is implied, and rsync applies the include/exclude list recursively to each sub-tree. If it finds an exclude match for a directory, it never looks at anything underneath it. If there is a chance that an exclude rule might match a directory containing files you want to keep, then you'd better make sure an include rule matches that directory itself first; an include rule that matches only the files inside an excluded directory will never be consulted.

Beware the case-sensitive match! rsync does The Right Thing and considers case when comparing and copying files. Windows filesystems preserve case, but are not case sensitive. (This means that if I have a file called "FILENAME.TXT" and I ask Windows "Do you have a file called 'filename.txt'?", it will answer "yes". At the same time, if I create the file as "FiLeNaMe.TxT", then Windows will remember the original case that I specified, it just won't honor it!) If you can't be sure whether the filenames you want to match will be in uppercase or lowercase, then you need to specify both. Obviously, specifying all possible permutations and combinations could get pretty crazy pretty fast, so you need to approach this one with a level head.
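
A toy model can make the first-match rule and the case sensitivity concrete. The sketch below is a simplification, not rsync's matcher: it handles only patterns without a slash (which rsync compares against the final component of each path) and uses the shell's own glob matching. The function name and the "+ pattern" / "- pattern" rule notation are hypothetical inputs for illustration:

```shell
#!/bin/sh
# Toy model of rsync's include/exclude processing (simplified; not rsync code).
# Rules are checked top-down and the first match wins; if nothing matches,
# the name is transferred. Matching is case-sensitive, as in rsync.
decide() {
    name=$1
    shift
    for rule in "$@"; do
        action=${rule%% *}   # "+" (include) or "-" (exclude)
        pattern=${rule#* }   # the glob after the first space
        case $name in
            $pattern)
                if [ "$action" = "+" ]; then echo include; else echo exclude; fi
                return ;;
        esac
    done
    echo include
}

# A directory caught by an exclude rule is never entered, so an include
# rule for *.doc cannot rescue the documents inside it.
decide "Temporary Files" "+ *.doc" "- Temporary*"   # exclude
decide "Setup.EXE" "- *.exe"                          # include (case differs)
decide "setup.exe" "- *.exe"                          # exclude
```

Adding "+ Temporary Files" ahead of "- Temporary*" would let rsync enter that directory, which is the ordering the paragraph above recommends.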

In the following example, I specify that I want to keep word processor and spreadsheet files from OpenOffice and Microsoft along with PDFs. Generally, I don't want to archive executables and libraries, because they can be re-installed from original source disks. Many of my customers, however, use an email client that will quite happily soldier on if it is transplanted in its entirety with all of the files in its install directory, so I archive everything in that directory including executables.

Two of the biggest files on any Windows system will be the swap file and the hibernation file. Since they're pretty much completely useless anywhere other than on a running system, there is no point in archiving them. Many pre-installed Windows systems keep a complete copy of the OS install set in the \I386 directory. I can get that on CD too, so I won't be archiving that either. Here is a sample include file:

*.sxw
*.SXW
*.stc
*.STC
*.sxc
*.SXC
*.doc
*.DOC
*.xls
*.XLS
*.pdf
*.PDF
*Eudora*
*Eudora Pro*

And, here is a sample exclude file:

Temporary*
System Volume Information
i386
I386
*.dll
*.DLL
*.exe
*.EXE
PAGEFILE.SYS
hiberfil.sys

A check of the rsync line in my backup batch file will show that I refer to two include files and one exclude file. In this implementation, I use a common include and exclude file for all customers, then a second include file that is unique to each customer, usually empty, in case particular customizations are required.

All three files are stored on the backup server and copied at the beginning of each backup. Thus, I can modify them in the comfort and privacy of my own server and have the clients refer to the latest versions for each new backup run. The backup batch file also creates a text file listing all filenames on the client system, and that file is conveniently delivered to the backup server on every run. While fine-tuning the include and exclude rules, I can compare these file lists to the files that arrive on the server and tweak the rules as required.
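
That comparison can be automated on the server. A sketch, assuming the client's listing has arrived with the rest of drive C: (the batch file writes it to %WINDIR%, so it would land under the customer's c/ directory) and that filenames are plain ASCII, since dir writes its output in the console code page. The function name and example paths are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: print files named in the client's "dir /s /b" listing
# that have no counterpart in the server's copy of the drive.
#   $1 = the listing (e.g. /backups/douglasb/c/WINDOWS/all-files.txt, assumed)
#   $2 = root of the server copy (e.g. /backups/douglasb/c)
missing_files() {
    listing=$1
    root=$2
    tmp=$(mktemp -d) || return 1
    # Windows paths -> relative Unix paths: strip CRs and the drive prefix,
    # then turn backslashes into slashes.
    tr -d '\r' < "$listing" |
        sed -e 's/^[A-Za-z]:\\//' -e 's/\\/\//g' |
        LC_ALL=C sort > "$tmp/client"
    (cd "$root" && find . -type f | sed 's/^\.\///') | LC_ALL=C sort > "$tmp/server"
    # Lines only in the client list are files the rules left behind.
    LC_ALL=C comm -23 "$tmp/client" "$tmp/server"
    rm -rf "$tmp"
}
```

Piping the result through grep for a file type you expected to keep, such as \.doc$, quickly shows whether an exclude rule is catching too much.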

Installing Win32 Client Software

In keeping with the very Unix-like flavor of this solution, Cygwin (http://www.cygwin.com/) binaries are used on the Win32 clients to make up the client end of the bargain. There are two ways to achieve this. If you have other uses for a Unix-like environment on your Win32 machines, then you might as well install the whole Cygwin environment. If this backup solution is your only requirement, however, then you may choose to simply install the small subset of the Cygwin distribution that is required to achieve this goal.

Specifically, we require the rsync, ssh, scp, ssh-keygen, and mount commands. If you don't require a full Cygwin installation, then you can make a temporary installation on one machine and pick out the executables and libraries you need. To run a backup from a Win32 client to the FreeBSD server, the following binaries are required on the Win32 machine:

rsync.exe
scp.exe
ssh.exe
cygcrypto-0.9.7.dll
cygminires.dll
cygpopt-0.dll
cygwin1.dll
cygz.dll

Additionally, to use ssh-keygen and mount (only required for installation), simply copy their respective binaries. The cat, mkdir, mv, nice, and rm commands and the Cygwin shell (sh.exe) are added so that they may be used in the install and backup scripts. These could be removed after they have been used by the install script if you're concerned about the extra space they use. Here are some additional Cygwin binaries for installation and scripting:

cat.exe
mkdir.exe
mount.exe
mv.exe
nice.exe
rm.exe
sh.exe
ssh-keygen.exe
cygiconv-2.dll
cygintl-1.dll
cygintl-2.dll

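The files must be placed somewhere in the Windows path. Either place them in their own directory and modify the PATH environment variable, or drop them in an existing location. The install script below maps /usr/bin to %SYSTEMROOT% (typically C:\WINDOWS on Windows XP or C:\WINNT on Windows 2000) and works from there, so that directory is the simplest choice; in particular, ssh.exe must be there so that it appears as /usr/bin/ssh.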

Setting up the Windows Clients

Before a backup can be initiated, a number of prerequisites must be satisfied:

The Cygwin rsync and scp executables expect to find ssh in the /usr/bin directory, and ssh expects to record the public keys of known hosts in the customer's home directory in /home/<username>/.ssh/known_hosts. Mount points must be created to connect the Unix-style paths to their Win32 equivalents.
The customer requires a public/private key pair to authenticate with the backup server and, of course, the customer's public key must be installed on the server along with an actual user account on the server. The install.sh script delivers a series of commands ready to be pasted onto a server command line.
A script is required to carry out the backup process.

Execute the install script from a Windows command line with the form sh install.sh douglasb "Douglas the Cat". Windows 98 has different ideas about some of the paths used in this script, so it will require a bit of tweaking to run there.

#!/bin/sh
# Simple install script to configure rsync/ssh backups
# on Windows NT hosts...
#
# Usage: install.sh <customer login name> <full customer name>
#

# Set Cygwin mount points
mount -f -s -t "C:\Documents and Settings" /home
mount -f -s -t "$SYSTEMROOT" /usr/bin

# Create customer's public/private key pair.
cd /usr/bin
mkdir -p .ssh
ssh-keygen.exe -N "" -q -b 1024 -C "$2" -t rsa -f .ssh/id_rsa

# Still in the system directory: insert the customer's
# FreeBSD username into the backup batch file.
mv backup-c.bat backup-c.bat.src
echo "set USERNAME=$1" > backup-c.bat
cat backup-c.bat.src >> backup-c.bat
rm backup-c.bat.src

# A Windows Shortcut to the backup batch file on the Start menu
# may be a nice touch. Creating a Windows link file from
# DOS/shell is possible but complex. It's far easier to pre-
# create a shortcut to "%WINDIR%\backup-c.bat", and simply move
# it into place.
mv "Backup C Drive.lnk" "$ALLUSERSPROFILE/Start Menu/Backup C Drive.lnk"

# Create command strings for execution on the FreeBSD backup
# server to create the customer account and populate
# .ssh/authorized_keys2. Each echo must emit exactly one line.
echo "/usr/sbin/pw useradd -n $1 -d /backups/$1 -L backupclients -g backupclients -c \"$2\" -m -s /backups/bin/rsync-wrapper.sh" > tempfile.txt
echo "mkdir /backups/$1/.ssh" >> tempfile.txt
# $(cat ...) runs here, on the client, embedding the public key itself.
echo "echo command=\\\"/backups/bin/rsync-wrapper.sh\\\" $(cat .ssh/id_rsa.pub) >/backups/$1/.ssh/authorized_keys2" >> tempfile.txt
echo "/bin/ln -s /backups/rsync-include.txt /backups/$1/rsync-include.txt" >> tempfile.txt
echo "/bin/ln -s /backups/rsync-exclude.txt /backups/$1/rsync-exclude.txt" >> tempfile.txt
echo "/usr/bin/touch /backups/$1/rsync-local-include.txt" >> tempfile.txt

# Present the command strings in a text editor for cut/paste
# to the host (this shell can actually execute windows binaries!)
/usr/bin/System32/notepad.exe tempfile.txt

# Remove the temporary file.
rm tempfile.txt

The install.sh script prepends command="/backups/bin/rsync-wrapper.sh" to the customer's public key before offering it up for insertion in the authorized_keys2 file. If the customer authenticates by public/private key (and in this implementation, it is the only way a customer can gain access) then OpenSSH will ignore any command line sent by the client and instead execute this forced command. The rsync-wrapper.sh script records the client's command to syslog then confirms that it is either scp or rsync before allowing it to be executed. If the client sends any other command, it is logged to syslog's security facility and rejected.
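
The nested escaping in install.sh exists only to produce one line of the form command="..." followed by the public key. On the server, the same line can be built directly with printf. A sketch, assuming the customer's id_rsa.pub has been copied to the server; the function name is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: emit an authorized_keys2 line that forces every
# session for this key through the wrapper script.
#   $1 = path to the customer's public key file
#   $2 = forced command (defaults to the wrapper used in this article)
forced_key_line() {
    pubkey=$1
    wrapper=${2:-/backups/bin/rsync-wrapper.sh}
    # A .pub file holds one line: key type, base64 key, optional comment.
    printf 'command="%s" %s\n' "$wrapper" "$(cat "$pubkey")"
}
```

For example, forced_key_line /tmp/douglasb.pub > /backups/douglasb/.ssh/authorized_keys2 writes the same line the pasted commands would.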

Finally, here is the code for the backup-c.bat script that is executed by the Start menu shortcut inserted by the install.sh script. Replace <my.backup.server> with your server's hostname before use; left in place, cmd.exe would treat the angle brackets as redirection. Each command must be one logical line, so the long rsync command is continued with the ^ character:

C:
cd %WINDIR%
dir \ /a-d /s /b >all-files.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-include.txt rsync-include.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-exclude.txt rsync-exclude.txt
scp -i .ssh/id_rsa %USERNAME%@<my.backup.server>:rsync-local-include.txt rsync-local-include.txt
nice -n 19 rsync.exe --archive --stats --progress --modify-window=5 ^
  --include-from=rsync-local-include.txt --include-from=rsync-include.txt ^
  --exclude-from=rsync-exclude.txt --rsh="ssh -i .ssh/id_rsa" ^
  /cygdrive/c/* %USERNAME%@<my.backup.server>:c/
pause

Network Time

To decide whether to copy a particular file, rsync compares the size and the timestamp of the files. If your clients and your server have differing opinions on what the current time is, then you'll find a lot of unnecessary file transfers going on when your customers execute their backup scripts.

Many people are not aware that Windows 2000 ships with a perfectly serviceable NTP client -- it only made it into the GUI in Windows XP. The network time client is installed as a service named "Windows Time", but it does not start automatically by default in Windows 2000. Use the Services control panel (Start -> Run -> services.msc -> OK) to set it to start automatically.

Your backup server will also need a reliable time source. You could simply configure a cron job to run ntpdate every hour or so. If you have five minutes to spare instead of just one, configure xntpd and keep your server properly synchronized to a number of other servers. Be sure to add the line xntpd_enable="YES" to /etc/rc.conf if you do. With xntpd running on the server, your clients can synchronize to it, and they need never disagree on the time. Here's how to configure and start the Windows Time client:

C:\> net time /SETSNTP:ntp.mytimeserver.com
C:\> net start W32Time

Here's a sample ntp.conf for FreeBSD:

driftfile /var/db/ntp.drift
server ntp.atimeserver.com
server ntp.ticktock.com
server ntp.cuckoo.com
server ntp.hourglass.com

Windows 95/98/Me clients that don't ship with their own network time client might use the excellent open source NetTime. NetTime is available from:

http://nettime.sourceforge.net/

and is included on TheOpenCD from:

http://www.theopencd.org/

Even with a nicely synchronized clock, Windows' FAT filesystems record file modification times with a granularity of only two seconds, so it is necessary to run rsync with the --modify-window option set to at least a second or two to avoid repeat copying of files.
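
rsync's default quick check skips a file when the sizes match and the modification times are equal, and --modify-window lets times that differ by up to that many seconds count as equal. A sketch of that decision in shell arithmetic, with times as seconds since the epoch; the function name is hypothetical, and it models only the comparison, not rsync's code:

```shell
#!/bin/sh
# Hypothetical model of rsync's quick check: a file is skipped when the
# sizes are equal and the mtimes differ by no more than the window.
#   usage: needs_transfer SIZE_A MTIME_A SIZE_B MTIME_B WINDOW
needs_transfer() {
    [ "$1" -ne "$3" ] && return 0          # size differs: copy
    diff=$(( $2 - $4 ))
    [ "$diff" -lt 0 ] && diff=$(( -diff )) # absolute difference
    [ "$diff" -gt "$5" ]                   # outside the window: copy
}

# Same size, times one second apart (e.g. after two-second rounding):
needs_transfer 4096 1096588801 4096 1096588802 0 && echo "window 0: copied again"
needs_transfer 4096 1096588801 4096 1096588802 5 || echo "window 5: skipped"
```

The window of 5 in backup-c.bat leaves room for both FAT rounding and a little clock drift.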

Finally, always remember that tradition dictates that before you help yourself to someone else's network time service you should send them a quick email requesting permission. It's the polite thing to do and won't take much of your time!

Offline Backups

Once you have this system in place, making more permanent archives of the data from the comfort of your FreeBSD filesystems will be relatively easy. I've chosen to fulfill my "customer self-restore" goal by using mkisofs and cdrecord to write customers' data to CDs and DVDs. I use gzip to compress each individual file, so the customer is still working with a familiar filesystem, and many Windows-based zip packages happily speak gzip.
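
The per-file gzip step can be sketched as follows: copy a customer's tree into a staging directory, compress every file individually there, and hand the result to mkisofs. The function names and staging paths are hypothetical; mkisofs -r -J adds Rock Ridge and Joliet names so long filenames survive on Windows, and the cdrecord burn step is left out because device names vary from system to system:

```shell
#!/bin/sh
# Hypothetical sketch: stage a gzip-per-file copy of a customer's data,
# then build an ISO image from it. The live copy is left untouched.
stage_gzipped() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    # Copy the tree with tar so that permissions and times are preserved.
    (cd "$src" && tar cf - .) | (cd "$dst" && tar xf -) || return 1
    # Compress each file individually; files already ending in .gz are skipped.
    find "$dst" -type f ! -name '*.gz' -exec gzip -9 {} \;
}

build_iso() {
    staged=$1
    iso=$2
    # -r: Rock Ridge names/permissions, -J: Joliet names for Windows readers.
    mkisofs -r -J -o "$iso" "$staged"
}

# Example (paths are illustrative):
#   stage_gzipped /backups/douglasb/c /tmp/stage-douglasb
#   build_iso /tmp/stage-douglasb /tmp/douglasb.iso
```

Because each file is compressed on its own, a customer can restore a single document from the disc without unpacking an archive.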

Commodity media might be an unattainable luxury in a larger implementation, so a more conventional backup to tape might be more appropriate. Amanda and Bacula are your friends here. Both support a wide array of tape drives and auto-changers.

If you have disk space to burn, rsnapshot might be of interest. Rsnapshot uses hard links to give the impression of multiple full backups, all neat snapshots at regular intervals in time. You'll need enough disk space to hold one full backup, plus changes, but the potential for offering self-restore capabilities to your customers, possibly over Samba shares, is an attractive prospect.
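
The hard-link idea is easy to demonstrate. The sketch below is an illustration, not rsnapshot's own code: it recreates a snapshot's directory tree and hard-links every file into it, so unchanged data is stored only once. It relies on rsync's default behavior of writing an updated file under a temporary name and renaming it into place, which replaces the link in the live tree while the older snapshot keeps the previous contents. The function name is hypothetical, and filenames containing newlines are not handled:

```shell
#!/bin/sh
# Illustration of hard-link snapshots (not rsnapshot's code): recreate the
# directory tree of $1 under $2 and hard-link every file, so unchanged data
# is stored once. Both trees must be on the same filesystem.
link_snapshot() {
    src=$1
    dst=$2
    (cd "$src" && find . -type d) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        ln "$src/$f" "$dst/$f"
    done
}
```

A program that rewrote a file in place, instead of replacing it, would change the contents seen through both names.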

Traps for Young Players

The standard backup-system traps for young players apply here as they do to any other backup system. Before you put this system into production, you should satisfy yourself that you have good answers to two questions in particular:

1. Does my RAID-5 setup work? In other words, can I replace a failed disk and have the array rebuild itself successfully?

Experiment with this one. Consider making a trial run, perhaps with a smaller array; set the partitions to 100 MB instead of 200+ GB to save yourself some time. Build your RAID-5 array, init and newfs it, mount it, and fill it up with data. Once that is done, forcibly fail one of its disks -- perhaps with the atacontrol detach command. (Be careful to take down only one disk -- RAID-5 won't help you if you lose more than one. atacontrol detach acts on a whole ATA channel, so make sure the channel you detach carries only one disk of the array.) Or, if you're feeling a little crazy, power down one of your drives.

Vinum list should report that your volume is up and the plex is in a degraded state. You will be able to continue to read from and write to the array, albeit at a slower rate than usual. Replacing a failed disk in a vinum RAID-5 array requires that you prepare another disk with a partition the same size as the original, give it the same name, and bring it back into the array using the vinum start <diskname> command. Vinum will recalculate the data that should be on the disk from parity and bring it back into the array. While FreeBSD does support hot swapping of ATA disks using the atacontrol command, vinum is happier with disks that have been present since boot time.
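
While testing, and afterwards, it helps to have something that flags any vinum object not in the "up" state. A sketch that filters vinum list output; it assumes the "State: <word>" layout shown in the listings in this article, so check it against your own output before relying on it. The function name and the cron/mail usage are assumptions:

```shell
#!/bin/sh
# Hypothetical check: print every line of "vinum list" output whose
# State: field is anything other than "up" (e.g. down, degraded, I 15%).
not_up() {
    awk '{
        for (i = 1; i < NF; i++)
            if ($i == "State:" && $(i + 1) != "up") { print; next }
    }'
}

# Typical use on the server, e.g. from cron (illustrative):
#   problems=$(vinum list | not_up)
#   [ -n "$problems" ] && echo "$problems" | mail -s "vinum alert" root
```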

2. Can I recover my offline and online backups?

This may sound like a silly question, but many people forget to ask it. The most elaborate and carefully crafted backup system in the world is useless if you can't recover the data. So test this, too. Back up a client machine, then attempt to restore the backups. Recovering the online backup should be as simple as rsyncing the data back in the opposite direction. Offline backups are often trickier. Did you really keep the data you need? Is the media you chose 5 years ago still readable by current equipment? Has the media degraded to the point where it can no longer be read?
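
A test restore is easier to judge with a mechanical comparison than by eye. A sketch that compares two trees, for example the server's copy and a directory restored from it, by running the standard cksum utility over every file; the function names are hypothetical, and only contents and names are compared, not timestamps:

```shell
#!/bin/sh
# Hypothetical restore check: list "CRC size path" for every file under a
# directory, in a stable order, so two trees can be compared as text.
tree_digest() {
    (cd "$1" && find . -type f | LC_ALL=C sort | while IFS= read -r f; do
        cksum "$f"
    done)
}

# Succeeds only if both trees hold the same files with the same contents.
same_tree() {
    [ "$(tree_digest "$1")" = "$(tree_digest "$2")" ]
}

# Example (paths are illustrative):
#   same_tree /backups/douglasb/c /tmp/restore-test && echo "restore verified"
```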

You need to convince yourself that you can comfortably manage your backup system, particularly in the arguably inevitable event of failure. Play devil's advocate and think worst case -- imagine the horror scenarios and have a tested and working plan for getting yourself out of them unscathed. Power failures, hard drive failures, theft, and fires -- plan for them all. There are not many things in life more difficult than explaining to your boss that the backup system you built didn't work because of some minor technical oversight five years ago.

Resources

FreeBSD

Techniques described in this article were implemented on FreeBSD version 4.10-RELEASE, available from the main site and local mirrors everywhere:

http://www.freebsd.org/

rsync

rsync is available in source form from:

http://rsync.samba.org/

It is included in the FreeBSD ports collection (cd to /usr/ports/net/rsync, then make install clean) and in the packages directory on the 4.10-RELEASE CD. I use the rsync.exe binary from the Cygwin distribution for Win32 systems.

Cygwin

The Cygwin distribution can be found at:

http://www.cygwin.com/

Download the Cygwin setup.exe from that site, run it, then follow the bouncing ball. The setup program walks you through choosing a mirror to install from, and choosing which components you need. rsync and OpenSSH aren't defaults; you need to select them, but most of the other tools I have used are part of the base Cygwin install. If you require commercial support, Red Hat will happily sell it to you at:

http://www.redhat.com/software/cygwin/

OpenSSH

OpenSSH is included in the standard FreeBSD distribution and works perfectly well out of the box even without the few tweaks I've mentioned here. Confirm that your /etc/rc.conf file contains the line sshd_enable="YES" to ensure that sshd is started at boot time. OpenSSH source and documentation are available from the OpenBSD folks at:

http://www.openssh.com/

Amanda, Bacula, cdrtools, and rsnapshot

All are excellent tools that you might use to make permanent offline backups of your data. Amanda, Bacula, and cdrtools (which includes cdrecord and mkisofs) are available from the following sites, with cdrtools offered over both HTTP and FTP:

http://www.amanda.org/
http://www.bacula.org/
http://ftp.berlios.de/pub/cdrecord/
ftp://ftp.berlios.de/pub/cdrecord/

Many are in the FreeBSD ports collection, so save yourself some time and check there first.

Samba

Samba allows Unix and Unix-like systems to offer SMB/CIFS file services to Windows (and other Unix) clients. Samba is available from:

http://www.samba.org/

Note also that FreeBSD has support for SMBFS, though you'll need to configure support for it into your kernel.

Geoff Breach, geoff@breach.com.au, is Technical Officer to the School of Management at the University of Technology, Sydney in Australia. He has administered AIX, HP-UX, SunOS, Solaris, BSDI, and FreeBSD in commercial environments and currently balances his time among postgraduate studies in management, research on the application of agent-based systems to supply chain problems, and a twice-daily motorcycle battle in Sydney's peak-hour traffic. Geoff's only dependent child is an 11-month-old kitten, Douglas, named for the late great Douglas Adams.