Online Backups Using dump and NFS
Jamie Castle
I love Unix backup and recovery. Over the years I've jumped through
hoops trying to back up my FreeBSD systems with cpio, tar,
dd, and dump. Unix dump and restore
are my all-time favorite backup and recovery tools. Of all the utilities
I've tried, dump does what I demand of a backup/restore program:
bare-metal recovery, easy interactive restores, and off-site backup
capability.
Until last year, I had relied on dump, restore,
and my trusty DLT tape drive to back up and restore my
home network. I have a FreeBSD server that handles NFS and Samba
shares, DNS, mp3 serving, and media conversion. dump had
always done a great job of backing up my file systems without incident.
The problems started when my disk drives grew beyond
the native capacity of my DLT drive. Even at 35 GB native, I was
no longer able to fit a level 0 dump of my server's file systems
on a single DLT tape. Now I was going to have to either buy a tape
library or babysit my monthly level 0 backup, ready to change media
when dump detected EOT. Well, as a typical sys admin, I found
neither prospect tenable.
One night while reading the dump man page, I stumbled across
a feature that had always been there -- dump can write to
a file. Earlier that day, I had priced hard-disk drives online, and
they were cheap -- ATA drives were well under $1/GB. My gears started
to turn. Why not replace my slow, sequential-access tape
drive with a big, random-access ATA disk drive in a removable sled?
I wrote a simple shell script to start testing this out.
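At its core, the trick is nothing more than pointing dump's -f flag
at a file instead of a tape device. A minimal example (the target
path and filename here are illustrative):

# Level 0 dump of /home to a file rather than a tape; -a bypasses
# tape-length calculations, and -u records the dump in /etc/dumpdates
# so that later incremental dumps have a reference point.
dump -0au -f /backups/08012004_2300.home.L0.dmp /home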
I installed an ATA removable-drive chassis, slapped a 160-GB
ATA disk drive into a caddy, and created a 160-GB
partition called /backups. Then I ran a level 0 dump of all of my
partitions (the /mp3 partition was the biggest at 40 GB, not much
for an mp3 collection) to the /backups partition. Each file system
was dumped to a file named <month><day><year>_<time>.<partition>.<dump
level>.dmp. To my surprise, the full backup ran, and ran fast!
The level 0 dump of my entire system had completed in about 1/10th
the time it had previously taken using the DLT drive. I remember
just sitting there, wondering if it had really worked. I typed:
$ file 08012004_2300.home.L0.dmp
08012004_2300.home.L0.dmp: new-fs dump file (little endian), This
dump Sun Aug 1 23:01:37 2004, Previous dump Wed Dec 31 19:00:00
1969, Volume 1, Level zero, type: tape header, Label none,
Filesystem /home, Device /dev/ad0s1d, Host hal.bigrock.com, Flags 3
Well, it sure does look like a dump file. Next, I ran an interactive
restore session.
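The invocation is just restore's -i (interactive) flag pointed at
the dump file with -f (plus -v for the verbose banner shown below):

$ restore -ivf 08012004_2300.home.L0.dmp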
Verify tape and initialize maps
Tape block size is 32
Dump date: Sun Aug 1 23:01:37 2004
Dumped from: the epoch
Level 0 dump of /home on hal.bigrock.com:/dev/ad0s1d
Label: none
Extract directories from tape
Initialize symbol table.
restore >
I was in business. Now I had to write a script that would manage all
of these dump files to keep me from losing my mind. The script would
first determine the year, month, and day, and create those directories
as subdirectories under /backups. I used the same dump-level scheme
I had been using for tape backups:
1st of month: Level 0
Friday: Level 1
Saturday: Level 2
Sunday: Level 3
Monday: Level 4
Tuesday: Level 5
Wednesday: Level 6
Thursday: Level 7
Satisfied, I finished the script and enabled it in root's crontab.
Everything worked great.
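The finished script appears as Listing 1, but the heart of it is
simple enough to sketch. Here's a simplified approximation -- the
filesystem list and paths are illustrative, not the production script:

#!/bin/sh
# Simplified sketch of online-dump.sh (see Listing 1 for the real
# script). The filesystem list and paths are illustrative.
BACKUPS=/backups
FILESYSTEMS="/ /usr /var /home"

YEAR=`date +%Y`; MONTH=`date +%m`; DAY=`date +%d`; TIME=`date +%H%M`
DEST=$BACKUPS/$YEAR/$MONTH/$DAY
mkdir -p $DEST

# Level 0 on the 1st of the month; otherwise pick the level by weekday.
if [ "$DAY" = "01" ]; then
    LEVEL=0
else
    case `date +%a` in
        Fri) LEVEL=1 ;;  Sat) LEVEL=2 ;;  Sun) LEVEL=3 ;;  Mon) LEVEL=4 ;;
        Tue) LEVEL=5 ;;  Wed) LEVEL=6 ;;  Thu) LEVEL=7 ;;
    esac
fi

for FS in $FILESYSTEMS; do
    # Turn "/usr" into "usr" (and "/" into "root") for the file name.
    NAME=`echo $FS | sed -e 's,^/$,root,' -e 's,^/,,' -e 's,/,_,g'`
    dump -${LEVEL}au -f $DEST/${MONTH}${DAY}${YEAR}_${TIME}.${NAME}.L${LEVEL}.dmp $FS
done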
My FreeBSD workstation NFS-mounts my home directory, /mp3,
and a few other directories from my server. My home directory
gets backed up by the new "online dump" script. But what about the
rest of the workstation? I still have a lot of work under /usr/local and
/usr/ports, not to mention custom kernels under /.
My gears started turning again -- why not NFS-export /backups
to my workstation and run the same online-dump script there
for the local file systems? I copied the dump script over
to my workstation, added a few lines of code to NFS-mount /backups,
and enabled it in root's crontab. It also worked like a charm.
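Those few lines amount to little more than a mount before the dump
loop and an unmount afterward. A minimal sketch, using the server's
hostname hal from the dump output above:

# Added around the dump loop in the workstation copy of the script.
# Mount the share only for the duration of the run.
mount -t nfs hal:/backups /backups || exit 1
# ... the same dump loop as on the server runs here ...
umount /backups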
So far, this method of performing backups has fulfilled my desire
for a single backup and recovery solution, but what about bare-metal
recovery?
FreeBSD and Solaris installation CD-ROMs have NFS support. If
I lose my system disk in my workstation, I simply replace the old
disk drive, boot into FreeBSD's "Rescue mode" (CD 2 has the live
filesystem for rescues), create partitions and filesystems, NFS
mount /backups, and recover my data. The only catch is to make sure
you have a record of your disk layout handy in order to easily re-create
it. I preserve my disk layout information by copying my systems'
disk labels out to the backup server once per month, which lets me
quickly re-create them during a recovery. (For more information on
recovering disk labels, see the FreeBSD disklabel man page.) By using
a removable drive sled for the backup disk drive, I can pull the drive
once per month and take it off-site (to a friend's house, a bank
safe-deposit box, etc.).
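Roughly, the whole recovery sequence for one partition looks like
this (device names and paths are illustrative; hal is the backup
server):

# Beforehand (monthly): save the disk label to the backup server.
disklabel ad0s1 > /backups/ad0s1.disklabel

# After replacing the disk, from the FreeBSD rescue CD (create the
# ad0s1 slice with fdisk or sysinstall first):
mkdir -p /tmp/backups
mount -t nfs hal:/backups /tmp/backups           # reach the dump files
disklabel -R ad0s1 /tmp/backups/ad0s1.disklabel  # re-create the label
newfs /dev/ad0s1d                                # rebuild the filesystem
mount /dev/ad0s1d /mnt
cd /mnt && restore -rf /tmp/backups/2004/08/01/08012004_2300.home.L0.dmp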
I work for Global Science and Technologies as a Unix systems
administrator on NASA's Earth Observing System (EOS), supporting
the EOS Clearinghouse (ECHO) project, an effort to collect and maintain
a clearinghouse for Earth science metadata. In the past, we've had
some trouble finding a functional, reliable backup scheme that met
our requirements for ease of use and bare-metal recovery.
I decided to implement a similar scheme for performing backups
of our servers' system partitions (/, /usr, /var, etc.). Given how
cheap ATA disk drives are, I could employ a SATA RAID unit attached
via SCSI or Fibre Channel to store the dump files. Besides being
more cost-effective than tape, SATA RAID can be built from commodity
hardware and offers faster time-to-byte access than most commercially
available tape libraries. I researched a 4-TB SATA RAID unit, along
with a similar-capacity tape library with high-speed LTO2 drives,
and presented my proposed online dump scheme along with my hardware
comparisons to our project engineer. We received the approval to
purchase a low-end SATA RAID unit equipped with two on-board SCSI-3
channels.
We attached the SATA RAID to our backup server (a Sun E220R
running Solaris 8). We then created a 1-TB filesystem called /backups
(1 TB being the maximum UFS file system size on Solaris 8 and earlier)
and NFS-exported the partition to our private Gigabit Ethernet network.
The online-dump configuration is identical to my system at home.
All of our FreeBSD and Solaris clients have the online-dump.sh script
(Listing 1) locally installed and configured to run as cron jobs.
The cron job start times are staggered to avoid over-taxing the
network and the backup server. We send standard error and standard
output in an email to the sys admin group for accounting purposes.
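The staggering is plain crontab arithmetic. Hypothetical entries
for two clients (the script path and mail alias are assumptions):

# Client A's root crontab: nightly dump at 23:00, output mailed
# to the sys admin group.
0 23 * * * /usr/local/sbin/online-dump.sh 2>&1 | mailx -s "dump report" sysadmins
# Client B starts 30 minutes later to spread the load.
30 23 * * * /usr/local/sbin/online-dump.sh 2>&1 | mailx -s "dump report" sysadmins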
Off-site backups are handled by powering down the SATA RAID after
a level 0 dump of all clients has completed, removing the disk drives,
and taking them off-site, where they are stored in a vault. And because
the SATA RAID firmware writes the RAID volume member number to each
disk drive's disk label, we don't have to worry about the drives
being inserted into the SATA RAID out of order (although we do index
each drive because we're paranoid).
There are security implications to consider when using NFS
to dump to a central server. The backup server must export
the NFS share with root permissions so that the root user on the
clients can write dump files to it, and all backup clients must
run the online-dump script as root (since only root has inode-level
access to filesystems). You definitely will not want to run this
setup on a network that is directly connected to the Internet (i.e.,
with no firewalls), and you shouldn't include untrusted hosts in your
client list (you probably don't want to back up hosts you don't
trust anyway).
These security risks aren't much of an issue on my home network
because I trust every system on my network, and I'm firewalled and
NAT'd, so the world doesn't see my NFS traffic. We faced a more
serious security challenge deploying this system on our production
network. We mitigated the risks involved with using NFS by implementing
a private, non-routed network for the sole purpose of performing
backups, and configuring our NFS server to only share out the backup
partition to specific hosts.
Here's how we secured our online backup system:
1. Set up a private, non-routed network (RFC 1918, preferably
Gigabit Ethernet) for sharing out the /backups partition to backup
clients. Each backup client has a private interface for connecting
to the private network.
2. Configure the NFS server to share the /backups partition to
specific hosts. The /etc/exports file should explicitly list hostnames:
/backups -maproot=root holly kryten cat lister
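On a Solaris backup server, the equivalent lives in /etc/dfs/dfstab;
a sketch using the same hostnames:
share -F nfs -o rw=holly:kryten:cat:lister,root=holly:kryten:cat:lister /backups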
3. Only mount the /backups partition during the backup and be sure
to unmount it once the backup is completed. This is handled by the
online-dump.sh script, and helps prevent accidental deletion of data
by unwitting root users.
4. Create MD5 checksums of the dump files and keep those checksums
on write-once media, such as a CD-R. Even better, include the dump
files in a tripwire or fcheck file integrity database.
5. Use RAID for the backups partition, if possible, to minimize
the chance of data loss due to hardware failure.
6. Store the disks in a secure vault after rotating them out of
the backup cycle; a backup set that's 30 days old is far better
than no backups at all.
We have performed numerous interactive, full, and bare-metal
restores using only the Solaris or FreeBSD
install media, and we are thrilled with how painless restores are
using this scheme. Our average system recovery time has been reduced
from 6 hours to less than 45 minutes. Our next project will be to
further automate the process by writing some scripts to handle archiving
and deleting old dump files, and possibly a daemon that monitors
the dump sessions and reports success or failure to a Web page.
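A starting point for that cleanup script could be as simple as an
age-based find; the 60-day retention period here is an assumption:

# Remove dump files older than 60 days (retention period is an
# assumption; level 0 dumps would likely be archived first).
find /backups -name '*.dmp' -mtime +60 -exec rm {} \;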
Jamie Castle has been a UNIX systems administrator since 1996.
He's worked as a contractor for NASA since 1999, and has been with
GST, Inc. since 2000. Before that he was a helicopter mechanic on
active duty in the United States Army for six years. He is married
and has two wonderful daughters.