Dirvish
for Disk-to-Disk Backups -- An Open Source Success Story
Keith Lofstrom
What will you do if a company that produces one of your critical
software applications goes out of business? If it's proprietary
software, you may be in big trouble. Open source, on the other hand,
leaves you in control (sometimes, too much control). This article
tells how I ended up managing an open source backup tool, dirvish.
My integrated circuit design consultancy, KLIC, uses Linux and
other open source software for nearly all important tasks. One critical
task is backup; I cannot afford to lose days of work to a disk crash
or a typing error. In 2003, I moved from tape-based backups of individual
machines to network-based disk-to-disk backups. After some experiments
using rdump to swappable hard-disk media, J.W. Schultz told me about
his disk-to-disk backup program called dirvish. Dirvish is a Perl
wrapper around another open source tool, rsync.
Rsync and Dirvish
Rsync copies and maintains images of file systems between computers.
Rsync can be used for mirrors, copies, and backups. With rsync,
files are not just moved and copied, but are segmented, checksummed,
and compared between source and destination machines. Only data
segments with changed checksums or modification dates are moved.
Rsync's --link-dest option permits Unix-style hard links
to files in other directories on the destination machine. Hard-linking
allows successive backup images to share identical data files and
occupy the same disk space.
Hard disk backup storage, and the linkage between backup images,
have other benefits for my consulting business. Some consulting
contracts require that all copies of client data be removed at the
end of the project. This is impossible with tape-based backup; I
cannot remove one client's data and email without wiping all my
backup tapes. With my backup data stored as a regular file system
on disk, all I have to do is write over the client files; everything
hard-linked to those files is written over as well.
Rsync typically uses SSH for secure transport between machines
and sets up multiple data pipelines to reduce latency. Rsync has
been ported to most flavors of Unix, and rsync clients are also
available for Windows and Macintosh. Rsync is the execution engine
for many new backup systems, including dirvish.
Dirvish adds configuration-driven automation to rsync, making
it suitable for cron-driven nightly backups, driven from by a central
backup server. Dirvish permits networks of similar machines to share
portions of the backup images that are identical. Dirvish also has
a sophisticated expiration process for removing selected older images
and recovering backup drive space.
Control of the backup process by the central backup server is
critical. Client-driven backup schemes are possible, but this compromises
security and performance. Rsync is network-intensive, and only the
server can properly sequence client backups to control network loading.
The backup server must be absolutely safe from compromise; since
it contains images of all clients, it represents a single-point
security risk for the entire network. Fortunately, a dirvish backup
server can be configured to perform only the outbound SSH connections
necessary to rsync to the clients, while ignoring all inbound service
requests. This makes the backup machine ("server" may be a misnomer)
difficult to attack.
I use dirvish to perform nightly backups of half a dozen machines
(about 100 GB total data), on-site and remote, in about one hour.
My system data changes infrequently. Given the efficiencies brought
by hard-linking between images, I can store 200 or more nightly
images on a 250-GB backup drive on my main server. With large IDE
hard drives selling for less than 60 cents per gigabyte, I have
greatly reduced the cost and bother of making nightly backups.
A worst-case failure might destroy both the server and the backup
drive in it. I have multiple IDE backup drives in hot-swappable,
removable trays (e.g., see http://www.vipower.com). I then
rotate the drives to a fire-resistant safe each day. By rotating
the backup drives to the safe (which is easy with hot-swap), even
worst-case failures or full security breaches are survivable.
Bare-Metal Recovery
Backup is only half the problem, of course; rebuilding a crashed
hard drive is every sys admin's nightmare. I build backup drives
with a 5-GB bootable Linux partition and a 2-GB swap partition,
slightly reducing the size of the main backup partition. For a bare-metal
server recovery, I boot from the backup drive and use simple scripts
to rebuild the main server hard drive. A client drive can be rebuilt
in a third server drive bay, from a second drive bay on the client,
or on a different machine entirely. Full restores now take five
minutes of configuration work, followed by a couple of hours of
disk-to-disk copying.
Windows and Macintosh OS X drives may be more difficult to restore
from bare metal. These file systems are not fully supported by Linux,
so the restores must be performed natively, on a live Windows or
Macintosh machine, to a second empty disk. This can be done over
the network with rsync, but the process will involve careful sequencing
of scripts at both ends.
The situation is tolerable for Macintosh OS X. There is rsync
support for both UFS/FFS and the older HFS file system. It should
not be difficult to run dirvish and its associated scripts natively
on one machine under OS X, or between OS X machines (although I
have not tried this). If the backup drive is connected with USB2,
then hot-swaps can be performed with USB2 mount and unmount.
On Windows, some files may not be accessible to rsync for read
and write. This is an intentional "feature" of Windows; Microsoft
does not want it to be easy to copy their proprietary code. Microsoft
or other third-party licenses may forbid backing up their software
to another hard disk.
If you are backing up purchased, proprietary programs or information,
you may be legally restricted in what you may back up and how you
may do it. Digital Rights Management hardware and software may add
further complications. Consult your licenses and your lawyers.
These complications are part of the cost of running proprietary
operating systems that use file systems without open specifications.
We can still restore most of user-created Windows files and directories
over the network, and that should be good enough for helping most
Windows users maintain their data integrity day-to-day. If Windows
users want robust, inexpensive, easy-to-maintain, legally unencumbered,
and technologically advanced solutions to their problems, they may
want to consider open source offerings.
Dirvish is a great tool -- it's a little rough around the edges,
and needs better documentation, but it's turned a big backup chore
into a small one. Other users have also found J.W. Schultz's open
source software project very useful. In gratitude, I prepared slide
talks and writeups of my experiences with dirvish and presented
them at user groups and conventions. J.W. was very helpful in critiquing
and improving these presentations.
And Then It All Changed...
The technical editor of Sys Admin magazine, Hal Pomeranz,
attended one of my presentations and asked me to write an article
about dirvish for the October 2004 issue on backups. I wrote excitedly
to J.W. to invite him to co-author the article, which would be a
real boost to his consulting company, Pegasystems Technologies.
Strangely, there was no response. With the due date for the article
looming, I had to tell Hal that no article was possible without
the original author of the code.
All I knew about J.W. was that he was a middle-aged consultant
in the San Francisco bay area, a good Perl coder, and a fan of the
former U.S. space program. I didn't even know his first name, or
his phone number. Even his site registration was anonymous. In the
open source community, this really doesn't matter much, until you
want to find someone.
The original download site for dirvish, http://www.pegasys.ws,
went off the air sometime in May. In June, as a precaution, I registered
the domains "dirvish.org" and "dirvish.com". If J.W. reappeared,
the domains would go to him. After another month without news, I
took an archived version of the site (from September 2003) and activated
the domains on my Web server. I patched in the latest version of
the dirvish Perl code from the Debian archives. Later versions of
pegasys are now beginning to appear at http://www.archive.org.
In July 2004, I was told of a March Sacramento newspaper article,
describing the death of software consultant Jonathan W. Schultz
of Antioch, California, by drowning in Lake Mead. I have since made
contact with Jon's parents. Jon was 42 when he died. He was reclusive,
yet he had Internet friends all over the world. He contributed heavily
to the development of rsync. He was a devout Christian and will
be missed by many.
Jonathan was a natural coder who could think in Perl. I once asked
him if I could annotate the dirvish Perl code with comments. He
replied, "If the code is so littered with comments as to call for
a grep -v to make it readable as code, the comments should
be removed. You should edit the same code that you read." As Jon's
father later explained, "He once told me that when he looked at
the printout, it was like looking at a picture of a person, and
if something was in the wrong place or malformed it would be very
obvious to him. This was a God-given gift that neither of us understood...".
Most programmers do not have that gift, and Perl is a very rich
language, making it more difficult to read than other languages.
J.W.'s death left dirvish users and developers in a quandary; where
do we go with dirvish? How can we improve mission-critical code
that we control but do not fully understand?
Continuing the Work
The custom in the open source community is that if the originator
of a software package disappears or loses interest, another may
take up the code and continue. While I am a mediocre coder, and
a Perl newbie, I have managed technical projects and am better than
most at technical communication. Since management and communication
are dirvish's greatest needs, I took temporary control of the dirvish
project. Perhaps a better leader will emerge in time.
And I am diving deep into Perl. Randal Schwartz' books Learning
Perl and Perl Objects, References, and Modules have been
invaluable to my education, and his regular column in Sys Admin
has been useful as well. The Portland-area Perl community is very
supportive, and when I don't understand something, there are plenty
of people to help.
I set up a wiki at http://www.dirvish.org/wiki, where users
and contributors can add their information, patches, and requests
for enhancements. In just a few months, there have already been
some great contributions. We are accumulating patches for a small
experimental release, while we develop techniques for testing the
code for future general releases. By the time you read this, we
should have an active mailing list.
There are still important management issues to address. The code
needs more comments; ordinary programmers must be able to read it.
We may veer from J.W.'s "no comment" approach to make the working
environment safer. I am writing a design description of the software
for the wiki. The configuration file format is confusing, with some
things becoming Perl scalars and other things becoming lists in
an overly restrictive way; we need to make the configuration task
easier.
Finally, additional tools need to be created or borrowed from
other packages. A more automated restore system is needed, one that
easily adapts to changing partition needs and permits the selection
and mixing of different backup images. Single file restore should
be driven by users, with minimal intervention by systems administrators.
I may make all of these things happen, or none of them. Since
dirvish is an open source software package, others are free to contribute
improvements, bug fixes, or whole new interfaces. If others discover
problems, they are free to discuss them in public and seek help
from members of the dirvish community. Testing results are public;
if something doesn't work, users can learn about the problem quickly.
If I mismanage the project, others are free to fork the code, rename
it, and abandon me to the contemplation of my errors.
Free software is about freedom and allows you to make your own
right (or wrong) decisions or delegate them to the organizations
that you choose. It gives you control. Free software is also about
the benefits of cooperation and contribution. Even if you have a
lot of clout with a proprietary software vendor, you are unlikely
to get the enhancements you need, when you want them, and how you
want them. The most direct way to get what you need in an open source
package is to design and program the enhancements yourself. In the
worst case, the enhancement you offer may get rejected by the maintainer
and the other users; you can still apply your patches to your own
version. A more likely outcome is that some of the other users will
agree with you, and you can test, maintain, and improve your patch
with others.
However, the most likely outcome is that your improvement will
be gratefully accepted by maintainer and users alike and result
in a cascade of improvements on your improvement. The maintainer
will probably be grateful for the contribution, and other users
are likely to have the same needs you do. You will not have to fight
your way through layers of managers and marketers, and you will
not have to fit some vendor's strategic plan. The needs of the contributors
always outweigh the hypothetical needs of "future customers". Best
of all, others will test your code in widely varying situations;
by sharing your code, many mistakes will be found and corrected
by others, before they affect you and your company.
Conclusions
Your contribution does not need to be large; it can be as simple
as a single line of code, another test case, or even a spelling
correction. The main difference between a successful or unsuccessful
contribution is your own understanding of the intent of the software
and how the community already uses it, and your willingness to cooperate
with others. Please help us develop dirvish and rsync. Join us on
the wiki at http://www.dirvish.org/wiki and discuss your
needs, share your experiences, and offer your enhancements.
In the second part of this series, I'll discuss how to install
and configure dirvish. I will set up an example network and install
hardware and software to do dirvish disk-based backups. In the third
part of the series, I will use these backups to do bare-metal restores
of client and server hard drives.
Resources
KLIC -- http://www.kl-ic.com
Dirvish -- http://www.dirvish.org
Dirvish Wiki -- http://www.dirvish.org/wiki
Keith Lofstrom (http://www.keithl.com) owns an integrated
circuit design consultancy in Beaverton, Oregon. His specialty is
mixed-signal and statistical design for deep submicron processes,
as well as design for testability using the IEEE 1149.x standards.
Keith has been using some flavor of Unix since 1980, and although
he admits to a brief flirtation with DOS and Windows, he has seen
the error of his ways. He is currently managing the dirvish backup
program. |