Bogocats:
How Many Are in Your Backup?
James Price
One of the most important tasks a systems administrator performs
is providing backup and recovery of data. Backing up data to tape
is the solution most commonly used by Unix admins. In general, backing
up data to tape is a relatively straightforward, simple operation,
but some of the conventions and the details of the process can be
confusing and are worthy of examination.
Solaris Tape Device Conventions
The /dev/rmt directory typically contains many different entries
for the same tape drive because each entry actually specifies a
different drive behavior. This is a historical artifact in Unix
that arose from the inability of the open() system call to
receive the tape drive parameters as arguments. Instead, the tape
drive parameters are encoded in the various minor names.
Table 1 shows the Solaris mtio manual page section that
explains the format of the various device names (http://docs.sun.com/app/docs/doc/802-1930-07/6i5u9742j?a=view).
This shows that a single physical tape drive may have many logical
names, each of which specifies a specific density, whether to use
BSD behavior, and whether to rewind the tape. The density behavior
can be specified as either low, medium, high, ultra, or compressed.
However, these specifiers only express a relative density and not
all are necessarily implemented in any given device driver.
For a given tape drive, the device driver writer should assign
specific meanings for the kind of physical tape drive being handled,
as needed, and may choose to implement any or all of these format
types depending upon the capabilities of the tape drive.
The minor device byte structure is shown in Table 2, also taken
from the Solaris mtio manual page. Note that only two bits are used
for density selection, allowing for only four possible density values.
This explains why the values for "ultra" and "compressed" are listed
together as "u/c" in Table 1.
In Table 3, the various names for a tape drive show how the tape
drive parameters are indicated in practice. All of these nodes together
represent a single physical tape drive being referred to using the
various possible options.
Device Permissions
When a new tape drive is added, the operating system must be configured
to recognize the drive. Solaris provides for this with the reconfiguration
boot command boot -r, issued from the OpenBoot prompt. Alternatively,
touching the file "/reconfigure" and then rebooting will also cause
a reconfiguration boot. Upon rebooting, the system will probe for
the newly connected tape drive and build devices and names for it
in the "/devices" and "/dev" directories.
In older environments (running Solaris 7 or previous versions),
the drvconfig command is executed to configure the /devices
directory after a reconfiguration boot. This command builds the
necessary kernel data structures and makes entries in the /devices
tree. The /etc/minor_perm file is consulted by drvconfig
to obtain the permissions of the newly created entries. The devlinks
command is then executed to create symbolic links from the /dev
directory tree to the actual device nodes under the /devices directory
tree. This is then followed by execution of the tapes command,
which creates the various links in the /dev/rmt directory to the
tape device special files under the /devices directory tree. In
environments running Solaris 8 or higher, the devfsadm command
is issued instead. The devfsadm command now does all of the
work of maintaining the /dev and /devices namespaces and replaces
the previous pre-Solaris 8 administration tools. For compatibility
(in Solaris 8 and higher), the drvconfig, tapes, and
devlinks commands are implemented as links to the devfsadm
command.
Many administrators do not set the permissions of the newly created
tape drive devices carefully enough to ensure data security. Entries
in the file /etc/minor_perm determine the permissions for the newly
created nodes. Each entry in /etc/minor_perm consists of a line
starting with the node name, followed by the minor_name, permissions,
owner, and group information. For example, an entry used for a tape
drive could appear as follows:
st:* 0666 root sys
In this case, all devices exported by the "st" node will have the
permissions 0666 applied, granting permissions to both read and write
to the owner, group, and world. The data on this tape drive will be
readable by anyone with an account on the system, and the ability
of anyone to write to the tape is also granted by this setting. Considering
that many files that are backed up to tape contain sensitive data,
such as /etc/shadow, this would be poor security. With the ability
to read and restore the system's /etc/shadow file from a tape backup,
a user could obtain the encrypted root password and then run a password-cracking
program against it at their leisure. Also, with global write permissions
in place, any user could change or destroy the data on tapes. Making
certain that proper permissions are set on the tape drive nodes is
an important part of maintaining good security.
Additionally, the software commands that read and write backups
to the tape drive may need to have their permissions locked down.
If the permissions on the ufsrestore and ufsdump binaries
are set so that the SUID bit is set, and these files are owned by
root, then unprivileged users will be able to read from and write
to the tape. Unless this is intended, the SUID bit should be turned
off for those binaries.
Once the administrator has tightened permissions on the logical
tape devices and locked down the SUID bits on the root-owned system
binaries used to create and restore from backup file systems, issues
of physical security remain. Ensuring the physical security of backup
tapes is as important as taking the steps needed to provide for
the logical security of the drive devices in the operating system.
If the tapes are not physically secure, people with physical access
to the tapes may manipulate the tape data by using another machine.
Backup Integrity
Another aspect of Unix tape backups relates to the possibility
of files changing while the backup is being made. Unix uses a "snapshot"
method to back up file systems to tape. Between the time the snapshot
is taken, and the time the data is actually written to tape, many
changes may have occurred. On a quiescent system running in single-user
mode, these changes will be minimal, but on a system where users
and processes are active, there may be major inconsistencies between
the snapshot and the active files that have changed but have not
yet been written to tape. Because most backups are not performed
in single-user mode, but instead made when user processes and data
are active, many backups reflect these changes in the form of unexpected
data being written to tape. The administrator may be in for an unpleasant
surprise when it comes time to restore data from the tape. In fact,
the backup may be corrupted and unusable if the file system image
changes too radically during the dump process.
Schrödinger's Cat
In the 1930s, physicist Erwin Schrödinger devised a "thought
experiment" to illustrate a concept in quantum mechanics, called
the "Schrödinger's cat" experiment. Briefly, a cat is placed
into a sealed box, together with an apparatus containing a radioactive
atom and a canister of poison gas. A Geiger counter is connected
to the poison canister, and if the radioactive atom decays, the
Geiger counter detects an alpha particle, triggering the canister
of gas to be released, killing the cat. If no radioactive decay
occurs, the trigger does not fire, and the cat remains alive. In
Schrödinger's thought experiment, as long as the box remains
unopened, the cat is in an indeterminate state, possibly alive or
dead. But when the box is opened by an observer, the cat will be
either alive or dead -- in effect, the observer has "collapsed"
its wave function, and the cat will be in one state or the other.
A Unix tape backup that was made on an active system shares some
of the properties of this theoretical cat. Because the snapshot
method is used to make the tape backup, and the tape drive requires
some degree of time to write the data to tape, there is an element
of the unknown regarding the exact state of the data written to
the backup tape. Like Schrödinger's cat, the data can be either
"alive" or "dead" when the systems administrator goes to retrieve
it from tape. The unfortunate administrator may restore data that
seemed to dump to tape correctly, but upon inspection, has fallen
victim to the changes that occurred between having its "snapshot"
taken and actually being written to tape.
Bogocats
To measure the degree of this possibility present in a given backup,
I borrow Schrödinger's cat, and from Linux, I borrow the concept
of "BogoMips". BogoMips are measured by the Linux kernel at boot
time by determining how fast a certain busy loop executes on a given
machine. "Bogo" comes from "bogus", indicating that the BogoMips
value gives a rough "guesstimate" of the speed of the CPU, but it
is an imprecise and somewhat whimsical calculation. For the purposes
of measuring the possibility of the data on the tape being "alive"
or "dead", a "Bogocats" metric can be similarly derived. The lower
the Bogocat value of a given backup, the higher the chance becomes
of bogus data being on the tape.
Another concept from physics I will enlist is "Drake's equation",
which is a formula to estimate the number of intelligent civilizations
with which we may be able to communicate within our own galaxy.
Invented by Dr. Frank Drake, the formula takes into account many
factors, including the fraction of stars in our galaxy that have
planets, the fraction of those planets where life develops, and
the fraction of those planets where intelligence develops, among
other variables. The problem with Drake's equation is that the values
of all of the variables are unknown. Likewise, the manifold variables
that factor into the reliability of data on a given backup tape
may be difficult to assign specific values. However, as is the case
with Drake's equation, a theoretical description of the Bogocat
metric can be formulated and then used to make a rough estimate.
Using the Bogocat Equation
The Bogocat equation for computing the likelihood of "communicating
with intelligent data" on a given Unix backup tape is shown in Table
4. The Bogocat equation gives a "guesstimate" of the degree of reliability
of a given backup and depends on a number of factors.
First, we consider the speed of the tape drive measured in megabytes
per second, then divide this by the amount of data in the backup,
measured in megabytes. Second, the relative degree of quiescence
of the file system being backed up is considered and assigned a
decimal value between 0 and 1, with a completely quiescent file
system receiving a value of 1, and one with a good deal of activity
getting a lower value. Consideration is next given to the hardware
and tape quality, with a value of 1 given to a backup made with
perfect hardware and tape, and lower values given when malfunctioning
hardware or more questionable media is used.
As with the Drake equation, consider the Bogocat equation to be
a simple "rule of thumb" that is useful for making a rough estimate
of the degree of integrity of a backup. For example, consider a
backup made using a common tape drive, the Quantum SDLT 320. This
tape drive has a native transfer rate of 16MB/s, and like most other
tape drives, the SDLT 320 has compression built-in, offering a typical
compression ratio of 2:1. Because the data compression is done by
the drive hardware, this increases the data transfer rate to 32Mb/s.
Consider some examples of the Bogocat equation in use, with some
sample values. If one were backing up only 32 MB of data, using
a SDLT 320 drive with compression, on a reasonably quiescent system,
using good tapes, the Bogocat value would be computed as shown in
the first example in Table 5. However, if the level of quiescence
were far lower, and only half that of the first example, the Bogocat
value correspondingly would fall as shown in the second example
in Table 5.
Next, consider a backup made of a reasonably quiescent file system
containing 32 GB of data, instead of the much smaller 32-MB example
first considered. In this case, as shown in the third example in
Table 5, the Bogocat value is far lower. As the Bogocat value approaches
zero, the potential for corruption or unusability of the data on
the backup tape increases.
Conclusion
Given a reasonably sized backup taking place on a system with
an appropriately fast tape drive, on a fairly quiescent file system,
and using good hardware and fresh media, most backups will score
a Bogocat value well above zero, indicating a high probability of
making contact with "intelligent data". Tape backups with values
closer to zero are less likely to have "intelligent data" to communicate
with.
As in the case of Schrödinger's cat, the administrator will
not know the actual state of the data on the tape without "opening
the box" and observing it, by retrieving it from tape. Only then
does the data on the tape reveal itself to be "alive" or "dead".
More than one administrator has inadvertently scheduled a cron job
to perform tape backups while at the same time running other jobs
that change the very data being backed up, only to discover this
unfortunate situation later, after retrieving the data.
Acknowledgements
The author thanks Faried Nawaz and Paul Miach.
James Price has a B.S. in Computer Science from Portland State
University. He has been a Unix systems administrator at the Multnomah
County Library in Portland, Oregon, since 1998. He can be reached
at: jamesp@agora.rdrop.com. |