Dissecting
Virtual Tape Libraries
Bryan J. Smith
For a company that will never have a data loss event, backup is
optional. For companies that will experience a data loss event,
recovery is not optional. So, invariably, backup of any and all
data that may need to be recovered is not optional. Backup strategies
are still widely debated, with many beginning to claim tape is a
legacy solution that can and should be completely replaced by disk.
Keeping recovery clearly in mind, while disk may solve many backup-time
issues, disk still has serious environmental, longevity, and portability
issues.
The focal point for the debate does not begin with media/medium.
Given advancements in both operating system and networking technologies
over the past decade, all recovery strategies now basically fall
into three main categories:
- Online -- Transparent solutions such as redundant array
of independent disk (RAID) storage and system clustering/failover,
as well as the availability of filesystem snapshots.
- Off-line -- System storage or removable media for both
short-term and long-term recovery needs. Traditionally, this is
removable tape cartridge and, increasingly, both fixed and removable
disk.
- Remote -- An online variant, mirror data to systems on
a remote network (typically over the Internet).
Online recovery is still at the mercy of filesystem corruption,
let alone physical disaster, which negates RAID, clustering,
and snapshots. Remote backup solves the physical disaster issue,
but bandwidth is proportional with cost, which results in most
remote backups being only a subset required for nominal operation.
So, while both newer online and remote recovery solutions are
a welcome supplement to the traditional, off-line recovery procedures,
they can only augment off-line recovery, not replace it. This
leaves most systems administrators still requiring a traditional
backup, typically an off-line media.
Real-Time Tape Backup
Tape cartridge is still the preferred media for traditional
backup and for taking data off line. The overwhelming majority
of corporate data is still hosted in tape cartridges --
85% according to a 2004 study by the Enterprise Storage Group.
Tape cartridges are portable and fairly long-lasting, at least
when compared to magnetic disk (and even most magneto-optical
standards), especially fixed disk. These attributes are why
tape continues to be the preferred media for off-line storage
and recovery, typically in a sole and unified backup strategy.
Legacy experience has resulted in both tape backup products
and procedures being built around complete, real-time backup.
This is becoming more and more unmanageable due to four compounding
issues, as illustrated in Figure 1.
- Capacity -- Individual disk and total disk capacity deployed
now far exceeds even the leading edge of tape or typical tape
resources to back it all up. Initial cost of tape is at least
one order of magnitude greater than disk.
- Distribution -- Companies are deploying more application-specific
servers, and the initial cost of tape devices prevents each one
having their own tape device. Centralized backup results in the
common N:1 system to tape device ratio, with N steadily increasing.
- Window -- More data to back up from more servers, backup
times have been extended, while the common 12/60-hour, daily/weekly
window has not. Even if the system interconnects on backup servers
and can handle the throughput to feed a tape device, the increased
load on the network is typically detrimental.
- Saturation -- The network is becoming saturated with just
distributed backups. A network stream that cannot keep up with
the tape drive not only uses the tape drive inefficiently, but
it also introduces tape head and cartridge wear with excessive
stop-rewind-write operations.
This traditional, "in real-time" approach to centralized
tape backup is clearly not a viable backup strategy. While some
problems can be mitigated by only backing up a subset of the
data, this only re-introduces administrative and user headaches,
all while still not backing up all data. Is tape the culprit?
Or is it how tape is typically deployed? I'll come back
to those questions in a minute.
Note that in this article, I will not discuss concepts such
as of out-of-band or storage area networks (SAN), which some
readers will point out can be used to address backup window
and other constraints. Such discussions of network topology
and options are beyond the focus of this article, and that topic
has been covered in many other networking and storage articles.
The primary goal of this article is to introduce sys admins
(the primary audience, not network or storage administrators)
to the idea of not directly backing up to tape from end-servers
in real-time, as the rest of this article will dissect and expand
upon.
Commodity Disk Issues
Online disk recovery is already an established staple and
augments off-line media usage. The price of commodity, fixed
disk drives -- hereafter referred to simply as "commodity
disk" -- is driving its adoption for online and even
off-line recovery media. Beyond just solving the initial tape
device cost for smaller firms, the capacity, performance, and
recovery speeds of commodity disk are also driving adoption
among larger firms (see Table 1).
Unfortunately, commodity disk has some major issues for both
online and off-line recovery. Looking past the immediate backup
advantages, from a recovery standpoint, there are serious deficiencies
with both the long-term online usage as well as off-line storage
of a commodity disk.
- Online commodity disks are designed for desktop systems with
8-to-5 operation, rated for only 50,000 restarts at 8 hours per
operation (400,000 hours MTBF). This is due to many factors: leading-edge
density, materials, and the precision of their mechanics (resulting
in increased vibration). See the sidebar for a further discussion.
- Off-line commodity disks are not designed to be transported
or shut down for extended periods of time after initial operation,
as mechanics will fail after minimal shock or periods of inactivity
(e.g., a "knock" directly above the spindle is often
required to start either a long inactive or previously unused,
but aged, commodity disk).
Although a few, removable rigid disk (RRD) devices have entered
the market and address some of the environmental issues, they
lose much of their performance advantage and reintroduce capacity
and cost issues. As such, RRD media is finding little use for
anything other than end-user, point backups (like most consumer
optical formats) where the initial cost of enterprise tape is
cost prohibitive.
Near-Line Meets Tape
Despite the environmental issues in using commodity disk for
online, 24x7 storage, it has been adopted in many network attached
storage (NAS) and even a few storage area network (SAN) products.
Typically, these have been limited deployments where cost is
an issue and size is limited. Greater capacities and densities
only worsen the environment for commodity disk, while increasing
their number and the static cost realities that come with their
increased failure over enterprise disk.
A newly reintroduced term in the storage world is near-line
storage, with commodity disk (reborn as a "near-line"
enterprise disk -- see sidebar) replacing the older definition
typically referring to tape or other online media libraries.
Today's near-line storage concept defines data available
in tens of seconds to minutes from request, instead of the assumed
millisecond from request made of online storage. By its very
nature, near-line storage is an ideal application for commodity
disk where only a number of devices are in use, thus reducing
vibration, heat, and operational period.
A major application for near-line storage has always been
backup/recovery, and today's commodity disk enhances the
solution. A common, non-transparent, online recovery solution
is filesystem snapshots, which allow administrators and even
users to easily recover accidentally deleted files. A near-line
recovery solution offers almost the same level of convenience
(given a similar interface) with little added delay. A near-line
solution also stores multiple, independent (or incremental)
backups like near-instantaneous tape, but without relying on
the consistency of the local filesystem.
It was not long before someone added tape devices directly
to these near-line storage solutions and realized they could
solve many problems associated with always relying on tape media
for all backups. The near-line storage component becomes an
ideal buffer for tape backup, something only today's most
advanced backup servers already offer, by removing both the
real-time and full capacity needs of most off-line recovery
solutions with three, increasingly significant, advantages:
- Window -- The actual system backup window does not have
to match the commit to tape device device window. This also removes
the requirement of an available tape device during the window.
- Performance -- A near-line storage device with tape backup
always receives data from systems and writes data to tape as fast
as the interfaces allow. The bottleneck of one does not affect
the other.
- Non-Saturating -- This is the ultimate advantage, which
heavily augments the other two. Every backup after the initial
only needs to be an incremental backup of the filesystem, even
when committing an actual, full backup to tape.
Compared and contrasted are Figure 2, an advanced, buffering
backup server with tape devices, and Figure 3, the evolution
of near-line storage with both virtual tape volumes and physical
tape devices. Depending on the intelligence and buffering capability
of the backup server, it may be able to reduce the amount of
data actually transferred, increase the number of backup connections
over the number of actual, physical tape devices, and can commit
actual data to tape locally and a full 168 hours/week without
affecting the network or its systems. But the near-line storage
can offer several, additional backup and, more importantly,
restore capabilities.
An administrator must consider the actual limit of how much
data can be committed to physical tape in a full 168-hour week.
In a traditional backup server, even if it has advanced buffering
and other capabilities, it is still solely focused on committing
any and all backups to tape. The near-line device offers up
to six additional and compounding advantages over even the most
advanced backup servers, depending on implementation:
- Certainty -- No backup is ever missed; some backups just
never get committed to tape.
- Discretionary -- Tape committal is defined by necessity
of data in off-line form, instead of being rushed by time-limited
necessity to commit all backup to tape.
- Priority -- Tape committal can also be prioritized by off-line
data needs, off-line period frequency, or other off-line data
requirements.
- Hindsight -- There is always the option to off-line any
previous backup as long as the near-line solution still stores
it.
- Recovery -- Any near-line store is still available for
near-line recovery.
- Verification -- Can be used to verify off-line tapes at
any time, without any restore.
The small combination of taking the best features of near-line
storage using disk and pairing them with the continued necessity
of off-line recovery with tape results in huge gains over not
only forced, real-time tape backup, but even advanced, buffering
tape backup servers.
FalconStor IPStor
The Virtual Tape Library (VTL) was invented the second a Unix
programmer hacked /dev/rmt to write to a remote disk buffer,
or even earlier when an administrator piped a tarball through
a remote shell that managed multiple files for multiple volumes.
The "turn key" VTL solution invented over the past
few years has married a more formalized version of this software
approach to a near-line storage device with optional, physical
tape devices (possibly tape libraries) attached. Implementations
vary, with limited support for both virtual devices (as viewed
from the client side) and physical devices (on the near-line
storage side, or possibly sister backup servers), but most are
based on Linux with entry-level solutions starting at just over
$5,000.
For an evaluation of eight solutions ranging from $7,000 to
more than $70,000, see the article "Storage Pipeline: Virtual
Tape Libraries" by Mark Howard published in CMP Media's
Network Computing magazine last fall. At the entry level,
there will be limited virtual and physical device emulation
(maybe only a few devices, 1-2 libraries), limited or generic
expansion (single 1-5U rack appliance), and limited client support
(i.e., little more than popular Windows client, maybe a generic
Unix/Linux interface and/or NDMP) as standard. The most flexible
VTL solutions in both virtual/physical tape devices and client
support by a wide margin (i.e., the most universal Unix/Linux
flavor compatible) are based on the VirtualTape Library solution
in the FalconStor IPStor suite of products.
The primary host platform for FalconStor VTL is Red Hat Linux
7.3. FalconStor continues to provide OS updates for the end-of-life
Red Hat Linux 7.3 product as a standard part of the VirtualTape
Library product updates. The Linux command-line interface (CLI)
aspects of the system are well hidden by the IPStor suite's
graphical configuration and maintenance tools. It can be implemented
as an IP node or iSCSI (storage over IP) or Fibre Channel (FC)
SAN target appliance (internal as well as external storage via
iSCSI and FC are supported for actual disk storage). A redundant,
dual-VTL host configuration is supported as standard in all
editions.
Depending on the edition, dozens of backup sessions can be
supported with hundreds of virtualized devices simultaneously.
Backup sessions may be instantiated by remote backup hosts,
or the VTL may host the backup software directly (an option
for system designers since the system is Linux). Although the
purpose of any VTL is to emulate legacy, remote tape devices
for virtually any software, FalconStor's solution is certified
to support backup servers from BakBone, Computer Associates,
Legato, Tivoli, Veritas, and many others (this will vary by
end-user product implementation), along with NDMP and IBM iSeries
protocols. The device/library support is extensive (EMC's
CLARiiON implementation is in the hundreds), and direct, back-end
export/import (between virtual and physical tapes) is supported
to/from IBM 3494 and StorageTek ACSLS systems.
End-user product implementations of the FalconStor VTL include
Copan Systems' Revolution 200T, EMC's CLARiiON solutions,
MaXXan Systems' VT100 as well as custom solutions from
Network Engines. Although the list price for FalconStor's
VTL software is listed at $15,000-30,000, most end-user product
implementations, as an integrated hardware and software solution,
start at just over $20,000 -- scaling $2-3 per GB.
Massive Array of Idle Disks (MAID)
The commodity density, interconnect, and power design that
is used today for online storage is also how the overwhelming
majority of near-line storage solutions, including VTL, are
implemented. Eight or more, hot-swappable, 3.5" disks are
housed in a traditional 3U rackmount, connected via SCSI, SATA,
or Serial Attached SCSI (SAS), possibly with an intelligent
backplane that appears as a SCSI target or possibly SAN target
via Fibre Channel or iSCSI. As such, most VTL implementations
from the majority of vendors are either server appliances that
take arbitrary, external SCSI, iSCSI or FC storage (e.g., MaXXan
Systems), or are merely an implementation of existing cabinet
system-storage solution (e.g., EMC CLARiiON) already offered
as online storage solutions.
If "blades" are the most efficient density, interconnect,
thermal, and power solution for rackmount, load-balancing server
management, then the Massive Array of Idle Disks (MAID) as implemented
by Copan Systems in their Revolution 200T product may very well
be the model to follow for rackmount, near-line storage management.
MAID was born out of research done at the University of Colorado,
Boulder, based on an analysis of cost, density, and power consumption
options in storing of massive amounts of data (100+TB) for Unix
supercomputing applications. Ironically, although the whitepaper
(presented at USENIX January 2002) exposed some concepts for
solving the problems of near-line storage for supercomputing
clusters, the concept was clearly applicable to the growing
market for near-line solutions, such as VTL systems. Dr. Aolke
Guha, former VP and Chief Architect at StorageTek, founded Copan
Systems the same year.
Copan Systems introduced a physical form-factor and thermal/power
distribution approach for near-line, application-specific storage
in a very high density. No more than 25% of storage is powered
at any time and is physically interleaved among canisters and
shelves for optimal thermal distribution. Instead of using single
or even multi-drive hot-swap bays, 8 hot-swap canisters of 14
drives are loaded in a single, 4U, 19" rackmount shelf.
The 3.5"x1" drives sit vertically, 2x7, connecting
to the SATA backplane on bottom of the canister. The resulting
density is 896 drives for every 32U, the typical aggregate unit
size, leaving 8-10U for dual-redundant host systems in a typical
40-42U cabinet. Compare this to traditional storage density
in 33U (using 11x3U storage shelves of 8 drives across), where
only 88 drives can be accommodated -- 176 drives if mounted
front and back of an adequately deep cabinet. Even an ultra-dense
setup of 15 drives across shelves will only give a maximum of
165 drives or 330 mounted front and back.
By applying MAID principles in a managed, application-specific
form-factor, not only are the power usage (5.8 KW/h) and thermal
characteristics (20,000 BTU/h) nominal for a storage cabinet
of the same size with only 1/3 to 1/5 the drive density, but
power and thermal properties are distributed so commodity disks
will never exceed their environmental specifications (8x5, <40C,
vibration, etc.).
On the software end, Copan Systems' MAID approach typically
limits each fixed disk to 5,000 power-ons in its warrantied,
5-year lifespan (1/10 the typical rating of commodity disks).
It also manages mounts and even ensures that unused devices
are periodically powered on and exercised to prevent extended
off-line periods and to verify data integrity -- another
key issue when deploying commodity disks designed for desktops
used regularly. The only drawback to the solution as a whole
seems to be the weight. A Copan Systems Revolution 200T weighs
approximately 1350 kg (3,000 lb) for the full, 896-drive, 224-448
TB (using 250-500 GB drives) configuration. The entry point
for a single blade solution involves an investment of several
servers. The same is true of MAID near-line storage as the Copan
Systems' Revolution 200T starts at 56TB (2x112 drive shelves,
plus the 2 redundant hosts).
Conclusion
Most systems-oriented trade magazines have been debating traditional
tape media versus online or even off-line disk storage for backup
and recovery. Given the online environmental and off-line longevity
issues of commodity disk storage or costs involved with remote
site backup, the reality is that portable tape cartridge media
still remains the only cost-effective way to ensure long-term
data retention and disaster recovery. The problem with tape
backup is not the media but its common, legacy implementation
-- real-time directly from end-systems to end-tape devices,
which puts the entire backup operation at the mercy of the increasing
issues of capacity limitations and distribution load in the
same 12/60-hour backup window from years earlier.
Tape is an off-line recovery solution. End-systems should
never back up directly to tape, which will always be a capacity
and scheduling nightmare. End-systems should always back up
to near-line storage systems, which then have attached tape
devices/libraries. Those near-line systems then commit backups
to tape in non-real-time as the rotation and retention policies
dictate for off-line storage.
Immediate recovery is offered by the near-line storage, while
off-line, disaster recovery is still offered by tape. Pairing
near-line and off-line medias against near-line and off-line
recovery requirements is why virtual tape library (VTL) solutions
are always the most effective solutions over traditional, real-time
centralized backup, even those that offer some advanced capabilities
like buffering and local storage at the centralized server.
Apply this dissection of VTL solutions to your next backup
or disaster recovery need or to a problem in your existing implementation.
VTL products are the most transparent solutions that provide
the best of both near-line disk recovery and off-line tape recovery
in one. But, regardless of solution, any complete and well-designed
backup strategy always ensures both complete and timely backups
online with the guaranteed availability of a portable and long-lasting
off-line recovery option.
Links to More Information
Howard, Mark. "Storage Pipeline: Virtual Tape Libraries",
Network Computing, Sept. 16, 2004 -- http://www.nwc.com/showArticle.jhtml?articleID=47208530
FalconStor Certification Matrix -- http://www.falconstor.com/certification_matrix_active.asp
FalconStor VirtualTape Library Brochure -- http://www.falconstor.com/Brochures/VTLAppliance.pdf
Copan Systems Revolution 200T --http://www.copansys.com/products/specifications.htm
EMC Corporation CLARiiON -- http://www.emc.com/products/systems/clariion.jsp
MaXXan Systems SVT100 -- http://www.maxxan.com/product/virtual-tape.html
Network Engines VTL Appliance -- http://www.networkengines.com/
Bryan J. Smith has an educational background in computer
architecture (BSCpE, UCF) and has spent more than a dozen years
applying distributed systems and storage design principles.
Over the past 4 years, he has provided services as an independent
architect at a variety of clients in the defense, education,
and finance industries. Bryan and his wife, Lourdes, maintain
their permanent residence in Orlando. He can be reached at:
b.j.smith@ieee.org. |