Cover V14, i09
sep2005.tar

Dissecting Virtual Tape Libraries

Bryan J. Smith

For a company that will never have a data loss event, backup is optional. For companies that will experience a data loss event, recovery is not optional. So, invariably, backup of any and all data that may need to be recovered is not optional. Backup strategies are still widely debated, with many beginning to claim tape is a legacy solution that can and should be completely replaced by disk. Keeping recovery clearly in mind, while disk may solve many backup-time issues, disk still has serious environmental, longevity, and portability issues.

The focal point for the debate does not begin with media/medium. Given advancements in both operating system and networking technologies over the past decade, all recovery strategies now basically fall into three main categories:

  • Online -- Transparent solutions such as redundant array of independent disk (RAID) storage and system clustering/failover, as well as the availability of filesystem snapshots.
  • Off-line -- System storage or removable media for both short-term and long-term recovery needs. Traditionally, this is removable tape cartridge and, increasingly, both fixed and removable disk.
  • Remote -- An online variant that mirrors data to systems on a remote network (typically over the Internet).

Online recovery is still at the mercy of filesystem corruption, let alone physical disaster, which negates RAID, clustering, and snapshots. Remote backup solves the physical disaster issue, but cost is proportional to bandwidth, so most remote backups cover only the subset of data required for nominal operation. So, while both newer online and remote recovery solutions are a welcome supplement to traditional, off-line recovery procedures, they can only augment off-line recovery, not replace it. This leaves most systems administrators still requiring a traditional backup, typically to off-line media.

Real-Time Tape Backup

Tape cartridge is still the preferred medium for traditional backup and for taking data off line. The overwhelming majority of corporate data is still stored on tape cartridges -- 85% according to a 2004 study by the Enterprise Storage Group. Tape cartridges are portable and fairly long-lasting, at least when compared to magnetic disk (and even most magneto-optical standards), especially fixed disk. These attributes are why tape continues to be the preferred medium for off-line storage and recovery, typically in a sole and unified backup strategy.

Legacy experience has resulted in both tape backup products and procedures being built around complete, real-time backup. This is becoming more and more unmanageable due to four compounding issues, as illustrated in Figure 1.

  • Capacity -- Individual disk and total disk capacity deployed now far exceeds even the leading edge of tape or typical tape resources to back it all up. Initial cost of tape is at least one order of magnitude greater than disk.
  • Distribution -- Companies are deploying more application-specific servers, and the initial cost of tape devices prevents each one from having its own tape device. Centralized backup results in the common N:1 system to tape device ratio, with N steadily increasing.
  • Window -- With more data to back up from more servers, backup times have grown, while the common 12/60-hour, daily/weekly window has not. Even if the system interconnects on the backup servers can handle the throughput needed to feed a tape device, the increased load on the network is typically detrimental.
  • Saturation -- The network is becoming saturated with just distributed backups. A network stream that cannot keep up with the tape drive not only uses the tape drive inefficiently, but it also introduces tape head and cartridge wear with excessive stop-rewind-write operations.
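
The window and saturation constraints above reduce to simple arithmetic. The following is a minimal sketch only: the 12-hour daily window comes from the text, while the data size, network rate, and the drive's minimum streaming rate are assumed figures chosen for illustration, and both function names are hypothetical.

```python
def required_mb_per_s(data_gb: float, window_hours: float) -> float:
    """Sustained rate (MB/s) needed to move data_gb within window_hours."""
    return data_gb * 1000 / (window_hours * 3600)


def keeps_drive_streaming(network_mb_s: float, drive_min_mb_s: float) -> bool:
    """True if the network feed can keep the tape drive streaming.

    A feed slower than the drive's minimum streaming rate forces the
    stop-rewind-write cycles described above.
    """
    return network_mb_s >= drive_min_mb_s


# Assumed example: 2,000 GB of data in the 12-hour daily window.
rate = required_mb_per_s(2000, 12)
print(f"{rate:.1f} MB/s must be sustained for the whole window")

# Assumed example: a 30 MB/s network feed into a drive that needs 40 MB/s.
print("drive streams:", keeps_drive_streaming(30, 40))
```

Doubling the data without widening the window doubles the required rate, which is why the window becomes the binding constraint as N servers share one tape device.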

This traditional, "in real-time" approach to centralized tape backup is clearly not a viable backup strategy. While some problems can be mitigated by only backing up a subset of the data, this only re-introduces administrative and user headaches, all while still not backing up all data. Is tape the culprit? Or is it how tape is typically deployed? I'll come back to those questions in a minute.

Note that in this article, I will not discuss concepts such as out-of-band or storage area networks (SAN), which some readers will point out can be used to address backup window and other constraints. Such discussions of network topology and options are beyond the focus of this article, and that topic has been covered in many other networking and storage articles. The primary goal of this article is to introduce sys admins (the primary audience, not network or storage administrators) to the idea of not directly backing up to tape from end-servers in real-time, as the rest of this article will dissect and expand upon.

Commodity Disk Issues

Online disk recovery is already an established staple and augments off-line media usage. The price of commodity, fixed disk drives -- hereafter referred to simply as "commodity disk" -- is driving its adoption for online and even off-line recovery media. Beyond just solving the initial tape device cost for smaller firms, the capacity, performance, and recovery speeds of commodity disk are also driving adoption among larger firms (see Table 1).

Unfortunately, commodity disk has some major issues for both online and off-line recovery. Looking past the immediate backup advantages, from a recovery standpoint, there are serious deficiencies with both the long-term online usage as well as off-line storage of a commodity disk.

  • Online commodity disks are designed for desktop systems with 8-to-5 operation, rated for only 50,000 restarts at 8 hours per operation (400,000 hours MTBF). This is due to many factors: leading-edge density, materials, and the precision of their mechanics (resulting in increased vibration). See the sidebar for a further discussion.
  • Off-line commodity disks are not designed to be transported or shut down for extended periods of time after initial operation, as mechanics will fail after minimal shock or periods of inactivity (e.g., a "knock" directly above the spindle is often required to start either a long inactive or previously unused, but aged, commodity disk).
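
The rating quoted above is just restarts multiplied by hours per operation, and it is worth seeing how far a 24x7 deployment departs from the duty cycle behind that rating. This is a small sketch; modeling "8-to-5 operation" as 8 hours a day, 5 days a week is an assumption, and `duty_cycle` is a hypothetical helper.

```python
# Figures quoted in the text for a desktop-class commodity disk.
RATED_RESTARTS = 50_000
HOURS_PER_OPERATION = 8

rated_hours = RATED_RESTARTS * HOURS_PER_OPERATION

HOURS_PER_WEEK = 24 * 7


def duty_cycle(hours_per_day: float, days_per_week: float) -> float:
    """Fraction of the week a disk spends powered on."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK


desktop = duty_cycle(8, 5)     # assumed reading of "8-to-5 operation"
always_on = duty_cycle(24, 7)  # online, 24x7 storage

print(f"rated hours: {rated_hours:,}")
print(f"24x7 runs {always_on / desktop:.1f}x the hours of the desktop duty cycle")
```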

Although a few removable rigid disk (RRD) devices have entered the market and address some of the environmental issues, they lose much of their performance advantage and reintroduce capacity and cost issues. As such, RRD media is finding little use for anything other than end-user, point backups (like most consumer optical formats) where the initial cost of enterprise tape is prohibitive.

Near-Line Meets Tape

Despite the environmental issues in using commodity disk for online, 24x7 storage, it has been adopted in many network attached storage (NAS) and even a few storage area network (SAN) products. Typically, these have been limited deployments where cost is an issue and size is limited. Greater capacities and densities only worsen the operating environment for commodity disks, while increasing both the number of disks and the fixed costs that come with their higher failure rate compared to enterprise disk.

A newly reintroduced term in the storage world is near-line storage, with commodity disk (reborn as a "near-line" enterprise disk -- see sidebar) replacing the older definition, which typically referred to tape or other online media libraries. Today's near-line storage concept defines data as available in tens of seconds to minutes from request, instead of the milliseconds assumed of online storage. By its very nature, near-line storage is an ideal application for commodity disk, because only some of the devices are in use at any time, thus reducing vibration, heat, and operational period.

A major application for near-line storage has always been backup/recovery, and today's commodity disk enhances the solution. A common, non-transparent, online recovery solution is filesystem snapshots, which allow administrators and even users to easily recover accidentally deleted files. A near-line recovery solution offers almost the same level of convenience (given a similar interface) with little added delay. A near-line solution also stores multiple, independent (or incremental) backups like near-instantaneous tape, but without relying on the consistency of the local filesystem.

It was not long before someone added tape devices directly to these near-line storage solutions and realized they could solve many problems associated with always relying on tape media for all backups. The near-line storage component becomes an ideal buffer for tape backup, something only today's most advanced backup servers already offer, by removing both the real-time and full capacity needs of most off-line recovery solutions with three, increasingly significant, advantages:

  • Window -- The actual system backup window does not have to match the commit-to-tape device window. This also removes the requirement of an available tape device during the window.
  • Performance -- A near-line storage device with tape backup always receives data from systems and writes data to tape as fast as the interfaces allow. The bottleneck of one does not affect the other.
  • Non-Saturating -- This is the ultimate advantage, which heavily augments the other two. Every backup after the initial only needs to be an incremental backup of the filesystem, even when committing an actual, full backup to tape.
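
The non-saturating advantage rests on the near-line store being able to assemble a full backup locally from the initial full backup plus later incrementals, so only changes ever cross the network. The sketch below illustrates that idea under stated assumptions: a backup is modeled as a simple mapping of path to content, `None` marks a deleted file, and `synthesize_full` is a hypothetical name. It does not describe how any particular product stores or merges its data.

```python
from typing import Dict, List, Optional

# A backup is modeled as {path: content}; None marks a deletion (assumption).
Backup = Dict[str, Optional[bytes]]


def synthesize_full(initial_full: Backup, incrementals: List[Backup]) -> Dict[str, bytes]:
    """Apply incrementals, oldest first, on top of the initial full backup."""
    state = {path: data for path, data in initial_full.items() if data is not None}
    for incremental in incrementals:
        for path, data in incremental.items():
            if data is None:
                state.pop(path, None)   # file was deleted since the last backup
            else:
                state[path] = data      # file was added or changed
    return state


full = {"etc/hosts": b"v1", "home/notes.txt": b"draft"}
monday = {"etc/hosts": b"v2"}
tuesday = {"home/notes.txt": None, "home/report.txt": b"final"}

# The result is what would be committed to tape as a "full" backup,
# even though only the incrementals crossed the network.
tape_image = synthesize_full(full, [monday, tuesday])
print(sorted(tape_image))
```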

Figure 2, an advanced, buffering backup server with tape devices, can be compared and contrasted with Figure 3, the evolution of near-line storage with both virtual tape volumes and physical tape devices. Depending on its intelligence and buffering capability, the backup server may be able to reduce the amount of data actually transferred, increase the number of backup connections beyond the number of physical tape devices, and commit data to tape locally for a full 168 hours/week without affecting the network or its systems. But near-line storage can offer several additional backup and, more importantly, restore capabilities.

An administrator must consider the actual limit of how much data can be committed to physical tape in a full 168-hour week. A traditional backup server, even one with advanced buffering and other capabilities, is still solely focused on committing any and all backups to tape. The near-line device offers up to six additional and compounding advantages over even the most advanced backup servers, depending on implementation:

  • Certainty -- No backup is ever missed; some backups just never get committed to tape.
  • Discretionary -- Tape committal is defined by necessity of data in off-line form, instead of being rushed by time-limited necessity to commit all backup to tape.
  • Priority -- Tape committal can also be prioritized by off-line data needs, off-line period frequency, or other off-line data requirements.
  • Hindsight -- There is always the option to off-line any previous backup as long as the near-line solution still stores it.
  • Recovery -- Any near-line store is still available for near-line recovery.
  • Verification -- The near-line store can be used to verify off-line tapes at any time, without any restore.
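
The discretionary and priority advantages can be pictured as a planning step: work out how much the tape drives can absorb in a 168-hour week, then commit stored backups in priority order until that capacity is used up. This is a rough sketch under assumptions: the drive count, drive speed, backup names, sizes, and the priority scheme are all invented for illustration, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoredBackup:
    name: str
    size_gb: float
    priority: int  # lower number = commit to tape first (hypothetical scheme)


def weekly_tape_capacity_gb(drives: int, mb_per_s: float, hours: float = 168) -> float:
    """Upper bound on data committed to tape if every drive streams all week."""
    return drives * mb_per_s * 3600 * hours / 1000


def plan_committal(backups: List[StoredBackup], capacity_gb: float) -> List[str]:
    """Pick backups by priority; anything that does not fit stays near-line."""
    plan, remaining = [], capacity_gb
    for backup in sorted(backups, key=lambda b: b.priority):
        if backup.size_gb <= remaining:
            plan.append(backup.name)
            remaining -= backup.size_gb
    return plan


capacity = weekly_tape_capacity_gb(drives=1, mb_per_s=30)  # assumed hardware
stored = [
    StoredBackup("scratch", 5000, priority=3),
    StoredBackup("finance", 10000, priority=1),
    StoredBackup("mail", 6000, priority=2),
]
plan = plan_committal(stored, capacity)
print(f"{capacity:,.0f} GB/week available; committing: {plan}")
```

Backups left out of the plan are not lost. They remain on the near-line store, which is the certainty and hindsight advantage listed above.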

The simple combination of the best features of near-line disk storage with the continued necessity of off-line tape recovery results in huge gains over not only forced, real-time tape backup, but even advanced, buffering tape backup servers.

FalconStor IPStor

The Virtual Tape Library (VTL) was invented the second a Unix programmer hacked /dev/rmt to write to a remote disk buffer, or even earlier when an administrator piped a tarball through a remote shell that managed multiple files for multiple volumes. The "turn key" VTL solution invented over the past few years has married a more formalized version of this software approach to a near-line storage device with optional, physical tape devices (possibly tape libraries) attached. Implementations vary, with limited support for both virtual devices (as viewed from the client side) and physical devices (on the near-line storage side, or possibly sister backup servers), but most are based on Linux with entry-level solutions starting at just over $5,000.
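
The "tarball to a disk buffer" origin of the idea is easy to reproduce in miniature: a tar stream, the same format written to a tape device, is written to an ordinary file that plays the role of a virtual tape volume. The sketch below uses Python's standard tarfile module; the function names, volume name, and spool directory are assumptions for illustration, and it makes no claim about how any commercial VTL emulates tape devices.

```python
import io
import os
import tarfile
import tempfile


def write_virtual_tape(volume_path: str, files: dict) -> None:
    """Write {name: bytes} as a sequential tar stream stored in a disk file."""
    with open(volume_path, "wb") as volume:
        # Mode "w|" writes a non-seekable stream, as one would to a tape device.
        with tarfile.open(fileobj=volume, mode="w|") as tape:
            for name, data in files.items():
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tape.addfile(info, io.BytesIO(data))


def list_virtual_tape(volume_path: str) -> list:
    """Return the member names stored on a virtual tape volume."""
    with tarfile.open(volume_path, mode="r") as tape:
        return tape.getnames()


# A temporary directory stands in for the near-line disk spool (assumption).
with tempfile.TemporaryDirectory() as spool:
    volume = os.path.join(spool, "VOL0001.tar")
    write_virtual_tape(volume, {"etc/hosts": b"127.0.0.1 localhost\n"})
    names = list_virtual_tape(volume)

print(names)
```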

For an evaluation of eight solutions ranging from $7,000 to more than $70,000, see the article "Storage Pipeline: Virtual Tape Libraries" by Mark Howard published in CMP Media's Network Computing magazine last fall. At the entry level, there will be limited virtual and physical device emulation (maybe only a few devices, 1-2 libraries), limited or generic expansion (single 1-5U rack appliance), and limited client support (i.e., little more than a popular Windows client, maybe a generic Unix/Linux interface and/or NDMP) as standard. By a wide margin, the most flexible VTL solutions in both virtual/physical tape device and client support (i.e., compatible with the widest range of Unix/Linux flavors) are based on the VirtualTape Library solution in the FalconStor IPStor suite of products.

The primary host platform for FalconStor VTL is Red Hat Linux 7.3. FalconStor continues to provide OS updates for the end-of-life Red Hat Linux 7.3 product as a standard part of the VirtualTape Library product updates. The Linux command-line interface (CLI) aspects of the system are well hidden by the IPStor suite's graphical configuration and maintenance tools. It can be implemented as an IP node or iSCSI (storage over IP) or Fibre Channel (FC) SAN target appliance (internal as well as external storage via iSCSI and FC are supported for actual disk storage). A redundant, dual-VTL host configuration is supported as standard in all editions.

Depending on the edition, dozens of backup sessions can be supported with hundreds of virtualized devices simultaneously. Backup sessions may be instantiated by remote backup hosts, or the VTL may host the backup software directly (an option for system designers since the system is Linux). Although the purpose of any VTL is to emulate legacy, remote tape devices for virtually any software, FalconStor's solution is certified to support backup servers from BakBone, Computer Associates, Legato, Tivoli, Veritas, and many others (this will vary by end-user product implementation), along with NDMP and IBM iSeries protocols. The device/library support is extensive (EMC's CLARiiON implementation is in the hundreds), and direct, back-end export/import (between virtual and physical tapes) is supported to/from IBM 3494 and StorageTek ACSLS systems.

End-user product implementations of the FalconStor VTL include Copan Systems' Revolution 200T, EMC's CLARiiON solutions, MaXXan Systems' SVT100, as well as custom solutions from Network Engines. Although the list price for FalconStor's VTL software is $15,000-30,000, most end-user product implementations, as an integrated hardware and software solution, start at just over $20,000 -- scaling at $2-3 per GB.

Massive Array of Idle Disks (MAID)

The commodity density, interconnect, and power design that is used today for online storage is also how the overwhelming majority of near-line storage solutions, including VTL, are implemented. Eight or more, hot-swappable, 3.5" disks are housed in a traditional 3U rackmount, connected via SCSI, SATA, or Serial Attached SCSI (SAS), possibly with an intelligent backplane that appears as a SCSI target or possibly SAN target via Fibre Channel or iSCSI. As such, most VTL implementations from the majority of vendors are either server appliances that take arbitrary, external SCSI, iSCSI, or FC storage (e.g., MaXXan Systems), or are merely implementations of existing cabinet system-storage solutions (e.g., EMC CLARiiON) already offered as online storage solutions.

If "blades" are the most efficient density, interconnect, thermal, and power solution for rackmount, load-balancing server management, then the Massive Array of Idle Disks (MAID) as implemented by Copan Systems in their Revolution 200T product may very well be the model to follow for rackmount, near-line storage management. MAID was born out of research done at the University of Colorado, Boulder, based on an analysis of cost, density, and power consumption options in storing massive amounts of data (100+TB) for Unix supercomputing applications. Ironically, although the whitepaper (presented at USENIX January 2002) exposed some concepts for solving the problems of near-line storage for supercomputing clusters, the concept was clearly applicable to the growing market for near-line solutions, such as VTL systems. Dr. Aloke Guha, former VP and Chief Architect at StorageTek, founded Copan Systems the same year.

Copan Systems introduced a physical form-factor and thermal/power distribution approach for near-line, application-specific storage in a very high density. No more than 25% of storage is powered at any time and is physically interleaved among canisters and shelves for optimal thermal distribution. Instead of using single or even multi-drive hot-swap bays, 8 hot-swap canisters of 14 drives are loaded in a single, 4U, 19" rackmount shelf.

The 3.5"x1" drives sit vertically, 2x7, connecting to the SATA backplane on the bottom of the canister. The resulting density is 896 drives for every 32U, the typical aggregate unit size, leaving 8-10U for dual-redundant host systems in a typical 40-42U cabinet. Compare this to traditional storage density in 33U (using 11x3U storage shelves of 8 drives across), where only 88 drives can be accommodated -- 176 drives if mounted front and back of an adequately deep cabinet. Even an ultra-dense setup of 15 drives across shelves will only give a maximum of 165 drives or 330 mounted front and back.
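
The density comparison can be checked with a few lines of arithmetic. Every figure in this sketch is taken from the text; only the function and parameter names are invented for illustration.

```python
def maid_drives(rack_units: int = 32, shelf_units: int = 4,
                canisters_per_shelf: int = 8, drives_per_canister: int = 14) -> int:
    """Drives in the MAID layout: 4U shelves of 8 canisters of 14 drives."""
    shelves = rack_units // shelf_units
    return shelves * canisters_per_shelf * drives_per_canister


def traditional_drives(shelves: int = 11, drives_across: int = 8,
                       front_and_back: bool = False) -> int:
    """Drives in 33U of traditional 3U shelves, optionally mounted both sides."""
    return shelves * drives_across * (2 if front_and_back else 1)


print("MAID, 32U:", maid_drives())
print("Traditional, 8 across:", traditional_drives(),
      "/", traditional_drives(front_and_back=True))
print("Ultra-dense, 15 across:", traditional_drives(drives_across=15),
      "/", traditional_drives(drives_across=15, front_and_back=True))
```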

By applying MAID principles in a managed, application-specific form-factor, not only are the power usage (5.8 kW) and thermal characteristics (20,000 BTU/h) nominal for a storage cabinet of the same size with only 1/3 to 1/5 the drive density, but power and thermal properties are distributed so commodity disks will never exceed their environmental specifications (8x5, <40C, vibration, etc.).
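
The power and thermal figures above are two views of the same load, which a quick conversion shows. This sketch assumes the 5.8 figure is a continuous draw in kilowatts and uses the standard conversion of 1 W = 3.412142 BTU/h; the function name is hypothetical.

```python
BTU_PER_HOUR_PER_WATT = 3.412142  # standard conversion factor


def kw_to_btu_per_hour(kilowatts: float) -> float:
    """Heat output (BTU/h) produced by a continuous electrical load in kW."""
    return kilowatts * 1000 * BTU_PER_HOUR_PER_WATT


heat = kw_to_btu_per_hour(5.8)
print(f"{heat:,.0f} BTU/h")
```

The result lands within about one percent of the quoted 20,000 BTU/h, so the two figures are consistent with each other.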

On the software end, Copan Systems' MAID approach typically limits each fixed disk to 5,000 power-ons in its warrantied, 5-year lifespan (1/10 the typical rating of commodity disks). It also manages mounts and even ensures that unused devices are periodically powered on and exercised to prevent extended off-line periods and to verify data integrity -- another key issue when deploying commodity disks designed for desktops used regularly. The only drawback to the solution as a whole seems to be the weight. A Copan Systems Revolution 200T weighs approximately 1,350 kg (3,000 lb) for the full, 896-drive, 224-448 TB (using 250-500 GB drives) configuration. Just as the entry point for a blade solution involves an investment in several servers, MAID near-line storage has a sizable entry point: the Copan Systems Revolution 200T starts at 56 TB (2x112-drive shelves, plus the 2 redundant hosts).
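
The capacity and power-on figures above follow from the drive counts. In this sketch the drive counts, drive sizes, and power-on limit come from the text; the 250 GB drive size for the entry configuration is inferred from 56 TB across 224 drives, and `capacity_tb` is a hypothetical helper.

```python
def capacity_tb(drives: int, drive_gb: int) -> float:
    """Raw capacity in TB (decimal) for a number of equal-sized drives."""
    return drives * drive_gb / 1000


full_small = capacity_tb(896, 250)   # full cabinet, 250 GB drives
full_large = capacity_tb(896, 500)   # full cabinet, 500 GB drives
entry = capacity_tb(2 * 112, 250)    # two 112-drive shelves (drive size inferred)

# 5,000 power-ons spread evenly over the 5-year warranty period.
power_ons_per_day = 5000 / (5 * 365)

print(f"full: {full_small:.0f}-{full_large:.0f} TB, entry: {entry:.0f} TB")
print(f"power-on budget: {power_ons_per_day:.1f} per drive per day")
```

A budget of fewer than three power-ons per drive per day is why the management software, not the backup client, has to decide which drives spin up.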

Conclusion

Most systems-oriented trade magazines have been debating traditional tape media versus online or even off-line disk storage for backup and recovery. Given the online environmental and off-line longevity issues of commodity disk storage or costs involved with remote site backup, the reality is that portable tape cartridge media still remains the only cost-effective way to ensure long-term data retention and disaster recovery. The problem with tape backup is not the media but its common, legacy implementation -- real-time directly from end-systems to end-tape devices, which puts the entire backup operation at the mercy of the increasing issues of capacity limitations and distribution load in the same 12/60-hour backup window from years earlier.

Tape is an off-line recovery solution. End-systems should never back up directly to tape, which will always be a capacity and scheduling nightmare. End-systems should always back up to near-line storage systems, which then have attached tape devices/libraries. Those near-line systems then commit backups to tape in non-real-time as the rotation and retention policies dictate for off-line storage.

Immediate recovery is offered by the near-line storage, while off-line, disaster recovery is still offered by tape. Pairing near-line and off-line media against near-line and off-line recovery requirements is why virtual tape library (VTL) solutions are always the most effective solutions over traditional, real-time centralized backup, even those that offer some advanced capabilities like buffering and local storage at the centralized server.

Apply this dissection of VTL solutions to your next backup or disaster recovery need or to a problem in your existing implementation. VTL products are the most transparent solutions that provide the best of both near-line disk recovery and off-line tape recovery in one. But, regardless of solution, any complete and well-designed backup strategy always ensures both complete and timely backups online with the guaranteed availability of a portable and long-lasting off-line recovery option.

Links to More Information

Howard, Mark. "Storage Pipeline: Virtual Tape Libraries", Network Computing, Sept. 16, 2004 -- http://www.nwc.com/showArticle.jhtml?articleID=47208530

FalconStor Certification Matrix -- http://www.falconstor.com/certification_matrix_active.asp

FalconStor VirtualTape Library Brochure -- http://www.falconstor.com/Brochures/VTLAppliance.pdf

Copan Systems Revolution 200T -- http://www.copansys.com/products/specifications.htm

EMC Corporation CLARiiON -- http://www.emc.com/products/systems/clariion.jsp

MaXXan Systems SVT100 -- http://www.maxxan.com/product/virtual-tape.html

Network Engines VTL Appliance -- http://www.networkengines.com/

Bryan J. Smith has an educational background in computer architecture (BSCpE, UCF) and has spent more than a dozen years applying distributed systems and storage design principles. Over the past 4 years, he has provided services as an independent architect at a variety of clients in the defense, education, and finance industries. Bryan and his wife, Lourdes, maintain their permanent residence in Orlando. He can be reached at: b.j.smith@ieee.org.