Poor Man's High Availability
Ron Jachim
High availability (HA) is one of those nebulous terms that means different things to different people. A reasonable definition is: high availability takes additional precautions to improve the effective uptime of computer systems. The extent of these "additional precautions" usually depends on the business needs of the organization. In most cases, HA includes multiple layers of redundancy and carries a correspondingly high cost to the organization, both in terms of hardware expense and management complexity. In this article, I explore various elements of HA and depict an alternative approach for situations in which conventional HA is either not affordable to the organization or not essential to the business requirements.
HA Elements Before delving into the specifics of my alternative approach, it is useful to review some of the elements of HA - those areas of systems that are most likely to fail. The most vulnerable part of a computer system is generally considered to be the disk drive (or drives). Additional precautions you can take with disks include mirroring, parity checking, and striping of data so that a single disk-drive failure will not result in data loss. This is generally called RAID, or Redundant Array of Inexpensive Disks. RAID is routinely used on servers and is occasionally used on clients. Similarly, more advanced RAID systems may employ redundant RAID controllers as a means of further reducing potential downtime.
A second common point of failure is the power supply. Many servers have dual power supplies to avoid the effect of a power supply failure. Because disk and power supply failures are so common (relatively speaking), and because inexpensive solutions are available for these types of failures, these types of redundancies are often the only ones used in many environments. More advanced servers, however, may include additional component-level redundancies to increase the HA capabilities of the system.
Moving beyond coverage of the most basic redundancies usually involves clustering of servers. This can be done in both the UNIX and NT worlds, and with most proprietary mainframes, too. In short, clustering combines multiple server computers into a single entity. In practical terms, you might have a cluster of three Alpha-based servers running one application. If any component causes one of the systems to fail, the others will continue to run the application, and users on the failed server will be moved to one of the other systems within the cluster. Although this may result in the application's running more slowly, at least it will continue to run.
These two scenarios represent the typical extremes of redundancy. Of course, you could also do nothing or you could invest in very expensive server that fully duplicates every hardware component (à la Tandem). What has not been reflected above is some sort of middle ground. Is there a reasonably high availability solution that goes beyond simple disk and power supply redundancy without incurring the expense of dedicating duplicate servers in a clustered environment? The answer is yes.
If you have a development/debug system and a production system, why not leverage your development/debug system into a backup for your production system? The penalty in hardware may be just a second hard disk for one or both of your systems. You'll gain peace of mind and sleep better at night.
This article builds on a very simple concept. Make a copy of your root partition so you have it if a problem occurs. Starting with the simplest case, you would use dd to make a spare copy of a disk and store it offline. We will build on this to cover a variety of operating environments. Advantages and disadvantages of each method will be discussed. Everything that follows is based on the concept of relatively high availability. I assume that your boss has rejected an expensive clustered environment with redundant hardware and software to manage everything, so you need to make do in your current environment.
Alternately, if you are lucky enough to have a dedicated debug system, this setup will ensure that your debug system is a very good representation of your production system. You could just use RAID, but its protection will not be as good in many of the examples below. The advantages and disadvantages of using RAID will be discussed in each of the examples.
This technique is a kind of "baling wire and chewing gum" solution that can be quite appropriate in many (most?) environments. What I present is a way to duplicate the key elements of a system so that for the price of a reboot, you can have your users back on line. This is achieved by substituting either at the disk or the server level. Although your debug system probably won't compare with your production system in terms of speed, at least your users will be operational.
Single System With a single system, the best you can do is have a backup copy of your system disk ready to go. This copy can be kept in the server or on the shelf. If your system drive fails, you can bring the system up with your spare drive. Obviously, mirroring your system disk would be a much simpler solution, but if your system were to take a major hit, both the original system disk and its mirror would likely crash. Mirroring adds an additional level of complexity that many people are uncomfortable with for their system disk. Furthermore, many environments do not permit mirrored system disks without the purchase of additional hardware or software.
The simplest way to protect your server is to disk-to-disk copy your boot disk to a spare disk and remove the spare disk from your system. Figure 1 illustrates a typical single-server environment. Listing 1 shows a script to perform the copy and verification of the system disk. For an offline spare, picture the alternate boot disk sitting in a safe place.
Although simple in concept, it is difficult to actually implement this plan. Every time you make a significant change to your normal system disk, you will have to insert your offline spare system disk, perform the disk-to-disk copy to the alternate disk, then remove the alternate disk again. Depending on the hot swap capabilities of your system, this could be a fairly complicated procedure.
If you had a system disk stored somewhere that was ready to insert into your system, you could be up and running fairly quickly. Of course, you would have to do something about your data disk, but with your system disk up, you could begin providing some services to your users while the other services were worked on.
Having an offline spare is convenient, but the process of making it and keeping it current is awkward, to say the least. An alternative, especially for people who cannot or will not mirror their system disk, is to manually make a copy of the normal boot disk from time to time. This results in an online spare system disk that can be stored in the server. The /etc/fstab file can be appropriately edited in advance so that you can boot directly from the alternate boot disk without making any hardware changes. This script is shown in Listing 2.
This procedure makes the disk-to-disk copy script more complicated, but from an operational perspective, it is less complicated than having a current offline spare system disk. If there is a problem with the normal boot disk, simply shut down your system and reboot it from the alternate boot disk. This reboot will probably have to be done from the console. The exact details will vary from system to system.
Dual System With two systems available, your options increase. If your second system is just gathering dust, by all means cluster it. If your second system is normally used as your debug system, you can still take advantage of it for emergencies. Here we have the advantage of not only a redundant disk, but a completely redundant system.
With this level of redundancy, you can handle two servers as shown in Figure 2. Note that there are two servers with identical architecture, disk numbering conventions, etc. This setup is extremely important in this example. Each system has its own local disks and presumably a tape drive and CD-ROM drive. Each system is connected to the network, and to the same external disk storage. Only one of these systems could actually use the external storage at any given time. In the example, Kmita is normally the debug server and Babinitch is normally the production server. You should also be aware of IP address conflicts that could occur if you bring up your spare server as your production server while the production server is still operational.
Because over time, the systems will begin to deviate, it is important to use Kmita's boot disk as the basis for booting the Kmita server - it is irrelevant whether that piece of hardware will be viewed as Kmita or Babinitch. Any file that makes it assume the personality of Babinitch must be identified and copied on top of the Kmita files. This procedure is best done with a script such as the one shown in Listing 3. Note that this script will need to be fine-tuned for your environment, but it will provide a solid foundation upon which to build. Like the previous examples, this technique provides relatively high availability of the boot disk. In this case, we also provide redundancy of server hardware. The only thing not redundant in this example is the non-system disk(s).
By placing all non-system data on an external RAID storage device, you can also improve the non-system data reliability through your preferred RAID storage method. It will take some restraint to place all of your non-system files in your external RAID, but there are payoffs. It would help to know exactly which files that each of your applications places on the system disk.
The external storage is organized however is most appropriate for the type of disk access that will occur. It could be connected to the server using Fibre Channel or a SCSI interface. It will take some planning to get all of your applications, both data and program files, onto the external storage.
It helps to have identical systems with identically arranged disks. In that case, you can simply disk-to-disk copy the system disk from one system to the other using a command sequence shown in Listing 3. Then, if Babinitch crashes, you can shut down Kmita and reboot it using the alternate system disk rather than the normal system disk. In this case, Kmita would become Babinitch. You will probably have to clean up the filesystems of the external storage if the shutdown is not graceful, but you can be up and running quickly.
Few administrators have this level of control over their environments, and it is much more usual to have different classes of systems for production and debug purposes. In most instances, you will have the same or a very similar version of UNIX running on each. This more complex and realistic environment is shown in Figure 3. Note the differences. The internal disks are on different SCSI chains in each system due to architectural differences in the systems. These same architectural differences will cause device driver and other operating system level differences. The possibilities are so varied and system specific here that providing a script is pointless.
Conclusion Although the examples presented here show a dual boot arrangement in a Solaris environment, the principle should work in other homogeneous UNIX environments. You should very carefully review these scripts and then write your own for your operating environment. Each of these scripts does UNIX file handling at the most basic level and could result in destroyed data if they are not suitable for your operating environment. Please review carefully before using and make sure that your systems are backed up.
In theory, this setup should also work in a heterogeneous environment, but it would be less useful. Typically, the need for high availability is to make data accessible through a database or other similar server. If one system is a Digital Alpha and the other is a Sun SPARC, the architectural differences would necessitate the different executables, device drivers, etc. At the very least, database optimization would be different on each architecture.
About the Author
Ron Jachim is Manager of Systems for the Barbara Ann Karmanos Cancer Institute where he is responsible for the systems half of the Information Systems Group. He has fifteen years of networking experience and both a B.A. and an M.S. in Computer Science. His thesis was on fuzzy queries. He can be reached at: rjachim@acm.org.
|