Cover V13, i11

Contingency Planning: Lessons Learned from the 9/11 Tragedy

Lisa M. Jaworski

The terrorist attacks that occurred on September 11, 2001, resulted in terrible human loss. Additionally, many buildings were destroyed or damaged to the extent that they had to be condemned. From an Information Technology (IT) perspective, networks were brought down, equipment and cabling were obliterated, and on-site and local backup tapes were destroyed. Because of the lengthy, ensuing chaos in the local area, it was very difficult for businesses, whose key IT functions were disabled, to bring disaster recovery personnel into the area. An unknown number of users lost Internet connectivity because their Internet Service Providers (ISPs) had points of presence in the World Trade Center [14].

Most of the information published on the Internet about contingency planning focuses on the overall process. Plan templates are available and, although a few articles on lessons learned from the events of 9/11 have appeared, they do not focus on practical plan development and implementation issues. Instead, they rehash what are already considered industry best practices. Other related articles focus on the need for insurance. For example, event insurance for invitation-only events such as corporate conferences should be considered [5]. Proper insurance coverage is certainly necessary, but the scope of contingency planning lessons must be much wider.

This article documents as many of the practical, hands-on lessons learned as possible. Much of this information was made available through personal conversations with senior and mid-level IT managers responsible for contingency planning in their respective organizations. Both government and private perspectives were gathered.

Lessons Learned

The following paragraphs discuss contingency planning concerns that were not generally considered part of industry best practices before the 9/11 tragedy. The order in which these concerns are presented is not indicative of their importance. This article assumes that existing best practices related to contingency planning are already in place (e.g., those published by the National Institute of Standards and Technology (NIST) and others [2, 6, 11, 12]). Some examples of best practices are the availability of detailed, written procedures for configuring servers, up-to-date call lists, and an established order of priority for restoring systems.

Locate Backup Facilities in Different Geographic Regions

It might seem obvious that backup facilities should be located in a different geographic region than the primary facility, but this was evidently not considered standard practice before 9/11. NetworkWorld Fusion reported that while Genuity has a backup facility, at the time of the tragedy it was located only 12 miles away from the primary facility [14]. Such close proximity invites vulnerability, depending on the nature of the disaster scenario. A disaster that affects the primary site could easily affect the backup site. For this reason, international service provider Equant maintains Network Operation Centers (NOCs) in Reston, VA, Paris, France, and Singapore [14].

Although the federal government now wants a minimum distance of 300 miles between primary and backup facilities, some agencies are resisting this idea because of the high cost associated with establishing a geographically removed site. Note that regional data centers require their own contingency plans and these must be coordinated with overall enterprise-level plans [6]. To preclude the possibility of a directed attack, either network- or physically based, some government agencies and companies (e.g., AT&T) do not publicly reveal the location of their backup sites [14].
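The separation guidance above lends itself to a simple check. The following Python sketch, offered only as an illustration, computes the great-circle distance between two sites with the haversine formula and compares it with a minimum separation. The function names, the 3,958.8-mile mean Earth radius, and the sample coordinates are assumptions introduced for this example; the 300-mile default is the figure cited above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def sufficiently_separated(primary, backup, minimum_miles=300.0):
    """True if the backup site is at least minimum_miles from the primary."""
    return distance_miles(*primary, *backup) >= minimum_miles


# Hypothetical sites as (latitude, longitude) in decimal degrees.
primary_site = (40.0, -75.0)
nearby_backup = (40.1, -75.1)   # only a few miles away
distant_backup = (45.0, -75.0)  # five degrees of latitude to the north

print(sufficiently_separated(primary_site, nearby_backup))
print(sufficiently_separated(primary_site, distant_backup))
```

Straight-line distance is only a first filter. Two sites that pass this check can still share a power grid, a carrier route, or a flood plain, so the result should feed into the risk assessment rather than replace it.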

Organize an Alternate, Geographically Separated Response Team

Quite often, organizations will identify candidates for disaster recovery teams from among the individuals who live in the same immediate geographic area as the facility or facilities for which the plan is written and who already work for, or support, that facility. This certainly makes good sense; however, if a catastrophic event causes widespread death in this location, the people comprising the response teams may very well be among those who are killed or disabled.

Disruptions or gaps in the regular chain of command should be anticipated [8]. Even if this worst-case scenario does not occur, the on-site workers may not be mentally ready to do their jobs, as was the case for many people in the Ground Zero area [3, 9]. Organizations should establish two teams of individuals to fill these roles -- one in the immediate vicinity and the other located at least 100 miles away from the primary facility. The second team would be different from one that would be in place at an alternate hot, warm, or cold site; its function would be identical to that of the people who would normally be on-site at the primary facility.

The idea is for the second team to be located close enough that they can travel to the facility without undue difficulty, yet far enough away that they are unlikely to be victims of the catastrophe. Training should be identical for both teams, and both should be involved in the annual testing of the organization's contingency-related plans, processes, and procedures. Driving directions to the facility to be used by the second team should be reviewed as part of the annual test. This team, as well as the primary on-site team, should have building and site maps that show utility shut-off points, gas lines, exits, stairways, designated escape routes, restricted areas, and high-value items [13]. It is imperative that these maps be tightly held and protected as company confidential.

Identify Alternate, Geographically Separated Storage and Staging Areas

For reasons similar to those necessitating a second response team, organizations should identify a second local off-site storage area for backup tapes and critical hardware items. It should be located at least 100 miles away from the site being supported. This storage facility should be in the same general area as the alternate local response team. Many establishments keep backup tapes in the same city or town as the facility at which they would be used.

This practice should be continued because close proximity of the backups to the facility will allow a faster recovery time. However, if a catastrophic event is so widespread that several square miles in the city or town are affected, the local off-site storage facility might also be affected. This was the case with Verizon after the 9/11 attacks [7]. MasterCard stated after 9/11 that it would locate backups within three to four hours driving range [9]. If a private corporation is highly IT dependent and its primary facility and all of its backups are destroyed, it is highly likely that the firm will be forced to go out of business.

If the disaster scenario necessitates the activation of the alternate response team, as discussed above, then it logically follows that this team will need a local staging area. Assuming that the alternate team will be driving to the primary site, they will need an area to assemble equipment and load it into their vehicles. For convenience, the staging area should be close to the alternate local off-site storage facility. Depending on the nature of the emergency, the team may need to stay at the primary facility for an extended time period, so they will need to bring extra luggage for clothes and possibly food, water, and temporary shelter-related items.

The team should bring everything to the staging area and check each item off a list before loading it. AT&T reported that staff drove to New York from as far away as Jacksonville, Florida [14]. Kemper Casualty Company had computer and telephone technicians drive from Illinois to New Jersey with tools and equipment such as telephone switch parts and server components [8]. If the IT budget permits, organizations should also consider employing mobile backup facilities [13].

Establish a Senior Management Help Desk

Quickly establishing a senior management help desk after a disaster is important for several reasons. First, it provides a fallback set of managers that the disaster recovery team can contact in the event that key local managers or recovery team members are missing from the recovery management chain, as documented in the contingency plan, due to injury or death. Second, such a help desk facilitates quick decision making, which is needed in a crisis. For instance, being able to get fast management approval on purchasing requests can be important to meeting established recovery time frames. This help desk would also serve as the focal point for the media. Finally, this help desk serves to keep the recovery teams focused on the established recovery priorities.

Even the very best contingency plan will not address every issue that can come up during an emergency. This is the nature of Murphy's Law. The senior management help desk will be able to review proposed solutions to such issues and ensure that they are consistent with the other recovery efforts. Note that this is actually a lesson learned from the United States House of Representatives after the anthrax attacks, rather than the 9/11 tragedy.

Identify Alternate Vendors and Shipping Routes

Companies should establish "quick ship" contracts with vendors for product and equipment replacements in the event of an emergency [3]. They should also anticipate that vendors may be saturated in an emergency situation, so they should identify alternate vendors and establish "as needed" contracts with them. Vendor contact information and contract numbers must be maintained. If a small local vendor is used, organizations should inquire as to their contingency preparations. It should be expected that a vendor would divulge information only if a nondisclosure agreement (NDA) is in place. It is quite possible that the same disaster scenarios will affect these companies, just as they affect the organizations they support.

Organizations also need to prepare a prioritized list of the equipment types and vendors they use to support each equipment category. According to NetworkWorld Fusion, some industry analysts say that up to 60% of an organization's critical data is stored on individual laptops and desktops [13]. After 9/11, SunGard confirmed that peripherals such as printers were commonly overlooked [3]. The lack of certain equipment, however mundane (e.g., copiers), can affect productivity. Kemper Casualty Company stated that it was able to get immediate delivery of 100 laptops and 200 monitors from its vendor, along with printers and fax machines [8]. In the first three weeks after 9/11, Compaq shipped 2,500 PCs to Lehman Brothers [4].

Assess Locations of Carrier Circuits

The potential for carrier failures should be addressed in an organization's overall network strategy as well as its contingency plan. Equant lost connectivity to Canada after the 9/11 attacks because the circuit it had purchased for network diversity, which connected to Toronto and Montreal, was routed through Wall Street [14]. Equant stated that it did not know this was the case before 9/11, and it has since reexamined local connectivity for all of its networks [14].
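One way to look for this kind of hidden overlap is to compare the physical paths of circuits that are supposed to be diverse. The Python sketch below is a minimal illustration under stated assumptions: the circuit names and intermediate locations are hypothetical, and real routing data would have to be obtained from the carriers. It reports every pair of circuits that passes through the same intermediate point.

```python
from itertools import combinations


def shared_points(circuits):
    """Map each pair of circuit names to the intermediate points they share."""
    overlaps = {}
    # Sort by name so that each pair is reported in a predictable order.
    for (name_a, path_a), (name_b, path_b) in combinations(sorted(circuits.items()), 2):
        common = set(path_a) & set(path_b)
        if common:
            overlaps[(name_a, name_b)] = common
    return overlaps


# Hypothetical intermediate points for each circuit (endpoints omitted).
circuits = {
    "primary": ["Exchange A", "Exchange B"],
    "diverse": ["Exchange A", "Exchange C"],
    "backup": ["Exchange D"],
}

for pair, common in shared_points(circuits).items():
    print(pair, sorted(common))
```

In this made-up data, the "diverse" circuit shares Exchange A with the primary, so a single event at that location would take down both.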

Redundancy and Data Mirroring Are Crucial to High-Availability Systems

Redundancy is key to fast recovery of high-availability systems, assuming that an organization's IT budget can accommodate its high cost. Lehman Brothers was able to recover its IT capabilities quickly after the 9/11 attacks because it had redundant networks in both Manhattan and New Jersey [4]. "Every application that ran in New York also ran in New Jersey. All wide-area links were completely duplicated." The firm lost access to everything in New York, but was able to reach all of its other branches through New Jersey [4].

CBS Marketwatch.com has said that the 9/11 property loss changed its views on data replication [3]. "The data for their live tickers used to flow into one data center. Now it flows into all three." Southwestern Bell has installed SONET rings and multiple OC-12 links for redundancy [9]. Upon post-9/11 review of its contingency plans, MasterCard expects to expand its use of data mirroring (real-time data duplication) significantly [9]. Its goal is to reduce recovery time from 24 hours to two hours [9].

Establish an Emergency Communication Plan

Telephone service was down in many areas on the day of the tragedy, either because wires were down or because service requests saturated the system, as was the case with cell phones. Many organizations reported that BlackBerry wireless devices were their only effective means of communicating for several days following the 9/11 tragedy. Bob Schwartz, managing director and Chief Technology Officer for Lehman Brothers during the tragedy, said that as he went down the stairs in Tower One, all he had to activate the disaster recovery plan and alert other managers was his BlackBerry pager [4].

Organizations need to consider how key members of the management staff and the response team could communicate if phones are out and the email system is unavailable. BlackBerries have proven to be an effective alternative, but organizations should continually monitor the marketplace to identify other options, too. The main point is that organizations cannot rely on phones and email only. At this time, it is unclear whether BlackBerries would become less effective as a means of emergency communication as the market saturates. It should be noted that the BlackBerry uses a store-and-forward communication method; the device keeps trying to send a message even if the sender is not continually pressing the send button. Whatever type of device is selected, inter- and intra-team communications with it should be tested as part of the overall contingency plan testing process.
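The store-and-forward idea can be sketched in a few lines of Python. This is a generic illustration of the technique, not a description of how the BlackBerry service is actually implemented; the class name, the transport callable, and the sample messages are all hypothetical.

```python
from collections import deque


class StoreAndForwardQueue:
    """Holds outgoing messages and retries them until the transport accepts them."""

    def __init__(self, transport):
        # transport is any callable that returns True when delivery succeeds.
        self.transport = transport
        self.pending = deque()

    def submit(self, message):
        """Store the message; the sender does not have to resend it."""
        self.pending.append(message)

    def flush(self):
        """Attempt delivery in order, stopping at the first failure.

        Returns the number of messages delivered on this attempt.
        """
        delivered = 0
        while self.pending:
            if not self.transport(self.pending[0]):
                break  # network still down; keep the message for the next attempt
            self.pending.popleft()
            delivered += 1
        return delivered


# Hypothetical transport that is down for the first two attempts.
attempts = {"count": 0}
received = []


def flaky_transport(message):
    attempts["count"] += 1
    if attempts["count"] <= 2:
        return False
    received.append(message)
    return True


queue = StoreAndForwardQueue(flaky_transport)
queue.submit("Activate the recovery plan")
queue.submit("Report to the alternate site")

results = [queue.flush() for _ in range(3)]
print(results)
print(received)
```

The sender submits each message once; the queue, not the person, is responsible for retrying until the network accepts it. That property is what matters when the sender is busy evacuating a building.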

Minimally, senior managers and all members of the disaster recovery teams should be assigned some sort of backup communication device. Senior managers in the Human Resources Department should be included on the distribution list for these devices so they can begin making arrangements for counseling and other forms of employee assistance in the wake of a disaster or catastrophic event. Human Resources can assist with contingency plan formulation by identifying resources in the local community, such as public safety and utilities, as well as a list of mental health professionals who can assist with post-disaster counseling.

Utilization of local hotels as emergency work areas should not be overlooked. Lehman Brothers used Sheraton Manhattan to give their people a place to work [4]. The hotel's ballroom became an IT hub with Virtual Private Network (VPN) connectivity to New Jersey. Human Resources can also distribute wallet-size emergency phone numbers for personnel and their families and instruct employees on the need for personal emergency plans and kits, as described below.

Do Not Assume Air Transportation Will Be Available

All U.S. planes were grounded for several days after the 9/11 attacks and international planes attempting to enter U.S. airspace were turned away. Many organizations' contingency plans relied on the assumption that airlines would be operational to transport both people and equipment, as needed. MasterCard is one firm that had assumed this, and it is reportedly now updating its contingency plans to account for a potential lack of air travel [9]. Other means of transportation, such as rail and trucking lines, were not identified, much less contracted.

This issue also pertains to vendor shipping routes. Long-distance driving directions for key organizational managers and disaster recovery team members were typically not available or, if they were, they were not verified as part of the contingency plan testing process. The lesson here is that multiple means of transportation need to be identified and contracted for on an as-needed, emergency basis.

Identify a Meeting Place Away from Your Facility

Many businesses that operated in the World Trade Center had established meeting areas at which personnel would gather after evacuating their offices so that managers could take a head count of work force members and subsequently alert authorities if anyone was missing. In many cases, the meeting spot was in the basement of the building. It would appear that the idea of the whole building collapsing did not seem significant but, unfortunately, we now know that this is a viable scenario. Emergency meeting spots should be at least a block or two away from the office building to preclude people from being injured in a structural collapse.

Conduct Monthly Evacuation Drills

Evacuation drills, often referred to as fire drills, should be conducted periodically, and the feasibility of the assigned meeting spot(s) should be evaluated after such drills. Drills should be both announced and unannounced. As an organization grows, additional meeting spots may be needed. It is suggested that announced drills be conducted monthly. In his book The Myth of Homeland Security, Marcus Ranum states that the discipline of constant drills will stand you in good stead even in a completely different kind of emergency [15].

Everyone Needs a Personal Emergency Plan

In the confusion after the attacks, many parents could not get to their children's daycare centers to pick up their kids. Because phone service was largely unavailable, the parents could not call each other to see if one of them had the kids. To help reduce these types of uncertainties, each household should have a personal emergency plan [1]. Such a plan would identify a meeting spot outside of the house; location of emergency bug-out kit; location of will, trust documents, and other important papers; copies of medical prescriptions; lists of each person's allergies; and anything else that would be needed in an emergency.

An out-of-state third party should be identified to serve as a message relay center for the family members. If cell phones are down, a parent could get on a land line and leave messages with this third party. Each adult family member should have a will that identifies guardians for the children in the event that both parents are killed, a durable power of attorney for healthcare, and a durable power of attorney for financial decisions. Each family member should also have a wallet-sized card that has all important phone numbers on it as well as current photographs of the other family members.

All Personnel Need Emergency Kits

Many survivors of the 9/11 tragedy were horribly burned and, at the time of this writing, some continue to face additional surgeries and rehabilitation therapy. Organizations must educate employees on the need to put together a portable emergency kit that they can keep in their work areas. Such a kit should contain a personal-sized fire extinguisher, a fire blanket, a flashlight and batteries, an escape ladder, safety glasses, a bottle of water, a mask to help prevent smoke inhalation injuries, a detailed street map of the city or town, a first aid kit, and other appropriate items. These should all fit into a large tote bag that a person can grab and run with.

As part of their responsibility in this area, organizations should hold quarterly seminars that discuss building evacuation routes, identify primary and alternate meeting spots, warn employees against using elevators in an emergency, particularly one involving fire, and provide contact information for employees' loved ones in the event of an emergency. Identification of evacuation routes is especially important in skyscrapers.

Conclusion

Perhaps the greatest lesson that the 9/11 tragedy taught us is not to underestimate the threats to our nation or our people. The idea of initiating attacks on buildings using airplanes as weapons was known long before 2001; however, most people thought the notion so outlandish that they believed it could never happen. It has happened, and we must learn from this experience. SunGard, a leading business continuity services firm, has stated that the main problem they saw after 9/11 with companies' contingency plans was that the scope had not been completely or accurately defined [3]. This certainly lends credence to the need to expect the unexpected. We must take this lesson to heart.

As a final note, the idea of distributing lethal pathogens such as anthrax by mail, which occurred shortly after 9/11, was once considered as improbable as using airplanes to perpetrate physical attacks. Thus, in addition to the common sense advice presented in this article, organizations should also identify and address scenarios targeting key personnel as part of the contingency planning process. Experts say that contingency plans should include remote management in case of biological attack [3].

Dedication

This article is dedicated to the many heroic men and women who died on September 11, 2001.

References

[1] Department of Homeland Security. 2004. Emergencies and Disasters. Washington, DC: Department of Homeland Security. Published on the Internet at: http://www.dhs.gov/dhspublic/.

[2] Department of Justice. August 21, 2001. Department of Justice Contingency Planning Template Instructions. Gaithersburg, MD: NIST. Published on the Internet at: http://csrc.nist.gov/fasp/FASPDocs/contingency-plan/contingencyplan-template-instructions.doc.

[3] Fontana, John & Connor, Deni. November 26, 2001. Disaster Recovery Then and Now. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/2001/1126featside1.html.

[4] Gaudin, Sharon. November 26, 2001. Lehman Brothers' Network Survives. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/2001/1126feat.html.

[5] Houston, Carey. 2004. Lessons Learned. Calgary, Canada: PRIMEDIA Business Magazines & Media, Inc. Published on the Internet at: http://technologymeetings.com.

[6] Legato Systems, Inc. February 1, 2002. The New Art of Business Continuance Planning: Lessons Learned in a Changed World. Framingham, MA: CXO Media, Inc. Published on the Internet at: http://www.cio.com/sponsors/020102legato/.

[7] Leung, Linda. May 13, 2002. Prepare For Emergency. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/2002/0513man.html.

[8] MacSweeney, Greg. September 11, 2002. One Year Later, 9/11 Disaster Recovery Memories Still Fresh. Insurance & Technology Online. Published on the Internet at: http://www.insurancetech.com.

[9] Messmer, Ellen. December 2, 2002. MasterCard Factors 9/11 into Disaster-Recovery Plan. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/news/2002/1202mastercard.html.

[10] NetworkWorld, Inc. Undated. Disaster Recovery. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/disasterrecov.html.

[11] NIST. June 2002. Contingency Planning Guide for Information Technology Systems. NIST Special Publication 800-34. Gaithersburg, MD: NIST.

[12] NIST. June 2002. Contingency Planning Guide For Information Technology Systems. Elizabeth B. Lennon (Editor). Gaithersburg, MD: NIST Information Technology Laboratory. Published on the Internet at: http://csrc.nist.gov/publications/nistbul/itl06-02.txt.

[13] Ohlson, Kathleen. November 26, 2001. Planning for the Worst: Bring in the Best. NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/2001/1126featside5.html.

[14] Pappalardo, Denise & Marsan, Carolyn Duffy. November 26, 2001. How Ready Are the Nation's Networks? NetworkWorld Fusion. Published on the Internet at: http://www.nwfusion.com/research/2001/1126featside3.html.

[15] Ranum, Marcus J. 2004. The Myth of Homeland Security. Indianapolis, IN: Wiley Publishing, Inc.

An employee of Science Applications International Corporation (SAIC), Lisa Jaworski has more than 20 years of security engineering experience on commercial and government projects. She is a key player in the development of SAIC's standardized approach to Critical Infrastructure Protection (CIP). She is also an expert in Health Insurance Portability and Accountability Act (HIPAA) security and privacy requirements. She is one of the authors of NIST's Computer Security Handbook, and she was part of the team that connected the White House to the Internet. At the Government's invitation, she has spoken on information warfare at FedCIRC. She can be contacted at: jaworskil@saic.com.