Chapter 11: PREPARING FOR CONTINGENCIES AND DISASTERS
A computer security contingency is an event with the potential
to disrupt computer operations, thereby disrupting critical mission
and business functions. Such an event could be a power outage, hardware
failure, fire, or storm. If the event is very destructive, it is often
called a disaster.[84]
Contingency planning directly supports an organization's goal of continued
operations. Organizations practice contingency planning because
it makes good business sense.
To avert potential contingencies
and disasters or minimize the damage they cause, organizations can
take steps early to control the event. Generally called contingency
planning,[85] this activity is closely
related to incident handling, which primarily addresses malicious
technical threats such as hackers and viruses.[86]
Contingency planning involves
more than planning for a move offsite after a disaster destroys a
data center. It also addresses how to keep an organization's critical
functions operating in the event of disruptions, both large and small.
This broader perspective on contingency planning is based on the distribution
of computer support throughout an organization.
This chapter presents the
contingency planning process in six steps:[87]
- Identifying the mission- or business-critical functions.
- Identifying the resources that support the critical functions.
- Anticipating potential contingencies or disasters.
- Selecting contingency planning strategies.
- Implementing the contingency strategies.
- Testing and revising the strategy.
11.1 Step 1: Identifying the Mission- or Business-Critical Function
This chapter refers to an organization as having critical mission
or business functions. In government organizations, the
focus is normally on performing a mission, such as providing citizen
benefits. In private organizations, the focus is normally on conducting
a business, such as manufacturing widgets.
Protecting the continuity
of an organization's mission or business is very difficult if it is
not clearly identified. Managers need to understand the organization
from a point of view that usually extends beyond the area they control.
The definition of an organization's critical mission or business functions
is often called a business plan.
Since the business plan
will be used to support contingency planning, it is
necessary not only to identify critical missions and businesses, but
also to set priorities for them. A fully redundant capability
for each function is prohibitively expensive for most organizations.
In the event of a disaster, certain functions will not be performed.
Appropriate priorities, set in advance and approved by senior management,
could make the difference in whether the organization survives
a disaster.
11.2 Step 2: Identifying the Resources That Support Critical Functions
In many cases, the longer an organization is without a resource,
the more critical the situation becomes. For example, the longer
a garbage collection strike lasts, the more critical the situation
becomes.
After identifying critical
missions and business functions, it is necessary to identify the
supporting resources, the time frames in which each resource is
used (e.g., is the resource needed constantly or only at the end
of the month?), and the effect on the mission or business of the
unavailability of the resource. In identifying resources, a traditional
problem has been that different managers oversee different resources.
They may not realize how resources interact to support the organization's
mission or business. Many of these resources are not computer
resources. Contingency planning should address all the resources
needed to perform a function, regardless of whether they directly relate
to a computer.[88]
The analysis of needed
resources should be conducted by those who understand how the function
is performed and the dependencies of various resources on other
resources and other critical relationships. This will allow an organization
to assign priorities to resources since not all elements
of all resources are crucial to the critical functions.
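The analysis described above can be sketched as a small data model. The following Python fragment is an illustration only, not part of the handbook's method: the class names, the `resource_priorities` helper, the numeric priority scale (1 is highest), and all sample data are assumptions made for the example. It records which resources each critical function depends on and gives each resource the priority of the most important function it supports.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    name: str
    category: str   # one of the resource categories discussed in this chapter
    needed: str     # time frame in which the resource is used


@dataclass
class CriticalFunction:
    name: str
    priority: int   # assumption for this sketch: 1 is the highest priority
    resources: list = field(default_factory=list)


def resource_priorities(functions):
    """Map each resource name to the best (lowest-numbered) priority
    among the functions that depend on it."""
    result = {}
    for function in functions:
        for resource in function.resources:
            current = result.get(resource.name)
            if current is None or function.priority < current:
                result[resource.name] = function.priority
    return result


# Hypothetical sample data, invented for illustration.
lan = Resource("office LAN", "Processing Capability", "constantly")
benefits_db = Resource("benefits database", "Automated Applications and Data", "end of month")
clerks = Resource("data entry clerks", "Human Resources", "constantly")
dtp = Resource("publishing workstation", "Processing Capability", "weekly")

functions = [
    CriticalFunction("pay citizen benefits", 1, [lan, benefits_db, clerks]),
    CriticalFunction("publish newsletter", 3, [lan, dtp]),
]

# List resources from most to least critical.
for name, priority in sorted(resource_priorities(functions).items(),
                             key=lambda item: (item[1], item[0])):
    print(priority, name)
```

Because the office LAN supports a priority 1 function, it inherits priority 1 even though a lower-priority function also uses it. A real analysis would also weigh the time frame in which each resource is needed, which this sketch only records.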
11.2.1 Human Resources
Resources That Support Critical Functions
- Human Resources
- Processing Capability
- Computer-Based Services
- Data and Applications
- Physical Infrastructure
- Documents and Papers
People are perhaps an
organization's most obvious resource. Some functions require the
effort of specific individuals, some require specialized expertise,
and some only require individuals who can be trained to perform
a specific task. Within the information technology field, human
resources include both operators (such as technicians or system
programmers) and users (such as data entry clerks or information
analysts).
11.2.2 Processing Capability
Contingency Planning Teams
To understand what resources are needed from each of the six resource categories
and to understand how the resources support critical functions,
it is often necessary to establish a contingency planning
team. A typical team contains representatives from various
organizational elements, and is often headed by a contingency
planning coordinator. It has representatives from the following
three groups:
- business-oriented groups, such as representatives from functional areas;
- facilities management; and
- technology management.
Various other groups are called on as needed, including financial management,
personnel, training, safety, computer security, physical security,
and public affairs.
Traditionally, contingency
planning has focused on processing power (i.e., if the data center
is down, how can applications dependent on it continue to be processed?).
Although the need for data center backup remains vital, today's
other processing alternatives are also important. Local area networks
(LANs), minicomputers, workstations, and personal computers in all
forms of centralized and distributed processing may be performing
critical tasks.
11.2.3 Automated Applications and Data
Computer systems run
applications that process data. Without current electronic versions
of both applications and data, computerized processing may not be
possible. If the processing is being performed on alternate hardware,
the applications must be compatible with the alternate hardware,
operating systems and other software (including version and configuration),
and numerous other technical factors. Because of the complexity,
it is normally necessary to periodically verify compatibility. (See
Step 6, Testing and Revising.)
11.2.4 Computer-Based Services
An organization uses
many different kinds of computer-based services to perform its functions.
The two most important are normally communications services and
information services. Communications can be further categorized
as data and voice communications; however, in many organizations
these are managed by the same service. Information services include
any source of information outside of the organization. Many of these
sources are becoming automated, including on-line government and
private databases, news services, and bulletin boards.
11.2.5 Physical Infrastructure
For people to work effectively,
they need a safe working environment and appropriate equipment and
utilities. This can include office space, heating, cooling, venting,
power, water, sewage, other utilities, desks, telephones, fax machines,
personal computers, terminals, courier services, file cabinets,
and many other items. In addition, computers also need space and
utilities, such as electricity. Electronic and paper media used
to store applications and data also have physical requirements.
11.2.6 Documents and Papers
Many functions rely on
vital records and various documents, papers, or forms. These records
could be important because of a legal need (such as being able to
produce a signed copy of a loan) or because they are the only record
of the information. Records can be maintained on paper, microfiche,
microfilm, magnetic media, or optical disk.
11.3 Step 3: Anticipating Potential Contingencies or Disasters
Although it is impossible
to think of all the things that can go wrong, the next step
is to identify a likely range of problems. The development of scenarios
will help an organization develop a plan to address the wide range
of things that can go wrong.
Scenarios should include
small and large contingencies. While some general classes of contingency
scenarios are obvious, imagination and creativity, as well as research,
can point to other possible, but less obvious, contingencies. The
contingency scenarios should address each of the resources described
above. The following are examples of some of the types of
questions that contingency scenarios may address:
Examples of Some Less Obvious Contingencies
1. A computer center in the basement of a building had a minor problem with
rats. Exterminators killed the rats, but the bodies were not
retrieved because they were hidden under the raised flooring
and in the pipe conduits. Employees could only enter the data
center with gas masks because of the decomposing rats.
2. After the World Trade Center explosion, when people reentered the
building, they turned on their computer systems to check for
problems. Dust and smoke damaged many systems when they were
turned on. If the systems had been cleaned first, there
would not have been significant damage.
Human Resources: Can people get to work? Are key personnel willing to cross a picket
line? Are there critical skills and knowledge possessed by one person?
Can people easily get to an alternative site?
Processing Capability: Are the computers harmed? What happens if some of the computers
are inoperable, but not all?
Automated Applications and Data: Has data integrity been affected? Is an application
sabotaged? Can an application run on a different processing platform?
Computer-Based Services: Can the computers communicate? To where? Can people communicate?
Are information services down? For how long?
Infrastructure: Do people have a place to sit? Do they have equipment to do their
jobs? Can they occupy the building?
Documents/Paper: Can needed records be found? Are they readable?
11.4 Step 4: Selecting Contingency Planning Strategies
The next step is to plan
how to recover needed resources. In evaluating alternatives, it
is necessary to consider what controls are in place to prevent and
minimize contingencies. Since no set of controls can cost-effectively
prevent all contingencies, it is necessary to coordinate prevention
and recovery efforts.
A contingency planning
strategy normally consists of three parts: emergency response, recovery,
and resumption.[89] Emergency response
encompasses the initial actions taken to protect lives and limit
damage. Recovery refers to the steps that are taken to continue
support for critical functions. Resumption is the return
to normal operations. The relationship between recovery and resumption
is important. The longer it takes to resume normal operations, the
longer the organization will have to operate in the recovery mode.
Example 1: If the system administrator for
a LAN has to be out of the office for a long time (due to
illness or an accident), arrangements are made for the system
administrator of another LAN to perform the duties. Anticipating
this, the absent administrator should have taken steps beforehand
to keep documentation current. This strategy is inexpensive,
but service will probably be significantly reduced on both
LANs, which may prompt the manager of the loaned administrator
to partially renege on the agreement.
Example 2: An organization depends on an on-line information service
provided by a commercial vendor. The organization is no longer
able to obtain the information manually (e.g., from a reference
book) within acceptable time limits and there are no other
comparable services. In this case, the organization relies
on the contingency plan of the service provider. The organization
pays a premium to obtain priority service in case the service
provider has to operate at reduced capacity.
Example 3: A large mainframe data center has a contract with
a hot site vendor, has a contract with the telecommunications
carrier to reroute communications to the hot site, has plans
to move people, and stores up-to-date copies of data, applications
and needed paper records off-site. The contingency plan is
expensive, but management has decided that the expense is
fully justified.
Example 4: An organization distributes its processing between two
major sites, each of which includes small to medium processors
(personal computers and minicomputers). If one site is lost,
the other can carry the critical load until more equipment
is purchased. Routing of data and voice communications can
be performed transparently to redirect traffic. Backup copies
are stored at the other site. This plan requires tight control
over the architectures used and types of applications that
are developed to ensure compatibility. In addition, personnel
at both sites must be cross-trained to perform all functions.
The selection of a strategy
needs to be based on practical considerations, including feasibility
and cost. The different categories of resources should each be considered.
Risk assessment can be used to help estimate the cost of options
to decide on an optimal strategy. For example, is it more expensive
to purchase and maintain a generator or to move processing to an
alternate site, considering the likelihood of losing electrical
power for various lengths of time? Are the consequences of a loss
of computer-related resources sufficiently high to warrant the cost
of various recovery strategies? The risk assessment should focus
on areas where it is not clear which strategy is the best.
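The generator question above can be made concrete with a simple expected-cost comparison. The sketch below is one possible way to frame it, not a prescribed method: every figure (outage frequencies, yearly costs, and losses per outage) is hypothetical, and the strategy names and the `expected_annual_cost` helper are assumptions made for illustration. For each strategy it adds the fixed yearly cost to the sum, over outage lengths, of the expected number of outages per year times the loss that strategy would still suffer.

```python
def expected_annual_cost(fixed_cost, outage_frequency, loss_per_outage):
    """Fixed yearly cost plus the frequency-weighted residual loss of each outage type."""
    return fixed_cost + sum(
        outage_frequency[kind] * loss_per_outage[kind] for kind in outage_frequency
    )


# Hypothetical expected number of power outages per year, by duration.
outage_frequency = {"under 1 hour": 2.0, "1 to 8 hours": 0.5, "over 8 hours": 0.05}

# Hypothetical strategies: (fixed yearly cost, loss still suffered per outage).
strategies = {
    "do nothing": (0, {"under 1 hour": 1_000, "1 to 8 hours": 20_000, "over 8 hours": 200_000}),
    "generator": (15_000, {"under 1 hour": 0, "1 to 8 hours": 0, "over 8 hours": 10_000}),
    "alternate site": (40_000, {"under 1 hour": 1_000, "1 to 8 hours": 5_000, "over 8 hours": 5_000}),
}

costs = {
    name: expected_annual_cost(fixed, outage_frequency, losses)
    for name, (fixed, losses) in strategies.items()
}
best = min(costs, key=costs.get)

# Print strategies from cheapest to most expensive expected yearly cost.
for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:,.0f}")
print("lowest expected annual cost:", best)
```

With these invented inputs the generator has the lowest expected cost, but the ranking depends entirely on the estimates. Where estimates are uncertain, rerunning the comparison with a range of frequencies shows whether the choice is sensitive to them.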
In developing contingency
planning strategies, there are many factors to consider in addressing
each of the resources that support critical functions. Some examples
are presented in the sidebars.
11.4.1 Human Resources
To ensure an organization
has access to workers with the right skills and knowledge, training
and documentation of knowledge are needed. During a major contingency,
people will be under significant stress and may panic. If the contingency
is a regional disaster, their first concerns will probably be their
family and property. In addition, many people will be either unwilling
or unable to come to work. Additional hiring or temporary services
can be used. The use of additional personnel may introduce security
vulnerabilities.
Contingency planning,
especially for emergency response, normally places the highest emphasis
on the protection of human life.
11.4.2 Processing Capability
Strategies for processing
capability are normally grouped into five categories: hot site;
cold site; redundancy; reciprocal agreements; and hybrids. These
terms originated with recovery strategies for data centers but can
be applied to other platforms.
1. Hot site -- A building already equipped with processing capability and other
services.
2. Cold site -- A building for housing processors that can be easily adapted for
use.
3. Redundant site -- A site equipped and configured exactly like the primary site.
(Some organizations plan on having reduced processing capability
after a disaster and use partial redundancy. The stocking of spare
personal computers or LAN servers also provides some redundancy.)
4. Reciprocal agreement -- An agreement that allows two organizations to back each other
up. (While this approach often sounds desirable, contingency planning
experts note that this alternative has the greatest chance of failure
due to problems keeping agreements and plans up-to-date as systems
and personnel change.)
5. Hybrids -- Any combination of the above, such as having a hot site as
a backup in case a redundant or reciprocal agreement site is damaged
by a separate contingency.
Recovery may include
several stages, perhaps marked by increasing availability of processing
capability. Resumption planning may include contracts or the ability
to place contracts to replace equipment.
11.4.3 Automated Applications and Data
The need for computer security does not go away when an organization
is processing in a contingency mode. In some cases, the need
may increase due to sharing processing facilities, concentrating
resources in fewer sites, or using additional contractors and
consultants. Security should be an important consideration when
selecting contingency strategies.
Normally, the primary
contingency strategy for applications and data is regular backup
and secure offsite storage. Important decisions to be addressed
include how often the backup is performed, how often it is stored
off-site, and how it is transported (to storage, to an alternate
processing site, or to support the resumption of normal operations).
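Once the backup decisions above have been made, they can be written down as a checkable policy. The following sketch assumes a hypothetical policy of daily backups and weekly offsite transfers; the `BackupPolicy` class, the `policy_violations` function, and the dates are invented for illustration and do not come from any particular backup product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class BackupPolicy:
    max_backup_age: timedelta    # how often the backup must be performed
    max_offsite_age: timedelta   # how often a copy must reach offsite storage


def policy_violations(policy, last_backup, last_offsite, now):
    """Return a list of policy violations; the list is empty when the policy is met."""
    violations = []
    if now - last_backup > policy.max_backup_age:
        violations.append("backup overdue")
    if now - last_offsite > policy.max_offsite_age:
        violations.append("offsite copy overdue")
    return violations


# Hypothetical policy: back up daily, move a copy offsite weekly.
policy = BackupPolicy(timedelta(days=1), timedelta(days=7))
now = datetime(2024, 1, 15, 9, 0)

violations = policy_violations(
    policy,
    last_backup=datetime(2024, 1, 14, 22, 0),   # the previous evening
    last_offsite=datetime(2024, 1, 5, 17, 0),   # more than a week earlier
    now=now,
)
print(violations)
```

The check covers only timing. How the backup is transported and protected in transit is a procedural matter that a script of this kind does not address.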
11.4.4 Computer-Based Services
Service providers may
offer contingency services. Voice communications carriers often
can reroute calls (transparently to the user) to a new location.
Data communications carriers can also reroute traffic. Hot sites
are usually capable of receiving data and voice communications.
If one service provider is down, it may be possible to use another.
However, the type of communications carrier lost, either local or
long distance, is important. Local voice service may be carried
on cellular. Local data communications, especially for large volumes,
is normally more difficult. In addition, resuming normal operations
may require another rerouting of communications services.
11.4.5 Physical Infrastructure
Hot sites and cold sites
may also offer office space in addition to processing capability
support. Other types of contractual arrangements can be made for
office space, security services, furniture, and more in the event
of a contingency. If the contingency plan calls for moving offsite,
procedures need to be developed to ensure a smooth transition back
to the primary operating facility or to a new facility. Protection
of the physical infrastructure is normally an important part of
the emergency response plan, such as use of fire extinguishers or
protecting equipment from water damage.
11.4.6 Documents and Papers
The primary contingency
strategy is usually backup onto magnetic, optical, microfiche, paper,
or other media, combined with offsite storage. Paper documents are generally
harder to back up than electronic ones. A supply of forms and other
needed papers can be stored offsite.
11.5 Step 5: Implementing the Contingency Strategies
Once the contingency
planning strategies have been selected, it is necessary to make
appropriate preparations, document the strategies, and train employees.
Many of these tasks are ongoing.
11.5.1 Implementation
Much preparation is needed
to implement the strategies for protecting critical functions and
their supporting resources. For example, one common preparation
is to establish procedures for backing up files and applications.
Another is to establish contracts and agreements, if the
contingency strategy calls for them. Existing service contracts
may need to be renegotiated to add contingency services. Another
preparation may be to purchase equipment, especially to support
a redundant capability.
Backing up data files and applications is a critical part of virtually
every contingency plan. Backups are used, for example, to restore
files after a personal computer virus corrupts the files or
after a hurricane destroys a data processing center.
It is important to keep
preparations, including documentation, up-to-date. Computer systems
change rapidly and so should backup services and redundant equipment.
Contracts and agreements may also need to reflect the changes. If
additional equipment is needed, it must be maintained and periodically
replaced when it is no longer dependable or no longer fits the organization's
architecture.
Preparation should also
include formally designating people who are responsible for various
tasks in the event of a contingency. These people are often referred
to as the contingency response team. This team is often composed
of people who were a part of the contingency planning team.
There are many important
implementation issues for an organization. Two of the most important
are 1) how many plans should be developed? and 2) who prepares each
plan? Both of these questions revolve around the organization's
overall strategy for contingency planning. The answers should be
documented in organization policy and procedures.
How Many Plans?
Relationship Between Contingency Plans and Computer Security Plans
For small or less complex systems, the contingency plan may be a part
of the computer security plan. For larger or more complex
systems, the computer security plan could contain a brief
synopsis of the contingency plan, which would be a separate
document.
Some organizations
have just one plan for the entire organization, and others have
a plan for every distinct computer system, application, or other
resource. Other approaches recommend a plan for each business or
mission function, with separate plans, as needed, for critical resources.
The answer
to the question, therefore, depends upon the unique circumstances
for each organization. But it is critical to coordinate between
resource managers and functional managers who are responsible for
the mission or business.
Who Prepares the Plan?
If an organization
decides on a centralized approach to contingency planning, it may
be best to name a contingency planning coordinator. The coordinator
prepares the plans in cooperation with various functional and resource
managers. Some organizations place responsibility directly with
the functional and resource managers.
11.5.2 Documenting
The contingency
plan needs to be written, kept up-to-date as the system and other
factors change, and stored in a safe place. A written plan is critical
during a contingency, especially if the person who developed the
plan is unavailable. It should clearly state in simple language
the sequence of tasks to be performed in the event of a contingency
so that someone with minimal knowledge could immediately begin to
execute the plan. It is generally helpful to store up-to-date copies
of the contingency plan in several locations, including any off-site
locations, such as alternate processing sites or backup data storage
facilities.
11.5.3 Training
All personnel
should be trained in their contingency-related duties. New personnel
should be trained as they join the organization, refresher training
may be needed, and personnel will need to practice their skills.
Training
is particularly important for effective employee response during
emergencies. There is no time to check a manual to determine correct
procedures if there is a fire. Depending on the nature of the emergency,
there may or may not be time to protect equipment and other assets.
Practice is necessary in order to react correctly, especially when
human safety is involved.
11.6 Step 6: Testing and Revising
Contingency plan maintenance can be incorporated into procedures for change
management so that upgrades to hardware and software are reflected
in the plan.
A contingency
plan should be tested periodically because there will undoubtedly
be flaws in the plan and in its implementation. The plan will become
dated as time passes and as the resources used to support critical
functions change. Responsibility for keeping the contingency plan
current should be specifically assigned. The extent and frequency
of testing will vary between organizations and among systems. There
are several types of testing, including reviews, analyses, and simulations
of disasters.
A review
can be a simple test to check the accuracy of contingency plan documentation.
For instance, a reviewer could check if individuals listed are still
in the organization and still have the responsibilities that caused
them to be included in the plan. This test can check home and work
telephone numbers, organizational codes, and building and room numbers.
The review can determine if files can be restored from backup tapes
or if employees know emergency procedures.
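One way to check that files can be restored is to restore them to a scratch location and compare checksums against a known-good copy. The sketch below is a minimal illustration under stated assumptions: the `verify_restore` and `sha256_of` helpers are hypothetical names, and temporary directories stand in for the live data and for a restore from backup media.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(original_dir, restored_dir):
    """Return relative paths of files that are missing from, or differ in, the restore."""
    original_dir, restored_dir = Path(original_dir), Path(restored_dir)
    problems = []
    for source in sorted(original_dir.rglob("*")):
        if not source.is_file():
            continue
        relative = source.relative_to(original_dir)
        restored = restored_dir / relative
        if not restored.is_file() or sha256_of(source) != sha256_of(restored):
            problems.append(relative.as_posix())
    return problems


# Demonstration with invented files; the copy stands in for a test restore.
with tempfile.TemporaryDirectory() as tmp:
    live = Path(tmp) / "live"
    restored = Path(tmp) / "restored"
    live.mkdir()
    (live / "a.txt").write_text("ledger")
    (live / "b.txt").write_text("payroll")
    shutil.copytree(live, restored)
    (restored / "b.txt").write_text("corrupted")   # simulate a bad restore
    problems = verify_restore(live, restored)

print(problems)
```

Comparing against live data is only meaningful if the files have not changed since the backup was taken. Where they have, checksums recorded at backup time would be the better reference.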
The result of a "test" often implies a grade assigned
for a specific level of performance, or simply pass or fail.
However, in the case of contingency planning, a test should
be used to improve the plan. If organizations do not use this
approach, flaws in the plan may remain hidden and uncorrected.
An analysis
may be performed on the entire plan or portions of it, such as emergency
response procedures. It is beneficial if the analysis is performed
by someone who did not help develop the contingency plan
but has a good working knowledge of the critical function and supporting
resources. The analyst(s) may mentally follow the strategies in
the contingency plan, looking for flaws in the logic or process
used by the plan's developers. The analyst may also interview functional
managers, resource managers, and their staff to uncover missing
or unworkable pieces of the plan.
Organizations
may also arrange disaster simulations. These tests provide
valuable information about flaws in the contingency plan and provide
practice for a real emergency. While they can be expensive, these
tests can also provide critical information that can be used to
ensure the continuity of important functions. In general, the more
critical the functions and the resources addressed in the contingency
plan, the more cost-beneficial it is to perform a disaster simulation.
11.7 Interdependencies
Since all
controls help to prevent contingencies, there is an interdependency
with all of the controls in the handbook.
Risk
Management provides a tool for analyzing the security costs
and benefits of various contingency planning options. In addition,
a risk management effort can be used to help identify critical resources
needed to support the organization and the likely threat to those
resources. It is not necessary, however, to perform a risk assessment
prior to contingency planning, since the identification of critical
resources can be performed during the contingency planning process
itself.
Physical
and Environmental Controls help prevent contingencies. Although
many of the other controls, such as logical access controls, also
prevent contingencies, the major threats that a contingency plan
addresses are physical and environmental threats, such as fires,
loss of power, plumbing breaks, or natural disasters.
Incident
Handling can be viewed as a subset of contingency planning.
It is the emergency response capability for various technical threats.
Incident handling can also help an organization prevent future incidents.
Support
and Operations in most organizations includes the periodic backing
up of files. It also includes the prevention of and recovery from more
common contingencies, such as a disk failure or corrupted data files.
Policy
is needed to create and document the organization's approach to
contingency planning. The policy should explicitly assign responsibilities.
11.8 Cost Considerations
The cost
of developing and implementing contingency planning strategies can
be significant, especially if the strategy includes contracts for
backup services or duplicate equipment. There are too many options
to discuss cost considerations for each type.
One contingency cost
that is often overlooked is the cost of testing a plan. Testing
provides many benefits and should be performed, although some of
the less expensive methods (such as a review) may be sufficient
for less critical resources.
Footnotes:
84. There is no distinct dividing line between disasters and other contingencies.
85. Other names include disaster recovery, business
continuity, continuity of operations, or business resumption planning.
86. Some organizations include incident handling
as a subset of contingency planning. The relationship is further discussed
in Chapter 12, Incident Handling.
87. Some organizations and methodologies may use
a different order, nomenclature, number, or combination of steps.
The specific steps can be modified, as long as the basic functions
are addressed.
88. However, since this is a computer security handbook,
the descriptions here focus on the computer-related resources. The
logistics of coordinating contingency planning for computer-related
and other resources is an important consideration.
89. Some organizations divide a contingency strategy
into emergency response, backup operations, and recovery. The different
terminology can be confusing (especially the use of conflicting definitions
of recovery), although the basic functions performed are the
same.