Managing Enterprise Alerts
John Spurgeon and Ed Schaefer
We currently have more than 30 shell scripts monitoring the health
of the worldwide enterprise servers and databases that we manage.
Each of these scripts generates messages that are sent to various
users. In the past, controlling these alerts was an administrative
nightmare. To better control which users receive which alerts and
when, we developed the Enterprise Alert Process. The process, written
entirely in Korn shell, includes the following utilities:
create_alert_file -- The shell scripts that monitor the
system call this utility, which creates an ASCII alert file in the
new alerts directory. The alert file name is a combination of server
name, date, time, priority, and a class id. That same data as well
as a subject line comprise the header of the alert file. Any input
piped to the create_alert_file utility appends to the alert file.
send_alerts -- This utility sends alert messages to email
addresses. send_alerts consults a configuration file that determines
which addresses receive the alert and when. Once processed, send_alerts
moves alert files from the new alerts to the sent alerts directory.
Typically, send_alerts executes periodically from cron, but in the
case of high-priority alerts, for example, the program that calls
the create_alert_file utility may immediately call send_alerts.
enable_alerts/disable_alerts -- During scheduled events, such as
shutting down the database for maintenance, alerts can be turned
off with the disable_alerts script and turned back on with the
enable_alerts script.
send_summary -- The send_summary script periodically sends
an email summarizing recently sent alerts.
Creating an Alert with privileged_login
The privileged_login script (Listing 1) demonstrates how easy
it is to create an alert. This script creates an alert every time
someone logs in as the root or admin user. Place this stub in the
/etc/profile or somewhere else in the environment:
export TERM_USER=$(who -m | cut -d' ' -f1)
export REAL_USER=$(id | cut -d\( -f2 | cut -d\) -f1)
case $REAL_USER in
    root|admin)
        if /opt/logins/bin/privileged_login
        then
            :
        else
            exit $?
        fi
        ;;
esac
# end stub
If the user su's to root or admin, the privileged_login script
executes. Within privileged_login, if the user chooses to continue
at the prompt, the following code stub executes the create_alert_file
script (Listing 2):
sudo -u monitor /opt/monitor/bin/create_alert_file \
-p high \
-c $(basename $0)-$$ \
-s "$TERM_USER became $REAL_USER" \
< /dev/null
# end stub
Permissions and Security
We want users other than root to execute the enterprise alert
utilities. For security purposes, the owner and group of the utilities
and the associated directory structure are both "monitor". In the code
above, executing the create_alert_file script via the sudo utility
ensures that new alert files are created with the desired owner
and group.
Executing create_alert_file
The create_alert_file utility generates an alert file in the $NEW_ALERTS_DIR,
with a name based on host, date, time, priority, and class id. The priority,
class id, host, and subject are command-line options -p,
-c, -h, and -s, respectively.
Data from the command-line options echoed in a here document forms
the file's heading, provided the alert hasn't been disabled by class
id, host, or priority. Optionally, redirected or piped input is
added to the file:
cat >> $alert_file
In the privileged_login script, the create_alert_file call redirects
/dev/null to the utility, creating a file with just the heading information.
However, the following example:
echo "end privileged user example" | \
    sudo -u monitor /opt/monitor/bin/create_alert_file \
    -p high \
    -c $(basename $0)-$$ \
    -s "$TERM_USER became $REAL_USER"
could create a file named:
-rw-r--r-- monitor monitor doomsday.031124.122244.high.privileged_login-22780
The contents of the file would be:
03-11-24 12:22:44
host=doomsday
priority=high
class=privileged_login-22780
subject: eds became root
end privileged user example
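The naming and header logic described above can be sketched roughly as follows. This is a minimal sketch, not the authors' Listing 2: the make_alert function name, its positional arguments, the optional caller-supplied date and time, and the simplified first header line are all assumptions made to keep the example self-contained.

```shell
#!/bin/sh
# Hypothetical sketch of the create_alert_file logic (not Listing 2).
# make_alert DIR HOST PRIORITY CLASS SUBJECT [YYMMDD HHMMSS]
make_alert() {
    dir=$1 host=$2 priority=$3 class=$4 subject=$5
    day=${6:-$(date +%y%m%d)}       # defaults to today's date
    clock=${7:-$(date +%H%M%S)}     # defaults to the current time
    alert_file=$dir/$host.$day.$clock.$priority.$class

    # Build the heading from the option values with a here document.
    cat > "$alert_file" <<EOF
$day $clock
host=$host
priority=$priority
class=$class
subject: $subject
EOF

    # Append whatever arrives on standard input (possibly nothing).
    cat >> "$alert_file"

    # Report the name of the file that was written.
    echo "$alert_file"
}
```

For example, `echo "disk almost full" | make_alert /tmp/new doomsday high disk_check "disk warning"` would write one alert file under /tmp/new (which must already exist), while redirecting standard input from /dev/null would produce a file containing only the heading.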
Enable/disable Events
At times you'll need to disable alerts. For example, you wouldn't
need a "database-is-down" alert during scheduled maintenance. The
disable_alerts utility (Listing 3) supports disabling alerts by
class id (option -c), host (option -h), or priority
(option -p).
For example, the following command:
sudo -u monitor disable_alerts -h doomsday
prevents any alert messages from being sent from the doomsday server.
The script creates a flag file named "doomsday" in the $dir/host directory.
Recall that the create_alert_file utility checks this host directory
for the existence of the proper host flag file; if it finds such a
file, the alert will not be generated.
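The flag-file check can be pictured with a short sketch. The alert_disabled function name and the class, host, and priority subdirectories are assumptions modeled on the $dir/host directory mentioned above; the real check lives in create_alert_file (Listing 2).

```shell
#!/bin/sh
# Hypothetical sketch: an alert is suppressed when a matching flag file
# exists. The class/host/priority directory layout is an assumption.
# alert_disabled FLAG_DIR CLASS HOST PRIORITY -> returns 0 if disabled
alert_disabled() {
    flag_dir=$1
    [ -f "$flag_dir/class/$2" ] ||
    [ -f "$flag_dir/host/$3" ] ||
    [ -f "$flag_dir/priority/$4" ]
}
```

A caller might guard alert creation with something like `alert_disabled "$dir" "$class" "$host" "$priority" && exit 0`, so that no file is written while a flag is present.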
Disabling an alert creates the flag file with the touch command;
enabling the alert again simply removes the file. The basename of
the executing script determines whether the flag file is created
or removed:
case $(basename $0) in
    'disable_alerts') command='touch' ;;
    'enable_alerts') command='rm -f' ;;
esac
Because enable_alerts shares the same source code as disable_alerts,
the two files can be hard links, or one file can be a symbolic link
that points to the other. In the tarball, we used hard links.
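The one-script, two-names arrangement can be tried in a scratch directory. This is a hedged demo rather than Listing 3: the two-argument interface (flag directory, then host) and the temporary paths are invented for illustration, and the scripts are run through sh so the demo does not depend on execute permission in the temporary directory.

```shell
#!/bin/sh
# Hypothetical demo of one script reachable under two hard-linked names.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/host"

cat > "$demo_dir/disable_alerts" <<'EOF'
#!/bin/sh
# usage: disable_alerts|enable_alerts FLAG_DIR HOST  (illustrative only)
case $(basename "$0") in
    disable_alerts) command='touch' ;;
    enable_alerts)  command='rm -f' ;;
    *) echo "unknown name: $0" >&2; exit 1 ;;
esac
$command "$1/host/$2"
EOF

# Give the same file a second name.
ln "$demo_dir/disable_alerts" "$demo_dir/enable_alerts"

sh "$demo_dir/disable_alerts" "$demo_dir" doomsday   # creates host/doomsday
sh "$demo_dir/enable_alerts"  "$demo_dir" doomsday   # removes it again
```

Replacing `ln` with `ln -s` would give the symbolic-link variant; the case statement behaves the same way because it looks only at the name used to invoke the script.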
Processing Alerts with send_alerts
The send_alerts shell script (Listing 4) processes any alerts
existing in the $NEW_ALERTS_DIR directory. This is the program design
logic:
1. Source the globals file.
2. Process the command-line options with the get_options function.
3. Filter duplicate alerts with the filter_alerts script.
For each alert file:
4. Determine whether the alert should be sent based on its priority
or class id.
5. Process each block of the configuration file. After all blocks
have been processed, send the alert to all eligible email addresses.
6. Move the alert to the sent directory.
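Steps 4 through 6 amount to a loop over the new alerts directory. The skeleton below is only a sketch under stated assumptions, not Listing 4: deliver is a stand-in for the configuration-driven mail step, process_alerts is a hypothetical name, and the priority is read from the fourth dot-separated field of the file name, which assumes the host name itself contains no dots.

```shell
#!/bin/sh
# Hypothetical skeleton of the per-alert loop (steps 4 to 6).
# deliver stands in for the configuration-driven mailx step.
deliver() { echo "would send: $(basename "$1")"; }

# process_alerts NEW_DIR SENT_DIR [PRIORITY]
process_alerts() {
    new_dir=$1 sent_dir=$2 want_priority=${3:-}
    for alert in "$new_dir"/*; do
        [ -f "$alert" ] || continue
        # Field 4 of host.date.time.priority.class is the priority.
        priority=$(basename "$alert" | cut -d. -f4)
        if [ -n "$want_priority" ] && [ "$priority" != "$want_priority" ]; then
            continue    # not selected on this run; leave it in place
        fi
        deliver "$alert"
        mv "$alert" "$sent_dir/"
    done
}
```

Leaving unselected alerts in the new alerts directory is a design guess: it lets a later run with different options pick them up.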
Sourcing the send_globals File
All the shell utilities share many of the same global constants.
The send_globals file (Listing 5) sources the setenv_monitor file (Listing
6), which in turn sources the setenv_path file (Listing 7).
The send_globals file contains the get_alert_attribute, get_options,
strip_config_file, check_priority, check_class, should_include,
should_send, and get_config function definitions.
Setting Supported Options
The get_options function supports -c, -p, and -f,
which are the class id, priority, and alert config file options,
respectively. The class id option allows processing only one class
of alert -- not every alert that exists in the new alerts directory.
The priority option supports sending alerts by a particular priority:
high, medium, low, or info.
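Option handling of this kind is commonly written with the shell's getopts built-in. The following is a minimal sketch assuming that approach; the variable names and the /opt/monitor fallback for $MONITOR_DIR are assumptions, and the real get_options in send_globals (Listing 5) may differ.

```shell
#!/bin/sh
# Hypothetical sketch of -c, -p, and -f parsing with getopts.
get_options() {
    class_id=
    priority=
    config_file=${MONITOR_DIR:-/opt/monitor}/etc/monitor.conf
    OPTIND=1    # restart parsing each time the function is called
    while getopts c:p:f: opt; do
        case $opt in
            c) class_id=$OPTARG ;;
            p) case $OPTARG in
                   high|medium|low|info) priority=$OPTARG ;;
                   *) echo "invalid priority: $OPTARG" >&2; return 1 ;;
               esac ;;
            f) config_file=$OPTARG ;;
            *) return 1 ;;
        esac
    done
}
```

A script would typically call `get_options "$@"` near the top and then test whether $class_id or $priority is non-empty before filtering alerts.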
Removing Redundant Alert Files
Utilities such as privileged_login may create duplicate alerts
between invocations of the send_alerts script. The filter_alerts
script (Listing 8), which is called from send_alerts before the
configuration process, eliminates redundant alerts.
For each alert file sorted in reverse order, the filter_alerts
script builds an attribute string to identify duplicate alerts.
If any duplicates exist, the filter_alerts script moves the duplicate
files to the sent directory without sending an alert.
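One way to picture the duplicate filter is the sketch below. It is not Listing 8: the filter_duplicates name is hypothetical, the attribute string is assumed to be host, priority, and class id taken from a file name of the form host.date.time.priority.class, and keeping the newest file of each group is likewise an assumption.

```shell
#!/bin/sh
# Hypothetical sketch of duplicate filtering (not Listing 8).
# filter_duplicates NEW_DIR SENT_DIR
filter_duplicates() {
    new_dir=$1 sent_dir=$2
    seen=$(mktemp)
    # Reverse sort lists the newest file of each host/priority/class first.
    ls "$new_dir" | sort -r | while read -r name; do
        # Drop the date and time fields to form the attribute string.
        key=$(echo "$name" | sed 's/^\([^.]*\)\.[^.]*\.[^.]*\.\(.*\)$/\1.\2/')
        if grep -Fqx -- "$key" "$seen"; then
            mv "$new_dir/$name" "$sent_dir/"    # duplicate: file it unsent
        else
            echo "$key" >> "$seen"              # first sighting: keep it
        fi
    done
    rm -f "$seen"
}
```

Because the seen keys are kept in a temporary file rather than a shell variable, the approach still works when the while loop runs in a subshell at the end of the pipeline.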
Explaining the Configuration File
A configuration file contains blocks of information separated
by one or more blank lines. These blocks specify criteria to determine
which alerts should be sent to whom. By default, the $MONITOR_DIR/etc/monitor.conf
file is used. Alternate configuration files can be specified using
the -f option when calling the send_alerts or send_summary
scripts.
Configuration blocks are generally independent of each other,
and their effect is additive. For example, if block 1 says to send
high-priority alerts to user "eds", and block 2 says to send medium-priority
alerts to user "eds", then both high and medium alerts will be sent
to that user. However, when send_alerts is called, a given alert
is sent to a particular address only once, regardless of how many
times the same alert/address pair is specified by criteria in different
configuration blocks.
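The "once per address" rule can be illustrated with a tiny sketch: gather every address contributed by the matching blocks, then remove repeats before mailing. The unique_addresses function name and the example.com addresses are hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch: merge recipients contributed by several matching
# blocks so that each address appears only once.
# unique_addresses ADDRESS... -> prints one address per line, no repeats
unique_addresses() {
    [ $# -gt 0 ] || return 0
    printf '%s\n' "$@" | sort -u
}
```

For instance, `unique_addresses eds@example.com jspurgeo@example.com eds@example.com` prints each of the two addresses a single time, even though two blocks named the first one.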
For a complete description of all configuration parameters as
well as detailed examples, see the configuration example file (Listing
10).
Processing the Configuration File One Block at a Time
Before processing the configuration file, all comments are removed
by calling the strip_config_file function. Then a while loop copies
the configuration blocks one at a time to temporary file $TEMP_CONFIG_FILE.
The get_config function (located in send_globals) parses $TEMP_CONFIG_FILE,
setting the keyword values required to process the alert.
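The comment stripping and block-at-a-time loop might look something like the sketch below. It assumes '#' introduces a comment and that blocks hold one keyword per line; the keyword=value form used in the usage note is illustrative only, the for_each_block helper is hypothetical, and the real syntax is documented in Listing 10.

```shell
#!/bin/sh
# Hypothetical sketch of comment stripping and per-block processing.
strip_config_file() {
    # Delete comment-only lines, strip inline comments and trailing
    # blanks, and keep the blank lines that separate blocks.
    sed -e '/^[[:space:]]*#/d' -e 's/#.*$//' -e 's/[[:space:]]*$//' "$1"
}

# for_each_block CONFIG_FILE COMMAND
# Runs COMMAND once per block, passing a temporary file holding the block.
for_each_block() {
    block_file=$(mktemp)
    strip_config_file "$1" | {
        while IFS= read -r line; do
            if [ -n "$line" ]; then
                printf '%s\n' "$line" >> "$block_file"
            elif [ -s "$block_file" ]; then
                "$2" "$block_file"      # blank line ends the block
                : > "$block_file"
            fi
        done
        # The last block may not be followed by a blank line.
        if [ -s "$block_file" ]; then "$2" "$block_file"; fi
    }
    rm -f "$block_file"
}
```

A caller could pass any function that accepts a file name, for example one that reads lines such as message_type=alert from the block and decides whether the current alert matches.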
The alert is eligible to be sent if the message_type keyword
equals "alert" and the should_include and should_send functions
return true. The should_include function processes the priorities,
exclude, and include keywords. The should_send function processes
the hosts and the date and time objects.
If the alert passes the eligibility requirements, send_alerts builds
the mailx subject line based on whether the alert is a page or
a regular email. Finally, it executes the mailx command, which
sends the alert.
Summarizing Sent Alerts with send_summary
Periodically, you'll want to send a copy of all the generated
alerts for a server using the send_summary script (Listing 9). This
script, typically called from cron, summarizes recently sent alerts,
and sends an email message with a date, an alerts header, a table
of contents, and the details. Here is an example:
Mon Dec 15 14:20:59 PST 2003
Alerts on doomsday from the past day
1. doomsday.031215.121451.high.root_profile-eds.1521.031215.122001 (1)
2. doomsday.031215.102015.high.sh_history-25229.eds.031215.102805 (1)
3. doomsday.031215.101908.high.sh_history-25105.eds.031215.102004 (1)
4. doomsday.031215.101846.high.root_profile-eds.25264.031215.102003 (1)
5. doomsday.031215.101711.high.root_profile-eds.25140.031215.102002 (1)
6. doomsday.031215.083053.high.root_profile-jspurgeo.21145.031215.084002 (1)
7. doomsday.031215.000001.high.sh_history-1856.jspurgeo.031215.000501 (1)
8. doomsday.031214.214634.high.root_profile-jspurgeo.1891.031214.215001 (1)
Details:
.
.
.
While not listed, the details section above would contain the contents
of all 8 messages displayed. The number in parentheses is the number
of alert files created on a given host with the same priority and
class id.
Like the send_alerts script, the send_summary script is also driven
by a block in the configuration file. Refer to the sample configuration
file (Listing 10) for a summary example.
What Is in the Tarball
In our world, the alert utilities are in the /opt directory; the
file permissions are all read/execute, and owner and group are user
monitor. We created the tarball, which is available from the Sys
Admin Web site, relative to the /opt directory to ease placing
the files wherever you desire:
opt/monitor/bin/create_alert_file
opt/monitor/bin/enable_alerts
opt/monitor/bin/disable_alerts
opt/monitor/bin/send_alerts
opt/monitor/bin/send_globals
opt/monitor/bin/setenv_monitor
opt/monitor/bin/filter_alerts
opt/monitor/bin/send_summary
opt/logins/bin/privileged_login
opt/logins/bin/setenv_path
opt/monitor/etc/monitor.conf.example
Conclusion
In this article, we've presented our particular solution for handling
alerts. Is it definitive? Certainly not. Some enhancements you might
consider are:
1. Implement a method for mapping user names to email addresses.
This means developing your own username_to_email_address and username_to_pager_address
functions located in the send_globals file.
2. Implement a method for mapping sudoers aliases to user names.
This means developing your own sudoers_alias_to_users and sudoers_alias_to_hosts
functions also located in send_globals.
3. Consider using rsync, scp, ftp, or some other protocol to collect
new and/or sent alerts files on a centralized host.
4. Write a utility to view new and sent alerts.
5. Improve upon the rudimentary day and time parameters.
References
sudo homepage -- http://www.courtesan.com/sudo/
Powers, Shelley, Jerry Peek, et al. 2003. Unix Power Tools.
Sebastopol, CA: O'Reilly & Associates, Inc.
John Spurgeon is a software developer and systems administrator
for Intel's Factory Integrated Information Systems, FIIS, in Aloha,
Oregon. Outside of work, he enjoys turfgrass management, triathlons,
and spending time with his family.
Ed Schaefer is a frequent contributor to Sys Admin.
He is a software developer and DBA for Intel's Factory Integrated
Information Systems, FIIS, in Aloha, Oregon. Ed also edits the monthly
Shell Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net.