Cover V13, i04
apr2004.tar

Managing Enterprise Alerts

John Spurgeon and Ed Schaefer

We currently have more than 30 shell scripts monitoring the health of the worldwide enterprise servers and databases that we manage. Each of these scripts generates messages that are sent to various users. In the past, controlling these alerts was an administrative nightmare. To better control which users receive which alerts and when, we developed the Enterprise Alert Process. The process, written entirely in Korn shell, includes the following utilities:

create_alert_file -- The shell scripts that monitor the system call this utility, which creates an ASCII alert file in the new alerts directory. The alert file name combines the server name, date, time, priority, and a class id. The same data, along with a subject line, makes up the alert file's header. Any input piped to create_alert_file is appended to the alert file.

send_alerts -- This utility sends alert messages to email addresses. send_alerts consults a configuration file that determines which addresses receive each alert and when. After processing an alert file, send_alerts moves it from the new alerts directory to the sent alerts directory. Typically, send_alerts runs periodically from cron, but for high-priority alerts, the program that calls create_alert_file may call send_alerts immediately.

enable_alerts/disable_alerts -- During scheduled events, such as shutting down the database for maintenance, you can turn alerts off and back on with the disable_alerts and enable_alerts scripts, respectively.

send_summary -- The send_summary script periodically sends an email summarizing recently sent alerts.

Creating an Alert with privileged_login

The privileged_login script (Listing 1) demonstrates how easy it is to create an alert. This script creates an alert every time someone logs in as the root or admin user. Place this stub in /etc/profile or in another shell startup file:

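# TERM_USER: who logged in at this terminal; REAL_USER: the identity now in use
export TERM_USER=$(who -m | cut -d' ' -f1)
export REAL_USER=$(id | cut -d\( -f2 | cut -d\) -f1)

case $REAL_USER in
   root|admin)
       # privileged_login prompts the user and creates the alert;
       # if it fails, end the session with its exit status
       if /opt/logins/bin/privileged_login
       then
          :
       else
          exit $?
       fi
       ;;
esac
# end stub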

If a user becomes root or admin with su, the privileged_login script executes. Within privileged_login, if the user chooses to continue at the prompt, the following code stub executes the create_alert_file script (Listing 2):

sudo -u monitor /opt/monitor/bin/create_alert_file \
-p high \
-c $(basename $0)-$$ \
-s "$TERM_USER became $REAL_USER" \
< /dev/null
# end stub
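Listing 1 contains the actual prompt. As a rough illustration only, the hypothetical confirm_privileged_login function below shows how a yes/no prompt could gate the create_alert_file call. It uses only POSIX shell constructs, which also run under ksh; the function name and the prompt wording are assumptions, not taken from Listing 1.

```shell
# Hypothetical prompt helper (not Listing 1). Returns 0 only when the
# user answers y or yes; end-of-file counts as "no".
confirm_privileged_login() {
    printf 'You are now %s. Continue? [y/n] ' "${REAL_USER:-root}"
    read -r answer || return 1
    case $answer in
        [Yy]|[Yy][Ee][Ss]) return 0 ;;
        *) return 1 ;;
    esac
}

# One way privileged_login might use it (assumption):
#   confirm_privileged_login || exit 1
#   sudo -u monitor /opt/monitor/bin/create_alert_file -p high \
#       -c "$(basename "$0")-$$" -s "$TERM_USER became $REAL_USER" < /dev/null
```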

Permissions and Security

We want users other than root to be able to execute the enterprise alert utilities. For security purposes, the owner and group of the utilities and the associated directory structure are both "monitor". In the code above, running create_alert_file through sudo ensures that new alert files are created with the desired owner and group.

Executing create_alert_file

The create_alert_file utility generates an alert file in $NEW_ALERTS_DIR with a name based on the host, date, time, priority, and class id. The priority, class id, host, and subject are set with the command-line options -p, -c, -h, and -s, respectively.

Provided the alert hasn't been disabled by class id, host, or priority, create_alert_file echoes the option values in a here document to form the file's heading. Any redirected or piped input is then appended to the file:

cat >> $alert_file

In the privileged_login script, the create_alert_file call redirects input from /dev/null, so the utility creates a file containing only the heading information. However, the following example:

echo "end privileged user example" | \
sudo -u monitor /opt/monitor/bin/create_alert_file \
-p high \
-c $(basename $0)-$$ \
-s "$TERM_USER became $REAL_USER"
could create a file named:

-rw-r--r--  monitor  monitor doomsday.031124.122244.high.privileged_login-22780
The contents of the file would be:

03-11-24 12:22:44
host=doomsday
priority=high
class=privileged_login-22780
subject: eds became root

end privileged user example
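To make the file format concrete, here is a minimal sketch of how create_alert_file could produce the heading and body shown above. It is not Listing 2: it is written as a function so it can be tried in isolation, the default directory paths are placeholders, and FLAG_DIR is an assumed name for the flag directory (with host, class, and priority subdirectories) that disable_alerts maintains. A single date call supplies both the file name and the heading, so the two timestamps always agree.

```shell
# Minimal create_alert_file sketch (not Listing 2). NEW_ALERTS_DIR and
# FLAG_DIR defaults are hypothetical; FLAG_DIR holds host/, class/, and
# priority/ subdirectories of flag files.
NEW_ALERTS_DIR=${NEW_ALERTS_DIR:-/tmp/monitor/alerts/new}
FLAG_DIR=${FLAG_DIR:-/tmp/monitor/flags}

create_alert_file() {
    priority=info class=unknown host=$(uname -n) subject=""   # assumed defaults
    OPTIND=1
    while getopts p:c:h:s: opt
    do
        case $opt in
            p) priority=$OPTARG ;;
            c) class=$OPTARG ;;
            h) host=$OPTARG ;;
            s) subject=$OPTARG ;;
            *) return 2 ;;
        esac
    done

    # A flag file for this host, class id, or priority disables the alert.
    if [ -f "$FLAG_DIR/host/$host" ] || [ -f "$FLAG_DIR/class/$class" ] ||
       [ -f "$FLAG_DIR/priority/$priority" ]
    then
        return 0
    fi

    # Take one timestamp so the file name and heading agree.
    set -- $(date '+%y %m %d %H %M %S')
    mkdir -p "$NEW_ALERTS_DIR" || return 1
    alert_file=$NEW_ALERTS_DIR/$host.$1$2$3.$4$5$6.$priority.$class

    # The heading comes from the command-line options (here document).
    cat > "$alert_file" <<EOF
$1-$2-$3 $4:$5:$6
host=$host
priority=$priority
class=$class
subject: $subject

EOF
    # Any redirected or piped input is appended after the heading.
    cat >> "$alert_file"
}
```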

Enable/Disable Events

At times you'll also need to disable alerts. For example, you don't want a "database-is-down" alert during scheduled maintenance. The disable_alerts utility (Listing 3) can disable alerts by class id (option -c), host (option -h), or priority (option -p).

For example, the following command:

sudo -u monitor disable_alerts -h doomsday
prevents alerts from being generated for the doomsday server. The script creates a flag file named "doomsday" in the $dir/host directory. Before creating an alert, the create_alert_file utility checks this directory for a flag file named after the alert's host; if it finds one, it does not generate the alert.

Disabling an alert means creating a flag file with the touch command; enabling it means simply removing that file. The basename of the executing script determines which action is taken:

case $(basename $0) in
    'disable_alerts') command='touch' ;;
    'enable_alerts') command='rm -f' ;;
esac

Because enable_alerts shares the same source code as disable_alerts, the two names can be hard links to one file, or one can be a symbolic link to the other. In the tarball, we used hard links.
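The hypothetical toggle_alerts function below sketches the whole enable/disable mechanism. It is not Listing 3: FLAG_DIR stands in for the article's $dir, its default path is a placeholder, and the class and priority subdirectories are assumed to mirror the host directory described above.

```shell
# Hypothetical disable_alerts/enable_alerts sketch (not Listing 3).
# FLAG_DIR stands in for the article's $dir; its default is an assumption.
FLAG_DIR=${FLAG_DIR:-/tmp/monitor/flags}

toggle_alerts() {
    # $1 is the name the script was invoked under; the rest are options.
    case $(basename "$1") in
        disable_alerts) command='touch' ;;
        enable_alerts)  command='rm -f' ;;
        *) echo "must be invoked as disable_alerts or enable_alerts" >&2
           return 2 ;;
    esac
    shift
    OPTIND=1
    while getopts c:h:p: opt
    do
        case $opt in
            c) type=class ;;
            h) type=host ;;
            p) type=priority ;;
            *) return 2 ;;
        esac
        mkdir -p "$FLAG_DIR/$type" || return 1
        # $command is unquoted on purpose so 'rm -f' splits into two words.
        $command "$FLAG_DIR/$type/$OPTARG"
    done
}
```

In an installed script, the body would call toggle_alerts "$0" "$@", and a command such as ln disable_alerts enable_alerts would create the second name as a hard link.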

Processing Alerts with send_alerts

The send_alerts shell script (Listing 4) processes any alerts in the $NEW_ALERTS_DIR directory. The program logic is as follows:

1. Source the globals file.

2. Process the command-line options with the get_options function.

3. Filter duplicate alerts with the filter_alerts script.

For each alert file:

4. Determine whether the alert should be sent based on its priority or class id.

5. Process each block of the configuration file. After all blocks have been processed, send the alert to all eligible email addresses.

6. Move the alert to the sent directory.
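Steps 4 and 6 are simple enough to sketch. In the hypothetical process_new_alerts function below, get_alert_attribute reads a value such as priority=high from an alert's heading; send_globals does define a function by that name, but this body is an assumption, as are the directory defaults. Step 5 is reduced to an echo so the sketch runs without a configuration file or mailx.

```shell
# Hypothetical skeleton of steps 4 and 6 of send_alerts (not Listing 4).
# Directory defaults are placeholders.
NEW_ALERTS_DIR=${NEW_ALERTS_DIR:-/tmp/monitor/alerts/new}
SENT_ALERTS_DIR=${SENT_ALERTS_DIR:-/tmp/monitor/alerts/sent}

# Print the value of a "name=value" line from an alert file's heading.
get_alert_attribute() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

process_new_alerts() {
    # $1: priority from -p, $2: class id from -c (empty means "any")
    want_priority=$1 want_class=$2
    mkdir -p "$SENT_ALERTS_DIR" || return 1
    for alert in "$NEW_ALERTS_DIR"/*
    do
        [ -f "$alert" ] || continue              # no alerts: glob unmatched
        priority=$(get_alert_attribute "$alert" priority)
        class=$(get_alert_attribute "$alert" class)

        # Step 4: skip alerts that don't match the requested priority/class.
        [ -z "$want_priority" ] || [ "$priority" = "$want_priority" ] || continue
        [ -z "$want_class" ] || [ "$class" = "$want_class" ] || continue

        # Step 5 placeholder: evaluate configuration blocks and mail the alert.
        echo "would send: $(basename "$alert")"

        # Step 6: move the processed alert to the sent directory.
        mv "$alert" "$SENT_ALERTS_DIR"/ || return 1
    done
}
```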

Sourcing the send_globals File

All the shell utilities share many of the same global constants. The send_globals file (Listing 5) sources the setenv_monitor file (Listing 6), which in turn sources the setenv_path file (Listing 7).

The send_globals file contains the get_alert_attribute, get_options, strip_config_file, check_priority, check_class, should_include, should_send, and get_config function definitions.

Setting Supported Options

The get_options function supports -c, -p, and -f, which are the class id, priority, and alert config file options, respectively. The class id option allows processing only one class of alert -- not every alert that exists in the new alerts directory.

The priority option restricts processing to alerts of a single priority: high, medium, low, or info.
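A getopts loop is a natural way to implement these options. The sketch below is not the send_globals code: the variable names CLASS, PRIORITY, and CONFIG_FILE are assumptions, and so is the /opt/monitor default for MONITOR_DIR, which setenv_monitor presumably sets in the real utilities.

```shell
# Hypothetical get_options sketch (not the send_globals function).
# CLASS, PRIORITY, and CONFIG_FILE are assumed variable names.
MONITOR_DIR=${MONITOR_DIR:-/opt/monitor}

get_options() {
    CLASS="" PRIORITY="" CONFIG_FILE=$MONITOR_DIR/etc/monitor.conf
    OPTIND=1
    while getopts c:p:f: opt
    do
        case $opt in
            c) CLASS=$OPTARG ;;                 # process one class only
            p) case $OPTARG in                  # process one priority only
                   high|medium|low|info) PRIORITY=$OPTARG ;;
                   *) echo "invalid priority: $OPTARG" >&2
                      return 2 ;;
               esac ;;
            f) CONFIG_FILE=$OPTARG ;;           # alternate configuration file
            *) echo "usage: $0 [-c class] [-p priority] [-f config_file]" >&2
               return 2 ;;
        esac
    done
}
```

A caller such as send_alerts would then run get_options "$@" || exit.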

Removing Redundant Alert Files

Utilities such as privileged_login may create duplicate alerts between invocations of the send_alerts script. The filter_alerts script (Listing 8), which send_alerts calls before processing the configuration file, eliminates redundant alerts.

The filter_alerts script sorts the alert files in reverse order and, for each one, builds an attribute string that identifies duplicate alerts. It moves any duplicate files to the sent directory without sending them.
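One plausible reading of this logic is sketched below. The attribute string is assumed to be the host, priority, and class id taken from the file name; because names embed a yymmdd.hhmmss timestamp, a reverse sort visits the newest alert for each attribute string first, so the older copies are the ones moved aside. The attribute string and directory defaults are assumptions, Listing 8 is the authoritative version, and the loop relies on alert file names containing no white space.

```shell
# Hypothetical filter_alerts sketch (not Listing 8). The attribute string
# is host.priority.class, taken from host.yymmdd.hhmmss.priority.class names.
NEW_ALERTS_DIR=${NEW_ALERTS_DIR:-/tmp/monitor/alerts/new}
SENT_ALERTS_DIR=${SENT_ALERTS_DIR:-/tmp/monitor/alerts/sent}

filter_alerts() {
    mkdir -p "$SENT_ALERTS_DIR" || return 1
    seen=" "                                  # attribute strings kept so far
    for name in $(ls "$NEW_ALERTS_DIR" | sort -r)   # newest first per host
    do
        host=${name%%.*}
        rest=${name#*.*.*.}                   # priority.class (class may contain dots)
        key=$host.$rest
        case $seen in
            *" $key "*) mv "$NEW_ALERTS_DIR/$name" "$SENT_ALERTS_DIR"/ ;;  # duplicate
            *) seen="$seen$key " ;;
        esac
    done
}
```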

Explaining the Configuration File

A configuration file contains blocks of information separated by one or more blank lines. These blocks specify criteria that determine which alerts should be sent to whom. By default, $MONITOR_DIR/etc/monitor.conf is used. Alternate configuration files can be specified with the -f option when calling send_alerts or send_summary.

Configuration blocks are generally independent of each other, and their effect is additive. For example, if block 1 says to send high-priority alerts to user "eds", and block 2 says to send medium-priority alerts to user "eds", then both high and medium alerts will be sent to that user. However, when send_alerts is called, a given alert is sent to a particular address only once, regardless of how many configuration blocks select that alert/address pair.
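The once-per-address rule can be met by accumulating addresses across blocks and discarding repeats. In this sketch, add_recipient and RECIPIENTS are hypothetical names; send_alerts might call add_recipient for each address an eligible block names, then pass the collected list to mailx after the last block.

```shell
# Hypothetical helper: record an address once, no matter how many
# configuration blocks select it for the current alert.
RECIPIENTS=""

add_recipient() {
    case " $RECIPIENTS " in
        *" $1 "*) ;;                                    # already listed
        *) RECIPIENTS="${RECIPIENTS:+$RECIPIENTS }$1" ;;
    esac
}
```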

For a complete description of all configuration parameters as well as detailed examples, see the configuration example file (Listing 10).

Processing the Configuration File One Block at a Time

Before the configuration file is processed, the strip_config_file function removes all comments. Then a while loop copies the configuration blocks, one at a time, to the temporary file $TEMP_CONFIG_FILE.
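The comment stripping and block splitting can be sketched as follows. The comment rule (whole-line comments deleted, trailing # comments trimmed) and the for_each_config_block helper, which calls a handler function once per block, are assumptions rather than the send_globals code. Whole-line comments are deleted rather than blanked so that they cannot accidentally split a block in two.

```shell
# Hypothetical sketch (not the send_globals code): strip comments, then
# hand each blank-line-separated block to a handler function.
strip_config_file() {
    # Delete whole-line comments, trim trailing comments and white space.
    sed -e '/^[[:space:]]*#/d' -e 's/#.*//' -e 's/[[:space:]]*$//' "$1"
}

for_each_config_block() {
    # $1: configuration file, $2: function to call with each block's file
    stripped=$(mktemp) || return 1
    TEMP_CONFIG_FILE=$(mktemp) || { rm -f "$stripped"; return 1; }
    strip_config_file "$1" > "$stripped"
    while IFS= read -r line || [ -n "$line" ]
    do
        if [ -n "$line" ]
        then
            printf '%s\n' "$line" >> "$TEMP_CONFIG_FILE"
        elif [ -s "$TEMP_CONFIG_FILE" ]
        then
            "$2" "$TEMP_CONFIG_FILE"          # a blank line ends the block
            : > "$TEMP_CONFIG_FILE"
        fi
    done < "$stripped"
    # The last block may not be followed by a blank line.
    if [ -s "$TEMP_CONFIG_FILE" ]
    then
        "$2" "$TEMP_CONFIG_FILE"
    fi
    rm -f "$stripped" "$TEMP_CONFIG_FILE"
}
```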

The get_config function (located in send_globals) parses $TEMP_CONFIG_FILE, setting the keyword values required to process the alert.

The alert is eligible to be sent if the message_type keyword equals "alert" and both the should_include and should_send functions return true. The should_include function processes the priorities, exclude, and include keywords; the should_send function processes the hosts keyword and the date and time parameters.

If the alert passes the eligibility requirements, send_alerts builds the mailx subject line based on whether the alert is a page or a regular email. Finally, it executes the mailx command, which sends the alert.
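The page/email distinction might look like the sketch below. The subject formats (a short "host: subject" line for pagers, with the priority prefixed for regular email) and the rule that a page carries only that one line as its body are assumptions; only the use of mailx comes from the description above, and mailx -s is its standard subject option.

```shell
# Hypothetical sketch of the subject line and mailx call in send_alerts.
# Subject formats and the one-line page body are assumptions.
build_subject() {
    # $1: alert file, $2: page or email
    host=$(sed -n 's/^host=//p' "$1" | head -n 1)
    priority=$(sed -n 's/^priority=//p' "$1" | head -n 1)
    subject=$(sed -n 's/^subject: //p' "$1" | head -n 1)
    case $2 in
        page) printf '%s\n' "$host: $subject" ;;              # short for pagers
        *)    printf '%s\n' "[$priority] $host: $subject" ;;
    esac
}

send_alert() {
    # $1: alert file, $2: page or email, remaining arguments: addresses
    alert=$1 kind=$2
    shift 2
    if [ "$kind" = page ]
    then
        page_subject=$(build_subject "$alert" page)
        # Pagers get a one-line body repeating the subject.
        printf '%s\n' "$page_subject" | mailx -s "$page_subject" "$@"
    else
        mailx -s "$(build_subject "$alert" email)" "$@" < "$alert"
    fi
}
```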

Summarizing Sent Alerts with send_summary

Periodically, you'll want a summary of all the alerts generated on a server. The send_summary script (Listing 9), typically called from cron, summarizes recently sent alerts and sends an email message containing a date, an alerts header, a table of contents, and the details. Here is an example:

Mon Dec 15 14:20:59 PST 2003

Alerts on doomsday from the past day

1. doomsday.031215.121451.high.root_profile-eds.1521.031215.122001 (1)
2. doomsday.031215.102015.high.sh_history-25229.eds.031215.102805 (1)
3. doomsday.031215.101908.high.sh_history-25105.eds.031215.102004 (1)
4. doomsday.031215.101846.high.root_profile-eds.25264.031215.102003 (1)
5. doomsday.031215.101711.high.root_profile-eds.25140.031215.102002 (1)
6. doomsday.031215.083053.high.root_profile-jspurgeo.21145.031215.084002 (1)
7. doomsday.031215.000001.high.sh_history-1856.jspurgeo.031215.000501 (1)
8. doomsday.031214.214634.high.root_profile-jspurgeo.1891.031214.215001 (1)

Details:
 .
 .
 .
Although omitted here, the details section would contain the contents of all eight alerts listed. The number in parentheses is the number of alert files created on a given host with the same priority and class id.
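A table of contents like the one above can be produced from the sent directory with find, sort, and awk. This hypothetical summarize_alerts function collapses alerts that share a host, priority, and class id into one line showing the newest file and the count. It assumes file names of the form host.yymmdd.hhmmss.priority.class and treats "the past day" as find -mtime -1; the real send_summary (Listing 9) may select and name files differently (the sample names above carry an extra trailing timestamp, for instance).

```shell
# Hypothetical summarize_alerts sketch (not Listing 9). Assumes alert
# names of the form host.yymmdd.hhmmss.priority.class.
summarize_alerts() {
    # $1: sent alerts directory. Lists alerts modified in the past day,
    # one line per host/priority/class, newest file first, with a count.
    find "$1" -type f -mtime -1 | sed 's,.*/,,' | sort -r | awk -F. '
        {
            key = $1 "." $4                              # host.priority
            for (i = 5; i <= NF; i++) key = key "." $i   # plus class id
            if (!(key in count)) { n++; keys[n] = key; newest[key] = $0 }
            count[key]++
        }
        END {
            for (i = 1; i <= n; i++)
                printf "%d. %s (%d)\n", i, newest[keys[i]], count[keys[i]]
        }'
}
```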

Like send_alerts, send_summary is driven by a block in the configuration file. Refer to the sample configuration file (Listing 10) for a summary example.

What Is in the Tarball

In our environment, the alert utilities live under /opt; the file permissions are all read/execute, and the owner and group are both monitor. The tarball, available from the Sys Admin Web site, stores relative paths beginning with opt/, so you can extract the files wherever you like:

opt/monitor/bin/create_alert_file
opt/monitor/bin/enable_alerts
opt/monitor/bin/disable_alerts
opt/monitor/bin/send_alerts
opt/monitor/bin/send_globals
opt/monitor/bin/setenv_monitor
opt/monitor/bin/filter_alerts
opt/monitor/bin/send_summary
opt/logins/bin/privileged_login
opt/logins/bin/setenv_path
opt/monitor/etc/monitor.conf.example

Conclusion

In this article, we've presented our particular solution for handling alerts. Is it definitive? Certainly not. Some enhancements you might consider are:

1. Implement a method for mapping user names to email addresses. This means developing your own username_to_email_address and username_to_pager_address functions located in the send_globals file.

2. Implement a method for mapping sudoers aliases to user names. This means developing your own sudoers_alias_to_users and sudoers_alias_to_hosts functions also located in send_globals.

3. Consider using rsync, scp, ftp, or some other protocol to collect new and/or sent alert files on a centralized host.

4. Write a utility to view new and sent alerts.

5. Improve upon our rudimentary day and time parameters.
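For the first enhancement, one simple approach is a flat address book. In the sketch below, the ADDRESS_BOOK file, its default path, and its format (user name, email address, and an optional pager address on each line) are hypothetical, as is the lookup_address helper. A user with no entry falls back to the bare user name, which the local mail system can usually deliver.

```shell
# Hypothetical address book lookups for send_globals. Each ADDRESS_BOOK
# line holds: user-name email-address [pager-address]
ADDRESS_BOOK=${ADDRESS_BOOK:-/opt/monitor/etc/addresses}

lookup_address() {
    # $1: user name, $2: field number (2 = email, 3 = pager)
    addr=$(awk -v u="$1" -v f="$2" '$1 == u { print $f; exit }' "$ADDRESS_BOOK" 2>/dev/null)
    # Fall back to the bare user name for local delivery.
    printf '%s\n' "${addr:-$1}"
}

username_to_email_address() { lookup_address "$1" 2; }
username_to_pager_address() { lookup_address "$1" 3; }
```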

References

sudo homepage -- http://www.courtesan.com/sudo/

Powers, Shelley, Jerry Peek, et al. 2003. Unix Power Tools. Sebastopol, CA: O'Reilly & Associates, Inc.

John Spurgeon is a software developer and systems administrator for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Outside of work, he enjoys turfgrass management, triathlons, and spending time with his family.

Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Ed also edits the monthly Shell Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net.