Solaris™ BSM Auditing

Hal Pomeranz

When enabled, the Solaris Basic Security Module (BSM) can create an extremely detailed audit trail for all processes on the system. The audit trail is detailed enough to satisfy the requirements for systems seeking DoD "C2" certification. The simplest description of BSM auditing that I've been able to come up with is to imagine running truss -- the Solaris system call tracing tool -- on every single process on the system and saving the resulting output to a file. BSM actually provides even more detailed information than that.

While this information can be incredibly useful for both systems administration and forensic purposes, we're obviously talking about enormous amounts of data here. As a lower bound, consider that the small workgroup server that runs my two-person consulting business generates about 120 MB of BSM logs per month using the configuration discussed later in this article. A machine that actually does something, like a compute server or end-user workstation, will typically generate perhaps an order of magnitude more information than my relatively meager audit trail. It's a good thing disk space is cheap these days.

While Sun's BSM reference [5] is quite complete, the initial stumbling block for BSM has always been a lack of clearly defined standards or recommendations on which events are necessary or useful to audit and which just generate "noise" in the audit logs. Both DISA [1] and Sun [2] have made their recommendations, and recently part of my work with the Center for Internet Security has been to rationalize these recommendations along with other input to produce some consensus standards [3]. This article is both an introduction to BSM auditing on Solaris and a discussion of the configuration recommendations we've developed via the CIS consensus project.

BSM Basics

BSM is not enabled by default under Solaris. The administrator is required to run the "bsmconv" script to set up the initial auditing environment for the system. Sun strongly recommends running the bsmconv script only in single-user mode, but honestly I always do my BSM configuration in multi-user mode and have never encountered any problems. Your mileage, as always, may vary.

bsmconv creates a number of files in the /etc/security directory. The audit_startup script is invoked at boot time and sets a number of different audit policies for the system. The audit_control file is the primary configuration file for BSM. The audit_class and audit_event files can be used when more fine-grained control of the audit configuration is required. There are a number of other BSM-related files in /etc/security, but the four files mentioned here are the most critical for this article.

The audit_startup script is simply a series of auditconfig commands to initialize the system auditing policy:

#!/bin/sh
/usr/sbin/auditconfig -conf
/usr/sbin/auditconfig -aconf
/usr/sbin/auditconfig -setpolicy none
/usr/sbin/auditconfig -setpolicy +cnt
/usr/sbin/auditconfig -setpolicy +argv,arge

The first two lines pull configuration information out of the audit_control file and set up the basic events that the system will audit. The remaining lines set other special auditing policy options.

"-setpolicy none" first blanks the audit policy for the system so we start with a clean slate. "-setpolicy +cnt" then tells the system to continue running even if the auditing partition on the machine fills up (high-security sites are required to have the machine shut down if auditing becomes impossible, which is "-setpolicy -cnt"). "-setpolicy +argv,arge" means to track the full command line and all environment settings for any command executed on the system. Note that the "-setpolicy +argv,arge" line is not part of the default BSM configuration set up by the bsmconv script, but it is part of the consensus recommendations from CIS [3].

The audit_control file appears deceptively simple:

dir:/var/audit
minfree:20
flags:lo,ad,pc,fm,fw,-fc,-fd,-fr
naflags:lo,ad,ex

"dir" is the directory where audit logs will be written on the system -- it's a good idea to make sure this directory is only accessible by the superuser. There is no built-in facility for writing audit logs to some other system, although some sites have tried writing to an NFS-mounted directory from some central file server (note that this configuration requires the client system to have root write privileges into the NFS volume, which has some significant security implications). "minfree" specifies the amount of free space -- as a percentage -- that must exist in the auditing partition or else the system starts complaining. So in our case, once our audit partition goes above 80% full, the auditing subsystem starts sending the administrator annoying warning messages (via the /etc/security/audit_warn script).

The "flags" and "naflags" lines are the really interesting part of the file. These lines define which audit events the system is actually going to pay attention to (these are the lines that the "auditconfig -conf" and "auditconfig -aconf" commands in audit_startup are looking at). The two-letter codes are groups ("audit classes") of related events (system calls) defined via the audit_class and audit_event files. For example, the "fr" class deals with "file read" type events, which mostly consist of various permutations of the open() system call when a file is opened for reading, but also includes events like readlink() for dealing with symbolic link files. For any given class, you have the option of monitoring only system call failures ("-fr"), only successful calls ("+fr"), or both ("fr").
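
The prefix rules are easy to see in a few lines of shell. This is only a sketch: describe_flag is a made-up helper name, not a BSM command, and the flag list is copied from the sample audit_control file shown earlier.

```shell
#!/bin/sh
# Illustration only: describe_flag is a hypothetical helper, not a BSM tool.
# Prints which outcomes a single audit_control flag selects.
describe_flag() {
    case $1 in
        -*) echo "${1#-}: failures only" ;;
        +*) echo "${1#+}: successes only" ;;
        *)  echo "$1: successes and failures" ;;
    esac
}

# Walk the comma-separated list from the sample "flags" line.
flags="lo,ad,pc,fm,fw,-fc,-fd,-fr"
old_ifs=$IFS
IFS=,
for flag in $flags; do
    describe_flag "$flag"
done
IFS=$old_ifs
```

For the sample line, the first five classes are reported as covering both outcomes and the last three as failures only.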

The "flags" line defines the "audit vector" for normal user sessions on the machine. The "naflags" line catches all "non-attributable" events on the system -- events that are not associated with a particular user's session. Usually, non-attributable events are the result of system processes and do not occur that frequently. Most of the interesting "tuning" for BSM happens on the "flags" line, and I'll cover these settings in a lot more detail in the latter part of this article.

Audit logs are written to binary files in your audit directory. The file naming convention used is "<start>.<end>.<hostname>", where <start> and <end> are time/date stamps in the format "YYYYMMDDhhmmss" and <hostname> is the fully qualified hostname of the local machine. Actually, the current audit log that's actively being written is named "<start>.not_terminated.<hostname>" to distinguish it from the other audit logs in the directory.
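
If you script against these file names, remember that the hostname itself contains dots. Here is a minimal sketch of splitting a name into its parts; parse_audit_name is a hypothetical helper, and server.example.com is a placeholder hostname.

```shell
#!/bin/sh
# Illustration only: parse_audit_name is a hypothetical helper.
# Splits "<start>.<end>.<hostname>" into its parts. The hostname is fully
# qualified, so only the first two dots are treated as separators.
parse_audit_name() {
    name=${1##*/}            # drop any leading directory
    start=${name%%.*}        # text before the first dot
    rest=${name#*.}
    end=${rest%%.*}          # text between the first and second dots
    host=${rest#*.}          # everything after the second dot
    echo "start=$start end=$end host=$host"
}

parse_audit_name /var/audit/20041201000000.20041201010000.server.example.com
parse_audit_name 20041201010000.not_terminated.server.example.com
```

For the active log, the "end" field simply comes back as not_terminated, which makes it easy to skip in later processing.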

The command "audit -n" signals the system audit daemon to close its current audit log file and start a new one. Unless told otherwise, the audit daemon will simply continue writing to the current audit log, and it will grow without bound until it reaches the file size limit for the machine or fills the partition. The CIS recommendation [3] is to put the following line in root's crontab to force audit logs to be restarted at the top of every hour:

0 * * * * /usr/sbin/audit -n

Once the new audit log has been started, the old log can be compressed and/or moved off of the local system for archival purposes. Some sites that are concerned about attackers sabotaging or removing the audit logs actually rotate their audit logs much more frequently (some every 5 minutes) so that they can move the audit logs to some secure repository elsewhere on the network.
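
One way to handle the "move the old logs" step is sketched below. The function name and the archive directory are assumptions for illustration; a real site would more likely compress the files or copy them to another host. The only BSM-specific detail is that the active log, the one with not_terminated in its name, is left alone.

```shell
#!/bin/sh
# Illustration only: archive_closed_logs and the archive directory are
# assumptions for this sketch, not part of BSM.
# usage: archive_closed_logs AUDIT_DIR ARCHIVE_DIR
# Moves every finished log out of AUDIT_DIR, leaving the active
# "not_terminated" log for the audit daemon.
archive_closed_logs() {
    audit_dir=$1
    archive_dir=$2
    mkdir -p "$archive_dir" || return 1
    for log in "$audit_dir"/*; do
        [ -f "$log" ] || continue
        case $log in
            *.not_terminated.*) continue ;;   # still being written
        esac
        mv "$log" "$archive_dir"/ || return 1
    done
}

# Example, run after "audit -n" has closed the current log:
# archive_closed_logs /var/audit /var/audit-archive
```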

Since the audit log files are binary data, you need a special tool to read them. The praudit command will dump an audit trail in a variety of different formats. The auditreduce program can be used to select particular audit events based on different criteria such as username, time of day, etc. However, the output of auditreduce is still in the internal binary format used in the audit log files themselves, so you must pipe the output of auditreduce into praudit to achieve any intelligible result.
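
A typical pipeline might look like the sketch below. The wrapper name login_events is hypothetical. The options are the ones I believe the man pages document (auditreduce -u for the audit user and -c for audit classes, praudit -l for one line per record), so check auditreduce(1M) and praudit(1M) on your release before relying on them.

```shell
#!/bin/sh
# Sketch only: login_events is a hypothetical wrapper name.
# usage: login_events USER AUDIT_FILE...
# Selects "lo" class records for one user, then converts them to text.
login_events() {
    user=$1
    shift
    auditreduce -u "$user" -c lo "$@" | praudit -l
}

# Example (the file name is a placeholder):
# login_events hal /var/audit/20041201000000.20041201010000.server.example.com
```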

Beyond that, you're on your own when it comes to interpreting the audit output. Most sites end up "rolling their own" Perl scripts to parse and report on the auditing data. Some of the commercial IDS vendors have been working on incorporating the BSM audit trail into their host-based IDS products as a near real-time system auditing mechanism, but this is still far more art than science.

Tuning Auditing

As I mentioned earlier, the real difficulty with BSM is tuning the level of auditing on the system. The trick is to strike a balance between capturing the events you need to reconstruct what's been happening on the system and filtering out uninteresting events that add "noise" to the audit trail and consume huge amounts of disk space. One size does not fit all here. For example, the DoD requirements specified in the DISA STIG [1] reflect the very high levels of auditing required for secure military sites. Most organizations don't require this level of auditing and can save enormous amounts of disk space by eliminating some of these audit events from their default audit vectors in the audit_control file.

I'll first explain which events are covered by some of the more useful audit classes, and then I'll make some recommendations for preserving your sanity and your disk space.

General Events -- "lo" (login) and "ad" (administrative)

The "lo" (login) class covers all forms of system logins as well as use of the su command. The "ad" (administrative) class covers a wide variety of administrative actions including rebooting the system, adding and deleting users, changing auditing and logging parameters, mounting and unmounting both local and remote file systems, changing quotas, loading kernel modules, and even setting the system clock. Every configuration recommendation I've ever seen recommends tracking both successes and failures for events in these classes. The Sun recommendations [2] include some hints for customizing the "ad" class to reduce some of the noise from uninteresting events, but this is not a huge optimization.

Process Events -- "ex" (execution) and "pc" (process control)

The "ex" and "pc" classes deal with process execution on the system. Actually, the two events in the "ex" class -- exec() and execve() system calls -- are also contained in the "pc" class, so if your audit vector includes "pc" then you don't need to worry about "ex".

Aside from the exec() and execve() system calls that are actually used to execute programs on the system, the "pc" class also tracks everything that the process might do during its lifetime -- changing directories, calling setuid() and setgid() to change its privilege level, making chroot() calls, creating child processes with fork() and vfork(), etc. The "pc" class also tracks administrative interaction with processes on the system, like kill and nice.

The problem here is that "pc" tracks various system calls that aren't usually interesting. For example, you probably don't care to know every time your mail server fork()s a new child process to deal with an incoming connection. What you really care about is when new processes get started on the system -- typically with a fork() followed by an exec(). So we really just want to track the exec()s. Similarly, keeping track of every single chdir() call by a process is going to drive you nuts.

So, we need a way to track the important events from the "pc" class but ignore the uninteresting ones. This requires actually creating a new custom class that includes just the events that we want. More on this later.

File Attribute Modification -- The "fm" Class

The "fm" class tracks changes to file attributes like ownership (chown) and permissions (chmod) and even extended file ACL settings. However, "fm" also tracks file locking and updating timestamps on files. These latter events are way too frequent on normal Unix systems to be anything other than "noise" in your logs. Again, we'll need some way to customize the "fm" class so that we see only the interesting stuff in our audit trail.

Other File Actions -- "fc" (create), "fr" (read), "fw" (write), "fd" (delete)

The decision about whether to include the other "file" event classes to track creating, deleting, reading, and modifying files was probably the most contentious aspect of our auditing discussions within CIS. Sun [2] recommends avoiding these audit classes to reduce the size of the audit trail. On the other hand, the DoD guidelines [1] require tracking at least failures for these classes (actually the specific recommendation is "fw,-fc,-fd,-fr").

These classes really can generate an enormous number of audit events and consume huge amounts of disk space. Think about compiling a huge software package from source code -- not only are you going to generate all of those new *.o files, but the compiler will probably be creating and deleting intermediate result files in some system temp directory.

The "fr" class can really kill you during process execution because each process execution involves searching your LD_LIBRARY_PATH for shared library files. You have to walk through every directory in LD_LIBRARY_PATH for each *.so, and every time you "miss" and don't find the shared object in the early directories in your search path, you generate a read "failure". I've often wished that the auditing system could distinguish between "file not found" (ENOENT) and "permission denied" (EACCES) so that I could audit one and not the other. Unfortunately, this is not possible at this time.

Ultimately, we decided to err on the side of caution in the CIS recommendations [3]. The default recommendation is not to turn on any auditing of these classes, although we document the DISA recommendations in the notes for the BSM item. This is not to say that these classes do not cover important events on the system (particularly from a forensic perspective); we just didn't want people who followed our recommendations to suddenly start running out of disk space due to their audit logging. If you have the disk space to burn, you might consider auditing "fc,fd,fw". The "fr" class just adds too much noise for my taste, but again your mileage may vary.

Custom Audit Classes

Audit class names are defined in the audit_class file. Here are the audit class entries for the classes I've talked about so far:

0x00000001:fr:file read
0x00000002:fw:file write
0x00000008:fm:file attribute modify
0x00000010:fc:file create
0x00000020:fd:file delete
0x00000080:pc:process
0x00000800:ad:administrative
0x00001000:lo:login or logout
0x40000000:ex:exec
0xffffffff:all:all classes

The first field of each line is a unique bit mask that's used to represent the audit class in the internals of the auditing subsystem (there are "gaps" in the numbering between the lines above because there are some other audit classes in the default audit_class file that I'm not showing you). The second field is the class code used in the "flags" and "naflags" lines in audit_control, and the third field is just a brief descriptive name that's for the use of the systems administrator.
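
To see why each class gets its own bit, here is a small sketch that combines three of the masks listed above. It assumes the masks are combined with a bitwise OR, as bit masks usually are; that is an illustration of the idea, not a description of the kernel's actual code.

```shell
#!/bin/sh
# Illustration only: combines class masks from the audit_class excerpt
# with bitwise OR, the usual convention for bit masks (an assumption here).
lo=0x00001000
ad=0x00000800
ex=0x40000000

# The sample "naflags:lo,ad,ex" line selects these three classes.
naflags_mask=$(($lo | $ad | $ex))
printf '0x%08x\n' "$naflags_mask"
```

Because no two classes share a bit, any combination of classes maps to a distinct value, which is also why a custom class needs a mask nobody else is using.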

When creating a custom class, you need to pick a bit mask and a two-letter class code that are not currently in use by any other class. In the default audit_class file installed by bsmconv, bit masks from 0x00010000 through 0x08000000 are not used. Our CIS recommendations [3] create a custom class called "cc" with a bit mask of 0x08000000:

0x08000000:cc:CIS custom class
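
Before adding a line like this, it's worth confirming that neither the mask nor the code is already taken on your system. A minimal sketch follows; class_is_free is a hypothetical helper, and it assumes a POSIX awk (on Solaris that means nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk).

```shell
#!/bin/sh
# Illustration only: class_is_free is a hypothetical helper.
# usage: class_is_free AUDIT_CLASS_FILE MASK CODE
# Succeeds when neither the mask nor the class code appears in the file.
class_is_free() {
    awk -F: -v mask="$2" -v code="$3" '
        /^#/ { next }                              # skip comment lines
        # Appending "" forces a plain string comparison of the masks.
        ($1 "" == mask "") || ($2 == code) { used = 1 }
        END { exit used + 0 }
    ' "$1"
}

# Example:
# class_is_free /etc/security/audit_class 0x08000000 cc && echo "cc is available"
```

The mask comparison is textual, so write the mask with the same digits and case the file uses.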

Once you've defined your new class in the audit_class file, you associate the two-letter class code with specific events via the audit_event file. Here's a sample line from that file:

7:AUE_EXEC:exec(2):pc,ex,cc

The first two fields are a unique code number and code name to identify the event. The third field is purely descriptive. The last field describes which audit classes this event is associated with. As noted earlier, exec() calls are monitored by both the "pc" and "ex" classes. We've added our custom "cc" class to the end of the line so it picks up the exec() events, too.
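
Since the class list is just a comma-separated fourth field, it's easy to ask which events belong to a given class. The sketch below uses a hypothetical helper name, events_in_class, and again assumes a POSIX awk (nawk on Solaris). It compares whole class codes rather than substrings.

```shell
#!/bin/sh
# Illustration only: events_in_class is a hypothetical helper.
# usage: events_in_class AUDIT_EVENT_FILE CLASS
# Prints the name (field 2) of every event whose class list (field 4)
# contains CLASS as a whole comma-separated entry.
events_in_class() {
    awk -F: -v want="$2" '
        /^#/ { next }                              # skip comment lines
        {
            n = split($4, classes, ",")
            for (i = 1; i <= n; i++)
                if (classes[i] == want) { print $2; break }
        }
    ' "$1"
}

# Example:
# events_in_class /etc/security/audit_event cc
```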

So, what events do we actually want to monitor with our custom class? Here's the awk code we use to modify the audit_event file in the CIS document [3]:

awk 'BEGIN { FS = ":"; OFS = ":" }
($4 ~ /fm/) && ! ($2 ~ /MCTL|FCNTL|FLOCK|UTIME/) \
    { $4 = $4 ",cc" }
($4 ~ /pc/) && \
! ($2 ~ /FORK|CHDIR|KILL|VTRACE|SETGROUPS|SETPGRP/) \
    { $4 = $4 ",cc" }
{ print }' audit_event >audit_event.new

Essentially, this code is saying that we want all "fm" events except for file locking (mctl()/fcntl()/flock() handle file locking plus some other stuff) and timestamp updates with utime(), plus all "pc" events except fork()/vfork(), chdir(), kill(), vtrace(), setgroups(), and setpgrp(). Check the manual pages if you have any questions about what this last set of system calls does. Once you've verified that the audit_event.new file looks the way you want it, make a backup copy of the original audit_event file and replace it with your new version.
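
The backup-and-replace step can be sketched as follows. The function name and the ".orig" backup suffix are assumptions for illustration, not something the CIS document prescribes.

```shell
#!/bin/sh
# Illustration only: install_new_events and the ".orig" suffix are
# assumptions for this sketch.
# usage: install_new_events DIRECTORY
# Backs up DIRECTORY/audit_event, then puts audit_event.new in its place.
install_new_events() {
    dir=$1
    [ -s "$dir/audit_event.new" ] || return 1     # refuse a missing or empty file
    cp -p "$dir/audit_event" "$dir/audit_event.orig" || return 1
    mv "$dir/audit_event.new" "$dir/audit_event"
}

# Review the changes first, then install:
# diff /etc/security/audit_event /etc/security/audit_event.new
# install_new_events /etc/security
```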

Now that we've fully defined our custom class, we actually have to use it in the audit vector in the audit_control file. Here's the audit_control file from our CIS recommendations [3]:

dir:/var/audit
minfree:20
flags:lo,ad,cc
naflags:lo,ad,ex

After making all of these changes, you must reboot the system for them to take effect.
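
A quick sanity check before rebooting is to confirm that every class code named in audit_control actually has an entry in audit_class. This is only a sketch: undefined_classes is a hypothetical helper, it strips just the "-" and "+" prefixes described earlier, and it assumes a POSIX awk (nawk on Solaris).

```shell
#!/bin/sh
# Illustration only: undefined_classes is a hypothetical helper.
# usage: undefined_classes AUDIT_CONTROL_FILE AUDIT_CLASS_FILE
# Prints each class code used on a flags/naflags line that has no entry
# in the audit_class file.
undefined_classes() {
    awk -F: '
        FNR == NR {                           # first file read: audit_class
            if ($0 !~ /^#/) defined[$2] = 1
            next
        }
        $1 == "flags" || $1 == "naflags" {    # second file: audit_control
            n = split($2, codes, ",")
            for (i = 1; i <= n; i++) {
                code = codes[i]
                sub(/^[-+]/, "", code)        # strip failure/success prefix
                if (code != "" && !(code in defined)) print code
            }
        }
    ' "$2" "$1"
}

# Example:
# undefined_classes /etc/security/audit_control /etc/security/audit_class
```

No output means every class on the "flags" and "naflags" lines is defined.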

BSM Caveats

It's important to mention a few additional items before closing this article:

Enabling BSM automatically disables the <Stop>-A keyboard sequence on the machine. After all, you want to be able to monitor shutdown and reboot events and associate them with a particular user. Disabling <Stop>-A means somebody has to log in, become root, and halt the machine. All of these are auditable events.
Enabling BSM disables "auto-mounting" of CD-ROMs and floppies via vold. Again, there's an audit trail issue if a system process spontaneously mounts and unmounts file systems.
There are known interoperability problems between OpenSSH (particularly with PrivSep enabled) and BSM. The most noticeable issue is that OpenSSH sessions will not appear in your audit logs at all. A patch [4] is available to fix this and some other issues.

Conclusion

The first step toward BSM deployment seems to have been achieved, namely that we now have some reasonable configuration standards that a large number of people have agreed to. The next step is developing some good tools for reporting on events in the audit trail. The rumor is that there is already an effort in progress within Sun Microsystems to do just that. Keep your fingers crossed.

References

1. DISA Unix STIG -- http://csrc.nist.gov/pcig/STIGs/unix-stig-v4r4-091503.zip

2. "Auditing in the Solaris 8 Operating Environment", William Osser and Alex Noordergraaf -- http://www.sun.com/blueprints/browsesubject.html#security

3. Center for Internet Security "Solaris Benchmark" document -- http://www.cisecurity.com/bench_solaris.html

4. OpenSSH patch to help with BSM auditing -- http://bugzilla.mindrot.org/show_bug.cgi?id=125

5. "SunSHIELD Basic Security Module Guide" -- http://docs.sun.com/db/doc/806-1789

Hal Pomeranz (hal@deer-run.com) spent so much time haggling about audit flags during the CIS consensus process that he actually started having dreams about them. As scary as this may sound, this was actually an improvement over his previous subconscious activity.
