Solaris™ BSM Auditing
Hal Pomeranz
When enabled, the Solaris Basic Security Module (BSM) can create
an extremely detailed audit trail for all processes on the system.
The resulting audit trail is detailed enough to meet the requirements
for systems seeking DoD "C2" certification. The simplest
description of BSM auditing that I've been able to come up with
is to imagine running truss -- the Solaris system call tracing
tool -- on every single process on the system and saving the resulting
output to a file. BSM actually provides even more detailed information
than that.
While this information can be incredibly useful for both systems
administration and forensic purposes, we're obviously talking about
enormous amounts of data here. As a lower bound, consider that the
small workgroup server that runs my two-person consulting business
generates about 120 MB of BSM logs per month using the configuration
discussed later in this article. A machine that actually does
something, like a compute server or end-user workstation, will
typically generate perhaps an order of magnitude more information
than my relatively meager audit trail. It's a good thing disk space
is cheap these days.
While Sun's BSM reference [5] is quite complete, the initial stumbling
block for BSM has always been a lack of clearly defined standards
or recommendations on which events are necessary or useful to audit
and which just generate "noise" in the audit logs. Both DISA [1]
and Sun [2] have made their recommendations, and recently part of
my work with the Center for Internet Security has been to rationalize
these recommendations along with other input to produce some consensus
standards [3]. This article is both an introduction to BSM auditing
on Solaris and a discussion of the configuration recommendations
we've developed via the CIS consensus project.
BSM Basics
BSM is not enabled by default under Solaris. The administrator
is required to run the "bsmconv" script to set up the initial
auditing environment for the system. Sun strongly recommends running
the bsmconv script only in single-user mode, but honestly
I always do my BSM configuration in multi-user mode and have never
encountered any problems. Your mileage, as always, may vary.
bsmconv creates a number of files in the /etc/security
directory. The audit_startup script is invoked at boot time
and sets a number of different audit policies for the system. The
audit_control file is the primary configuration file for
BSM. The audit_class and audit_event files can be
used when more fine-grained control of the audit configuration is
required. There are a number of other BSM-related files in /etc/security,
but the four files mentioned here are the most critical for this
article.
The audit_startup script is simply a series of auditconfig
commands to initialize the system auditing policy:
#!/bin/sh
/usr/sbin/auditconfig -conf
/usr/sbin/auditconfig -aconf
/usr/sbin/auditconfig -setpolicy none
/usr/sbin/auditconfig -setpolicy +cnt
/usr/sbin/auditconfig -setpolicy +argv,arge
The first two lines load the basic audit configuration: "-conf" sets
the kernel's event-to-class mappings from the audit_event file, and
"-aconf" sets the non-attributable audit mask from the audit_control
file. The remaining lines set other special auditing policy options.
"-setpolicy none" first blanks the audit policy for the
system so we start with a clean slate. "-setpolicy +cnt" then
tells the system to continue running even if the auditing partition
on the machine fills up (high-security sites are required to have
the machine shut down if auditing becomes impossible, "-setpolicy
-cnt"). "-setpolicy +argv,arge" means to track the full
command line and all environment settings for any command executed
on the system. Note that the "-setpolicy +argv,arge" line
is not part of the default BSM configuration set up by the bsmconv
script, but it is part of the consensus recommendations from CIS[3].
The audit_control file appears deceptively simple:
dir:/var/audit
minfree:20
flags:lo,ad,pc,fm,fw,-fc,-fd,-fr
naflags:lo,ad,ex
"dir" is the directory where audit logs will be
written on the system -- it's a good idea to make sure this directory
is only accessible by the superuser. There is no built-in facility
for writing audit logs to some other system, although some sites have
tried writing to an NFS-mounted directory from some central file server
(note that this configuration requires the client system to have root
write privileges into the NFS volume, which has some significant security
implications). "minfree" specifies the amount of free space
-- as a percentage -- that must exist in the auditing partition or
else the system starts complaining. So in our case, once our audit
partition goes above 80% full, the auditing subsystem starts sending
the administrator annoying warning messages (via the /etc/security/audit_warn
script).
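To make the minfree arithmetic concrete, here is a minimal sketch of the comparison involved. The function name below_minfree and the sample numbers are my own inventions for illustration; the real check is made by the audit daemon, which then calls audit_warn.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the free space percentage
# has dropped below the minfree threshold from audit_control.
# $1 = percent of the partition in use, $2 = minfree percentage
below_minfree() {
    used_pct=$1
    minfree=$2
    free_pct=$((100 - used_pct))
    [ "$free_pct" -lt "$minfree" ]
}

# With minfree:20, a partition that is 85% full has only 15% free.
if below_minfree 85 20; then
    echo "audit partition below minfree threshold"
fi
```

On a live system you would feed this the "capacity" figure reported by df for the audit partition.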
The "flags" and "naflags" lines are the really interesting
part of the file. These lines define which audit events the system
is actually going to pay attention to (the "naflags" line, for
example, is what the "auditconfig -aconf" command in
audit_startup loads into the kernel). The two-letter codes are
groups ("audit classes") of related events (system calls) defined
via the audit_class and audit_event files. For example,
the "fr" class deals with "file read" type events, which
mostly consist of various permutations of the open() system
call when a file is opened for reading, but also include events
like readlink() for dealing with symbolic link files. For
any given class, you have the option of monitoring only system call
failures ("-fr"), only successful calls ("+fr"), or
both ("fr").
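The prefix rules are easy to get backwards, so here is a small sketch that spells out what each entry in an audit vector means. The describe_flag function is hypothetical and covers only the three forms described above.

```shell
#!/bin/sh
# Hypothetical helper: explain one entry from a flags/naflags line.
describe_flag() {
    case $1 in
        -*) echo "${1#-}: failures only" ;;
        +*) echo "${1#+}: successes only" ;;
        *)  echo "$1: successes and failures" ;;
    esac
}

# Walk the sample "flags" vector shown earlier, one class at a time.
flags="lo,ad,pc,fm,fw,-fc,-fd,-fr"
old_ifs=$IFS
IFS=,
for f in $flags; do
    describe_flag "$f"
done
IFS=$old_ifs
```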
The "flags" line defines the "audit vector" for normal
user sessions on the machine. The "naflags" line catches
all "non-attributable" events on the system -- events that are not
associated with a particular user's session. Usually, non-attributable
events are the result of system processes and do not occur that
frequently. Most of the interesting "tuning" for BSM happens on
the "flags" line, and I'll cover these settings in a lot more detail
in the latter part of this article.
Audit logs are written to binary files in your audit directory.
The file naming convention used is "<start>.<end>.<hostname>",
where <start> and <end> are time/date
stamps in the format "YYYYMMDDhhmmss" and <hostname>
is the fully qualified hostname of the local machine. Actually,
the current audit log that's actively being written is named "<start>.not_terminated.<hostname>"
to distinguish it from the other audit logs in the directory.
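Since the hostname part can itself contain dots, a script that picks these names apart has to split on the first two dots only. A minimal sketch follows; the function name and the sample file names are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: split an audit log name of the form
# <start>.<end>.<hostname> into its three parts.
parse_audit_name() {
    name=$1
    start=${name%%.*}          # text before the first dot
    rest=${name#*.}            # everything after the first dot
    end=${rest%%.*}            # text before the next dot
    host=${rest#*.}            # remainder, dots included (FQDN)
    if [ "$end" = "not_terminated" ]; then
        echo "start=$start end=(still open) host=$host"
    else
        echo "start=$start end=$end host=$host"
    fi
}

# Made-up file names for illustration
parse_audit_name 20040301120000.20040301130000.server.example.com
parse_audit_name 20040301130000.not_terminated.server.example.com
```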
The command "audit -n" signals the system audit daemon
to close its current audit log file and start a new one. Unless
told otherwise, the audit daemon will simply continue writing to
the current audit log, and it will grow without bound until it reaches
the file size limit for the machine or fills the partition. The
CIS recommendation [3] is to put the following line in root's crontab
to force audit logs to be restarted at the top of every hour:
0 * * * * /usr/sbin/audit -n
Once the new audit log has been started, the old log can be compressed
and/or moved off of the local system for archival purposes. Some sites
that are concerned about attackers sabotaging or removing the audit
logs actually rotate their audit logs much more frequently (some every
5 minutes) so that they can move the audit logs to some secure repository
elsewhere on the network.
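An archiving job has to leave the active log alone and touch only the files that have already been closed. Here is one way that selection might be sketched; the function name is hypothetical, and the choice of gzip in the usage comment is just an assumption about what your site uses.

```shell
#!/bin/sh
# Hypothetical helper: print the closed audit logs in a directory,
# skipping the active *.not_terminated.* file.
closed_audit_logs() {
    for f in "$1"/*; do
        [ -f "$f" ] || continue
        case $f in
            *.not_terminated.*) ;;      # still being written
            *.gz) ;;                    # already compressed
            *) echo "$f" ;;
        esac
    done
}

# Usage after "audit -n" (not run here; the directory should match
# the "dir" line in audit_control):
#   closed_audit_logs /var/audit | while read -r log; do
#       gzip "$log"
#   done
```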
Since the audit log files are binary data, you need a special
tool to read them. The praudit command will dump an audit
trail in a variety of different formats. The auditreduce
program can be used to select particular audit events based on different
criteria such as username, time of day, etc. However, the output
of auditreduce is still in the internal binary format used
in the audit log files themselves, so you must pipe the output of
auditreduce into praudit to achieve any intelligible
result.
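As a rough illustration of that pipeline, the wrapper below selects login records for one user on one day and prints them as text. The function name, user name, and file name are all made up, and you should check the auditreduce and praudit manual pages on your release before relying on the options shown.

```shell
#!/bin/sh
# Hypothetical wrapper: show login/logout records for one user on one
# day. $1 = user name, $2 = day as YYYYMMDD, remaining args = audit
# log files to search. Only defined here, since it needs a Solaris
# host with BSM enabled to actually run.
show_user_logins() {
    user=$1
    day=$2
    shift 2
    # auditreduce selects matching records (still binary);
    # praudit turns them into readable text.
    auditreduce -u "$user" -d "$day" -c lo "$@" | praudit
}

# Example call on a BSM host (file name is made up):
# show_user_logins hal 20040301 /var/audit/20040301*
```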
Beyond that, you're on your own when it comes to interpreting
the audit output. Most sites end up "rolling their own" Perl scripts
to parse and report on the auditing data. Some of the commercial
IDS vendors have been working on incorporating the BSM audit trail
into their host-based IDS products as a near real-time system auditing
mechanism, but this is still far more art than science.
Tuning Auditing
As I mentioned earlier, the real difficulty with BSM is tuning
the level of auditing on the system. The trick is to strike a balance
between getting the events you need to reconstruct what's been happening
on the system, while filtering out uninteresting events that add
"noise" to the audit trail and consume huge amounts of disk space.
One size does not fit all here. For example, the DoD requirements
specified in the DISA STIG [1] reflect very high levels of auditing
required for secure military sites. Most organizations don't require
this level of auditing and can save enormous amounts of disk space
by eliminating some of these audit events from their default audit
vectors in the audit_control file.
I'll first explain which events are covered by some of the more
useful audit classes, and then I'll make some recommendations for
preserving your sanity and your disk space.
General Events -- "lo" (login) and "ad" (administrative)
The "lo" (login) class covers all forms of system logins
as well as use of the su command. The "ad" (administrative)
class covers a wide variety of administrative actions including
rebooting the system, adding and deleting users, changing auditing
and logging parameters, mounting and unmounting both local
and remote file systems, changing quotas, loading kernel modules,
and even setting the system clock. Every configuration guide
I've ever seen recommends tracking both successes and failures for
events in these classes. The Sun recommendations [2] include some
hints for customizing the "ad" class to reduce some of the
noise from uninteresting events, but this is not a huge optimization.
Process Events -- "ex" (execution) and "pc" (process control)
The "ex" and "pc" classes deal with process execution
on the system. Actually, the two events in the "ex" class
-- exec() and execve() system calls -- are also contained
in the "pc" class, so if your audit vector includes "pc"
then you don't need to worry about "ex".
Aside from the exec() and execve() system calls
that are actually used to execute programs on the system, the "pc"
class also tracks everything that the process might do during its
lifetime -- changing directories, calling setuid() and setgid()
to change its privilege level, making chroot() calls, creating
child processes with fork() and vfork(), etc. The
"pc" class also tracks administrative interaction with processes
on the system, like kill and nice.
The problem here is that "pc" tracks various system calls
that aren't usually interesting. For example, you probably don't
care to know every time your mail server fork()s a new child
process to deal with an incoming connection. What you really care
about is when new processes get started on the system -- typically
with a fork() followed by an exec(). So we really
just want to track the exec()s. Similarly, keeping track of
every single chdir() call by a process is going to drive
you nuts.
So, we need a way to track the important events from the "pc"
class but ignore the uninteresting ones. This requires actually
creating a new custom class that includes just the events that we
want. More on this later.
File Attribute Modification -- The "fm" Class
The "fm" class tracks changes to file attributes like ownership
(chown) and permissions (chmod) and even extended
file ACL settings. However, "fm" also tracks file locking
and updating timestamps on files. These latter events are way too
frequent on normal Unix systems to be anything other than "noise"
in your logs. Again, we'll need some way to customize the "fm"
class so that we see only the interesting stuff in our audit trail.
Other File Actions -- "fc" (create), "fr" (read), "fw" (write), "fd" (delete)
The decision about whether to include the other "file" event classes
to track creating, deleting, reading, and modifying files was probably
the most contentious aspect of our auditing discussions within CIS.
Sun [2] recommends avoiding these audit classes to reduce the size
of the audit trail. On the other hand, the DoD guidelines [1] require
tracking at least failures for these classes (actually the specific
recommendation is "fw,-fc,-fd,-fr").
These classes really can generate an enormous number of audit
events and consume huge amounts of disk space. Think about compiling
a huge software package from source code -- not only are you going
to generate all of those new *.o files, but the compiler
will probably be creating and deleting intermediate result files
in some system temp directory.
The "fr" class can really kill you during process execution
because each process execution involves searching your LD_LIBRARY_PATH
for shared library files. You have to walk through every directory
in LD_LIBRARY_PATH for each *.so, and every time you
"miss" and don't find the shared object in the early directories
in your search path, you generate a read "failure". I've often wished
that the auditing system could distinguish between "file not found"
(ENOENT) and "permission denied" (EACCES) so that
I could audit one and not the other. Unfortunately, this is not
possible at this time.
Ultimately, we decided to err on the side of caution in the CIS
recommendations [3]. The default recommendation is not to turn on
any auditing of these classes, although we document the DISA recommendations
in the notes for the BSM item. This is not to say that these classes
do not cover important events on the system (particularly from a
forensic perspective); we just didn't want people who followed our
recommendations to suddenly start running out of disk space due
to their audit logging. If you have the disk space to burn, you
might consider auditing "fc,fd,fw". The "fr" class
just adds too much noise for my taste, but again your mileage may
vary.
Custom Audit Classes
Audit class names are defined in the audit_class file.
Here are the audit class entries for the classes I've talked about
so far:
0x00000001:fr:file read
0x00000002:fw:file write
0x00000008:fm:file attribute modify
0x00000010:fc:file create
0x00000020:fd:file delete
0x00000080:pc:process
0x00000800:ad:administrative
0x00001000:lo:login or logout
0x40000000:ex:exec
0xffffffff:all:all classes
The first field of each line is a unique bit mask that's used to represent
the audit class in the internals of the auditing subsystem (there
are "gaps" in the numbering between the lines above because there
are some other audit classes in the default audit_class file
that I'm not showing you). The second field is the class code used
in the "flags" and "naflags" lines in audit_control,
and the third field is just a brief descriptive name that's for the
use of the systems administrator.
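Because each class occupies its own bit, a set of classes can be thought of as the bitwise OR of the individual masks. The sketch below works through that arithmetic for the "lo,ad,ex" vector. It assumes the masks combine this way, which is what the one-bit-per-class layout suggests, and the variable names are mine.

```shell
#!/bin/sh
# Class masks copied from the audit_class excerpt above
lo=0x00001000
ad=0x00000800
ex=0x40000000

# OR the masks together to get a combined mask for "lo,ad,ex"
naflags_mask=$(( $lo | $ad | $ex ))
printf '0x%08x\n' "$naflags_mask"     # prints 0x40001800

# Test whether a given class bit is set in the combined mask
if [ $(( naflags_mask & $ex )) -ne 0 ]; then
    echo "ex is selected"
fi
```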
When creating a custom class, you need to pick a bit mask and
a two-letter class code that are not currently in use by any other
class. In the default audit_class file installed by bsmconv,
bit masks from 0x00010000 through 0x08000000 are not
used. Our CIS recommendations [3] create a custom class called "cc"
with a bit mask of 0x08000000:
0x08000000:cc:CIS custom class
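Before adding a line like this, it is worth confirming that neither the mask nor the code is already taken on your system. A minimal sketch of such a check follows; class_is_free is a hypothetical helper, and the sample file is a made-up excerpt rather than a full audit_class.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if neither the bit mask ($2) nor
# the class code ($3) is already used in the audit_class file ($1).
class_is_free() {
    awk -F: -v mask="$2" -v code="$3" '
        /^#/ { next }                          # skip comment lines
        # Appending "" forces a plain string comparison
        ($1 "") == (mask "") || ($2 "") == (code "") { clash = 1 }
        END { exit clash + 0 }
    ' "$1"
}

# Made-up excerpt standing in for /etc/security/audit_class
sample=$(mktemp)
cat > "$sample" <<'EOF'
0x00000080:pc:process
0x00000800:ad:administrative
0x00001000:lo:login or logout
EOF

if class_is_free "$sample" 0x08000000 cc; then
    echo "0x08000000:cc is available"
fi
rm -f "$sample"
```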
Once you've defined your new class in the audit_class file,
you associate the two-letter class code with specific events via the
audit_event file. Here's a sample line from that file:
7:AUE_EXEC:exec(2):pc,ex,cc
The first two fields are a unique code number and code name to identify
the event. The third field is purely descriptive. The last field describes
which audit classes this event is associated with. As noted earlier,
exec() calls are monitored by both the "pc" and "ex"
classes. We've added our custom "cc" class to the end of the
line so it picks up the exec() events, too.
So, what events do we actually want to monitor with our custom
class? Here's the awk code we use to modify the audit_event
file in the CIS document [3]:
awk 'BEGIN { FS = ":"; OFS = ":" }
($4 ~ /fm/) && ! ($2 ~ /MCTL|FCNTL|FLOCK|UTIME/) \
{ $4 = $4 ",cc" }
($4 ~ /pc/) && \
! ($2 ~ /FORK|CHDIR|KILL|VTRACE|SETGROUPS|SETPGRP/) \
{ $4 = $4 ",cc" }
{ print }' audit_event >audit_event.new
Essentially, this code is saying that we want all "fm" events
except for file locking (mctl()/fcntl()/flock() handle file
locking plus some other stuff) and timestamp updates with utime()
plus all "pc" events except fork()/vfork(), chdir(),
kill(), vtrace(), setgroups(), and setpgrp().
Check the manual pages if you have any questions about what this last
set of system calls does. Once you've verified that the audit_event.new
file looks the way you want it, make a backup copy of the original
audit_event file and replace it with your new version.
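One way to do that verification is to list exactly which events ended up in the new class. The sketch below is my own helper, not part of the CIS document. Apart from the AUE_EXEC line quoted earlier, the sample lines are illustrative stand-ins for real audit_event entries.

```shell
#!/bin/sh
# Hypothetical helper: print the event names that carry a given class
# code ($2) in an audit_event-style file ($1).
events_in_class() {
    awk -F: -v want="$2" '
        /^#/ { next }                          # skip comment lines
        {
            n = split($4, classes, ",")
            for (i = 1; i <= n; i++)
                if (classes[i] == want) { print $2; break }
        }
    ' "$1"
}

# Illustrative sample standing in for audit_event.new
sample=$(mktemp)
cat > "$sample" <<'EOF'
2:AUE_FORK:fork(2):pc
7:AUE_EXEC:exec(2):pc,ex,cc
10:AUE_CHMOD:chmod(2):fm,cc
EOF

events_in_class "$sample" cc
rm -f "$sample"
```

Running "diff audit_event audit_event.new" alongside this gives a second view of the same changes.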
Now that we've fully defined our custom class, we actually have
to use it in the audit vector in the audit_control file.
Here's the audit_control file from our CIS recommendations
[3]:
dir:/var/audit
minfree:20
flags:lo,ad,cc
naflags:lo,ad,ex
After making all of these changes, the system must be rebooted for
the changes to take effect.
BSM Caveats
It's important to mention a few additional items before closing
this article:
- Enabling BSM automatically disables the <Stop>-A
keyboard sequence on the machine. After all, you want to be able
to monitor shutdown and reboot events and associate them with
a particular user. Disabling <Stop>-A means somebody
has to log in, become root, and halt the machine. All of these
are auditable events.
- Enabling BSM disables "auto-mounting" of CD-ROMs and floppies
via vold. Again, there's an audit trail issue if a system
process spontaneously mounts and unmounts file systems.
- There are known interoperability problems between OpenSSH (particularly
with PrivSep enabled) and BSM. The most noticeable issue
is that OpenSSH sessions will not appear in your audit logs at
all. A patch [4] is available to fix this and some other issues.
Conclusion
The first step toward BSM deployment seems to have been achieved,
namely that we now have some reasonable configuration standards
that a large number of people have agreed to. The next step is developing
some good tools for reporting on events in the audit trail. The
rumor is that there is already an effort in progress within Sun
Microsystems to do just that. Keep your fingers crossed.
References
1. DISA Unix STIG -- http://csrc.nist.gov/pcig/STIGs/unix-stig-v4r4-091503.zip
2. "Auditing in the Solaris 8 Operating Environment", William
Osser and Alex Noordergraaf -- http://www.sun.com/blueprints/browsesubject.html#security
3. Center for Internet Security "Solaris Benchmark" document --
http://www.cisecurity.com/bench_solaris.html
4. OpenSSH patch to help with BSM auditing -- http://bugzilla.mindrot.org/show_bug.cgi?id=125
5. "SunSHIELD Basic Security Module Guide" -- http://docs.sun.com/db/doc/806-1789
Hal Pomeranz (hal@deer-run.com) spent so much time haggling
about audit flags during the CIS consensus process that he actually
started having dreams about them. As scary as this may sound, this
was actually an improvement over his previous subconscious activity.