Monitoring Net Traffic with OpenBSD's Packet Filter
Randal L. Schwartz
The server for stonehenge.com lives somewhere in Texas,
in a place I've never seen. I rent a box from Sprocket Data
Systems, and they provide my remote eyes and ears, and hook me up
to their networks and power grid. I'm limited to a certain
bandwidth each month for the rate I pay, and to offset the costs,
I also sublease the box to geekcruises.com and redcat.com.
Because the bandwidth costs me actual dollars for usage and over-usage,
I needed to monitor how much is used, and by whom. This would be
easy to solve if I controlled the upstream router for the box, but
I don't. However, as I was setting up tighter security on my
OpenBSD machine, I noticed that the Packet Filter (PF) firewall
could give me statistics on labeled rules. By labeling the rules that
pass traffic, I could query the pf subsystem frequently and
gather traffic data. Problem solved!
In the filtering section of my /etc/pf.conf file, the last
dozen rules look like:
pass quick on lo0 keep state
pass in log label "other-inbound" keep state
pass out log label "other-outbound" keep state
pass in to <geekcruises> label "geekcruises-inbound" keep state
pass out from <geekcruises> label "geekcruises-outbound" keep state
pass in to <redcat> label "redcat-inbound" keep state
pass out from <redcat> label "redcat-outbound" keep state
pass in to <webstonehenge> label "webstonehenge-inbound" keep state
pass out from <webstonehenge> label "webstonehenge-outbound" keep state
pass in to <stonehenge> label "stonehenge-inbound" keep state
pass out from <stonehenge> label "stonehenge-outbound" keep state
These rules use tables defined earlier to identify the addresses
or CIDR blocks of interest. For example, at the moment, the webstonehenge
table is defined as:
table <webstonehenge> { 209.223.236.163 }
The result of having the labels on these rules is that every time
a conversation is started for my www.stonehenge.com address,
all packets are "charged to" that particular rule. Because
of keep state, all reply packets are also charged to that rule.
We can dump the counters by running pfctl -zvsl frequently,
which produces output something like:
other-inbound 421 0 0
other-outbound 421 0 0
geekcruises-inbound 421 2430 754775
geekcruises-outbound 421 0 0
redcat-inbound 421 0 0
redcat-outbound 421 0 0
webstonehenge-inbound 421 1081 470209
webstonehenge-outbound 421 0 0
stonehenge-inbound 421 810 132237
stonehenge-outbound 421 619 69223
The first of the three numbers is the number of times the rule has been
evaluated; the second is the number of packets; and the third is the
number of bytes (our most important number). The -z flag on
the command resets the counters, so each execution reports only
the traffic seen since the previous one.
I wrote a program that runs from root's crontab every
five minutes to execute this command, parse the data (trivial
for Perl), and, using DBI, insert it into a database for querying.
I won't show that program, because it's pretty short, although a
rough sketch of one appears after the schema below.
However, to keep things dumbly easy for me, I chose DBD::SQLite
as my "database".
This CPAN module embeds an entire transaction-enabled, SQL92-supporting
database engine that stores the whole database in a single file. It's
very nice when you don't want to go the full client-server distance for a
complex application, and it's also very speedy (much faster than MySQL
for the same application).
I used this SQLite schema:
CREATE TABLE stats (
stamp INT,
type TEXT,
packets INT,
bytes INT,
PRIMARY KEY (stamp, type)
);
Because SQLite doesn't have a native date type, I'm
storing the timestamp as a Unix epoch value (obtained with time
in my Perl program).
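A minimal sketch of such a collector might look like the following
(this isn't the program I actually run, and the database path is
just a placeholder):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

## placeholder path for the stats database
my $dbh = DBI->connect("dbi:SQLite:dbname=/var/db/pfstats.db",
                       "", "", { RaiseError => 1 });

my $ins = $dbh->prepare
  ("INSERT INTO stats (stamp, type, packets, bytes) VALUES (?,?,?,?)");

my $now = time;
for (`pfctl -zvsl`) {
  ## each line: label, evaluations, packets, bytes
  ## (assumes labels contain no whitespace)
  my ($label, $evals, $packets, $bytes) = split;
  next unless defined $bytes;
  $ins->execute($now, $label, $packets, $bytes);
}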
Once I had a few days' worth of data, I wanted to see what it looked
like. I could have used an off-the-shelf solution like MRTG to do
this, but after reviewing the complexity of the existing applications
in this area, I decided to write something much simpler and more
appropriate to my needs. The result is shown in Listing 1, which
I will explain thoroughly.
Lines 5 through 36 define how the graph looks. To begin, in lines
7 through 9, I have the height and width in pixels, and number of
seconds of history to examine. (Multiplying the number of days
by 86400 keeps me from doing heavy math in my head.)
Lines 11 and 12 define the input database (SQLite keeps this as
a single file) and the output graphic location, conveniently located
inside my Web server's space so I can get to it with a browser.
Lines 14 and 15 define the number of bars and the number of labels,
roughly proportional to the width of the graph.
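For concreteness, that configuration might look something like this
(the values and paths are placeholders, not Listing 1's actual choices):

my $GRAPH_WIDTH  = 640;                ## pixels
my $GRAPH_HEIGHT = 480;
my $HISTORY      = 3 * 86400;          ## three days of samples
my $DATABASE = "/var/db/pfstats.db";   ## the file the cron job writes
my $OUTPUT   = "/var/www/htdocs/traffic.png";  ## inside the Web space
my $BARS     = 60;                     ## columns across the graph
my $LABELS   = 10;                     ## time labels along the X axis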
Lines 16 to 34 control what gets displayed, and how. I don't want
a separate plot line for every rule label (especially later, when
I start breaking out email versus ssh versus other traffic),
so I merge the types using the rules defined in MAPPING.
The subroutine expects one of the category names as input, and returns
the desired roll-up name. The list in @PLOT should contain those
roll-up names in the order wanted, and the colors in @COLORS
say how they're drawn; a sketch of such a configuration follows.
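For instance, merging directions only (the roll-up names and colors
here are illustrative, not Listing 1's actual choices):

my %MAP = map {
  ("$_-inbound" => $_, "$_-outbound" => $_)
} qw(stonehenge webstonehenge geekcruises redcat);

sub MAPPING { $MAP{ $_[0] } || "other" }

my @PLOT   = qw(stonehenge webstonehenge geekcruises redcat other);
my @COLORS = qw(red orange green blue gray);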
Once we've got a plan, we need to implement it. To begin,
we connect to the database in lines 40 to 42. Then, in line 44,
we provide that MAPPING subroutine directly to SQLite as
an SQL function. This is very cool, because I can extend SQL's set of
functions and aggregates with Perl definitions.
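With a current DBD::SQLite, that registration is a single call
(older versions spelled it $dbh->func(..., "create_function") instead):

## make MAPPING available in SQL as a one-argument function
$dbh->sqlite_create_function("mapping", 1, \&MAPPING);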
Lines 46 to 48 compute the range of time values to be covered,
including the step between output column values.
Line 50 sets up the result array that will be passed to GD::Graph,
including the labels for the timestamps as the first row. The time_to_label
subroutine is defined later.
Line 51 computes a mapping from the mapped roll-up names to their
appropriate row number in the graph.
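That can be as simple as the following sketch (%TYPE_TO_ROW is a
hypothetical name; row 0 of the result array holds the labels, so
the data series start at row 1):

my %TYPE_TO_ROW;
@TYPE_TO_ROW{@PLOT} = 1 .. @PLOT;  ## stonehenge => 1, webstonehenge => 2, ...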
Lines 53 to 59 prepare and execute an SQL query to get our data
summarized for output. Because SQLite doesn't have a floor
function, I cheated: subtracting 0.5 and then rounding gives
floor for the positive values involved here.
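The shape of that query is roughly this (my guess at Listing 1's
version, with $start and $step being the range values computed above,
and mapping the SQL function registered earlier):

my $sth = $dbh->prepare(q{
  SELECT round((stamp - ?) * 1.0 / ? - 0.5) AS col,
         mapping(type)                      AS rollup,
         avg(bytes)                         AS avg_bytes
  FROM stats
  WHERE stamp >= ?
  GROUP BY col, rollup
});
$sth->execute($start, $step, $start);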
Lines 60 to 66 grab the result, which is a series of rows with
a graph column number (which I called $row because I was thinking
of the output sideways), a rolled-up type, and the average number of
bytes transferred during that time slot. Because the byte samples were
taken over 5-minute intervals, I divided that number by 300 to get the
average bytes per second. But I care a lot more about gigabytes per
month than bytes per second, so I scaled the number appropriately.
The results are inserted into @graphdata, autovivified as
necessary.
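The scaling amounts to something like this (a month here is 30 days
of 86400 seconds, and a gigabyte a billion bytes):

my $bytes_per_sec = $avg_bytes / 300;  ## 5-minute (300-second) samples
my $gb_per_month  = $bytes_per_sec * 86400 * 30 / 1e9;

In other words, a steady kilobyte per second comes to roughly 2.6 GB
over a month.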
Lines 68 to 72 define the time_to_label subroutine. The
input is a Unix epoch value, and the output is a string of "month/day-hour".
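Something along these lines would do it (localtime's month is
zero-based, hence the + 1):

sub time_to_label {
  my ($sec, $min, $hour, $mday, $mon) = localtime(shift);
  sprintf "%d/%d-%02d", $mon + 1, $mday, $hour;
}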
Finally, the graphing part, beginning in line 74. Line 75 creates
the graph object. Lines 76 to 85 define the specifications for the
graph. Line 86 passes the roll-up names as the labels, and line
87 actually does all the hard work and creates a GD object.
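A sketch of that setup (the option values are illustrative, and the
stacked-bars choice via cumulate is my guess at how the by-customer
plot is produced):

use GD::Graph::bars;

my $my_graph = GD::Graph::bars->new($GRAPH_WIDTH, $GRAPH_HEIGHT);
$my_graph->set(
  x_label      => "time",
  y_label      => "GB/month",
  cumulate     => 1,           ## stack the series in each bar
  dclrs        => \@COLORS,    ## one color per roll-up name
  x_label_skip => int(@{ $graphdata[0] } / $LABELS),
) or die $my_graph->error;
$my_graph->set_legend(@PLOT);
my $gd = $my_graph->plot(\@graphdata) or die $my_graph->error;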
Lines 89 to 94 write the GD object as a PNG file, taking care
not to let a partially written file be visible on the Web page.
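The usual idiom for that, using the $OUTPUT and $gd of the earlier
sketches:

## write to a temporary file, then rename it into place:
## rename() is atomic within a filesystem, so a browser can
## never fetch a half-written image
open my $fh, ">", "$OUTPUT.tmp" or die "create $OUTPUT.tmp: $!";
binmode $fh;
print $fh $gd->png;
close $fh or die "close $OUTPUT.tmp: $!";
rename "$OUTPUT.tmp", $OUTPUT or die "rename to $OUTPUT: $!";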
And the result of that is a pretty picture that shows my traffic,
organized by customer, and graphed over time (see Figure 1). In
the few weeks that I've been gathering data, it's been
useful to see exactly how my box is being accessed. And because
bandwidth is a precious resource, I've got one more tool now
to manage it. Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration,
security, technical writing, and training. He has coauthored the
"must-have" standards: Programming Perl, Learning
Perl, Learning Perl for Win32 Systems, and Effective
Perl Programming. He's also a frequent contributor to the
Perl newsgroups, and has moderated comp.lang.perl.announce since
its inception. Since 1985, Randal has owned and operated Stonehenge
Consulting Services, Inc.