
Monitoring Net Traffic with OpenBSD's Packet Filter

Randal L. Schwartz

The server for stonehenge.com lives somewhere in Texas, in a place I've never seen. I rent a box from Sprocket Data Systems, and they provide my remote eyes and ears, and hook me up to their networks and power grid. I'm limited to a certain bandwidth each month for the rate I pay, and to offset the costs, I also sublease the box to geekcruises.com and redcat.com.

Because the bandwidth costs me actual dollars for usage and over-usage, I needed to monitor how much is used, and by whom. This would be easy to solve if I controlled the upstream router for the box, but I don't. However, as I was setting up tighter security on my OpenBSD machine, I noticed that the Packet Filter (pf) firewall software could give me statistics on named rules. By labeling the rules that pass traffic, I could query the pf subsystem frequently and get traffic data. Problem solved!

In the filtering section of my /etc/pf.conf file, the last dozen rules look like:

pass quick on lo0 keep state

pass in log label "other-inbound" keep state
pass out log label "other-outbound" keep state

pass in to <geekcruises> label "geekcruises-inbound" keep state
pass out from <geekcruises> label "geekcruises-outbound" keep state

pass in to <redcat> label "redcat-inbound" keep state
pass out from <redcat> label "redcat-outbound" keep state

pass in to <webstonehenge> label "webstonehenge-inbound" keep state
pass out from <webstonehenge> label "webstonehenge-outbound" keep state

pass in to <stonehenge> label "stonehenge-inbound" keep state
pass out from <stonehenge> label "stonehenge-outbound" keep state

These rules use tables defined earlier to identify the CIDR address blocks of interest. For example, at the moment, the webstonehenge table is defined as:

table <webstonehenge> { 209.223.236.163 }

The result of having the labels on these rules is that every time a conversation is started for my www.stonehenge.com address, all packets are "charged to" that particular rule. Because of keep state, all reply packets are also charged to that rule.

We can dump the counters by running pfctl -zvsl frequently, which produces output something like:

other-inbound 421 0 0
other-outbound 421 0 0
geekcruises-inbound 421 2430 754775
geekcruises-outbound 421 0 0
redcat-inbound 421 0 0
redcat-outbound 421 0 0
webstonehenge-inbound 421 1081 470209
webstonehenge-outbound 421 0 0
stonehenge-inbound 421 810 132237
stonehenge-outbound 421 619 69223

The first of the three numbers is the number of times the rule has been evaluated; the second is the number of packets; and the third is the number of bytes (our most important number). The -z flag on the command "resets" the counters, so the next execution reports only the packets seen since this one.

I wrote a program that runs from root's crontab every five minutes to execute this command, parse the data (trivial for Perl), and, using DBI, insert it into a database for querying. I won't show that program here; it's short and straightforward. However, to keep things dead easy for me, I chose DBD::SQLite as my "database".

This CPAN module bundles an entire transaction-capable, mostly SQL92-compliant database engine that stores the whole database in a single file. It's very nice when you don't want to go the full distance for a complex application, and it's also very speedy (much faster than MySQL for the same application).

I used this SQLite schema:

CREATE TABLE stats (
  stamp INT,
  type TEXT,
  packets INT,
  bytes INT,
  PRIMARY KEY (stamp, type)
);

Because SQLite doesn't have a native date type, I'm storing the timestamp as the Unix epoch value (obtained with time in my Perl program).
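
Although I won't reprint the actual collector, a minimal sketch of one might look like this (the database path is a made-up placeholder, and errors are left to DBI's RaiseError):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

## sketch of the cron-driven collector: dump pf counters into SQLite
## (/path/to/stats.db is a placeholder, not my real location)
my $dbh = DBI->connect("dbi:SQLite:dbname=/path/to/stats.db",
                       "", "", { RaiseError => 1 });
my $ins = $dbh->prepare
  ("INSERT INTO stats (stamp, type, packets, bytes) VALUES (?, ?, ?, ?)");

my $stamp = time;                       ## one timestamp for the whole run
for (qx{/sbin/pfctl -zvsl}) {
  ## each line: label, evaluations, packets, bytes
  my ($label, $evals, $packets, $bytes) = split;
  next unless defined $bytes;           ## skip anything unexpected
  $ins->execute($stamp, $label, $packets, $bytes);
}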

Once I had a few days' worth of data, I wanted to see what it looked like. I could have used an off-the-shelf solution like MRTG, but after reviewing the complexity of the existing applications in this area, I decided to write something much simpler and more appropriate to my needs. The result is shown in Listing 1, which I will explain thoroughly.

Lines 5 through 36 define how the graph looks. To begin, in lines 7 through 9, I have the height and width in pixels, and the number of seconds of history to examine. (Multiplying the number of days by 86400 keeps me from doing heavy math in my head.)

Lines 11 and 12 define the input database (SQLite keeps this as a single file) and the output graphic location, conveniently located inside my Web server's space so I can get to it with a browser.

Lines 14 and 15 define the number of bars and the number of labels, roughly proportional to the width of the graph.

Lines 16 to 34 control what gets displayed, and how. I don't want an output row for every rule category (especially later, when I start breaking out email versus ssh versus other), so I merge the types using the rules defined in MAPPING. That subroutine expects one of the category names as input and returns the desired roll-up name. The list in @PLOT should contain those roll-up names in the order wanted, and @COLORS gives the color for each.
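
To make that concrete, here's a sketch of the kind of definitions I mean (the roll-up rules and colors are illustrative, not copied from Listing 1):

sub MAPPING {
  local $_ = shift;                    ## e.g. "webstonehenge-inbound"
  return "geekcruises" if /^geekcruises-/;
  return "redcat"      if /^redcat-/;
  return "stonehenge"  if /^(?:web)?stonehenge-/;
  return "other";
}
my @PLOT   = qw(stonehenge geekcruises redcat other);
my @COLORS = qw(lred lgreen lblue lyellow);   ## GD::Graph color names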

Once we've got a plan, we need to implement it. To begin, we connect to the database in lines 40 to 42. Then, in line 44, we hand that MAPPING subroutine directly to SQLite as an SQL function. This is very cool, because I can extend SQL's functions and aggregates with Perl definitions.
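
With the DBD::SQLite of this vintage, that registration looks something like the following (newer releases spell it $dbh->sqlite_create_function instead):

## register the Perl sub as a one-argument SQL function named "mapping"
$dbh->func("mapping", 1, \&MAPPING, "create_function");

## ...after which SQL can say things like:
##   SELECT mapping(type), sum(bytes) FROM stats GROUP BY mapping(type)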

Lines 46 to 48 compute the range of time values to be covered, including the step between output column values.

Line 50 sets up the result array that will be passed to GD::Graph, including the labels for the timestamps as the first row. The time_to_label subroutine is defined later.

Line 51 computes a mapping from the mapped roll-up names to their appropriate row number in the graph.

Lines 53 to 59 prepare and execute an SQL query to get our data summarized for output. Because SQLite doesn't have a floor function, I cheated by using the round function and subtracting 0.5.
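
The exact SELECT is in Listing 1, but the floor trick has roughly this shape (here $start and $step stand in for the range values computed above):

my $sth = $dbh->prepare(q{
  SELECT round((stamp - ?) * 1.0 / ? - 0.5) AS col,
         mapping(type) AS rollup,
         avg(bytes)    AS bytes
  FROM stats
  WHERE stamp >= ?
  GROUP BY col, rollup
});
$sth->execute($start, $step, $start);

The * 1.0 forces a floating-point division, so round(x - 0.5) behaves as floor(x) for the non-negative quotients we get here.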

Lines 60 to 66 grab the result, which is a series of rows with a graph column number (which I called $row because I was sideways), a rolled-up type, and the average number of bytes transferred during that time slot. Because the byte samples were taken during 5-minute intervals, I divided that number by 300 to get average bytes-per-second. But I also care a lot more about gigabytes per month than bytes per second, so I scaled the number appropriately. The results are inserted into @graphdata, auto-vivified as necessary.
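
Spelled out, that conversion is simple arithmetic (I'm assuming a 30-day month and binary gigabytes here):

my $bytes_per_sec = $bytes / 300;                       ## 5-minute samples
my $gb_per_month  = $bytes_per_sec * 86_400 * 30 / 2**30;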

Lines 68 to 72 define the time_to_label subroutine. The input is a Unix epoch value, and the output is a string of "month/day-hour".
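
That subroutine might look something like this:

sub time_to_label {
  my ($sec, $min, $hour, $mday, $mon) = localtime shift;
  sprintf "%d/%02d-%02d", $mon + 1, $mday, $hour;       ## "month/day-hour"
}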

Finally, the graphing part, beginning in line 74. Line 75 creates the graph object. Lines 76 to 85 define the specifications for the graph. Line 86 passes the roll-up names as the labels, and line 87 actually does all the hard work and creates a GD object.
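
If you haven't used GD::Graph, the flavor of those steps is roughly this self-contained example (dummy data, and stacked bars are my guess at the chart style):

use strict;
use warnings;
use GD::Graph::bars;

my @graphdata = (
  ["3/01-00", "3/01-06", "3/01-12"],    ## labels row
  [1.2, 0.9, 1.4],                      ## stonehenge, in GB/month
  [0.4, 0.2, 0.3],                      ## geekcruises
);
my $graph = GD::Graph::bars->new(600, 300);
$graph->set(
  y_label  => 'GB/month',
  cumulate => 1,                        ## stack the per-type bars
  dclrs    => [qw(lred lgreen)],
) or die $graph->error;
$graph->set_legend(qw(stonehenge geekcruises));
my $gd = $graph->plot(\@graphdata) or die $graph->error;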

Lines 89 to 94 write the GD object as a PNG file, taking care not to let a partially written file be visible on the Web page.
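
The trick there is to write to a scratch name first and then rename, which is atomic on a local filesystem. Calling the destination $OUTPUT (my name for it, not necessarily the listing's), it goes something like:

## write the PNG somewhere invisible, then atomically move it into place
open my $fh, ">", "$OUTPUT.new" or die "create $OUTPUT.new: $!";
binmode $fh;
print $fh $gd->png;
close $fh or die "close $OUTPUT.new: $!";
rename "$OUTPUT.new", $OUTPUT or die "rename: $!";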

And the result of that is a pretty picture that shows my traffic, organized by customer, and graphed over time (see Figure 1). In the few weeks that I've been gathering data, it's been useful to see exactly how my box is being accessed. And because bandwidth is a precious resource, I've got one more tool now to manage it. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.