Cover V12, I04

apr2003.tar

Writing an SNMP Agent

Damir Delija

A few years ago, during the summer of 2000, we faced serious overheating problems. Our Sun Ultra 250 machines were in trouble, and we had an immediate need for monitoring. Our first try was to report prtdiag output through syslog, but that turned out to be unusable and unreliable. The prtdiag output was too big and caused additional trouble for our log monitor. It was obvious that we needed something better.

By coincidence, I read Sun Performance and Tuning: Java and the Internet by A. Cockcroft and R. Pettit (Sun Microsystems) and got the idea of writing a small program with easily parsable output to report temperature, fan, and disk status. In a few days I had created a tool called envstat, which mimics the basic output of prtdiag and can also be run from cron to send its data to syslog. However, my new tool was still missing a few things. The log monitor was sending reports by email, which were handled manually at the receiving end. This solution was clumsy, and the worst part was that the tool was not integrated with any network management or integrated monitoring system. Here is an example of envstat -t output:

#TABLE: temperature
#ROWS:  label   value
        CPU0    48
        CPU1    48
        MB0     36
        MB1     31
        PDB     31
        SCSI    28
envstat walks the kernel chain and prints the requested values. Because it is written in C and reads the values directly, it is faster than an awkward interface that parses prtdiag output. Listing 1 shows the main body of the envstat program. The whole archive can be found on the Sys Admin Web site or at:

http://jagor.srce.hr/~ddelija/envstat/
This limited tool did succeed in averting the aforementioned crisis. Summer vacations started, and nobody cared much anymore. Still, I felt dissatisfied and considered it a challenge to come up with something better. I got an idea while reading Understanding SNMP MIBs by David Perkins and Evan McGinnis (Prentice Hall). I thought a small, dedicated SNMP agent built around our temperature module, envstat, could be a nice, workable solution. The first step was to define a MIB and then implement it through the agent and manager. Luckily, there were useful leftovers on hand from previous projects.

During the winter of 1995-1996, we tried to implement a system to monitor the network, but the project ended without much success. Out of that effort came a CARNet MIB and a related agent developed in Scotty. During the summer of 2000, I finally rewrote the agent to make it simpler, more robust, and more secure, and created a MIB extension based on envstat. This tool has since evolved into a modular agent to which new modules can be added. The system currently includes production modules for commands such as:

iostat -ne -- To show error status on I/O devices.

df -lk and df -oi -- To show file system status.

envstat -t -- To show temperature status.

netstat -k -- To show some less commonly used kernel counters on Solaris.

The agent's main body is a very short initialization loop that dynamically loads the configured modules (see Listing 2). The rest of the work is hidden inside Scotty's initialization. Other modules can be found at the Sys Admin Web site or at:

http://jagor.srce.hr/~ddelija/agent/
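Listing 2 implements this loop in Tcl. To make the structure concrete, here is a rough C sketch of the same idea, under stated assumptions: the agent_module struct, init_modules(), and the placeholder modules are all hypothetical, and a static table stands in for the dynamic loading the Scotty agent actually performs.

```c
#include <stdio.h>

/* Hypothetical module interface: get() reruns the external command and
   reloads the module's tables; init() installs the module's MIB subtree
   and does the first load. Both return 0 on success. */
struct agent_module {
    const char *name;
    int (*init)(void);
    int (*get)(void);
};

/* Placeholder modules for illustration only; real ones would wrap
   commands such as envstat -t or iostat -ne. */
static int env_module_init(void)    { return 0; }
static int env_module_get(void)     { return 0; }
static int iostat_module_init(void) { return 0; }
static int iostat_module_get(void)  { return 0; }

/* A static table replaces dynamic loading in this sketch. */
static struct agent_module modules[] = {
    { "envstat", env_module_init,    env_module_get    },
    { "iostat",  iostat_module_init, iostat_module_get },
};

/* The agent's "main body": initialize every configured module, report
   and skip any that fail, and return the number of active modules. */
int init_modules(struct agent_module *mods, int n)
{
    int active = 0;

    for (int i = 0; i < n; i++) {
        if (mods[i].init() == 0)
            active++;
        else
            fprintf(stderr, "module %s failed to initialize\n", mods[i].name);
    }
    return active;
}
```

After init_modules() returns, the agent would simply wait for requests. Skipping a module that fails to initialize, instead of aborting the whole agent, is an assumed policy here, not necessarily the original one.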
We have subsequently integrated this small agent into three different management systems and given it a command-line interface. At the moment, we are hoping to create a system integrity-checking interface that would provide a MIB through which the management station can observe the status of tools like AIDE, Tripwire, or fcheck.

Development of the SNMP Agent

A full-scale SNMP agent is a huge, complex piece of software. Because it is a heavy-duty server with access to important functions on the managed system, it is also capable of mischief, such as unintentionally stopping or starting important services. SNMP agents are often misconfigured and can be used as a back door into systems or as a means of denial of service, even by mistake. (The recent CERT Advisory CA-2002-03 addressed SNMP-related security problems.)

Like any server process, an SNMP agent consists of two main parts: a protocol API that handles SNMP, and a part that handles the system interface. The basic idea behind SNMP management is remote debugging. Each SNMP agent presents the system as a set of ordered, well-defined variables. Each variable's name, type, value range, and order are defined in the related MIB. MIB stands for Management Information Base, a name that often causes misinterpretation because there is no real database, just declarations. The variables exist only in the agent, as names for real entities, and acting on them can trigger real actions on the managed system: changing a variable can trigger a command execution or even stop the system. The agent associates the MIB variables with real variables and actions on the managed system. The manager process, in turn, must agree with the agent on the variables and their syntax, so the manager and the agent share the MIB as a common definition.
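The ordering mentioned above is what makes MIB walks possible: variables are named by object identifiers (OIDs), which are compared sub-identifier by sub-identifier, and a GetNext request returns the first variable that sorts after the one requested. The following C sketch shows that rule on a small in-memory table of bindings. The struct and function names are hypothetical and are not taken from Scotty or from the agent.

```c
#include <stddef.h>

/* Hypothetical in-memory binding of a MIB name (OID) to an agent value. */
struct binding {
    const unsigned *oid;   /* object identifier, e.g. {1,3,6,1,4,1,...} */
    size_t len;            /* number of sub-identifiers */
    long value;            /* the "real" value the variable stands for */
};

/* Compare two OIDs lexicographically, as SNMP orders the MIB tree.
   Returns <0, 0, or >0. A prefix sorts before its extensions. */
int oid_cmp(const unsigned *a, size_t alen, const unsigned *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return a[i] < b[i] ? -1 : 1;
    }
    if (alen == blen)
        return 0;
    return alen < blen ? -1 : 1;
}

/* GetNext: return the first binding whose OID is strictly greater than
   the requested one, or NULL at the end of the view. The table must
   already be sorted with oid_cmp. */
const struct binding *get_next(const struct binding *tab, size_t n,
                               const unsigned *oid, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (oid_cmp(tab[i].oid, tab[i].len, oid, len) > 0)
            return &tab[i];
    }
    return NULL;
}
```

A linear scan keeps the sketch short; an agent with many variables would more likely use a binary search or a tree over the same ordering.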

SNMP agents must be economical and unobtrusive when it comes to system performance. It doesn't make sense for a system to spend all its resources servicing the agent instead of performing the primary function the agent is supposed to monitor.

If you are building a full-scale agent you intend to sell, there are some standard MIBs you must incorporate, such as MIB-II. If you are a Sunday agent developer, you can be less strict and implement just the small subset of variables that is important to you. Similar choices exist for development tools. If you are building a heavy-duty, full-system agent, you'll probably use a C-based toolkit that is well integrated into the system.

On the other hand, if you are writing something experimental, you might develop a prototype with an open source tool or even in a scripting language. There are many such tools. UCD-SNMP is probably the best known. Various Perl implementations are available, and there is the Tcl/Tk implementation named "Scotty". Some of these tools are standalone, and some are part of bigger network management environments. I chose Scotty because it is simple and handy and is based on Tcl. Tcl is a good choice for beginners because it is easy to learn, develop in, and extend. However, the simplicity of Tcl can sometimes lead to bad code if you write the program very quickly and without proper analysis.

There are various types of SNMP agents, some with very complex functionality. For experimental purposes, or simply for interfacing existing commands, you can use a very simple type of agent called a "screen scraper". A screen scraper wraps around existing commands or applications and presents their output as SNMP variables. Screen scrapers were among the first SNMP agents developed, and they are still useful and easy to implement. A screen scraper can live entirely in user space, without kernel hooks. This approach is especially well suited to an operating system like Linux, where most important data can be read straight from the /proc file system.

Screen scrapers typically provide a control variable, which triggers execution of the external command and shows when data was last generated. Values scraped from command output are presented as tables, usually indexed by rows.
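A minimal sketch of such a scraper in C, assuming a POSIX system with popen(): running the command reloads the rows and records the time, which is what the control variable would report. The struct, the function name, and the fixed row limits are illustrative assumptions, not the article's Tcl code.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_ROWS 64
#define ROW_LEN  256

/* Hypothetical screen-scraper state: the captured rows plus the
   "last generated" timestamp exposed through the control variable. */
struct scrape {
    char rows[MAX_ROWS][ROW_LEN];  /* row i is served as table index i+1 */
    int nrows;
    time_t last_run;
};

/* Run the external command and reload the table. Returns the number of
   rows captured, or -1 if the command could not be started. */
int scrape_run(struct scrape *s, const char *cmd)
{
    FILE *p = popen(cmd, "r");

    if (p == NULL)
        return -1;

    s->nrows = 0;
    while (s->nrows < MAX_ROWS &&
           fgets(s->rows[s->nrows], ROW_LEN, p) != NULL) {
        /* strip the trailing newline so values print cleanly */
        s->rows[s->nrows][strcspn(s->rows[s->nrows], "\n")] = '\0';
        s->nrows++;
    }
    pclose(p);
    s->last_run = time(NULL);
    return s->nrows;
}
```

A module would call, for example, scrape_run(&s, "iostat -ne") and then split each row into typed columns. In this simple version, a line longer than ROW_LEN is split across two rows, and output beyond MAX_ROWS lines is discarded.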

How It Looks

Solaris 2.6 and later versions provide the command iostat -ne, which shows errors on devices such as disks, tapes, CD-ROMs, metadevices, and auto volumes. This command gives an ideal early warning about the health of the I/O subsystem. Figure 1 shows the output of the iostat -ne command. The idea is to get the iostat -ne output into a simple indexed table, as shown in Figure 2.

The iostat -ne output is largely self-explanatory, but to be exact: s/w means soft errors on the device, h/w means hard errors, trn means transient errors, tot is the total of all three error columns, and the last column is the device to which the errors relate. To present such results within SNMP, there must be a MIB that defines these variables:

IoDevicesNumber -- Number of devices for which iostat -ne returns statistics. It also serves as a control: the table index runs from 1 to this number.

ErrorsIostatLastChange -- Time when iostat -ne was last executed and the table was last loaded.

IostatDeviceTable -- Table of iostat -ne entries.

IostatDeviceEntry -- A row in the table, consisting of the Index, IostatDevice, SoftError, HardError, and TransientError columns.

Index -- Index number of the row.

IostatDevice -- Device described by the row.

SoftError -- Number of soft errors, the s/w column in the output.

HardError -- Number of hard errors, the h/w column in the output.

TransientError -- Number of transient errors, the trn column in the output.

To store this data for the agent, you'll need a two-dimensional associative array with a few additional variables for the number of devices and the timestamp.
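As a sketch of that storage in C (the article's agent uses Tcl arrays instead), a fixed-size table plus the two scalar variables is enough. The struct and function names and the MAX_DEV limit are assumptions made for illustration, and the parser relies on the column order described above (soft, hard, transient, total, device).

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

#define MAX_DEV 128

/* One row of IostatDeviceTable; fields mirror the MIB columns. */
struct iostat_row {
    int index;                /* Index, 1..IoDevicesNumber */
    char device[64];          /* IostatDevice */
    unsigned long soft;       /* SoftError (s/w) */
    unsigned long hard;       /* HardError (h/w) */
    unsigned long transient;  /* TransientError (trn) */
};

/* The whole table plus the two scalar MIB variables. */
struct iostat_table {
    int ndev;                 /* IoDevicesNumber */
    time_t last_change;       /* ErrorsIostatLastChange */
    struct iostat_row row[MAX_DEV];
};

/* Start a reload: empty the table and stamp the time of this load. */
void iostat_reset(struct iostat_table *t)
{
    memset(t, 0, sizeof *t);
    t->last_change = time(NULL);
}

/* Add one line of `iostat -ne` output. Header lines such as
   "s/w h/w trn tot device" do not start with numbers and are skipped.
   Returns 1 if a row was added, 0 otherwise. */
int iostat_add_line(struct iostat_table *t, const char *line)
{
    struct iostat_row r;
    unsigned long total;   /* read to validate the line; not stored */

    if (t->ndev >= MAX_DEV)
        return 0;
    if (sscanf(line, "%lu %lu %lu %lu %63s",
               &r.soft, &r.hard, &r.transient, &total, r.device) != 5)
        return 0;
    r.index = t->ndev + 1;
    t->row[t->ndev++] = r;
    return 1;
}
```

A reload would call iostat_reset() once and then pass every output line to iostat_add_line(). The total column is not stored because it can always be recomputed from the other three.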

Listing 3 shows an example of the envstat module interface. It calls the envstat -t command to get temperature information, removes the first two lines, and stores the rest in the Env global (a two-dimensional array whose values are bound to MIB variables). Each module has at least two functions: Env_get, which handles the external command, and the initialization function, Env_Init. The initialization function installs the module's MIB tree and does the initial population of the global arrays.
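The "remove the first two lines, keep the rest" step can be sketched in C as follows. This is not Listing 3; the function and struct names are hypothetical, and the input is assumed to follow the envstat -t format shown earlier (two header lines, then one label and one value per line).

```c
#include <stdio.h>
#include <string.h>

#define ENV_MAX 32

/* Hypothetical C counterpart of one element of the Env global. */
struct env_entry {
    char label[32];   /* sensor name, e.g. CPU0, MB0, SCSI */
    int value;        /* temperature reported by envstat -t */
};

/* Parse the text produced by `envstat -t`: skip the two header lines
   (#TABLE and #ROWS), then read "label value" pairs, one per line.
   Returns the number of entries stored (at most max). */
int env_parse(const char *text, struct env_entry *env, int max)
{
    int n = 0, lineno = 0;
    const char *p = text;

    while (*p != '\0' && n < max) {
        const char *eol = strchr(p, '\n');
        size_t len = eol ? (size_t)(eol - p) : strlen(p);
        char line[128];

        if (len >= sizeof line)
            len = sizeof line - 1;          /* truncate overlong lines */
        memcpy(line, p, len);
        line[len] = '\0';

        /* lines 0 and 1 are the headers; blank lines fail sscanf */
        if (lineno++ >= 2 &&
            sscanf(line, "%31s %d", env[n].label, &env[n].value) == 2)
            n++;

        if (eol == NULL)
            break;
        p = eol + 1;
    }
    return n;
}
```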

Scotty can bind MIB variables to real Tcl variables and to event scripts. Because the variables in a MIB are ordered into a so-called MIB tree, action scripts can be attached to the variables and are then executed in a specific order, depending on how the variables are accessed.

In this case, each time IoDevicesNumber is requested, fresh iostat -ne results are loaded. The data in the table remains the same until the next request for IoDevicesNumber. The only code required is a few fast lines of Tcl, with very low impact on system performance. The agent connects the real Tcl variables to MIB names and waits for requests; the process is completely event driven.
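The refresh-on-request behavior can be sketched in C with a cached table and a loader callback. Everything here is hypothetical: the variable identifiers, the io_cache struct, and demo_reload(), which only stands in for running and parsing iostat -ne.

```c
/* Hypothetical identifiers for the device-count scalar and two columns. */
enum iostat_var { IO_DEVICES_NUMBER, IO_SOFT_ERROR, IO_HARD_ERROR };

#define IO_MAX_DEV 16

/* Cached table; reload() stands in for running `iostat -ne` and
   refilling the arrays. It returns the new device count. */
struct io_cache {
    int ndev;
    long soft[IO_MAX_DEV];
    long hard[IO_MAX_DEV];
    int (*reload)(struct io_cache *c);
};

/* Placeholder loader for illustration only: it bumps the first soft-error
   count on every call so that reloads are visible to the caller. */
static int demo_reload(struct io_cache *c)
{
    c->soft[0] += 1;
    c->hard[0] = 0;
    return 1;                            /* pretend one device was found */
}

/* Answer a GET. Reading the device count is the only trigger for a
   reload; column reads are served from the last snapshot.
   Returns 0 on success, -1 for an index outside 1..ndev. */
int io_get(struct io_cache *c, enum iostat_var var, int index, long *out)
{
    if (var == IO_DEVICES_NUMBER) {
        c->ndev = c->reload(c);          /* the only place a reload happens */
        *out = c->ndev;
        return 0;
    }
    if (index < 1 || index > c->ndev)
        return -1;                       /* rows are indexed 1..ndev */
    *out = (var == IO_SOFT_ERROR) ? c->soft[index - 1] : c->hard[index - 1];
    return 0;
}
```

Because column reads never run the command, a manager that reads the device count first and then walks the rows sees one consistent snapshot, and the whole walk costs a single command execution.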

The basic ideas behind this code are simplicity and ease of use. Typical agents are more complex and include security mechanisms, passwords, address-based access controls, locking and synchronization, and many other things. Another important principle is not to make the agent too heavy for the system. Executing CPU-heavy commands can slow the system; this is a common problem with simple agents, especially if they are written in a scripting language and in a hurry. Associative arrays are very convenient for organizing things, and it's easy to forget how large a command's output can be and what effect memory allocation can have on system performance. During testing, we noticed that our agent increased memory usage by about 10-12 MB, which is acceptable for machines with more than 256 MB of RAM.

To prevent commands from being executed too often, there is a resting period for each management station. Such a mechanism, together with simple host-based access control, gives sufficient protection for some environments. You must decide whether this type of solution is consistent with the security policies of your own network. Additional security can be achieved by running the agents as non-privileged users on non-privileged ports, and by using SNMPv1 or SNMPv2 security mechanisms. The SNMP session definition is completely independent of the basic functions, so you can even have more than one agent session in the agent program; Scotty is well designed for that. Security is more a matter of policy than of implementation, but a good rule of thumb is to be as restrictive as possible, especially at the beginning, while not all dependencies are clear.
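One way to express a per-manager resting period combined with host-based access control is sketched below in C. The struct layout, the function name, and the policy of refusing (rather than queueing) early requests are assumptions; a refused request could still be answered from the last cached table.

```c
#include <stdbool.h>
#include <string.h>
#include <time.h>

#define MAX_MANAGERS 8

/* Hypothetical access list entry: a manager allowed to trigger commands. */
struct manager {
    char addr[46];        /* long enough for a textual IPv6 address */
    time_t last_exec;     /* 0 means "never executed" */
};

struct access_policy {
    struct manager mgr[MAX_MANAGERS];
    int n;
    double rest_seconds;  /* resting period between executions */
};

/* Decide whether a request from addr at time now may run the external
   command. Unknown hosts are always refused; known hosts are refused
   while still inside their resting period. */
bool may_execute(struct access_policy *p, const char *addr, time_t now)
{
    for (int i = 0; i < p->n; i++) {
        struct manager *m = &p->mgr[i];

        if (strcmp(m->addr, addr) != 0)
            continue;
        if (m->last_exec != 0 && difftime(now, m->last_exec) < p->rest_seconds)
            return false;
        m->last_exec = now;   /* start a new resting period */
        return true;
    }
    return false;
}
```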

I know that professional agent and MIB writers will not agree with this approach. It is true that the MIB design should rest on a better analysis of the managed system (which is the heart of MIB definition), and that there should be better module design, less overhead, and better testing. Professional descriptions of how to build an agent can be found in Understanding SNMP MIBs.

How to Integrate an Agent into the System

The presence of an agent is not enough if it is not integrated into the management system. To begin, you must determine which management system is available. The manager must be capable of contacting the agent and working with the collected data. The manager should also include a trouble notification tool.

In our case, the agent started as a simple helper tool. Its data collection was added to an existing module that monitored the environmental values of Cisco routers. Originally, notification occurred through email, and data went to a file. This management system was at a remote location and not easily portable. Because of the possibility of lost mail and link problems, we eventually needed a less centralized system. We were actually following the normal evolution of management systems and architectures.

There were several discussions on how to overcome the shortcomings of a centralized management model. One of the UNIX admins said something about "all this SNMP mumbo-jumbo, and still you cannot tell the cause of a problem unless you're right on the command line." This comment led to a simple idea: a command-line tool that retrieves specific data from the agent and displays it in an admin-friendly way. (Don't confuse such a tool with the dreaded agent browser or MIB browser.)

We quickly developed a set of simple tools for displaying agent data. These tools are in the manner of the RPC-based r-tools (for example, ruptime). We named our tool for accessing environmental information renvstat. As other functionality was added to the agent, we added further tools, such as riostat for remote iostat and rdf for remote df. The output presents the data in a form familiar to sys admins. Figure 3 shows an example of renvstat output.

The r-commands are written in Scotty, and the effective code is about 30 lines. renvstat walks the environmental tables in the agent and outputs the results. (As a test, we also wrote a version of renvstat in Perl.) We installed these r-tools on computers where admins are often logged in. Listing 4 contains the code for the renvstat command, a remote version of envstat implemented through SNMP. It can be found with other similar commands at:

http://jagor.srce.hr/~ddelija/agent/monitor/
renvstat walks the agent's tables and pretty-prints the results, and more than one machine can be specified in a single call. The code is a little more complex because of the necessary MIB loading and session handling, but it is still very short. Its heart is the envstat_test function, which does the MIB walk and prints the result. All of the r-commands mentioned above are designed the same way.
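The stop condition of such a walk can be sketched in C. Assuming an integer-indexed table whose instance OIDs have the form entry.column.index, each OID returned by GetNext is either split into a column number and a row index, or recognized as lying outside the table, which ends the walk. The function name is hypothetical, and the single-integer index is an assumption about the table layout.

```c
#include <stddef.h>

/* Given the OID of a table's Entry node and an OID returned by GetNext
   during a walk, split the returned OID, of the form
       <entry>.<column>.<index>
   into column number and row index. Returns 1 and fills *column and
   *index while the walk is still inside the table; returns 0 when the
   OID is outside the entry subtree or has an unexpected shape, which
   the walker treats as the end of the table. */
int split_table_oid(const unsigned *entry, size_t elen,
                    const unsigned *oid, size_t len,
                    unsigned *column, unsigned *index)
{
    if (len != elen + 2)
        return 0;
    for (size_t i = 0; i < elen; i++) {
        if (oid[i] != entry[i])
            return 0;                 /* walked past the table */
    }
    *column = oid[elen];
    *index = oid[elen + 1];
    return 1;
}
```

A walker would start with the entry OID, issue GetNext repeatedly, store each returned value under rows[index][column], and stop at the first OID for which split_table_oid() returns 0; the collected rows can then be printed one line per sensor.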

Once the r-commands were working, we started thinking about outfitting a monitoring station close to the monitored devices. We tried Big Sister and Big Brother, as well as other approved tools.

Big Sister looked like the better choice, since it is written completely in Perl and has direct SNMP support. It turned out, however, that my Perl installation was broken, so Big Sister could not run. So we tried Big Brother and got a Big Brother monitoring station operational in a reasonably short time. The fact that Big Brother offers no direct SNMP support did not present a problem: we integrated our r-tools almost without modification. Big Brother has its own bb-df script to collect and process file system data. Turning bb-df into bb-rdf was a matter of changing the command "df" to "rdf hostname". It was the same with the other modules. Only the envstat module needed a new bb-env script, developed from a skeleton example.

Listing 5 shows the bb-rdf script (it is now outdated, since newer versions of Big Brother have been released). My only changes were adding the rdf command to the PATH and changing the command-name variable DF to RDF, in keeping with Big Brother coding practice. I also had to add a reference to the NODE name in the eval line, but the rest is the same as before. Figure 4 shows Big Brother's global status display.

Obviously, this is not the best solution, since Big Brother consists of many scripts that each poll for data. The load on the management station running Big Brother is heavy, depending on the number of events and the polling schedule. A more sensible solution would be a proxy or mid-level manager, stationed next to Big Brother, that polls the remote agents itself. Such an approach lends itself to a sophisticated, professional system. We plan to build this type of system when the load on the monitoring stations becomes too heavy.

A very important issue, closely related to integration, is the method of distribution, installation, and upgrade. The traditional way to distribute software is to compile it, tarball it, copy it, and install it on the target machines. This takes time and is not error-proof. A better method is to use packaging tools. We use the Debian package manager in our system, so all the software needed for our agents is organized into a set of Debian packages. The more difficult question is not which package manager to use, but how to separate the files into handy modules. For easy maintenance, modules that change often should be kept in separate packages. In our case, the agent code is constantly upgraded, so we isolated it in its own package. The more stable Tcl/Tk and Scotty code are also in separate packages. This approach helps in emergency situations and simplifies automated upgrades.

The end result is a small, extensible open source system capable of doing very specific tasks. However, the system is not easily accessible with standard tools and utilities.

Conclusion

CERT Advisory "CA-2002-03 Multiple Vulnerabilities in Many Implementations of the Simple Network Management Protocol (SNMP)," released on February 12, 2002, compiles many well-known SNMP-related problems. Perhaps some of these old problems will at last be solved. Like other protocols, SNMP often suffers from unreliable old code and misconfigured daemons. I encourage you to read through CERT Advisory CA-2002-03 before embarking on an SNMP project.

Building the agent was definitely a useful experience. In reality, for busy sys admins with sufficient financial resources, it would be wiser to buy the necessary tools or contract with a professional developer for a custom SNMP agent. However, if you don't have the budget for a high-end solution, I hope you find this article helpful.

References and Links

A. Cockcroft and R. Pettit, Sun Performance and Tuning: Java and the Internet. ISBN 0-13-095249-4.

David Perkins and Evan McGinnis, Understanding SNMP MIBs. ISBN 0-13-437708-7.

Dave Zeltserman and Gerard Puoplo, Building Network Management Tools with Tcl/Tk. ISBN 0-13-080727-3.

SCOTTY home page -- http://wwwhome.cs.utwente.nl/~schoenw/scotty/

CARNet MIB page -- http://jagor.srce.hr/~ddelija/agent/

Agent home page -- http://jagor.srce.hr/~ddelija/envstat/

Big Brother home page -- http://www.bb4.com/

Big Sister home page -- http://bigsister.graeff.com/

Unix Sysadmin Resource Center on Stokely Consulting -- http://www.stokely.com/unix.sysadm.resources/

SNMP Research -- http://www.snmp.com

CERT Advisory CA-2002-03: "Multiple Vulnerabilities in Many Implementations of the Simple Network Management Protocol (SNMP)", http://www.cert.org.

Damir Delija has been a UNIX system engineer since 1991. He received a Ph.D. in Electrical Engineering in 1998. His primary job is systems administration, education, and other system-related activities.