Linux Server Monitoring with IPMI

Philip J. Hollenback

If you have expensive computer systems running in your data center, you want to make sure they keep running smoothly. Server vendors have addressed this by adding system monitoring devices to motherboards to report on temperatures, fan speeds, and voltages.

The standard way to monitor these parameters has traditionally been with tools such as lm_sensors on systems running Linux. However, this mechanism is far from perfect. For starters, it can be incredibly difficult to configure lm_sensors correctly because of poor documentation. Vendors often supply monitoring software that works perfectly and requires no tweaking, but that only runs on one motherboard and under Windows. Then, there's no guarantee that the temperature probes or fan sensors are properly calibrated. Also, it would be nice if you could reach out and reboot a hung system over the network without the use of additional equipment.

One solution is the Intelligent Platform Management Interface (IPMI). IPMI is a specification for monitoring and controlling server hardware. Specifically, it is a standardized way to do things such as:

  • Monitor system temperatures
  • Remotely force a hung machine to reboot
  • Read hardware event logs
  • Redirect the serial console over a network connection

IPMI covers several different instrumentation and reporting mechanisms. However, in this article, I'll focus on the main use of IPMI in monitoring and remotely controlling Intel-based servers running Linux and containing a baseboard management controller (BMC). Many higher-end server systems currently ship with this hardware pre-installed. Additionally, BMC cards are available as an add-on for many server systems.

What IPMI Can Do for You

If you are managing Intel-based servers of any sort, you are probably going to encounter IPMI sooner rather than later. In my case, the company I work for received a new shipment of Dell 1650 and Gateway 955 servers. We run a Linux shop, so the first order of business with new hardware is getting Linux installed with all of the necessary drivers.

Part of getting Linux operational on new systems is system monitoring. Modern servers with dual Pentium 4 processors and big SCSI hard drives run extremely hot. If a system fan fails, the server will fail badly in a matter of minutes. In the best case, the system will lock up. In the worst case, CPUs will burn up.

An even more insidious failure mode is the gradually slowing CPU fan. The system will run hotter and hotter and start to suffer from random lockups. In all these cases, it's critical to know that the fans are running and the system is cool at all times. If you don't detect and respond to a failure immediately, the system will go down and possibly suffer physical damage.

Thus, extracting temperature, fan, and voltage readings from servers is an absolute requirement at my company. Our systems are trading stocks day and night and we can't have them fail without warning. Ideally they don't fail at all, but realistically the best we can hope for is warning that a problem is approaching. That way, we hope to have time to move critical services to another system before a catastrophic failure occurs. On these new servers, I discovered the only way to obtain monitoring information is by using IPMI.

Our server room rack space is tight, so we prefer to run headless servers as much as possible. This also has the desirable effect of reducing electrical and heating loads in our server room. Additionally, we maintain a disaster recovery site some 40 miles away. These two facts make me extremely interested in remote control and management tools such as IPMI. Anything that saves me an emergency train trip to the DR site is an especially welcome addition to the system administration toolbox.

All of this and more is available via IPMI. Even though I was not initially interested in the remote management aspects of IPMI, I had to use it to get that critical monitoring data, which led me to investigate what else IPMI has to offer.

How IPMI Works

As I mentioned previously, IPMI is a standardized interface to system monitoring and management. This is accomplished with a separate, almost totally independent, small computer (the Baseboard Management Controller or BMC) running inside your server. This is not a new concept -- expensive servers have been shipping with management controllers for years. The important difference is that IPMI is standardized. With the proper tools, you should be able to use IPMI on any system and not have to worry about installing proprietary software.

The BMC is directly connected to the power supply in the system and should be operational at all times, even if the main system hangs or crashes. It controls the power connection to the server and can cycle it as needed to restart the main server. The BMC is also connected to your network. Some systems have a dedicated network interface for the BMC (typically labeled something like "Management Interface").

A more popular (and presumably cheaper) alternative is for the BMC to intercept traffic on a regular Ethernet interface. Packets to UDP port 623 are redirected to the BMC instead of to the motherboard. The Ethernet interface is always powered up when the system is connected to mains, so in theory you should always be able to use IPMI over the LAN to access the server. Note that all servers I am aware of come with LAN IPMI access disabled by default, because this is an obvious security hole. You probably don't want to enable IPMI over LAN until you understand the security implications and have at least configured password protection. Remember that at the very least, someone who can gain access to IPMI on your server can reboot the system at will.
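When you do decide to enable it, the usual order of operations is to set a password first and only then turn on LAN access. The commands below are only a sketch, run through the local OpenIPMI interface; the channel number (1) is typical but varies between BMCs, and the password is a placeholder:

# ipmitool -I open lan print 1
# ipmitool -I open lan set 1 password MySecretPass
# ipmitool -I open lan set 1 access on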

IPMI supports a number of other hardware interfaces. Many rackmount servers come with a blue identification LED on the front and back of the system. You can use this to quickly locate a server in a rack of identical systems. Luckily, at my company we buy servers in ones and twos so everything looks different anyway. But if you have many identical systems, being able to control the ID light remotely could be very useful. You could use IPMI over the network or on a system to turn on the light, and then tell your tech to go find the system with the blue light on it. This would be particularly useful in remote server setups.
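ipmitool drives this through the chassis identify command; whether the light actually responds depends on the BMC. For example, this asks the BMC to turn on the identification LED for 30 seconds:

# ipmitool -I open chassis identify 30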

One IPMI feature that I was initially very excited about was Serial over LAN. One of the other hardware interfaces to which the BMC connects is the serial port. At my company, we have always configured servers with serial consoles. We set the BIOS, bootloader, and Linux all to redirect to the serial port. Then, we connect the serial port to a terminal server (typically a Cisco 2620 with a multi-port serial card). This solution works well; however, it creates extra cabling. The ideal server would have just a power cord and a network connection. This is exactly what Serial over LAN with IPMI promises. The BMC intercepts all traffic on the serial port and redirects it onto the network. The additional serial cable is eliminated.

This sounds wonderful, except for one small problem -- it doesn't work, for several reasons:

1. Serial over LAN is only standardized in the 2.0 version of IPMI. All of our servers have version 1.0 or 1.5.

2. The only way to get it to work on systems supporting IPMI versions prior to 2.0 is to use proprietary Intel software (see number 1).

3. There's a Linux kernel bug in RTS/CTS handling on serial ports that results in either a hung serial port (with RTS/CTS on) or dropped characters (with it off).

Luckily, all this is documented in the Debian IPMI HOWTO listed in the references at the end of this article. Although Serial over LAN sounds tempting, it's not very practical with current hardware and software.

IPMI does a lot more than I can cover in this article, including things like a true hardware watchdog and system inventory reporting. Read the references at the end of this article for more details.

Installing and Configuring IPMI

Because I manage Linux servers and thus already have good remote access, I first decided to get IPMI working through the operating system and not worry about IPMI over LAN. This also fit my primary goal of obtaining sensor readings from my servers. If you are more interested in Serial over LAN or in remotely power-cycling systems, you may want to explore the LAN mechanism first.

Vendors do offer some Linux support for IPMI. Dell's standard system monitoring tool is OMSA, the Open Manage Server Administrator. Similarly, Intel offers some Linux tools for their SE7501WV2 motherboard. However, as is often the case with vendor-supplied tools, the packages are large, intrusive, and contain proprietary software. Because I was initially interested in just basic system monitoring, it seemed appropriate to investigate completely open source solutions first.

The first step in enabling IPMI support in Linux is to install the OpenIPMI kernel drivers. These are included with the stock 2.4.21 and later kernels but may not be enabled in your configuration. You may want to check the OpenIPMI homepage (see References) and download the drivers from there, as they might be more recent than the ones that come with your kernel. In my case, the drivers that came with my kernel (2.4.22) have been adequate.

You need the following driver modules:

  • ipmi_msghandler
  • ipmi_devintf
  • ipmi_kcs_drv (2.4 kernels) or ipmi_si (2.6 kernels)

If you don't already have these on your system, follow the usual Linux kernel build instructions and enable the IPMI drivers as modules.
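For reference, the configuration options you are looking for live in the character device section of the kernel configuration. These names are from the 2.4/2.6 era option set as I recall it, so verify them against your own kernel:

CONFIG_IPMI_HANDLER=m
CONFIG_IPMI_DEVICE_INTERFACE=m
CONFIG_IPMI_KCS=m

On 2.6 kernels, the KCS option is replaced by CONFIG_IPMI_SI=m.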

These kernel modules are necessary, but on their own they don't let you do anything useful. For that, you need the ipmitool userland utility. This doesn't appear to ship with any current distributions (up to Fedora Core 2, anyway). Because of this, and because these tools are still relatively young, your best bet is to download ipmitool from the project Web page (see References).

Make sure you have at least version 1.6.0 of ipmitool as it contains a number of fixes and improvements. For example, earlier versions of ipmitool only partially work on my Dell 1650 servers because those servers only support IPMI version 1.0 and ipmitool didn't fully understand that version. This has been fixed in version 1.6.0.

Build and install ipmitool from source or install the package if you have it. Building your own rpm package is fairly easy with the spec file included in the source tarball (that's the approach I used). Then, load the modules and verify they loaded correctly by checking dmesg.
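On an RPM-based system, the whole sequence looks roughly like this; the file names are illustrative, and you should substitute ipmi_si for ipmi_kcs_drv on a 2.6 kernel:

$ rpmbuild -tb ipmitool-1.6.0.tar.gz
# rpm -ivh ipmitool-1.6.0-1.i386.rpm
# modprobe ipmi_msghandler
# modprobe ipmi_devintf
# modprobe ipmi_kcs_drv
# dmesg | grep -i ipmi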

Now you are almost ready to test IPMI on your system. The last step before doing that is configuring the device node that ipmitool uses to communicate with the driver. You can create this manually, in modules.conf, or in your init script. See my init script (Listing 1) for one way to create the device. Or, if you are a master of modules.conf like one of my co-workers, use this modules.conf entry he wrote (one line):

post-install ipmi_devintf /bin/awk '/ ipmidev$/{print $1}' \
  /proc/devices | /usr/bin/xargs -r -imajor /bin/sh -c "rm -f \
  /dev/ipmi0 && /bin/mknod -m 0600 /dev/ipmi0 c major 0" \
  >/dev/null 2>&1 || :
which automatically determines the device number by checking in /proc/devices. The device number will almost always be 254.
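If you prefer to create the device node by hand, the manual equivalent is to look up the character major number assigned to ipmidev and feed it to mknod:

# grep ipmidev /proc/devices
254 ipmidev
# mknod -m 0600 /dev/ipmi0 c 254 0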

Testing IPMI

IPMI is now installed and configured on your system. To test it, run ipmitool with appropriate arguments. Try this to query the BMC:

# ipmitool -I open bmc info
And you will see something like this:

Device ID                 : 0
Device Revision           : 0
Firmware Revision         : 1.71
IPMI Version              : 1.0
Manufacturer ID           : 674
Product ID                : 1 (0x0001)
Device Available          : yes
Provides Device SDRs      : yes
Additional Device Support :
    Sensor Device
    SDR Repository Device
    SEL Device
    FRU Inventory Device
    IPMB Event Receiver
Aux Firmware Rev Info     :
    0x00
    0x00
    0x00
    0x00
First, note that you have to run ipmitool as root to get access to the IPMI devices. Second, select the interface to use with the -I switch. "open" is the OpenIPMI driver on the local system. The other major interface is "lan" for communicating with the BMC on a remote system over Ethernet.
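For illustration, the same query can be made remotely over the lan interface once IPMI over LAN has been enabled on the target BMC; the host name and credentials here are placeholders:

# ipmitool -I lan -H bmc-host.example.com -U root -P changeme bmc info
# ipmitool -I lan -H bmc-host.example.com -U root -P changeme chassis power status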

Finally, the IPMI version supported on a particular system is important. The sample output above is from a Dell 1650, which only supports IPMI v1.0. As I mentioned, ipmitool versions prior to 1.6.0 don't really work on this system without a patch. Some features (such as Serial over LAN) don't work on systems that only support IPMI version 1.0.

The current IPMI version is 2.0, which adds standardized Serial over LAN and better security compared to v1.5. Systems supporting version 2.0 of the specification should be appearing shortly on the market.

Making IPMI Work with (or at Least Like) lm_sensors

Because my initial goal in setting up IPMI was to obtain sensor data from my systems, the next step was to get at that data. Remember that on these IPMI-equipped servers, sensor data is only available through IPMI. In other words, you can't use the normal lm_sensors drivers to obtain readings as you would on other servers.

There is preliminary support in lm_sensors for IPMI, so you may want to investigate system monitoring with that package alone. However, because I hoped to eventually use the other features of IPMI (particularly remote reboot over LAN), I decided to work only with ipmitool. Furthermore, my company's existing monitoring tools all expect data in the format of lm_sensors' "sensors" program. This meant that for the easiest integration I would have to write a script that called ipmitool appropriately, parsed that program's output, and displayed it in this format:

CPU1 Temp: 40 C
CPU2 Temp: 44 C
CPU1 Fan: 5311 RPM
CPU2 Fan: 5201 RPM
However, the ipmitool sensor data looks like this:

Sensor ID             : CPU 1 (0x1)
Sensor Type (Analog)  : Temperature
Sensor Reading        : 40 ( -124) degrees C
Status                : ok
Lower Non-Recoverable : -128.000
Lower Critical        : 5.000
Lower Non-Critical    : 10.000
Upper Non-Critical    : 70.000
Upper Critical        : 75.000
Upper Non-Recoverable : 127.000
Obviously, some munging is necessary. See Listing 1 for how I dealt with this.

While IPMI provides many other sensor values, I was concerned with just CPU temperature for the first version of my script. IPMI sensor readings are available from both the "sdr" and "sensor" commands:

# ipmitool -I open sdr list
and

# ipmitool -I open sensor get "CPU 1"
Though the sdr (Sensor Data Repository) seems to contain most of the sensor data, it's not in a particularly useful format. On our Gateway servers, the CPU temperatures did not appear in the sdr output at all. The sdr output also just spits out all the sensor readings, which results in a lot of waiting for sensors you don't care about. On the other hand, the sensor command to the ipmitool open interface allows you to query a particular sensor, so that seemed the more efficient route to follow.
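Listing 1 contains my actual implementation, which also handles driver loading and the device node. As a minimal sketch of just the munging step, assuming the Dell-style sensor names, the idea boils down to something like this:

#!/bin/sh
# Sketch only: print CPU temperatures in an lm_sensors-like format.
# Sensor names vary by vendor ("Processor1 Temp" on the Gateways).
for n in 1 2; do
    reading=`ipmitool -I open sensor get "CPU $n" | grep 'Sensor Reading'`
    temp=`echo "$reading" | awk -F: '{ print $2 }' | awk '{ print $1 }'`
    [ -n "$temp" ] && echo "CPU$n Temp: $temp C"
done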

An annoyance quickly appeared when I tested ipmitool on systems running both the 2.4 and 2.6 Linux kernels. Under the 2.6 kernel I could obtain both CPU temperatures in 1-10 seconds, but under the 2.4 kernel the wait was much longer. The worst case was under the 2.4 kernel on the Gateway servers -- ipmitool took 1 minute 44 seconds to return both CPU temperatures! System load during this wait was minimal, so ipmitool was just sitting around and waiting for data to return from the OpenIPMI driver. This comment in the ipmi_kcs driver documentation was illuminating:

If you have high-res timers compiled into the kernel, the driver will use them to provide much better performance. Note that if you do not have high-res timers enabled in the kernel and you don't have interrupts enabled, the driver will run VERY slowly. Don't blame me, these interfaces suck.

I found that in general, the performance on 2.6 systems was acceptable. The Dell 1650s were much quicker to return values than the Gateways, which seemed to be related to the Dells only supporting the 1.0 IPMI specification and thus having less data to read from the BMC.

Ultimately, the only optimization I was able to make was to ask ipmitool for both sensor values in a single invocation:

# ipmitool -I open sensor get "CPU 1" "CPU 2"
which brought the time for the Gateways running the 2.4 kernel down to 53 seconds. This isn't great, but it's adequate for my needs because I only query CPU temperatures every 5 minutes. Also, note that the OpenIPMI driver is fairly resistant to becoming wedged in an unstable or unresponsive state -- when ipmitool is waiting for data, I can interrupt it with <ctrl-c> with no ill effects other than a warning message in the kernel log. I never experienced system lockups or a hung driver while testing the OpenIPMI driver and the ipmitool utility.

I also discovered that some sensor names were different between the Dell and Gateway servers. On the Dells, the first CPU temperature can be found in "CPU 1". On the Gateways, this same sensor is called "Processor1 Temp". I chalked this up to a difference between the 1.0 and 1.5 IPMI specification, but I did not verify this.

The fact that ipmitool must be run as root is an annoying limitation. I understand the security concerns that require this; however, it would be handy if there were some sort of read-only access to IPMI on Linux so that regular users could retrieve system sensor values. This may be a feature of the lm_sensors IPMI support and could be a reason to investigate that solution. My fix was to wrap the ipmitool invocation in a script that certain users can run via sudo, which is not entirely secure but adequate for my needs. I hope to get my init script accepted into the ipmitool distribution soon, as it should be useful for others.
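The sudo workaround mentioned above amounts to a fixed ipmitool invocation plus a sudoers entry; something along these lines, where the script path and group name are hypothetical:

#!/bin/sh
# /usr/local/bin/cputemps -- read-only wrapper around ipmitool
# (adjust the ipmitool path for your install)
exec /usr/bin/ipmitool -I open sensor get "CPU 1" "CPU 2"

and in /etc/sudoers:

%monitor ALL = NOPASSWD: /usr/local/bin/cputemps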

After much tweaking, I finally arrived at a script that extracts CPU temperatures from IPMI. I included this in an init script, as seen in Listing 1. The init script handles loading the driver and creating the necessary device node, as well as displaying CPU temperatures. Here are my final run times for "ipmi status" on both the Dell and Gateway test systems:

System          Kernel 2.4    Kernel 2.6
Gateway 955     55s           12s
Dell 1650       9s            1s

As I noted, this is time spent just waiting for a response from the BMC -- system load is minimal. You can verify this with the time command: user and sys time for the command are 0 or very close to 0, while the real time is 55 seconds. The processor is essentially idle the whole time.
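Timing the status target of the init script on the slow Gateway/2.4 combination produces output shaped roughly like this (the figures are illustrative rather than a captured run, and the path assumes the Listing 1 script is installed as /etc/init.d/ipmi):

# time /etc/init.d/ipmi status
...
real    0m55.04s
user    0m0.02s
sys     0m0.01s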

Future Developments and Closing Thoughts

Now that I have basic monitoring working, I plan to implement the more exciting features of IPMI such as Serial over LAN and remote system power control. The ability to always connect to a system even if it is down will obviously make IPMI very useful in a data center situation, and I plan to integrate that into our environment next. Later, it would be useful to get Serial over LAN operational to reduce cabling requirements in our data center.

Similarly, watchdog support should prove useful for unattended systems (as long as it works perfectly, of course). My company will probably wait to explore these other IPMI features until we move all systems to the 2.6 Linux kernel to minimize problems like the long sensor read times on some systems.

I have only scratched the surface of IPMI. This is a large and complex mechanism with many features. I attacked the issue from one perspective -- how do I obtain CPU temperatures on a system equipped with IPMI? The answer is through a combination of Linux kernel drivers, a userland IPMI tool, and some custom scripting. Along the way, I discovered that IPMI under Linux has some interesting quirks, such as different sensor names on different systems and wildly varying sensor read times. Luckily, all of these issues were resolved, and my tool is now in production at my company, providing temperature data on many of our servers.

References

The IPMI on Debian Howto: http://buttersideup.com/docs/howto/IPMI_on_Debian.html

Ipmitool (Linux userland tools) home: http://ipmitool.sourceforge.net/

OpenIPMI (Linux kernel driver) home: http://openipmi.sourceforge.net/

Phil Hollenback is a Linux Systems Administrator at Telemetry Investments in New York City. He holds a BS in Computer Science with an emphasis in AI. Outside of work he tries to avoid getting run over while skateboarding the streets of Manhattan. He can be reached via his Web site, http://www.hollenback.net.