
Staying Alive...or Online, Anyway

Dick Munroe

I run a small consulting company from a home office. I have a decent number of machines online constantly running everything from OpenVMS to Macintosh OS 9 and X to Windows to Linux. These machines support my clients' software development efforts as well as my own products, marketing, etc. Additionally, my house is fairly well wired, with both wireless and wired networks and a number of workstations used by the rest of my family. Unfortunately, none of my family members have been particularly interested in understanding the communication infrastructure in the house, and thereby hangs this tale.

The Usual Suspects

The basic networking infrastructure is pretty much the same everywhere and nothing is too different at my installation. I have a router/firewall/server built with an old 500-MHz AlphaServer 800 running Debian Linux, the testing version. It's my personal favorite Linux distribution; try it if you like a lot of control over what goes onto your system and you don't want to hassle with installations too much.

My ISP is Verizon and, at the time I wrote this article, I was using their 768-Kbps ADSL service. The modem they sold me with the service was a Westell WireSpeed and, for reasons that neither Verizon nor I ever discovered, the modem would periodically lock up and have to be power-cycled to restart. There seemed to be some correlation between wet weather and the modem locking up, but nothing reliable enough to predict the problem. I travel quite a bit and, of course, the modem would behave perfectly for months while I was home. As soon as I left, it would lock up and cut off my clients from their support services.

Clearly, I needed something to automate the things that I would do to get my communications infrastructure back online in the face of a variety of flaky environmental factors.

Requirements

Whatever the solution to staying connected was, it had to satisfy the following requirements:

1. Robust in the face of power outages. The router/firewall and any servers had to come back up automatically, reboot themselves, and restart communications. The connection to Verizon's DSL network had to come back automatically.

2. Insensitive to individual piece failures. While a completely failed component (you try getting a DSL modem or a computer that's taken a lightning strike to work again) would have to be replaced, the firewall/router should be able to detect and repair problems with any piece of the "plumbing" between itself and the internal and external networks, and complain if repair isn't possible.

3. No external intervention. In the event that anything had to be done to recover connectivity, I shouldn't need to be there nor should there be a need for a network connection to fix the problem (if at all possible).

4. Flexible. I had to be able to manipulate each piece of the solution individually. If I had to power cycle something, I didn't want to power cycle everything.

5. Inexpensive. As a small businessman, I'm sensitive to price. I tend to trade "sweat" for money in my office, buying used gear, or creating solutions to problems out of free software components and a little in-house glue.

All of my computers reboot themselves automatically, and the only time I saw a situation in which the DSL network didn't reconnect after power-cycling was when a traffic accident put a car into the local Verizon office, so I felt reasonably certain that the hardware I had satisfied the first requirement.

With the Debian Linux distribution's PPPoE client (by Roaring Penguin), I had noticed the PPP interface disappeared every time the WireSpeed modem needed to be power-cycled. Because I could reliably spot the problem, it was possible to try to fix it if I could actually do the power cycling. A little research in "home automation" turns up any number of possibilities for cycling power on an appliance. However, qualifying the research to those products supported by Linux drives you, more or less immediately, to the X10 product line.

The X10 product line is a "hobbyist" collection of home automation products ranging from motion sensors to Web-enabled cameras, to light dimmers, appliance controllers, and (most importantly) transceivers for the X10 protocol.

Basically, these products work by using the 60-Hz line current in your wiring as a carrier and sending a very simple protocol from a control station to one or more slave devices. Their CM17A product (a.k.a., "Firecracker") connects to a standard RS-232 serial port and emits the X10 protocol on radio frequencies (essentially acting as one of the X10 "remote control" units). The transceiver takes the radio frequency version of the X10 protocol and converts it to the line current carrier wave version of the protocol. The AlphaServer 800 I use for a router has an RS-232 port suitable for use with the "Firecracker" and nothing else that would serve, so if I was going to cycle power on my modem, the "Firecracker" would have to do the job.

I have to emphasize the "hobbyist" aspect of the X10 product line. The X10 hardware design is very inexpensive and not suitable for industrial or other harsh environments. X10's customer support isn't all that consistent, and they have a mixed reputation for refunds for defective equipment. However, there is a decent user group if you are more comfortable asking users about solutions to problems rather than the company's technical support. The bottom line is that X10 home automation solutions can be made to work in specific limited environments if you are willing to spend the time. Test your environment and your specific X10 hardware thoroughly before committing your business. In my lab, for example, although the modem solution described here worked like a dream, a substantially more complicated lighting control scheme would not (or at least not for the money I was interested in spending).

At the time, there was a "special" going on (X10 is always running specials) for a "Firecracker", a transceiver for the "Firecracker" signals, and several lighting control units. The total cost for the X10 hardware was less than $40 so the price was certainly right.

A little further research turned up a shell-scripting interface for the "Firecracker" called Bottle Rocket, which had been ported to Debian and was available as a Debian package. As of this writing, the author of Bottle Rocket is looking for a maintainer for the package. However, it is functional in its current form. So there appeared to be a collection of stuff that would satisfy my requirements if I were clever enough to put all the pieces together.

The Hardware Solution

The first step was to buy the necessary X10 hardware and test whether the solution could be made to work. At this point, the worst case was that I would be out $40 for the hardware.

The kit that I bought consisted of:

Firecracker Control Module
Remote Control Unit
Transceiver Module
Lamp Module

This was just enough equipment to put together the solution I had in mind and test it with and without software. Once the hardware arrived, it took very little time to plug the transceiver and lamp module into a power strip, plug the WireSpeed modem into the transceiver module, and see whether the X10 hardware would control the modem state.

After that, I fired up aptitude, the Debian package manager of choice at my shop, and installed the source package of Bottle Rocket (version 0.05b3), built it, tested it, and found it didn't work. So, I rebuilt the package using the debug mode, which, among other things, turns off all compiler optimizations, tried again, and this time it worked. Your mileage may vary, but if Bottle Rocket doesn't work for you out of the box, try rebuilding in debug mode.
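As a rough illustration of what a power cycle through Bottle Rocket might look like, here is a minimal sketch. It is not taken from the article's listings. The serial port, the X10 house code and unit number, the pause lengths, and the variable names X10_PORT, X10_HOUSE, and X10_UNIT are all assumptions, and the br option spelling (--port, --house, --on, --off) should be checked against br --help for the version you built.

```shell
#!/bin/sh
# Sketch only: power-cycle a modem through an X10 module with Bottle Rocket.
# Every value below is an assumption for illustration, not the author's setup.

BR=${BR:-br}                        # Bottle Rocket command-line client
X10_PORT=${X10_PORT:-/dev/ttyS0}    # serial port holding the Firecracker (assumed)
X10_HOUSE=${X10_HOUSE:-A}           # X10 house code of the modem's module (assumed)
X10_UNIT=${X10_UNIT:-1}             # X10 unit number of the modem's module (assumed)
HOUSECODE_OFF_PAUSE=${HOUSECODE_OFF_PAUSE:-10}   # seconds to leave the modem off
HOUSECODE_ON_PAUSE=${HOUSECODE_ON_PAUSE:-60}     # seconds for the modem to sync

# x10_switch on|off: send one command to the module feeding the modem.
# The option names are assumed; verify them with "br --help".
x10_switch() {
    "$BR" --port="$X10_PORT" --house="$X10_HOUSE" "--$1=$X10_UNIT"
}

# Turn the modem off, wait, turn it back on, then wait for it to come up.
# Returns non-zero as soon as either X10 command fails.
power_cycle_modem() {
    x10_switch off || return 1
    sleep "$HOUSECODE_OFF_PAUSE"
    x10_switch on || return 1
    sleep "$HOUSECODE_ON_PAUSE"
}
```

Calling power_cycle_modem from an interactive shell is a quick way to confirm that the Firecracker, the transceiver, and the module are all talking to each other before any monitoring logic is layered on top.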

So, now I had a hardware-only solution. The WireSpeed could be controlled individually without touching any other hardware, and the solution was inexpensive and, where necessary, flexible. Now, I just needed to make it smart.

The Software Solution

Here comes the fun stuff -- writing the software. Because Bottle Rocket came with an interface that could be used from a shell script, I decided to write a daemon that would:

Start/Stop with the usual init.d interface
Be configurable to accommodate the different delays and restrictions imposed by ISPs and modem builders
Attempt a reasonable power-cycle strategy (not simply madly flip the power switch)
Warn me if the network stayed down after trying "everything"

The configurable parameters of the check-pppoe daemon are shown in Listing 1. The parameters are:

CYCLETIME -- The existence of the PPP interface of interest will be checked every CYCLETIME seconds. The number is specified in "sleep" format.

DSL_PROVIDER -- The Debian PPP infrastructure provides for a variety of named DSL providers. The default is "dsl-provider".

HOUSECODE -- The X10 address of the module that controls power to the modem. Since your modem can generally be plugged into the transceiver, only a single X10 module is generally necessary.

HOUSECODE_OFF_PAUSE -- To allow the modem hardware time to shut down properly, check-pppoe waits for the specified time before proceeding after powering off the modem.

HOUSECODE_ON_PAUSE -- To allow the modem hardware time to come up completely and detect both the wide area and local networks, this amount of time is allowed to elapse before bringing up the PPP interface. You'll have to figure out how long your modem takes by inspection. Time it, then add a few seconds to make sure that the modem is really up.

POWERCYCLETIME -- The integer number of seconds to wait once the need to cycle power to the modem has been determined. The failure modes I've seen with the DSL network are either that the network connection comes back after the first power cycle (the problem was with the modem) or the network is down and will be back up "soon". To keep from blindly flipping the power on the modem, check-pppoe implements a back-off strategy: it waits progressively longer (up to a configurable maximum) between attempts to cycle power.

POWERCYCLETIMEINCREMENT -- The integer number of seconds to increment POWERCYCLETIME if the PPP connection fails to come up after a power cycle.

POWERCYCLETIMEMAXIMUM -- The integer number of seconds that is the maximum amount of time between power-cycle attempts.

PPP -- The PPP interface to be checked. It is a simple existence check. If the PPP interface exists, the network connection is assumed to be up.

PPP_PAUSE -- The number of seconds (in "sleep" format) that it takes the PPPoE client to bring up the new connection. You'll have to time this on your system.

SYSTEMSTARTUP -- The number of seconds (in "sleep" format) to wait after starting the daemon before testing for the network connection. If you start this daemon at boot time, you should always start it after the network interfaces are up. This makes sure that check-pppoe doesn't get into any unfortunate interaction with the network startup scripts.

TRY -- The integer number of times that check-pppoe will attempt to restart the PPPoE client in the hopes that it won't be necessary to actually cycle power on the modem.
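The back-off described by the three POWERCYCLETIME parameters can be sketched as follows. The numeric values are purely illustrative and are not the defaults from Listing 1, and the helper names next_wait and backoff_schedule are hypothetical. The sketch assumes the wait grows by a fixed increment after each failed power cycle and is capped at the maximum.

```shell
#!/bin/sh
# Sketch of a linear back-off with a ceiling.
# Values are illustrative assumptions, not the defaults from Listing 1.

POWERCYCLETIME=${POWERCYCLETIME:-60}                     # first wait, seconds
POWERCYCLETIMEINCREMENT=${POWERCYCLETIMEINCREMENT:-120}  # added per failure
POWERCYCLETIMEMAXIMUM=${POWERCYCLETIMEMAXIMUM:-600}      # ceiling, seconds

# next_wait CURRENT: print the wait to use after one more failed power cycle.
next_wait() {
    next=$(( $1 + POWERCYCLETIMEINCREMENT ))
    if [ "$next" -gt "$POWERCYCLETIMEMAXIMUM" ]; then
        next=$POWERCYCLETIMEMAXIMUM
    fi
    echo "$next"
}

# backoff_schedule N: print the waits used for the first N attempts.
backoff_schedule() {
    delay=$POWERCYCLETIME
    i=0
    out=""
    while [ "$i" -lt "$1" ]; do
        out="$out${out:+ }$delay"
        delay=$(next_wait "$delay")
        i=$(( i + 1 ))
    done
    echo "$out"
}
```

With the illustrative values above, backoff_schedule 7 prints 60 180 300 420 540 600 600, so a long outage at the ISP settles into one power cycle every ten minutes instead of hammering the modem.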

Listing 2 contains the code of the check-pppoe daemon. It's pretty straightforward. Note that the logger command is used to emit status information. If the system manager needs additional notification, modify syslog.conf to route these messages to the appropriate notification mechanism. The last piece is Listing 3, which is the daemon startup shell script.
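For readers without the listings at hand, the core of such a daemon can be sketched as a single monitoring pass. This is not the author's Listing 2. It assumes the interface is tested by looking for an entry under /sys/class/net, that the PPPoE client is restarted with Debian's pon and poff commands, and that ppp0, the TRY count, and the pause are reasonable placeholders. NETDIR and the function names are hypothetical.

```shell
#!/bin/sh
# Sketch of one monitoring pass: check the interface, retry the PPPoE
# client TRY times, and report when a power cycle is the only option left.
# Not the author's Listing 2; defaults below are assumptions.

PPP=${PPP:-ppp0}                            # interface to check (assumed name)
DSL_PROVIDER=${DSL_PROVIDER:-dsl-provider}  # Debian PPP peer name
TRY=${TRY:-3}                               # client restarts before power cycling
PPP_PAUSE=${PPP_PAUSE:-30}                  # seconds for the client to connect
NETDIR=${NETDIR:-/sys/class/net}            # where Linux lists interfaces

# Send a status line to syslog.
log() {
    logger -t check-pppoe -p daemon.warning "$*"
}

# Simple existence check: the link is assumed up if the interface exists.
ppp_is_up() {
    [ -e "$NETDIR/$PPP" ]
}

# Stop and restart the PPPoE client, then give it time to connect.
restart_pppoe() {
    poff "$DSL_PROVIDER"
    pon "$DSL_PROVIDER"
    sleep "$PPP_PAUSE"
}

# Returns 0 if the link is up (possibly after restarts), 1 if the
# modem needs a power cycle.
check_once() {
    ppp_is_up && return 0
    n=0
    while [ "$n" -lt "$TRY" ]; do
        n=$(( n + 1 ))
        log "$PPP missing; restarting PPPoE client (attempt $n of $TRY)"
        restart_pppoe
        ppp_is_up && return 0
    done
    log "$PPP still missing after $TRY restarts; modem needs a power cycle"
    return 1
}
```

A daemon built on this would call check_once every CYCLETIME seconds and fall through to the power-cycle logic only when it returns non-zero, which keeps the cheap fix (restarting the client) ahead of the expensive one.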

Installing check-pppoe

I don't have enough "round tuits" left to do the right thing and write a Debian package for this, but it's on my list. Installation is pretty straightforward:

1. Copy Listing 2 to /usr/local/sbin/check-pppoe.daemon. If you prefer, you can copy it to /usr/sbin, /sbin, or /opt/sbin.

2. Copy Listing 1 to /etc/check-pppoe/check-pppoe.conf.

3. Copy Listing 3 to /etc/init.d/check-pppoe.

Then make symbolic links from the /etc/rcN.d directories for the run levels in which check-pppoe should be started and killed.
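The installation steps above can be sketched as a small helper. The source file names (check-pppoe.daemon, check-pppoe.conf, check-pppoe.init) are hypothetical names for the three listings as saved on disk, and the optional ROOT argument exists only so the copy can be rehearsed in a scratch directory first.

```shell
#!/bin/sh
# Sketch: copy the three listings into place.
# install_check_pppoe SRC_DIR [ROOT]
# The file names under SRC_DIR are hypothetical; ROOT defaults to the live
# filesystem and can point at a scratch directory for a dry run.

install_check_pppoe() {
    src=$1
    root=${2:-}
    mkdir -p "$root/usr/local/sbin" "$root/etc/check-pppoe" "$root/etc/init.d" || return 1
    # Listing 2: the daemon itself
    cp "$src/check-pppoe.daemon" "$root/usr/local/sbin/check-pppoe.daemon" || return 1
    chmod 755 "$root/usr/local/sbin/check-pppoe.daemon" || return 1
    # Listing 1: the configuration file
    cp "$src/check-pppoe.conf" "$root/etc/check-pppoe/check-pppoe.conf" || return 1
    chmod 644 "$root/etc/check-pppoe/check-pppoe.conf" || return 1
    # Listing 3: the init.d startup script
    cp "$src/check-pppoe.init" "$root/etc/init.d/check-pppoe" || return 1
    chmod 755 "$root/etc/init.d/check-pppoe"
}
```

On a Debian system, running update-rc.d check-pppoe defaults as root after the copy creates the start and kill links in the run-level directories, which saves making them by hand.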

Conclusion

For $40 in hardware and less than a day of time, I solved a problem that had driven me nuts for years. This isn't the most robust or general solution possible, but it works far better than anything else I could have had for the money. So far, I've had no further need for automation of this sort of systems administration task, but given how easy this solution was, the next time I won't hesitate. Now I can leave the office and not worry about the state of the network. It's all being taken care of for me.

This code is available for download from the Sys Admin Web site and from:

http://www.csworks.com/download/StayingAlive.tar.bz2

Resources

Bottle Rocket software -- http://mlug.missouri.edu/~tymm/

Debian Linux -- http://www.debian.org

X10 "Firecracker" Kit -- http://www.x10.com/software/automation_software.html

Dick Munroe is a software engineer, architect, and consultant with nearly 40 years of software and project experience ranging from the sublime to the ridiculous. He grinds code and wood from his offices at Cottage Software Works in Belmont, Massachusetts, Havana, Florida, and Guanaja, Honduras. When playing, he can frequently be found at the top of mountains wondering whether they will find the pieces come springtime or deep in the water worrying whether that shark is really as hungry as it looks. He can be contacted at: munroe@csworks.com.