Staying Alive...or Online, Anyway
Dick Munroe
I run a small consulting company from a home office. I have a
decent number of machines online constantly running everything from
OpenVMS to Macintosh OS 9 and X to Windows to Linux. These machines
support my clients' software development efforts as well as my own
products, marketing, etc. Additionally, my house is fairly well
wired, both with wireless and wired networks and a number of workstations
used by the rest of my family. Unfortunately, none of my family
members have been particularly interested in understanding the communication
infrastructure in the house, and thereby hangs this tale.
The Usual Suspects
The basic networking infrastructure is pretty much the same everywhere
and nothing is too different at my installation. I have a router/firewall/server
built with an old 500-MHz AlphaServer 800 running Debian Linux,
the testing version. It's my personal favorite Linux distribution;
try it if you like a lot of control over what goes onto your system
and you don't want to hassle with installations too much.
My ISP is Verizon and, at the time I wrote this article, I was
using their 768-Kbps ADSL service. The modem they sold me with the
service was a Westell WireSpeed and, for reasons that neither Verizon
nor I ever discovered, the modem would periodically lock up and
have to be power-cycled to restart. There seemed to be some correlation
between wet weather and the modem locking up but nothing that was
particularly reliable in predicting the problem. I travel quite
a bit and, of course, the modem would behave perfectly for months
while I was home. As soon as I left, it would lock up
and cut off my clients from their support services.
Clearly, I needed something to automate the things that I would
do to get my communications infrastructure online in the face of
a variety of flaky environmental factors.
Requirements
Whatever the solution to staying connected was, it had to satisfy
the following requirements:
1. Robust in the face of power outages. The router/firewall and
any servers had to come back up automatically, reboot themselves,
and restart communications. The connection to Verizon's DSL network
had to come back automatically.
2. Insensitive to individual piece failures. While a completely
failed component (you try getting a DSL modem or a computer that's
taken a lightning strike to work again) would have to be replaced,
the firewall/router should be able to detect and repair problems
with any piece of the "plumbing" between itself and the intranet
and Internet, and complain if repair isn't possible.
3. No external intervention. In the event that anything had to
be done to recover connectivity, I shouldn't need to be there nor
should there be a need for a network connection to fix the problem
(if at all possible).
4. Flexible. I had to be able to manipulate each piece of the
solution individually. If I had to power cycle something, I didn't
want to power cycle everything.
5. Inexpensive. As a small businessman, I'm sensitive to price.
I tend to trade "sweat" for money in my office, buying used gear,
or creating solutions to problems out of free software components
and a little in-house glue.
All of my computers reboot themselves automatically, and the only
time I saw a situation in which the DSL network didn't reconnect
after power-cycling was when a traffic accident put a car into the
local Verizon office, so I felt reasonably certain that the hardware
I had satisfied the first requirement.
With the Debian Linux distribution's PPPoE client (by Roaring
Penguin), I had noticed the PPP interface disappeared every time
the WireSpeed modem needed to be power-cycled. Because I could reliably
spot the problem, I could fix it automatically if I could find a way
to cycle the power from software. A little research in "home automation" turns
up any number of possibilities for cycling power on an appliance.
However, qualifying the research to those products supported by
Linux drives you, more or less immediately, to the X10 product line.
The X10 product line is a "hobbyist" collection of home automation
products ranging from motion sensors to Web-enabled cameras, to
light dimmers, appliance controllers, and (most importantly) transceivers
for the X10 protocol.
Basically these products work by sending a very simple protocol
over the 60-Hz AC wiring in your house from a control station to
one or more slave devices. Their CM17A product
(a.k.a., "Firecracker") connects to a standard RS-232 serial port
and emits the X10 protocol on radio frequencies (essentially acting
as one of the X10 "remote control" units). The transceiver takes
the radio frequency version of the X10 protocol and converts it
to the line current carrier wave version of the protocol. The AlphaServer
800 I use for a router has an RS-232 port suitable for use with
the "Firecracker" and no other suitable port, so if I was going to
cycle power on my modem, the "Firecracker" would have to do
the job.
I have to emphasize the "hobbyist" aspect of the X10 product line.
The X10 hardware design is very inexpensive and not suitable for
industrial or other harsh environments. X10's customer support isn't
all that consistent, and they have a mixed reputation for refunds
for defective equipment. However, there is a decent user group if
you are more comfortable asking users about solutions to problems
rather than the company's technical support. The bottom line is
that X10 home automation solutions can be made to work in specific
limited environments if you are willing to spend the time. Test
your environment and your specific X10 hardware thoroughly before
committing your business. In my lab, for example, although the modem
solution described here worked like a dream, a substantially more
complicated lighting control scheme would not (or at least not for
the money I was interested in spending).
At the time, there was a "special" going on (X10 is always running
specials) for a "Firecracker", a transceiver for the "Firecracker"
signals, and several lighting control units. The total cost for
the X10 hardware was less than $40 so the price was certainly right.
A little further research turned up a shell-scripting interface
for the "Firecracker" called Bottle Rocket, which had been ported
to Debian and was available as a Debian package. As of this writing,
the author of Bottle Rocket is looking for a maintainer for the
package. However, it is functional in its current form. So there
appeared to be a collection of stuff that would satisfy my requirements
if I were clever enough to put all the pieces together.
The Hardware Solution
The first step was to buy the necessary X10 hardware and test
whether the solution could be made to work. At this point, the worst
case was that I would be out $40 for the hardware.
The kit that I bought consisted of:
- Firecracker Control Module
- Remote Control Unit
- Transceiver Module
- Lamp Module
This was just enough equipment to put together the solution I
had in mind and test it with and without software. Once the hardware
arrived, it took very little time to plug the transceiver and lamp
module into a power strip, plug the WireSpeed modem into the transceiver
module, and see whether the X10 hardware would control the modem
state.
After that, I fired up aptitude, the Debian package manager of
choice at my shop, and installed the source package of Bottle Rocket
(version 0.05b3), built it, tested it, and found it didn't work.
So, I rebuilt the package using the debug mode, which, among other
things, turns off all compiler optimizations, tried again, and this
time it worked. Your mileage may vary, but if Bottle Rocket doesn't
work for you out of the box, try rebuilding in debug mode.
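A manual test of the kind described above can be wrapped in a small shell function. This is only a sketch: the serial port, house code, unit number, and the br option spellings are my assumptions rather than details from the build described here, so confirm them against "br --help" on your own system.

```shell
#!/bin/sh
# Assumptions (not from the article): the serial port, house code A, unit 1,
# and the br option spellings. Check `br --help` before relying on them.
BR="${BR:-br}"
X10_PORT="${X10_PORT:-/dev/ttyS0}"
X10_HOUSE="${X10_HOUSE:-A}"
X10_UNIT="${X10_UNIT:-1}"

# x10_switch on|off: send one command to the configured X10 module.
x10_switch() {
    case "$1" in
        on|off)
            $BR --port="$X10_PORT" --house="$X10_HOUSE" "--$1=$X10_UNIT"
            ;;
        *)
            echo "usage: x10_switch on|off" >&2
            return 2
            ;;
    esac
}
```

With the modem plugged into the controlled outlet, "x10_switch off; sleep 10; x10_switch on" would then be a by-hand power cycle. Setting BR=echo prints the command instead of sending it, which is handy for a dry run.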
So, now I had a hardware-only solution. The WireSpeed could be
controlled individually without any other hardware, the solution
was inexpensive, and if necessary, flexible. Now, I just needed
to make it smart.
The Software Solution
Here comes the fun stuff -- writing the software. Because Bottle
Rocket came with an interface that could be used from a shell script,
I decided to write a daemon that would:
- Start/Stop with the usual init.d interface
- Be configurable to accommodate the different delays and restrictions
imposed by ISPs and modem builders
- Attempt a reasonable power-cycle strategy (not simply madly
flip the power switch)
- Warn me if the network stayed down after trying "everything"
The configurable parameters of the check-pppoe daemon are shown
in Listing 1. The parameters are:
CYCLETIME -- The existence of the PPP interface of interest
will be checked every CYCLETIME seconds. The number is specified
in "sleep" format.
DSL_PROVIDER -- The Debian PPP infrastructure provides
for a variety of named DSL providers. The default is "dsl-provider".
HOUSECODE -- The X10 address of the module that controls
power to the modem. Since the modem can usually be plugged into
the transceiver, a single X10 module is generally all you need.
HOUSECODE_OFF_PAUSE -- To allow the modem hardware time
to shut down properly, check-pppoe waits for the specified time
before proceeding after powering off the modem.
HOUSECODE_ON_PAUSE -- To allow the modem hardware time
to come up completely and detect both the wide area and local networks,
this amount of time is allowed to elapse before bringing up the
PPP interface. You'll have to figure out how long your modem takes
by inspection. Time it, then add a few seconds to make sure that
the modem is really up.
POWERCYCLETIME -- The integer number of seconds to wait
once the need to cycle power to the modem has been determined. The
failure modes I've seen with the DSL network are either that the
network connection comes back after the first power cycle (the problem
was with the modem) or the network is down and will be back up "soon".
To keep from blindly flipping the power on the modem, check-pppoe
implements a back-off strategy: it waits progressively longer
(up to a configurable maximum) between attempts to cycle
power.
POWERCYCLETIMEINCREMENT -- The integer number of seconds
to increment POWERCYCLETIME if the PPP connection fails to come
up after a power cycle.
POWERCYCLETIMEMAXIMUM -- The integer number of seconds
that is the maximum amount of time between power-cycle attempts.
PPP -- The PPP interface to be checked. It is a simple
existence check. If the PPP interface exists, the network connection
is assumed to be up.
PPP_PAUSE -- The number of seconds (in "sleep" format)
that it takes the PPPoE client to bring up the new connection. You'll
have to time this on your system.
SYSTEMSTARTUP -- The number of seconds (in "sleep" format)
to wait after starting the daemon before testing for the network
connection. If you start this daemon at boot time, you should always
start it after the network interfaces are up. This makes sure that
check-pppoe doesn't get into any unfortunate interaction with the
network startup scripts.
TRY -- The integer number of times that check-pppoe will
attempt to restart the PPPoE client in the hopes that it won't be
necessary to actually cycle power on the modem.
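To make the parameters concrete, here is a hypothetical configuration file. It assumes the file uses plain shell-style assignments, and every value is illustrative rather than taken from Listing 1; time your own modem and PPPoE client before settling on numbers.

```shell
# Hypothetical check-pppoe.conf; all values are illustrative assumptions.
CYCLETIME=60                  # look for the PPP interface every 60 seconds
DSL_PROVIDER=dsl-provider     # Debian's default provider name
HOUSECODE=A1                  # X10 address of the module powering the modem
HOUSECODE_OFF_PAUSE=10        # let the modem shut down before powering on
HOUSECODE_ON_PAUSE=45         # measured modem start-up time plus a margin
POWERCYCLETIME=60             # initial wait before cycling power
POWERCYCLETIMEINCREMENT=60    # added after each failed power cycle
POWERCYCLETIMEMAXIMUM=900     # cap on the wait between power cycles
PPP=ppp0                      # interface whose existence means "link up"
PPP_PAUSE=15                  # time the PPPoE client needs to connect
SYSTEMSTARTUP=120             # grace period after the daemon starts
TRY=3                         # PPPoE client restarts before cycling power
```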
Listing 2 contains the code of the check-pppoe daemon. It's pretty
straightforward. Note that the logger command is used to
emit status information. If a system manager needs additional
notification, modify syslog.conf to hook these messages into the
appropriate notification mechanism.
The last piece is Listing 3, which is the daemon startup
shell script.
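Two pieces of the logic described above, the interface existence check and the back-off between power cycles, can be sketched in a few lines of shell. This is not Listing 2. The function names, the default values, and the use of /sys/class/net for the existence check are my assumptions; the daemon itself may well test for the interface some other way.

```shell
#!/bin/sh
# Sketch of two pieces of the daemon's logic; Listing 2 is the real thing.
# Defaults are illustrative assumptions, as is the NET_SYSFS knob.
PPP="${PPP:-ppp0}"
POWERCYCLETIMEINCREMENT="${POWERCYCLETIMEINCREMENT:-60}"
POWERCYCLETIMEMAXIMUM="${POWERCYCLETIMEMAXIMUM:-300}"
NET_SYSFS="${NET_SYSFS:-/sys/class/net}"

# link_up: succeed when the PPP interface exists. Linux exposes each
# network interface as an entry under /sys/class/net.
link_up() {
    [ -e "$NET_SYSFS/$PPP" ]
}

# next_wait CURRENT: print CURRENT plus the increment, capped at the maximum.
next_wait() {
    next=$(( $1 + $POWERCYCLETIMEINCREMENT ))
    if [ "$next" -gt "$POWERCYCLETIMEMAXIMUM" ]; then
        next=$POWERCYCLETIMEMAXIMUM
    fi
    echo "$next"
}
```

Starting from 60 seconds with an increment of 60 and a maximum of 300, repeated calls to next_wait give waits of 120, 180, 240, and 300 seconds, after which the wait stays at 300 until link_up succeeds again.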
Installing check-pppoe
I don't have enough "round tuits" left to do the right thing and
write a Debian package for this, but it's on my list. Installation
is pretty straightforward:
1. Copy Listing 2 to /usr/local/sbin/check-pppoe.daemon. If you
prefer, you can copy it to /usr/sbin, /sbin, or /opt/sbin.
2. Copy Listing 1 to /etc/check-pppoe/check-pppoe.conf.
3. Copy Listing 3 to /etc/init.d/check-pppoe.
You should then make symbolic links, as appropriate, in the /etc/rc
directories for the run levels in which check-pppoe should be started
and killed.
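The three copy steps can be scripted along these lines. The source file names are placeholders of mine for Listings 1 through 3 (I am not assuming anything about how the download names them), and the optional second argument is a staging directory I added so the function can be tried without root.

```shell
#!/bin/sh
# install_check_pppoe SRC_DIR [DESTDIR]
# The file names under SRC_DIR are hypothetical placeholders for Listings 1-3.
# DESTDIR is an optional staging prefix; leave it empty to install for real.
install_check_pppoe() {
    src=$1
    dest=${2:-}
    mkdir -p "$dest/usr/local/sbin" "$dest/etc/check-pppoe" "$dest/etc/init.d" &&
    cp "$src/check-pppoe.daemon" "$dest/usr/local/sbin/check-pppoe.daemon" &&
    cp "$src/check-pppoe.conf" "$dest/etc/check-pppoe/check-pppoe.conf" &&
    cp "$src/check-pppoe.init" "$dest/etc/init.d/check-pppoe" &&
    chmod 755 "$dest/usr/local/sbin/check-pppoe.daemon" \
              "$dest/etc/init.d/check-pppoe"
}
```

On a Debian system, running "update-rc.d check-pppoe defaults" as root after a real install should create the run-level links for you, although you may prefer to place them by hand so the daemon starts after the network scripts.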
Conclusion
For $40 in hardware and less than a day of time, I solved a problem
that had driven me nuts for years. This isn't the most robust or
general solution possible, but it works far better than anything
else I could have had for the money. So far, I've had no further
need for automation of this sort of systems administration task,
but given how easy this solution was, the next time I won't hesitate.
Now I can leave the office and not worry about the state of the
network. It's all being taken care of for me.
This code is available for download from the Sys Admin
Web site and from:
http://www.csworks.com/download/StayingAlive.tar.bz2
Resources
Bottle Rocket software -- http://mlug.missouri.edu/~tymm/
Debian Linux -- http://www.debian.org
X10 "Firecracker" Kit -- http://www.x10.com/software/automation_software.html
Dick Munroe is a software engineer, architect, and consultant
with nearly 40 years of software and project experience ranging
from the sublime to the ridiculous. He grinds code and wood from
his offices at Cottage Software Works in Belmont, Massachusetts,
Havana, Florida, and Guanaja, Honduras. When playing, he can frequently
be found at the top of mountains wondering whether they will find
the pieces come springtime or deep in the water worrying whether
that shark is really as hungry as it looks. He can be contacted
at: munroe@csworks.com.