Questions and Answers
Amy Rich
Q We have several remote offices
that are connected by various kinds of pipes. We're looking for
a tool that can measure the available bandwidth and determine where
our bottlenecks might be. Do you have any suggestions for free software
that will run on FreeBSD?
A Take a look at the program pchar,
a free reimplementation of Van Jacobson's old pathchar. It's
available from:
http://www.kitchenlab.org/www/bmah/Software/pchar/
and in the FreeBSD ports tree as /usr/ports/net/pchar/. The
default mode (no options) will probably get you what you want. To
give you a feel for the kind of data it supplies, here's the example
from the man page for a simple three-hop path:
pchar to dancer.ca.sandia.gov (146.246.246.1) using UDP/IPv4
Packet size increments by 32 to 1500
46 test(s) per repetition
32 repetition(s) per hop
0:
Partial loss: 0 / 1472 (0%)
Partial char: rtt = 0.657235 ms, (b = 0.000358 ms/B), r2 = 0.989713
stddev rtt = 0.004140, stddev b = 0.000006
Hop char: rtt = 0.657235 ms, bw = 22333.268771 Kbps
Partial queueing: avg = 0.000150 ms (418 bytes)
1: 146.246.243.254 (con243.ca.sandia.gov)
Partial loss: 0 / 1472 (0%)
Partial char: rtt = 0.811278 ms, (b = 0.000454 ms/B), r2 = 0.995401
stddev rtt = 0.003499, stddev b = 0.000005
Hop char: rtt = 0.154043 ms, bw = 83454.764777 Kbps
Partial queueing: avg = 0.000153 ms (336 bytes)
2: 146.246.250.251 (slcon1.ca.sandia.gov)
Partial loss: 0 / 1472 (0%)
Partial char: rtt = 1.044412 ms, (b = 0.002161 ms/B), r2 = 0.999658
stddev rtt = 0.004533, stddev b = 0.000006
Hop char: rtt = 0.233133 ms, bw = 4686.320952 Kbps
Partial queueing: avg = 0.000100 ms (46 bytes)
3: 146.246.246.1 (dancer.ca.sandia.gov)
Path length: 3 hops
Path char: rtt = 1.044412 ms, r2 = 0.999658
Path bottleneck: 4686.320952 Kbps
Path pipe: 611 bytes
Path queueing: average = 0.000100 ms (46 bytes)
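If you end up running pchar against many offices, it may help to boil each report down to one line per link. The following is only a sketch: the function name pchar_links is made up for illustration, and it assumes the output keeps the "N:" hop headers and "Hop char: ... bw = X Kbps" lines shown above, with each "Hop char" line describing the link that starts at the hop listed just before it.

```shell
#!/bin/sh
# pchar_links (hypothetical helper): read pchar output on stdin, print the
# estimated bandwidth of each link, then the slowest link found.
pchar_links() {
    awk '
        # A hop header looks like "0:" or "1: 146.246.243.254 (host)".
        $1 ~ /^[0-9]+:$/ { hop = $1; sub(/:$/, "", hop) }

        # "Hop char: rtt = ... ms, bw = 22333.268771 Kbps"
        # The bandwidth is the next-to-last field on the line.
        $1 == "Hop" && $2 == "char:" {
            bw = $(NF - 1)
            printf "link from hop %s: %.0f Kbps\n", hop, bw
            if (min == "" || bw + 0 < min + 0) { min = bw; minhop = hop }
        }

        END {
            if (min != "")
                printf "slowest: link from hop %s (%.0f Kbps)\n", minhop, min
        }
    '
}

# Intended use against a live path (needs pchar installed and, usually, root):
#   pchar dancer.ca.sandia.gov | pchar_links
```

Fed the man page example above, the last line should point at the link from hop 2, which agrees with the "Path bottleneck" figure pchar reports itself.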
Another program that might be of use is netperf, which is written
by people who work at HP and is available from:
http://www.netperf.org/
Netperf primarily measures bulk data transfer and
request/response performance using either TCP or UDP and BSD sockets.
Some implementation of ttcp (originally written by Mike Muuss
and Terry Slattery) might also work for you.
Q As part of our JumpStart install,
we're adding the Sun SAN software to our Solaris 9 machines. After
the JumpStart finishes, these machines should have access to all
of our SAN volumes. We're installing the following packages:
SUNWcfcl SUNWfcsm
SUNWcfclr SUNWfcsmx
SUNWcfclx SUNWjfca
SUNWcfpl SUNWjfcau
SUNWcfplx SUNWjfcaux
SUNWfchba SUNWjfcax
SUNWfchbr SUNWmdiu
SUNWfchbx SUNWsan
And these SAN-related patches:
111847-08 113041-07 113044-05 114476-04 114878-08
113039-08 113042-09 113046-01 114477-01
113040-11 113043-09 113049-01 114478-06
Then we reboot to configure the devices and run the following script
to turn on mpxio and auto-failback:
echo "running cfgadm to configure fiber controllers"
cfgadm -c configure c2 2>&1 >/dev/null
cfgadm -c configure c3 2>&1 >/dev/null
cfgadm -c configure c4 2>&1 >/dev/null
cfgadm -c configure c5 2>&1 >/dev/null
echo "editing /kernel/drv/scsi_vhci.conf"
sed -e 's/^mpxio-disable="yes"/mpxio-disable="no"/' \
-e 's/^auto-failback="disable"/auto-failback="enable"/' \
< /kernel/drv/scsi_vhci.conf > /kernel/drv/scsi_vhci.conf.new
mv /kernel/drv/scsi_vhci.conf.new /kernel/drv/scsi_vhci.conf
chown root:sys /kernel/drv/scsi_vhci.conf
chmod 644 /kernel/drv/scsi_vhci.conf
reboot -- -r
Even though we've run a reconfig reboot, the machine comes back up
but doesn't appear to be using mpxio to manage all the disks.
When we run format, the SAN disks show up as follows:
4. c5t20030003BA4D3EAEd6 <SUN-T4-0301 cyl 538 alt 2 hd 12 sec 32>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w20030003ba4d3eae,6
5. c5t20030003BA4D3EAEd12 <SUN-T4-0301 cyl 5506 alt 2 hd 12 sec 32>
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w20030003ba4d3eae,c
6. c6t60003BA4D3D5600042A5DC3C000E1004d0 <SUN-T4-0301 cyl 52 alt 2 \
hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042a5dc3c000e1004
7. c6t60003BA4D3D5600042A9D4760005EA74d0 <SUN-T4-0301 cyl 2698 alt 2 \
hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042a9d4760005ea74
8. c6t60003BA4D3D5600042B84E4300005F5Ed0 <SUN-T4-0301 cyl 2698 alt 2 \
hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b84e4300005f5e
9. c6t60003BA4D3D5600042B857D300065153d0 <SUN-T4-0301 cyl 5506 alt 2 \
hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b857d300065153
10. c6t60003BA4D3D5600042B8553700073A01d0 <SUN-T4-0301 cyl 538 alt 2 \
hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b8553700073a01
Note that some of these are correct, but the first two are not. I
verified that the settings for mpxio-disable and auto-failback
were correct in /kernel/drv/scsi_vhci.conf, but no amount of
reconfiguration reboots seems to fix the drives with the problem.
Do you have any idea why we're seeing this weird behavior and how
we can correct it?
A In order for your
disks to appear correctly, you need to do a reconfiguration reboot
after you configure the controllers with cfgadm but before you enable
mpxio. You can split your script into two pieces, placing the
second half in /etc/rc2.d to be run when the machine boots
back up. Once it runs the first time after the reconfiguration reboot,
remove the /etc/rc2.d script so that you don't wind up in a
rebooting loop.
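The second half of the split might look something like the sketch below. The script name (for example, /etc/rc2.d/S99enable_mpxio) is an assumption, as is the idea of having the script delete itself. The first half would simply be your cfgadm commands followed by reboot -- -r. The sed expressions are the ones from your original script, wrapped in a function so you can try them on a copy of the file first.

```shell
#!/bin/sh
# Second-stage sketch, meant to be dropped into /etc/rc2.d under a
# hypothetical name such as S99enable_mpxio by the first-stage script.

# enable_mpxio FILE: flip mpxio-disable and auto-failback in a
# scsi_vhci.conf-style file, using the same sed expressions as the
# original one-piece script.
enable_mpxio() {
    conf=$1
    sed -e 's/^mpxio-disable="yes"/mpxio-disable="no"/' \
        -e 's/^auto-failback="disable"/auto-failback="enable"/' \
        < "$conf" > "$conf.new" &&
    mv "$conf.new" "$conf"
}

# rc scripts are normally called with "start" at boot; do nothing otherwise.
case "${1:-}" in
start)
    enable_mpxio /kernel/drv/scsi_vhci.conf || exit 1
    chown root:sys /kernel/drv/scsi_vhci.conf
    chmod 644 /kernel/drv/scsi_vhci.conf
    rm -f "$0"      # remove this script so it only ever runs once
    reboot -- -r    # second reconfiguration reboot, now with mpxio enabled
    ;;
esac
```

Removing the script before the reboot, rather than after, is what keeps the machine out of a reboot loop even if something else goes wrong later in the boot.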
Q We just hired a new employee who's
supposed to be working from home most of the time, but he can't
seem to get SSH working from his home Linux machine. When he tries
to connect, he gets this error:
Disconnecting: Corrupted MAC on input.
I'm not sure what SSH would have to do with his MAC address, since
we're on drastically different networks, and the MAC address shouldn't
be seen by the server at all. Do you have any clue how we might get
around this?
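A The MAC in this error message is
SSH's message authentication code, which protects the integrity of
each packet; it has nothing to do with an Ethernet MAC address. You
don't provide any version numbers for the software you're running
or what OS you're running on the server, so it's difficult to give
an absolute diagnosis. I have heard of people getting the error you
describe in a few different cases, though. These seem to be the
relevant OpenSSH bug reports: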
http://bugzilla.mindrot.org/show_bug.cgi?id=510
http://bugzilla.mindrot.org/show_bug.cgi?id=772
http://bugzilla.mindrot.org/show_bug.cgi?id=845
In one case, there was an issue with OpenSSL for Linux being compiled
under the wrong architecture (elf vs. ppc) or using
the wrong versions. Recompiling OpenSSL and then OpenSSH fixed that
one. Another case reports that some old firmware revisions on Linksys
routers exhibit this behavior. In those cases, the fix was to upgrade
the firmware. Another report I've seen claims that removing the Banner
directive in the sshd_config file on the server fixed an issue
that reported the message you describe.
Q We're in the process of moving
to Solaris 10 and are trying to convert our startup scripts and
inetd.conf entries to SMF. Do you know of any good resources
for people who need to write their own manifests?
A Presumably you've already read
through the docs.sun.com Managing Services section from the
Solaris 10 System Administrator Collection:
http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5n0?a=view
For a good primer on writing manifests, take a look at the Solaris
Service Management Facility Service Developer Introduction located
on Sun's BigAdmin site:
http://bigadmin.com/content/selfheal/sdev_intro.html
You might also be interested in Ben Rockwood's SMF Manifest Cheatsheet:
http://www.cuddletech.com/blog/pivot/entry.php?id=182
and the three-part entry on creating an SMF manifest at blogs.sun.com:
http://blogs.sun.com/roller/trackback/yakshaving/Weblog/ \
creating_an_smf_service_part
Q We're supporting some
developers who keep their source in CVS in another set of machines.
We're trying to figure out the name of a module that was used some
time ago, but no one remembers exactly what it was called. None of
us has a shell account on the CVS server, unfortunately, so we can't
just look at the files on disk. Is there any way to get a listing
of all the modules from the server from a remote machine?
A I don't know of a way to tell
CVS to give you a listing of everything it knows about, but you
can probably get the information you need (assuming you have some
idea of what you're looking for and not just completely guessing)
by looking at the CVS history. You can run the following to get
the history for the entire repository:
cvs history -a
If you have some idea of when your source was last checked in, you can
reduce the output by grepping those results for a partial
date, too.
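As a rough sketch of that filtering step, the helper below keeps only the history records whose date column starts with a given prefix. The function name history_on is made up for illustration, and the sketch assumes the date is the second whitespace-separated field of each record. The exact date format depends on your CVS version, so check a few lines of real output before picking a prefix.

```shell
#!/bin/sh
# history_on DATE_PREFIX (hypothetical helper): read "cvs history" output
# on stdin and keep only the records whose second field (assumed to be the
# date) begins with DATE_PREFIX.
history_on() {
    awk -v want="$1" 'index($2, want) == 1'
}

# Intended use against the real server (needs CVSROOT pointing at it):
#   cvs history -a | history_on 2005-03
```

Matching on the date field alone, rather than grepping the whole line, avoids false hits on file or module names that happen to contain the same digits.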
Q We're running IP multipathing
on a couple of our Solaris 9 machines, and we keep getting the following
error on both machines:
in.mpathd[35]: [ID 398532 daemon.error] Cannot meet requested failure
detection time of 10000 ms on (inet ce0) new failure detection time for
group ipmp0 is 25790 ms.
What could be causing this error, and how do we correct it? It doesn't
seem to have much impact on the systems, but it worries me.
A The error indicates that whatever
target your machines are probing to verify network connectivity
is not answering ICMP echo requests quickly enough for in.mpathd to
meet the 10-second failure detection time. Ideally, the machine you're
pinging (/etc/defaultrouter if you have one configured) should
be close enough that it's answering in a timely fashion. If that's
not possible, you can set the failure detection time in the file
/etc/default/mpathd. From the in.mpathd man page:
The FAILURE_DETECTION_TIME variable specifies the NIC failure
detection time for the ICMP echo request probe method of detecting
NIC failure. The shorter the failure detection time, the greater
the volume of probe traffic. The default value of FAILURE_DETECTION_TIME
is 10 seconds. This means that NIC failure will be detected by in.mpathd
within 10 seconds. NIC failures detected by the IFF_RUNNING flag
being cleared are acted on as soon as the in.mpathd daemon notices
the change in the flag. The NIC repair detection time cannot be
configured; however, it is defined as double the value of FAILURE_DETECTION_TIME.
The unit of measure in /etc/default/mpathd is milliseconds,
so to specify 30 seconds (which seems to be long enough for your
hosts), you'd change:
FAILURE_DETECTION_TIME=10000
to:
FAILURE_DETECTION_TIME=30000
The best plan is to find out why it's taking so long to hear back
from the host you're probing and fix that, though.
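If you do decide to raise the value on several machines, the edit can be scripted. This is a sketch only: the function name is made up, and it assumes the file already contains a FAILURE_DETECTION_TIME= line like the one shown above. The SIGHUP in the usage comment is how I would expect in.mpathd to pick up the change without a reboot, but confirm that against the man page for your release.

```shell
#!/bin/sh
# set_failure_detection_time FILE MILLISECONDS (hypothetical helper):
# rewrite the FAILURE_DETECTION_TIME line of an mpathd-style defaults
# file, leaving every other line untouched.
set_failure_detection_time() {
    file=$1
    ms=$2
    sed -e "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" \
        < "$file" > "$file.new" &&
    mv "$file.new" "$file"
}

# Intended use on the real host, as root (check the file's owner and mode
# afterward, since the file is replaced rather than edited in place):
#   set_failure_detection_time /etc/default/mpathd 30000
#   pkill -HUP in.mpathd
```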
Q We use sudo extensively
at our site to allow people access to various commands as root,
and this works well. The problem we're running into is that there
are a bunch of files that non-root people need to modify (e.g.,
application owners need to edit Web server and database config files),
but we don't want to give them access to a root shell from the editor.
For the time being, we have started installing vim everywhere
and making people use sudo rvim so that they can't break
out of vi and get a root shell. Unfortunately, not everyone
here uses vi as their editor, and we're getting flak for
not allowing people to use pico, emacs, joe,
whatever. Is there a better way to accomplish what we're trying
to do here?
A If you're running a reasonably
recent version (1.6.8 or later) of sudo, you can allow people to use
the sudoedit <file> (or sudo -e) command instead
of sudo <editor> <file>. From the sudo man page:
-e The -e (edit) option indicates that, instead of running a command,
the user wishes to edit one or more files. In lieu of a command,
the string "sudoedit" is used when consulting the sudoers file.
If the user is authorized by sudoers the following steps are taken:
1. Temporary copies are made of the files to be edited with the
owner set to the invoking user.
2. The editor specified by the VISUAL or EDITOR environment variables
is run to edit the temporary files. If neither VISUAL nor EDITOR
are set, the program listed in the editor sudoers variable is used.
3. If they have been modified, the temporary files are copied
back to their original location and the temporary versions are removed.
If the specified file does not exist, it will be created. Note
that unlike most commands run by sudo, the editor is run with the
invoking user's environment unmodified. If, for some reason, sudo
is unable to update a file with its edited version, the user will
receive a warning and the edited copy will remain in a temporary
file.
So if you want to give your service owners the ability to edit
the file /usr/local/apache/etc/httpd.conf as root, construct
your sudoers entry to look like the following (where serviceowner
is your user, and webserver is the machine where serviceowner
should have permissions):
serviceowner webserver=sudoedit /usr/local/apache/etc/httpd.conf
Be sure that you're running at least 1.6.8p9, because previous versions
contain a race condition that might allow a user with sudo
privileges to run arbitrary commands as root.
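To check that version requirement across a number of hosts, something like the following sketch could be used. The function name is made up, and it assumes version strings of the form major.minor.patch with an optional pN suffix (1.6.8p9, 1.7.0, and so on); other suffixes, such as beta markers, are not handled.

```shell
#!/bin/sh
# sudo_at_least_1_6_8p9 VERSION (hypothetical helper): succeed when
# VERSION is 1.6.8p9 or newer. Each component is assumed to be below 100.
sudo_at_least_1_6_8p9() {
    echo "$1" | awk -F'[.p]' '{
        # Fold major, minor, patch, and pN into one comparable number;
        # a missing pN field counts as 0.
        v = $1 * 1000000 + $2 * 10000 + $3 * 100 + $4
        exit !(v >= 1060809)
    }'
}

# Intended use on a real host ("sudo -V" prints "Sudo version X" first):
#   ver=`sudo -V | awk 'NR == 1 { print $3 }'`
#   sudo_at_least_1_6_8p9 "$ver" || echo "upgrade sudo before allowing sudoedit"
```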
Amy Rich has more than a decade of Unix systems administration
experience in various types of environments. Her current roles include
that of Senior Systems Administrator for the University Systems
Group at Tufts University, Unix systems administration consultant,
and author. She can be reached at: qna@oceanwave.com.