Questions and Answers

Amy Rich

Q We have several remote offices that are connected by various kinds of pipes. We're looking for a tool that can measure the available bandwidth and determine where our bottlenecks might be. Do you have any suggestions for free software that will run on FreeBSD?

A Take a look at the program pchar, a free reimplementation of Van Jacobson's old pathchar. It's available from:

http://www.kitchenlab.org/www/bmah/Software/pchar/

and in the FreeBSD ports tree as /usr/ports/net/pchar/. The default mode (no options) will probably get you what you want. To give you a feel for the kind of data it supplies, here's the man page's example for a simple three-hop path:

pchar to dancer.ca.sandia.gov (146.246.246.1) using UDP/IPv4
Packet size increments by 32 to 1500
46 test(s) per repetition
32 repetition(s) per hop
 0:
    Partial loss:     0 / 1472 (0%)
    Partial char:     rtt = 0.657235 ms, (b = 0.000358 ms/B), r2 = 0.989713
                      stddev rtt = 0.004140, stddev b = 0.000006
    Hop char:         rtt = 0.657235 ms, bw = 22333.268771 Kbps
    Partial queueing: avg = 0.000150 ms (418 bytes)
 1: 146.246.243.254 (con243.ca.sandia.gov)
    Partial loss:     0 / 1472 (0%)
    Partial char:     rtt = 0.811278 ms, (b = 0.000454 ms/B), r2 = 0.995401
                      stddev rtt = 0.003499, stddev b = 0.000005
    Hop char:         rtt = 0.154043 ms, bw = 83454.764777 Kbps
    Partial queueing: avg = 0.000153 ms (336 bytes)
 2: 146.246.250.251 (slcon1.ca.sandia.gov)
    Partial loss:     0 / 1472 (0%)
    Partial char:     rtt = 1.044412 ms, (b = 0.002161 ms/B), r2 = 0.999658
                      stddev rtt = 0.004533, stddev b = 0.000006
    Hop char:         rtt = 0.233133 ms, bw = 4686.320952 Kbps
    Partial queueing: avg = 0.000100 ms (46 bytes)
 3: 146.246.246.1 (dancer.ca.sandia.gov)
    Path length:      3 hops
    Path char:        rtt = 1.044412 ms, r2 = 0.999658
    Path bottleneck:  4686.320952 Kbps
    Path pipe:        611 bytes
    Path queueing:    average = 0.000100 ms (46 bytes)
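If you collect runs from several offices, you may only care about the per-hop bandwidth estimates and the bottleneck line. The following is a minimal sketch, not part of pchar itself: pchar_summary is a hypothetical helper name, and it assumes the output keeps the layout shown in the example above.

```shell
# Summarize pchar output read from stdin: one line per hop with its
# estimated bandwidth, followed by the path bottleneck.
# Assumes the "Hop char:" and "Path bottleneck:" layout shown above.
pchar_summary() {
    awk '
        # Hop header lines look like " 1: 146.246.243.254 (name)"
        /^ *[0-9]+:/ { hop = $1; sub(/:$/, "", hop) }

        # Find the "bw = VALUE UNIT" triple on the hop line
        /Hop char:/ {
            for (i = 1; i <= NF; i++)
                if ($i == "bw")
                    print "hop " hop ": " $(i + 2) " " $(i + 3)
        }

        /Path bottleneck:/ { print "bottleneck: " $3 " " $4 }
    '
}

# Example use (assumes pchar writes its report to standard output):
#   pchar dancer.ca.sandia.gov | pchar_summary
```

Run against the sample above, this would report the bandwidth figure from each "Hop char" line and then the 4686.320952 Kbps bottleneck.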

Another program that might be of use is netperf, which is written by people who work at HP and is available from:

http://www.netperf.org/

Netperf primarily measures bulk data transfer and request/response performance using either TCP or UDP and BSD sockets. Some implementation of ttcp (originally written by Mike Muuss and Terry Slattery) might also work for you.

Q As part of our JumpStart install, we're adding the Sun SAN software to our Solaris 9 machines. After the JumpStart finishes, these machines should have access to all of our SAN volumes. We're installing the following packages:

SUNWcfcl	SUNWfcsm
SUNWcfclr	SUNWfcsmx
SUNWcfclx	SUNWjfca
SUNWcfpl	SUNWjfcau
SUNWcfplx	SUNWjfcaux
SUNWfchba	SUNWjfcax
SUNWfchbr	SUNWmdiu
SUNWfchbx	SUNWsan

And these SAN-related patches:

111847-08	113041-07	113044-05	114476-04	114878-08
113039-08	113042-09	113046-01	114477-01
113040-11	113043-09	113049-01	114478-06

Then we reboot to configure the devices and run the following script to turn on mpxio and auto-failback:

echo "running cfgadm to configure fiber controllers"
cfgadm -c configure c2 2>&1 >/dev/null
cfgadm -c configure c3 2>&1 >/dev/null
cfgadm -c configure c4 2>&1 >/dev/null
cfgadm -c configure c5 2>&1 >/dev/null

echo "editing /kernel/drv/scsi_vhci.conf"
sed -e 's/^mpxio-disable="yes"/mpxio-disable="no"/' \
    -e 's/^auto-failback="disable"/auto-failback="enable"/' \
    < /kernel/drv/scsi_vhci.conf > /kernel/drv/scsi_vhci.conf.new
mv /kernel/drv/scsi_vhci.conf.new /kernel/drv/scsi_vhci.conf
chown root:sys /kernel/drv/scsi_vhci.conf
chmod 644 /kernel/drv/scsi_vhci.conf

reboot -- -r

Even though we've run a reconfig reboot, the machine comes back up but doesn't appear to be using mpxio to manage all the disks. When we run format, the SAN disks show up as follows:

4. c5t20030003BA4D3EAEd6 <SUN-T4-0301 cyl 538 alt 2 hd 12 sec 32>
   /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w20030003ba4d3eae,6
5. c5t20030003BA4D3EAEd12 <SUN-T4-0301 cyl 5506 alt 2 hd 12 sec 32>
   /pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w20030003ba4d3eae,c
6. c6t60003BA4D3D5600042A5DC3C000E1004d0 <SUN-T4-0301 cyl 52 alt 2 \
   hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042a5dc3c000e1004
7. c6t60003BA4D3D5600042A9D4760005EA74d0 <SUN-T4-0301 cyl 2698 alt 2 \
   hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042a9d4760005ea74
8. c6t60003BA4D3D5600042B84E4300005F5Ed0 <SUN-T4-0301 cyl 2698 alt 2 \
   hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b84e4300005f5e
9. c6t60003BA4D3D5600042B857D300065153d0 <SUN-T4-0301 cyl 5506 alt 2 \
   hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b857d300065153
10. c6t60003BA4D3D5600042B8553700073A01d0 <SUN-T4-0301 cyl 538 alt 2 \
    hd 12 sec 32> /scsi_vhci/ssd@g60003ba4d3d5600042b8553700073a01

Note that some of these are correct, but the first two are not. I verified that the settings for mpxio-disable and auto-failback were correct in /kernel/drv/scsi_vhci.conf, but no amount of reconfiguration reboots seems to fix the drives with the problem. Do you have any idea why we're seeing this weird behavior and how we can correct it?

A In order for your disks to appear correctly, you need to do a reconfiguration reboot after you configure the controllers with cfgadm but before you enable mpxio. You can split your script into two pieces, placing the second half in /etc/rc2.d to be run when the machine boots back up. Once it runs the first time after the reconfiguration reboot, remove the /etc/rc2.d script so that you don't wind up in a rebooting loop.
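As a rough sketch of that split, assuming your controller numbers and file locations match the script in the question: the function below holds the second half's edit, and the comments outline the order of the two stages. The rc script name S99enable_mpxio is a hypothetical choice, and enable_mpxio takes the file path as an argument so you can try it on a copy first.

```shell
#!/bin/sh
# Second-half helper: turn on mpxio and auto-failback in a
# scsi_vhci.conf-style file given as the first argument.
enable_mpxio() {
    conf=$1
    sed -e 's/^mpxio-disable="yes"/mpxio-disable="no"/' \
        -e 's/^auto-failback="disable"/auto-failback="enable"/' \
        "$conf" > "$conf.new" &&
    mv "$conf.new" "$conf" &&
    chmod 644 "$conf"
}

# Stage 1 (end of the JumpStart run):
#   cfgadm -c configure c2      # and likewise c3, c4, c5
#   copy this script to /etc/rc2.d/S99enable_mpxio   (hypothetical name)
#   reboot -- -r
#
# Stage 2 (the rc2.d script, run once after that first reconfiguration reboot):
#   enable_mpxio /kernel/drv/scsi_vhci.conf
#   chown root:sys /kernel/drv/scsi_vhci.conf
#   rm -f /etc/rc2.d/S99enable_mpxio   # remove first, so there is no reboot loop
#   reboot -- -r
```

Removing the rc script before the final reboot is what keeps the machine from cycling.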

Q We just hired a new employee who's supposed to be working from home most of the time, but he can't seem to get SSH working from his home Linux machine. When he tries to connect, he gets this error:

Disconnecting: Corrupted MAC on input.

I'm not sure what SSH would have to do with his MAC address, since we're on drastically different networks, and the MAC address shouldn't be seen by the server at all. Do you have any clue how we might get around this?

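A The MAC in this message is a message authentication code, the integrity check SSH computes over each packet, and has nothing to do with the network hardware address. You don't provide any version numbers for the software you're running or what OS you're running on the server, so it's difficult to give an absolute diagnosis. I have heard of people getting the error you describe in a few different cases, though. These seem to be the relevant OpenSSH bug reports: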

http://bugzilla.mindrot.org/show_bug.cgi?id=510
http://bugzilla.mindrot.org/show_bug.cgi?id=772
http://bugzilla.mindrot.org/show_bug.cgi?id=845

In one case, there was an issue with OpenSSL for Linux being compiled under the wrong architecture (elf vs. ppc) or using the wrong versions. Recompiling OpenSSL and then OpenSSH fixed that one. Another case reports that some old firmware revisions on Linksys routers exhibit this behavior. In those cases, the fix was to upgrade the firmware. Another report I've seen claims that removing the Banner directive in the sshd_config file on the server fixed an issue that reported the message you describe.

Q We're in the process of moving to Solaris 10 and are trying to convert our startup scripts and inetd.conf entries to SMF. Do you know of any good resources for people who need to write their own manifests?

A Presumably you've already read through the docs.sun.com Managing Services section from the Solaris 10 System Administrator Collection:

http://docs.sun.com/app/docs/doc/817-1985/6mhm8o5n0?a=view

For a good primer on writing manifests, take a look at the Solaris Service Management Facility Service Developer Introduction located on Sun's BigAdmin site:

http://bigadmin.com/content/selfheal/sdev_intro.html

You might also be interested in Ben Rockwood's SMF Manifest Cheatsheet:

http://www.cuddletech.com/blog/pivot/entry.php?id=182

and the three-part entry on creating an SMF manifest at blogs.sun.com:

http://blogs.sun.com/roller/trackback/yakshaving/Weblog/ \
  creating_an_smf_service_part

Q We're supporting some developers who keep their source in CVS in another set of machines. We're trying to figure out the name of a module that was used some time ago, but no one remembers exactly what it was called. None of us has a shell account on the CVS server, unfortunately, so we can't just look at the files on disk. Is there any way to get a listing of all the modules from the server from a remote machine?

A I don't know of a way to tell CVS to give you a listing of everything it knows about, but you can probably get the information you need (assuming you have some idea of what you're looking for and aren't just guessing) by looking at the CVS history. You can run the following to get the history records for all users of the repository:

cvs history -a

If you have some idea of when your source was last checked in, you can cut down the output by grepping those results for a partial date, too.
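The date filtering might look something like the sketch below. history_for_date is a hypothetical helper name, and the function assumes that the date is the second whitespace-separated column of each history record (in YYYY-MM-DD form), which is how cvs history output is usually laid out; check a few lines of your own output first.

```shell
# Keep only the history records whose date column starts with the given
# prefix, e.g. "2005-03" for March 2005 or "2005" for the whole year.
# Assumes the date is field 2 of each record read from stdin.
history_for_date() {
    awk -v prefix="$1" 'index($2, prefix) == 1'
}

# Example use (CVSROOT must point at the remote repository):
#   cvs history -a | history_for_date 2005-03
#
# Adding -e asks cvs history for every record type rather than only
# the default report:
#   cvs history -a -e | history_for_date 2005-03
```

Matching on the date column rather than the whole line avoids false hits on file or module names that happen to contain the same digits.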

Q We're running IP multipathing on a couple of our Solaris 9 machines, and we keep getting the following error on both machines:

in.mpathd[35]: [ID 398532 daemon.error] Cannot meet requested failure
detection time of 10000 ms on (inet ce0) new failure detection time for
group ipmp0 is 25790 ms.

What could be causing this error, and how do we correct it? It doesn't seem to have much impact on the systems, but it worries me.

A The error indicates that whatever target your machines are probing to verify network connectivity is not answering ICMP echo requests quickly enough for in.mpathd to meet the requested failure detection time of 10 seconds (10000 ms). Ideally, the machine you're pinging (/etc/defaultrouter if you have one configured) should be close enough that it's answering in a timely fashion. If that's not possible, you can set the failure detection time in the file /etc/default/mpathd. From the in.mpathd man page:

The FAILURE_DETECTION_TIME variable specifies the NIC failure detection time for the ICMP echo request probe method of detecting NIC failure. The shorter the failure detection time, the greater the volume of probe traffic. The default value of FAILURE_DETECTION_TIME is 10 seconds. This means that NIC failure will be detected by in.mpathd within 10 seconds. NIC failures detected by the IFF_RUNNING flag being cleared are acted on as soon as the in.mpathd daemon notices the change in the flag. The NIC repair detection time cannot be configured; however, it is defined as double the value of FAILURE_DETECTION_TIME.

The unit of measure in /etc/default/mpathd is milliseconds, so to specify 30 seconds (which seems to be long enough for your hosts), you'd change:

FAILURE_DETECTION_TIME=10000

to:

FAILURE_DETECTION_TIME=30000

The best plan is to find out why it's taking so long to hear back from the host you're probing and fix that, though.
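If you need to make the same change on several hosts, a small helper along these lines may be handy. This is only a sketch: set_failure_detection_time is a hypothetical function name, and it assumes the file already contains a FAILURE_DETECTION_TIME= line like the one shown above.

```shell
# Rewrite the FAILURE_DETECTION_TIME line in an mpathd-style defaults file.
# Usage: set_failure_detection_time FILE MILLISECONDS
set_failure_detection_time() {
    file=$1
    ms=$2

    # Refuse anything that is not a plain non-negative integer
    case $ms in
        ''|*[!0-9]*)
            echo "milliseconds must be a whole number: $ms" >&2
            return 1
            ;;
    esac

    sed "s/^FAILURE_DETECTION_TIME=.*/FAILURE_DETECTION_TIME=$ms/" \
        "$file" > "$file.new" &&
    cat "$file.new" > "$file" &&   # overwrite in place to keep owner and mode
    rm -f "$file.new"
}

# Example use:
#   set_failure_detection_time /etc/default/mpathd 30000
# in.mpathd must then reread the file; the man page describes how
# (typically by sending the daemon a SIGHUP).
```

Try it on a copy of the file before pointing it at the real one.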

Q We use sudo extensively at our site to allow people access to various commands as root, and this works well. The problem we're running into is that there are a bunch of files that non-root people need to modify (e.g., application owners need to edit Web server and database config files), but we don't want to give them access to a root shell from the editor.

For the time being, we have started installing vim everywhere and making people use sudo rvim so that they can't break out of vi and get a root shell. Unfortunately, not everyone here uses vi as their editor, and we're getting flak for not allowing people to use pico, emacs, joe, whatever. Is there a better way to accomplish what we're trying to do here?

A If you're running a reasonably recent version of sudo (1.6.8 or later), you can allow people to use the sudoedit <file> (or sudo -e <file>) command instead of sudo <editor> <file>. From the sudo man page:

-e The -e (edit) option indicates that, instead of running a command, the user wishes to edit one or more files. In lieu of a command, the string "sudoedit" is used when consulting the sudoers file. If the user is authorized by sudoers the following steps are taken:

1. Temporary copies are made of the files to be edited with the owner set to the invoking user.

2. The editor specified by the VISUAL or EDITOR environment variables is run to edit the temporary files. If neither VISUAL nor EDITOR are set, the program listed in the editor sudoers variable is used.

3. If they have been modified, the temporary files are copied back to their original location and the temporary versions are removed.

If the specified file does not exist, it will be created. Note that unlike most commands run by sudo, the editor is run with the invoking user's environment unmodified. If, for some reason, sudo is unable to update a file with its edited version, the user will receive a warning and the edited copy will remain in a temporary file.

So if you want to give your service owners the ability to edit the file /usr/local/apache/etc/httpd.conf as root, construct your sudoers entry to look like the following (where serviceowner is your user, and webserver is the machine where serviceowner should have permissions):

serviceowner     webserver=sudoedit /usr/local/apache/etc/httpd.conf

Be sure that you're running at least 1.6.8p9, because previous versions contain a race condition that might allow a user with sudo privileges to run arbitrary commands as root.
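Because sudoedit runs whatever editor the invoking user's environment names, your pico, emacs, and joe users only need to set VISUAL or EDITOR. The sketch below mirrors the lookup described in step 2 of the man page excerpt so users can check which editor they would get. effective_editor is a hypothetical helper, the fallback argument stands in for the sudoers editor variable, and the sketch assumes VISUAL takes precedence over EDITOR, the order in which the excerpt names them.

```shell
# Print the editor that would be chosen under the lookup order described
# in the man page excerpt: VISUAL, then EDITOR, then a fallback.
# The fallback (default "vi") stands in for the sudoers "editor" setting.
effective_editor() {
    fallback=${1:-vi}
    if [ -n "${VISUAL:-}" ]; then
        echo "$VISUAL"
    elif [ -n "${EDITOR:-}" ]; then
        echo "$EDITOR"
    else
        echo "$fallback"
    fi
}

# Example use:
#   EDITOR=pico; export EDITOR
#   effective_editor            # shows which editor will be used
#   sudoedit /usr/local/apache/etc/httpd.conf
```

Since the editor runs as the invoking user on a temporary copy, a shell escape from it does not hand out a root shell, which is the point of moving away from sudo <editor> <file>.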

Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, and author. She can be reached at: qna@oceanwave.com.