Questions and Answers

Amy Rich

Q We're installing a number of Solaris 8 boxes on our LAN, and we want to verify that the speed and duplex on the switch match that of the Sun network interfaces during auto-negotiation. If we have to, we'll explicitly set both sides and turn off auto-negotiation. These machines have ce0 interfaces, but ndd doesn't seem to be useful here. I tried:

ndd /dev/ce link_status
ndd /dev/ce link_mode
ndd /dev/ce link_speed

The link_speed shows up as 0. That doesn't seem to be meaningful, since it only seems to have a value of 0 or 1, and there are more than two speeds available. Is there another variable I should be looking at with ndd, or does link_speed have more than a 0/1 setting?

A The ndd command doesn't work with ce interfaces. Use the -k switch to netstat to obtain the information you're looking for:

netstat -k ce0 | egrep 'link_up|link_speed|link_duplex'

The output has the following meaning:

link_up - 0 down, 1 up
link_speed - speed in Mbit/s
link_duplex - 1 half duplex, 2 full duplex, 0 down
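
To make those values easier to read at a glance, a small shell function along the following lines could decode them. This is only a sketch: describe_link is a made-up name, and it assumes the statistics arrive as whitespace-separated name/value pairs, so check the actual output on your system first.

```shell
#!/bin/sh
# Sketch: decode link_up, link_speed, and link_duplex from statistics text
# read on standard input.
# Assumption: the input consists of whitespace-separated "name value" pairs,
# possibly several pairs per line.
describe_link() {
    awk '
        {
            # Walk the fields; a known name is followed by its value.
            for (i = 1; i < NF; i++) {
                if ($i == "link_up")     up = $(i + 1)
                if ($i == "link_speed")  speed = $(i + 1)
                if ($i == "link_duplex") duplex = $(i + 1)
            }
        }
        END {
            if (speed == "") speed = "unknown"
            state = (up == 1) ? "up" : "down"
            mode = "down"
            if (duplex == 1) mode = "half"
            if (duplex == 2) mode = "full"
            # speed is reported in Mbit/s
            printf "link=%s speed=%s duplex=%s\n", state, speed, mode
        }
    '
}
```

You would feed it the same data as the command above, for example: netstat -k ce0 | describe_link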

Q I just tried upgrading from FreeBSD 5.1 to FreeBSD 5.2 by obtaining the source from CVS and building things from scratch. I followed the documentation at:

http://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/makeworld.html

but it crashed during make installworld. It looks like things installed ok, but obviously the crash should not have happened. Does the normal build procedure not apply to FreeBSD 5 because it's not a STABLE release yet?

A When upgrading FreeBSD from source, always be sure to read /usr/src/UPDATING. The UPDATING file from 5.2 contains the following warning about the build and install process:

20031112:
  The statfs structure has been updated with 64-bit fields to allow 
  accurate reporting of multi-terabyte filesystem sizes. You should
  build world, then build and boot the new kernel BEFORE doing a 
  'installworld' as the new kernel will know about binaries using
  the old statfs structure, but an old kernel will not know about
  the new system calls that support the new statfs structure.
  Note that the backwards compatibility is only present when 
  the kernel is configured with the COMPAT_FREEBSD4 option. Since
  even /bin/sh will not run with a new kernel without said option 
  you're pretty much dead in the water without it. Make sure you have 
  COMPAT_FREEBSD4! Running an old kernel after a 'make world' will
  cause programs such as 'df' that do a statfs system call to fail with 
  a bad system call. Marco Wertejuk <wertejuk@mwcis.com> also reports 
  that cfsd (ports/security/cfs) needs to be recompiled after these 
  changes are installed.

  ****************************DANGER*******************************

  DO NOT make installworld after the buildworld w/o building and
  installing a new kernel FIRST.  You will be unable to build a
  new kernel otherwise on a system with new binaries and an old
  kernel.
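
The order that UPDATING entry calls for can be sketched as a short shell helper. The function names and the MYKERNEL config name are hypothetical, and the step list is abbreviated (the Handbook adds steps such as mergemaster and dropping to single-user mode). Nothing is executed; the helper only checks a kernel config file for COMPAT_FREEBSD4 and prints the sequence.

```shell
#!/bin/sh
# Return success if the given kernel config file enables COMPAT_FREEBSD4
# on an uncommented "options" line.
has_compat_freebsd4() {
    grep -Eq '^[[:space:]]*options[[:space:]]+COMPAT_FREEBSD4([[:space:]]|#|$)' "$1"
}

# Print the order described by the UPDATING entry: the new kernel is built,
# installed, and booted before installworld. Nothing is run here.
# MYKERNEL is a placeholder for your own kernel config name.
print_upgrade_order() {
    cat <<'EOF'
make buildworld
make buildkernel KERNCONF=MYKERNEL
make installkernel KERNCONF=MYKERNEL
reboot
make installworld
EOF
}
```

On an i386 machine with a config file named MYKERNEL, you might run it as: has_compat_freebsd4 /usr/src/sys/i386/conf/MYKERNEL && print_upgrade_order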

Q I'm running sendmail 8.12.10 on our primary MX hosts and our backup MX hosts. Even though our primary MX machines are well connected, we're seeing a great deal of spam mail traveling through our backup MX machines. Is there some vulnerability that spammers are scanning for? I can't figure out why they're hitting both our backup and primary MX hosts.

A Many spammers target backup MX hosts because some sites filter spam only on their primary MX hosts. In particular, spammers try to get around sites that apply DNSBL rulesets on the primaries alone. If spammers inject mail through an unprotected backup MX host, the primaries see the spam as coming from the backup rather than from the original, blocked host, and accept it. There's no vulnerability involved; if your backups filter less strictly than your primaries, apply the same checks on every MX host.

Q I'm running an ftp server that allows uploads to a central area. I have a cron job that checks this area and copies files off to a staging directory every half hour. Occasionally files get copied in the midst of uploading, and the copied file is therefore incomplete and corrupted. Is there an easy way to move only files that have been sitting around long enough to have transferred completely? I'm not sure how long that time frame would need to be, though, because uploads vary in size.

A Instead of waiting a specific period of time, check whether the file is open before copying it off into your staging area. You don't say which operating system you're running, but the tool lsof runs on most Unix-like systems. You may also have a tool that comes with the OS, like fuser, that will suffice. Here's some sample output for /var/log/syslog on a Solaris 8 machine:

lsof /var/log/syslog
COMMAND PID USER   FD   TYPE DEVICE SIZE/OFF   NODE NAME
syslogd 150 root    9w  VREG   85,4        0 106343 /var/log/syslog

fuser /var/log/syslog
/var/log/syslog:      150o

Here's some output for /etc/passwd, which isn't open:

lsof /etc/passwd

fuser /etc/passwd
/etc/passwd:

If lsof or fuser doesn't show the file as open, then it's safe to copy. If the file is open, lsof shows the process that has the file open, the PID of the process, the user, the file descriptor number and method of access, the type of node associated with the file, the device numbers of the file, the size of the file or the file offset in bytes, the node number of the file, and the file name. The contents of the FD field have the following meaning:

FD is the File Descriptor number of the file or:

cwd -- Current working directory
Lnn -- Library references (AIX)
ltx -- Shared library text (code and data)
Mxx -- Hex memory-mapped type number xx
m86 -- DOS Merge mapped file
mem -- Memory-mapped file
mmap -- Memory-mapped device
pd -- Parent directory
rtd -- Root directory
txt -- Program text (code and data)
v86 -- VP/ix mapped file

FD is followed by one of these characters, describing the mode under which the file is open:

r -- For read access
w -- For write access
u -- For read and write access
space -- If unknown and no lock character
- -- If unknown and lock character

If the file is open, the fuser output displays the process ID of the program using the file followed by a letter specifying the type of access (some types are only available on certain operating systems):

a -- If the process is using the file as its trace file in /proc
c -- If the process is using the file as its current directory
e -- If the process is using the file as the executable being run
f -- If the process is using the file as an open file
m -- If the process is using the file as an mmapped file or shared lib
o -- If the process is using the file as an open file
p -- If the process is using the file as the parent of its current directory
r -- If the process is using the file as root directory
s -- If the process is using the file as a shared lib
t -- If the process is using the file as its text file
y -- If the process is using the file as its controlling terminal
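
As a rough sketch of how the cron job could apply that check, the function below moves only the files that lsof doesn't report as open. The function name and the directory arguments are made up for illustration. Note that a short window remains between the check and the move, and that a stalled or abandoned upload will look "closed" even though the file is incomplete.

```shell
#!/bin/sh
# Move regular files from an upload directory to a staging directory,
# skipping any file that lsof still reports as open.
# Usage: move_closed_files /path/to/uploads /path/to/staging
move_closed_files() {
    upload_dir=$1
    staging_dir=$2

    # Without lsof every file would look closed, so refuse to run.
    if ! command -v lsof >/dev/null 2>&1; then
        echo "lsof not found" >&2
        return 1
    fi

    for f in "$upload_dir"/*; do
        [ -f "$f" ] || continue
        # lsof prints nothing for a file that no process has open.
        if [ -z "$(lsof "$f" 2>/dev/null)" ]; then
            mv "$f" "$staging_dir"/
        fi
    done
}
```

On a system that has fuser but not lsof, the same idea should work by testing whether fuser lists any process IDs for the file.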

Q I'm running OpenSSH 3.7.1p2 on a Beowulf cluster of Linux machines. When a machine drops out of the cluster and jobs fail over to another node, users are no longer able to connect to the logical cluster name using ssh because the host key has changed. It's non-optimal for them to have to know the name of each cluster node (and self-defeating, because a node might be up or down at any given time), but ssh clearly needs that granularity. The sys admins also need that granularity, since we occasionally need to operate on specific machines instead of on the cluster as a whole. What's the best method to deal with a situation like this?

A The issue, as you pointed out, is that the users who are experiencing problems have a known_hosts hostname/key pair entry for the cluster name, and not an individual machine name. When the key changes but the host name stays the same, ssh believes that an IP/hostname hijacking may have taken place and won't allow password authentication. There are a few methods to work around this depending on whether you're more concerned about security or ease of use and whether you have control over the machines from which the users are ssh'ing.

The first method is to put the same host key(s) on every system. Check sshd_config for any HostKey lines, and copy the key files they name, along with the corresponding .pub files, to each machine. Users will then see the same host key no matter which node they connect to, and connecting to an individual node by name works as well. The downside is that if that key is compromised, every host is compromised.

The second method is to modify each user's known_hosts file so that each node has its own entry and is also tagged with the cluster name. This takes effort on the user's part and may be difficult to keep up with if new nodes are being added or changed continuously. If you rarely add or change nodes, this can be accomplished through a large one-time addition, supplied by the sys admins, to each user's known_hosts file. The new hostname/key pairs would change from:

hostname,IP key-type key

to:

hostname,IP,cluster-name key-type key

For example:

node1,192.168.1.1,cluster,cluster.my.domain ssh-dss AAAA...
node2,192.168.1.2,cluster,cluster.my.domain ssh-dss BBBB...
...
nodeN,192.168.1.N,cluster,cluster.my.domain ssh-dss ZZZZ...

If all of the users are connecting from centralized Unix-like machines that the systems administrators control, add the above entries to the global ssh_known_hosts file instead of modifying the known_hosts file of each individual user. If possible, this is the best option because no effort is required on the part of the user, and the integrity of each node is not compromised when one key is cracked.
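
If you have many nodes, the tagged entries can be generated rather than typed. Here's a minimal sketch, assuming you already have the per-node lines in the usual "hostname,IP key-type key" form; add_cluster_alias is a hypothetical name, and the function simply appends the cluster names to the host field of each line.

```shell
#!/bin/sh
# Append cluster names to the host field of known_hosts-style lines read
# from standard input.
# Usage: add_cluster_alias cluster,cluster.my.domain < node_keys
add_cluster_alias() {
    awk -v names="$1" '
        # Pass comments and blank lines through untouched.
        /^[ \t]*(#|$)/ { print; next }
        # The first field is the comma-separated host list; extend it.
        { $1 = $1 "," names; print }
    '
}
```

The input could be an existing known_hosts file or the output of ssh-keyscan. Keys gathered over the network should be verified against the nodes themselves before you distribute them.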

Q I have a laptop that I connect to the serial ports of our machines in the datacenter to get console access. Right now I'm using Hyperterm, but it's pretty rotten and I'd like to find something better. Any suggestions for something that will handle Solaris servers and perhaps do ssh, too?

A Since you list Hyperterm, I'm going to presume you're using some form of Microsoft Windows on your laptop. If you want to replace Hyperterm for serial and ssh, try using one of the following:

The free ssh plugin to Teraterm, TTSSH, (http://www.zip.com.au/~roca/ttssh.html) only supports SSH v1.5, so it's not useful if you need to connect to SSH v2 servers, which are more secure.
SecureCRT (http://www.vandyke.com/products/securecrt/) is a commercial product but supports both versions of the SSH protocol as well as serial, telnet, and a few other protocols.

The best solution, though, is to connect all of your machines to one (or multiple, if you have a large number of machines) dedicated console server that supports ssh. That way any machine that supports ssh can remotely connect to a machine's console and there's no need for anyone's laptop to support a serial connection except when configuring the console server for the first time. If you go this route, PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/) might also be worth a look. It's a free product that supports both versions of the SSH protocol. It currently has no serial interface, but it's on the developer's wishlist.

Q I install Solaris on a number of machines and then mirror the boot disks. The disks are always identical, so is there an easy way to format them identically at the same time, or do I have to format by hand on all the mirror disks?

A If you're doing a large number of installations, you should look into using JumpStart. Here's a partial JumpStart profile that would do exactly what you want. It also sets aside 20M of space in slice 7 of each disk for the state databases. I happen to name the spare slices and mount them during the install, because I have finish scripts that automatically install and configure DiskSuite/Volume Manager for me:

install_type    initial_install
system_type     standalone
partitioning    explicit
filesys         rootdisk.s0 1024 / logging
filesys         rootdisk.s1 1024 swap size=512m
filesys         rootdisk.s3 1024 /usr logging
filesys         rootdisk.s4 1024 /var logging
filesys         rootdisk.s7 20 /spare/md0
filesys         rootdisk.s5 free /files logging
filesys         c0t1d0s0 1024 /spare/root
filesys         c0t1d0s1 1024 unnamed
filesys         c0t1d0s3 1024 /spare/usr
filesys         c0t1d0s4 1024 /spare/var
filesys         c0t1d0s7 20 /spare/md1
filesys         c0t1d0s5 free /spare/files

If you opt not to use JumpStart or need to partition a disk after the fact, you can still get the job done with one command chain instead of using format. This example assumes that your boot disk is c0t0d0 and your mirror disk is c0t3d0:

prtvtoc /dev/rdsk/c0t0d0s2 | fmthard -s - /dev/rdsk/c0t3d0s2
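
If a machine has more than one mirror disk, the same command chain can be wrapped in a loop. This is a sketch only: copy_vtoc is a made-up function name, and it assumes every target disk has the same geometry as the boot disk.

```shell
#!/bin/sh
# Copy the partition table (VTOC) of one disk onto one or more other disks.
# Usage: copy_vtoc c0t0d0 c0t3d0 [more disks...]
copy_vtoc() {
    boot_disk=$1
    shift
    for mirror_disk in "$@"; do
        # Never overwrite the label of the source disk itself.
        if [ "$mirror_disk" = "$boot_disk" ]; then
            echo "skipping $mirror_disk: same as source disk" >&2
            continue
        fi
        # Same chain as above: read the boot disk's VTOC, write it to the mirror.
        prtvtoc "/dev/rdsk/${boot_disk}s2" |
            fmthard -s - "/dev/rdsk/${mirror_disk}s2" || return 1
    done
}
```

Double-check the disk names before running anything like this, since fmthard rewrites the target disk's label without asking.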

Amy Rich, president of the Boston-based Oceanwave Consulting, Inc. (http://www.oceanwave.com), has been a UNIX systems administrator for more than 10 years. She received a BSCS at Worcester Polytechnic Institute, and can be reached at: qna@oceanwave.com.