feb2005.tar

Questions and Answers

Amy Rich

Q We have a number of Netra T1 105 machines that we purchased from a company that was going out of business. Unfortunately the company that previously owned these machines set the eeprom password and put them into command-secure mode and we can't get beyond it. How can we reset the prom password so that we can reinstall the OS?

A Since security-mode is only set to command, not full, and you have physical access to the machine, you can access the eeprom from the running OS. If you have the root password to the machine, simply boot it and use the eeprom command to turn off security-mode and/or change the security-password. If the disk was wiped clean before it was sold to you, you can try booting and hope that the diag-device is set to network (where you can put an OS image -- especially easy if you already have a jumpstart server set up) or the CDROM.
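As a minimal sketch of the first option, assuming you can log in as root on the running system, the function below wraps the eeprom call that turns off security-mode. The function name and the EEPROM override variable are hypothetical and only exist so that the command can be previewed with echo before it is run for real.

```shell
# Hypothetical helper: turn off PROM security from a running Solaris system.
# EEPROM defaults to the real eeprom(1M) binary; set EEPROM=echo to preview
# the argument without touching the NVRAM.
clear_prom_security() {
    "${EEPROM:-eeprom}" security-mode=none
}

# To change the password instead of disabling security, run this by hand;
# eeprom prompts for the new password:
#   eeprom security-password=
```

Running `EEPROM=echo clear_prom_security` prints the argument that would be passed; running `clear_prom_security` as root applies it.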

If there's an existing OS image but you don't have the root password, or if the machine has no useful diag device set, you can relocate the boot disk to a machine with no eeprom password. Then you can either zero out or reset the root password on the Netra's boot disk, or you can just use the spare machine to install an entirely new OS image with a root password that you know.

If all else fails, you can replace the NVRAM chip with a new one, but that's generally an unnecessary last resort.

Q I come from a Linux background, but I've just been hired as a sys admin at a site that's primarily HP-based. I'm looking for information on creating snapshot volumes on HP-UX 11.00 like I would under Linux. Is there something similar to the following Linux command for HP:

lvcreate -s -L 256m -n snap /dev/vg00/lvol1

The above creates a snapshot linked to the lvol1 volume in the vg00 group.

A If you want to create snapshots with HP-UX 11.00, search http://docs.hp.com for the OnlineJFS documentation and the documentation on using a snapshot filesystem for backups. OnlineJFS is essentially an HP re-branded version of the Veritas File System (VxFS), so snapshots are taken at the filesystem level rather than at the logical volume level. With OnlineJFS, a snapshot filesystem is created with the mount command:

mount -F vxfs -o snapof=special|mount_point,snapsize=snapshot_size \
snapshot_special snapshot_mount_point

Once the snapshot filesystem is unmounted, the snapshot ceases to exist. If it is mounted again, a fresh snapshot is taken.
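To make the mount syntax concrete, here is a rough sketch that prints the two HP-UX commands for one snapshot: an lvcreate to make a volume that holds the snapshot, and the snapshot mount itself. The function name, the volume name snap, and the example arguments are assumptions for illustration. The snapsize option is left out on the assumption that mount can work out the size of the snapshot volume on its own; add it if your system requires it. Nothing is executed, so review the output before running it.

```shell
# Hypothetical helper: print the HP-UX commands for a snapshot of one
# mounted VxFS filesystem. Nothing is executed.
#   $1 = mount point of the filesystem to snapshot (e.g. /home)
#   $2 = volume group that will hold the snapshot volume (e.g. vg00)
#   $3 = size of the snapshot volume in MB
#   $4 = directory on which to mount the snapshot
print_snapshot_cmds() {
    echo "lvcreate -L $3 -n snap /dev/$2"
    echo "mount -F vxfs -o snapof=$1 /dev/$2/snap $4"
}
```

For example, `print_snapshot_cmds /home vg00 256 /mnt/snap` prints commands roughly comparable to the Linux lvcreate -s example in the question.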

Q I'm running OS X 10.3.5 on my laptop and using the Fink Commander GUI to help me install/upgrade third-party freeware. I've been doing this ever since the 10.2 days with no problems, but recently when I've tried to do a self-update, I've gotten errors and it dies. Here's the output of the self-update command from within Fink Commander:

I will now run the cvs command to retrieve the latest package 
descriptions. The 'su' command will be used to run the cvs command 
as the user 'arr'. After that, the core packages will be updated 
right away; you should then update the other packages using 
commands like 'fink update-all'. 

/usr/bin/su dsr -c 'cvs -q -z3 update -d -P'
? stamp-rel-0.6.2
? 10.2-gcc3.3/stable/main/finkinfo/base/dpkg-bootstrap.info
? 10.2-gcc3.3/stable/main/finkinfo/base/fink-0.16.0-1.info
.
.
.
? 10.3/stable/main/finkinfo/x11-system/xfree86-base.patch
? 10.3/stable/main/finkinfo/x11-wm/sawfish-1.1-16.info
C 10.3/stable/crypto/finkinfo/gaim-ssl.info
M 10.3/stable/crypto/finkinfo/kdebase3-ssl.info
M 10.3/stable/crypto/finkinfo/kdelibs3-ssl.info
### execution of /usr/bin/su failed, exit code 1
Failed: Updating using CVS failed. Check the error messages above.

No matter what cvs commands I try, or whether I try them from the command line or from within the GUI, nothing seems to get past that point. As far as I know, nothing changed between the last time I built packages and now, so I'm not sure why it's failing. Do you have any hints?

A Since you aren't getting any specific errors from the cvs command, it's likely that there's a problem with a file in the /sw/fink/dists tree. There are three irregularities in your output, the lines:

C 10.3/stable/crypto/finkinfo/gaim-ssl.info
M 10.3/stable/crypto/finkinfo/kdebase3-ssl.info
M 10.3/stable/crypto/finkinfo/kdelibs3-ssl.info

In CVS, the leading tags have special meaning. From the man page:

M file The file is modified in your working directory. M can indicate one of two states for a file you're working on: either there were no modifications to the same file in the repository, so that your file remains as you last saw it; or there were modifications in the repository as well as in your copy, but they were merged successfully, without conflict, in your working directory.

C file A conflict was detected while trying to merge your changes to file with changes from the source repository. file (the copy in your working directory) is now the result of merging the two versions; an unmodified copy of your file is also in your working directory, with the name .#file.version, where version is the revision that your modified file started from. (Note that some systems automatically purge files that begin with .# if they have not been accessed for a few days. If you intend to keep a copy of your original file, it is a very good idea to rename it.)

The most likely culprit is the file 10.3/stable/crypto/finkinfo/gaim-ssl.info. Perhaps you've made a manual change to that file and now the maintainer of the gaim-ssl package has also changed it. You should move that file aside and try running the update again to see if that fixes the issue:

sudo mv /sw/fink/dists/10.3/stable/crypto/finkinfo/gaim-ssl.info /tmp/gaim-ssl.info.orig
sudo fink selfupdate-cvs

You might also want to check the two kde info files to see why they're tagged with an M and/or move them aside as well.
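If the update output is long, a small filter can pull out just the flagged files. This is a sketch only; the function name is made up for illustration, and it assumes the usual cvs update format of a one-letter flag, a space, and a path.

```shell
# Read `cvs -q update` output on stdin and print only the lines flagged
# C (conflict) or M (locally modified).
flagged_files() {
    grep '^[CM] '
}

# Possible usage; -n asks cvs to report without changing any files:
#   cd /sw/fink/dists && cvs -q -n update 2>/dev/null | flagged_files
```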

Q We're running sendmail 8.13.1 on Fedora Core. I've configured sendmail so that it should be limiting the number of connections per second differently for specific domains and IP ranges. Our base rate is set at 50, but we have some offsite machines under the same corporate umbrella with which we exchange a lot of email and therefore need to bump up the maximum connection rate. Unfortunately, everyone seems to be getting throttled at 50. Here's the access file:

ClientRate:other.corp.domain    100
...
ClientRate:192.168.100          100
...
ClientRate:127.0.0.1            0
ClientRate:                     50

The following should be the only pertinent line in my mc file:

FEATURE(`ratecontrol')dnl

A You're using the wrong syntax for the ClientRate statements. This snippet from the sendmail 8.13 cf/README file explains the correct usage:

ratecontrol Enable simple ruleset to do connection rate control checking. This requires entries in access_db of the form

ClientRate:IP.ADD.RE.SS LIMIT

Instead of specifying domain names, FQDNs, or IP blocks, you need to specify individual IP addresses. Hopefully, the sites from which you need to accept heavier mail loads are using a handful of relay servers, so that you don't need to list a large number of IPs in the access file.
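If a remote site really does send from many addresses in one block, the per-address entries can be generated rather than typed. The following is a sketch, assuming a /24 network and the README form quoted above; the function name is hypothetical.

```shell
# Hypothetical helper: print one ClientRate entry for every host address
# (.1 through .254) in a /24 network.
#   $1 = first three octets, e.g. 192.168.100
#   $2 = connection limit for each address
expand_clientrate() {
    host=1
    while [ "$host" -le 254 ]; do
        printf 'ClientRate:%s.%s\t%s\n' "$1" "$host" "$2"
        host=$((host + 1))
    done
}
```

The output of `expand_clientrate 192.168.100 100` could be appended to the access file, after which the map needs to be rebuilt, for example with `makemap hash /etc/mail/access < /etc/mail/access` if that is where your access file lives.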

Q I'm running Solaris 8 on a SunFire 4800 and trying to apply some patches to the system. Every time I try to run patchadd, regardless of patch, I get the following message (with the full path to the specific checkinstall program omitted since it happens with all of them):

checkinstall: cannot open
pkgadd: ERROR: checkinstall script did not complete successfully
Dryrun complete.
No changes were made to the system.

The checkinstall file is in fact there, so I'm not sure what it's saying it can't open. Any clues what the problem might be?

A When you receive checkinstall errors during a patchadd, it usually indicates that some of the files in the patch cannot be read by the user install or the user nobody, or that neither user exists. Patchadd attempts to execute the checkinstall script as the user install and falls back to the user nobody if the install user doesn't exist.

If you have an install user, make sure that all directories leading up to and including all the patch files, as well as the files themselves, are readable by install. If you're storing patches in an unreadable directory, either change the permissions as needed or move the patches to someplace like /tmp where the permissions are relaxed. If you don't have an install user, make sure the nobody user is in both the /etc/passwd and /etc/shadow files. Also apply the same rules for directory and file readability as described above for the install user.
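One way to spot the offending directory is to walk from the checkinstall script up to the root and look at the permissions on each component. The sketch below is an approximation: it only inspects the "other" permission bits reported by ls, so it assumes the install or nobody user is neither the owner nor in the group of the patch files. The function name is hypothetical.

```shell
# Hypothetical helper: walk from a file or directory up to /, printing
# every component that "other" users cannot read (and, for directories,
# cannot search). Only the world permission bits are examined.
world_unreadable() {
    p=$1
    while [ -n "$p" ] && [ "$p" != "." ]; do
        # Characters 8-10 of the ls mode string are the "other" bits.
        other=$(ls -ld "$p" | cut -c8-10)
        if [ -d "$p" ]; then
            case $other in r?[xt]) ;; *) echo "$p" ;; esac
        else
            case $other in r??) ;; *) echo "$p" ;; esac
        fi
        [ "$p" = "/" ] && break
        p=$(dirname "$p")
    done
}
```

Pass it the full path to a checkinstall script inside the unpacked patch; any path it prints is a candidate for chmod, or a reason to move the patch somewhere like /tmp.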

Q Recently our company's DNS server started having issues resolving addresses on the first try. The first attempted resolution results in a "host not found" error. If the address is tried a second time, though, it appears to work fine. This server is an OS X 10.3.5 machine running BIND 9.2.2, and it's serving a variety of different machines. Might you have any idea what the issue is and how I could fix it?

A You don't provide any debugging output from named or dig, but based on the date your question was submitted, I'd hazard a guess that you are being bitten by an issue that's plaguing a number of OS X administrators. There seem to be two possible causes. The first is that a firewall is blocking large EDNS UDP packets. To fix this, either allow those packets through the firewall or, on newer versions of BIND, add the following to the options section of named.conf:

edns-udp-size 512;

The second issue is conjectured to be one with the IPv6 implementation on some BSD-based operating systems. To work around this, upgrade to BIND 9.3 and force only IPv4 resolution by invoking named as:

named -4

Q We have a new Netra 440 server running Solaris 9 that we're using to handle Web services for our department. Things go fine for a while, but then the machine mysteriously reboots with no errors logged to syslog. To track down the issue, I hooked up a console server and logged the output to that as well, just in case the machine was dying before it had a chance to throw an error and log it to disk. The only output it logged before each crash was the following:

Fatal Error Reset
SC Alert: Host System has Reset

You probably can't tell what the issue is from this little information, but I was hoping that you might point me in the right direction when it comes to exploring this problem further.

A For starters, you might want to guarantee that you have core dumps saved after you reboot. Take a look at the file /etc/dumpadm.conf and make sure it says something along the lines of:

DUMPADM_DEVICE=/dev/dsk/c0t1d0s1
DUMPADM_SAVDIR=/var/crash/your.host.name
DUMPADM_CONTENT=kernel
DUMPADM_ENABLE=yes

The last line shows that savecore is enabled. If you're using Veritas Volume Manager or Solaris Volume Manager, you'll need to set the dump device to one of the mirrors to make sure it doesn't get overwritten when the filesystem drivers take over. For Solaris Volume Manager, the DUMPADM_DEVICE line would look more like:

DUMPADM_DEVICE=/dev/md/dsk/d1

To modify the dumpadm.conf file, use the dumpadm(1M) command instead of editing the file directly.
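As a sketch of what that dumpadm invocation might look like for the settings shown above, the function below sets the dump content, dump device, and savecore directory, and enables savecore, in one call. The function name and the DUMPADM override variable are hypothetical; the override only exists so that the arguments can be previewed with echo.

```shell
# Hypothetical wrapper around dumpadm(1M). DUMPADM defaults to the real
# binary; set DUMPADM=echo to preview the arguments without changing anything.
#   $1 = dump device (e.g. /dev/dsk/c0t1d0s1)
#   $2 = savecore directory (e.g. /var/crash/your.host.name)
configure_crash_dumps() {
    # -c content type, -d dump device, -s savecore directory, -y enable savecore
    "${DUMPADM:-dumpadm}" -c kernel -d "$1" -s "$2" -y
}
```

Run as root, `configure_crash_dumps /dev/dsk/c0t1d0s1 /var/crash/your.host.name` should leave dumpadm.conf looking like the example above; running dumpadm with no arguments afterward prints the current configuration.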

If you get a core file, you can send it to Sun Support for analysis once you open up a case, or you can try to analyze it yourself.

Because you specifically mention the hardware as a Netra 440, though, there's a good chance that you're seeing the issue described in Document 57618:

http://sunsolve8.sun.com/search/document.do?assetkey=1-26-57618-1

If you don't have a Sunsolve account, the document basically states that there are problems with the ce0 or net0 interfaces on those machines. The error message matches what you're seeing exactly. To gather more diagnostic data to determine whether this is the issue, they suggest setting the following from the OBP:

diag-switch?    true
post-trigger    none
obdiag-trigger  none

You're experiencing this issue if the reset reason includes PBM FATAL with a PCI IO-BRIDGE register output similar to:

ha019 console login:

Fatal Error Reset
SC Alert: Host System has Reset

@(#)OBP 4.10.10 2003/08/29 06:25 Sun Fire V440
Clearing TLBs
Loading Configuration
Membase: 0000.0033.0000.0000
MemSize: 0000.0000.4000.0000
Init CPU arrays Done
Init E$ tags Done
Setup TLB Done
MMUs ON
Scrubbing Tomatillo tags... 0 1
Block Scrubbing Done
Find dropin, Copying Done, Size 0000.0000.0000.5ca0
PC = 0000.07ff.f000.4c88
PC = 0000.0000.0000.4d28
Find dropin, (copied), Decompressing Done, Size 0000.0000.0006.6700
ttya initialized
System Reset: (PBM FATAL)
JBUS-PCI bridge
JBUS-PCI bridge
slave Error Register: 8000000000001000

To work around the issue, they suggest disabling the ce0 (net0) interface from the OBP and using only the ce1/net1 interface:

nvedit
0: probe-all install-console banner
1: " /pci@1c,600000/network@2" $delete-device drop
2:
^C
nvstore
setenv use-nvramrc? true
reset-all

If you definitely need another network interface, the document suggests adding another PCI network card to the machine. A permanent fix for the issue is still in the works according to the document.

Amy Rich, president of the Boston-based Oceanwave Consulting, Inc. (http://www.oceanwave.com), has been a UNIX systems administrator for more than 10 years. She received a BSCS at Worcester Polytechnic Institute, and can be reached at: qna@oceanwave.com.