Cover V14, i09

Questions and Answers

Amy Rich

Q I'm running Solaris 10 on a V120, and it's acting as one of our DNS servers. Unfortunately, every time the machine reboots, the DNS server is disabled. To check diagnostic output, I ran:

svcs -x dns/server
which showed the following output:

   svc:/network/dns/server:default (?)
 State: disabled since Tue Jun 19 13:37:44 2005
Reason: Disabled by an administrator.
   See: http://sun.com/msg/SMF-8000-05
   See: named(1M)
Impact: This service is not running.
To fix it, I do:

svcadm enable dns/server
which brings up the service just fine, and everything works until the machine is rebooted again. How do I permanently tell the service manager to start BIND at boot?

A This is actually a known bug with Solaris 10 and is described in docid 6226796, available to support customers at:

http://sunsolve.sun.com/search/document.do?assetkey=1-1-6226796-1
There's a patch that's been rolled into Solaris Express and should be making its way to the userbase as a Solaris 10 patch as well.

Q I've decided to take the plunge and start testing the fink unstable tree for OS X 10.3.9. Our users want newer versions of software than are available in the stable tree. I added unstable/main and unstable/crypto to the Trees: line of /sw/etc/fink.conf and then used Fink Commander 0.5.3 to do a selfupdate. Everything looked great, and I updated a bunch of packages to the unstable versions. A week later, though, I clicked Update in Fink Commander and received the following output:

  Err file: unstable/main Packages
         File not found
  Ign file: unstable/main Release
  Err file: unstable/crypto Packages
         File not found
  Ign file: unstable/crypto Release
  Hit http://bindist.finkmirrors.net 10.3/release/main Packages
  Hit http://bindist.finkmirrors.net 10.3/release/main Release
  Hit http://bindist.finkmirrors.net 10.3/release/crypto Packages
  Hit http://bindist.finkmirrors.net 10.3/release/crypto Release
  Hit http://bindist.finkmirrors.net 10.3/current/main Packages
  Hit http://bindist.finkmirrors.net 10.3/current/main Release
  Hit http://bindist.finkmirrors.net 10.3/current/crypto Packages
  Hit http://bindist.finkmirrors.net 10.3/current/crypto Release
  Failed to fetch
      file:/sw/fink/dists/unstable/main/binary-darwin-powerpc/Packages \
         File not found
  Failed to fetch
       file:/sw/fink/dists/unstable/crypto/ \
            binary-darwin-powerpc/Packages  File not found
  Reading Package Lists...
  Building Dependency Tree...
  W: Couldn't stat source package list file: unstable/main Packages
  (/sw/var/lib/apt/lists/ \
   _sw_fink_dists_unstable_main_binary-darwin-powerpc_Packages)
       - stat (2 No such file or directory)
  W: Couldn't stat source package list file: unstable/crypto Packages
       (/sw/var/lib/apt/lists/ \
        _sw_fink_dists_unstable_crypto_binary-darwin-powerpc_Packages)
       - stat (2 No such file or directory)
  W: Couldn't stat source package list file: unstable/main Packages
       (/sw/var/lib/apt/lists/ \
        _sw_fink_dists_unstable_main_binary-darwin-powerpc_Packages)
       - stat (2 No such file or directory)
  W: Couldn't stat source package list file: unstable/crypto Packages
       (/sw/var/lib/apt/lists/_sw_fink_dists_unstable \
        _crypto_binary-darwin-powerpc_Packages)
       - stat (2 No such file or directory)
  W: You may want to run apt-get update to correct these problems
  E: Some index files failed to download, they have been ignored, \
     or old ones used instead.
I looked for the files that it complained about, and they weren't there. I would have figured that the download would have picked them up, but that appears to have failed as well. Where can I pick up these files so that I can get updates without having to do a selfupdate every time?

A You missed a step when making your modifications. After you add the necessary lines to /sw/etc/fink.conf and run selfupdate, you must also run scanpackages to build the files that you're missing. If you want to do this from Fink Commander, it's under the Source menu. You can also do this from the command line by running:

sudo fink scanpackages
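Before running scanpackages, it can help to confirm which index files are actually missing. The following is a minimal sketch for a POSIX-style shell, not part of Fink: the function name missing_indexes is made up for illustration, and the directory layout is taken from the paths in the error output above.

```shell
# Print the path of every apt index file that is missing for the given
# Fink trees. Hypothetical helper; the layout below mirrors the paths
# shown in the "Failed to fetch" errors.
# Usage: missing_indexes PREFIX TREE...
missing_indexes() {
    prefix=$1
    shift
    for tree in "$@"; do
        index="$prefix/fink/dists/$tree/binary-darwin-powerpc/Packages"
        [ -f "$index" ] || echo "$index"
    done
}
```

Calling it as missing_indexes /sw unstable/main unstable/crypto should print nothing once scanpackages has rebuilt the indexes.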
Q We're trying to JumpStart a SunFire V100 with Solaris 8. The machine finds the boot server and loads the kernel, then tries to bring up the Ethernet interfaces. At this point, the machine panics and then resets. Here's the output from the boot sequence:

  boot net1 - install
  Boot device: /pci@1f,0/ethernet@5  File and args: - install
  Timeout waiting for ARP/RARP packet
  Timeout waiting for ARP/RARP packet
  Requesting Internet address for <the machine's ethernet addr>
  SunOS Release 5.8 Version Generic_108528-29 64-bit
  Copyright 1983-2003 Sun Microsystems, Inc.  All rights reserved.
  ip_rput_dlpi(dmfe2): DL_ERROR_ACK for DL_ATTACH_REQ(11), errno 8, unix 0
  ip_rput_dlpi(dmfe2): DL_ERROR_ACK for DL_BIND_REQ(1), errno 3, unix 0
  ip_rput_dlpi(dmfe2): DL_ERROR_ACK for DL_PHYS_ADDR_REQ(49), errno 3, unix 0
  ip_rput_dlpi(dmfe2): DL_ERROR_ACK for DL_UNBIND_REQ(2), errno 3, unix 0
  ip_rput_dlpi(dmfe2): DL_ERROR_ACK for DL_DETACH_REQ(12), errno 3, unix 0
  strplumb: setifname dmfe2 unit 2 IP failed, error 6
  dl_attach: DL_ERROR_ACK bad PPA
  revarp_myaddr: dl_attach failed: error -1
  whoami: revarp_myaddr failed: error -1.
  WARNING: nfsdyn_mountroot: NFS3 mount_root failed: error -1
  Cannot mount root on /pci@1f,0/ethernet@5 fstype nfsdyn

  panic[cpu0]/thread=10408000: vfs_mountroot: cannot mount root

  0000000010407970 genunix:vfs_mountroot+70 (10436800, 0, 0, 104395c8, 10, 14)
    %l0-3: 0000000010436800 0000000010439f20 00000000de000000 \
      0000000010436a28
    %l4-7: 0000000000000000 0000000010413868 00000000000beafd \
      0000000000000afd
  0000000010407a20 genunix:main+8c (104101f0, 2000, 10407ec0, 10408030, \
      fff2, 1005 2a0c)
    %l0-3: 0000000000000001 0000000000000001 0000000000000015 \
      0000000000000f36
    %l4-7: 0000000010429638 0000000010472c00 00000000000d7438 \
      0000000000000540

  skipping system dump - no dump device configured
  rebooting...
I've run test net1 from the OBP, and everything checks out okay. I've also run watch-net-all, and we're definitely seeing packets going by. If we let the machine boot off the internal disk, we can bring up both Ethernet interfaces just fine, and we don't see any errors. I'm rather at a loss to determine and correct the problem. If you have any suggestions, I would very much appreciate them.

A The V100s come with two onboard Ethernet devices, dmfe0 and dmfe1. If you look at your error output, you'll notice that it references dmfe2, which doesn't exist. There is a problem with some revisions of Solaris 8 where the getminor() routine returns the wrong value and thereby makes other routines fail. For the SunSolve bug reports, see the following (for x86 and SPARC, respectively):

http://sunsolve.sun.com/search/document.do?assetkey=1-1-1146859-1
http://sunsolve.sun.com/search/document.do?assetkey=1-1-4482263-1
The workaround is to try network boots from only the first interface, dmfe0. You can also try to use the boot image from a version of Solaris 8 that did not have this problem, such as 05/03.

Q We run Solaris 9 and Sun Cluster 3.1u2 on a two-node cluster. We have an in-house failover application that provides service to clients on the service name at a given port. Sometimes we want to block service to certain users, but we aren't certain how to go about this. Is there a way to handle this in the cluster itself? I also see that Solaris 10 has a package for ipfilter; can we just build the open source version for Solaris 9 and use that?

A While ipfilter is now available as a Solaris package under 10 and the non-Sun version will work for most circumstances under earlier versions of Solaris, you cannot use it in conjunction with a cluster. A cluster works by migrating the IP and associated services from one node to another during a failure. ipfilter works by keeping the state of packets on a per-machine basis.

Unfortunately, there's no way for one machine in a cluster to share filtering state information with another machine in a cluster. If you want to filter connections to a service IP, place a device in front of the cluster (a firewall, or maybe a router, if you only want to filter machines outside your segment) and filter there. The most redundant solution is to place a pair of devices that are capable of sharing filtering information in front of your cluster nodes.

Q I'm writing some automated installation scripts for some software we need to install on various OS types. Unfortunately, this software requires input from a human at the keyboard. I haven't really found a way around this, since I don't know of a way that will make the binary install program read from my script instead of STDIN. Do you have any suggestions?

A You don't give many details about what the installer is looking for or what scripting language you're trying to use. The answer to your problem could be as simple as piping the input to the installer (this assumes a shell such as Bourne, bash, ksh, csh, tcsh, or zsh; printf is used here because not every shell's echo expands \n):

printf 'answer1\nanswer2\nanswer3\nanswer4\n' | ./binary-installer
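A here-document does the same job and is often easier to read when there are several answers. The sketch below is only an illustration for a POSIX-style shell such as ksh or bash: fake_installer is a hypothetical stand-in for ./binary-installer that reads four lines from standard input, and the answers are placeholders.

```shell
# Hypothetical stand-in for ./binary-installer: reads four answers from
# standard input and reports what it received.
fake_installer() {
    read -r dir
    read -r license
    read -r components
    read -r confirm
    echo "dir=$dir license=$license components=$components confirm=$confirm"
}

# Feed the answers, one per line, in the order the prompts appear.
result=$(fake_installer <<'EOF'
/opt/app
yes
all
y
EOF
)
echo "$result"
```

This only works when the installer reads its answers from standard input; a program that opens the terminal directly will ignore both the pipe and the here-document.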
Most likely, though, you'll want to use something more complex, like the program expect, which will look for certain strings and then respond with answers. Expect, available from http://expect.nist.gov/, is written in Tcl and can also be linked with Tk to create X11-based applications. You can call expect from a shell script or possibly write your entire install script in expect.

There's a utility called autoexpect that watches your interaction with a program and then creates an expect script based on your input. autoexpect does not always create a perfect script, because it must do some guesswork. The autoexpect man page lists some of the most common problems with scripts generated this way:

  • Timing. A surprisingly large number of programs (rn, ksh, zsh, telnet, etc.) and devices (e.g., modems) ignore keystrokes that arrive "too quickly" after prompts. If you find your new script hanging up at one spot, try adding a short sleep just before the previous send.
  • Echoing. Many programs echo characters. Without specific knowledge of the program, it is impossible to know if you are waiting to see each character echoed before typing the next. If autoexpect sees characters being echoed, it assumes that it can send them all as a group rather than interleaving them the way they originally appeared. This makes the script more pleasant to read. However, it could conceivably be incorrect if you really had to wait to see each character echoed.
  • Change. Autoexpect records every character from the interaction in the script. This is desirable because it gives you the ability to make judgments about what is important and what can be replaced with a pattern match. On the other hand, if you use commands whose output differs from run to run, the generated scripts are not going to be correct. For example, the "date" command always produces different output. So, using the date command while running autoexpect is a sure way to produce a script that will require editing in order for it to work.

Workarounds for each of these issues are described in the man page as well.

If you're using Perl instead of a shell-based language, there are several modules/programs you might want to investigate:

http://search.cpan.org/~rgiersig/Expect-1.15/
http://search.cpan.org/~djerius/Expect-Simple-0.02/
http://search.cpan.org/~lbrocard/Test-Expect-0.29/
Q We have a SunFire V240 that's been having problems booting up correctly. I want to get to the OBP to run some diagnostics, but I can't seem to be able to send the machine a break signal to get it there. I've tried using the LOM to power the machine off and back on, but it always boots back into multi-user mode. I've also tried issuing a break once the machine is fully up and running, but that also fails. It looks like it's getting the break sequence and not doing anything with it. I see the following message when trying to send the break:

SC Alert: SC Request to send Break to host.
So, why isn't the break working, and is there a way to force it to the OBP?

A Without being able to see the configuration of your machine, I'm guessing that you've set the alternate break sequence in /etc/default/kbd. Look for a line that says:

KEYBOARD_ABORT=alternate
This is a good idea for machines hooked up to a console server, so that disconnecting the console cable (or the keyboard, if you're not using a console) does not cause the machine to drop to the OBP. This setting takes effect after the OS is loaded and takes control of the machine.
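A quick way to check which setting is active is to look for an uncommented KEYBOARD_ABORT line. This is a minimal sketch for a POSIX-style shell such as ksh or bash; the function name kbd_abort_setting is made up for illustration, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
# Print the value of the last uncommented KEYBOARD_ABORT line in the
# given file, or "unset" if there is no such line.
# Hypothetical helper, not a Solaris utility.
kbd_abort_setting() {
    value=$(sed -n 's/^KEYBOARD_ABORT=//p' "$1" | sed -n '$p')
    echo "${value:-unset}"
}
```

Running kbd_abort_setting /etc/default/kbd and seeing alternate would confirm the guess above.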

To actually get to the OBP, you can do one of several things:

1. Send the break sequence before the OS loads and the alternate break sequence takes effect.

2. From the running OS, issue an init 0 to bring the machine back to the OBP.

3. From the running OS, issue the alternate break sequence by typing the following keys in succession:

return
~
control-b
Once you get to the OBP, you'll want to set the auto-boot? variable to false for the time being so you can do a reset and not need to break out. Many diagnostics you'd want to run from the OBP do not deal well with a machine that's been brought to the OBP with a break. When you're done running your diagnostics, set the auto-boot? variable back to true, so your machine will automatically boot after a crash.

Q I've just installed Solaris 10 on an old Netra T1 200 we use for testing, and I'm trying to get the DHCP client on it functioning. Our DHCP server is an old BSD box that's not handing out hostnames, so the Netra keeps setting its hostname as unknown. I googled for this problem and found a suggestion that says to put the hostname in /etc/nodename as a fallback. I've tried this, and I have blank /etc/hostname.eri0 and /etc/dhcp.eri0 files. I can see the machine register itself with the DHCP daemon on the BSD box and it gets the correct IP, but it's apparently still trying to set its hostname with the information gained from the DHCP server (i.e., nothing). The steps to make this work aren't that difficult to understand, and I think I've followed them all correctly. Did I miss something fundamental?

A It sounds like you have the setup configured correctly, but since you didn't post your config file or your nodename file, I can't be absolutely sure. There was a series of posts in comp.unix.solaris from someone who was having a similar problem. In his case, it turned out that he was missing the newline character after the hostname in the /etc/nodename file. The thread discussing this can be found here:

http://groups-beta.google.com/group/comp.unix.solaris/browse_frm/ \
  thread/12370dc8db852d7c
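To check for the same problem on your machine, you can test whether the last byte of the file is a newline. The following is a minimal sketch for a POSIX-style shell such as ksh or bash, assuming a tail that accepts the POSIX -c option (on Solaris, try /usr/xpg4/bin/tail if the default one does not); the function name ends_with_newline is made up for illustration.

```shell
# Succeed if the file is non-empty and its last byte is a newline.
# Command substitution strips a trailing newline, so the captured
# text is empty when the final byte is a newline.
ends_with_newline() {
    [ -s "$1" ] && [ -z "$(tail -c 1 "$1")" ]
}
```

For example, ends_with_newline /etc/nodename || echo "missing trailing newline" prints the warning only when the file is empty or does not end with a newline.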
Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, and author. She can be reached at: qna@oceanwave.com.