Cover V14, i12

Questions and Answers

Amy Rich

Thanks to Ed Schaefer for pointing out a typographical error in the answer regarding csplit in the October issue. In each of the following lines, the backticks were converted to single quotes when the issue went to print:

numrecs=`grep -c "^${recsep}" $1`
splitcount=`expr ${numrecs} - 2`
for ifile in `ls data*`; do
ofile=`head -1 ${ifile}`
Dennis Lang submitted a one-line awk solution to the same question. I've slightly modified the code he submitted to remove a UUOC (useless use of cat) and to prevent false-positive matches on the record separator:

awk '/^xxx[0-9].*$/{f=$1;next;}{print >>f;}' input-file
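To see what the one-liner does, here is a minimal sketch run against a made-up input file. The file contents and the record names xxx1 and xxx2 are assumptions for illustration only; the data from the October question is not reproduced here. Each line matching the separator pattern sets the output file name to its first field, and every following line is appended to that file.

```shell
# Work in a scratch directory so the demo does not clobber real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical input: separator lines start with "xxx" followed by a digit.
cat > input-file <<'EOF'
xxx1
alpha
beta
xxx2
gamma
EOF

# Separator lines name the output file; all other lines are appended to it.
awk '/^xxx[0-9].*$/{f=$1;next;}{print >>f;}' input-file

# Show each resulting file with its contents.
for ofile in xxx1 xxx2; do
    echo "== ${ofile}"
    cat "${ofile}"
done
```

Note that the input has to begin with a separator line. Otherwise f is still empty when the first print runs, and awk has no file name to write to.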
Q We're running the stock Sun SSH that comes with Solaris 9, and we enabled AllowTcpForwarding and X11Forwarding in sshd_config. After HUPing sshd, new connection attempts authenticate and then fail. /var/adm/messages includes lots of errors that say:

Sep 19 13:54:44 hostname sshd[6523]: [ID 800047 auth.error] \
  error: Failed to allocate internet-domain X11 display socket.
Commenting the two entries back out again fixes the issue but, unfortunately, we can't really go without X11 forwarding. Is there a workaround to this issue?

A You've run into another one of Sun's SSH bugs, this time with X11Forwarding. In this case, SPARC patch 118305-04 and x86 patch 117470-03 are at fault. If you back out whichever of these patches is installed on your system and install SPARC patch 118335-04 or x86 patch 120463-01 instead, it should fix your problem. Sun documented this issue in infodoc 101834:

http://sunsolve.sun.com/search/document.do?assetkey=1-26-101834-1
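To find out whether one of the two problem patches is installed, something like the following sketch may help. The function name find_bad_ssh_patches is made up for this example, and the match assumes the usual "Patch: <id> Obsoletes: ..." layout of showrev -p output.

```shell
# Print any of the two problem patches found in `showrev -p` style input.
# Reads the patch listing on stdin; prints matching patch IDs, one per line.
find_bad_ssh_patches() {
    awk '$1 == "Patch:" && ($2 == "118305-04" || $2 == "117470-03") { print $2 }'
}

# On a Solaris 9 host this would be run as:
#   showrev -p | find_bad_ssh_patches
# A listed patch can then be backed out with patchrm, for example:
#   patchrm 118305-04
```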
Other people have suggested starting sshd in IPv4 mode only by editing the sshd_config file and specifying:

ListenAddress 0.0.0.0
Q We're getting a rash of users who are reporting that they can no longer use POP to send and receive their mail. The users report the following error from their clients:

Your server has unexpectedly terminated the connection. Possible causes
for this include server problems, network problems, or a long period of
inactivity. Account: 'username@our.domain', Server: 'mail.our.domain',
Protocol: POP3, Server Response: '+OK 2019 octets', Port: 110,
Secure(SSL): No, Socket Error: 10053, Error Number: 0x800CCC0F
The common link with all these users seems to be that they're scanning their messages with Norton AntiVirus. The only fix we've been able to suggest so far has been to turn off input message scanning. We'd like a better answer for our users so they can re-enable message scanning. Any suggestions?

A The problem you're running into is that Norton silently intercepts the connection between the client and server, which breaks TLS encryption, so that it can scan the unencrypted messages. It ought to scan messages before making an outgoing SMTP connection instead of after the connection is made, but that's not the way the software was designed. Your users can use POP over SSL on port 995 to retrieve mail (or IMAP over SSL on port 993 if they want to switch to IMAP) and the submission port (587) to submit mail. Norton only scans messages on ports 110 (pop3) and 25 (smtp). This means, of course, that the messages are still not being scanned, but Norton will not complain. For a real scanning solution, you'll need to switch to another product.

Q I just started a new job where I'm the only Unix systems administrator. I've been trying to gather information about all of the new machines, and I'm running into a problem with one specific host. Two of the file servers I've inherited as part of my new position are identically configured E450s and fully populated with disks. One of these machines shows all 20 disks when I run prtdiag -v, but the other only shows disks 0 through 3. I know that the disks are functioning just fine because they're in use on the fileserver. So, why can't I see them all?

A The default Ultra Enterprise 450 configuration only supports four disk drives connected to the internal backplane. To support the 20 drives in your system, two 8-bay storage expansion kits were installed as an upgrade. As part of the upgrade, you need to set a variable, disk-led-assoc, in the OBP to set up the mapping between disk slots and the physical and logical device names. This is covered as part of Sun infodoc 16735:

http://sunsolve.sun.com/search/document.do?assetkey=1-9-16735-1
From the OBP, you need to run:

setenv disk-led-assoc 0 x y
where x is an integer between 1 and 10 identifying the rear panel PCI slot number where the lower UltraSCSI controller is installed, and y is an integer between 1 and 10 identifying the rear panel PCI slot number where the upper UltraSCSI controller is installed. Slot 0 is the internal controller. If the other controller cards are installed in slots 5 and 7, the command would be:

setenv disk-led-assoc 0 5 7
Once you set this variable, reset the system and then do a reconfiguration reboot with boot -r.

Q Our boot disks are encapsulated using SVM under Solaris 9, and we have an external RAID 5 set attached. We've somehow managed to hose things quite spectacularly, and we need to boot from the JumpStart image on the network to try and repair things. This would be easy if we just ripped out SVM and booted off one of the unencapsulated drives, but we need to be able to access the RAID 5 device. Unfortunately, the JumpStart image doesn't recognize SVM devices. I'm sure there must be a way around this, but I'm not sure how to make the JumpStart image read the RAID 5 device. Can you offer any suggestions?

A Information on how to access a RAID 5 stripe set while booting off the CD-ROM is covered in Sun infodoc 75210:

http://sunsolve.sun.com/search/document.do?assetkey=1-25-75210-1
The procedure for accessing it from a network boot is pretty much the same. To begin, boot into single-user mode off the network JumpStart image (this assumes that net is the device alias for the network interface through which your JumpStart image is reachable):

boot net -s
Determine the ID of the SVM metadevice driver:

# modinfo | grep md
17  11be592  2d1b3  85  1  md (Solaris Volume Manager base mod)
46 7824c000   d0c5   -  1  md_trans (Solaris Volume Manager trans mo)
47 7823c000   ed04   -  1  md_raid (Solaris Volume Manager raid mod)
48 7825a000   2a03   -  1  md_hotspares (Solaris Volume Manager hot spar)
49 78178000   4c3c   -  1  md_sp (Solaris Volume Manager soft par)
50  139f480   5498   -  1  md_stripe (Solaris Volume Manager stripes )
51  13a448c  12006   -  1  md_mirror (Solaris Volume Manager mirrors )
68  134adfd   107d   -  1  md5 (MD5 Message-Digest Algorithm)
246 7819f1d7  1004   -  1  md_notify (Solaris Volume Manager notifica)
Then unload the Solaris Volume Manager base module:

modunload -i 17
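Because grep md also matches md5 and the md_* submodules, the base module's ID has to be picked out of the listing by eye. A minimal sketch that matches the module-name column exactly is shown here; the function name md_module_id is hypothetical, and the code assumes the six-column modinfo layout seen in the output above.

```shell
# Print the ID of the module whose name column is exactly "md".
# Reads modinfo output on stdin. Assumed columns:
# Id, Loadaddr, Size, Info, Rev, Module Name.
md_module_id() {
    awk '$6 == "md" { print $1 }'
}

# On the booted JumpStart image this would be run as:
#   id=$(modinfo | md_module_id) && modunload -i "$id"
```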
Once you've unloaded the module, mount one of the unencapsulated boot devices (the directions below assume that your root filesystem is on c0t0d0s0) and copy the metadevice driver configuration over to the running OS:

mount -r /dev/dsk/c0t0d0s0 /a
cp /a/kernel/drv/md.conf /kernel/drv/md.conf
umount /a
Now reload the md driver. This time it will read the information you copied from your boot disk:

modload /kernel/drv/md
metasync -r
All of your original metadevice information should be available to commands like metastat and metadb now, and you should be able to mount the RAID 5 filesystem under /a.

Q We're running a pretty vanilla Apache 1.3.33 on a load-balanced set of FreeBSD 5.3 servers. We need to schedule some site-wide downtime so we can shuffle a large amount of data around behind the scenes. While we're down, we want to leave one server up, but redirect all traffic to a "we're down right now, please come back after 9:00AM" sort of page. I was going to be clever about this and just set the ErrorDocument to this page, but I realized that the page requires an image as well as the text. This means I need to make allowances for more than one URL that does not redirect. What's the best way to do this?

A Probably the easiest way is to use mod_rewrite's RewriteEngine instead of Redirect or RedirectMatch. Say you've replaced index.html with the maintenance page, and that page includes the image maintenance.png. You'd then have a set of rewrite rules like the following:

RewriteEngine   On
RewriteRule     ^/$ - [L]
RewriteRule     ^/index\.html$ - [L]
RewriteRule     ^/maintenance\.png$ - [L]
RewriteRule     ^/.*$ http://www.your.domain/ [R]
Be sure to comment out any other rewrite or redirect rules so you don't have conflicts.
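To check the rule ordering by hand, the following sketch models which requests are served and which are redirected. It only imitates the first-match logic of the rules above in a shell case statement; it is not how Apache evaluates them, and the function name maintenance_action and the sample path /products/list.html are made up for illustration.

```shell
# Model of the rule order above: print what happens to a given URL path.
maintenance_action() {
    case "$1" in
        /|/index.html|/maintenance.png)
            echo "serve" ;;      # matched a pass-through rule ("-" with [L])
        /*)
            echo "redirect" ;;   # caught by the final rule with [R]
        *)
            echo "no-match" ;;   # not a URL path starting with "/"
    esac
}

for path in / /index.html /maintenance.png /products/list.html; do
    printf '%s -> %s\n' "$path" "$(maintenance_action "$path")"
done
```

The redirect target, /, is itself one of the pass-through paths, so a redirected client is served the maintenance page rather than being redirected again.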

Q We're running a bunch of Solaris 9 machines that have interfaces on both a public and a private network. For performance and security reasons, we're performing non-encrypted file transfers using rsync over the protected network. When we first started configuring this, we ran into an issue where we didn't think things were working because we couldn't get rsh to the machine to work. After a bunch of debugging, we discovered that rsh with no arguments just hung, but if you gave the rsh arguments, it worked fine (and subsequently we were able to get rsync over rsh working fine, too). Even though we got our immediate problem solved, I still want to know why rsh with no arguments fails because we wasted so much time debugging what turned out to be a non-issue.

A The rsh command is designed to connect to a target machine and execute the specified command. If you don't specify a command when you initiate the connection, then you wind up exec'ing rlogin on the local machine instead of rsh. If you'd run a truss of the rsh process, you would have seen lines resembling the following, where rlogin replaces the rsh process:

execve("/usr/bin/rlogin", 0xFFBFFB04, 0xFFBFFB10)  argc   = 2
resolvepath("/usr/bin/rlogin", "/usr/bin/rlogin", 1023)   = 15
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
stat("/usr/bin/rlogin", 0xFFBFF8D0)                       = 0
If you're seeing the rsh session just hang, there's a good possibility that you've commented out the rlogin entry from /etc/inetd.conf and you've just left rsh enabled. Try uncommenting the entry for rlogin and see whether this fixes your problem:

login  stream  tcp6    nowait  root    /usr/sbin/in.rlogind in.rlogind
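A quick way to see which of the two services are enabled is sketched below. The function name check_r_services is hypothetical, and the code assumes a POSIX shell and a grep that understands bracket expressions such as [[:space:]]. In inetd.conf, the login entry is rlogin and the shell entry is rsh.

```shell
# Report whether the rlogin ("login") and rsh ("shell") services are enabled
# in an inetd.conf-style file. A leading "#" means the entry is commented out.
check_r_services() {
    conf=${1:-/etc/inetd.conf}
    for svc in login shell; do
        if grep "^${svc}[[:space:]]" "$conf" >/dev/null 2>&1; then
            echo "${svc}: enabled"
        elif grep "^#[[:space:]]*${svc}[[:space:]]" "$conf" >/dev/null 2>&1; then
            echo "${svc}: commented out"
        else
            echo "${svc}: not present"
        fi
    done
}
```

After uncommenting an entry, inetd has to reread its configuration (for example, with pkill -HUP inetd) before the change takes effect.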
Q We're trying to install Oracle 10g on Solaris 9, but we keep failing the section of the validate test that deals with kernel parameters:

Rule [ 170 ]: Kernel params OK?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Description:
------------
This rule verifies if the kernel parameters have been set according to
the installation manual


Test [ FAILED ] :
-----------------
SHMMAXUndef
SHMMNIUndef
SEMMNIUndef
SEMMSLUndef
SEMMNSUndef
SEMVMXUndef =~ KernelOK|Obsoleted


Action:
-------
The kernel parameters have NOT been set according the installation
manual of 10g RDBMS. Please refer to the installation manual.

ReturnValue     Action
--------------------------------------------------------------------
SHMMAXTooSmall  Increase the kernel parameter SHMMAX to 4294967295
SHMMAXUndef     SHMMAX has not been defined and needs to be set
                to 4294967295
SHMMINTooSmall  Increase the kernel parameter SHMMIN to 1
                - ignore this message if your OS is Solaris 9
SHMMINUndef     SHMMIN has not been defined and needs to be set
                to 1 - ignore this message if your OS is Solaris 9
SHMMNITooSmall  Increase the kernel parameter SHMMNI to at least 100
SHMMNIUndef     SHMMNI has not been defined and needs to be set to 100
                or more
SHMSEGTooSmall  Increase the kernel parameter SHMSEG to 10
                - ignore this message if your OS is Solaris 9
SHMSEGUndef     SHMSEG has not been defined and needs to be set
                to 10 - ignore this message if your OS is Solaris 9
SEMMNITooSmall  Increase the kernel parameter SEMMNI to 100
SEMMNIUndef     SEMMNI has not been defined and needs to be set to 100
SEMMSLTooSmall  Increase the kernel parameter SEMMSL to at least 100
SEMMSLNotDef    SEMMSL has not been defined and needs to be set to 100
SEMMNSTooSmall  Increase the kernel parameter SEMMNS to at least 256
SEMMNSUndef     SEMMNS has not been defined and needs to be set to 256
SEMOPMTooSmall  Increase the kernel parameter SEMOPM to at least 100
SEMOPMUndef     SEMOPM has not been defined and needs to be set to 100
SEMVMXTooSmall  Increase the kernel parameter SEMVMX to 32767
SEMVMXUndef     SEMVMX has not been defined and needs to be set to 32767
NOEXEC_USER_STACKTooSmall  Increase the kernel parameter
                NOEXEC_USER_STACK to 1 - ignore this message if your OS
                is Solaris 9
NOEXEC_USER_STACKUndef     NOEXEC_USER_STACK has not been defined and
                needs to be set to 1 - ignore this message if your OS is
                Solaris 9
NoAccess        You do not have access to /etc/sysdef
Obsoleted       With Solaris 10 most shared memory and semaphore
                settings are now obsolete. Consult sunsolve.sun.com and
                documentation for System Admins on Solaris 10 for details.
It says we're missing settings for shmmax, shmmni, semmni, semmsl, semmns, and semvmx, but we have the following defined in /etc/system:

* Settings for oracle
set noexec_user_stack=1
set semsys:seminfo_semmni=100
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmsl=256
set semsys:seminfo_semvmx=32767
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
* End settings for oracle
If these aren't the settings they want, what should we be using?

A You have the correct settings in /etc/system, but I suspect you're attempting to validate the installation before you actually start Oracle (as you should be). The problem stems from the way Solaris handles kernel modules: they are not loaded until an application actually needs them. When you run the validation test before starting Oracle itself, the shmsys and semsys modules are still unloaded, so their parameters show up as undefined and the test fails. You can either let it fail during the validation phase (it will work after Oracle starts and the modules are loaded), or you can get rid of the validation warnings by forceloading the modules at boot time. If you'd rather do the latter, add the following two lines to /etc/system and reboot:

forceload: sys/shmsys
forceload: sys/semsys
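To confirm whether the two modules are loaded, before or after the reboot, a sketch along these lines could be used. The function name ipc_modules_loaded is made up, and the code assumes that the module name is the sixth column of modinfo output.

```shell
# Report whether the System V IPC modules appear in modinfo output.
# Reads modinfo output on stdin; the module name is assumed to be column 6.
ipc_modules_loaded() {
    awk '
        $6 == "shmsys" { shm = 1 }
        $6 == "semsys" { sem = 1 }
        END {
            print "shmsys: " (shm ? "loaded" : "not loaded")
            print "semsys: " (sem ? "loaded" : "not loaded")
        }'
}

# On the Solaris 9 host this would be run as:
#   modinfo | ipc_modules_loaded
```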
Amy Rich has more than a decade of Unix systems administration experience in various types of environments. Her current roles include that of Senior Systems Administrator for the University Systems Group at Tufts University, Unix systems administration consultant, and author. She can be reached at: qna@oceanwave.com.