Questions and Answers
Amy Rich
Thanks to Ed Schaefer for pointing out a typographical error in
the answer regarding csplit in the October issue. In each
of the following lines, the backticks were converted to single quotes
when the issue went to print:
numrecs=`grep -c "^${recsep}" $1`
splitcount=`expr ${numrecs} - 2`
for ifile in `ls data*`; do
ofile=`head -1 ${ifile}`
Dennis Lang submitted a one-line awk solution to the same question.
I've slightly modified the code he submitted to remove a UUOC (useless
use of cat) and prevent false-positive matches on the record separator:
awk '/^xxx[0-9].*$/{f=$1;next;}{print >>f;}' input-file
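To see how the one-liner behaves, here is a minimal sketch that runs it against a made-up input file. The separator lines (xxx1, xxx2), the record contents, and the split_records wrapper name are assumptions for demonstration only; the data in the original question may look different.

```shell
# Split a file into one output file per record. Each separator line
# (xxx followed by a digit) names the file that receives the lines after it.
split_records() {
    awk '/^xxx[0-9].*$/{f=$1;next;}{print >>f;}' "$1"
}

# Hypothetical sample data, for demonstration only.
demo_dir=$(mktemp -d)
printf '%s\n' xxx1 alpha beta xxx2 gamma > "$demo_dir/input-file"

# Run in a subshell so the output files land in the scratch directory.
(cd "$demo_dir" && split_records input-file)
```

After this runs, the scratch directory should contain a file xxx1 holding alpha and beta, and a file xxx2 holding gamma. Because the rule appends with >>, running it twice in the same directory adds to the existing files rather than replacing them. The input also needs to begin with a separator line; otherwise f is still empty when the first record line is printed, and most awk implementations will complain.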
Q We're running
the stock Sun SSH that comes with Solaris 9, and we enabled AllowTcpForwarding
and X11Forwarding in sshd_config. After HUPing sshd,
new connection attempts authenticate and then fail. /var/adm/messages
includes lots of errors that say:
Sep 19 13:54:44 hostname sshd[6523]: [ID 800047 auth.error] \
error: Failed to allocate internet-domain X11 display socket.
Commenting the two entries back out again fixes the issue but, unfortunately,
we can't really go without X11 forwarding. Is there a workaround to
this issue?
A You've run into another one of
Sun's SSH bugs, this time with X11Forwarding. In this case,
SPARC patch 118305-04 and x86 patch 117470-03 are
at fault. Backing out whichever patch is installed on your
system and installing SPARC patch 118335-04 or x86 patch 120463-01
should fix your problem. Sun documented this issue in infodoc
101834:
http://sunsolve.sun.com/search/document.do?assetkey=1-26-101834-1
Other people have suggested starting sshd in IPv4 mode only
by editing the sshd_config file and specifying:
ListenAddress 0.0.0.0
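Before HUPing sshd again, it can help to confirm which of these directives are actually active in the file. The following is only a sketch: the function name show_forwarding_settings is made up for illustration, and the path to sshd_config depends on your installation.

```shell
# Print the active (uncommented) lines for the directives discussed above.
# $1 = path to an sshd_config file (the location varies by system).
show_forwarding_settings() {
    awk '
        { key = tolower($1) }
        key == "allowtcpforwarding" ||
        key == "x11forwarding" ||
        key == "listenaddress" { print $1, $2 }
    ' "$1"
}
```

For example, show_forwarding_settings /etc/ssh/sshd_config (assuming that is where your configuration lives) lists only the uncommented entries, so a directive that is still commented out will simply not appear.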
Q We're getting a rash of
users who are reporting that they can no longer use POP to send and
receive their mail. The users report the following error from their
clients:
Your server has unexpectedly terminated the connection. Possible causes
for this include server problems, network problems, or a long period of
inactivity. Account: 'username@our.domain', Server: 'mail.our.domain',
Protocol: POP3, Server Response: '+OK 2019 octets', Port: 110,
Secure(SSL): No, Socket Error: 10053, Error Number: 0x800CCC0F
The common link with all these users seems to be that they're scanning
their messages with Norton AntiVirus. The only fix we've been able
to suggest so far has been to turn off input message scanning. We'd
like a better answer for our users so they can re-enable message scanning.
Any suggestions?
A The problem you're running into
is that Norton stealthily breaks TLS encryption between the client
and server so it can scan the now-non-encrypted messages. It should
be implemented to scan the messages before making an outgoing SMTP
connection instead of after the connection is made, but that's not
the way the software was designed. Your users can use POP over SSL
on port 995 to retrieve mail (or IMAP over SSL on port 993 if they want
to switch to IMAP) and the submission port (587) to submit mail.
Norton only scans messages on ports 110 (pop3) and 25 (smtp). This
means, of course, that the messages are still not being scanned,
but Norton will not complain. For a real scanning solution, you'll
need to switch to another product.
Q I just started a new job where
I'm the only Unix systems administrator. I've been trying to gather
information about all of the new machines, and I'm running into
a problem with one specific host. Two of the file servers I've inherited
as part of my new position are identically configured E450s and
fully populated with disks. One of these machines shows all 20 disks
when I run prtdiag -v, but the other only shows disks 0 through
3. I know that the disks are functioning just fine because they're
in use on the fileserver. So, why can't I see them all?
A The default Ultra Enterprise
450 configuration only supports four disk drives connected to the
internal backplane. To support the 20 drives in your system, two
8-bay storage expansion kits were installed as an upgrade. As part
of the upgrade, you need to set a variable, disk-led-assoc,
in the OBP to set up the mapping between disk slots and the physical
and logical device names. This is covered as part of Sun infodoc
16735:
http://sunsolve.sun.com/search/document.do?assetkey=1-9-16735-1
From the OBP, you need to run:
setenv disk-led-assoc 0 x y
where x is an integer between 1 and 10 identifying the rear
panel PCI slot number where the lower UltraSCSI controller is installed,
and y is an integer between 1 and 10 identifying the rear panel PCI
slot number where the upper UltraSCSI controller is installed. Slot
0 is the internal controller. If the other controller cards are installed
in slots 5 and 7, the command would be:
setenv disk-led-assoc 0 5 7
Once you set this variable, reset the system and then do a reconfiguration
reboot with boot -r.
Q Our boot disks are encapsulated
using SVM under Solaris 9, and we have an external RAID 5 set attached.
We've somehow managed to hose things quite spectacularly, and we
need to boot from the JumpStart image on the network to try and
repair things. This would be easy if we just ripped out SVM and
booted off one of the unencapsulated drives, but we need to be able
to access the RAID 5 device. Unfortunately, the JumpStart image
doesn't recognize SVM devices. I'm sure there must be a way around
this, but I'm not sure how to make the JumpStart image read the
RAID 5 device. Can you offer any suggestions?
A Information on how to access
a RAID 5 stripe set while booting off the CD-ROM is covered in Sun
infodoc 75210:
http://sunsolve.sun.com/search/document.do?assetkey=1-25-75210-1
The procedure for accessing it from a network boot is pretty much
the same. To begin, boot into single-user mode from the network JumpStart
image (this assumes that net is the OBP device alias for the network
interface that reaches your JumpStart server):
boot net -s
Determine the ID of the SVM metadevice driver:
# modinfo | grep md
17 11be592 2d1b3 85 1 md (Solaris Volume Manager base mod)
46 7824c000 d0c5 - 1 md_trans (Solaris Volume Manager trans mo)
47 7823c000 ed04 - 1 md_raid (Solaris Volume Manager raid mod)
48 7825a000 2a03 - 1 md_hotspares (Solaris Volume Manager hot spar)
49 78178000 4c3c - 1 md_sp (Solaris Volume Manager soft par)
50 139f480 5498 - 1 md_stripe (Solaris Volume Manager stripes )
51 13a448c 12006 - 1 md_mirror (Solaris Volume Manager mirrors )
68 134adfd 107d - 1 md5 (MD5 Message-Digest Algorithm)
246 7819f1d7 1004 - 1 md_notify (Solaris Volume Manager notifica)
Then unload the Solaris Volume Manager base module:
modunload -i 17
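Because grep md also matches unrelated modules such as md5, you may prefer to pick out the base module's ID by an exact match on the module name. This is a rough sketch that assumes modinfo prints the module name in the sixth column, as in the listing above, and that awk is available in the booted image; md_module_id is a made-up helper name.

```shell
# Read modinfo output on stdin and print the ID of the module whose name
# (sixth column in the listing above) is exactly "md".
md_module_id() {
    awk '$6 == "md" { print $1 }'
}

# Possible usage on the booted system (not run here):
#   modunload -i "$(modinfo | md_module_id)"
```

The ID is not fixed, so look it up on the running system each time rather than reusing the 17 shown in this example.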
Once you've unloaded the module, mount one of the unencapsulated boot
devices (the directions below assume that your root filesystem is
on c0t0d0s0) and copy the metadevice driver configuration over
to the running OS:
mount -r /dev/dsk/c0t0d0s0 /a
cp /a/kernel/drv/md.conf /kernel/drv/md.conf
umount /a
Now reload the md driver. This time it will read the information
you copied from your boot disk:
modload /kernel/drv/md
metasync -r
All of your original metadevice information should be available to
commands like metastat and metadb now, and you should
be able to mount the RAID 5 filesystem under /a.
Q We're running a pretty vanilla
Apache 1.3.33 on a load-balanced set of FreeBSD 5.3 servers. We
need to schedule some site-wide downtime so we can shuffle a large
amount of data around behind the scenes. While we're down, we want
to leave one server up, but redirect all traffic to a "we're down
right now, please come back after 9:00AM" sort of page. I was going
to be clever about this and just set the ErrorDocument to
this page, but I realized that the page requires an image as well
as the text. This means I need to make allowances for more than
one URL that does not redirect. What's the best way to do this?
A Probably the easiest way is to
use the RewriteEngine instead of Redirect or RedirectMatch.
If you've replaced index.html with the maintenance page,
and that page includes the image maintenance.png, you'd have a
set of rewrite rules like the following:
RewriteEngine On
RewriteRule ^/$ - [L]
RewriteRule ^/index\.html$ - [L]
RewriteRule ^/maintenance\.png$ - [L]
RewriteRule ^/.*$ http://www.your.domain/ [R]
Be sure to comment out any other rewrite or redirect rules so you
don't have conflicts.
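Once the rules are in place, a quick check from another machine can confirm that only the intended URLs are served. The sketch below assumes curl is installed on the machine doing the testing; the function name and the sample path /some/other/page.html are made up for illustration.

```shell
# Print each test path with the HTTP status code the server returns for it.
# $1 = base URL of the server, without a trailing slash.
check_maintenance_rules() {
    for path in / /index.html /maintenance.png /some/other/page.html; do
        code=$(curl -s -o /dev/null -w '%{http_code}' "$1$path")
        echo "$path $code"
    done
}
```

Running check_maintenance_rules http://www.your.domain should report 200 for the first three paths and a redirect status for the last one. With a bare [R] flag, mod_rewrite issues a 302 by default.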
Q We're running a bunch of Solaris
9 machines that have interfaces on both a public and a private network.
For performance and security reasons, we're performing non-encrypted
file transfers using rsync over the protected network. When
we first started configuring this, we ran into an issue where we
didn't think things were working because we couldn't get rsh
to the machine to work. After a bunch of debugging, we discovered
that rsh with no arguments just hung, but if you gave the
rsh arguments, it worked fine (and subsequently we were able
to get rsync over rsh working fine, too). Even though
we got our immediate problem solved, I still want to know why rsh
with no arguments fails because we wasted so much time debugging
what turned out to be a non-issue.
A The rsh command is designed
to connect to a target machine and execute the specified command.
If you don't specify a command when you initiate the connection,
then you wind up exec'ing rlogin on the local machine instead
of rsh. If you'd run a truss of the rsh process,
you would have seen lines resembling the following, where rlogin
replaces the rsh process:
execve("/usr/bin/rlogin", 0xFFBFFB04, 0xFFBFFB10) argc = 2
resolvepath("/usr/bin/rlogin", "/usr/bin/rlogin", 1023) = 15
resolvepath("/usr/lib/ld.so.1", "/usr/lib/ld.so.1", 1023) = 16
stat("/usr/bin/rlogin", 0xFFBFF8D0) = 0
If the rsh session just hangs, there's a good
possibility that you've commented out the rlogin entry in
/etc/inetd.conf and left only rsh enabled. Try
uncommenting the entry for rlogin and see whether this fixes
your problem:
login stream tcp6 nowait root /usr/sbin/in.rlogind in.rlogind
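To check several hosts quickly, something like the following sketch can report the state of an entry without opening the file by hand. The function name inetd_entry_state is hypothetical, and the check only recognizes entries commented out with a leading #.

```shell
# Report whether a service entry in an inetd.conf-style file is active,
# commented out, or absent. $1 = config file, $2 = service name.
inetd_entry_state() {
    awk -v svc="$2" '
        $1 == svc { state = "enabled" }
        ($1 == ("#" svc) || ($1 == "#" && $2 == svc)) && !state {
            state = "commented out"
        }
        END { print svc ": " (state ? state : "missing") }
    ' "$1"
}
```

For example, inetd_entry_state /etc/inetd.conf login prints the state of the rlogin entry, and the same call with shell covers rsh. Remember that inetd only rereads its configuration when it receives a HUP signal, so an uncommented entry has no effect until then.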
Q We're trying to install
Oracle 10g on Solaris 9, but we keep failing the section of the validate
test that deals with kernel parameters:
Rule [ 170 ]: Kernel params OK?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Description:
------------
This rule verifies if the kernel parameters have been set according to
the installation manual
Test [ FAILED ] :
-----------------
SHMMAXUndef
SHMMNIUndef
SEMMNIUndef
SEMMSLUndef
SEMMNSUndef
SEMVMXUndef =~ KernelOK|Obsoleted
Action:
-------
The kernel parameters have NOT been set according the installation
manual of 10g RDBMS. Please refer to the installation manual.
ReturnValue Action
--------------------------------------------------------------------
SHMMAXTooSmall Increase the kernel parameter SHMMAX to 4294967295
SHMMAXUndef SHMMAX has not been defined and needs to be set
to 4294967295
SHMMINTooSmall Increase the kernel parameter SHMMIN to 1
- ignore this message if your OS is Solaris 9
SHMMINUndef SHMMIN has not been defined and needs to be set
to 1 - ignore this message if your OS is Solaris 9
SHMMNITooSmall Increase the kernel parameter SHMMNI to at least 100
SHMMNIUndef SHMMNI has not been defined and needs to be set to 100
or more
SHMSEGTooSmall Increase the kernel parameter SHMSEG to 10
- ignore this message if your OS is Solaris 9
SHMSEGUndef SHMSEG has not been defined and needs to be set
to 10 - ignore this message if your OS is Solaris 9
SEMMNITooSmall Increase the kernel parameter SEMMNI to 100
SEMMNIUndef SEMMNI has not been defined and needs to be set to 100
SEMMSLTooSmall Increase the kernel parameter SEMMSL to at least 100
SEMMSLNotDef SEMMSL has not been defined and needs to be set to 100
SEMMNSTooSmall Increase the kernel parameter SEMMNS to at least 256
SEMMNSUndef SEMMNS has not been defined and needs to be set to 256
SEMOPMTooSmall Increase the kernel parameter SEMOPM to at least 100
SEMOPMUndef SEMOPM has not been defined and needs to be set to 100
SEMVMXTooSmall Increase the kernel parameter SEMVMX to 32767
SEMVMXUndef SEMVMX has not been defined and needs to be set to 32767
NOEXEC_USER_STACKTooSmall Increase the kernel parameter
NOEXEC_USER_STACK to 1 - ignore this message if your OS
is Solaris 9
NOEXEC_USER_STACKUndef NOEXEC_USER_STACK has not been defined and
needs to be set to 1 - ignore this message if your OS is
Solaris 9
NoAccess You do not have access to /etc/sysdef
Obsoleted With Solaris 10 most shared memory and semaphore
settings are now obsolete. Consult sunsolve.sun.com and
documentation for System Admins on Solaris 10 for details.
It says we're missing settings for shmmax, shmmni, semmni,
semmsl, semmns, and semvmx, but we have the following defined
in /etc/system:
* Settings for oracle
set noexec_user_stack=1
set semsys:seminfo_semmni=100
set semsys:seminfo_semmns=1024
set semsys:seminfo_semmsl=256
set semsys:seminfo_semvmx=32767
set shmsys:shminfo_shmmax=4294967295
set shmsys:shminfo_shmmin=1
set shmsys:shminfo_shmmni=100
set shmsys:shminfo_shmseg=10
* End settings for oracle
If these aren't the settings they want, what should we be using?
A You have the correct settings
in /etc/system, but I suspect you're attempting to validate
the installation before you actually start Oracle (as you should
be). The problem you're running into stems from how kernel modules are loaded.
Under Solaris, kernel modules are not loaded until they're actually
needed by an application. When you run the validation test before
starting Oracle itself, the shmsys and semsys modules
remain unloaded and the test fails. You can either let it fail during
the validation phase (and it will work after Oracle starts and the
modules are loaded), or you can be rid of the validation warnings
by forceloading the modules at boot time. If you'd rather perform
the latter, add the following two lines to /etc/system and
reboot:
forceload: sys/shmsys
forceload: sys/semsys
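If you want to confirm whether the two modules are loaded before rerunning the validation, a sketch along these lines can help. It assumes modinfo prints the module name in the sixth column; ipc_modules_loaded is a made-up helper name.

```shell
# Read modinfo output on stdin and report whether the System V shared memory
# and semaphore modules appear in it (module name is the sixth column).
ipc_modules_loaded() {
    awk '
        $6 == "shmsys" { shm = 1 }
        $6 == "semsys" { sem = 1 }
        END {
            print "shmsys: " (shm ? "loaded" : "not loaded")
            print "semsys: " (sem ? "loaded" : "not loaded")
        }
    '
}

# Possible usage on the Solaris host (not run here):
#   modinfo | ipc_modules_loaded
```

If either module shows as not loaded before Oracle has started, that is consistent with the on-demand loading described above rather than with a mistake in /etc/system.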
Amy Rich has more than a decade of Unix systems administration
experience in various types of environments. Her current roles include
that of Senior Systems Administrator for the University Systems Group
at Tufts University, Unix systems administration consultant, and author.
She can be reached at: qna@oceanwave.com.