Starfire Administration

Jeff Ruggeri

The Enterprise line of Sun servers has proven, over time, to be an indomitable force in large-scale UNIX installations. Despite this fact, administration of the largest enterprise server still remains something of a mystery to many Solaris sys admins. The E10000, or Starfire, has brought many interesting new dimensions to the Solaris environment that dramatically enhance the existing flexibility and power of a Sun. The goal of this article is to explain precisely what makes this machine so different, and to demystify some of the concepts and eccentricities surrounding administration of what is arguably one of the most versatile machines available today.

The System

The first difference between the Starfire and its cousins in the Enterprise line is its capacity. Physically, the Starfire is much larger than the rest of the line, and its curvy design is made to stand out in a data center full of boxy machines. A peek under the hood shows that it doesn't just look big and fast. With support for up to 64 CPUs, this machine can give nearly any vendor's largest workhorse a run for its money. However, the central concept that differentiates the Starfire from any other system is that it can be partitioned into several logical machines, or domains, each of which can operate as a stand-alone Solaris box. Beyond that, system boards can be dynamically added to or removed from a running domain, allowing for previously unthinkable levels of flexibility in production environments. In addition to dynamic reconfiguration, other features such as a floating network console (netcon) and a system service processor (SSP) further enrich the machine. The SSP, which is actually a separate machine, is the hub of operations for the Starfire, and every aspect of the environment can be controlled from it. Netcon is the software equivalent of a terminal concentrator, allowing access to each domain's console from anywhere.

Examining the physical layout of the system is the first key to understanding the environment's flexibility. The machine is split into 16 system boards, with 8 on each side. Each system board is capable of holding 4 UltraSPARC II CPUs, 4 SBus I/O cards, and 8 GB of memory. In addition, there are two Centerplane Support Boards (CSBs) that allow for netcon functionality without a network connection present on a domain. All of these boards are connected by an intelligent, high-bandwidth backplane that is capable of making point-to-point connections from any system board to any other, allowing for seamless SMP across any combination of boards. Yet another notable feature is a required private, hub-based network that connects all of the domains, the CSBs, and the SSP. The usefulness of this will come to light shortly.

The theory behind the design is elegantly simple. Domains are essentially logical entities, existing primarily in the software of the SSP. The SSP itself groups boards into domains, allows access to consoles with netcon, can power on or off any component of the system, and also controls a virtual system “key” with the bringup command. Even when a domain's network connections are not present, the domain can be accessed through the CSB connection, called the JTAG, which communicates over the backplane. Even the OpenBoot PROM for each domain is contained in software on the SSP! Of course, this is where the complexity comes in. The commands that are used for controlling the Starfire environment are essentially unique, as they are not currently found anywhere else in the Enterprise line. Administration requires familiarity with the SSP commands, and more specifically with the ssp user.

The SSP User

The SUNWssp package, which is generally preloaded on the Ultra 5 that comes with the E10k, installs a user called ssp by default. This user controls the environment variables and scripts that are used to create, destroy, and modify domains. Upon logging in as this user, you will be greeted with the prompt:

Please enter SUNW_HOSTNAME:

This refers to the current working domain. The value of this variable is the domain to which any SSP commands issued will be applied, so it is important to ensure it is set correctly before you perform an action. (Fortunately, its value is displayed by default in the ssp user's command prompt.) To switch its value at any time, use the command:

domain_switch <domainname>

Keep in mind that all of the following commands read this variable to determine which domain they act on.
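Because so much depends on this one variable, a wrapper script may want to confirm the working domain before doing anything destructive. The following is a minimal sketch in Python, not part of the SSP software; the function names are hypothetical, and it assumes only that SUNW_HOSTNAME is visible in the script's environment.

```python
import os


def working_domain(environ=None):
    """Return the SSP working domain, or raise if SUNW_HOSTNAME is unset."""
    env = os.environ if environ is None else environ
    name = env.get("SUNW_HOSTNAME", "").strip()
    if not name:
        raise RuntimeError("SUNW_HOSTNAME is not set; run domain_switch first")
    return name


def require_domain(expected, environ=None):
    """Raise unless the working domain is the one the caller intends to touch."""
    actual = working_domain(environ)
    if actual != expected:
        raise RuntimeError(
            f"working domain is {actual!r}, expected {expected!r}"
        )
    return actual
```

Calling require_domain("frobozz-d1") at the top of a script stops it early if the window it runs in is still pointed at a different domain.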

Regarding domain naming conventions, note that using a domain's hostname as the domain name is generally not a good idea. The reasoning behind this is simple: each domain has both a private (SSP) and public Ethernet address. Giving the domain a name that differs from its actual hostname provides an easy way to ensure you're talking to the right IP address, which is invaluable when booting from the SSP or reconfiguring the domain. A good convention would be to name the domain something like hostname-dN, where N is a number, incremented for each domain.

To view information about configured domains, use the command:

domain_status

On a configured system, the output might look something like:

DOMAIN      TYPE                     PLATFORM   OS    SYSBDS
frobozz-d1  Ultra-Enterprise-10000   frood      2.7   0 1 2 3 4

and so on, for each domain. Notice the platform name. This is essentially the name of your E10000, which is established at initial SSP setup, presumably to differentiate it from the scores of other E10000s littering your machine room.
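If you script against the SSP, this listing is easy to turn into structured data. The sketch below assumes the five-column layout shown above (the real output may differ between SSP releases), and the function name is hypothetical.

```python
def parse_domain_status(text):
    """Parse domain_status-style output into a list of dictionaries.

    Assumes the layout shown in the article: DOMAIN, TYPE, PLATFORM, OS,
    followed by one or more system board numbers.
    """
    domains = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 5 or fields[0] == "DOMAIN":
            continue  # skip blank lines, short lines, and the header row
        name, machine_type, platform, os_version = fields[:4]
        domains.append({
            "domain": name,
            "type": machine_type,
            "platform": platform,
            "os": os_version,
            "boards": [int(b) for b in fields[4:]],
        })
    return domains


SAMPLE = """\
DOMAIN      TYPE                     PLATFORM   OS    SYSBDS
frobozz-d1  Ultra-Enterprise-10000   frood      2.7   0 1 2 3 4
"""

for entry in parse_domain_status(SAMPLE):
    print(entry["domain"], entry["boards"])
```

Run against the sample above, this prints the domain name followed by its list of board numbers.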

Of course, for any of this to work, one must first configure domains. To do this, an EEPROM image is required for each domain. When you first uncrate the machine, Sun will provide you with an image for each domain requested. If further images ever need to be added, you can obtain a hostid and key for use with the sys-id utility, which will generate the EEPROM images. Once the images are successfully installed, the next step is to power up the individual system boards being used in the domain. This is accomplished with the power command. Its simplest uses are:

power -on -sb 0 1
power -on -cb 0

which power up system boards 0 and 1, and CSB 0, respectively. To reverse this, use the -off flag. Used with no arguments, the power command displays the power status of each board in the Starfire. The -all flag applies a command to every board in the system, which makes it inherently dangerous. Fortunately, the SSP software is smart enough to recognize when domains are running and deny power requests.

Once power is established, the domain_create command can be used to initialize a domain. Its syntax is:

domain_create -d <domain name> -b <system boards> -o <os version> -p <platform>

To remove a domain, the command is simply:

domain_remove -d <domain name>

Fortunately, this command is essentially reversible, and recreating a domain with domain_create will restore it as if it had not been destroyed, provided the same system boards are used in the re-creation.
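Before creating a domain from a script, it is worth checking that the requested boards exist and are not already claimed by another domain. The following Python sketch builds a domain_create command line using only the flags shown above. It rests on two assumptions that are not stated in this article: that boards are numbered 0 through 15, and that the -b board list is passed as a single space-separated argument. The function name is hypothetical.

```python
MAX_BOARDS = 16  # 16 system boards; numbering them 0-15 is an assumption


def domain_create_argv(name, boards, os_version, platform, in_use=()):
    """Validate the board list and return an argv list for domain_create."""
    boards = sorted(set(boards))
    if not boards:
        raise ValueError("a domain needs at least one system board")
    bad = [b for b in boards if not 0 <= b < MAX_BOARDS]
    if bad:
        raise ValueError(f"no such system board(s): {bad}")
    taken = sorted(set(boards) & set(in_use))
    if taken:
        raise ValueError(f"board(s) already assigned to a domain: {taken}")
    # Assumption: the board list is one space-separated argument to -b.
    board_arg = " ".join(str(b) for b in boards)
    return ["domain_create", "-d", name, "-b", board_arg,
            "-o", os_version, "-p", platform]
```

The in_use argument would typically be the set of boards that domain_status reports for the existing domains.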

Bringup and Netcon

Once the domain is created, how do you “turn the key” on the host, or access the console? The answer lies in the bringup and netcon commands. The bringup command will cycle a domain through a power-on self-test (POST), bring it up to an OpenBoot PROM prompt, and even boot the OS if it is installed. Before issuing this command, it is wise to determine the status of the domain with the check_host command. This will return a simple “Host is UP” or “Host is DOWN” response, which lets you know whether it is safe to bring the machine up.

The bringup command has a very specific set of functions that it executes, in the following order:

• bringup runs power to check that all of the system boards in the domain are powered up. If not, it will abort with a message to this effect.

• Next, it runs check_host to determine the status of the domain. If the domain is determined to be up, bringup will prompt you whether to continue. This is useful if you are using the command to recover from a hung host; however, it is recommended that you use bringup for this purpose only in extreme situations.

• The blacklist file, located in $SSPVAR/etc/<platform name>/blacklist, is checked. This file allows components, from I/O units and CPUs to entire system boards, to be individually excluded from the domain at start time. It is a useful feature, and the file can be edited by hand.

• bringup runs hpost on the domain. hpost is a very valuable tool, which can also be run interactively at any time on a domain that is not up. It runs the domain through a series of tests, which can often shake out hardware errors. By default, it runs at level 16. It can be configured to run at levels up to 127 (which performs extremely detailed testing) by adding the line level N to the file ~ssp/.postrc.

• Finally, bringup starts the obp_helper and the netcon_server, which indicates that the domain is ready.
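The order of these steps, and the two places where the sequence can stop early, can be summarized in a small model. This Python sketch only mirrors the description above; it runs no SSP commands, and the function name and step labels are made up for illustration.

```python
def simulate_bringup(boards_powered, host_up, operator_continues=False):
    """Return the ordered steps bringup would take, per the list above.

    boards_powered:     True if every board in the domain has power.
    host_up:            True if check_host would report the host as up.
    operator_continues: the operator's answer when prompted about a live host.
    """
    steps = ["power: verify system boards are powered"]
    if not boards_powered:
        steps.append("abort: not all boards are powered")
        return steps

    steps.append("check_host: determine domain status")
    if host_up and not operator_continues:
        steps.append("abort: host is up and operator declined")
        return steps

    steps.append("blacklist: exclude listed components")
    steps.append("hpost: run power-on self-test")
    steps.append("start obp_helper and netcon_server")
    return steps


for step in simulate_bringup(boards_powered=True, host_up=False):
    print(step)
```

Modeling the sequence this way makes it clear that the blacklist and hpost are reached only after the power and check_host gates have passed.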

The only important arguments to bringup are -A on and -A off. These are the equivalent of the auto-boot? OpenBoot PROM parameter: “on” boots the OS (if one is installed), and “off” drops you at the OpenBoot PROM prompt.

Some mention should be made of ways to interrupt a running domain. If your domain is hung and you cannot get it to come back, the Starfire offers several commands above and beyond the traditional Stop-A interrupt. They are, in order of severity:

hostint -- This forces a panic on a domain.

hostreset -- The domain goes into a reset state, but you can then run a bringup on it.

sys_reset -- This performs a hardware reset of all of the system boards in a domain, and should be used only as a last resort.

The bringup command is, in fact, the most severe of all in a hang situation.

Once bringup exits (assuming success), the netcon command can be used to access the domain. If netcon is used on a domain that has not been brought up, the command will sit idle, waiting for a connection. As previously mentioned, the current value of $SUNW_HOSTNAME determines which domain is accessed, so multiple windows can access multiple domains simply by using domain_switch in each. Once the command is issued, a sys admin can interact with the system no matter what state or runlevel it is in. One caveat, however, is that when the system is not in multi-user mode, the connection can be extremely slow, as it goes over the JTAG. It does, however, grant the access needed. Once the system reaches runlevel 2, cvcd is started, which allows communication between the domain and the SSP over the private Ethernet network.

Of course, since multiple users could theoretically have access to the ssp user at once, it follows that multiple users could try to netcon into the same domain at once. This could lead to some problems, but fortunately netcon implements a locking mechanism that allows only one user to have write access at a time. A netcon user can be in unlocked write, locked write, or read-only mode. Control commands for netcon begin with the tilde (~) as an escape character, and are as follows:

~# -- Analogous to Stop-A on a normal system. This will halt your system and bring it to the OpenBoot PROM. Use caution with this command.

~? -- Shows the current status of all the open netcon sessions.

~= -- Switch between the SSP private interface for the domain and the control board JTAG interface. This feature only works in private mode, when the cvcd is running on the host.

~* -- Private mode. This sets Locked Write permission, closes any open netcon sessions, and disallows access to netcon from any other terminal. This is the same as the -f (force) flag to the netcon command itself.

~& -- Locked Write mode. This is the same as opening a session with the -l flag.

~@ -- Unlocked Write mode. Another user can easily revoke this. This is the same as opening a session with the -g flag.

~^ -- Read-Only mode. Releases write permission and echoes any other session with write permission to your terminal.

~. -- Release netcon. This will exit the netcon session and return you to the command prompt.
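For anyone wrapping netcon in a script or building a cheat sheet, the escapes and their matching start-up flags fit naturally into a lookup table. The Python sketch below uses only the escapes and the -f, -l, and -g flags listed above; the dictionary and function names, and the short mode names used as keys, are hypothetical.

```python
# Tilde escapes available inside a netcon session, as listed in the article.
NETCON_ESCAPES = {
    "~#": "break to the OpenBoot PROM (like Stop-A)",
    "~?": "show status of all open netcon sessions",
    "~=": "switch between the private network and the JTAG interface",
    "~*": "private mode",
    "~&": "locked write mode",
    "~@": "unlocked write mode",
    "~^": "read-only mode",
    "~.": "release netcon and exit",
}

# Start-up flags the article pairs with each write mode.
NETCON_MODE_FLAGS = {
    "private": "-f",
    "locked": "-l",
    "unlocked": "-g",
}


def netcon_argv(mode=None):
    """Build a netcon command line, optionally requesting a write mode."""
    if mode is None:
        return ["netcon"]
    try:
        return ["netcon", NETCON_MODE_FLAGS[mode]]
    except KeyError:
        raise ValueError(f"unknown netcon mode: {mode!r}") from None
```

Keeping the flags in one table avoids scattering -f through scripts, which matters because private mode closes everyone else's sessions.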

Sample netcon output might look like:

frobozz-ssp01:frood-d1% netcon
trying to connect...
connected.

SUNW,Ultra-Enterprise-10000, using Network Console
OpenBoot 3.2.4, 12288 MB memory installed, Serial #00000000.
Ethernet address 0:0:00:00:00:00, Host ID: 00000000.

<#0> ok

At this point, you should be in familiar territory. You can essentially treat the domain just as you would any other enterprise system. The only major difference once the domain is up comes with the dynamically reconfigurable properties of the Starfire.

Dynamic Reconfiguration

The feature of the Starfire that departs most from the rest of the Enterprise line is the ability to change the capacity of a running system without interrupting any services. The practical applications for this feature are almost endless, and it is limited only by I/O configuration. System boards can be allocated from one domain to another, or even removed from a domain, powered off, and removed from the system for repair! There are two methods that can be used to accomplish the task of reconfiguration. The first method is to use the dr command, and the other (less reliable) method is to use the hostview GUI.

A brief note about hostview: this tool can be used to perform several actions, including modifying the aforementioned blacklist file or opening netcon consoles. However, it has been my experience that dr should always be used for reconfiguration, as hostview seems unreliable when it comes to modifying a domain. Board attachments or detachments often fail for no apparent reason. The intent is not to malign this tool, which is useful in its own right, but it is not the best tool for this particular feature of the E10k.

Issuing the command dr will start a shell-like environment and report on what boards are physically present. It will also report which boards are currently in use by $SUNW_HOSTNAME, as this is the domain that will be modified. Before entering dr, you may want to first use domain_status to see what boards are being used overall on the platform. The major actions that can be performed from within dr are the attachment or detachment of a system board. The commands used to achieve these functions are as follows:

To attach an unused system board to the current domain:

init_attach <sysbd> -- Prepare the named board for attachment.

complete_attach <sysbd> -- Attaches the board to the domain, after running init_attach.

abort_attach <sysbd> -- Aborts the attach process after a failed attach, or before complete_attach is run.

Detaching a system board:

drain <sysbd> -- Evacuates the memory on the named board.

complete_detach <sysbd> -- Detaches the board from the domain, after running drain.

abort_detach <sysbd> -- Aborts the detach process after a failed drain, or before complete_detach is run.

Other commands:

reconfig -- Run after a board attachment, this will run the Solaris config sequence on the domain: drvconfig; devlinks; disks; ports; tapes.

drshow <sysbd> <command> -- Shows the status of a running dr command. The most important arguments are drain and io.
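The order in which these commands are issued matters, so it can help to write the sequences down once. The Python sketch below returns the commands to type at the dr prompt for an attach or a detach, using only the commands listed above. The function names are hypothetical, and the code does not drive dr itself; the status checks in the detach sequence still have to be read and judged by a person.

```python
def attach_commands(board):
    """Commands to type at the dr prompt to attach a free system board."""
    board = int(board)
    return [
        f"init_attach {board}",      # prepare the board
        f"complete_attach {board}",  # attach it to the current domain
        "reconfig",                  # rebuild device entries on the domain
    ]


def detach_commands(board):
    """Commands to type at the dr prompt to detach a system board."""
    board = int(board)
    return [
        f"drshow {board} io",        # check for I/O before starting
        f"drain {board}",            # start evacuating memory
        f"drshow {board} drain",     # repeat until the drain has finished
        f"complete_detach {board}",  # only after the drain is complete
    ]


for command in detach_commands(4):
    print(command)
```

If either sequence fails partway through, abort_attach or abort_detach backs the operation out, as described above.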

Now for the warnings. Although attachment is relatively straightforward, and can be done without incident using any free system board, use caution when detaching a board from a running domain. The first notable issue is that running drain on a board is not an instantaneous process, even though the command returns immediately. Before running a complete_detach, the board should be examined with the command drshow <sysbd> drain, which shows the status of the drain process. drain actually attempts to move physical memory pages off to memory on other system boards, and attempting to detach the board before this is complete can be catastrophic. Of course, if enough free memory isn't available elsewhere on the system, the drain may not work!
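Since drain returns immediately, any script that automates a detach needs to wait for the drain to finish. Below is a minimal polling sketch in Python. The is_drained callable is hypothetical: it would have to run drshow <sysbd> drain and interpret the result, and since that output format is not shown here, the interpretation is left to the reader. The timeout and interval defaults are arbitrary.

```python
import time


def wait_for_drain(is_drained, timeout=1800.0, interval=10.0,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll is_drained() until it returns True or the timeout expires.

    Returns True if the drain finished, False if time ran out. The caller
    should run complete_detach only on a True result.
    """
    deadline = clock() + timeout
    while True:
        if is_drained():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

On a False result, the safer move is abort_detach rather than pressing on, because the drain may be stalled for lack of free memory elsewhere in the domain.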

The second, more important, caveat is that a board should never be detached while the domain is using any I/O on it. While it is obvious that attempting to detach a board that contains the SCSI channel to your boot disk would be a bad thing, what is less obvious is that any I/O cards on a board may be held open by the kernel. This includes cards that you may not be using at that particular moment. Detaching a board that the system is not ready to release can lead to a panic! To be safe in these situations, use the command drshow <sysbd> io. This will tell you whether the kernel on the domain is using any I/O. In designing your domains for proper dr usage, the best idea is to institute a set of floater boards, which contain no I/O whatsoever -- only CPU and memory. These boards can easily be attached to or detached from any domain with few problems, and they make life much easier on an E10k that is constantly reconfigured. Also, it is a good idea to concentrate as much I/O as possible on the first board or two of a domain (in a multi-board domain), while still ensuring redundancy. Keeping as little I/O as possible on the last boards of a domain is incredibly useful if you plan on swapping system boards often.
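The floater-board idea is simple to express in code. Given the boards in a domain and the boards known to carry I/O cards, the Python sketch below lists the boards that are reasonable detach candidates, highest-numbered first to match the advice about keeping I/O on the first boards. The function name is hypothetical, and the result is only a starting point: drshow <sysbd> io should still be checked before any detach.

```python
def detach_candidates(domain_boards, io_boards):
    """Return boards in the domain that carry no I/O, highest number first."""
    io = set(io_boards)
    return sorted((b for b in domain_boards if b not in io), reverse=True)


# Example: a five-board domain with all I/O concentrated on boards 0 and 1.
print(detach_candidates([0, 1, 2, 3, 4], io_boards=[0, 1]))
```

For the example domain, boards 4, 3, and 2 are the floaters, and boards 0 and 1 stay put.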

Conclusion

The Starfire presents several layers of complexity, which significantly expand upon the existing Sun architecture. The features presented in this article add up to a coherent and, above all, reliable machine that takes the concept of uptime very seriously. The benefits of such a platform are plain to see. Although there are many more facets to the administration of a Starfire, I have attempted to provide you with a base arsenal of concepts and commands with which to approach this powerful environment.

About the Author

Jeff Ruggeri is a Solaris Systems Administrator at Aetna in Middletown, CT, where he is responsible for an environment of nearly 300 mission-critical Sun Enterprise servers. He has been hacking UNIX in one form or another since he was approximately 12 years old.