Cover V12, I04

Solaris™ Resource Management

Peter Baer Galvin

With Solaris™ 9, Sun is bundling the previously unbundled Solaris Resource Manager. How does it work, and how well does it work? This month, the Solaris Companion takes it for a spin.

Before Solaris 9, Solaris included only rudimentary resource management. For example, the psrset command, available since Solaris 2.6, controls processor sets. With this command, you can create and delete processor sets (which are identified groups of processors), assign CPUs to those sets, assign threads to those sets, and display information about the system's processor sets. In this manner, certain tasks can be run on certain CPUs, and those tasks can be kept off the other CPUs. This base functionality is useful in a small number of situations, but fine-grained management of the work on a system also requires control over memory, I/O, and disk use, as well as more flexible CPU control. Enter Solaris Resource Manager.
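
As a rough sketch of the psrset workflow just described, the following script walks through create, assign, bind, display, and delete. It only prints the commands unless DRY_RUN=0 is set on a Solaris host. The set ID (1), the CPU IDs (2 and 3), and the PID (1234) are made-up values for illustration; on a real system, psrset -c reports the ID it assigns, and that ID should be used instead.

```shell
#!/bin/sh
# Sketch only: prints the psrset commands by default (DRY_RUN=1).
# Set ID 1, CPUs 2 and 3, and PID 1234 are hypothetical values.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

psrset_demo() {
    run psrset -c 2        # create a processor set containing CPU 2
    run psrset -a 1 3      # assign CPU 3 to set 1
    run psrset -b 1 1234   # bind process 1234 to set 1
    run psrset -i          # display processor set information
    run psrset -d 1        # delete set 1
}

psrset_demo
```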

The Solaris Resource Manager (SRM) was an unbundled product before Solaris 9, and is now included at no cost in the Solaris 9 release from Sun. This review covers that free version (as implemented in the 12/02 release). It is included as part of the full operating system installation, so no extra effort is needed to make it available on an S9 system.

Concepts

SRM consists of two rather disparate functions -- resource limitations and fair share scheduling. Think of the first as an extension to the standard "limits" that are settable within Solaris. The second is a new scheduler that manages CPU scheduling based on allocated shares, rather than the usual use-the-most-CPU-cycles kind of scheduling. The new scheduler will be described in next month's Solaris Companion.

To clarify, SRM is in no way a replacement for domaining or other "pure" resource use limiters. That is, a crash of the operating system will take down all processes on that system (or within that domain), including all SRM jobs. Within those bounds, SRM can help optimize use of a system, and it can allow programs that might usually be mutually exclusive to live in harmony on a system.

So how would you choose between multiple domains and dynamic reconfiguration (DR), and Solaris Resource Manager? Domaining provides absolute operating system separation, so a task within one domain cannot affect other domains. DR allows resources to move between domains, but testing and planning must occur, and issues like memory allocation must be resolved (as an application suddenly has more memory available to it). SRM is more flexible but does not provide that wall between applications. It should be used when fine-grained resource control is required, when resource use changes might be frequent, and on systems without domaining available. Of course, it could be used in conjunction with domains for the most complete set of solutions.

Resource Limitations Theory

There are several concepts to understand before making use of limit management within SRM, which include processes, tasks, and projects. Processes, tasks, and projects are units of resource allocation. A task consists of one or more processes, and a project is one or more tasks. For example, a process, task, or project may be limited in how much CPU time it can use. If a project is limited, then all tasks in that project inherit that limit. Likewise, a task limit is applied to the resource use of all processes in that task.
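
To make the containment concrete, here is a toy model in awk. It is not an SRM interface: the input table (PID, task ID, project name, thread count) and the limit of eight threads per task are invented for illustration. The script simply totals threads per task and per project, which is the level at which a task or project limit would be evaluated.

```shell
#!/bin/sh
# Toy model: roll per-process thread counts up to tasks and projects.
# The sample rows and TASK_LIMIT are hypothetical, not real SRM output.
TASK_LIMIT=8

sample_procs() {
    # Columns: PID  task-ID  project  threads
    cat <<'EOF'
2642 25 testproject 1
2650 25 testproject 4
2651 25 testproject 4
2700 26 testproject 2
311 7 default 3
EOF
}

rollup() {
    awk -v limit="$TASK_LIMIT" '
        { task[$2] += $4; proj[$3] += $4; owner[$2] = $3 }
        END {
            for (t in task) {
                status = (task[t] > limit) ? "over" : "ok"
                printf "task %s (%s): %d threads, %s\n", t, owner[t], task[t], status
            }
            for (p in proj)
                printf "project %s: %d threads\n", p, proj[p]
        }' | sort
}

sample_procs | rollup
```

Note that no single process in task 25 is large, yet the task as a whole is over the limit; that is the point of limiting at the task level.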

With SRM, processes are assigned to tasks or projects at login or through newtask, at, batch, or cron commands. Once these logical collections are made, you can use commands such as prctl and newtask to manage resource use by those groups, and commands like ps, id, prstat, and the accounting subsystem to view system activities based on those groups.

New resources that can be managed in this way include use of CPU cycles, number of threads, amount of CPU time, and maximum address space (virtual memory). This list expands the previous limitable resources of number of open files, maximum file size, core dump size, and data and stack virtual memory size. One key resource not yet included is physical memory. Network use is manageable by the separate IPQoS facility (not discussed here, but possibly a topic for a future column).

These resources can be set to have threshold values, and when a threshold is reached a local or global action can be triggered. For example, the process could be killed, or the event could simply be logged. These thresholds have three privilege levels, as UNIX administrators might expect. "Basic" means that the owner of the calling process can modify it; "privileged" means that only the superuser can modify it; and "system" is fixed at boot time by the kernel. System thresholds are set to the maximum of the resource that the kernel is capable of providing.
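
The threshold mechanism can be pictured with a small model. This is a sketch of the idea only, not how the kernel implements it: each input line pairs a value with a privilege level and an action, and the function reports every threshold that a requested amount exceeds, stopping at the first one whose action is "deny". The function name check_threshold and the sample thresholds are made up for illustration.

```shell
#!/bin/sh
# Conceptual model of threshold checking; not the kernel's algorithm.
# Input lines: VALUE PRIVILEGE ACTION, sorted by ascending VALUE.
# check_threshold is a hypothetical helper name.
check_threshold() {
    # $1 = requested amount; thresholds are read from stdin
    awk -v want="$1" '
        want > $1 {
            printf "%s threshold %d exceeded: %s\n", $2, $1, $3
            if ($3 == "deny") { denied = 1; exit }
        }
        END { print (denied ? "request denied" : "request allowed") }'
}

# Same threshold, two different actions:
printf '8 privileged deny\n2147483647 system deny\n' | check_threshold 9
printf '8 privileged none\n2147483647 system deny\n' | check_threshold 9
```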

Resource Limitations Fact

The Sun documentation about SRM is very good, with quite a few examples. It is weird that Sun mixes network management and resource management into one document, though. The manual is available at docs.sun.com: "System Administration Guide: Resource Management and Network Services".

Projects are defined and managed via configuration files and commands. (This can also be done via the Solaris Management Console.) The /etc/project file is much like /etc/passwd in its format and function: each colon-separated line describes one project that processes on the system can belong to. The file can be edited with vi, or the projadd, projmod, and projdel commands can be used. As a simple example:

system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::

The projects command lists the projects available to a user:

$ projects
default testproject

All but the last two lines of the configuration file were there from the system installation. Thus, by default, most root processes run in project "system", and most others in "default". This can be seen in the abridged ps output:

$ ps -eo user,project,comm
    USER  PROJECT COMMAND
    root   system sched
    root   system /etc/init
    root   system pageout
    root  default /usr/dt/bin/dtlogin
    root   system /usr/openwin/bin/fbconsole
     pbg  default dtaction
     pbg  default /usr/openwin/bin/speckeysd
     pbg  default /bin/ksh
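
A per-project process count falls out of that listing with a little awk. This sketch runs on a saved copy of the abridged output above so that it works anywhere; on a Solaris 9 host the same awk program could be fed from ps -eo user,project,comm directly. The function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: count processes per project from "ps -eo user,project,comm" output.
# The sample below is the abridged listing from this column.
sample_ps() {
    cat <<'EOF'
    USER  PROJECT COMMAND
    root   system sched
    root   system /etc/init
    root   system pageout
    root  default /usr/dt/bin/dtlogin
    root   system /usr/openwin/bin/fbconsole
     pbg  default dtaction
     pbg  default /usr/openwin/bin/speckeysd
     pbg  default /bin/ksh
EOF
}

count_by_project() {
    # Skip the header line, then tally the second column.
    awk 'NR > 1 { n[$2]++ } END { for (p in n) print p, n[p] }' | sort
}

sample_ps | count_by_project
# On Solaris 9: ps -eo user,project,comm | count_by_project
```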

For this example, "testproject" is used. If a user is listed as a valid member of a project, he or she may execute tasks within that project. Only the superuser can execute tasks within a project without being a project member.

A project can be further refined via this configuration file or commands. The configuration file approach has the benefit of being resilient to reboots. The file is read at boot time, or when SRM commands are executed. However, changes made to the configuration file do not affect processes already running. This example shows commands to manage the project space.

First, let's create a task within the "testproject" project with the newtask command:

$ id -p
uid=101(pbg) gid=14(sysadmin) projid=3(default)
$ newtask -p testproject csh
% id -p
uid=101(pbg) gid=14(sysadmin) projid=11(testproject)

Any new child processes of a process in a project are also members of that project. Note the membership enforcement:

$ newtask -p dontuse
newtask: user "pbg" is not a member of project "dontuse"
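
The check that newtask applies can be approximated by hand against the project file. The sketch below assumes the six-field layout of /etc/project (name, ID, comment, user list, group list, attributes) and works on a copy of the entries shown earlier rather than the live file. It inspects only the user list; group lists and the implicit rules that put every user in "default" are ignored. The helper name in_project is made up.

```shell
#!/bin/sh
# Sketch: check the user-list field (field 4) of project-file entries.
# Ignores group lists and the implicit "default" project; in_project is
# a hypothetical helper, not a Solaris command.
sample_project() {
    cat <<'EOF'
system:0::::
user.root:1::::
noproject:2::::
default:3::::
group.staff:10::::
testproject:11:For testing:pbg::
dontuse:12:Unused:::
EOF
}

in_project() {
    # $1 = user, $2 = project name; entries on stdin; exit 0 if listed
    awk -F: -v user="$1" -v proj="$2" '
        $1 == proj {
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == user) found = 1
        }
        END { exit !found }'
}

for p in testproject dontuse; do
    if sample_project | in_project pbg "$p"; then
        echo "pbg is listed in $p"
    else
        echo "pbg is not listed in $p"
    fi
done
```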

The most important resource management command is prctl. It cannot create a project, but once processes are running within a project, it can manage their resources.

For example, let's limit the number of threads within a task (assuming a process is running in project "testproject"). The first command sets the "basic" limit at five threads, and the second line sets the privileged limit at eight (that command must be run as root, although the first one needn't). The third command confirms those operations:

# prctl -n task.max-lwps -v 5 -e deny -i project testproject
# prctl -n task.max-lwps -t privileged -v 8 -e deny -i project testproject
# prctl -n task.max-lwps -i project testproject
2642:   sh
task.max-lwps
                            5 basic      deny
                            8 privileged deny
                   2147483647 system     deny           [ max ]
#
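
When limits need to be checked from a script, the listing above is easy to pick apart. The sketch below assumes the three-column layout (value, privilege, action) shown in this column's prctl output; other Solaris releases may print a different layout, so treat the field positions as an assumption. It runs against a saved copy of the output rather than calling prctl, and rctl_value is a made-up helper name.

```shell
#!/bin/sh
# Sketch: extract the value set at a given privilege level from prctl
# output in the layout shown above. rctl_value is a hypothetical helper.
sample_prctl() {
    cat <<'EOF'
2642:   sh
task.max-lwps
                            5 basic      deny
                            8 privileged deny
                   2147483647 system     deny           [ max ]
EOF
}

rctl_value() {
    # $1 = privilege level (basic, privileged, or system)
    awk -v level="$1" '$2 == level { print $1 }'
}

sample_prctl | rctl_value privileged
```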

Next we spawn some threads in that project, within a task, to see the results:

$ newtask -p testproject
$ csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
sunny% csh
Vfork failed

Notice that the basic rule was not enforced, but that the privileged one was. It is unclear what the basic privilege level is for, but privileged obviously works. Of course, another task could have been spawned, and it too would be allowed eight threads in this example.

What if monitoring is desired, but not enforced limits? Enter the rctladm command. But first, the action of our limit needs to change from deny to allow (i.e., "none"):

# prctl -n task.max-lwps -t privileged -v 8 -d all -i project testproject
# prctl -n task.max-lwps -i project testproject
2847:   sh
task.max-lwps
                            8 privileged none
                   2147483647 system     deny           [ max ]
# rctladm -e syslog task.max-lwps
# rctladm
process.max-address-space   syslog=off   [ lowerable deny no-local-action ]
process.max-file-descriptor syslog=off   [ lowerable deny ]
process.max-core-size       syslog=off   [ lowerable deny no-local-action ]
process.max-stack-size      syslog=off   [ lowerable deny no-local-action ]
process.max-data-size       syslog=off   [ lowerable deny no-local-action ]
process.max-file-size       syslog=off   [ lowerable deny file-size ]
process.max-cpu-time        syslog=off   [ lowerable no-deny cpu-time inf ]
task.max-cpu-time           syslog=off   [ no-deny cpu-time no-obs inf ]
task.max-lwps               syslog=notice
project.cpu-shares          syslog=off   [ no-basic no-local-action ]
The rctladm command tells the system to use syslog whenever the max-lwps resource limit is reached. Note that for longer-term settings, /etc/rctladm.conf is used. Now when the thread limit is exceeded, the offending command is allowed but a syslog entry is made:

# tail -1 /var/adm/messages
Feb  9 20:55:07 sunny genunix: [ID 883052 kern.notice] privileged rctl task.max-lwps (value 8) exceeded by task 25

So on the whole the facility works nicely, although it's rather limited in, well, what can be limited.
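
For monitoring, the syslog entries can be reduced to the fields that matter. This sketch assumes the message wording shown above ("<level> rctl <name> (value N) exceeded by task M") and is tested only against that one sample line; rctl_events is a made-up helper name.

```shell
#!/bin/sh
# Sketch: pull the control name, privilege level, threshold, and task ID
# out of rctl syslog lines. Assumes the message wording shown in this column.
sample_log() {
    cat <<'EOF'
Feb  9 20:55:07 sunny genunix: [ID 883052 kern.notice] privileged rctl task.max-lwps (value 8) exceeded by task 25
EOF
}

rctl_events() {
    # Lines that do not match the pattern are silently dropped.
    sed -n 's/.* \([a-z]*\) rctl \([a-z.-]*\) (value \([0-9]*\)) exceeded by task \([0-9]*\).*/task=\4 rctl=\2 level=\1 value=\3/p'
}

sample_log | rctl_events
# On a live system: rctl_events < /var/adm/messages
```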

Some other useful project-enabled commands include:

  • prstat -J or -T -- Dynamically updated process list, including project or task summaries
  • pgrep -J or -T -- Display the process IDs of processes in the specified project or task
  • pkill -J or -T -- Kill only processes in the specified project or task

Summary

Solaris Resource Manager is a welcome addition to the core operating system. It allows control over processes and resources that was previously available only via commercial tools. This kind of functionality continues the trend of Solaris moving from a technical computing operating system to one that can accommodate both technical and business uses, even within the same operating system instance.

This column described the concepts and showed some basic uses, but there is quite a lot to this new Solaris facility. There are plenty of details that must be considered as resource management is configured, initialized, and used. Many were discussed here, but some that were not touched on include resource control prioritization, and using global naming services such as LDAP and NIS+ for resource management information. Also, extended accounting can be used to monitor resource use on a project or task basis.

Overall, SRM is worth learning to allow systems managers and administrators to gain more control over who is doing what on the computers they manage. Next month, the Solaris Companion will look at the second half of SRM -- the Fair Share Scheduler.

Peter Baer Galvin (http://www.petergalvin.org) is the Chief Technologist for Corporate Technologies (www.cptech.com), a premier systems integrator and VAR. Before that, Peter was the systems manager for Brown University's Computer Science Department. He has written articles for Byte and other magazines, and previously wrote Pete's Wicked World, the security column, and Pete's Super Systems, the systems management column for Unix Insider (http://www.unixinsider.com). Peter is coauthor of the Operating System Concepts and Applied Operating System Concepts textbooks. As a consultant and trainer, Peter has taught tutorials and given talks on security and systems administration worldwide.