Cover V14, i10
oct2005.tar

Oversubscribing an Oracle Server Using SolarisTM FSS

Eric Forgette

One of the biggest challenges in consolidating database instances onto a single, multiprocessor system is the limited control over the priority of processes running in the time share scheduling class. In this class, each runnable thread is individually scheduled based on its priority, time spent on CPU, and time spent waiting for a CPU. Thus, a database instance with many threads can get more work done than an instance with only a few threads.

Systems in this configuration tend to perform poorly and unpredictably during periods of peak loads because components of individual instances become starved for CPU. These performance problems manifest themselves as poor application response, longer batch run times, and connection timeouts. Using processor sets and creative prioritization scripts can mitigate these challenges. However, these techniques create a configuration only marginally better than having a separate system per instance.

The most successful consolidated database servers are created by strategically grouping instances with different load signatures. A load signature consists of the system resources (CPU, memory, and IO) that a database instance consumes over a specific period of time. While sizing and consolidation strategies are beyond the scope of this article, an example of a good grouping strategy would be to combine instances that experience heavy utilization during normal business hours (online) with instances that experience heavy loads throughout the night (batch). Overlap of the load signatures, which cause contention for CPU resources, tend to occur even in well-planned environments.

This article describes how the Fair Share Scheduler (FSS) class, available in Solaris 9 (and later), can be used to maintain Service Level Agreements (SLAs) on a consolidated Oracle database server (a single server with multiple instances), even during periods of oversubscription.

Understanding the FSS class

The Solaris Resource Management (SRM) framework provides three control mechanisms -- constraints, scheduling, and partitioning. This article will concentrate primarily on scheduling as it is implemented in Solaris 9. The FSS class offers a weight-based method of allocating CPU time to a group of processes. These groups are called projects. Each project is assigned a number of shares. Runnable processes in projects that have been assigned more shares (more weight) will be scheduled more often. This behavior is most evident during periods of heavy load. When CPU resources are not being competed for, a project's use of CPU is not restricted. In other words, the use of the FSS class enables full utilization of CPU resources under a normal load, while providing controllable and predictable behavior under a heavy load.

The basic building block of the SRM framework is the project. To allocate resources between individual Oracle database instances, each instance must be configured as an independent entity. To accomplish this, a separate project is created for each database instance. The system configuration file that controls the allocation of resources is /etc/project. The project.cpu-shares property controls how many shares each project is allocated. This article assumes that the Oracle system is owned by the Unix user named "oracle" and that the oracle user's primary Unix group is named "dba". The following example shows two entries from the /etc/project file:

myfirstinstance:1001:Database \
  Instance Number 1:oracle:dba:project.cpu-shares=(privileged,100,deny)
mysecondinstance:1002:Database \
  Instance Number 2:oracle:dba:project.cpu-shares=(privileged,300,deny)
The fields of the /etc/project file are separated by colons (":"). For a full description of fields and syntax, consult the project man page. The example above defines two projects (one project per instance) -- myfirstinstance and mysecondinstance. The Unix user named "oracle" and users in the Unix group named "dba" are allowed to start tasks in both of these projects. The project named myfirstinstance has been allocated 100 shares, and the project named mysecondinstance has been allocated 300 shares. If these were the only projects defined on a system running at full load, the project named mysecondinstance would receive approximately 75% of the CPU time, while the project named myfirstinstance would receive approximately 25%.

Enabling FSS

Peter Baer Galvin introduced FSS in "Solaris Resource Management -- The Fair Share Scheduler" (Sys Admin, June 2003: http://www.samag.com/documents/s=8228/sam0306k/0306k.htm). The following steps review how to enable FSS in Solaris 9 and are explained in more detail in Galvin's article.

The default scheduler for Solaris 9 is time share (TS). To make FSS the default class at boot time, use the following command:

# dispadmin -d FSS
To effect this change on a live system (without reboot), simply move all of the processes in the TS class into the FSS class:

#  priocntl -s -C FSS -i class TS
Configuring Oracle Instances to Fit into the SRM Framework

The Solaris project for every database instance will consist of two tasks -- a listener task and a background process task (the instance itself). To ensure that each instance is properly associated with its project, a separate Oracle database listener must be configured for each instance because the listener process (tnslsnr) forks child processes to handle each network-based database session. These new processes, which consume most of the CPU under normal conditions, inherit the project ID from the listener process that spawned them.

While there are different ways to accomplish creating a listener per instance, the use of virtual network interfaces is the simplest and most portable configuration. To create a virtual network interface, use the ifconfig command to bind an IP address to a physical network adapter. For example, to add the address 10.1.1.51 in a C class subnet to the ge0 interface, use the following commands:

# ifconfig ge0:2 plumb
# ifconfig ge0:2 10.1.1.51 netmask 255.255.255.0 broadcast 10.1.1.255 up
To make this interface available after a reboot, create a /etc/hostname.ge0:2 file. It is best to create the first virtual interface using ":2". This reserves the ":1" for use in a network multipathing configuration.

The Solaris project, the database instance, and the hostname for the virtual interface should all be named the same thing. This simplifies the configuration and initialization scripts. For example, once we associate the hostname "myfirstinstance" to the IP address 10.1.1.51, we can bind the listener for our instance (also named myfirstinstance) to the virtual interface using the lsnrctl utility. The following is a section of a listener.ora file defining a listener for myfirstinstance:

lsnr_myfirstinstance =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS_LIST =
        (ADDRESS = (PROTOCOL = TCP)(HOST = myfirstinstance)(PORT = 1521))
      )
    )
  )
SID_LIST_LSNR_myfirstinstance =
  (SID_LIST =
    (SID_DESC =
      (global_dbname = myfirstinstance )
      (ORACLE_HOME = /u01/oracle/product/9.2.0)
      (SID_NAME = myfirstinstance)
    ))
	
Implementing Oracle Instances under FSS

The /etc/project file can be manipulated using the projadd, projmod, and projdel utilities, or by simply editing the file directly. Once a project has been added, a virtual interface created, and an independent listener configured, each database instance can start under the SRM framework using the newtask command. To start the instance, switch the user to oracle, set the required Oracle environmental variables, and execute the following commands:

export ORACLE_SID=myfirstinstance

/usr/bin/newtask -p $ORACLE_SID sqlplus /nolog << EOF
connect / as sysdba
startup
EOF

/usr/bin/newtask -p $ORACLE_SID lsnrctl start lsnr_$ORACLE_SID
The -p flag of the newtask command specifies the project in which to run the sqlplus utility. Setting the project name to match the instance name allows the $ORACLE_SID variable to be used in multiple places. Each of these commands creates a new task under the project with the same name as the instance. A "task" is a set of one or more processes within a project. A "project" contains tasks, which contain processes. This allows a logical grouping of processes within a project. The example above creates two tasks in the project named myfirstinstance (Figure 1). The first task will include all of the instance background processes (Listing 1). The second task will include the listener process and all network-based database sessions (Listing 2) that it has forked.

The user.oracle and group.dba Projects

A project named user.oracle should be defined in the /etc/project table. The user.oracle project will be the default project for any process (or task) initiated by the Unix user named oracle when a project isn't explicitly specified. Examples of these processes include cron jobs, command-line Oracle utilities, the dbsnmp daemon, backup processes, and any other process that is initiated by the oracle user without using the newtask command.

If the user.oracle project is not created, new processes initiated by oracle will be assigned the project named group.dba. If neither the user nor the group-level project is defined, the process will be assigned to the "default" project. This could cause performance issues because by default the "default" project is assigned only one share. The only exception to this is an entry in /etc/user_attr (see the man page for more information). Below is an example of a /etc/project file entry for the oracle user:

user.oracle:2000::::project.cpu-shares=(privileged,300,deny)
Batch processes can be assigned to other projects by modifying the process that initiates them to use the newtask command. For example, the scripts that initiate RMAN backups could use the newtask command to initiate the backup and assign the process to the project of the particular instance being backed up. The following syntax would be used in a backup script to initiate rman and assign the process (or processes) to the project of the instance (not the user.oracle project):

/usr/bin/newtask -p $ORACLE_SID ${ORACLE_HOME}/bin/$RMAN cmdfile "$CMDFILE"
Interactive processes can be assigned to other projects by executing the newtask command within the user's shell. For example, the oracle user can "charge" the myfirstinstance project for the CPU time used during an interactive session. The -p option of the id command returns the current project. The following example shows that the oracle user's shell is initially running in the user.oracle project. The newtask command is then used to start a new shell, which is assigned to the myfirstinstance project. Last, the id command is executed again to demonstrate that the project has, in fact, been changed to the myfirstinstance project:

$ id -p
uid=100(oracle) gid=100(dba) projid=2000(user.oracle)

$ newtask -p myfirstinstance

$ id -p
uid=100(oracle) gid=100(dba) projid= 1001(myfirstinstance)
Initialization Scripts

Oracle supplies standard start and stop scripts for instances and listeners. These scripts must be modified to take advantage of the FSS configuration at boot. Listing 3 is an example of an Oracle database instance start script that initializes instances based on the settings defined in the oratab file. Listing 4 is an example of an Oracle listener start script, which initializes listeners in a similar fashion. These scripts are "stripped down" clones of the scripts provided by Oracle used to illustrate how to start the database instance and listeners within the project framework.

Creating projects to control the execution of other utilities, such as system processes, is also important. Careful consideration should be given to which processes are run in the "system" project since, by default, it has unlimited shares. Services that are required but have the potential to fall victim to denial-of-service attacks should be run in projects with tighter controls. The Samba processes, for example, are run in the "system" project. The danger, of course, is that the most recent "share walking Windows worm or virus" could potentially impact your database server. To prevent this, create a project named "samba" and assign it fewer shares than the database instances:

samba:3003:Samba - NT Access:::project.cpu-shares=(privileged,10,deny)
Then modify the init script to start one or both daemons using newtask as shown in Listing 5. Restricting services such as Samba will provide more predictable system response by greatly reducing the potential impact of extra services.

Utilities

After implementation, the prstat command can be used to view system activity. Similar in function to the top command, prstat reports utilization on a per-project basis, every few seconds. The -J flag specifies that project and process information should be reported (Listing 6). The -T -j combination can be used to view the processes and task associated with a project (Listing 7). The prstat command has a rich set of options; consult the man page for more information.

The ps command in Solaris 9 has been update to include new options to support FSS. Again, the best source for command option information is the manual page. It is useful to create an alias, which displays project information in addition to the more common options:

'ps -eo project,user,pid,ppid,pcpu,args'
The alias above will show the project association as well as the process information. The pgrep and pkill commands also have options that support the project syntax.

The prctl command can display or dynamically modify resource controls for projects. The control we are primarily concerned with is project.cpu-shares. To dynamically change the CPU shares for the myfirstinstance project to 2000 shares, enter the following command:

/usr/bin/prctl -r -n project.cpu-shares -v 2000 -i project myfirstinstance
This change takes effect immediately but will persist after reboot only if /etc/project is also updated. This command can be placed in a cron table to give a particular instance more CPU shares at a specific time. A second cron job can be added to bring the instance back to its normal value. This technique can be used to allow an instance serving an online application (given the higher weight as described above) to remain responsive while allowing an instance serving a long-running batch process to consume unused CPU cycles.

Tuning the allocation of CPU shares is often an exercise in trial and error. The prctl utility can be used to quickly change the weight of any project, task, or process. It is a "best practice" to modify /etc/project before running the prctl command to alter the scheduler. This practice maintains the weight of each project even if the system is rebooted. The Perl script, srm_apply_project (Listing 8), is a utility that reads the /etc/project file and applies the current values to the system. The script reads /etc/project and applies the "project.cpu-shares" settings to all projects or a specified project (depending on command-line options). While the script also supports the "project.pool" setting, pools and processor sets are beyond the scope of this article. The script will ask the user to approve each change, or it can be run with the "--yes2all" flag, which automatically answers yes to all questions. The following example applies the settings for all projects without prompting the user:

$ ./srm_apply_project --yes2all
The next example modifies only the project named "myfirstinstance", and only after the user approves the change:

$ ./srm_apply_project --project myfirstinstance

Change myfirstinstance cpu-shares to 100? (yes or no) : yes
Executing: prctl -r -n project.cpu-shares -v 100 -i project myfirstinstance
Occasionally an instance may be started or restarted without using the newtask command. This generally happens as database and systems administrators are becoming familiar with this new framework. As previously described, the instance and listener (in this case) would be running under the user.oracle project. Luckily, neither the instance nor the listener will need to be restarted in order to associate the instance with the correct project. Any process can be dynamically reassigned to a project using the newtask command. The following example moves process id 2345 into the myfirstinstance project:

newtask -v -p myfirstinstance -c 2345
Manually moving each process associated with a busy Oracle database instance into a project can be a tedious task. Luckily, the Oracle 8 and 9 process naming convention allows this process to be automated quite easily. The srm_instance2project Perl script (Listing 9) dynamically moves processes of an Oracle instance into a project of the same name. The script assumes that there is an entry in /etc/project for every running database instance. The script requires one parameter -- an instance name or the string "ALL". The script describes what it will do and prompts the user for confirmation. Before running this script for a particular instance, the tnslsnr process should be associated with the correct project using the newtask command, as in the example above.

When FSS is enabled, user processes not started under a specific project will belong to the project named default. Because this project has only one share (by default), application processes should be moved from this project or they could perform poorly. These processes can be identified using the pgrep or ps utilities. The following command will list the process ids of all processes in the default project:

pgrep -J default
For more detail, use the ps command with grep. For example:

ps -eo project,user,pid,args | grep default
With slight modification, either command can also help identify processes in the system project.

Automation can greatly reduce the time it takes to move processes into projects. The srm_user2project Perl script (Listing 10) is a simple example of how to move processes into a project based on user ID. This script requires a user name as the first argument and an optional project name as a second argument. The process flow is a simple loop, reading through the output of the ps command and executing the newtask command as necessary. This example can be extended to handle many different sets of criteria.

Other Considerations

In the timeshare class, the load average (reported by uptime) is a good indicator of system performance. The load average is based on the number of running and runnable processes over a specific time (1, 5, and 15 minutes). This number, in relation to the number of processors, is a good indication of the demand for CPU. This is true when using the FSS class as well, while the system load is less than the number of processors. However, once the demand is greater than the available resources, a high load average doesn't necessarily indicate that all instances are performing poorly. This is because the scheduler takes into account the weight assigned to the project of the runnable thread when determining its scheduling priority. Therefore, an extremely high load average is likely to occur if a project with only a few shares has many runnable threads at the same time one or more projects with greater share values have many runnable threads. This is because the higher priority threads are being serviced while the lower ones remain in a runnable state (increasing the value of load average). For this reason, the load average may be higher using the FSS class than it was using the TS class on an oversubscribed system.

Based on the greater control over the priority of groups of processes, it may be concluded that development (and/or test) instances should now reside on the same physical machine as production instances. While this is entirely possible, in most cases the risk of negative impact is still too high. The challenge is that the FSS is only controlling the priority of the threads. The safest way to implement this in a Solaris 9 environment is to implement Resource Pools and Processor Sets. Although partitioning isn't covered in this article, the srm_apply_project script (Listing 6) supports dynamically binding a process to a pool.

Conclusion

Many small- to medium-sized IT shops have embraced Oracle's RDBMS because of its power, speed, and stability. For financial reasons, these shops tend to license Oracle on a per-CPU basis. The Solaris 9 Resource Management framework can help maximize a company's investment in Oracle by allowing the consolidation of multiple database instances onto a single physical machine, while maintaining Service Level Agreements through the use of the Fair Share Scheduler. This article focuses on Oracle's RDBMS, however, the methodology described here also can be applied and extended to other consolidation projects.

References

Lawson, Stuart J. Resource Management in the Solaris 9 Operating Environment. Santa Clara: Sun Microsystems, Inc., 2002

Galvin, Peter Baer. "Solaris TM Resource Management -- The Fair Share Scheduler." Sys Admin Magazine June 2003: 12-6 (http://www.samag.com/documents/s=8228/sam0306k/0306k.htm)

System Administration Guide: Resource Management and Network Services. Santa Clara: Sun Microsystems, Inc., 2003

Eric Paul Forgette is a Unix Systems Engineer for a transportation company based in Richmond, Virginia. Eric graduated from Buffalo State College with a B.S. in Computer Information Systems. He can be reached at: e4jet@mac.com.