Oversubscribing
an Oracle Server Using SolarisTM FSS
Eric Forgette
One of the biggest challenges in consolidating database instances
onto a single, multiprocessor system is the limited control over
the priority of processes running in the time share scheduling class.
In this class, each runnable thread is individually scheduled based
on its priority, time spent on CPU, and time spent waiting for a
CPU. Thus, a database instance with many threads can get more work
done than an instance with only a few threads.
Systems in this configuration tend to perform poorly and unpredictably
during periods of peak loads because components of individual instances
become starved for CPU. These performance problems manifest themselves
as poor application response, longer batch run times, and connection
timeouts. Using processor sets and creative prioritization scripts
can mitigate these challenges. However, these techniques create
a configuration only marginally better than having a separate system
per instance.
The most successful consolidated database servers are created
by strategically grouping instances with different load signatures.
A load signature consists of the system resources (CPU, memory,
and IO) that a database instance consumes over a specific period
of time. While sizing and consolidation strategies are beyond the
scope of this article, an example of a good grouping strategy would
be to combine instances that experience heavy utilization during
normal business hours (online) with instances that experience heavy
loads throughout the night (batch). Overlap of the load signatures,
which cause contention for CPU resources, tend to occur even in
well-planned environments.
This article describes how the Fair Share Scheduler (FSS) class,
available in Solaris 9 (and later), can be used to maintain Service
Level Agreements (SLAs) on a consolidated Oracle database server
(a single server with multiple instances), even during periods of
oversubscription.
Understanding the FSS class
The Solaris Resource Management (SRM) framework provides three
control mechanisms -- constraints, scheduling, and partitioning.
This article will concentrate primarily on scheduling as it is implemented
in Solaris 9. The FSS class offers a weight-based method of allocating
CPU time to a group of processes. These groups are called projects.
Each project is assigned a number of shares. Runnable processes
in projects that have been assigned more shares (more weight) will
be scheduled more often. This behavior is most evident during periods
of heavy load. When CPU resources are not being competed for, a
project's use of CPU is not restricted. In other words, the use
of the FSS class enables full utilization of CPU resources under
a normal load, while providing controllable and predictable behavior
under a heavy load.
The basic building block of the SRM framework is the project.
To allocate resources between individual Oracle database instances,
each instance must be configured as an independent entity. To accomplish
this, a separate project is created for each database instance.
The system configuration file that controls the allocation of resources
is /etc/project. The project.cpu-shares property controls how many
shares each project is allocated. This article assumes that the
Oracle system is owned by the Unix user named "oracle" and that
the oracle user's primary Unix group is named "dba". The following
example shows two entries from the /etc/project file:
myfirstinstance:1001:Database \
Instance Number 1:oracle:dba:project.cpu-shares=(privileged,100,deny)
mysecondinstance:1002:Database \
Instance Number 2:oracle:dba:project.cpu-shares=(privileged,300,deny)
The fields of the /etc/project file are separated by colons (":").
For a full description of fields and syntax, consult the project man
page. The example above defines two projects (one project per instance)
-- myfirstinstance and mysecondinstance. The Unix user named "oracle"
and users in the Unix group named "dba" are allowed to start tasks
in both of these projects. The project named myfirstinstance has been
allocated 100 shares, and the project named mysecondinstance has been
allocated 300 shares. If these were the only projects defined on a
system running at full load, the project named mysecondinstance would
receive approximately 75% of the CPU time, while the project named
myfirstinstance would receive approximately 25%.
Enabling FSS
Peter Baer Galvin introduced FSS in "Solaris Resource Management
-- The Fair Share Scheduler" (Sys Admin, June 2003: http://www.samag.com/documents/s=8228/sam0306k/0306k.htm).
The following steps review how to enable FSS in Solaris 9 and are
explained in more detail in Galvin's article.
The default scheduler for Solaris 9 is time share (TS). To make
FSS the default class at boot time, use the following command:
# dispadmin -d FSS
To effect this change on a live system (without reboot), simply move
all of the processes in the TS class into the FSS class:
# priocntl -s -C FSS -i class TS
Configuring Oracle Instances to Fit into the SRM Framework
The Solaris project for every database instance will consist of
two tasks -- a listener task and a background process task (the
instance itself). To ensure that each instance is properly associated
with its project, a separate Oracle database listener must be configured
for each instance because the listener process (tnslsnr) forks child
processes to handle each network-based database session. These new
processes, which consume most of the CPU under normal conditions,
inherit the project ID from the listener process that spawned them.
While there are different ways to accomplish creating a listener
per instance, the use of virtual network interfaces is the simplest
and most portable configuration. To create a virtual network interface,
use the ifconfig command to bind an IP address to a physical
network adapter. For example, to add the address 10.1.1.51 in a
C class subnet to the ge0 interface, use the following commands:
# ifconfig ge0:2 plumb
# ifconfig ge0:2 10.1.1.51 netmask 255.255.255.0 broadcast 10.1.1.255 up
To make this interface available after a reboot, create a /etc/hostname.ge0:2
file. It is best to create the first virtual interface using ":2".
This reserves the ":1" for use in a network multipathing configuration.
The Solaris project, the database instance, and the hostname for
the virtual interface should all be named the same thing. This simplifies
the configuration and initialization scripts. For example, once
we associate the hostname "myfirstinstance" to the IP address 10.1.1.51,
we can bind the listener for our instance (also named myfirstinstance)
to the virtual interface using the lsnrctl utility. The following
is a section of a listener.ora file defining a listener for myfirstinstance:
lsnr_myfirstinstance =
(DESCRIPTION_LIST =
(DESCRIPTION =
(ADDRESS_LIST =
(ADDRESS = (PROTOCOL = TCP)(HOST = myfirstinstance)(PORT = 1521))
)
)
)
SID_LIST_LSNR_myfirstinstance =
(SID_LIST =
(SID_DESC =
(global_dbname = myfirstinstance )
(ORACLE_HOME = /u01/oracle/product/9.2.0)
(SID_NAME = myfirstinstance)
))
Implementing Oracle Instances under FSS
The /etc/project file can be manipulated using the projadd,
projmod, and projdel utilities, or by simply editing
the file directly. Once a project has been added, a virtual interface
created, and an independent listener configured, each database instance
can start under the SRM framework using the newtask command.
To start the instance, switch the user to oracle, set the required
Oracle environmental variables, and execute the following commands:
export ORACLE_SID=myfirstinstance
/usr/bin/newtask -p $ORACLE_SID sqlplus /nolog << EOF
connect / as sysdba
startup
EOF
/usr/bin/newtask -p $ORACLE_SID lsnrctl start lsnr_$ORACLE_SID
The -p flag of the newtask command specifies the project
in which to run the sqlplus utility. Setting the project name
to match the instance name allows the $ORACLE_SID variable to be used
in multiple places. Each of these commands creates a new task under
the project with the same name as the instance. A "task" is a set
of one or more processes within a project. A "project" contains tasks,
which contain processes. This allows a logical grouping of processes
within a project. The example above creates two tasks in the project
named myfirstinstance (Figure 1). The first task will include all
of the instance background processes (Listing 1). The second task
will include the listener process and all network-based database sessions
(Listing 2) that it has forked.
The user.oracle and group.dba Projects
A project named user.oracle should be defined in the /etc/project
table. The user.oracle project will be the default project for any
process (or task) initiated by the Unix user named oracle when a
project isn't explicitly specified. Examples of these processes
include cron jobs, command-line Oracle utilities, the dbsnmp daemon,
backup processes, and any other process that is initiated by the
oracle user without using the newtask command.
If the user.oracle project is not created, new processes initiated
by oracle will be assigned the project named group.dba. If neither
the user nor the group-level project is defined, the process will
be assigned to the "default" project. This could cause performance
issues because by default the "default" project is assigned only
one share. The only exception to this is an entry in /etc/user_attr
(see the man page for more information). Below is an example of
a /etc/project file entry for the oracle user:
user.oracle:2000::::project.cpu-shares=(privileged,300,deny)
Batch processes can be assigned to other projects by modifying the
process that initiates them to use the newtask command. For
example, the scripts that initiate RMAN backups could use the newtask
command to initiate the backup and assign the process to the project
of the particular instance being backed up. The following syntax would
be used in a backup script to initiate rman and assign the process
(or processes) to the project of the instance (not the user.oracle
project):
/usr/bin/newtask -p $ORACLE_SID ${ORACLE_HOME}/bin/$RMAN cmdfile "$CMDFILE"
Interactive processes can be assigned to other projects by executing
the newtask command within the user's shell. For example, the
oracle user can "charge" the myfirstinstance project for the CPU time
used during an interactive session. The -p option of the id
command returns the current project. The following example shows that
the oracle user's shell is initially running in the user.oracle project.
The newtask command is then used to start a new shell, which
is assigned to the myfirstinstance project. Last, the id command
is executed again to demonstrate that the project has, in fact, been
changed to the myfirstinstance project:
$ id -p
uid=100(oracle) gid=100(dba) projid=2000(user.oracle)
$ newtask -p myfirstinstance
$ id -p
uid=100(oracle) gid=100(dba) projid= 1001(myfirstinstance)
Initialization Scripts
Oracle supplies standard start and stop scripts for instances
and listeners. These scripts must be modified to take advantage
of the FSS configuration at boot. Listing 3 is an example of an
Oracle database instance start script that initializes instances
based on the settings defined in the oratab file. Listing 4 is an
example of an Oracle listener start script, which initializes listeners
in a similar fashion. These scripts are "stripped down" clones of
the scripts provided by Oracle used to illustrate how to start the
database instance and listeners within the project framework.
Creating projects to control the execution of other utilities,
such as system processes, is also important. Careful consideration
should be given to which processes are run in the "system" project
since, by default, it has unlimited shares. Services that are required
but have the potential to fall victim to denial-of-service attacks
should be run in projects with tighter controls. The Samba processes,
for example, are run in the "system" project. The danger, of course,
is that the most recent "share walking Windows worm or virus" could
potentially impact your database server. To prevent this, create
a project named "samba" and assign it fewer shares than the database
instances:
samba:3003:Samba - NT Access:::project.cpu-shares=(privileged,10,deny)
Then modify the init script to start one or both daemons using newtask
as shown in Listing 5. Restricting services such as Samba will provide
more predictable system response by greatly reducing the potential
impact of extra services.
Utilities
After implementation, the prstat command can be used to
view system activity. Similar in function to the top command,
prstat reports utilization on a per-project basis, every
few seconds. The -J flag specifies that project and process
information should be reported (Listing 6). The -T -j combination
can be used to view the processes and task associated with a project
(Listing 7). The prstat command has a rich set of options;
consult the man page for more information.
The ps command in Solaris 9 has been update to include
new options to support FSS. Again, the best source for command option
information is the manual page. It is useful to create an alias,
which displays project information in addition to the more common
options:
'ps -eo project,user,pid,ppid,pcpu,args'
The alias above will show the project association as well as the process
information. The pgrep and pkill commands also have
options that support the project syntax.
The prctl command can display or dynamically modify resource
controls for projects. The control we are primarily concerned with
is project.cpu-shares. To dynamically change the CPU shares for
the myfirstinstance project to 2000 shares, enter the following
command:
/usr/bin/prctl -r -n project.cpu-shares -v 2000 -i project myfirstinstance
This change takes effect immediately but will persist after reboot
only if /etc/project is also updated. This command can be placed in
a cron table to give a particular instance more CPU shares at a specific
time. A second cron job can be added to bring the instance back to
its normal value. This technique can be used to allow an instance
serving an online application (given the higher weight as described
above) to remain responsive while allowing an instance serving a long-running
batch process to consume unused CPU cycles.
Tuning the allocation of CPU shares is often an exercise in trial
and error. The prctl utility can be used to quickly change
the weight of any project, task, or process. It is a "best practice"
to modify /etc/project before running the prctl command to
alter the scheduler. This practice maintains the weight of each
project even if the system is rebooted. The Perl script, srm_apply_project
(Listing 8), is a utility that reads the /etc/project file and applies
the current values to the system. The script reads /etc/project
and applies the "project.cpu-shares" settings to all projects or
a specified project (depending on command-line options). While the
script also supports the "project.pool" setting, pools and processor
sets are beyond the scope of this article. The script will ask the
user to approve each change, or it can be run with the "--yes2all"
flag, which automatically answers yes to all questions. The following
example applies the settings for all projects without prompting
the user:
$ ./srm_apply_project --yes2all
The next example modifies only the project named "myfirstinstance",
and only after the user approves the change:
$ ./srm_apply_project --project myfirstinstance
Change myfirstinstance cpu-shares to 100? (yes or no) : yes
Executing: prctl -r -n project.cpu-shares -v 100 -i project myfirstinstance
Occasionally an instance may be started or restarted without using
the newtask command. This generally happens as database and
systems administrators are becoming familiar with this new framework.
As previously described, the instance and listener (in this case)
would be running under the user.oracle project. Luckily, neither the
instance nor the listener will need to be restarted in order to associate
the instance with the correct project. Any process can be dynamically
reassigned to a project using the newtask command. The following
example moves process id 2345 into the myfirstinstance project:
newtask -v -p myfirstinstance -c 2345
Manually moving each process associated with a busy Oracle database
instance into a project can be a tedious task. Luckily, the Oracle
8 and 9 process naming convention allows this process to be automated
quite easily. The srm_instance2project Perl script (Listing 9) dynamically
moves processes of an Oracle instance into a project of the same name.
The script assumes that there is an entry in /etc/project for every
running database instance. The script requires one parameter -- an
instance name or the string "ALL". The script describes what it will
do and prompts the user for confirmation. Before running this script
for a particular instance, the tnslsnr process should be associated
with the correct project using the newtask command, as in the
example above.
When FSS is enabled, user processes not started under a specific
project will belong to the project named default. Because this project
has only one share (by default), application processes should be
moved from this project or they could perform poorly. These processes
can be identified using the pgrep or ps utilities.
The following command will list the process ids of all processes
in the default project:
pgrep -J default
For more detail, use the ps command with grep. For example:
ps -eo project,user,pid,args | grep default
With slight modification, either command can also help identify processes
in the system project.
Automation can greatly reduce the time it takes to move processes
into projects. The srm_user2project Perl script (Listing 10) is
a simple example of how to move processes into a project based on
user ID. This script requires a user name as the first argument
and an optional project name as a second argument. The process flow
is a simple loop, reading through the output of the ps command
and executing the newtask command as necessary. This example
can be extended to handle many different sets of criteria.
Other Considerations
In the timeshare class, the load average (reported by uptime)
is a good indicator of system performance. The load average is based
on the number of running and runnable processes over a specific
time (1, 5, and 15 minutes). This number, in relation to the number
of processors, is a good indication of the demand for CPU. This
is true when using the FSS class as well, while the system load
is less than the number of processors. However, once the demand
is greater than the available resources, a high load average doesn't
necessarily indicate that all instances are performing poorly. This
is because the scheduler takes into account the weight assigned
to the project of the runnable thread when determining its scheduling
priority. Therefore, an extremely high load average is likely to
occur if a project with only a few shares has many runnable threads
at the same time one or more projects with greater share values
have many runnable threads. This is because the higher priority
threads are being serviced while the lower ones remain in a runnable
state (increasing the value of load average). For this reason, the
load average may be higher using the FSS class than it was using
the TS class on an oversubscribed system.
Based on the greater control over the priority of groups of processes,
it may be concluded that development (and/or test) instances should
now reside on the same physical machine as production instances.
While this is entirely possible, in most cases the risk of negative
impact is still too high. The challenge is that the FSS is only
controlling the priority of the threads. The safest way to implement
this in a Solaris 9 environment is to implement Resource Pools and
Processor Sets. Although partitioning isn't covered in this article,
the srm_apply_project script (Listing 6) supports dynamically binding
a process to a pool.
Conclusion
Many small- to medium-sized IT shops have embraced Oracle's RDBMS
because of its power, speed, and stability. For financial reasons,
these shops tend to license Oracle on a per-CPU basis. The Solaris
9 Resource Management framework can help maximize a company's investment
in Oracle by allowing the consolidation of multiple database instances
onto a single physical machine, while maintaining Service Level
Agreements through the use of the Fair Share Scheduler. This article
focuses on Oracle's RDBMS, however, the methodology described here
also can be applied and extended to other consolidation projects.
References
Lawson, Stuart J. Resource Management in the Solaris 9 Operating
Environment. Santa Clara: Sun Microsystems, Inc., 2002
Galvin, Peter Baer. "Solaris TM Resource Management -- The Fair
Share Scheduler." Sys Admin Magazine June 2003: 12-6 (http://www.samag.com/documents/s=8228/sam0306k/0306k.htm)
System Administration Guide: Resource Management and Network Services.
Santa Clara: Sun Microsystems, Inc., 2003
Eric Paul Forgette is a Unix Systems Engineer for a transportation
company based in Richmond, Virginia. Eric graduated from Buffalo
State College with a B.S. in Computer Information Systems. He can
be reached at: e4jet@mac.com. |