Implementing
Semaphores in the Shell
Ed Schaefer and John Spurgeon Welcome to the inaugural edition
of our Scripting Solutions column. Every other month, we will review
scripts that solve systems administration problems, illustrate programming
techniques, or highlight pitfalls to avoid. Our emphasis will be
on sh/ksh/bash shell scripts using mostly traditional Unix tools.
If you have any questions or comments, please email us at sascripts@yahoo.com.
Semaphores are mechanisms for controlling access to shared resources
in a multi-tasking environment. Semaphores can be used to solve
a variety of programming problems. For example, systems administrators
might use semaphores to:
- Ensure that only one instance of a particular cron job is running.
- Limit the use of a program to N concurrent users because of
licensing restrictions.
- Guarantee that critical administration tasks are performed
by only one person at a time.
- Manage resource utilization by allowing up to N instances of
a program to run in parallel.
Semaphores were first described by Dijkstra in 1965. From his
viewpoint, a semaphore is an object comprising:
- An integer value representing the number of resources that
are free.
- A set or queue of processes waiting for a resource.
- The operations Init, P, and V.
The Init operation initializes the value of a semaphore. When
a process needs a resource, the P operation (which comes from the
word "proberen" and means "to test" in Dijkstra's native Dutch language)
tests the value of the semaphore and decrements the value if a resource
is available. When the value of a semaphore is zero, processes attempting
to grab a resource must wait until one becomes available. The V
operation (from "verhogen", which means "to increment") frees a
resource by incrementing the value of the semaphore.
A semaphore that has only two values (0 and 1) is a binary semaphore,
while semaphores that are initialized with a value greater than
1 are called counting semaphores.
The Semaphore Script
In this article, we present a Korn shell implementation of a counting
semaphore. The sidebar shows the syntax of the script and a description
of the supported options and operands.
Our choice of options -P and -V reflects Dijkstra's original work
(though others have used DOWN and UP. or WAIT and SIGNAL. to refer
to the P and V operations). When the -P and -V options are used
together, the order in which the options appear on the command line
determines which operation is performed first.
The -q option can be used to place processes that are waiting
for a resource in a queue. When a resource becomes available, the
process at the head of the queue gets it. Without the -q option,
the order in which waiting processes obtain a resource is indeterminate.
By default, users have their own set of semaphores that are isolated
from the semaphores in other user spaces. Semaphores may be shared
among users by specifying an alternate user space using the -u option.
User spaces are implemented as directories, so ensure that the directory
representing the user space has the appropriate permissions by using
the -o and -m options if necessary.
When the -P option is used to grab a resource, the semaphore process,
by default, associates its parent process id with the obtained resource
(see the function grab_resource). Since a process that has a resource
could terminate without freeing the resource, a function called
wipe_semaphore periodically checks each busy resource to determine
whether the process that grabbed the resource is still running;
if it isn't, the resource is automatically released. In some situations,
it might be necessary to override the default behavior and specify
a parent process id using the -p option. For example, if a program
places jobs in the background and the parent process terminates
before the children do, then you might need to use the -p option
since orphaned processes inherit ppid 1.
The other options may be used without specifying a semaphore by
name. If a semaphore is not specified, then the operation is applied
to all of the semaphores in the user space. For example, the command
semaphore -r deletes all of the user's semaphores. If no
command-line arguments are supplied or if an invalid option is used,
the program prints the user documentation.
An Example: batch_zip/batch_unzip
The batch_zip/batch_unzip program, Listing 1, is a script that
runs multiple zip commands in parallel. The program is designed
so that two files, batch_zip and batch_unzip, can share the same
code. The files can be hard links, one can be a symbolic link, or
they can be separate copies of each other.
Within the script, a semaphore is used to limit the number of
concurrent zip processes that can run at the same time. When the
number of concurrent processes reaches batch_size for a particular
semaphore, additional zip jobs must wait until a resource becomes
available.
Consider the following command:
./batch_zip -s 2 f1 f2 f3
Since the -n option is missing, the default semaphore name (batch_zip)
is used. Since the -z option is missing, the default zip command (gzip)
is used. The -s option followed by the number 2 specifies a batch
size of 2. The operands f1, f2, and f3 are pathnames of files to be
compressed.
As the while loop in batch_zip/batch_unzip executes, separate
processes are created for the block of commands between curly braces.
Each process is placed in the background. The use of a counting
semaphore ensures that at most two instances of gzip will be allowed
to run simultaneously.
High-Level Implementation
A semaphore must be initialized before its value can be incremented
or decremented. This can be done explicitly by calling semaphore
with the -I option. Otherwise, the semaphore is implicitly initialized
with a value of 1 the first time the -P option is used. In both
cases, the Init_semaphore function, Listing 2, is called to perform
the initialization.
When using the -P option without the -q option, the semaphore
script calls keep_trying, Listing 3. Without the -t option, the
program loops until a resource is obtained. The -t option allows
the program to time out if all resources are busy. With the -q option,
the program calls wait_in_queue, which calls keep_trying once the
process reaches the head of a queue.
Keep_trying calls P_semaphore, Listing 4, which checks the value
of the semaphore to determine whether any resources are free at
that moment in time. If there are no free resources, P_semaphore
checks to see how many seconds have elapsed since the semaphore
was last wiped. If the number of seconds exceeds the value of WIPE_INTERVAL,
the semaphore is wiped again. Wiping the semaphore frees resources
that are still being held by processes that have terminated.
If the value of the semaphore is greater than zero when P_semaphore
is called, then the function iterates through the numbered resources,
trying to grab one. As soon as a resource is obtained, the function
returns success. Regardless of the value returned by get_semaphore_value,
each resource could be busy when P_semaphore tries to grab it; if
this is the case, then P_semaphore returns a failure code after
testing each resource individually.
When the semaphore script is called with the -V option, the V_semaphore
function, Listing 5, tries to free one semaphore resource. A resource
is freed if and only if the process id associated with a busy resource
matches the pid value passed to the V_semaphore function.
Low-Level Implementation
Semaphores are one of three basic mechanisms typically used to
implement Interprocess Communication (IPC). In the implementation
of our semaphore script, polling and the filesystem are surrogates
for the other two primitives: messages and shared memory.
When a program calls the semaphore script to obtain a resource,
the process appears to block until a resource is obtained. To achieve
this effect, the semaphore script polls the value of the semaphore.
If the value is zero, then the script calls the sleep command before
checking again. In other semaphore implementations, the V operation
may signal one or more blocked processes to wake up after incrementing
the value of the semaphore. Polling keeps the implementation simple
without adding significant overhead compared to the cost of implementing
the semaphore in the shell to begin with.
Directories, symbolic links, and regular files allow processes
to share data structures. This is analogous to shared memory. The
first three levels of the directory tree used to implement a semaphore
are as follows:
/tmp/semaphores/
<user>/
<semaphore>/
The /tmp/semaphores directory contains a subdirectory, /tmp/semaphores/<user>,
for each user space. Semaphore names must be unique within a given
user space. The data structures associated with a particular semaphore
are implemented by a directory tree rooted at /tmp/semaphores/<user>/<semaphore>.
This directory contains the following files and subdirectories:
last_wipe
num_resources
resources/
0 -> <pid>
1 -> <pid>
n -> <pid>
queue/
<date-time>.<pid>
<date-time>.<pid>
The file ./last_wipe contains an integer representing the date and
time the semaphore was last wiped. The file ./num_resources contains
an integer representing the initial value of the semaphore. The directory
./resources/ contains between zero and num_resources symbolic links.
Each link is an integer representing a busy resource. The link's target
is the process id of the process that holds the resource. The directory
./queue/ can be used to implement a queue for processes that are waiting
for a resource.
The following functions hide these low-level implementation details
from the functions that call them:
- is_initialized (Listing 6) -- Determines whether a semaphore
has been initialized by checking to see if the file /tmp/<user>/<semaphore>/num_resources
contains a positive integer.
- set_num_resources (Listing 7) -- Called by Init_semaphore to
initialize a semaphore's value. An integer representing the initial
value of the semaphore is written to the file /tmp/<user>/<semaphore>/num_resources.
- get_num_resources (Listing 8) -- Prints a semaphore's initial
value, which is read from the file /tmp/<user>/<semaphore>/num_resources.
- get_semaphore_value (Listing 9) -- Prints the current value
of a semaphore. The value is calculated by subtracting the number
of resources in use (a count of the number of symbolic links in
/tmp/<user>/<semaphore>/resources/) from the semaphore's
initial value, stored in /tmp/<user>/<semaphore>/num_resources.
- get_resource_holder (Listing 10) -- Prints the process id associated
with a particular resource. The function calls get_link_target
(Listing 11) to obtain the target of the symbolic link /tmp/<user>/<semaphore>/resources/<n>.
- grab_resource (Listing 12) -- Attempts to obtain a particular
resource. The function must essentially test the semaphore's value
and decrement the value if a resource is available, all in a single
atomic step. This is accomplished by attempting to create the
symbolic link /tmp/<user>/<semaphore>/resources/<n>.
Michael Wang discusses this technique in detail in "lock_unlock
-- Creating and Removing a Lock File" (http://www.unixreview.com/documents/s=1344/ur0402g/).
- return_resource (Listing 13) -- Frees a resource, thus incrementing
a semaphore's value, by deleting the symbolic link /tmp/<user>/<semaphore>/resources/<n>.
Caveats
The main advantage of the semaphore script lies in how easily
other scripts can use it to solve concurrency problems. On the other
hand, you need to watch out for certain security, performance, and
correctness pitfalls.
Permissions on the directories and files used to implement various
data structures are a potential security concern. For example, if
users have permission to write to a semaphore's resources directory,
they can launch a "denial of resource" attack by creating symbolic
links in the resources directory.
Any shell script implementation of a semaphore is inefficient
compared to semaphores provided by the operating system. The degree
to which this is a problem depends on the performance requirements
of the programs that use the semaphore script, as well as the impact
the script has on overall system performance. In general, the performance
impact of a single semaphore is proportional to the number of resources
the semaphore has, and the number of processes that are waiting
for a resource.
Efforts have been made to identify and address performance bottlenecks.
For example, wiping a semaphore can be an expensive operation. In
an earlier implementation, P_semaphore called the wipe_semaphore
function each time P_semaphore was called. The current version of
P_semaphore doesn't bother calling wipe_semaphore if the value of
the semaphore is greater than zero, since a resource will probably
be obtained regardless of whether the semaphore is wiped first.
Furthermore, when the semaphore value is zero, calls to wipe_semaphore
are limited to one every WIPE_INTERVAL seconds.
As previously mentioned, programmers must be careful when placing
jobs in the background, since the background job could be inherited
by pid 1. This could lead to unexpected results if the parent process
id is not specified explicitly using the -p option. The init process
(pid 1) typically runs indefinitely, so a resource that is held
by pid 1, will never be freed by the wipe_semaphore function.
Another pitfall involves the symbolic links used to represent
a semaphore's resources. These links should not reside on a shared
file system. Otherwise, a semaphore process running on system A
could grab a resource, and a process running on system B could free
the resource unexpectedly when the wipe_semaphore function is called.
Despite these caveats, semaphore is a powerful script. When used
judiciously, it provides a convenient and reliable way to solve
a variety of shared resource problems.
The Tarball
The tarball (http://www.samag.com/code/) contains all source
code and documentation found in the directory /opt/local/semaphore/.
It's easy to move the semaphore files to a different location. Simply
change the SOURCE_DIR variable in the file /opt/local/semaphore/bin/semaphore.
References
Dijkstra, Edsger W. Cooperating Sequential Processes in Programming
Languages, ed F. Genuys, Academic Press, New York, N.Y. 1968.
Additional Dijkstra manuscripts at: http://www.cs.utexas.edu/users/EWD/
Wang, Michael. "lock_unlock -- Creating and Removing a Lock File",
UnixReview.com's Shell Corner, February, 2004. http://www.unixreview.com/documents/s=1344/ur0402g/
Ed Schaefer is a frequent contributor to Sys Admin. He
is a software developer and DBA for Intel's Factory Integrated Information
Systems, FIIS, in Aloha, Oregon. Ed also hosts the monthly Shell Corner
column on UnixReview.com. He can be reached at: shellcorner@comcast.net.
John Spurgeon is a software developer and systems administrator
for Intel's Factory Integrated Information Systems, FIIS, in Aloha,
Oregon. Outside of work, he enjoys turfgrass management, triathlons,
and spending time with his family.
|