Implementing Semaphores in the Shell

Ed Schaefer and John Spurgeon

Welcome to the inaugural edition of our Scripting Solutions column. Every other month, we will review scripts that solve systems administration problems, illustrate programming techniques, or highlight pitfalls to avoid. Our emphasis will be on sh/ksh/bash shell scripts using mostly traditional Unix tools. If you have any questions or comments, please email us at sascripts@yahoo.com.

Semaphores are mechanisms for controlling access to shared resources in a multi-tasking environment. Semaphores can be used to solve a variety of programming problems. For example, systems administrators might use semaphores to:

Ensure that only one instance of a particular cron job is running.
Limit the use of a program to N concurrent users because of licensing restrictions.
Guarantee that critical administration tasks are performed by only one person at a time.
Manage resource utilization by allowing up to N instances of a program to run in parallel.

Semaphores were first described by Dijkstra in 1965. From his viewpoint, a semaphore is an object comprising:

An integer value representing the number of resources that are free.
A set or queue of processes waiting for a resource.
The operations Init, P, and V.

The Init operation initializes the value of a semaphore. When a process needs a resource, the P operation (which comes from the word "proberen" and means "to test" in Dijkstra's native Dutch language) tests the value of the semaphore and decrements the value if a resource is available. When the value of a semaphore is zero, processes attempting to grab a resource must wait until one becomes available. The V operation (from "verhogen", which means "to increment") frees a resource by incrementing the value of the semaphore.

A semaphore that has only two values (0 and 1) is a binary semaphore, while semaphores that are initialized with a value greater than 1 are called counting semaphores.

The Semaphore Script

In this article, we present a Korn shell implementation of a counting semaphore. The sidebar shows the syntax of the script and a description of the supported options and operands.

Our choice of options -P and -V reflects Dijkstra's original work (though others have used DOWN and UP. or WAIT and SIGNAL. to refer to the P and V operations). When the -P and -V options are used together, the order in which the options appear on the command line determines which operation is performed first.

The -q option can be used to place processes that are waiting for a resource in a queue. When a resource becomes available, the process at the head of the queue gets it. Without the -q option, the order in which waiting processes obtain a resource is indeterminate.

By default, users have their own set of semaphores that are isolated from the semaphores in other user spaces. Semaphores may be shared among users by specifying an alternate user space using the -u option. User spaces are implemented as directories, so ensure that the directory representing the user space has the appropriate permissions by using the -o and -m options if necessary.

When the -P option is used to grab a resource, the semaphore process, by default, associates its parent process id with the obtained resource (see the function grab_resource). Since a process that has a resource could terminate without freeing the resource, a function called wipe_semaphore periodically checks each busy resource to determine whether the process that grabbed the resource is still running; if it isn't, the resource is automatically released. In some situations, it might be necessary to override the default behavior and specify a parent process id using the -p option. For example, if a program places jobs in the background and the parent process terminates before the children do, then you might need to use the -p option since orphaned processes inherit ppid 1.

The other options may be used without specifying a semaphore by name. If a semaphore is not specified, then the operation is applied to all of the semaphores in the user space. For example, the command semaphore -r deletes all of the user's semaphores. If no command-line arguments are supplied or if an invalid option is used, the program prints the user documentation.

An Example: batch_zip/batch_unzip

The batch_zip/batch_unzip program, Listing 1, is a script that runs multiple zip commands in parallel. The program is designed so that two files, batch_zip and batch_unzip, can share the same code. The files can be hard links, one can be a symbolic link, or they can be separate copies of each other.

Within the script, a semaphore is used to limit the number of concurrent zip processes that can run at the same time. When the number of concurrent processes reaches batch_size for a particular semaphore, additional zip jobs must wait until a resource becomes available.

Consider the following command:

./batch_zip -s 2 f1 f2 f3

Since the -n option is missing, the default semaphore name (batch_zip) is used. Since the -z option is missing, the default zip command (gzip) is used. The -s option followed by the number 2 specifies a batch size of 2. The operands f1, f2, and f3 are pathnames of files to be compressed.

As the while loop in batch_zip/batch_unzip executes, separate processes are created for the block of commands between curly braces. Each process is placed in the background. The use of a counting semaphore ensures that at most two instances of gzip will be allowed to run simultaneously.

High-Level Implementation

A semaphore must be initialized before its value can be incremented or decremented. This can be done explicitly by calling semaphore with the -I option. Otherwise, the semaphore is implicitly initialized with a value of 1 the first time the -P option is used. In both cases, the Init_semaphore function, Listing 2, is called to perform the initialization.

When using the -P option without the -q option, the semaphore script calls keep_trying, Listing 3. Without the -t option, the program loops until a resource is obtained. The -t option allows the program to time out if all resources are busy. With the -q option, the program calls wait_in_queue, which calls keep_trying once the process reaches the head of a queue.

Keep_trying calls P_semaphore, Listing 4, which checks the value of the semaphore to determine whether any resources are free at that moment in time. If there are no free resources, P_semaphore checks to see how many seconds have elapsed since the semaphore was last wiped. If the number of seconds exceeds the value of WIPE_INTERVAL, the semaphore is wiped again. Wiping the semaphore frees resources that are still being held by processes that have terminated.

If the value of the semaphore is greater than zero when P_semaphore is called, then the function iterates through the numbered resources, trying to grab one. As soon as a resource is obtained, the function returns success. Regardless of the value returned by get_semaphore_value, each resource could be busy when P_semaphore tries to grab it; if this is the case, then P_semaphore returns a failure code after testing each resource individually.

When the semaphore script is called with the -V option, the V_semaphore function, Listing 5, tries to free one semaphore resource. A resource is freed if and only if the process id associated with a busy resource matches the pid value passed to the V_semaphore function.

Low-Level Implementation

Semaphores are one of three basic mechanisms typically used to implement Interprocess Communication (IPC). In the implementation of our semaphore script, polling and the filesystem are surrogates for the other two primitives: messages and shared memory.

When a program calls the semaphore script to obtain a resource, the process appears to block until a resource is obtained. To achieve this effect, the semaphore script polls the value of the semaphore. If the value is zero, then the script calls the sleep command before checking again. In other semaphore implementations, the V operation may signal one or more blocked processes to wake up after incrementing the value of the semaphore. Polling keeps the implementation simple without adding significant overhead compared to the cost of implementing the semaphore in the shell to begin with.

Directories, symbolic links, and regular files allow processes to share data structures. This is analogous to shared memory. The first three levels of the directory tree used to implement a semaphore are as follows:

/tmp/semaphores/
<user>/
<semaphore>/

The /tmp/semaphores directory contains a subdirectory, /tmp/semaphores/<user>, for each user space. Semaphore names must be unique within a given user space. The data structures associated with a particular semaphore are implemented by a directory tree rooted at /tmp/semaphores/<user>/<semaphore>. This directory contains the following files and subdirectories:

last_wipe
num_resources
resources/
        0 -> <pid>
        1 -> <pid>
        n -> <pid>
queue/
        <date-time>.<pid>
        <date-time>.<pid>

The file ./last_wipe contains an integer representing the date and time the semaphore was last wiped. The file ./num_resources contains an integer representing the initial value of the semaphore. The directory ./resources/ contains between zero and num_resources symbolic links. Each link is an integer representing a busy resource. The link's target is the process id of the process that holds the resource. The directory ./queue/ can be used to implement a queue for processes that are waiting for a resource.

The following functions hide these low-level implementation details from the functions that call them:

is_initialized (Listing 6) -- Determines whether a semaphore has been initialized by checking to see if the file /tmp/<user>/<semaphore>/num_resources contains a positive integer.
set_num_resources (Listing 7) -- Called by Init_semaphore to initialize a semaphore's value. An integer representing the initial value of the semaphore is written to the file /tmp/<user>/<semaphore>/num_resources.
get_num_resources (Listing 8) -- Prints a semaphore's initial value, which is read from the file /tmp/<user>/<semaphore>/num_resources.
get_semaphore_value (Listing 9) -- Prints the current value of a semaphore. The value is calculated by subtracting the number of resources in use (a count of the number of symbolic links in /tmp/<user>/<semaphore>/resources/) from the semaphore's initial value, stored in /tmp/<user>/<semaphore>/num_resources.
get_resource_holder (Listing 10) -- Prints the process id associated with a particular resource. The function calls get_link_target (Listing 11) to obtain the target of the symbolic link /tmp/<user>/<semaphore>/resources/<n>.
grab_resource (Listing 12) -- Attempts to obtain a particular resource. The function must essentially test the semaphore's value and decrement the value if a resource is available, all in a single atomic step. This is accomplished by attempting to create the symbolic link /tmp/<user>/<semaphore>/resources/<n>. Michael Wang discusses this technique in detail in "lock_unlock -- Creating and Removing a Lock File" (http://www.unixreview.com/documents/s=1344/ur0402g/).
return_resource (Listing 13) -- Frees a resource, thus incrementing a semaphore's value, by deleting the symbolic link /tmp/<user>/<semaphore>/resources/<n>.

Caveats

The main advantage of the semaphore script lies in how easily other scripts can use it to solve concurrency problems. On the other hand, you need to watch out for certain security, performance, and correctness pitfalls.

Permissions on the directories and files used to implement various data structures are a potential security concern. For example, if users have permission to write to a semaphore's resources directory, they can launch a "denial of resource" attack by creating symbolic links in the resources directory.

Any shell script implementation of a semaphore is inefficient compared to semaphores provided by the operating system. The degree to which this is a problem depends on the performance requirements of the programs that use the semaphore script, as well as the impact the script has on overall system performance. In general, the performance impact of a single semaphore is proportional to the number of resources the semaphore has, and the number of processes that are waiting for a resource.

Efforts have been made to identify and address performance bottlenecks. For example, wiping a semaphore can be an expensive operation. In an earlier implementation, P_semaphore called the wipe_semaphore function each time P_semaphore was called. The current version of P_semaphore doesn't bother calling wipe_semaphore if the value of the semaphore is greater than zero, since a resource will probably be obtained regardless of whether the semaphore is wiped first. Furthermore, when the semaphore value is zero, calls to wipe_semaphore are limited to one every WIPE_INTERVAL seconds.

As previously mentioned, programmers must be careful when placing jobs in the background, since the background job could be inherited by pid 1. This could lead to unexpected results if the parent process id is not specified explicitly using the -p option. The init process (pid 1) typically runs indefinitely, so a resource that is held by pid 1, will never be freed by the wipe_semaphore function.

Another pitfall involves the symbolic links used to represent a semaphore's resources. These links should not reside on a shared file system. Otherwise, a semaphore process running on system A could grab a resource, and a process running on system B could free the resource unexpectedly when the wipe_semaphore function is called.

Despite these caveats, semaphore is a powerful script. When used judiciously, it provides a convenient and reliable way to solve a variety of shared resource problems.

The Tarball

The tarball (http://www.samag.com/code/) contains all source code and documentation found in the directory /opt/local/semaphore/. It's easy to move the semaphore files to a different location. Simply change the SOURCE_DIR variable in the file /opt/local/semaphore/bin/semaphore.

References

Dijkstra, Edsger W. Cooperating Sequential Processes in Programming Languages, ed F. Genuys, Academic Press, New York, N.Y. 1968. Additional Dijkstra manuscripts at: http://www.cs.utexas.edu/users/EWD/

Wang, Michael. "lock_unlock -- Creating and Removing a Lock File", UnixReview.com's Shell Corner, February, 2004. http://www.unixreview.com/documents/s=1344/ur0402g/

Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Ed also hosts the monthly Shell Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net.

John Spurgeon is a software developer and systems administrator for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Outside of work, he enjoys turfgrass management, triathlons, and spending time with his family.