Writing Maintainable Perl Programs and Shell Scripts
Brian Tanaka
It's all too common for system administrators, especially novices, to fall into the trap of writing sloppy Perl programs and shell scripts. Sloppy programs and scripts are problematic because, although they may function perfectly well, they are harder to maintain, pass on to others, enhance, and debug than tidy, well-organized scripts.
Fortunately, by using a handful of simple techniques, you can write tidy programs and scripts. This will save you time and aggravation in the long run, and help you finish your scripts with speed and efficiency in the short run. By using these techniques, you will make your job and the jobs of your colleagues easier.
This article is not intended to be an exhaustive exploration of good programmatic style. Rather, it describes some techniques that have worked well for me over the years. The techniques are easy to use and go a long way toward making programs and scripts easier to read and maintain.
The techniques I will cover can be summarized as follows:
- Be consistent
- Include an informative prologue
- Use whitespace effectively
- Comment your code
- Be terse with caution
- Include documentation
Consistency of Style
How you indent, line up your braces, use whitespace, and place comments constitutes your own style of programming. How you decide to handle each element of your style is up to you, provided it is logical. But, once you find a style you like, use it consistently.
For instance, if you like to brace your "while" loops in Perl with the opening { on the same line, like this:
while (<FOO>) {
    &dostuff;
}
then always do it that way. If you ever get the urge to write:
while (<FOO>)
{
    &dostuff;
}
resist it. Consistency throughout your programs and scripts significantly increases readability.
Prologue
The beginning of your program or script is very important. A quick review of the first dozen lines or so should provide the reader with critical information that will help her interpret or modify the code.
Here is an example of a good prologue:
#!/usr/bin/perl
# diskhogs - This script generates a report of the ten users
# who use the most diskspace on this host.
#
# Author: Brian Tanaka
# Date: Mon Jul 6 18:21:14 PDT 1998
#
By skimming these lines, the reader knows which interpreter is being used (Perl), what the name of the script is (diskhogs), what it is supposed to do (report the top disk hogs), who wrote it, and when it was written.
The statement of purpose is particularly important. Don't make your reader read through your whole script just to find out what it's supposed to do. Be as verbose as you need to be, but make sure you have a clear summary of the purpose of the program.
If the program is revised, use this section to record who made each revision, when it was made, and what changed. The prologue is also a great place to put special instructions for the reader. For instance, you could note that the program reads a configuration file that can be modified to customize its behavior, or list the variables to check and modify if the program or script is moved to a new host.
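A minimal sketch of a prologue that follows these suggestions is shown below. It extends the diskhogs prologue with a revision log and a note for maintainers, then gathers the host-specific settings in one clearly marked place. The revision entry is a placeholder, and the variable names and paths (`$TOP_N`, `$REPORT_DIR`, `$CONFIG_FILE`) are hypothetical, invented for illustration.

```perl
#!/usr/bin/perl
# diskhogs - This script generates a report of the ten users
# who use the most diskspace on this host.
#
# Author: Brian Tanaka
# Date: Mon Jul 6 18:21:14 PDT 1998
#
# Revisions:
#   <date>  <name>  Made the number of users reported configurable.
#
# Notes for maintainers:
#   Settings may be overridden in the configuration file named below.
#   If you move this script to a new host, review the variables in
#   the "Host-specific settings" section before running it.
#

use strict;
use warnings;

#
# Host-specific settings (hypothetical names and values).
#
my $TOP_N       = 10;                               # Number of users to report
my $REPORT_DIR  = "/var/tmp";                       # Where the report is written
my $CONFIG_FILE = "/usr/local/etc/diskhogs.conf";   # Optional overrides
```

Keeping every host-dependent value in one labeled block means the prologue's "check these variables" note can point at a single spot instead of at scattered literals.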
Whitespace
Liberal and logical use of whitespace is one of the most effective ways of making your programs and scripts more readable. Be aware of the amount of whitespace between words, between lines, and between groups of lines. Don't crowd everything together. For instance, you can emphasize the boundaries between functional or logical units within your program by increasing whitespace between them. So, instead of:
# Write a timestamp to the log file to show when the script ran
$date = `date`;
chomp($date);
open (LOGFILE, ">>$logfile") || die "could not open $logfile\n";
print LOGFILE "$date: ";
# Count how many times the string "deblobulator.html" appears
# in the web access log
open (ACCESS, "$access") || die "could not open $access\n";
while ($line = <ACCESS>) {
    $count++ if ($line =~ m/deblobulator\.html/i);
}
close (ACCESS);
# Write the result to the log file
print LOGFILE "deblobulator.html hits: $count\n";
close (LOGFILE);
you might do something like this:
#
# Write a timestamp to the log file to show when the script ran
#
$date = `date`;
chomp($date);
open (LOGFILE, ">>$logfile") || die "could not open $logfile\n";
print LOGFILE "$date: ";

#
# Count how many times the string "deblobulator.html" appears
# in the web access log
#
open (ACCESS, "$access") || die "could not open $access\n";
while ($line = <ACCESS>) {
    $count++ if ($line =~ m/deblobulator\.html/i);
}
close (ACCESS);

#
# Write the result to the log file
#
print LOGFILE "deblobulator.html hits: $count\n";
close (LOGFILE);
Although this is a trivial example, you can see that the second example is easier to understand because the functional groups are visually differentiated. In a long, complex program, visual clarity becomes even more critical.
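The same three steps can also be written in present-day Perl style, with `use strict`, lexical filehandles, three-argument `open`, and `$!` in the error messages, while keeping the visual separation between groups. This is a sketch, not the author's code. Wrapping the steps in subroutines named `count_hits` and `log_hits` is my own choice, and using `localtime` instead of calling `date` is a swapped-in technique.

```perl
use strict;
use warnings;

#
# Count how many lines in a file match a pattern. The caller passes
# a compiled pattern, e.g. qr/deblobulator\.html/i.
#
sub count_hits {
    my ($file, $pattern) = @_;

    open(my $in, '<', $file) or die "could not open $file: $!\n";

    my $count = 0;
    while (my $line = <$in>) {
        $count++ if $line =~ $pattern;
    }

    close($in);
    return $count;
}

#
# Append a timestamp and the deblobulator.html hit count to the log file
#
sub log_hits {
    my ($logfile, $access) = @_;

    # localtime in scalar context returns a readable date string,
    # similar to `date`, without running an external command.
    my $date = localtime();

    my $hits = count_hits($access, qr/deblobulator\.html/i);

    open(my $log, '>>', $logfile) or die "could not open $logfile: $!\n";
    print $log "$date: deblobulator.html hits: $hits\n";
    close($log);
}
```

A call such as `log_hits("/var/log/deblob.log", "/var/log/httpd/access_log")` would append one line per run. Both paths are hypothetical.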
Indentation is another important form of whitespace. A chunk of code like this:
while (<FOO>) {
    &dostuff;
    if ( $stuff ne "blah" ) {
        &domorestuff;
    }
}
is much easier to read than:
while (<FOO>) {
&dostuff;
if ( $stuff ne "blah" ) {
&domorestuff;
}
}
Be as consistent and predictable as possible, so that your use of whitespace reliably means something to the reader (even if only subconsciously). For the same reason, pay attention to whitespace within single lines as well, such as the spaces around operators and after commas.
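As a small illustration of whitespace within a line, the two assignments below compute the same value. Only the spacing differs. The variable names and values are made up for the example.

```perl
use strict;
use warnings;

my ($price, $qty, $shipping) = (4, 3, 5);

# Crowded: operators and operands run together.
my $total1=$price*$qty+$shipping;

# Spaced: the structure of the expression is visible at a glance.
my $total2 = $price * $qty + $shipping;

# A space after each comma helps in lists, too.
my @sizes = sort { $a <=> $b } (30, 10, 20);
```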
Comments
Inheriting a long, complicated program or script that lacks useful comments is aggravating. Comment your code. Too many comments are better than not enough.
I adhere to the following guidelines:
1. Comments that introduce a new, major section of code are formatted as follows:
#
# Description of following section goes here. It's as
# detailed as the situation requires.
#
I use at least three blank lines before this type of comment to set it apart from the section above, because it marks the start of a new, distinct logical chunk.
2. Comments that explain a line (or set of lines) precede that line and are formatted as follows:
# Explanation of following line or lines goes here.
3. Very short descriptive comments can be included on the same line, like so:
$datafile = "/home/btanaka/data";      # Important data file
$tmpfile = "/var/tmp/deblobulator.$$"; # Temporary blob file
Note that the short comments line up.
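Put together, the three comment styles might look like this in a short Perl script. The file paths come from the examples in guideline 3. The `summarize` subroutine is a hypothetical example, invented only to have something to comment.

```perl
use strict;
use warnings;

my $datafile = "/home/btanaka/data";          # Important data file
my $tmpfile  = "/var/tmp/deblobulator.$$";    # Temporary blob file



#
# Summarize a list of lines: return the number of non-blank lines
# and the total number of whitespace-separated words they contain.
#
sub summarize {
    my @lines = @_;
    my ($nonblank, $words) = (0, 0);

    foreach my $line (@lines) {
        # Skip lines that contain nothing but whitespace.
        next unless $line =~ /\S/;

        $nonblank++;
        my @fields = split ' ', $line;    # ' ' splits on runs of whitespace
        $words += @fields;                # Array in numeric context = count
    }

    return ($nonblank, $words);
}
```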
A richly commented program is easier to understand and modify. Get into the habit of explaining how each section of your code works. If you're ever in doubt about whether a section or line requires a comment, go ahead and write one. It's better to err on the side of too much information than not enough.
The Long Road vs. the Short Road
The Perl slogan says, "There's More Than One Way To Do It." This is often true in shell scripts as well. Some ways of doing a given task are more terse than others.
It's up to you to decide how terse any given chunk of code should be. However, bear in mind that in terms of readability, it's better to be less terse unless there's a compelling reason to do otherwise. If you do have a compelling reason to be very terse, you can mitigate the negative effect on readability with good, clear, concise comments. In general, I prefer to use a less terse style unless by doing so I incur some performance cost that I cannot afford.
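For instance, here are two ways to count matching lines in Perl. The terse version relies on the reader knowing that `grep` returns a count in scalar context, so a short comment helps. The less terse loop says the same thing step by step. The sample data is made up.

```perl
use strict;
use warnings;

my @log = (
    "GET /deblobulator.html",
    "GET /index.html",
    "GET /Deblobulator.html?x=1",
);

# Terse: grep in scalar context returns the number of matching elements.
my $terse = grep { /deblobulator\.html/i } @log;

# Less terse: the same count, spelled out.
my $verbose = 0;
foreach my $line (@log) {
    if ($line =~ m/deblobulator\.html/i) {
        $verbose++;
    }
}
```

Neither version is wrong. The loop is easier for a newcomer to follow, and the `grep` line is fine once its comment explains the scalar-context trick.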
Documentation
A well-written program or script is not complete without an appropriate amount of documentation. In many cases, the comments in the code are enough. In other cases, more thorough documentation is necessary, and it's worth taking the time to create a README file, a man page, or documentation in POD (Plain Old Documentation) format.
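POD can live in the same file as the code, so it travels with the script. The sketch below embeds a short POD section in a hypothetical version of the diskhogs script. The `top_users` subroutine and its input format (a hash of user name to kilobytes used) are invented for illustration. Running `perldoc` on the file displays the POD, and `pod2man` can turn it into a man page.

```perl
use strict;
use warnings;

# Return the names of the $n users with the largest usage, largest first.
# $usage is a hash reference mapping user name => kilobytes used.
sub top_users {
    my ($usage, $n) = @_;

    my @sorted = sort { $usage->{$b} <=> $usage->{$a} } keys %$usage;
    $n = @sorted if $n > @sorted;    # Don't run past the end of the list

    return @sorted[0 .. $n - 1];
}

=head1 NAME

diskhogs - report the users who use the most disk space on this host

=head1 SYNOPSIS

diskhogs

=head1 DESCRIPTION

Prints the ten users with the largest disk usage on the local host.

=cut
```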
Summary
Be consistent; always include an informative prologue; use whitespace effectively; comment your code; be terse with caution; and include documentation. By doing these simple things, you will make your Perl programs and shell scripts much easier to read and maintain.
If you would like to learn more about stylistic principles in programming, many books about programming devote at least a short section to it. For Perl programmers, Programming Perl, 2nd Edition, by Wall, Christiansen, and Schwartz (O'Reilly and Associates) covers style issues in Chapter 8, "Other Oddments."
About the Author
Brian Tanaka is a system administrator currently working at RealNetworks in Seattle, Washington. He can be reached at: btanaka@real.com.