Cover V14, i03

Understanding the Command Line

Randal L. Schwartz

In past columns, I've talked a lot about the Perl language, but have never said much about perl at the Unix shell command line. So, let's fix that by looking at some commonly used command-line constructs for Perl.

Let's take the simplest invocation:

perl my-script
This runs my-script, located by the relative or absolute path exactly as given; the PATH is not searched for the script. We can include arguments to the script:

perl my-script arg1 arg2 arg3
This sets up @ARGV to be the three individual values of arg1, arg2, and arg3, as if we had said:

@ARGV = qw(arg1 arg2 arg3);
If we want a space within one of the values, we need to use shell quoting rules:

perl my-script 'arg1a arg1b' arg2
This passes two arguments now, not three. We get the same result with:

perl my-script arg1a\ arg1b arg2
using a backslash to quote the space within the first argument. If there are any shell wildcard ("glob") characters, the shell expands them before calling our program:

perl my-script *.html
which might turn into (given three matching files):

@ARGV = qw(index.html problem.html results.html);
Note that Perl has no clue that a shell wildcard was involved here; it's as if we had typed the three names individually.

Perl doesn't interpret the @ARGV values in any particular way. They could be keywords, filenames, or some combination of the two. Traditionally, leading @ARGV elements that begin with a minus are considered "options", which we can process with modules such as Getopt::Std or Getopt::Long.

We can also have options to Perl itself by placing leading-minus values to the left of the script name. For example, we can invoke the debugger by adding -d:

perl -d my-script arg1 arg2 arg3
Now, the program is run under the normal Perl debugger. We can pick an alternate debugger (or module using the debugging interface for other analysis) with a colon argument following the -d:

perl -d:DProf my-script
This command selects the Devel::DProf module as the alternate "debugger", invoking a profiling of the Perl code.

Another common option ("switch") is -c, which compiles a Perl script without executing it:

perl -c my-script
You would do this to verify that the syntax of your script is good before actually moving it into place for production, including ensuring that all use'ed modules were also available. Any modules loaded at runtime (via require) or code constructed at runtime (like eval) wouldn't be checked, however. Also, all BEGIN and CHECK blocks are executed, so "compile only" is merely a casual definition.

You can enable warnings on the command line with -w:

perl -w my-script
although this is more frequently handled within the program as:

use warnings;
Sometimes, your program is small enough that it makes sense to include it entirely on the command line. Simply throw an -e switch there instead of the filename, and you're set:

perl -e 'print "Hello world!\n"'
Note that the quoting can get a bit weird. I typically use single quotes to keep the single argument to -e together and Perl's double quotes within the argument for Perl quoting. Sometimes, alternate quoting (via q// or qq//) can come in handy:

perl -e 'print qq/Hello world!\n/'
Multiple -e arguments are joined into a single program, with each one treated as a separate line:

perl -e print -e 'qq/Hello!\n/'
By now, the number of options is a bit hard to remember. Luckily, Perl has a built-in help message, available with -h:

perl -h
And for a few more switches that aren't about running programs, let's look at the version information with -v in short form:

perl -v
and in long form with -V:

perl -V
The -V switch also gives us access to the various configuration options that Perl was built with and that it uses to compile binary extensions and install local programs. For example, to get the C compiler used to compile Perl:

perl -V:cc
and to get all the options related to where binaries are found or installed:

perl -V:'.*bin'
The regular expression pattern here is in quotes so that the shell doesn't try to expand it as a filename pattern. The output is in a form that can be evaluated by a Bourne-style shell easily:

eval `perl -V:'.*bin'`
echo $sitebin
No attempt is made to accommodate C-shell-style shells, of course.

Modules can be included from the command line with -M:

perl -MFile::Find -e 'find sub { print $File::Find::name, $/ }, "."'
The -MFile::Find is equivalent to including:

use File::Find;
in the resulting script. If you don't want the imports (such as find in this case), use lowercase -m or be specific with a trailing = syntax:

perl -MFile::Find=find,finddepth -e '...'
which turns into:

use File::Find qw(find finddepth);
Note the automatic comma splitting. Nice.

For text processing from a series of one or more files, we can add -n, which puts a wrapper around our program that looks like:

LINE:
  while (<>) {
    ... # rest of your program here
  }
In other words, the @ARGV list is interpreted as a series of files to be opened, and each line is placed in $_ until all the lines are processed. To print each line with a sequential line number in front, we can use the $. variable for the numbers:

perl -n -e 'print "$.: $_"' file1 file2 file3
We can bundle the switches that don't take arguments together with the following switch, as in:

perl -ne 'print ...'
Another way to approach this problem is the -p switch, which adds a print at the end of the loop:

LINE:
  while (<>) {
    ... # your program here
    print;
  }
So, we could just substitute the line number into the beginning of each line:

perl -pe 's/^/$.: /' file1 file2 file3
Going one step further, we could rewrite these modified lines back into the original files with the "inplace edit" switch, -i:

perl -i.bak -pe 's/^/$.: /' file1 file2 file3
Now, file1 will be renamed file1.bak, and the new updated contents written to a new file1. Similarly, file2 becomes file2.bak, and file3 becomes file3.bak.

If you leave off the backup extension after -i, the "inplace edit without backup file" mode is enabled, which can save space but gives you no way to go back if you've toasted your files. Be very careful.

The line-looping modes (-n and -p) respect the current value of $/ to read a "line", which defaults to \n. However, you can specify alternate values with the -0 (that's a zero) switch. By default, -0 sets the delimiter to the NUL byte, which can be handy with GNU find's -print0 switch (which delimits the filenames with NUL bytes):

find . -name '*.html' -print0 | perl -n -0 -e unlink
Any octal value can also follow -0, indicating the corresponding ASCII character. For example, to delimit only on spaces, use -040.

If the value is -0777, then $/ is set to undef, slurping the entire file as one "line". Thus, we can wrap the entire file with a BEGIN/END marker as:

perl -0777 -pi.bak -e '$_ = "BEGIN\n$_\nEND\n"' file1 file2 file3
Here, the statement is executed three times, with $_ being the entire contents of first file1, then file2, and file3.

Note that the following command mangles the lines, because the concatenation happens after the terminating newline:

perl -pe '$_ .= "END"' file1 file2 file3
But we can fix that with -l, which chomps each line as read, and then restores the delimiter on a print:

perl -l -pe '$_ .= "END"' file1 file2 file3
Now $_ contains only the line without a newline, and the concatenation happens in the right place, before the newline that gets automatically added by the implicit print at the end of the implicit loop.

Well, I hope you enjoyed this brief tour through the most common Perl command-line options. You can read more at the perlrun manpage, available either as man perlrun or perldoc perlrun at your prompt. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.