Understanding the Command Line
Randal L. Schwartz
In past columns, I've talked a lot about the Perl language, but
have never said much about perl at the Unix shell command
line. So, let's fix that by looking at some commonly used command-line
constructs for Perl.
Let's take the simplest invocation:
perl my-script
This runs my-script, locating it by the relative or absolute path
exactly as given; the PATH is not searched for the script.
We can include arguments to the script:
perl my-script arg1 arg2 arg3
This sets up @ARGV to be the three individual values of arg1,
arg2, and arg3, as if we had said:
@ARGV = qw(arg1 arg2 arg3);
If we want a space within one of the values, we need to use shell
quoting rules:
perl my-script 'arg1a arg1b' arg2
This passes two arguments now, not three. We get the same result with:
perl my-script arg1a\ arg1b arg2
using a backslash to quote the space between the arguments. If there
are any shell wildcard ("glob") characters, the shell expands them
before calling our program:
perl my-script *.html
which might turn into (given three matching files):
@ARGV = qw(index.html problem.html results.html);
Note that Perl has no clue that a shell wildcard was involved here;
it's as if we had typed the three names individually.
Perl doesn't interpret the @ARGV values in any particular
way. They could be keywords, filenames, or some combination of the
two. Traditionally, leading @ARGV elements that begin with
a minus are considered "options", which we can process with modules
such as Getopt::Std or Getopt::Long.
We can also have options to Perl itself by placing leading-minus
values to the left of the script name. For example, we can invoke
the debugger by adding -d:
perl -d my-script arg1 arg2 arg3
Now, the program is run under the normal Perl debugger. We can pick
an alternate debugger (or module using the debugging interface for
other analysis) with a colon argument following the -d:
perl -d:DProf my-script
This command selects the Devel::DProf module as the alternate
"debugger", invoking a profiling of the Perl code.
Another common option ("switch") is -c, which compiles
a Perl script without executing it:
perl -c my-script
You would do this to verify that the syntax of your script is good
before moving it into place for production, and to confirm
that all use'd modules are available. Modules loaded
at runtime (via require) and code constructed at runtime (such as a
string eval) aren't checked, however. Also, all BEGIN
and CHECK blocks are executed, so "compile only" is merely
a casual definition.
You can enable warnings on the command line with -w:
perl -w my-script
although this is more frequently handled within the program as:
use warnings;
Sometimes, your program is small enough that it makes sense to include
it entirely on the command line. Simply throw an -e switch
there instead of the filename, and you're set:
perl -e 'print "Hello world!\n"'
Note that the quoting can get a bit weird. I typically use single
quotes to keep the single argument to -e together and Perl's
double quotes within the argument for Perl quoting. Sometimes, alternate
quoting (via q// or qq//) can come in handy:
perl -e 'print qq/Hello world!\n/'
Multiple -e arguments are concatenated, each followed by a
newline:
perl -e print -e 'qq/Hello!\n/'
By now, the number of options is a bit hard to remember. Luckily,
Perl has a built-in help message, available with -h:
perl -h
And for a few more switches that aren't about running programs, let's
look at the version information with -v in short form:
perl -v
and in long form with -V:
perl -V
The -V switch also gives us access to the various configuration
options that Perl was built with and that it uses to compile binary
extensions and install local programs. For example, to get the C compiler
used to compile Perl:
perl -V:cc
and to get all the options related to where binaries are found or
installed:
perl -V:'.*bin'
The regular expression pattern here is in quotes so that the shell
doesn't try to expand it as a filename pattern. The output is in a
form that can be evaluated by a Bourne-style shell easily:
eval `perl -V:'.*bin'`
echo $sitebin
No attempt is made to accommodate C-shell-style shells, of course.
Modules can be included from the command line with -M:
perl -MFile::Find -e 'find sub { print $File::Find::name, $/ }, "."'
The -MFile::Find is equivalent to including:
use File::Find;
in the resulting script. If you don't want the default imports (such as find
in this case), use lowercase -m to suppress them, or name exactly what you
want with a trailing = syntax:
perl -MFile::Find=find,finddepth -e '...'
which turns into:
use File::Find qw(find finddepth);
Note the automatic comma splitting. Nice.
For text processing from a series of one or more files, we can
add -n, which puts a wrapper around our program that looks
like:
LINE:
while (<>) {
... # rest of your program here
}
In other words, the @ARGV list is interpreted as a series of
files to be opened, and each line is placed in $_ until all
the lines are processed. To print each line with a sequential line
number in front, we can use the $. variable for the numbers:
perl -n -e 'print "$.: $_"' file1 file2 file3
Switches that don't take arguments can be bundled with the switch
that follows them, as in:
perl -ne 'print ...'
Another way to approach this problem is the -p switch, which adds
a print at the end of the loop:
LINE:
while (<>) {
... # your program here
print;
}
So, we could just substitute the line number into the beginning of
each line:
perl -pe 's/^/$.: /' file1 file2 file3
Going one step further, we could rewrite these modified lines back
into the original files with the "inplace edit" switch, -i:
perl -i.bak -pe 's/^/$.: /' file1 file2 file3
Now, file1 will be renamed file1.bak, and the new updated
contents written to a new file1. Similarly, file2 becomes
file2.bak, and file3 becomes file3.bak.
If you leave off the backup extension after -i, the "inplace edit without
backup file" mode is enabled, which can save space but gives you
no way to go back if you've toasted your files. Be very careful.
The line-looping modes (-n and -p) respect the current
value of $/ to read a "line", which defaults to \n.
However, you can specify alternate values with the -0 (that's
a zero) switch. By default, -0 sets the delimiter to the
NUL byte, which can be handy with GNU find's -print0 switch
(which delimits the filenames with NUL bytes):
find . -name '*.html' -print0 | perl -n -0 -e unlink
Any octal value can also follow -0, indicating the corresponding
ASCII character. For example, to delimit only on spaces, use
-040.
If the value is -0777, then $/ is set to undef,
slurping the entire file as one "line". Thus, we can wrap the entire
file with a BEGIN/END marker as:
perl -0777 -pi.bak -e '$_ = "BEGIN\n$_\nEND\n"' file1 file2 file3
Here, the statement is executed three times, with $_ holding
the entire contents of file1 first, then file2, and finally
file3.
Note that the following command mangles the lines, because the
concatenation happens after the terminating newline:
perl -pe '$_ .= "END"' file1 file2 file3
But we can fix that with -l, which chomps each line as read,
and then restores the delimiter on a print:
perl -l -pe '$_ .= "END"' file1 file2 file3
Now $_ contains only the line without a newline, and the
concatenation happens in the right place, before the newline that gets
automatically added by the implicit print at the end of the
implicit loop.
Well, I hope you enjoyed this brief tour through the most common
Perl command-line options. You can read more at the perlrun
manpage, available either as man perlrun or perldoc perlrun
at your prompt. Until next time, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration, security,
technical writing, and training. He has coauthored the "must-have"
standards: Programming Perl, Learning Perl, Learning
Perl for Win32 Systems, and Effective Perl Programming.
He's also a frequent contributor to the Perl newsgroups, and has
moderated comp.lang.perl.announce since its inception. Since 1985,
Randal has owned and operated Stonehenge Consulting Services, Inc.