Strictly Speaking about "use strict"

Randal L. Schwartz

In many of my writings about Perl, I give the strong admonition to place use strict at the beginning of the program. I've often explained the line with a few short phrases, but I thought it would be interesting to focus on this one construct in detail for a change.

The use strict line is a pragma. The purpose of a pragma is to regionally or globally alter the way the language is translated for execution. For the strict pragma, we get three sub-features enabled or disabled within a particular program scope. The scope extends to the end of the curly-brace-delimited block in which the pragma appears or to the end of the file if otherwise outside all blocks. Inner pragma controls override outer controls, so we can get as specific as needed to process a particular chunk of code.

The use strict pragma has three aspects: vars, subs, and refs. Each aspect may be enabled or disabled individually by explicit name but, most often, all three are enabled at once with a simple use strict. For example, we can enable all three aspects initially and disable just the vars aspect for a portion of the code, like so:

use strict; # all enabled
...
sub marine {
  no strict 'vars'; # disable vars
  ...
}
# all enabled again
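
As a rough way to watch that scoping in action, the sketch below compiles two small snippets with string eval, so that a compile-time failure lands in $@ instead of stopping the script. The names marine_demo and $depth are made up for illustration and are not part of the original example.

```perl
use strict;

# Hypothetical names: marine_demo and $depth exist only for this sketch.

# 1. Inside the block, "no strict 'vars'" lets the undeclared $depth through.
my $inner_result = eval q{
    use strict;
    sub marine_demo {
        no strict 'vars';
        $depth = 20;          # undeclared package variable: allowed here
        return $depth;
    }
    marine_demo();
};
my $inner_error = $@;

# 2. After the closing brace, the vars aspect applies again.
my $outer_result = eval q{
    use strict;
    {
        no strict 'vars';
        $depth = 20;          # fine: vars is disabled in this block
    }
    $depth = 30;              # compile error: vars is enabled again
    1;
};
my $outer_error = $@;

print "inner block returned: $inner_result\n";
print "outer code was rejected: $outer_error";
```

The first snippet runs and returns 20; the second never runs at all, because the complaint about $depth is raised while the code is being compiled.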

The vars aspect is probably the most useful of the three aspects and is the one most likely to give trouble to a beginner. Scalar, array, and hash variables are mapped into package and lexical variables using one of five methods. The vars aspect disables one of these methods, leaving the remaining four enabled.

For example, the variable $bammbamm might be referring to a lexical variable named $bammbamm, introduced earlier in the same scope through the use of the my declaration, as in:

my $bammbamm = 5;
...
print $bammbamm; # lexical $bammbamm in scope

Or, it might be a package variable declared earlier by use vars in the same package, such as:

package This::One;
use vars qw($bammbamm);
...
print $bammbamm; # same as $This::One::bammbamm
...
package That::One;
# $bammbamm no longer legal here

The variable name might also be declared through the our declaration, which associates a simple name with a package variable in the current package for the remainder of the scope. For example:

package This::One;
sub nominal {
  our $bammbamm; # $bammbamm is $This::One::bammbamm
  ...
  package That::One;
  print $bammbamm; # still prints $This::One::bammbamm
}
# $bammbamm is no longer permitted
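
To see that the name introduced by our follows the lexical scope rather than the current package, here is a small runnable sketch. It places the declaration at file level instead of inside a subroutine, which is a simplification of the example above; the variable names $seen_via_alias and $seen_qualified are illustrative.

```perl
use strict;

package This::One;
our $bammbamm = "set in This::One";   # alias for $This::One::bammbamm

package That::One;
# The alias created by "our" is lexically scoped, so it survives the
# package change and still refers to $This::One::bammbamm.
my $seen_via_alias = $bammbamm;
my $seen_qualified = $This::One::bammbamm;

print "alias:     $seen_via_alias\n";
print "qualified: $seen_qualified\n";
```

Both lines print the same value, since the short name and the fully qualified name refer to the same package variable.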

Or, if the name contains a package delimiter (double colon), it's an explicit use of a package variable:

package This::One;

print $This::One::bammbamm; # always permitted

Finally, the variable $bammbamm may be just a package variable in the current package, if no prior declaration exists:

package This::One;
print $bammbamm; # $This::One::bammbamm
package That::One;
print $bammbamm; # $That::One::bammbamm

It is this particular method that is disabled by use strict, because it can lead to the most errors in larger programs. By default, any mention of any simple scalar, array, or hash name is simply accepted as a package variable in that package, even if the name is a typo!

By enabling use strict 'vars', the troublesome automatic acceptance of any variable name is prevented, forcing you to declare your variables through one of the other methods. This isn't all that important on a five-line program, but I have rarely seen any program stay at only five lines unless it was a one-off task.
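
A minimal sketch of the typo case follows. The misspelled name $bambamm is invented for the demonstration, and the snippet is compiled with string eval only so that the failure can be captured and printed rather than ending the script.

```perl
use strict;

my $compiled = eval q{
    use strict 'vars';
    my $bammbamm = 5;
    $bambamm = 6;     # typo: one "m" missing, so this name was never declared
    1;
};
my $error = $@;

print $compiled ? "compiled\n" : "rejected at compile time: $error";
```

Without the vars aspect, the same code would compile and quietly assign to a brand-new package variable, leaving $bammbamm at 5.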

The subs aspect of use strict disables the interpretation of "barewords" as text strings. By default, a Perl identifier (a sequence of letters, digits, and underscores, not starting with a digit unless it is completely numeric) that is neither a built-in keyword nor the name of a previously seen subroutine is treated as a quoted text string:

@daynames = (sun, mon, tue, wed, thu, fri, sat);

However, this is considered a dangerous practice, because obscure bugs may result:

@monthnames = (jan, feb, mar, apr, may, jun,
               jul, aug, sep, oct, nov, dec);

Can you spot the bug? Yes, the 10th entry is not the string oct, but rather an invocation of the built-in oct() function, returning the numeric equivalent of the default $_ treated as an octal number. And if you wrote this program in April, you might not even notice that it breaks for six months. I'm not saying that this has happened to anyone I know, because I believe I'm protected from self-incrimination.
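
The effect can be reproduced with a short sketch. The value "17" placed in $_ is an arbitrary choice for the demonstration, and the subs aspect is switched off inside the block so that the barewords are accepted at all.

```perl
use strict;

my @monthnames;
{
    no strict 'subs';      # allow barewords, as in the example above
    local $_ = "17";       # assumed value; oct() reads $_ when given no argument
    @monthnames = (jan, feb, mar, apr, may, jun,
                   jul, aug, sep, oct, nov, dec);
}

print "count:    ", scalar(@monthnames), "\n";   # 12
print "entry 9:  $monthnames[8]\n";              # sep
print "entry 10: $monthnames[9]\n";              # 15, which is oct("17")
```

The list still has twelve entries, so nothing looks wrong until the tenth one is actually used.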

Although the problem arises mostly from collisions with built-in words, simply watching for built-ins is insufficient. Suppose we added a sun function earlier in the same scope:

sub sun { ... }

Now our first day name is also messed up, being a call to the subroutine instead of the three-character string. But it's not sufficient to simply scan in the source text for a same-named subroutine. The name can also be imported from other code by one of the earlier use directives!

So, the proper method out of this madness is to avoid the use of "bare" words in most circumstances. This list of day names can be created easily with qw() instead:

my @daynames = qw(sun mon tue wed thu fri sat);

And now there's no possibility of conflict, because we're using a quoted string instead of a bareword. The nifty part is that use strict 'subs' (included as part of use strict) takes care of enforcing this automatically. Once enabled, barewords will be flagged while the program is being parsed, before execution even begins.
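
As a hedged illustration, the sketch below feeds the bareword version of the day-name list to string eval with the subs aspect enabled, captures the complaint, and then builds the same list with qw(). The variable names $bare_ok and $bare_error are chosen for this sketch only.

```perl
use strict;

# The bareword list is rejected while it is being compiled.
my $bare_ok = eval q{
    use strict 'subs';
    my @daynames = (sun, mon, tue, wed, thu, fri, sat);
    1;
};
my $bare_error = $@;

# The qw() list contains only quoted strings, so strict has no objection.
my @daynames = qw(sun mon tue wed thu fri sat);

print "bareword list rejected: $bare_error";
print "qw list: @daynames\n";
```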

Note that barewords are still permitted in a few specific locations. For example, the key to a hash can always be specified as a bareword:

my $age = $data{age}; # same as $data{"age"}

The left side of a "fat arrow" is also automatically quoted if it resembles a bareword:

my %data = (age => 19); # same as ("age", 19)

These two automatic quotings make working with hashes with program-significant keys easier, presuming the keys you choose are all barewords.

Finally, a bareword that names a pre-declared subroutine is treated as a call to that subroutine, even if the definition of the subroutine has not yet been seen:

sub deeper; # declaration
...
my $result = deeper;

I don't recommend this practice, since it is just as easy (and clearer) to follow the subroutine call with empty parens:

my $result = deeper();  # no declaration needed
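
The three cases can be compared side by side in a small sketch. The subroutine names deeper_bare, deeper_declared, and deeper_parens are hypothetical, and each case is compiled separately with string eval so that the one that fails does not stop the others.

```perl
use strict;

# 1. Bareword with no declaration in sight: rejected by the subs aspect.
my $undeclared = eval q{
    use strict;
    my $result = deeper_bare;
    1;
};
my $undeclared_error = $@;

# 2. Forward declaration first: the bareword is now a subroutine call.
my $declared = eval q{
    use strict;
    sub deeper_declared;                 # declaration
    my $result = deeper_declared;        # call, no parens needed
    sub deeper_declared { return 42 }    # definition seen later
    $result;
};

# 3. Empty parens: a call, with no declaration needed.
my $parens = eval q{
    use strict;
    my $result = deeper_parens();
    sub deeper_parens { return 42 }
    $result;
};

print "undeclared bareword: $undeclared_error";
print "declared bareword:   $declared\n";
print "empty parens:        $parens\n";
```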

The final aspect of the use strict pragma, refs, disables soft references (also called symbolic references). Normal references (sometimes called hard references to distinguish them from soft references) come from an explicit referencing operation:

my $ref = \@foo;  # now $ref is a reference to @foo

or from one of the anonymous reference constructors:

my $ref2 = [3, 4, 5];  # array reference created

An auto-vivification will also create a hard reference:

my $ref3; # variable is undef initially
$ref3->[5] = 10; # $ref3 is now an array reference

Following this reference using a dereferencing operation gets us back to the original data:

print $ref2->[2]; # prints 5, from the anon array

However, the dereferencing operation can also be performed against a simple scalar string:

my $sref = "happy";
$sref->[3] = "hello"; # symbolic reference

This dereferencing is performed at execution time. Perl looks up the value to be dereferenced, notes that it is not a hard reference, and then examines the package variable symbol table for a same-named variable. Because package variables spring into existence as needed, nearly any name in $sref will be considered legal, causing new variables to be created dynamically.

As if that weren't already scary enough, the variable name does not need to be a standard Perl identifier. Any string will do:

my $sref = "A [variable] {name} !normally! *illegal*";
$$sref = 12;

We now have a scalar package variable in the current package with a very crazy name.

Because of the likelihood of an accidental symbolic dereference operation, the use strict 'refs' aspect is recommended for every program that uses references.
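
A short sketch of the difference, reusing the "happy" string from the earlier example: with the refs aspect enabled the dereference dies at run time, and with it disabled the same statement creates a package array. The names $ok and $error are illustrative.

```perl
use strict;

my $sref = "happy";

# Under strict refs the symbolic dereference is a run-time error,
# so a block eval is enough to trap it.
my $ok = eval { $sref->[3] = "hello"; 1 };
my $error = $@;

# With refs disabled, the same statement quietly creates @main::happy.
our @happy;
{
    no strict 'refs';
    $sref->[3] = "hello";
}

print "with strict refs: $error";
print "without: \@happy has ", scalar(@happy), " elements, last is $happy[3]\n";
```

Unlike the vars and subs checks, this one cannot fire until the statement actually executes, since the string being dereferenced is only known at run time.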

If all three of these restrictions are good, why are they not enabled by default? The answer is "backward compatibility". Perl version 4 (last updated more than a decade ago) permitted casual variable naming (and didn't have any option for lexically declared variables), didn't have the convenient qw() for defining lists of short values, and used soft references for indirect subroutine invocation. Thus, adding use strict by default would have broken nearly every Perl version 4 program!

But Perl 4 is now long dead. Be sure to use strict in your modern Perl 5 programs, and you'll get a guaranteed reduction in development time or double your money back! Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.