Cover V12, I05

Variables and Scoping

Randal L. Schwartz

Programs push data around. In Perl, this data lives in variables, and the variables can be associated with various scopes. Let's take a look at Perl's peculiar scoping rules.

To begin, let's define a term, lexical scope. A lexical scope provides the boundaries of some property of the program associated with the text of the program itself, as opposed to properties that are associated with the runtime state of the program. Lexical properties might include variable declarations, compiler directives, exceptions being caught, and so on.

In Perl, the largest lexical scope is the source file itself. Lexically scoped items never affect anything larger than a file. Additionally, nearly all blocks also introduce a nested lexical scope that ends where the block ends. Because blocks are nested and not overlapping, the lexical scopes also nest. This will become clearer in the examples that follow.

Some of the variables in a Perl program are package variables (also called symbol-table variables). A package variable's full name consists of a package prefix followed by the specific identifier for the variable. The prefix is separated from the identifier by a double colon.

For example, in $Animal::count, Animal is the package prefix, while count is the variable within the package. Both the package and the identifier contain one or more alphanumerics and/or underscores. Additionally, packages can have multiple, double-colon separated parts, as in $Animal::Dog::count. Again, count is the variable, and Animal::Dog is the package prefix. There's no necessary relationship between Animal and Animal::Dog, although people tend to give related names to related packages.
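
To make the independence of similarly named packages concrete, here is a minimal sketch. The variable names come from the paragraph above; the values 3 and 7 are arbitrary and chosen only for illustration.

```perl
# Two package variables that share an identifier but live in different packages.
# The values assigned here are arbitrary illustration values.
$Animal::count      = 3;
$Animal::Dog::count = 7;

# Changing one has no effect on the other.
$Animal::count++;

print "Animal count: $Animal::count\n";
print "Animal::Dog count: $Animal::Dog::count\n";
```

Incrementing $Animal::count leaves $Animal::Dog::count alone, even though the package names look related.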

Although package variables are formally named with colons, you won't see many colons in most uses of package variables. That's because, by default, variable names without colons are automatically placed into the current package. The initial current package is package main, so the following two code snippets are equivalent:

print "What is your name? ";
chomp($input = <STDIN>);
$length = length $input;
print "Your name $input is $length characters long.\n";
and

print "What is your name? ";
chomp($main::input = <STDIN>);
$main::length = length $main::input;
print "Your name $main::input is $main::length characters long.\n";
It's a good thing we don't have to write main:: all over the place.

So, why do we have packages, if everything already defaults to package main? Well, it's so that we can have multiple portions of code brought together into one program. Suppose the code above were to be added into a program that already had a meaning for $main::input or $main::length. We'd have a collision of names. But we can fix that by using a different package prefix:

print "What is your name? ";
chomp($Query::input = <STDIN>);
$Query::length = length $Query::input;
print "Your name $Query::input is $Query::length characters long.\n";
Now $Query::input has nothing to do with $main::input, so we no longer have a naming collision. Of course, this is a lot of typing, and we can shorten this by changing the current package, using the package directive:

package Query;
print "What is your name? ";
chomp($input = <STDIN>);
$length = length $input;
print "Your name $input is $length characters long.\n";
Wow, that's easier to type, and yet the $input variable there is really $Query::input, and won't conflict with $main::input used elsewhere.

The package directive is lexically scoped (thought we forgot about that term, eh?). This means that the package directive stays in effect until the end of the current scope, or until another package directive changes the current package again. For example, we could put that piece of code into the middle of the rest of our program as:

# initial package main
...
$input = "Hey"; # $main::input

package Query;  # now in package Query
print "What is your name? ";
chomp($input = <STDIN>);  # $Query::input
$length = length $input;
print "Your name $input is $length characters long.\n";

package main;  # back to package main

print $input;  # $main::input again
print "that length was $Query::length\n"; # reference prior value
However, we have to remember to reset the package back to what it was before. This is error-prone, and perhaps not easy to maintain, especially if we're not sure what the prior package might be. But, since the package directive is lexically scoped, we can introduce a block to limit the directive's influence:

# initial package main
...
$input = "Hey"; # $main::input

{ # start scope
  package Query;  # now in package Query
  print "What is your name? ";
  chomp($input = <STDIN>);  # $Query::input
  $length = length $input;
  print "Your name $input is $length characters long.\n";
} # end scope

# automatically back to package main

print $input; # $main::input again
print "that length was $Query::length\n"; # reference prior value
Ahh, that's a bit simpler.
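
One way to check where a block-scoped package directive begins and ends is the built-in __PACKAGE__ token, which yields the name of the current package. It does not appear in the examples above and is brought in here only as an illustration; the variable names before, inside, and after are likewise made up for this sketch.

```perl
# Record the current package before, inside, and after a block
# that contains a package directive.
$before = __PACKAGE__;       # $main::before

{
  package Query;
  $inside = __PACKAGE__;     # $Query::inside
}

$after = __PACKAGE__;        # $main::after, with no explicit reset needed

print "before: $before\n";
print "inside: $Query::inside\n";
print "after:  $after\n";
```

The closing brace is all it takes to return to package main.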

As that last example showed, we can access any package variable from any location in our program, much as we can spell out the full path to any accessible file in a UNIX filesystem regardless of our current directory, even though the files at or below the current directory are easier to type. But these global variables can lead to global headaches, since we can't tell at a glance which code might examine or modify the variable.
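
As a small sketch of the kind of headache meant here, the following shows code in one package quietly changing a variable that belongs to another. The Logger package and its cleanup routine are hypothetical, invented only for this example.

```perl
$input = "Hey";                  # $main::input

{
  package Logger;                # hypothetical, unrelated package
  sub cleanup {
    $main::input = "";           # nothing stops this assignment
  }
}

Logger::cleanup();
print "input is now empty\n" if $input eq "";
```

Nothing near the first line hints that a routine elsewhere will wipe out the value.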

Like most modern programming languages, Perl also includes the notion of a lexical variable. Lexical variables do not belong to a package, so they cannot be referenced outside the lexical scope in which they are declared. Their names also cannot contain colons, because they do not have a package prefix.

Lexical variables are introduced with the my keyword:

print "What is your name? ";
chomp(my $input = <STDIN>);  # lexical $input
my $length = length $input; # lexical $length
print "Your name $input is $length characters long.\n";
Because these variables are introduced outside any block in this example, they are lexically scoped to the file in which they appear. If this code is part of a file being included with eval, do, require, or use, there's no chance that this $input will conflict with any other use of $input. There's also no syntax that would let any other code outside of this code access those variables, so we can be assured that our variables won't be changing mysteriously.
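
To illustrate that a lexical variable is separate from a package variable of the same name, consider this minimal sketch. The two string values are arbitrary placeholders.

```perl
$main::input = "package value";   # package variable, reachable by its full name
my $input    = "lexical value";   # lexical variable, visible only in this file

print "$input\n";                 # the unqualified name now means the lexical
print "$main::input\n";           # the package variable is untouched
```

Assigning to the lexical $input never disturbs $main::input, and code in other files can reach only the latter.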

Besides file-scoped lexical variables, another common appearance is in the block that belongs to a subroutine:

sub get_name_length {
  print "What is your name? ";
  chomp(my $input = <STDIN>); # lexical $input
  my $length = length $input; # lexical $length
  print "Your name $input is $length characters long.\n";
}
When the subroutine returns, the lexical variables are discarded (unless something still holds a reference to them), automatically recycling the memory that had been used. Additionally, any outer declaration of $input or $length is temporarily shadowed within the subroutine, protecting the outer variables from accidental alteration.
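
The shadowing can be seen in a short sketch. The name_length routine below is a hypothetical variant of the subroutine above that takes the name as an argument instead of reading STDIN, and the outer value 42 is arbitrary.

```perl
my $length = 42;                  # outer lexical, arbitrary value

sub name_length {                 # hypothetical helper for this sketch
  my ($input) = @_;               # lexical $input, private to the subroutine
  my $length = length $input;     # a new $length that shadows the outer one
  return $length;
}

my $result = name_length("Randal");
print "inner result: $result\n";
print "outer length: $length\n";  # the outer variable was never touched
```

The assignment inside the subroutine affects only the inner $length.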

We can also create temporary variables this way:

{ # start temporary scope
  print "What is your name? ";
  chomp(my $input = <STDIN>); # lexical $input
  my $length = length $input; # lexical $length
  print "Your name $input is $length characters long.\n";
} # end temporary scope
The variables declared and used in this block will be recycled at the end of the block, just as if we had placed this code into a subroutine.

A frequent admonition in the Perl literature is "Always use strict!". What does this do, precisely? Well, among other things, use strict (specifically its strict 'vars' component) disables the automatic prepending of the package to a variable name. Once use strict is in effect, a name without colons must have been declared, either as a lexical variable, or as a specially noted package variable.

The primary purpose of use strict is to catch any random erroneous variations of a variable name:

print "What is your name? ";
chomp($input = <STDIN>); # $main::input
$length = length $input; # $main::length
print "Your name $input is $lenth characters long.\n"; # broken
Oops! That's $main::lenth, not $main::length. But by turning on use strict, we no longer get main:: in front of anything we mention, and thus we must declare the variables lexically at first use instead:

use strict;

print "What is your name? ";
chomp(my $input = <STDIN>); # lexical $input
my $length = length $input; # lexical $length
print "Your name $input is $lenth characters long.\n"; # caught
The compiler will abort at that last line, because we can't just turn $lenth into $main::lenth any more.
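
To see the failure without aborting a whole script, the offending code can be compiled separately with a string eval. This is only a sketch: the fixed value 6 stands in for the length read from STDIN, and the exact wording of the error message varies between Perl versions.

```perl
# The snippet is kept in a single-quoted string so that it is compiled
# only when eval runs, which lets this script survive the compile error.
my $code = q{
  use strict;
  my $length = 6;                                    # stand-in value
  print "That name is $lenth characters long.\n";    # misspelled on purpose
};

eval $code;
print "Compile failed: $@" if $@;
```

The message stored in $@ names the undeclared $lenth, which is exactly the typo we wanted caught.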

To refer to package variables, we can simply use the full name, package prefix and colons included:

use strict;

print "$Animal::Dog::count dogs were seen!\n";
print "$main::length characters in that name.\n";
If we want to refer to a package variable without the package prefix, we can use the use vars compiler directive:

use strict;

use vars qw($length); # now permits $length to mean $main::length

print "$length characters\n"; # $main::length
Any name in the use vars list can be referenced in the current package as if it were fully specified. Once seen, the directive is in effect for that variable name as long as the current package is the same as the package in which the use vars appeared. So, this is an error:

use strict;
use vars qw($length); # $length is $main::length in main

{ package Query;
  print $length; # COMPILE ERROR... $Query::length not permitted
}

print $length; # would have been ok, back to $main::length
Perl 5.6 introduced the our keyword as a parallel to my. It functions similarly to use vars, but the declaration of the package variable is lexically scoped, not dependent on the current package:

use strict;
our $length; # $length is $main::length in this scope

{ package Query;

  our $input; # $input is $Query::input in this scope

  print $length; # permitted access to $main::length here
  print $input; # permitted access to $Query::input here

} # end of scope, so $input goes out of scope

print $length; # still $main::length
print $input; # COMPILE ERROR, no access to $main::input permitted
As you can see, use vars and our are not precisely the same thing, but in general, they both serve to permit selected package variables to be used without colons.
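
The difference can be checked side by side in one runnable sketch. The variable $total and both numeric values are invented for this example, and the disallowed access is wrapped in a string eval so that the rest of the script still compiles.

```perl
use strict;

our $length = 6;          # $main::length, usable to the end of this scope
use vars qw($total);      # permits $total only while in package main
$total = 10;              # $main::total

{
  package Query;

  print "length via our: $length\n";    # still $main::length

  # Here $total would mean $Query::total, which was never declared,
  # so the attempt is compiled separately to capture the error.
  eval q{ print $total; };
  print "use vars did not follow us: $@" if $@;
}
```

The our declaration follows the lexical scope into package Query, while the use vars permission stays behind in package main.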

I hope this brief overview of package and lexical variables and scoping has been useful. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.