The
Magic of mod_perl
Frank Wiles
I often run into people who are confused about what mod_perl is.
Some people think mod_perl is only useful to speed up CGI scripts.
Some oddly believe it to be a heretical and incompatible version
of the Perl programming language, while others are very confused
and think mod_perl is just another fancy way of saying Perl/CGI.
Fortunately, all of them are wrong. With the upcoming release of
mod_perl 2.0 (mod_perl 2.0 is very near release as of the writing
of this article and likely will be released before this is published),
I wanted to better explain mod_perl.
In short, mod_perl embeds a Perl interpreter directly into your
Apache Web server. It does have the wonderful added benefit of speeding
up your CGI scripts, but that is just a taste of its power. The
real power of mod_perl is the ability for you to directly use all
of the Apache API from Perl. This is also what sets mod_perl apart
from other similar technologies such as PHP (mod_php) and Python
(mod_python), which only allow you to control the content or response
phase of the Apache server.
When a browser requests a page from an Apache server, the request
goes through several processing phases. Some are related to access,
authentication, logging, but the most common is the response phase.
The response phase is what you work with when building a Perl CGI
or a PHP script. It is the part of the process that generates the
actual HTML page and returns it to the browser.
The power of mod_perl is that it gives you the ability to replace
the default behaviors of any of these phases with your own phase
handlers. mod_perl handlers can be thought of as true Apache modules,
plugged directly into the server, rather than a script or other
outside process. Each handler is a different Perl module that you
have written to deal with a particular phase. This can also help
code reuse by allowing you to share a particular logging or authentication
handler on many different sites or servers without having to alter
any other aspects of the Apache process.
Here are some examples of mod_perl's abilities:
- Log all requests to http://www.domain.com/admin/ into
a SQL database, thus capturing the information we care about and
still logging the rest of the site to the normal Apache log file
for traffic analysis.
- Replace Apache's flat file Basic auth with a SQL database that
controls access both by username/password as well as by date and
time of the request. This could be used to allow employees access
to an application only during office hours.
- mod_perl gives you the ability to configure your Apache Web
server with Perl code. Use this to easily configure a large amount
of VirtualHosts by querying a database instead of manually configuring
them in httpd.conf.
- Apache filters allow you to filter the output of any flat file,
a script written in another language, or even another mod_perl
handler before sending the page on to the browser. This can be
used to clean up the output of a legacy system without having
to modify the original code.
- Because Apache 2.0 is protocol agnostic, you can even make
your server speak protocols other than HTTP. An example of this
is to build a SMTP server in mod_perl and a corresponding Web
application to control how it operates.
Installation Installing mod_perl is a relatively easy task.
If you are using a recent Linux distribution, you may have it installed
already. mod_perl 2.0 does have some prerequisites, namely a recent
Perl and Apache 2.x. If both of these are already installed, all
that is required is downloading the mod_perl source code from http://perl.apache.org/download/
and issuing the following commands:
# tar -xvzf mod_perl-2.x.xx.tar.gz
# cd mod-perl-2.x.xx
# perl Makefile.PL MP_APXS=/path/to/apxs MP_INST_APACHE2=1
# make
# make test
# make install
Once mod_perl is installed, you will need to configure it in your
httpd.conf by adding the following configuration options:
LoadModule perl_module modules/mod_perl.so
PerlRequire /path/to/perl/libs/startup.pl
The startup.pl script allows you to set up your @INC library path
and preload any modules that you want shared among your Apache server
children. For the following examples, I'm using the minimal startup.pl:
use lib qw(/path/to/perl/libs);
use Apache2;
1;
mod_perl and CGI
mod_perl speeds up your CGI scripts by getting rid of the infamous
"fork, compile, execute" problem. When running a normal CGI, the Apache
Web server forks a Perl interpreter, which in turn compiles and executes
the Perl source. With normal CGIs, this process is repeated for each
request made to the CGI. We remove the expensive forking step with
mod_perl by having an embedded interpreter inside of our Web server.
However, mod_perl also will compile the Perl source on server startup
and keep it in memory. This leaves only the actual execution of the
code on each request.
As I'm sure you can imagine, this greatly increases the speed
of most CGIs, often as much as 100 times their original speed. To
configure this for all of your Perl CGIs in the directory /modperl/,
simply add this to your Apache's httpd.conf:
<Location /modperl/>
SetHandler perl-script
PerlResponseHandler Modperl::Registry
PerlOptions +ParseHeaders
Options +ExecCGI
</Location>
This instructs mod_perl to compile the Perl scripts in the /modperl/
directory -- once for each Apache child -- and store it in memory.
The script edited on disk mod_perl is smart enough to recompile on
the next request to reflect your changes.
URL Transformation
One challenge that faces many Web site administrators is that
of reworking your file system layout without breaking existing bookmarks
and deep links into your site. Sometimes these can be fixed with
redirects or mod_rewrite rules, but these can quickly become unwieldy
on large sites.
Suppose you run a news Web site where your articles are stored
into a different directory each day and you want to change this
layout slightly. You want to be able to change requests for:
http://www.example.com/20041106/article-title.html
into something like:
http://www.example.com/archive/2004/11/06/article-title.html
on the fly. A URI is mapped to filenames in the TransHandler phase
of the Apache life cycle. Using mod_perl, we can easily change the
default behavior with a translation handler like this:
package My::LayoutChanger;
use strict;
use warnings;
use Apache::RequestRec ();
use Apache::Const -compile => qw(DECLINED);
sub handler {
my $r = shift;
# See if the requested URI follows our old style of having
# an eight digit directory in the form of /YYYYMMDD/
if( $r->uri =~ m|^/\d{8}/|o ) {
# Extract the parts of the date and the filename from the
# requested URI
my ($year, $month, $day, $file) =
$r->uri =~ m|^/(\d\d\d\d)(\d\d)(\d\d)/(.*?)$|o;
# Replace the URI transparently
$r->uri("/archive/$year/$month/$day/$file");
}
# Return DECLINED so that other trans handlers can be
# called if necessary
return( Apache::DECLINED );
}
1;
You configure this in Apache's httpd.conf with the following directives:
PerlModule My::LayoutChanger
PerlTransHandler +My::LayoutChanger
A handler like this could easily be converted to handle multiple site
redesigns in the same module, or you can stack the handlers so that
each new filesystem layout is a different Perl module, each handling
a different set of URI rewrites. You can also map an existing static
HTML site into a new dynamic application by building the URI and the
HTTP query string with $r->args.
Dynamic Content
One of mod_perl's most interesting features is I/O filtering. Filtering
can be used to modify static files on the fly or even the output
of another program. If, for example, you would like to automatically
add the last modified date and time to the bottom of all of the
pages in a particular directory, you would use a filter much like
this:
package My::Filter;
use strict;
use warnings;
use base qw(Apache::Filter);
use APR::Finfo (); # For file information
use APR::Table (); # For $f->$r->headers_out->unset
use Apache::Const -compile => qw(OK);
use constant BUFFER => 1024;
sub handler {
my $f = shift; # Our filter object
my $r = $f->r; # Our Request object
my $finfo = $r->finfo; # Our file info
# Convert last modified time into a human readable format
my $time = localtime($finfo->mtime);
# Unset our Content-Length header since we will be changing
# the content's length.
unless( $f->ctx ) {
$f->r->headers_out->unset('Content-Length');
$f->ctx(1);
}
# Read the file 1024 bytes at a time
while( $f->read(my $buf, BUFFER) ) {
# Replace the closing BODY tag with our last modified
# date and time
if( $buf =~ /<\/BODY>/i ) {
$buf =~ s/<\/BODY>/Last modified: $time<\/BODY>/i;
}
$f->print($buf);
}
return( Apache::OK );
}
1;
To configure this filter for the /content/ directory, you would add
the following to your httpd.conf:
PerlModule My::Filter
<Directory /content/>
SetHandler modperl
PerlOutputFilterHandler My::Filter
</Directory>
Conclusion While mod_perl gives
you easy access to tweaking how your Apache server behaves, it also
proves to be an efficient and scalable platform on which to build
large Web sites and enterprise applications. mod_perl is used for
such large traffic sites as slashdot.org, livejournal.com, and ticketmaster.com.
I've been building LAMP (Linux Apache Mod_perl PostgreSQL) applications
with mod_perl for years and have never felt limited by it.
I hope this introduction to mod_perl 2.0 has piqued your interest.
The mod_perl homepage (http://perl.apache.org) offers a large
collection of online documentation, links to print resources, and
the user's mailing list, which can help you take full advantage
of mod_perl.
If you get stuck solving a mod_perl problem, send a question to
the mod_perl users mailing list, which often provides answers within
a few short minutes. If you don't mind a slower response time, feel
free to email me directly (frank@revsys.com).
Frank Wiles is the IT Manager for Sunflower Broadband in Lawrence,
Kansas (http://www.sunflower.com). He also does systems administration
and programming consulting via his company Revolution Systems, LLC.
Frank can be reached at: frank@revsys.com. |