Cover V13, i01

Article
Listing 1
Listing 2
Listing 3
Listing 4
Listing 5
Listing 6

jan2004.tar

Sharing Open Source Code Through the CPAN

Randal L. Schwartz

The Comprehensive Perl Archive Network (CPAN) is a wonderful place, full of contributed items for you to use, such as scripts and modules. Modules are the core of the CPAN: little building blocks for you to include into your applications.

Modules end up in the CPAN by being wrapped in a distribution. A distribution is typically a compressed tar archive, identified with a particular version number, and containing one or more modules separated out into packages. A distribution also contains installation instructions and often contains tests to validate proper build and installation.

Let's look at a sample distribution that I entered into the CPAN this past summer to see the essential parts of any distribution. As a joke, I created the Acme::Current module which gives the current year, month, and day... as constants! The module is out-dated and updated every 24 hours, using an automated script that I'll describe shortly.

The module itself is in Current.pm, shown in Listing 1. Line 1 defines the module name by switching to its package. Line 2 enables strict -- a good thing for larger program development. Lines 4 and 6 declare the package variables $VERSION, $YEAR, $MONTH, and $DAY. Although I could have done this with our rather than use vars, that would have restricted my module to installations with Perl 5.6 or later. When contributing to the CPAN, it's important to think about portability.

Line 8 provides the version number of the module. In this joke module, it happens to be the entire active code of the module as well. The values for the three constants are established using hardwired values, and then they are wrestled into a single integer by sprintf. The name $VERSION is special to Perl's require operator, and also to the indexing software used to track changes to CPAN modules. Because the indexing software is simplistic, I had to keep this line within the guidelines established in the ExtUtils::MakeMaker manpage.

The resulting version number for these constants is 20030720. Any version number greater than this supersedes this release, so it's important that the version number monotonically increase with time.

Line 10 provides the mandatory "true value" required by the require operator, on which the use operator is built. Line 12 defines the end of the executable portion of the file, as we are now beginning the POD for the module. While this line is essentially optional, it does save a bit of time in the parser, because the parser doesn't need to scan beyond the __END__ marker to find non-POD code after the POD.

The POD starts in line 14. Again, the format of POD for a CPAN distribution is a bit restricted because the automated tools are extracting key information. The NAME heading is very specific, for example. The synopsis here shows a typical use of this module. I left out the remainder of the POD for space... you can see it in the CPAN if you wish.

To create a distribution from this module, we need to create a distribution directory, and put this file beneath that directory. I called mine Acme-Current and placed this module into lib/Acme/Current.pm below that directory.

At the top of the directory, I include the next essential component -- the Makefile.PL, shown in Listing 2. This file is a Perl program that creates a Makefile that can be used to build, test, install, and create distribution archives for this particular distribution. Line 1 pulls in the ExtUtils::MakeMaker module, which defines the meaning of the rest of the file.

Lines 2 through 8 create a Makefile when executed. The parameters declare the name of my distribution (Acme::Current), which file I'm getting the distribution version number from (the only module here), and also that to properly install this module, the user must have any version of Test::More installed. It's important to list every non-core module in this list if I've used them in my module, because the automated installation tools (like CPAN.pm) can then install any necessary dependencies before continuing on to my module.

But what is Test::More? I've used it in my one test file for this distribution, which I've placed in t/01-core.t below Acme-Current, the contents of which are shown in Listing 3. Once again, it's a Perl program, using the Test::More testing harness. To be recognized as a test, the file should match the fileglob of t/*.t. If there's more than one test, they're run in alphabetical order.

Line 1 pulls in the Test::More module, declaring that I'm not prepared to define how many tests I'm running. If this had been a real module and not a joke module, I should change this to:

use Test::More tests => 4;
because at the moment, there are four tests. Of course, I'd have to change this every time I added or deleted a test, so the no_plan option was good while I was monkeying around.

The first test (in line 2) is a use_ok test, which verifies the syntax of the Acme::Current module. At this point, we should now have the three constants defined in the Acme::Current package namespace.

To test the values, I compute the current time vector using gmtime. Then, using a series of is tests, I compare the value as given by the module with the value as it should be. If the numbers are the same, the test passes. Otherwise, the test fails, and the discrepancy is noted. A failed test normally prevents the module from being installed as well, unless the user is fairly insistent (and a bit crazy).

These tests succeed as long as "today" (from a GMT perspective) is the same as the values that I set when making up the version number. If someone tries to install a stale version, the tests will fail. Obviously, for this to work, a new version must be uploaded every day, and the process to do that will be described shortly.

Besides the module itself, the Makefile.PL, and one or more test files, I also need a MANIFEST file to denote the contents of the distribution archive. My MANIFEST looks like Listing 4. It's just a list of files. Note that a comment is permitted following some delimiting whitespace. One additional file is listed there -- the README file, which should be a simple information file. The README file is extracted separately and archived in the CPAN so that users can download and examine just the README without having to unpack the entire distribution. My README is given in Listing 5.

So, now I've got the essentials of a distribution. To make a distribution archive, I issue the following commands:

perl Makefile.PL
make all test tardist
The first command creates a Makefile, which the second command uses to build the module (copying it into a staging area), run the tests on the distribution (everything matching t/*.t), and then if that all works, create a file whose name looks like Acme-Current-20030720.tar.gz. Whew!

Once we have the distribution archive, it's time to get that into the CPAN. I had already applied and received my CPAN PAUSE ID, by visiting http://pause.cpan.org. Using that same Web site, I can also insert items into my CPAN author directory, by file upload, URL fetching, or FTP transfer.

I decided that since this joke would need to be updated nightly, I would build a Perl program to update the constants in the .pm file and then submit the file using a URL fetch. By experimenting with WWW::Mechanize, I found the right combination of steps, and put that into a cron-trigger job that I call Maintain in the same Acme-Current directory, presented in Listing 6.

Lines 1 through 3 start nearly every long Perl program that I write, turning on the compiler restrictions. Lines 5 and 6 force the current directory to be the same directory in which the script is located.

Lines 8 and 9 pull in the WWW::Mechanize (found in the CPAN) and File::Copy modules. Lines 11 through 14 define the necessary paths and authentications for the CPAN upload. I use a private directory on my Web server to provide the source for the upload. My CPAN PAUSE ID is merlyn, but my CPAN PAUSE password is not shown, because it is provided on STDIN from cron for security.

Lines 16 through 26 edit the .pm file to create the proper version number and date constants. I use an in-place edit here, looking for the $VERSION assignment line by recognizing the specific pattern. Once that's complete, I can now go about the task of testing and creating a new distribution.

Line 28 performs the steps shown earlier, throwing away any output text. Because this is a joke module, I fail silently, erring on the side of just not uploading a new module. Had this been a more important module, I'd be parsing through the output text to verify that it is as expected. At this point, we've either rebuilt the same distribution archive as before, or we now have a new archive.

Line 30 cleans up some cruft that had been left over when the same version number was used twice (which is true 23 times a day). Line 31 does the actual business of attempting an upload for any new distributions now made. Each name that fits the fileglob pattern is submitted to the CPAN, using the subroutine beginning in line 33.

Line 34 saves the local file name parameter. Line 36 computes this same name in the Web server directory, and if it already exists, then we've already uploaded this file and return immediately (in line 37).

Line 39 copies this new file to the Web server directory. Line 40 creates a WWW::Mechanize object to let us talk to the PAUSE site and schedule a new upload.

Line 42 establishes my PAUSE ID and password using HTTP "basicauth". Line 44 gets the initial form that I see to log in. Line 45 follows the login link, which will look at the "basicauth" credentials, and present me with a menu of actions to take regarding my CPAN author directory.

Line 46 follows the link to upload a file. The resulting form has some very specific form elements, into which I stuff the URL that maps to my new distribution (line 47). Line 49 submits that form, and line 50 reports that a new distribution was submitted. Because this is run from cron, I get one email every day with this single text line in it.

The next step is to set up this script to run every hour. Why every hour? Because I don't want to figure out when GMT rolls over, so I let the machine just do the work every hour, and upload only when there's a new distribution. Simple, but effective. The cron entry looks like this:

## Acme::Current joke module
11 * * * * /home/merlyn/Perl/Acme-Current/Maintain%MYPASSWORD
I do this at 11 minutes past the hour every hour of every day. Why 11? Because I have different things firing from my crontab, and I try to stagger them so that they don't all start at the same time, and "11 minutes past the hour" was the next slot available.

Thus, 24 times a day, the pm file gets edited, and the Makefile.PL is transformed into a Makefile, and the distribution gets tested and an archive gets created. But only once a day is this version number different from the previous number, and then the PAUSE machinery is contacted to trigger an upload.

OK, so it's a lot of work for a joke that I eventually disabled anyway, but hey, that's what Acme is for. I hope I've shown how simple it is to create a distribution, though, and inspired you to get your open source code pieces into the CPAN. Until next time, enjoy!

Randal L. Schwartz is a two-decade veteran of the software industry -- skilled in software design, system administration, security, technical writing, and training. He has coauthored the "must-have" standards: Programming Perl, Learning Perl, Learning Perl for Win32 Systems, and Effective Perl Programming. He's also a frequent contributor to the Perl newsgroups, and has moderated comp.lang.perl.announce since its inception. Since 1985, Randal has owned and operated Stonehenge Consulting Services, Inc.