Lightweight
Persistent Data
Randal L. Schwartz
Frequently, you have data with a strong will to live. That is,
your data must persist between invocations of your program and occasionally
even be shared between simultaneous invocations.
At the high end of this demand, we have entire companies devoted
to creating high-performance, multi-user, SQL-interfaced databases.
These databases are usually accessed from Perl via the DBI
package, or by some wrapper slightly above DBI, such as Class::DBI
or DBIx::SQLEngine. The details of SQL might even be entirely
hidden away using a higher level package like Tangram or
Alzabo.
But further down the scale, some new solutions are popping
onto the scene alongside the old classics, and they invite a closer
look. For example, since Perl version 2 we've been
able to put a hash out on disk with dbmopen:
dbmopen(%HASH, "/path/on/disk", 0644) || die;
$HASH{"key"} = "value";
dbmclose(%HASH);
The effect of such code is that we now have a key/value pair stored
in an external structured file. We can later come along and reopen
the database as a hash again, and treat it as if it were a hash with
preexisting values:
dbmopen(%HASH, "/path/on/disk", 0644) || die;
foreach $key (sort keys %HASH) {
    print "$key => $HASH{$key}\n";
}
dbmclose(%HASH);
The interface was relatively simple, and I wrote quite a few programs
using this storage mechanism for persistence before Perl 5 came around.
However, this storage suffered some limitations: keys and values
had to fit under a fixed size limit, the structure could not safely
handle simultaneous multi-user reads and writes, and the resulting
data files were not necessarily portable to other machines (because
of incompatible libraries or byte orders).
When Perl 5 came along, new problems arose. No longer were we limited
to flat arrays and hashes; we could now have complex data types
with arbitrary structure. Luckily, the mechanism "behind" the dbmopen
was made available directly at the Perl code level, through the
tie operator, described in the perltie manpage. This
let others besides Larry Wall create "magical" hashes that could
perform actions on every fetch and store.
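To see what such a tie class looks like, here is a minimal sketch using
the core Tie::StdHash helper (the class name UpperKeys is made up for
illustration): a "magical" hash that folds its keys to uppercase on
every store and fetch.

```perl
package UpperKeys;
use Tie::Hash;                      # provides the Tie::StdHash base class
our @ISA = ('Tie::StdHash');

sub STORE {                         # called for every $hash{$key} = $value
    my ($self, $key, $value) = @_;
    $self->SUPER::STORE(uc $key, $value);
}

sub FETCH {                         # called for every read of $hash{$key}
    my ($self, $key) = @_;
    $self->SUPER::FETCH(uc $key);
}

package main;
tie my %hash, 'UpperKeys';
$hash{fred} = 205;
print $hash{FRED}, "\n";            # any capitalization finds the value
```

Every hash access on %hash is routed through the class's FETCH and STORE
methods, which is exactly the hook that persistence modules exploit.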
One early use of the tie mechanism was the MLDBM
package, which could take a complex value to be assigned for a given
key, and serialize it to a single string value, which could
then be stored much like before. For example:
use MLDBM;    # SDBM_File and Data::Dumper underneath, by default
use Fcntl;
tie my %hash, 'MLDBM', 'mldbm_file', O_CREAT|O_RDWR, 0640 or die;
$hash{my_array} = [1..5];
$hash{my_scores} = { fred => 205, barney => 195, dino => 30 };
As each complex data structure was stored into the hash, it was converted
into a string, using Data::Dumper, FreezeThaw, or Storable.
If a value were fetched, it would be converted back from a string
to the complex data structure. However, the resulting value was no
longer related to the tied hash. For example:
my $scores = $hash{my_scores};
$scores->{fred} = 215;
would no longer affect the stored data. Instead, we got warnings on
the MLDBM manpage to "not do this". Also, we still had all
the limitations of a standard dbmopen-style database: size
limits, lack of multi-user access, and non-portability.
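The safe pattern with MLDBM is a full read-modify-write of the
top-level value, per its documentation. A sketch (the filename is
illustrative):

```perl
use MLDBM;    # SDBM_File and Data::Dumper underneath, by default
use Fcntl;

tie my %hash, 'MLDBM', 'scores_db', O_CREAT|O_RDWR, 0640 or die "tie: $!";
$hash{my_scores} = { fred => 205 };

my $scores = $hash{my_scores};    # a disconnected copy comes out
$scores->{fred} = 215;            # modify the copy...
$hash{my_scores} = $scores;       # ...then store the whole value back

print $hash{my_scores}{fred}, "\n";   # now the update sticks
```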
One solution that I used on more than one occasion was to take
over the serialization myself, and to use Storable's retrieve
and nstore operations directly. My code would look something
like:
use Storable qw(nstore retrieve);
my $data = -e 'file' ? retrieve('file') : {};  # start fresh on first run
... perform operations with $data ...
nstore $data, 'file';   # nstore writes in network order for portability
Now my $data value could be an arbitrarily complex data structure,
and any changes I made would be completely reflected in the updated
file. The result was a Perl data structure that simply persisted.
It appears that the author of Tie::Persistent had the same
idea to use Storable on the entire top-level structure as
well, except with a tie wrapper instead of explicit fetch-store
phases, although I can't vouch for the code. In fact, I see a number
of CPAN entries that seem to have settled on similar mechanisms, but
none of them seems to have found the "holy grail" of object persistence:
making it as transparent as possible in a nice portable
(and ideally multi-user) manner.
That is, until I noticed DBM::Deep. According to the Changelog,
this distribution has been around for about two years (as I write
this), but only on the CPAN for a few months. From its own description:
A unique flat-file database module, written in pure perl. True
multi-level hash/array support (unlike MLDBM, which is faked), hybrid
OO / tie() interface, cross-platform FTPable files, and quite fast.
Can handle millions of keys and unlimited hash levels without significant
slow-down. Written from the ground-up in pure perl -- this is NOT
a wrapper around a C-based DBM. Out-of-the-box compatibility with
Unix, Mac OS X and Windows.
And with a promotional paragraph like that, I just had to look.
It looks simple enough. I merely say:
use DBM::Deep;
my $hash = DBM::Deep->new("foo.db");
$hash->{my_array} = [1..5];
$hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };
And that's it. In my next program:
use DBM::Deep;
my $hash = DBM::Deep->new("foo.db");
$hash->{my_scores}->{fred} = 215; # update score
And finally, retrieving it all:
use DBM::Deep;
my $hash = DBM::Deep->new("foo.db");
print join(", ",@{$hash->{my_array}}), "\n";
for (sort keys %{$hash->{my_scores}}) {
    print "$_ => $hash->{my_scores}->{$_}\n";
}
which prints:
1, 2, 3, 4, 5
barney => 195
dino => 30
fred => 215
And, in fact, that all just plain worked. I'm impressed. We've avoided
the MLDBM problem, because the update to the nested data stuck. And
there's no dependency on traditional DBM libraries here, so there are
no size limits, no byte-ordering issues, and not even the need for a
C compiler to install it.
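The description above also promises a hybrid OO / tie() interface. I
haven't exercised it beyond a skim of the docs, but per the DBM::Deep
manpage the same file can be reached through an ordinary tie:

```perl
use DBM::Deep;

# The tie() flavor of the interface, as shown in the DBM::Deep docs;
# %hash then acts like a plain hash backed by foo.db.
tie my %hash, 'DBM::Deep', 'foo.db';
$hash{my_scores}{fred} = 215;
```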
I'm told, although I haven't tested it, that I can also add:
$hash->lock;
... do some shared things ...
$hash->unlock;
and thereby access shared data in multiple processes.
There also seems to be some cool stuff around encrypting or compressing
the data as well. This definitely bears further examination.
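From a skim of the docs, that cool stuff is a set of filter hooks.
Assuming the set_filter interface described there and the CPAN
Compress::Zlib module (the filename is illustrative), transparent
compression might be sketched as:

```perl
use DBM::Deep;
use Compress::Zlib qw(compress uncompress);  # CPAN module, assumed installed

my $db = DBM::Deep->new("compressed.db");

# Values are squeezed on the way in and expanded on the way out;
# the rest of the program never sees the compressed form.
$db->set_filter('filter_store_value', sub { compress($_[0]) });
$db->set_filter('filter_fetch_value', sub { uncompress($_[0]) });

$db->{log} = "x" x 10_000;   # stored compressed on disk
```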
The limitations of DBM::Deep seem rather expected. Because
this is a single data file, it's being locked using flock,
so we can't persist data for multiple users across machines or reliably
across NFS. Also, we have to clean up after ourselves from
time to time by calling an optimize method -- otherwise,
unused space starts accumulating in the database.
One other recent addition to the CPAN also caught my eye -- OOPS.
Unlike DBM::Deep, OOPS uses a DBI-style database (currently
only compatible with PostgreSQL, MySQL, and SQLite) for its persistent
store. However, like DBM::Deep, once a connection is made,
you can do pretty much anything you want with the data structure, and
it gets reflected into the permanent storage. The database tables
are created on demand and managed transparently by the module.
The basic mode of OOPS looks like:
use OOPS;
transaction(sub {
    OOPS->initial_setup(
        dbi_dsn  => 'dbi:SQLite:/tmp/oops',
        username => undef,   # doesn't matter with SQLite
        password => undef,   # ditto
    ) unless -s "/tmp/oops";
    my $hash = OOPS->new(
        dbi_dsn  => 'dbi:SQLite:/tmp/oops',
        username => undef,   # doesn't matter with SQLite
        password => undef,   # ditto
    );
    $hash->{my_array} = [1..5];
    $hash->{my_scores} = { fred => 205, barney => 195, dino => 30 };
    $hash->{my_scores}->{fred} = 215; # update score
    $hash->commit;
});
The transaction wrapper forces this update to occur within
a single transaction. We fetch the data similarly:
use OOPS;
transaction(sub {
    my $hash = OOPS->new(
        dbi_dsn  => 'dbi:SQLite:/tmp/oops',
        username => undef,   # doesn't matter with SQLite
        password => undef,   # ditto
    );
    print join(", ", @{$hash->{my_array}}), "\n";
    for (sort keys %{$hash->{my_scores}}) {
        print "$_ => $hash->{my_scores}->{$_}\n";
    }
});
And, in fact, this retrieved exactly the values I had expected. I'll
be exploring these two modules in greater depth in the future, and
until then, enjoy!
Randal L. Schwartz is a two-decade veteran of the software
industry -- skilled in software design, system administration, security,
technical writing, and training. He has coauthored the "must-have"
standards: Programming Perl, Learning Perl, Learning
Perl for Win32 Systems, and Effective Perl Programming.
He's also a frequent contributor to the Perl newsgroups, and has
moderated comp.lang.perl.announce since its inception. Since 1985,
Randal has owned and operated Stonehenge Consulting Services, Inc.