Creating
Informix HPL Jobs in Scripts
Ed Schaefer and John Spurgeon
If you are administering Informix OnLine databases containing
very large tables, the High Performance Loader (HPL) bundled with
the engine can be a godsend. Loading or unloading a very large table
using the HPL is usually much faster than executing the Informix
load or unload commands from a utility such as isql or dbaccess.
The HPL is also faster than a typical application program that uses
embedded SQL.
The procedure recommended by the Informix manuals for creating
HPL jobs involves manual interaction with a GUI utility. We present
an alternative to this approach in the form of a shell script called
makehpl.ss.
Anatomy of the HPL Utility
According to the Guide to the High-Performance Loader, there are
three major parts to the High Performance Loader:
1. The onpload utility executes the load or unload jobs. In contrast
to building a job, it's easy to run a job from a script or
at the command-line using the onpload command.
2. The ipload utility is the GUI program used to "prepare
the parameters" that the onpload utility uses when executing
a load or unload job. In other words, the ipload utility is used
to build HPL jobs.
3. The onpload database stores the parameters configured via the
ipload utility. If the database doesn't exist, the ipload utility
will create it. When executing a job, the onpload utility retrieves
data from the database.
Key Tables in the Onpload Database
To build an HPL job, makehpl.ss populates the session, device,
and query tables in the onpload database. The session table controls
the parameters that onpload uses to invoke a load or unload job.
The device table defines the elements of a "device array".
Three device types are supported: tape devices, file names, and
pipe commands. Our script supports only file names. The query table
stores the queries that are used for unloading data from an Informix
database.
Template Files and Command-Line Options
When using makehpl.ss to build a job, the session, device, and
query tables are populated using three template files: sessiontemplate.txt,
devicetemplate.txt, and querytemplate.txt. The template files are
pipe-symbol delimited files. A sed command replaces variables
in the template files with values passed to the script via command-line
arguments. The Informix isql/dbaccess utility uses the output from
the sed command to populate the three key tables in the onpload
database.
The variables in the template files are:
TABLE_NAME: This is a name of a table to be loaded or unloaded.
By convention, it is also the name used to identify a particular
job.
DATABASE: This is the name of the database where the TABLE_NAME
resides.
SELECT_STMT: This is a select statement created by TABLE_NAME,
select * from TABLE_NAME, used to identify the rows to be
loaded or unloaded. An optional "where clause" passed
as a command-line argument can be appended to the select statement.
DEVICE_NUMROWS: This argument is the number of data files used
during the load and unload process. The default is one; more on
this below.
The session template file contains the following two records:
U|A|N|0|TABLE_NAME| ||TABLE_NAME|TABLE_NAME||DATABASE||||/tmp/
TABLE_NAME.rej|/tmp/TABLE_NAME.log|0|0|0|0|0|0|0|4|0|0|0|0|2|1|0|0|0|0|
U|A|N|0|TABLE_NAME| ||TABLE_NAME|TABLE_NAME||DATABASE||||/tmp/
TABLE_NAME.rej|/tmp/TABLE_NAME.log|0|0|0|0|0|0|0|4|0|0|0|0|2|2|0|0|0|0|
The reject file and the log files are named /tmp/TABLE_NAME.rej and
/tmp/TABLE_NAME.log, respectively.
The device template file contains the following record:
TABLE_NAME|DEVICE_NUMROWS|FILE|
DIRPATHTABLE_NAME_DEVICE_NUMROWS.unl|0|0||N||
A while loop creates the number of records specified by the value
of DEVICE_NUMROWS. Each record has a unique file name created by the
script. For example, if the DEVICE_NUMROWS is 3, the device file might
look like this:
event_table|1|FILE|/tmp/event_table_1.unl|0|0||N||
event_table|2|FILE|/tmp/event_table_2.unl|0|0||N||
event_table|3|FILE|/tmp/event_table_3.unl|0|0||N||
When executing a load or unload, data may be unloaded to or from the
filenames listed in column 4. More device files present an opportunity
for parallel processing. However, the HPL utility itself determines
whether to use all of the available device files.
The following is the contents of the query template file:
0|0|TABLE_NAME|DATABASE||N|SELECT_STATEMENT|
Installing the Script
The variable PROGRAM_LOCATION identifies the directory where the
template files and the resulting load files are located. Be sure
to set the value of this variable to the appropriate path when you
install the files (see "What's in the tarball").
The DIRPATH variable points to the directory where the load/unload
file resides. Obviously, the file system should be large enough
to hold the file. By default, DIRPATH is set to /tmp:
DIRPATH="\/tmp\/" # directory to the file
Since the variable is used to build a sed command, it's
important to escape the slashes.
Other variables that might be different in your environment are
the Informix home directory (INFORMIXDIR), the PATH, and the INTERFACE_CMD
variable if you are using the Informix isql utility.
What the Script Does
The following is the program design logic for makehpl.ss:
- Create the session, device, and query load files from the previously
described templates.
- Using the Informix database access utility, delete the job
from the session, device, and query tables in the onpload database.
- Using the Informix database access utility, load the session,
device, and query load files into the onpload database.
Note these three issues:
1. Only effective user "informix" can execute this script.
2. Each file loads individually instead of together so we can
readily identify where errors occur.
3. If you're an experienced SQL programmer, you might be
wondering why we use the Informix load command instead of
a more generic SQL insert statement. The query table stores the
SQL statement in a binary large object (BLOB), data type, and the
insert statement executed from the Informix access utilities doesn't
support BLOB data types.
Executing the makehpl.ss Script
The makehpl.ss command-line options are:
-d -- Database that contains the table to be unloaded/loaded
-t -- Table name for the HPL job
-w -- The optional where clause
-n -- The optional number of files in the device array
Generally, the makehpl.ss script is called from another script.
For example, assume we want to create an HPL job for the event table
located in the testdb database using three files for output devices.
Remember that our convention is to name the job after the table
so the following command creates the HPL job event_table acting
on the table of the same name:
makehpl.ss -d testdb -t event_table -n 3 2> /tmp/err.log > /tmp/test.log
In the tarball, the above command exists as the testmakehpl.ss shell
script.
The following example includes an optional where clause:
makehpl.ss -d testdb -t event_table -n 3 \
-w "where the_part = \"109868-X01\"" 2> /tmp/err.log > /tmp/test.log
Obviously, it's up to the calling program to make sure the double
quotes are escaped properly.
Executing the onpload Utility
Once the HPL job exits, simply execute the appropriate onpload
command. This stub performs the event_table unload:
# unload
JOB=event_table
onpload -j $JOB -fu
while this stub executes the load:
# load
JOB=event_table
onpload -j $JOB -fl
What's in the tarball?
In a directory named hpl, we've placed the makehpl.ss and
testmakehpl.ss scripts together with the load and unload stubs previously
mentioned, and the three template files necessary to build the query,
device, and session tables. We've also included the onpload.exp
and testdb.exp directories. These directories contain the data files
and SQL scripts necessary to create the onpload and testdb databases
using the Informix dbimport utility.
A discussion of dbimport is beyond the scope of this article,
but if your current working directory is the hpl directory, the
following command creates the onpload database if it doesn't
exist:
dbimport onpload
Since the makehpl.ss script was developed under Informix Online 7.31,
the onpload.exp export was created with Online 7.31. Importing this
database into the new Online 9.x requires running a conversion script
that Informix bundles with the version 9 engine. For example, as user
"informix", to upgrade the onpload database from 7.31 to
9.4, execute:
$INFORMIXDIR/etc/conv/conpload.sh 7.31 9.40
Since the actual structure of the device, query, and session tables
doesn't change between online versions, we recommend using your
own instance of the onpload database created with the ipload GUI interface;
this eliminates the need for running the conversion script.
Conclusion
Informix recommends changing the onpload database only with the
ipload utility. But sometimes you may be in a situation where you
want to automate the process of creating an HPL job. In that case,
our makehpl.ss script may be the solution you're looking for
if you're willing to bend the rules.
We thank Rainer Raab for testing this script with Informix Online
9.4.
References
Guide to the High-Performance Loader, Version 7.3. 1998. Informix
Press: Menlo Park, CA
John Spurgeon is a software developer and systems administrator
for Intel's Factory Integrated Information Systems, FIIS, in
Aloha, Oregon. He is currently training for the Furnace Creek 508
and still enjoys turfgrass management, triathlons, and spending
time with his family.
Ed Schaefer is a frequent contributor to Sys Admin. He is a
software developer and DBA for Intel's Factory Integrated Information
Systems, FIIS, in Aloha, Oregon. Ed also edits the monthly Shell
Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net. |