jul2005.tar

Creating Informix HPL Jobs in Scripts

Ed Schaefer and John Spurgeon

If you are administering Informix OnLine databases containing very large tables, the High Performance Loader (HPL) bundled with the engine can be a godsend. Loading or unloading a very large table using the HPL is usually much faster than executing the Informix load or unload commands from a utility such as isql or dbaccess. The HPL is also faster than a typical application program that uses embedded SQL.

The procedure recommended by the Informix manuals for creating HPL jobs involves manual interaction with a GUI utility. We present an alternative to this approach in the form of a shell script called makehpl.ss.

Anatomy of the HPL Utility

According to the Guide to the High-Performance Loader, there are three major parts to the High Performance Loader:

1. The onpload utility executes the load or unload jobs. In contrast to building a job, it's easy to run a job from a script or at the command-line using the onpload command.

2. The ipload utility is the GUI program used to "prepare the parameters" that the onpload utility uses when executing a load or unload job. In other words, the ipload utility is used to build HPL jobs.

3. The onpload database stores the parameters configured via the ipload utility. If the database doesn't exist, the ipload utility will create it. When executing a job, the onpload utility retrieves data from the database.

Key Tables in the Onpload Database

To build an HPL job, makehpl.ss populates the session, device, and query tables in the onpload database. The session table controls the parameters that onpload uses to invoke a load or unload job.

The device table defines the elements of a "device array". Three device types are supported: tape devices, file names, and pipe commands. Our script supports only file names. The query table stores the queries that are used for unloading data from an Informix database.

Template Files and Command-Line Options

When using makehpl.ss to build a job, the session, device, and query tables are populated using three template files: sessiontemplate.txt, devicetemplate.txt, and querytemplate.txt. The template files are pipe-symbol delimited files. A sed command replaces variables in the template files with values passed to the script via command-line arguments. The Informix isql/dbaccess utility uses the output from the sed command to populate the three key tables in the onpload database.

The variables in the template files are:

TABLE_NAME: This is a name of a table to be loaded or unloaded. By convention, it is also the name used to identify a particular job.

DATABASE: This is the name of the database where the TABLE_NAME resides.

SELECT_STMT: This is a select statement created by TABLE_NAME, select * from TABLE_NAME, used to identify the rows to be loaded or unloaded. An optional "where clause" passed as a command-line argument can be appended to the select statement.

DEVICE_NUMROWS: This argument is the number of data files used during the load and unload process. The default is one; more on this below.

The session template file contains the following two records:

U|A|N|0|TABLE_NAME| ||TABLE_NAME|TABLE_NAME||DATABASE||||/tmp/ 
  TABLE_NAME.rej|/tmp/TABLE_NAME.log|0|0|0|0|0|0|0|4|0|0|0|0|2|1|0|0|0|0|
U|A|N|0|TABLE_NAME| ||TABLE_NAME|TABLE_NAME||DATABASE||||/tmp/ 
  TABLE_NAME.rej|/tmp/TABLE_NAME.log|0|0|0|0|0|0|0|4|0|0|0|0|2|2|0|0|0|0|

The reject file and the log files are named /tmp/TABLE_NAME.rej and /tmp/TABLE_NAME.log, respectively.

The device template file contains the following record:

TABLE_NAME|DEVICE_NUMROWS|FILE| 
  DIRPATHTABLE_NAME_DEVICE_NUMROWS.unl|0|0||N||

A while loop creates the number of records specified by the value of DEVICE_NUMROWS. Each record has a unique file name created by the script. For example, if the DEVICE_NUMROWS is 3, the device file might look like this:

event_table|1|FILE|/tmp/event_table_1.unl|0|0||N||
event_table|2|FILE|/tmp/event_table_2.unl|0|0||N||
event_table|3|FILE|/tmp/event_table_3.unl|0|0||N||

When executing a load or unload, data may be unloaded to or from the filenames listed in column 4. More device files present an opportunity for parallel processing. However, the HPL utility itself determines whether to use all of the available device files.

The following is the contents of the query template file:

0|0|TABLE_NAME|DATABASE||N|SELECT_STATEMENT|

Installing the Script

The variable PROGRAM_LOCATION identifies the directory where the template files and the resulting load files are located. Be sure to set the value of this variable to the appropriate path when you install the files (see "What's in the tarball").

The DIRPATH variable points to the directory where the load/unload file resides. Obviously, the file system should be large enough to hold the file. By default, DIRPATH is set to /tmp:

DIRPATH="\/tmp\/"   # directory to the file

Since the variable is used to build a sed command, it's important to escape the slashes.

Other variables that might be different in your environment are the Informix home directory (INFORMIXDIR), the PATH, and the INTERFACE_CMD variable if you are using the Informix isql utility.

What the Script Does

The following is the program design logic for makehpl.ss:

Create the session, device, and query load files from the previously described templates.
Using the Informix database access utility, delete the job from the session, device, and query tables in the onpload database.
Using the Informix database access utility, load the session, device, and query load files into the onpload database.

Note these three issues:

1. Only effective user "informix" can execute this script.

2. Each file loads individually instead of together so we can readily identify where errors occur.

3. If you're an experienced SQL programmer, you might be wondering why we use the Informix load command instead of a more generic SQL insert statement. The query table stores the SQL statement in a binary large object (BLOB), data type, and the insert statement executed from the Informix access utilities doesn't support BLOB data types.

Executing the makehpl.ss Script

The makehpl.ss command-line options are:

-d -- Database that contains the table to be unloaded/loaded

-t -- Table name for the HPL job

-w -- The optional where clause

-n -- The optional number of files in the device array

Generally, the makehpl.ss script is called from another script. For example, assume we want to create an HPL job for the event table located in the testdb database using three files for output devices. Remember that our convention is to name the job after the table so the following command creates the HPL job event_table acting on the table of the same name:

makehpl.ss -d testdb -t event_table -n 3  2> /tmp/err.log > /tmp/test.log

In the tarball, the above command exists as the testmakehpl.ss shell script.

The following example includes an optional where clause:

makehpl.ss -d testdb -t event_table -n 3   \
  -w "where the_part = \"109868-X01\"" 2> /tmp/err.log > /tmp/test.log

Obviously, it's up to the calling program to make sure the double quotes are escaped properly.

Executing the onpload Utility

Once the HPL job exits, simply execute the appropriate onpload command. This stub performs the event_table unload:

# unload
JOB=event_table
onpload -j $JOB -fu

while this stub executes the load:

# load
JOB=event_table
onpload -j $JOB -fl

What's in the tarball?

In a directory named hpl, we've placed the makehpl.ss and testmakehpl.ss scripts together with the load and unload stubs previously mentioned, and the three template files necessary to build the query, device, and session tables. We've also included the onpload.exp and testdb.exp directories. These directories contain the data files and SQL scripts necessary to create the onpload and testdb databases using the Informix dbimport utility.

A discussion of dbimport is beyond the scope of this article, but if your current working directory is the hpl directory, the following command creates the onpload database if it doesn't exist:

dbimport onpload

Since the makehpl.ss script was developed under Informix Online 7.31, the onpload.exp export was created with Online 7.31. Importing this database into the new Online 9.x requires running a conversion script that Informix bundles with the version 9 engine. For example, as user "informix", to upgrade the onpload database from 7.31 to 9.4, execute:

$INFORMIXDIR/etc/conv/conpload.sh  7.31  9.40

Since the actual structure of the device, query, and session tables doesn't change between online versions, we recommend using your own instance of the onpload database created with the ipload GUI interface; this eliminates the need for running the conversion script.

Conclusion

Informix recommends changing the onpload database only with the ipload utility. But sometimes you may be in a situation where you want to automate the process of creating an HPL job. In that case, our makehpl.ss script may be the solution you're looking for if you're willing to bend the rules.

We thank Rainer Raab for testing this script with Informix Online 9.4.

References

Guide to the High-Performance Loader, Version 7.3. 1998. Informix Press: Menlo Park, CA

John Spurgeon is a software developer and systems administrator for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. He is currently training for the Furnace Creek 508 and still enjoys turfgrass management, triathlons, and spending time with his family.

Ed Schaefer is a frequent contributor to Sys Admin. He is a software developer and DBA for Intel's Factory Integrated Information Systems, FIIS, in Aloha, Oregon. Ed also edits the monthly Shell Corner column on UnixReview.com. He can be reached at: shellcorner@comcast.net.