vGen
Jim Pinson
Originally a tool of Webmasters, Hypertext Markup Language (HTML) is being used increasingly by system administrators for the production of documentation. Some UNIX-style operating systems such as FreeBSD distribute their manuals and FAQs in HTML format. These documents are easily read using text-based browsers such as Lynx. In such cases, there may not even be a Web server, because the files are read directly off the host file system. HTML has many advantages for the distribution of documentation. Most users already have access to Web browsers, so there is no extra expense in buying and distributing viewers for the documentation. Nor is there any requirement for expensive authoring tools, since HTML can be created with simple text editors or public domain editors.
The documents produced can be quite attractive, with all the hypertext links, graphics, and other features that users find familiar. No retraining of users is required. The process of creating HTML material is essentially the same for Web sites and file system-based distribution. In all cases, however, administrators need to produce documents quickly and free of errors. Additionally, Web sites often try to appeal to a wide range of users, some of whom prefer text-only versions, some who want only minimum graphics, and others who want dizzying, state-of-the-art presentations. As the size, complexity, and frequency of updating of all the versions increase, the difficulty in maintaining the site can skyrocket. The busy system administrator or Webmaster needs a tool to automate the process of generating multiple versions. This article describes a simple but powerful "version generator" called vGen.
Web Site Evolution The heart of most sites is the text-based content, which is then enhanced for particular audiences. The crude way to do this is to create a text-only version first, then modify it by hand to create other versions. The problem with this approach is that when the text changes, all the various versions of the site must be updated.
Likewise, the navigation buttons that are so popular (Home, Previous, Up, Next) must be created differently on text and graphic-based systems. Inserting a new link in the Web tree requires relinking the pages for each version of the site. Needless to say, this process is very prone to errors and omissions that could make the site less useful.
I experienced the problem of maintaining multiple versions of a Web site and navigation links while creating the original Web site for the Charles Darwin Research Station, Galapagos Islands (http://fcdarwin.org.ec). The original setup was not difficult. The site had a few basic pages, and included navigation links at the bottom of each page. To compensate for the slow Internet link to the Station, all the pages were text-only. Later, as the site grew, the difficulty of maintaining the navigation links increased. Every time we added a new page, it required rewriting the navigation links for the adjacent pages. Editing the pages by hand was tedious and prone to errors.
The problems increased when I set up a mirror site containing graphic images on a new host site with a higher-speed Internet connection. Now it became necessary to maintain both graphic and text versions of the Web site, even though much of the textual content was identical. Adding fill-in forms to the pages required yet more customization to refer to the correct CGI scripts for each version's host site. It was a very time-consuming and error-prone process to mark up every new document for each version of the Web site.
My experience is not unique. Any site that wants to appeal to a wide variety of users must keep multiple versions online. A well-designed Web site that wishes to reach the widest audience will usually offer the visitor a choice of formats, such as, text vs. graphics, or frames vs. non-frame versions.
Version Generator A wide variety of programs can be found on the Internet to help produce Web documents. In addition to HTML editors, there are specialized programs to help check the validity of HTML tags, and even programs to automatically build navigation links. Despite their usefulness, these programs could not help streamline the process of maintaining the multiple versions of the Web site.
My goal was to have only one version of each HTML document in a single source directory, and from that one document to generate multiple customized versions of the Web site. When a document required upgrading, I would need only to edit the single source file, and all the versions would automatically reflect the change. I also wanted the navigation links updated whenever I added a new page to the Web tree. I searched for a tool to fulfill all my requirements, and was surprised to discover that none was available.
I therefore designed Version Generator (vGen), a fairly simple program that would:
Automatically create navigation links between pages.
Create multiple versions of the Web site from a single set of source files.
Never alter the original files.
Run under different operating systems.
Developing Java Applications I chose to create this application using Java instead of the more traditional Perl. I chose Java for several reasons:
Java is highly portable, an important consideration for system administrators working in a heterogeneous environment.
Program development can take place in an Integrated Development Environment, complete with debugging tools (I used Symantec's Visual Cafe).
Java source code is more readable than Perl.
Java provides the option of distributing compiled bytecode instead of source code.
Over the years I have been frustrated by having useful utilities that would only run under a single OS. Using Java allowed me to start writing programs that could run on the systems I worked with the most, UNIX and Windows.
Running Java Applications To run vGen, you must have a java interpreter on your computer. This can be either Sun's original JDK, (Java Development Kit) or a Java interpreter from Symantec or some other provider. You can obtain Sun's JDK from http://java.sun.com.
The documentation that comes with your interpreter will provide a detailed explanation of how to run Java. Generally, running a Java program consists of two steps: compiling the source code into bytecode, which can be thought of as a compiled program for an imaginary computer; and running the Java interpreter, which uses native binary code to run the instructions it finds in the bytecode.
vGen comes with both source code and bytecode. Since the bytecode can be run on any system for which there is a Java interpreter, there is no need for you to use the source code unless you wish to modify the program.
vGen bytecode comes in two parts: the main program vgen.class in Listing 1, and the string class stringlib.class in Listing 2. You should install these two classes in the same directory. All the examples that follow assume you are running the precompiled bytecode.
Example 1: Building Navigation Links Building navigation links is a fairly simple use for vGen. This example creates a Web site that reviews books and movies. The Web site starts with a home page that has two sublevels, one for books and one for movies. Each sublevel has further divisions. Conceptually, the Web tree outline looks like this:
welcome.html
movies.html
adventure.html
western.html
romance.html
books.html
poetry.html
history.html
Science.html
The Home, Previous, Up, and Next navigation links reflect the structure of this outline in the Web pages. For example, a user browsing western.html would use the navigation guide [Up] to go to the superior level movies.html, whereas the navigation guides [Previous] and [Next] would lead to adjoining elements on the same level, adventure.html and romance.html, respectively.
To apply vGen to a document, the first step is to put a special marker in each Web page to show where the navigation links are to be placed. These markers all begin with $$$ followed by a command. The markers always occur on the left side of the page on a line by themselves.
vGen has two navigation markers, $$$text-link for text-based navigation, and $$$graphic-link for graphic buttons. The text-based navigation uses the words [Home][Previous][Up][Next] to aid the user, and is typically found in text-only Web pages. The graphic version uses graphic buttons, and is probably the most familiar to most people. Place the appropriate text or graphic marker in the <BODY> section of the source HTML code where you want the navigation guides to appear, usually at the bottom of the page.
Now, using the Web tree outline seen earlier, you can create the following configuration file for vGen:
[config]
sourcedirectory=/usr/local/html/movie-source
outputdirectory=/usr/local/html/movie-out
[webtree]
welcome.html
movies.html
adventure.html
western.html
romance.html
books.html
poetry.html
history.html
science.html
This configuration file is modeled after the Windows .ini files, and has two sections, [config], and [webtree]. The [config] section specifies the source and destination directories. When working on a Windows-based system, you could use DOS-style paths, such as sourcedirectory=c:\html\movie-source.
Note that the initial home page, welcome.html, is not indented. The movies.html and books.html entries are each indented one space to indicate that they are both one level down the tree. Sublevels beneath books and movies are indented two spaces to show that they are two levels down.
Assuming you saved the configuration file as reviews.cfg, you would run vGen using the command java vgen reviews.cfg. vGen will read the source HTML files from the specified directory, calculate the links, and insert the navigation markers into the output HTML files.
The graphic version of the navigation links requires GIF files to represent the buttons that will appear on the Web page. These GIF files must be manually placed in the output directory. vGen uses the GIF files btn-home.gif, btn-last.gif, btn-up.gif, and btn-next.gif. These buttons are included with the vGen source code on the Sys Admin ftp site, or you may create your own.
Example 2: Generating Multiple Versions of the Web Site Using source files, the previous example created a single output directory that contained copies of the original HTML files with added navigation markers. You could use this technique if you wanted only a single version of a site, and used vGen only to create navigation links.
But you may want to put output in two directories, one directory having only text and text-based navigation, and the other containing graphical images and graphical navigation buttons. First, place markers in the source HTML file to indicate what kind of material follows the marker. These markers are called mode markers and have the form $$$mode=name. The name portion of the marker can be anything, so I'll call my mode markers $$$mode=text and $$$mode=graphic. Once again, markers begin on the left side of the page and are on a line by themselves in the <BODY> section of the HTML document.
Mode markers differ from the HTML tags in that they do not come in pairs such as the <P> </P> paragraph tag of HTML. Mode markers tell vGen that everything following them is a certain mode (or version). The mode markers remain in effect until a new marker is encountered. We place $$$mode=text before any kind of material that will go into the text version of our Web site, and $$$mode=graphic before any material that goes into the graphic version.
The special predefined mode $$$mode=common is used to mark material that is shared by all versions of the document. The common mode is the initial default mode for all documents, which means that until a different mode is specified, all text in the source documents will be copied to the destination directory.
After marking up the source documents, the <BODY> section of the source file might look like this:
<BODY>
$$$mode=graphic
<IMG SRC="Welcome-banner.gif" ALIGN=RIGHT>
$$$mode=text
<P>Welcome to my movie Web site</P>
$$$mode=common
<P> Shared text goes here, and is printed in all versions.</P>
$$$mode=graphic
$$$graphic-link
$$$mode=text
$$$text-link
</BODY>
To tell vGen to process this document using only the text mode, change the [config] section of the configuration file to
[config]
mode=text
sourcedirectory=/usr/local/html/movie-source
outputdirectory=/usr/local/html/movie-out-text
vGen runs this new configuration file with the command
java vgen reviews.cfg
vGen starts out using the default common mode, outputting all the text. When vGen encounters the $$$mode=graphic marker, it checks to see whether graphic mode is defined. When it sees that graphic mode is not defined, it shuts off output, skipping the graphic welcome-banner.gif.
A few lines down, vGen encounters the $$$mode=text marker, and starts writing output again, copying the text-based welcome message. The $$$mode=common mode is always defined, so the output continues, printing the shared portion of the Web page. At the end of the document, vGen skips the $$$graphic-link marker, and instead processes the $$$text-link flag, which it converts to the text-based navigation links.
Changing reviews.cfg for each version of the Web site is not very efficient. And, maintaining different configuration files for each version would violate the whole concept of vGen, since adding new files to the [webtree] section would require updating each configuration file.
Instead, specify the mode and output directory on the command line, and leave only common information in the configuration file. The [webtree] entries are the same for all versions, leaving the [webtree] section unmodified. Remove the outputdirectory, because output is sent to two different directories. The new [config] section would look like this:
[config]
sourcedirectory=/usr/local/html/movie-source
vGen now runs with the command:
java vgen review.cfg outputdirectory=/usr/local/html/ move-out-text mode=text
You could issue another command to generate the graphic version, but it's rather slow to issue multiple commands, so you are better off writing a script to generate multiple versions of the Web site. A script to generate both text and graphic versions is:
java vgen reviews.cfg outputdirectory=/usr/local/html/ \
movie-out-text mode=text
java vgen reviews.cfg outputdirectory=/usr/local/html/ \
movie-out-graphic mode=graphic
There is no real limit to the number of modes you can use in a given run. The previous example had separate runs for graphic and text versions, but you could generate documents with multiple modes. For example, if you had a primary site and a mirror site, each with different content, you might have a run with mode=graphic, mode=mirror, assuming you marked up your source document appropriately. Extra modes can be specified on the command line by separating them with spaces or by placing them on individual lines in the [config] section of the configuration file.
Other Uses for vGen The examples given here have all been for HTML documents, but vGen can process any text-based documents. For example, a plain text report could be marked up with modes to represent public or confidential, and vGen could be used to create multiple reports.
About the Author
Jim Pinson is a Systems Programmer for Florida's Agency for Health Care Administration. A former resident of the Galapagos Islands, he continues to remotely manage the Charles Darwin Research Station's UNIX server and Web site.
|