A User-Friendly Web Site Update Tool

Lisa Hamet Bernard

Because I come from the command-line world, I am amazed at the level of dependence on GUI-based administration tools that has evolved in recent years. These interfaces have become so prevalent that many IT staffers with system management responsibilities are loath to use any package without a point-and-click front end. I recently encountered such a scenario with one of my customers.

The Challenge

The organization's Web environment consists of a local development Web server and two load-balanced, ISP-hosted production Web servers, all of which are Unix machines. However, all staff members responsible for Web site content, including the Webmaster, are exclusively Windows and Mac users who are not at all comfortable working in the traditional Unix command-line environment.

I needed to enable the Webmaster (a non-root user) to list and selectively deploy files from the development Web server to the production Web servers with the ease of a simple point-and-click interface on his PC. The obvious Unix solution was to use the rsync open source data synchronization tool for its flexibility, reliability, and efficiency. My task was to build a Web interface around it. In this article, I will describe the application I developed, including configuration of the underlying open source tools used.

The Environment

The development and production Web servers are Sun servers running Solaris 8 with Apache 1.3.33. The development Web server is firewalled from external access, and two dedicated T1 lines connect the customer site to the ISP's data center where the production Web servers are hosted. Staffers create and update Web pages using various Windows- and Mac-based packages. The Webmaster regularly reviews all new files contributed to the development Web server and deploys them to production. Previously, this was an inefficient, error-prone process.

The Webmaster, with only basic knowledge and a low comfort level in a Unix environment, downloaded all of the files on the development Web server to his PC, then manually renamed old versions of the files on the production site and ftp'd the new versions from his PC. He often did this multiple times a day. My proposal to create a Web-based tool to automate and harden this process was enthusiastically received by both the Webmaster and his management.

Requirements

The application had to satisfy the following requirements:

1. Display the list of files on the development Web site that are new or modified relative to the files on the production Web site.

2. Preserve copies of previous versions of production Web site files on the production servers.

3. Run as the Apache user, nobody, while preserving file attributes including ownership, which is usually user webmaster.

4. Securely transfer files between the development and production servers (i.e., no clear-text passwords).

5. Support access restriction.

6. Be easily supportable by in-house Unix admin staff long after I'm gone!

Design

There are three parts to the code: a Web page front end (deployprod.html) and two Perl CGI scripts (do_rsync and move2prod) that do the work.

The Web page

The Web page, deployprod.html (Listing 1), is a simple HTML form containing only a submit button to launch the first CGI script, do_rsync (Listing 2). See Figure 1 for a view of the page generated. Knowing that this is just an administrative utility for a few users, I left this page rather bare. This file is located on the development Web site in the /admin directory. The /admin directory and its contents are password-protected using Apache digest authentication to prevent unauthorized access. With digest authentication, the password itself never crosses the network; the browser sends an MD5 digest computed from the user name, realm, password, and a server-supplied nonce. Apache is built with the auth_digest module to support this feature. See the Apache documentation for details:

http://httpd.apache.org/docs/howto/auth.html

When a user clicks the submit button on the Web page (labeled "Click to generate list of candidate files"), do_rsync is executed to begin the process.

Listing Candidate Files

The function of the do_rsync script (Listing 2) is to generate a list of candidate files to be deployed and display that list in HTML. To generate the list, rsync (http://rsync.samba.org) operates in dry-run mode between the development Web site directory tree and the production Web site directory tree. Because rsync is operating in dry-run mode, no files are actually copied, but a list of "candidate" files is generated. A file is considered a candidate if (a) it exists on the development site but not on the production site, or (b) the file on the development site differs from the file with the same name on the production site, and the file on the development site is newer.

Because rsync is executed via a CGI script to a non-local server, I took some security precautions. First, to encrypt the communication, rsync is configured to use OpenSSH as its transport mechanism, with pre-shared keys for authentication. Therefore, a password does not have to be provided when the session is initiated. This strategy removes the inherent security risk of a /etc/hosts.equiv or $HOME/.rhosts file.

For details on OpenSSH and pre-shared keys, review Matt Lesko's article, "Installing and Configuring OpenSSH", Sys Admin, October 2000 (http://www.samag.com/documents/s=1160/sam0010a/) and refer to "Configuring the Supporting Tools" later in this article. Furthermore, rsync is executed via the open source sudo utility so that it runs as user webmaster instead of as the Apache user nobody. User nobody is considered quite insecure, so opening an unchallenged connection for it between remote servers would introduce a significant risk. Sudo is also described in the "Configuring the Supporting Tools" section.

The other arguments supplied in the rsync call are recursive (to compare the entire directory tree), update (so newer files on the production side are not overwritten), archive (for consistency with the move2prod code, when rsync is not running in dry-run mode), and exclude-from=/usr/local/apache/cgi-bin/exclude-list.txt (to intentionally ignore the listed specific files or directories that should never be candidates). For a detailed discussion of the rsync utility, see Chris Hare's two-part article, "Keeping Data in Sync::rsync", Sys Admin, June/July 2004 (http://www.samag.com/documents/s=9171/sam0406c/ and http://www.samag.com/documents/s=9216/sam0407b/). The results of the rsync call are written to a temporary file, /tmp/rsynclist.txt.
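The dry-run call described above can be sketched in shell as follows. This is only an illustration, not the code from Listing 2: the host name and document root paths are hypothetical, and --verbose is an assumption, added because rsync needs it to print the candidate file names. The function prints the command line rather than running it, so it can be reviewed first.

```shell
#!/bin/sh
# Sketch only: host name and directory paths below are hypothetical.
DEV_ROOT="/usr/local/apache/htdocs/"        # assumed development docroot
PROD_HOST="prod1.example.com"               # assumed production host
PROD_ROOT="/usr/local/apache/htdocs/"       # assumed production docroot
EXCLUDES="/usr/local/apache/cgi-bin/exclude-list.txt"   # path from the article
RSYNC="/usr/local/bin/rsync"                # path from the sudoers entry

# Print the dry-run command line instead of executing it.
# --verbose is an assumption (not mentioned in the article).
dry_run_cmd() {
    printf '%s ' sudo -u webmaster "$RSYNC" \
        --dry-run --verbose --recursive --update --archive \
        --rsh=ssh "--exclude-from=$EXCLUDES" \
        "$DEV_ROOT" "$PROD_HOST:$PROD_ROOT"
    printf '\n'
}

dry_run_cmd
```

In a real script, the command would be executed with its output redirected to /tmp/rsynclist.txt.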

The contents of rsynclist.txt are displayed in a Web form consisting of a table with a checkbox for each named file. Because I was tinkering with the look of the HTML table when I wrote this code, I generated the HTML code with print statements instead of using a Perl module. I may eventually redo it using Perl CGI module functions to simplify the code. In any case, after stripping off the rsync informational output, each line of the rsync output is printed in the table, and the value of the associated checkbox is set to the variable $counter. This value is just the line number from /tmp/rsynclist.txt.
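To illustrate the filtering and numbering step, here is a minimal shell sketch (the article's script does this in Perl). The patterns for rsync's informational lines are assumptions about its verbose output and may need adjusting for a given rsync version; the function name is hypothetical. Each surviving line is prefixed with its line number in the raw input, which plays the role of $counter.

```shell
#!/bin/sh
# Sketch only: the informational-line patterns are assumptions about
# rsync's verbose output; list_candidates is a hypothetical name.
# Reads raw rsync output on stdin and prints "LINENO<TAB>PATH" for each
# candidate, where LINENO is the line's position in the raw input.
list_candidates() {
    awk '
        /^(building file list|sending incremental file list|total size is )/ { next }
        /^(wrote|sent) .* bytes/ { next }
        /^$/ { next }
        { printf "%d\t%s\n", NR, $0 }
    '
}
```

Usage might look like `list_candidates < /tmp/rsynclist.txt`, with each output line becoming one table row and checkbox.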

I chose this design because the file names (including path) might be quite long, and the number of files checked might be significant. I wanted to limit how much data is potentially passed between CGI scripts, so a list of indexes was a good solution. The submit button for this form, labeled "Move selected files", passes the list of checked checkbox values to the move2prod CGI script. See Figure 2 for a sample page generated.

Completing the Process

The move2prod script (Listing 3) performs the rsync of the selected files. I used the param function of the Perl CGI module to retrieve the index list passed to it from do_rsync. For each index, the corresponding filename from /tmp/rsynclist.txt is read. This filename is written to a second temporary file, /tmp/copylist.txt, which is later passed to rsync as its --files-from list.
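The index-to-filename lookup can be sketched in shell as shown here. This is an illustration rather than the Perl in Listing 3; the function name is hypothetical, and the check that each index is purely numeric is an added precaution, since the values arrive from a Web form.

```shell
#!/bin/sh
# Sketch only: build_copylist is a hypothetical name.
# Usage: build_copylist LISTFILE INDEX...
# Prints line INDEX of LISTFILE for every numeric INDEX, in the order given;
# anything that is not a plain number is reported on stderr and skipped.
build_copylist() {
    list_file=$1
    shift
    for idx in "$@"; do
        case $idx in
            ''|*[!0-9]*)
                echo "ignoring invalid index: $idx" >&2
                continue ;;
        esac
        awk -v n="$idx" 'NR == n { print; exit }' "$list_file"
    done
}
```

For example, `build_copylist /tmp/rsynclist.txt 2 3 > /tmp/copylist.txt` would write the second and third lines of the candidate list to the copy list.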

Now we're ready for the rsync operations, first to one production Web server and then to the other. The command is identical to the rsync run in do_rsync (minus the --dry-run option), with the following exceptions. First, --files-from=/tmp/copylist.txt replaces the --exclude-from= argument, so rsync targets only those files selected by the Webmaster. Second, the combination of --archive and --relative is needed to force rsync to automatically create parent directories on the target server when necessary. Note that the pathnames generated from the initial rsync dry run are relative to the starting directory name passed.

Finally, to meet the requirement of preserving previous copies of production files that are updated, the combination of --backup and --suffix=.timestamp is used. These options cue rsync to create a backup copy of each previously existing file on the target server before overwriting. The name of the backup copy is filename.timestamp (in mmddyyHHMM format).
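Putting these options together, the deployment calls might look like the sketch below. Host names and document root paths are hypothetical, and --verbose is again an assumption. The backup suffix is computed once so that both production servers receive the same timestamp, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Sketch only: host names and docroot paths are hypothetical.
RSYNC="/usr/local/bin/rsync"
DEV_ROOT="/usr/local/apache/htdocs/"
PROD_HOSTS="prod1.example.com prod2.example.com"
PROD_ROOT="/usr/local/apache/htdocs/"

# Backup suffix in the article's mmddyyHHMM format, with a leading dot.
backup_suffix() {
    date +.%m%d%y%H%M
}

# Print one rsync command per production server; nothing is executed.
deploy_cmds() {
    suffix=$(backup_suffix)
    for host in $PROD_HOSTS; do
        echo "sudo -u webmaster $RSYNC --verbose --recursive --update" \
             "--archive --relative --rsh=ssh --backup --suffix=$suffix" \
             "--files-from=/tmp/copylist.txt $DEV_ROOT $host:$PROD_ROOT"
    done
}
```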

The move2prod script ends with some housekeeping. Both temp files are deleted, and the user sees a Web page with the following: "rsync operation complete. N files deployed to production." N is replaced appropriately, of course. Again, this application is for just a few internal admins, so I didn't take the time to embellish the page. As an aside, a simple cron job runs nightly on the production servers to delete those time-stamped file copies that are more than 30 days old.
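The article does not show the cleanup job, so the following is only a guess at its shape. The function name is hypothetical, and the age test assumes file modification time. Because a backup made by renaming may keep the original file's modification time, a production version might prefer to compare the timestamp embedded in the suffix.

```shell
#!/bin/sh
# Sketch only: purge_old_backups is a hypothetical name, and using
# modification time as the age test is an assumption.
# Deletes files under DIR whose names end in a 10-digit .mmddyyHHMM
# suffix and whose modification time is more than 30 days in the past.
purge_old_backups() {
    find "$1" -type f \
        -name '*.[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]' \
        -mtime +30 -exec rm -f {} +
}
```

A nightly crontab entry on each production server could then call a script containing `purge_old_backups` with the document root as its argument.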

Configuring the Supporting Tools

In my execution of rsync, I used two other open source tools: OpenSSH and sudo. OpenSSH is used to create an encrypted tunnel between two servers; in this case, it is between the development Web server and each production Web server. See the OpenSSH site for details:

http://www.openssh.com

Sudo's authors describe the tool this way: "Sudo (superuser do) allows a system administrator to give certain users (or groups of users) the ability to run some (or all) commands as root or another user while logging the commands and arguments." See the sudo Web site for details:

http://www.courtesan.com/sudo

For Solaris, all three tools (including rsync) are also available at:

http://www.sunfreeware.com

As with rsync, installation of these tools is beyond the scope of this article. However, beyond that step, I will describe the configuration necessary for my application.

OpenSSH allows several authentication mechanisms. In this case, rsync relies on pre-shared keys. Keys must be generated for the server (host) itself and for user webmaster on all three Web servers using ssh-keygen. ssh-keygen generates a public and a private key for the type of encryption specified. When creating these key pairs, be sure to leave the passphrase blank. The host keys should be installed in /etc or whichever path you specify for HostKey in the sshd_config file. User keys should be installed in $HOME/.ssh.

To pre-share the keys, copy user webmaster's public key (keyname.pub) from the source machine (development Web server) into webmaster's $HOME/.ssh/authorized_keys file on the target machines (production Web servers). Also, make sure that the Webmaster has manually connected from the development Web server to both production Web servers using ssh at least once, so that both production Web servers' public keys are added to webmaster's $HOME/.ssh/known_hosts file on the development Web server. Now the Webmaster, and more specifically rsync executing as user webmaster, can ssh between the servers without being prompted for a password.
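As a rough sketch of the user-key steps, assuming an RSA key stored in the default id_rsa file (the article does not name the key type or file), the commands could be wrapped as shown. Both function names are hypothetical. The first would be run once as webmaster on the development server; the second would be run on each production server after the .pub file has been copied there.

```shell
#!/bin/sh
# Sketch only: key type, key file name, and function names are assumptions.

# Generate a key pair with an empty passphrase for the current user.
# Defined here for reference; not invoked automatically.
make_user_key() {
    ssh-keygen -t rsa -N "" -f "$HOME/.ssh/id_rsa"
}

# Append a public key to an authorized_keys file unless that exact line
# is already present, then restrict the file to its owner.
# Usage: install_pubkey PUBKEY_FILE AUTHORIZED_KEYS_FILE
install_pubkey() {
    pub=$1
    auth=$2
    touch "$auth"
    grep -qxF -f "$pub" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"
}
```

Checking for an existing entry before appending keeps the file free of duplicates if the step is repeated.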

My application uses sudo to execute the rsync command as user webmaster inside the CGI scripts that are executing as user nobody. Specific sudo privileges are granted via the sudoers file. The man page for sudoers is quite extensive, so I defer to that for explanation. The lines required by this application are:

Runas_Alias     WEBADMIN = webmaster
Cmnd_Alias      RSYNC = /usr/local/bin/rsync
nobody  ALL=(WEBADMIN)        NOPASSWD: RSYNC

For the same reason as that behind the use of OpenSSH pre-shared keys, rsync must run without first prompting for a password, hence the NOPASSWD keyword.

Freeware Versions

At the time of this writing, the following open source code versions are being used:

Perl 5.8.3
OpenSSH 3.5p1
Rsync 2.6.2
Apache 1.3.33
Sudo 1.6.8p1

Next Steps

I developed this application to address the immediate needs of a specific user (the Webmaster) in a specific situation (selectively deploying Web files from the organization's development Web server to their production Web servers). I accomplished that task. However, as always, the list of future planned enhancements continues to grow. Of highest priority is the addition of a locking mechanism. This would enable multiple users to freely deploy files without prior coordination amongst themselves and prevent the Webmaster from inadvertently launching a second deployment before an initial one has completed.
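One common way to build such a lock in a script is to use a directory as the lock, because mkdir either creates it or fails. The sketch below is not part of the application; the lock path and function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: the lock path and function names are hypothetical.
LOCK_DIR="${LOCK_DIR:-/tmp/deployprod.lock}"

# Succeeds only for the first caller; later callers fail until release.
acquire_lock() {
    mkdir "$LOCK_DIR" 2>/dev/null
}

# Removes the lock so the next deployment can proceed.
release_lock() {
    rmdir "$LOCK_DIR" 2>/dev/null
}
```

A deployment script would call `acquire_lock` before starting rsync, report that a deployment is already in progress if it fails, and call `release_lock` when finished.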

Equally important, I plan to add error checking on the rsync calls and relay any failure information to the output Web page. These scenarios are not current issues, but these additions would clearly add to the future usability of the code. Next on the enhancement list, I plan to add a button on the final output Web page that would allow the user to view the rsync logging information.

Furthermore, with a complementary application, I intend to enable the Webmaster to selectively delete files from the production Web servers using the same Web interface. And finally, on a grander scale, I hope to add options for the user to customize the rsync arguments through the Web interface. Then this application could be applied to a broader range of file backup/copy functions, putting the power of the rsync tool into the hands of admins who don't want to hear the words "command line".

Future updates can be found at:

http://www.lhb-consulting.com/apps

Lisa Hamet Bernard, SCSA, CCNA, is currently an independent consultant with more than 15 years of experience in Unix systems and LAN/WAN network administration for government and industry. She received a BSCS from the University of Maryland Baltimore County and lives in the Baltimore-Washington, DC area. She loves spending time with her husband and three American Eskimo dogs. She can be reached at: lisa@lhb-consulting.com.