A
User-Friendly Web Site Update Tool
Lisa Hamet Bernard
Because I come from the command-line world, I am amazed at the
level of dependence on GUI-based administration tools that has evolved
in recent years. These interfaces have become so prevalent that
many IT staffers with system management responsibilities are loath
to use any package without a point-and-click front end. I recently
encountered such a scenario with one of my customers.
The Challenge
The organization's Web environment consists of a local development
Web server and two load-balanced, ISP-hosted production Web servers,
all of which are Unix machines. However, all staff members responsible
for Web site content, including the Webmaster, are exclusively Windows
and Mac users, not at all comfortable working in the traditional
Unix command-line environment.
I needed to enable the Webmaster (a non-root user) to list and
selectively deploy files from the development Web server to the
production Web servers with the ease of a simple point-and-click
interface on his PC. The obvious Unix solution was to use the rsync
open source data synchronization tool for its flexibility, reliability,
and efficiency. My task was to build a Web interface around it.
In this article, I will describe the application I developed, including
configuration of the underlying open source tools used.
The Environment
The development and production Web servers are Sun servers running
Solaris 8 with Apache 1.3.33. The development Web server is firewalled
from external access, and two dedicated T1 lines connect the customer
site to the ISP's data center where the production Web servers are
hosted. Staffers create and update Web pages using various Windows-
and Mac-based packages. The Webmaster regularly reviews all new
files contributed to the development Web server and deploys them
to production. Previously, this was an inefficient, error-prone
process.
The Webmaster, with only basic knowledge and a low comfort level
in a Unix environment, downloaded all of the files on the development
Web server to his PC, then manually renamed old versions of the
files on the production site and ftp'd the new versions from his
PC. He often did this multiple times a day. My proposal to create
a Web-based tool to automate and harden this process was enthusiastically
received by both the Webmaster and his management.
Requirements
The application had to satisfy the following requirements:
1. Display the list of files on the development Web site that
are new or modified relative to the files on the production Web
site.
2. Preserve copies of previous versions of production Web site
files on the production servers.
3. Run as the Apache user, nobody, while preserving file
attributes including ownership, which is usually user webmaster.
4. Securely transfer files between the development and production
servers (i.e., no clear-text passwords).
5. Support access restriction.
6. Be easily supportable by in-house Unix admin staff long after
I'm gone!
Design
There are three parts to the code: a Web page front end (deployprod.html)
and two Perl CGI scripts (do_rsync and move2prod) that do the work.
The Web page
The Web page, deployprod.html (Listing 1), is a simple HTML form
containing only a submit button to launch the first CGI script,
do_rsync (Listing 2). See Figure 1 for a view of the page generated.
Knowing that this is just an administrative utility for a few users,
I left this page rather bare. This file is located on the development
Web site in the /admin directory. /admin (and its contents) is password-protected
using Apache digest authentication to prevent unauthorized access.
Digest authentication transmits only MD5-encrypted passwords to
and from the user's browser. Apache is built with the auth_digest
module to support this feature. See the Web site for details:
http://httpd.apache.org/docs/howto/auth.html
When a user clicks the submit button on the Web page (labeled "Click
to generate list of candidate files"), do_rsync is executed to begin
the process.
Listing Candidate Files
The function of the do_rsync script (Listing 2) is to generate
a list of candidate files to be deployed and display that list in
HTML. To generate the list, rsync (http://rsync.samba.org)
operates in dry-run mode between the development Web site directory
tree and the production Web site directory tree. Because rsync is
operating in dry-run mode, no files are actually copied, but a list
of "candidate" files is generated. A file is considered a candidate
if (a) it exists on the development site but not on the production
site, or (b) the file on the development site differs from the file
with the same name on the production site, and the file on the development
site is newer.
Because rsync is executed via a CGI script to a non-local server,
I took some security precautions. First, to encrypt the communication,
rsync is configured to use OpenSSH as its transport mechanism, with
pre-shared keys for authentication. Therefore, a password does not
have to be provided when the session is initiated. This strategy
removes the inherent security risk of a /etc/hosts.equiv or $HOME/.rhosts
file.
For details on OpenSSH and preshared keys, review Matt Lesko's
article, "Installing and Configuring OpenSSH", Sys Admin,
October 2000 (http://www.samag.com/documents/s=1160/sam0010a/)
and refer to "Configuring the Supporting Tools" later in this article.
Furthermore, rsync is executed via the public domain sudo utility
so that it runs as user webmaster instead of as the Apache
user nobody. User nobody is considered quite insecure,
so opening an unchallenged connection for it between remote servers
would introduce a significant risk. Sudo is also described in the
"Configuring the Supporting Tools" section.
The other arguments supplied in the rsync call are recursive
(to compare the entire directory tree), update (so newer
files on the production side are not overwritten), archive
(for consistency with the move2prod code, when rsync is not running
in dry-run mode), and exclude-from=/usr/local/apache/cgi-bin/exclude-list.txt
(to intentionally ignore the listed specific files or directories
that should never be candidates). For a detailed discussion of the
rsync utility, see Chris Hare's two-part article, "Keeping Data
in Sync::rsync", Sys Admin, June/July 2004 (http://www.samag.com/documents/s=9171/sam0406c/
and http://www.samag.com/documents/s=9216/sam0407b/). The
results of the rsync call are written to a temporary file, /tmp/rsynclist.txt.
The contents of rsynclist.txt are displayed in a Web form consisting
of a table and check boxes for each named file. Because I was tinkering
with the look of the HTML table when I wrote this code, I generated
the HTML code with print statements instead of using a Perl module.
However, I can eventually redo it using Perl CGI module functions
to simplify the look of the code. In any case, after stripping off
the rsync informational output, each line of the rsync output is
printed in the table, and the value of the associated checkbox is
set to the variable $counter. This value is just the line number
from /tmp/rsynclist.txt.
I chose this design because the file names (including path) might
be quite long, and the number of files checked might be significant.
I wanted to limit how much data is potentially passed between CGI
scripts, so a list of indexes was a good solution. The submit button
for this form, labeled "Move selected files", passes the list of
checked check box values to the move2prod CGI script. See Figure
2 for a sample page generated.
Completing the Process
The function move2prod (Listing 3) performs the rsync of the selected
files. I used the param function of the Perl CGI module to retrieve
the index list passed to it from do_rsync. For each index, the corresponding
filename from /tmp/rsynclist.txt is read. This filename is written
to a second temporary file, /tmp/copylist.txt, which will be used
as the include file for rsync.
Now we're ready for the rsync operations, first to one production
Web server and then to the other. The command is identical to the
rsync run in do_rsync (minus the --dry-run option), with
the following exceptions. First, --files-from=/tmp/copylist.txt
replaces the --exclude-from= argument, so rsync targets only
those files selected by the Webmaster. Second, the combination of
--archive and --relative is needed to force rsync
to automatically create parent directories on the target server
when necessary. Note that the pathnames generated from the initial
rsync dry run are relative to the starting directory name passed.
Finally, to meet the requirement of preserving previous copies
of production files that are updated, the combination of --backup
and --suffix=.timestamp is used. These options cue rsync
to create a backup copy of each previously existing file on the
target server before overwriting. The name of the backup copy is
filename.timestamp (in mmddyyHHMM format).
Move2prod ends with some housekeeping. Both temp files are deleted,
and the user sees a Web page with the following: "rsync operation
complete. N files deployed to production." N is replaced appropriately,
of course. Again, this application is for just a few internal admins,
so I didn't take the time to embellish the page. As an aside, a
simple cron job runs nightly on the production servers to delete
those time-stamped file copies that are 30 days old.
Configuring the Supporting Tools
In my execution of rsync, I used two other open source tools:
OpenSSH and sudo. OpenSSH is used to create an encrypted tunnel
between two servers; in this case, it is between the development
Web server and each production Web server. See the OpenSSH site
for details:
http://www.openssh.com
Sudo's authors describe the tool this way: "Sudo (superuser do) allows
a system administrator to give certain users (or groups of users)
the ability to run some (or all) commands as root or another user
while logging the commands and arguments." See the sudo Web site for
details:
http://www.courtesan.com/sudo
For Solaris, all three tools (including rsync) are also available
at:
http://www.sunfreeware.com
As with rsync, installation of these tools is beyond the scope of
this article. However, beyond that step, I will describe the configuration
necessary for my application.
OpenSSH allows several authentication mechanisms. In this case,
rsync relies on pre-shared keys. Keys must be generated for the
server (host) itself and for user webmaster on all three
Web servers using ssh-keygen. ssh-keygen generates a public and
a private key for the type of encryption specified. When creating
these key pairs, be sure to leave passphrase blank. The host keys
should be installed in /etc or whichever path you specify for HostKey
in the sshd_config file. User keys should be installed in $HOME/.ssh.
To pre-share the keys, copy Webmaster's public key (keyname.pub)
from the source machine (development Web server) into the $HOME/.ssh/authorized_keys
file on the target machines (production Web servers). Also, make
sure that the Webmaster has manually connected from the development
Web server to both production Web servers using ssh at least once,
so that both production Web servers' public keys are added to Webmaster's
$HOME/.ssh/known_hosts file on the development Web server. Now the
Webmaster, and more specifically rsync executing as user webmaster,
can ssh between the servers without being prompted for a password.
My application uses sudo to execute the rsync command as user
webmaster inside the CGI scripts that are executing as user
nobody. Specific sudo privileges are granted via the sudoers
file. The man page for sudoers is quite extensive, so I defer to
that for explanation. The necessary lines required by this application
are:
Runas_Alias WEBADMIN = webmaster
Cmnd_Alias RSYNC = /usr/local/bin/rsync
nobody ALL=(WEBADMIN) NOPASSWD: RSYNC
For the same reason as that behind the use of OpenSSH pre-shared keys,
rsync must run without first prompting for a password, hence, the
NOPASSWD keyword.
Freeware Versions
At the time of this writing, the following open source code versions
are being used:
- Perl 5.8.3
- OpenSSH 3.5p1
- Rsync 2.6.2
- Apache 1.3.33
- Sudo 1.6.8p1
Next Steps
I developed this application to address the immediate needs of
a specific user (the Webmaster) in a specific situation (selectively
deploying Web files from the organization's development Web server
to their production Web servers). I accomplished that task. However,
as always, the list of future planned enhancements continues to
grow. Of highest priority is the addition of a locking mechanism.
This would enable multiple users to freely deploy files without
prior coordination amongst themselves and prevent the Webmaster
from inadvertently launching a second deployment before an initial
one has completed.
Equally important, I plan to add error checking on the rsync calls
and relay any failure information to the output Web page. These
scenarios are not current issues, but these additions would clearly
add to the future usability of the code. Next on the enhancement
list, I plan to add a button on the final output Web page that would
allow the user to view the rsync logging information.
Furthermore, with a complementary application, I intend to enable
the Webmaster to selectively delete files from the production Web
servers using the same Web interface. And finally, on a grander
scale, I hope to add options for the user to customize the rsync
arguments through the Web interface. Then this application could
be applied to a broader range of file backup/copy functions, putting
the power of the rsync tool into the hands of admins who don't want
to hear the words "command line".
Future updates can be found at:
http://www.lhb-consulting.com/apps
Lisa Hamet Bernard, SCSA, CCNA, is currently an independent
consultant with more than 15 years experience in Unix systems and
LAN/WAN network administration for government and industry. She received
a BSCS from the University of Maryland Baltimore County and lives
in the Baltimore-Washington, DC area. She loves spending time with
her husband and three American Eskimo dogs. She can be reached at:
lisa@lhb-consulting.com. |