File Replication
Jim McKinstry
Today, it's a given that UNIX systems need to share data across a network. A less common requirement is to have copies of data on multiple servers. The most common solution for sharing data is NFS. However, there is no widely accepted way to provide copies of data across the network. One possible solution is to replicate the data periodically to meet your needs.
While NFS provides current, real-time access to data on another server, replicated data will lag behind the original by minutes or hours. This is not a problem, since replicated data is usually not used for real-time processing. Combined with Replication Server from Sybase, I have seen file replication provide an effective way to off-load defect-correction tasks and report generation to a non-production server, allowing the production server to run more efficiently. File replication is also ideal for off-loading production data that has been archived. Most importantly, critical system configuration files (vfstab, crontab, volume configuration, etc.) can be replicated to other servers so that the information is recoverable in case of a failure.
Originally, this file replication solution was written in C and used pipes. An associate hacked the solution out of a book and tweaked it to work on our systems. When it stopped working, I was asked to fix the problem. After a couple of days of debugging, I decided to rewrite the solution so that people with minimal coding experience would be able to maintain it. This version of the software has been running successfully in a production shop for more than 6 months and should be very easy to maintain. It is much less complex, much more reliable, and actually faster than the original C implementation. This article describes my solution.
The Code
The server does the majority of the work and controls the flow of the replication process. The client's actions are based on which files the server sends. The process is invoked on the server machine from root's cron but may be run from the command line or from other shells. Here's a sample cron entry on the server that replicates the data in our archive directories to the client:
0 22 * * * /replication/replication_server.sh \
/replication/data/archive_dirs client_1 \
>/dev/null 2>&1
$1 is the file that contains the list of directories to be replicated (/replication/data/archive_dirs). $2 is the name of the machine to replicate to (client_1).
$1 must list each directory to be replicated. Subdirectories are not replicated unless they are listed. /replication/data/archive_dirs follows:
/archive
/archive/199701
/archive/199702
/archive/199703
/archive/199704
/archive/199705
/archive/199706
/archive/199707
The easiest way to create a file containing a list of directories to be replicated is to do a find with a -type d option on the topmost node of the directory structure that you want replicated. The output can then be edited to remove any directories that you don't want replicated. Here's how I created /replication/data/archive_dirs:
find /archive -type d -print > /replication/data/archive_dirs
Replication starts with the server generating the list of files on itself and on the client for the first directory listed in $1. These two lists are compared to see which files exist only on the server and which exist only on the client. Files that exist only on the server need to be sent to the client; files and directories that exist only on the client need to be removed from the client. The files to be sent are cpio'd into a compressed file, which is transferred to the client along with the list of files to be removed. The client uncompresses and un-cpio's the file from the server and removes each file named in the remove list. Each directory listed in the list of directories to be replicated is processed in this manner until all have been replicated. Note that there are some C files included in this solution. I do not review them in great detail, but they are discussed in the Security section.
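In outline, one pass of that process looks something like the following. This is only a simplified sketch of the flow just described, not the actual script; the variable names are illustrative.

# Simplified sketch of replication_server.sh's main loop (illustrative only)
LIST_OF_DIRECTORIES=$1      # e.g., /replication/data/archive_dirs
TO_NODE=$2                  # e.g., client_1

for DIRECTORY in `cat ${LIST_OF_DIRECTORIES}`
do
    # 1. Build sorted "ls -an" lists of ${DIRECTORY} on the server and,
    #    via remsh, on the client.
    # 2. comm the lists: files only on the server get copied,
    #    files only on the client get removed.
    # 3. cpio | gzip the copy list and ftp it, along with the remove
    #    list, to ${TO_NODE}.
    # 4. Invoke replication_client.sh on ${TO_NODE} to unpack and delete.
    :
done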
replication_server.sh
replication_server.sh is a pretty simple Bourne script and could easily be rewritten in Perl, C, etc. I chose Bourne because it's extremely portable and easy to maintain. (All listings for this article can be found at: ftp.mfi.com in /pub/sysadmin.) After setting up some variables, the inputs are checked for validity. The file passed in must exist, and the client machine must be known to the server (i.e., it must be in /etc/hosts). If either input is bad, an error is logged and the script exits. If the inputs look good, the script loops through the list of directories in the file passed as $1. For each valid directory listed, an ls -an is executed on the server and the output is sorted. I ignore ., .., lost+found, the "total" line, and core files. The same is done on the client by invoking a C program, which runs as replid and uses remsh to run a C program on the client that executes ls -an there as root. By using ls -an, I can check mode, number of links, owner, group, size in bytes, and time of last modification for each file with one command. If your systems use a lot of hard links, you may want to cut out the number-of-links field. Once I have the sorted lists of files from the client and server, I do a comm on them:
comm -23 ${SERVER_FILE_LIST}.sorted \
${CLIENT_FILE_LIST}.sorted \
> ${CPIO_LIST_FILENAME}.long
This comm command generates a list of files that are on the server and are either not on the client or are different (size, date, etc.) on the client. This list is still in ls -an format. All I need is the file name, so I create the list of filenames with this awk routine:
awk '{print $9}' ${CPIO_LIST_FILENAME}.long > ${CPIO_LIST_FILENAME}
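For reference, the sorted ls -an lists that feed these comparisons can be built along the following lines. This is only a sketch of the server-side step described earlier; the egrep pattern and variable names are illustrative, not the actual script's:

# Sketch: build the server's sorted long listing for one directory,
# skipping ., .., lost+found, core files, and the "total" line.
cd ${DIRECTORY}
ls -an | egrep -v '^total | \.$| \.\.$|lost\+found| core$' | sort \
    > ${SERVER_FILE_LIST}.sorted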
Next, I figure out which files need to be removed on the client side. I could have repeated the comm from above using -13 instead of -23. This would have worked for everything except directories. It's been my experience that two directories can have identical contents, but their sizes can legitimately show up as being different. I don't want to remove a whole directory structure because of this idiosyncrasy. This can be avoided by comparing the file names when determining which files need to be deleted. Instead of repeating an ls on the server and across the network on the client, I use the awk routine above to create lists of files on the client and server from the original ls -an outputs. I've found that the awk routine is much faster than running the ls over the network and at least as fast as running the ls on the local machine. In fact, when dealing with hundreds or thousands of files in a directory, the awk is significantly faster than ls in all cases. These lists of files are then sorted, and comm is run against them to get the list of files to be deleted on the client:
comm -13 ${SERVER_SHORT_LIST}.sorted \
${CLIENT_SHORT_LIST}.sorted \
> ${REMOVE_LIST_FILENAME}
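The short lists fed to this comm are cut from the ls -an output already collected, roughly as follows (a sketch; the variable names are illustrative):

# Sketch: derive name-only lists from the existing ls -an output,
# then sort them for comm.
awk '{print $9}' ${SERVER_FILE_LIST} | sort > ${SERVER_SHORT_LIST}.sorted
awk '{print $9}' ${CLIENT_FILE_LIST} | sort > ${CLIENT_SHORT_LIST}.sorted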
At this point, I have the list of files that need to be copied from the server to the client and the list of files that need to be deleted on the client. I now need to get the files from the server to the client while preserving the files' attributes and not overwhelming the network. I found that cpio combined with gzip works well to package the files to be sent to the client. The if test keeps me from sending an empty archive to the client. (gzipping even an empty file still produces a small, non-empty file, which would then be pointlessly sent to the client, un-gzipped, and un-cpio'd.)
if [ -s ${CPIO_LIST_FILENAME} ]
then
cat ${CPIO_LIST_FILENAME} | \
cpio -ovc | /usr/local/bin/gzip \
> ${CPIO_FILENAME}
fi
If you are low on disk storage or network bandwidth, you may want to use gzip with a -9 flag for the best compression. If you have plenty of disk space and network but are low on CPU power, you may want to remove the gzip. This version of replication_server.sh does not compress the "list-of-files-to-be-deleted" file before sending it to the client. If you find that this file is frequently large, you may want to compress it before sending it to the client.
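For example, the packaging step shown above could be tuned in either direction. These are illustrative variants only; if you drop gzip, the client side must be adjusted to match:

# Maximum compression (more CPU, less disk and network):
cat ${CPIO_LIST_FILENAME} | cpio -ovc | /usr/local/bin/gzip -9 > ${CPIO_FILENAME}

# No compression (less CPU, more disk and network):
cat ${CPIO_LIST_FILENAME} | cpio -ovc > ${CPIO_FILENAME}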
I use ftp to transfer the compressed cpio file and the "list-of-files-to-be-deleted" file to the client. To do this, I was forced to keep the password for the replication ID (replid) in a file. I didn't see any way around this, so the password is stored in the TO_PASSWD variable at the top of this script. The script is owned by root, and its permissions are 100 so that no one else has access to the password.
I only ftp the files if they have data in them. There is no need to open an ftp session if there is no data to send.
if [ -s ${CPIO_FILENAME} ] # If there are files to copy
then # then send them to the
# remote machine
ftp -n -v ${TO_NODE} >> ${LOG_FILE} << EOL
user ${TO_ID} ${TO_PASSWD}
bin
put ${CPIO_FILENAME}
quit
EOL
fi
After the ftp, the script calls check_ftp.sh to check whether the files were successfully transmitted, then continues appropriately. If there were files to be copied or removed, it calls the client process on the client machine to process them as needed. This call is executed in the background so that processing can continue with the next directory to be replicated. The server should not sit idle while the client works.
if [ ${COPY_FILE_COUNT} -ne 0 -o \
${REMOVE_FILE_COUNT} -ne 0 ]
then # If files were sent to be copied/removed
# then process them
/admin/shell/replication/process_cpio_file \
${CPIO_FILENAME} ${DIRECTORY} \
${REMOVE_LIST_FILENAME} \
${TO_NODE} \
${LIST_OF_DIRECTORIES} \
> /dev/null 2>&1 &
fi
If your client machine is being overwhelmed by the server, you can call the client process in the foreground so that it only works on one directory at a time.
check_ftp.sh
check_ftp.sh is used to check whether a file was successfully transferred between the client and the server. It does this by comparing the results of a sum of the file on the client and server. Running a sum on a large file twice (once on the client and once on the server) can be very time consuming. If you are confident in your network's reliability or you have a third party replacement for ftp that guarantees the data, you may want to remove the calls to this shell. The guts of it are shown in Listing 1.
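If you don't have Listing 1 handy, the idea is roughly the following. This is a sketch only; the remsh call and file locations are simplified, and the real script goes through the C wrappers described in the Security section:

# Sketch: compare checksums of the file on the server and on the client.
# (The path on the client may differ; adjust as needed.)
LOCAL_SUM=`sum ${CPIO_FILENAME} | awk '{print $1}'`
REMOTE_SUM=`remsh ${TO_NODE} -l replid sum ${CPIO_FILENAME} | awk '{print $1}'`

if [ "${LOCAL_SUM}" != "${REMOTE_SUM}" ]
then
    echo "ftp of ${CPIO_FILENAME} to ${TO_NODE} failed" >> ${LOG_FILE}
    exit 1
fi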
replication_client.sh
replication_client.sh is the client part of the replication process. All of the complexity and logic needed to decide what should be replicated and deleted takes place on the server. All the client does is copy or delete what it has been sent. The inputs are:
$1 - Name of the cpio/gzipped file
$2 - Name of the directory being replicated
$3 - List of files to be deleted
$4 - Name of file that contains the list of directories to be replicated
This script is straightforward, and the interesting parts are shown in Listing 2.
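For readers without Listing 2 at hand, the core of the client's work is roughly the following. This is only a sketch; the cpio flags, error handling, and logging in the real script may differ:

# Sketch of replication_client.sh's core work (illustrative only)
CPIO_FILENAME=$1
DIRECTORY=$2
REMOVE_LIST_FILENAME=$3

cd ${DIRECTORY}

# Unpack whatever the server sent, preserving file attributes.
if [ -s ${CPIO_FILENAME} ]
then
    /usr/local/bin/gzip -dc ${CPIO_FILENAME} | cpio -idmuc
fi

# Remove whatever the server said no longer exists.
if [ -s ${REMOVE_LIST_FILENAME} ]
then
    for FILE in `cat ${REMOVE_LIST_FILENAME}`
    do
        rm -rf ${FILE}
    done
fi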
Security
The cost of a simple, automated, portable replication solution is security. I am not too concerned about the security "holes" that this solution may introduce to a system. It has been my experience that, with the proper measures in place, there are no security problems. Each user of this code must decide whether the gains justify the risks. I will address the following three security concerns:
- The use of the "r" services
- A password stored in a file
- setgid/setuid C routines
It is often recommended that the "r" services (remsh, rlogin, etc.) be turned off as a security precaution. While nothing is more secure than turning the services off, I have found that using TCP Wrappers is an acceptable alternative. Turning on remsh (rsh, rexec, etc.) and configuring TCP Wrappers to allow access only from your trusted servers is pretty safe. In conjunction with that, I have also defined no shell for the replication ID, which essentially keeps anyone from logging in with that ID. Here's my entry from /etc/passwd for replid (replid is also in its own group):
replid:x:1234:1234:FILE REPLICATION ID:/replication:
Note that the home directory is the directory in which all of the replication software is stored. This directory is owned by root with permissions 111. As an extra precaution, I also run a nightly cron that searches for any .rhosts that are not in the replication ID's home directory and alerts me through mail so that I can investigate.
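That nightly check can be as simple as something along these lines (a sketch; the paths and mail command are illustrative):

# Sketch: mail root a list of .rhosts files found outside replid's home.
find / -name .rhosts -print 2>/dev/null | \
    grep -v '^/replication/\.rhosts$' > /tmp/rhosts_check.$$
if [ -s /tmp/rhosts_check.$$ ]
then
    mailx -s "Unexpected .rhosts files found" root < /tmp/rhosts_check.$$
fi
rm -f /tmp/rhosts_check.$$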
Storing a password in a file is painful at best. Unfortunately, it is a necessary evil when automating ftp. To protect this password, the permissions on the file are 100, and its directory has permissions of 111. The directory, and all of the files in it, are owned by root, and the group is the replication group. With permissions of 111 on the home directory and no login shell, there's not much someone could do even if they were to learn the password for the replication ID.
The C routines were written so that I wouldn't need a .rhosts file for root. Essentially, they allow replication_server.sh to issue remsh commands as replid to the client and have those commands run as root on the client. While gathering the ls -an information from the client, for example, replication_server.sh calls client_ls-n. client_ls-n sets its UID to replid and calls client_ls-n.sh, which makes the remsh call to the client. The remsh executes ls-an_client on the client, which runs as replid. ls-an_client sets its UID to root and runs ls-an_client.sh to gather the ls -an information. Running as root avoids permission problems when gathering the list of files. A similar set of calls is used to invoke replication_client.sh. The C source modules have permissions of 000 and are owned by root. The group ownership is the replication group, and they are held in a secure directory. The executables have permissions of 4110: SETUID is turned on so that the replication ID can execute them as root. If you are not concerned about having a .rhosts file for root, you can get rid of the C code and issue the remsh commands directly from the main scripts.
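The shell halves of that chain are tiny. For example, client_ls-n.sh and ls-an_client.sh amount to little more than the following (a sketch of the idea, not the actual source; paths are illustrative):

# client_ls-n.sh on the server: the C wrapper has already switched
# the UID to replid before this runs.
remsh ${TO_NODE} -l replid /replication/ls-an_client ${DIRECTORY}

# ls-an_client.sh on the client: the setuid C wrapper has switched
# the UID to root before this runs.
ls -an ${DIRECTORY}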
Reliability
Large replications (hundreds of files, hundreds of megabytes of data) occur nightly across a LAN and a WAN without incident. If you have problems with ftp transfers failing, you may want to put each ftp in a loop that retries until the transfer succeeds or gives up after a certain number of attempts. Currently, only one ftp attempt is made. If that attempt fails, root is notified by email; you can then re-run the replication manually or let the next scheduled replication catch it.
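If you decide to add retries, a wrapper along these lines would do. This is a sketch only; it assumes check_ftp.sh returns a zero exit status on success (an assumption on my part), and MAX_TRIES is arbitrary.

# Sketch: retry the transfer a few times before notifying root.
MAX_TRIES=3
TRY=1
while [ ${TRY} -le ${MAX_TRIES} ]
do
    # ... run the ftp shown earlier here ...
    if /replication/check_ftp.sh ${CPIO_FILENAME} ${TO_NODE}   # assumed interface
    then
        break
    fi
    TRY=`expr ${TRY} + 1`
done
if [ ${TRY} -gt ${MAX_TRIES} ]
then
    echo "replication ftp of ${CPIO_FILENAME} failed after ${MAX_TRIES} tries" | \
        mailx -s "replication failure" root
fi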
Setup
Some setup is required to use this software. A dedicated replication ID, with no login shell and in its own dedicated replication group, should be added to each system. The code, along with a .rhosts file containing the appropriate information, should be put in the replication ID's home directory on each machine. Lock down the home directory as described above. PGNAME and USERID should be set appropriately in the C modules, and the modules should be recompiled (use compall.sh to compile them and set up the recommended permissions).
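On systems that have groupadd and useradd, that setup might look like the following. This is only a sketch; the UID/GID values and group name are examples, and the passwd entry shown earlier simply leaves the shell field empty.

# Sketch: create the replication group and ID, then lock down the home.
groupadd -g 1234 replid
useradd -u 1234 -g replid -d /replication -c "FILE REPLICATION ID" replid
# (Edit /etc/passwd afterward if you want the shell field left empty.)
chown root /replication
chgrp replid /replication
chmod 111 /replication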
Begin with small amounts of data when implementing this software. Also, set up some test directory structures to experiment with until you get a feel for how it is going to work on your systems. Be very careful: this solution will remove entire directory structures on the client. You should always have a complete system backup before testing.
Conclusion
File replication has proven to be invaluable to our department. Our Disaster Recovery solution depends heavily on replication. It has cut down recovery time from system outages by allowing us to restore critical information (volume configuration, disk configuration, etc.) from another server rather than restoring from tape or starting from scratch. We have also been able to off-load all of our development activities and batch reporting to another server by providing hourly replication of production files. Production customers' response time has improved greatly. I hope you will be able to make use of this software.
About the Author
Jim is a Technical Analyst specializing in UNIX. He has worked for IBM, Rite Aid and EDS and is currently working for Sprint Paranet. He can be reached at jrmckins@yahoo.com.