Cover V13, i02

Firewalling HTTP Traffic Using Reverse Squid Proxy

Rajeev Kumar

Ever wonder why lines such as "default.ida?XXXXX...." or "cmd.exe" show up in your Web server logs? If your Web server is running on the default port 80/tcp, it is likely being attacked by one of many Internet worms. Common worms, such as Code Red, send a text string like the following in the form of a URL (read as a single line):

http://<yourwebserver>/default.ida?NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN%u909
0%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u7801%u9090%u6858%ucbd3%u780
1%u9090%u9090%u8190%u00c3%u0003%u8b00%u531b%u53ff%u0078%u0000%u00=a
The presence of this string in the log file does not necessarily mean your computer has been compromised; it only implies that a Code Red worm has attempted to infect your computer. In some cases, the worm might have infected your computer, crashed your Web server process, or (at the minimum) just filled your log files.

A common technique for protecting network resources is to place them behind a port-based firewall. Unfortunately, the practice of denying access by port number does not work well for a Web server. You need to keep port 80 open so that outside users can access the server, but if you pass all messages addressed to port 80 to the interior network, you defeat the whole purpose of a firewall. If you are going to protect a Web server with a firewall, you need an application/protocol-based firewall that allows more diverse and selective access rules. One type of firewall that provides this kind of protection is known as a proxy firewall. In this article, I will describe how to set up Squid as a proxy firewall in front of your Web server.

Squid (http://www.squid-cache.org) is a popular freeware Web-content caching program. Its role as a forward Web proxy/cache is well known: in the forward configuration, Squid fetches Internet data on behalf of a client on the local network. The configuration I describe in this article is exactly the opposite of the common forward-proxy scenario: here the Web server is on the local network, and the client connects from the Internet. In other words, Squid acts as a reverse proxy.

Squid as a Reverse Proxy

Reverse Squid proxy, also known as Web server acceleration, is a method of reducing the load on your backend Web server by placing a Squid cache between the Internet and the Web server. Another advantage of a reverse proxy is improved security: you can apply Squid ACLs to control which Web requests (URLs) are allowed to reach your Web server. The steps for setting up the cache are the same for standard (forward) and reverse Squid proxies. This document focuses on the additional task of applying Squid ACL security (HTTP URL filtering/firewalling) in reverse proxy configurations.

A reverse proxy is positioned between the Internet and your real Web server (see Figure 1). When a client browser on the Internet makes an HTTP request, the DNS infrastructure routes the request to the reverse proxy server. The reverse proxy then connects to the real Web server, fetches the page content, and sends it back to the client browser. (Optionally, you can take advantage of Squid's cache feature: if the same page has been accessed within the timeout period, Squid may be able to serve the request from its cache.) If your Web server serves mostly dynamic content, such as cgi-bin, ASP, or JSP, using the cache feature is not recommended.

Setting up Squid Reverse Proxy

The following discussion applies to Squid version 2.x. Squid version 3.x is in development at the time of this writing. The version 3.x configuration looks a little different, although it follows a similar concept. I suggest you use the latest STABLE version for a production environment. I have tested this setup on Linux and Solaris.

To install Squid, download the latest STABLE version (2.5.STABLE4, as of this writing) from:

http://www.squid-cache.org
Unzip/untar it in a temporary location, change to the Squid source directory, and run ./configure with the following options (see ./configure --help for more options):

./configure --prefix=/home/packages/squid --enable-ssl \
--enable-useragent-log --enable-referer-log --enable-storeio=ufs,null
This command configures Squid to install at /home/packages/squid, enables the SSL feature using OpenSSL (the OpenSSL package must be installed for this option), enables a few extra logging features, and, finally, enables storage I/O options for the caching mechanism. If the --enable-storeio option is not specified, "ufs" is the default storage module Squid uses to store cached objects; --enable-storeio=ufs,null compiles in both the "ufs" and "null" storage I/O modules. Note that these are only compile-time options that build the code for the various I/O modules. The actual decision to run Squid with or without a cache is made later, at run time, in the Squid configuration file (with the "cache_dir" statement), where the "ufs" or "null" module selects a cached or cache-less setup, respectively. See the Configuration section for more details about setting up Squid without a cache using the "null" module.
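As a sketch of that run-time choice, the two cache_dir forms in squid.conf might look like the following. (The path, cache size, and directory counts here are illustrative assumptions, not values taken from the article's listings.)

```conf
# Run Squid WITH a disk cache: "ufs" module, 100 MB of cache,
# 16 first-level and 256 second-level directories (illustrative values)
cache_dir ufs /home/packages/squid/var/cache 100 16 256

# OR run Squid WITHOUT a disk cache: "null" module
# (a directory argument is still required, but nothing is stored there)
cache_dir null /tmp
```

Only one cache_dir strategy should be active at a time; comment out the form you are not using.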

Then type the following commands:

make
make install
These commands install Squid in the directory /home/packages/squid (which I'll call $SQUID_HOME from now on). Create the Unix user "squid" now, too, and change the ownership of the $SQUID_HOME/var directory as follows. (This step is important because the squid process writes to this area as userid "squid".)

chown -Rh squid $SQUID_HOME/var
Configuring Squid

Edit the default configuration file $SQUID_HOME/etc/squid.conf. Read this file for more detail about the various options. The variables that are most important for reverse Squid proxy are shown in Listing 1. See the sidebar for a discussion of Listing 1.
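For orientation, the heart of a Squid 2.x reverse proxy setup is the httpd-accelerator family of directives. A minimal sketch, assuming a backend Web server at the hypothetical address 10.0.0.1 on port 80, might look like this:

```conf
# Listen for client requests on the standard HTTP port
http_port 80
# Forward accelerated requests to the real (backend) Web server
httpd_accel_host 10.0.0.1
httpd_accel_port 80
# A single backend server; do not also act as a forward proxy
httpd_accel_single_host on
httpd_accel_with_proxy off
httpd_accel_uses_host_header off
```

Your Listing 1 values will differ; the point is that http_port controls where Squid listens, and the httpd_accel_* directives control where it forwards.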

For initial testing, make sure you are allowing any request to the Web server behind Squid; once that works, firewall your HTTP server(s) as a final step. Initially, relax your Squid ACLs by adding the following line before the final deny rule (http_access deny all; this setting may already be present in the default squid.conf). Your squid.conf may look like:

http_access allow all    #This will allow all requests to
                         #the web server behind squid.
http_access deny all
Squid applies ACLs top-down: each incoming request is checked against the http_access rules in order, and the first match wins. In the above case, Squid allows all connections, so the final deny rule has no effect. If you are planning to use the cache feature, initialize the cache with the following command, which creates the cache infrastructure directories under the $SQUID_HOME/var/cache area. If you are using the cache_dir null /tmp option in squid.conf, no cache directory is created:

$SQUID_HOME/sbin/squid -z
Testing Your Reverse Proxy Configuration

It is time to test your Squid configuration for reverse proxying. Start the Squid server in debug mode so that you can see detailed logs, and watch for logs in the $SQUID_HOME/var/logs/ directory:

$SQUID_HOME/sbin/squid -d 100
Watch for any errors and correct your setup before you proceed. Open your Web browser and access the Web server by typing a URL such as http://www.mywebserver.xxx. Note that the external DNS infrastructure actually maps the hostname www.mywebserver.xxx to the IP address of the Squid proxy server; in other words, this URL points to the proxy server as far as the outside world is concerned. The Squid server receives the Web request first and then directs it to the real backend Web server. Check Squid's "access.log" file and see whether you find log entries for this access. If this is not working, set up the ACL debug option in squid.conf (as described below) and check "cache.log". (Comment out the existing "debug_options" line first):

debug_options ALL,1 33,2
Restart the Squid server and try your URL again. This should print lots of ACL debug information. Correct your problems and then turn off the debug ACL option by resuming the original "debug_options".

Firewalling Your HTTP server

The main objective of this document is to set up an HTTP-level firewall in front of the real Web server. So far, we have set up an operational Squid server in front of the Web server; all requests pass through the Squid server and reach the real Web server behind it. It is time to set up a choke point at the Squid server and allow only the desired traffic through to the real Web server. Look for the "http_access" section in the squid.conf file. Before you set up that section, first define the ACL variables that will be used in the "http_access" options. The "acl" directives have the following syntax (taken from the default squid.conf):

# acl aclname acltype string1 ...
# acl aclname acltype "file" ...
#     when using "file", the file should contain one item per line
The squid.conf file uses descriptive names for ACL variables. For example, the default squid.conf may contain the ACL lines shown in Listing 2.

You can define more ACL variable names per your site requirements. The ACL names by themselves are just definitions; so far, nothing affects the HTTP traffic flow. We will now use these ACL names in a real access control list by applying the "http_access" directive in squid.conf (see the example from squid.conf in Listing 3).

I should explain a few important points first. "http_access" ACL filters are applied top-down: every HTTP request is matched against the rules from the top of the list, and the first rule that matches is applied, with its action (allow/deny) taken accordingly. A logical AND is performed across multiple "aclname" strings listed on the same "http_access" line, and a logical OR applies between "aclnames" on two different "http_access" lines. Consider the following example.
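A minimal sketch of these two combinations (the ACL names and addresses below are hypothetical, not taken from the article's listings):

```conf
acl mynetwork src 192.168.0.0/255.255.255.0
acl localhost src 127.0.0.1/255.255.255.255
acl webports port 80 8080

# Logical AND: allowed only if the request comes from mynetwork
# AND arrives on one of webports
http_access allow mynetwork webports

# Logical OR: a separate http_access line is an alternative way to
# match; a request satisfying either allow line gets through
http_access allow localhost
http_access deny all
```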

Suppose a Web server with (real) IP address 10.0.0.1 is running behind the Squid server, which applies the ACLs shown in Listing 4. The Squid server's IP address is 1.2.3.4, and it redirects requests to the real Web server after applying the filters. This configuration allows any valid URL, such as http://www.mywebserver.xxx or http://1.2.3.4, from "mynetwork" AND on ports 80 and 8080 (assuming Squid is listening at these ports). OR, if a request comes from somewhere other than "mynetwork" (because of "!mynetwork"), the configuration allows only the URL http://www.mywebserver.xxx (but not http://1.2.3.4) AND only on ports 80 and 8080 (that is, all three "aclnames" in that http_access line must be satisfied). Any other request is denied by the last rule.

As the Squid documentation notes, if no "http_access" rule matches a request, Squid performs the opposite of the last rule in squid.conf. In the above case, this means Squid effectively appends "http_access allow all" as a final rule, which has no effect here because the preceding "deny all" rule already denies everything. If your last rule is not a "deny all" rule, however, you may be allowing unwanted traffic. The following sections describe some case studies.

Case 1: Deny Random Attacks on the Web Server

An attacker or worm may attempt to send a URL containing cmd.exe (on IIS running Microsoft Windows) or /bin/sh on Unix Web servers. Deny such attacks by adding the ACL list in Listing 5.

Any request like http://www.mywebserver.xxx/default.ida?XXXXXXX or http://www.mywebserver.xxx/blahblah/cmd.exe from anywhere will be dropped at Squid before it reaches the real Web server. Just by adding the configuration shown in Listing 5, your Web server will be spared most automated or script-kiddie attacks. A seasoned attacker, however, can go further and use more sophisticated techniques (such as directory traversal and Unicode-based attacks) to evade regular-expression and string-based signatures like these.
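A deny rule of this kind can be sketched with a urlpath_regex ACL. (The ACL name "badurls" and the exact patterns below are illustrative assumptions; your Listing 5 may differ.)

```conf
# Match known worm/attack signatures anywhere in the URL path,
# case-insensitively (-i)
acl badurls urlpath_regex -i cmd\.exe default\.ida /bin/sh

# Drop such requests at Squid, before they reach the backend server
http_access deny badurls
```

Place the deny rule above your allow rules so that matching requests are rejected first.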

Case 2: Sophisticated Access Level for Your Web Server

Suppose you want to allow only a particular URL path, such as http://www.mywebserver.xxx/open_to_all, to your Web server from anywhere, and you want to allow URL http://www.mywebserver.xxx/secret only from your inside network (e.g., 192.168.0.0/24). Control such access by adding the ACL list shown in Listing 6.
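A sketch of such an ACL set, using the paths and inside network from the text (the ACL names here are hypothetical):

```conf
acl insiders src 192.168.0.0/255.255.255.0
acl open_area urlpath_regex ^/open_to_all
acl secret_area urlpath_regex ^/secret

# /open_to_all is reachable from anywhere
http_access allow open_area
# /secret only from the inside network (logical AND on one line)
http_access allow secret_area insiders
# Everything else is denied
http_access deny all
```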

Using a Redirector

So far, we have only discussed defining an access list in the squid.conf file and redirecting HTTP requests to a real Web server. This approach is not very flexible if you have complicated requirements or many Web servers behind Squid. A redirector program lets you rewrite HTTP requests in a more useful and flexible manner. A redirector is an external process (an executable program, running in an infinite loop, written in shell script, Perl, C/C++, Python, or any other language) that can rewrite a URL. Squid can be configured to pass every incoming URL to the redirector process, which replies with either a modified URL or a blank line; a blank line means no change to the current URL. A redirector program simply reads standard input, one request per line consisting of four fields supplied by the Squid process, and, after any modification, writes the resulting URL back to standard output, which Squid accepts as the rewritten request. The four fields are:

  • URL
  • ip-address/fqdn
  • ident
  • method

A redirector program is not part of the Squid distribution. You must write your own redirector, but this is very simple if you know any programming language; a simple redirector written in Perl is shown in Listing 7, for example. Add the following redirector directive to your squid.conf file and restart the Squid server (see Figure 2):

redirect_program <Full_Path>/myredirector.pl
When Squid starts, it spawns the redirector process. The redirector must run in an infinite loop (like the while() loop in Listing 7): it listens on its standard input, modifies the URL, and writes the modified URL back to standard output. In the above example, Squid accepts a request for www.mywebserver.xxx at the standard HTTP port 80 and then redirects the request to a real Web server running at port 8080.
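Because a redirector just reads lines on standard input and writes URLs on standard output, you can exercise the rewrite logic by hand from a shell, independent of Squid. The one-liner below is a bash stand-in (not Listing 7) that mimics the port-8080 rewrite described above, using the article's example hostname:

```shell
#!/bin/bash
# Squid writes one request per line to the redirector's stdin:
#   URL ip-address/fqdn ident method
# The redirector answers with one (possibly rewritten) URL per line.
echo 'http://www.mywebserver.xxx/index.html 1.2.3.4/- - GET' |
while read -r url rest; do
    # Rewrite the request to the backend server listening on port 8080
    echo "${url/www.mywebserver.xxx/www.mywebserver.xxx:8080}"
done
# prints: http://www.mywebserver.xxx:8080/index.html
```

Feeding sample lines to your real redirector this way is a quick sanity check before wiring it into squid.conf.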

You can create a complicated redirector for your site needs. Many of the things Squid cannot do for you out of the box can be implemented using your redirector program. You may also use redirectors already written by others, such as:

Squirm -- http://squirm.foote.com.au/

jesred -- http://ivs.cs.uni-magdeburg.de/~eelkner/webtools/jesred/

SquidGuard -- http://www.squidguard.org/

Limitations

Squid has a few limitations that are worth mentioning. If your real Web server uses any HTTP REDIRECT statements, then the reverse Squid proxy should listen on the same port as the real (backend) Web server; otherwise, unexpected results may occur. This is a problem if you run the Squid reverse proxy and the real (backend) Web server on the same machine, because you cannot bind both Squid and the real Web server to the same port (for example, port 80). A possible workaround is to let Squid listen on port 80 of the real IP address, bind the real Web server to port 80 of the loopback address (127.0.0.1), and direct traffic from Squid to the loopback address.
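A sketch of that workaround in squid.conf, reusing the article's example public address 1.2.3.4 (substitute your own):

```conf
# Squid binds port 80 on the public address only
http_port 1.2.3.4:80

# The real Web server is bound to 127.0.0.1:80; Squid forwards to it
httpd_accel_host 127.0.0.1
httpd_accel_port 80
```

The backend server must likewise be configured to listen only on 127.0.0.1, or the two processes will still collide on port 80.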

Another limitation is that Squid cannot effectively act as a reverse proxy for an SSL-based HTTPS backend server. Squid allows HTTPS connections in the forward direction through the CONNECT method, but it does not offer an equivalent reverse-direction SSL capability. Squid can, however, act as an HTTPS termination point: a reverse Squid proxy can be configured as an HTTPS server in front of any HTTP (non-SSL) server. Traffic is then encrypted between the client browser and the reverse Squid proxy, while unencrypted traffic passes between the reverse proxy and the real (backend) server. This solution is often acceptable if the traffic to the backend server crosses only a protected internal network segment. Squid version 3.x promises additional encryption options.

Conclusions

Conventional port-based firewalls cannot detect sophisticated traffic, such as a non-HTTP protocol running over port 80. Port 80 is a very lucrative avenue for application service providers (ASPs), as well as for intruders. Proper monitoring of such ports using techniques such as those described in this article can be very helpful to organizations wanting to protect their Web servers. Squid can protect any Web server running the HTTP protocol, including Windows IIS server, Apache, or any other commercial Web server. Squid's powerful redirector feature can automate the redirection process. Squid can also work as an SSL-based HTTPS termination point, which means you can provide an HTTPS connection to your existing (HTTP) Web site without touching your real Web server.

References

Squid Web site -- http://www.squid-cache.org

Squid configuration guide example -- http://squid.visolve.com/

Monitoring Squid health using Calamaris -- http://calamaris.cord.de/

Detailed Squid log analysis using Sarg -- http://web.onda.com.br/orso/index.html

Rajeev Kumar is currently working as Senior Systems & Security Administrator for Fluent Inc., USA. He received his B.E. from IIT Roorkee and M.Tech. from IIT Kanpur, India, in Chemical Engineering. He has more than six years of Unix/Linux and systems security experience. He maintains the Web site http://www.rajeevnet.com, where he publishes freeware code and systems/security documents. Rajeev can be contacted at: rajeev@rajeevnet.com or rxk@fluent.com.