Cover V13, i07

Article

jul2004.tar

Using More Tentacles of Squid

Ralf van Dooren

Many small and medium businesses access the Internet with a relatively small Internet connection (DSL, cable, or even POTS/ISDN). These businesses have a box, much like you may have at home, that handles the traffic to and from the Internet. The box proxies the traffic so all employees can use the Internet.

To take the burden off this Internet connection and to control what employees can access, it is wise to install a caching proxy that proxies the request for a Web site and caches the results on local storage. This way, if Bob visits the same sites as Alice, a lot of content is already local and doesn't have to be transferred through the Internet connection.

In many cases, this Internet link is also used for the corporate Web site, and is full of dynamic content for your (potential) customers. In normal circumstances, this Web server probably runs fine, but what if your company's Web site is Slashdotted? In this article, I'll show you how to enhance the "standard" usage of a caching proxy to also run as a Web server accelerator for your corporate Web server.

Squid is a Web proxy cache. It is open source and is one of the most used proxy servers in the world to proxy requests and cache results. But as its name implies, Squid can do more than that. In a previous article in Sys Admin magazine, Rajeev Kumar described a way to firewall your corporate Web site using Squid (http://www.samag.com/documents/s=9023/sam0402c/). I'll use a combination of these functions: offload (and firewall) your corporate Web site, and cache the traffic for your employees so your connection will have some bandwidth left.

Off-loading your corporate Web server can be advantageous if your Web server is sustaining a rather heavy load. If you use Squid in Web Server Acceleration mode, the static content (images, css-sheets, etc.) won't have to be retrieved from the Web server itself, and the server can then focus its CPU cycles to the dynamic content. Overall, the users of your corporate Web server will have a much better "surfing experience", which is vital for your Web appearance -- a slow responding Web site won't create as much revenue as a fast loading Web site. Furthermore, as described in Kumar's Squid article, this technique can be used to firewall your Web server.

Installing Squid

As always, there are a couple of methods to install Squid. A great many operating systems have developed a way to (semi-)automatically install your software. For example, if you use FreeBSD, installing Squid is as simple as:

(freebsd)# cd /usr/ports/www/squid
(freebsd)# make install clean
If you haven't installed Squid before, you'll get a menu with several choices. Adjust them to your preferences. If you need to make changes to this configuration, use make config in /usr/ports/www/squid.

Alternatively, you can download the source from http://www.squid-cache.org and compile the source yourself. This gives you the flexibility (but also the complexity) of tuning the Squid installation. As most configuration options are specific for your system setup, I won't go into the details of that.

Configuring the Proxy

I assume that you've now installed Squid successfully. I use a FreeBSD ports installation as a path reference for files; the location of files on your system may be different.

To begin, the caching proxy must be configured. I'll only show a basic configuration. You can adjust this to your needs. Many Web sites have configuration examples, so you can search those for inspiration.

So, let's say we have a LAN network, 192.168.10.0/24, and all users on it are allowed to surf the Web. In /usr/local/etc/squid/squid.conf, add/change the following lines:

http_port 192.168.10.1:3128
acl lan_users src 192.168.10.0 255.255.255.0
http_access allow lan_users
You can define the string "lan_users" yourself. Just pick something appropriate, so you'll remember later what it is. Port 3128 will be the port on which Squid listens, on address 192.168.10.1 (the LAN side of your box).

Configuring the Web Accelerator

Installing Squid in front of your Web server means that users will connect to Squid, as if it were the Web server. So, www.yourcompany.tld should resolve to the IP address of the Squid server. In this setup, your Web server can only have an internal IP address; it doesn't have to be reachable from the Internet.

Because the client's browser is exchanging traffic with the Squid server, Squid has to listen to port 80. Squid also needs to know where to find the real content, so we must configure the internal IP address of the Web server. Squid also needs to be told that it should act both as a proxy and a Web accelerator; this is not default mode. To the squid.conf, add the following lines:

http_port 80
httpd_accel_host <ip address of web server>
httpd_accel_with_proxy on
httpd_accel_single_host on
This last line ensures minimal changes in the HTTP request.

Additionally, you must make sure the Internet users cannot use your proxy setup as a stepping stone; Squid should only cache the corporate Web site for them and not the rest of the Internet. ACLs within squid.conf will accomplish this task:

acl webserver <ip address of web server>/255.255.255.255
http_access allow webserver
Testing the Configuration

To test the saved configuration, we run /usr/local/sbin/squid -k parse. This checks the configuration for any errors. When there's no output, the configuration file is valid.

Before Squid can be started for the first time, run /usr/local/sbin/squid -z, which will initiate the caching directories. Then start Squid with no options; Squid will start itself in the background. Now you can test your setup. Surfing from the local net should work (remember to enter the proxy IP and port in the client's browser configuration). If you get "access denied" messages in your browser, there might be an ACL error. ACLs are processed from top to bottom in squid.conf, so make sure your "http_access allow" statements are set before the "http_access deny all".

The next thing to test is your corporate Web site, as seen from the Internet. If you already changed the DNS configuration to reflect the new situation, you can surf to www.yourcompany.tld, but you can also surf to the IP address of the Squid server. Be sure that you are not accessing the Web site from your LAN. If everything goes as planned, you'll see your corporate Web site appearing on your screen.

Not from Your LAN?

The solution presented has one major flaw -- if you now try to access the corporate Web site from your LAN through the Squid server, you'll get an "Access Denied". If you look in your cache.log file, you'll see "Forwarding loop detected" errors. This cannot be avoided because what actually happens is that the proxy asks itself for a page, which results in a forwarding loop.

There are several options to circumvent this. If you use an internal DNS view, which is different from the DNS view on the Internet, you could change the IP address internally to the Web server, instead of the Squid server. Alternatively, you can exclude www.yourcompany.tld from the proxy list, though this will require manually reconfiguring each browser. Some browsers also support PAC scripts as a means of reconfiguring their browser settings. Here's an example of such a PAC script:

function FindProxyForURL(url, host)
{
if (isInNet(host, "192.168.0.0", "255.255.0.0"))
    return "DIRECT";
else
    return "PROXY 192.168.10.1:3128";
}
This will configure the browser to bypass the proxy (on port 3128) when the host resides in the local LAN segment. Of course, your client is still connecting to the Squid server (on port 80) which proxies the corporate Web server.

Conclusion

Using Squid as both a caching proxy for your LAN clients and an off-loading mechanism for your corporate Web server can help you save money on your Internet connection which also helps your Web server to survive a Slashdot effect.

References

PAC file format -- http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html

Slashdot effect -- http://en.wikipedia.org/wiki/Slashdotted

Squid homepage -- http://www.squid-cache.org

Ralf van Dooren works as a consultant for Snow (http://snow.nl), a leading Unix consultancy company based in the Netherlands. He holds various certifications for Unix (SCNA, LPIC2) and networking (CCNP) and is challenged by clients to find the right answer (42) for their problems. He can be reached at: r.vdooren@snow.nl.