Using More Tentacles of Squid
Ralf van Dooren
Many small and medium businesses access the Internet with a relatively
small Internet connection (DSL, cable, or even POTS/ISDN). These
businesses have a box, much like you may have at home, that handles
the traffic to and from the Internet. The box proxies the traffic
so all employees can use the Internet.
To take the burden off this Internet connection and to control
what employees can access, it is wise to install a caching proxy
that proxies the request for a Web site and caches the results on
local storage. This way, if Bob visits the same sites as Alice,
a lot of content is already local and doesn't have to be transferred
through the Internet connection.
In many cases, this Internet link is also used for the corporate
Web site, which is full of dynamic content for your (potential) customers.
In normal circumstances, this Web server probably runs fine, but
what if your company's Web site is Slashdotted? In this article,
I'll show you how to enhance the "standard" usage of a caching proxy
to also run as a Web server accelerator for your corporate Web server.
Squid is a Web proxy cache. It is open source and is one of the
most used proxy servers in the world to proxy requests and cache
results. But as its name implies, Squid can do more than that. In
a previous article in Sys Admin magazine, Rajeev Kumar described
a way to firewall your corporate Web site using Squid (http://www.samag.com/documents/s=9023/sam0402c/).
I'll use a combination of these functions: offload (and firewall)
your corporate Web site, and cache the traffic for your employees
so your connection will have some bandwidth left.
Off-loading your corporate Web server can be advantageous if your
Web server is sustaining a rather heavy load. If you use Squid in
Web Server Acceleration mode, the static content (images, CSS style sheets,
etc.) doesn't have to be retrieved from the Web server for every request, and
the server can then focus its CPU cycles on the dynamic content.
Overall, the users of your corporate Web server will have a much
better "surfing experience", which is vital for your Web presence
-- a slow-responding Web site won't generate as much revenue as a
fast-loading one. Furthermore, as described in Kumar's Squid
article, this technique can be used to firewall your Web server.
Installing Squid
As always, there are a couple of methods to install Squid. A great
many operating systems have developed a way to (semi-)automatically
install your software. For example, if you use FreeBSD, installing
Squid is as simple as:
(freebsd)# cd /usr/ports/www/squid
(freebsd)# make install clean
If you haven't installed Squid before, you'll get a menu with several
choices. Adjust them to your preferences. If you need to make changes
to this configuration, use make config in /usr/ports/www/squid.
Alternatively, you can download the source from http://www.squid-cache.org
and compile the source yourself. This gives you the flexibility
(but also the complexity) of tuning the Squid installation. As most
configuration options are specific for your system setup, I won't
go into the details of that.
Configuring the Proxy
I assume that you've now installed Squid successfully. I use a
FreeBSD ports installation as a path reference for files; the location
of files on your system may be different.
To begin, the caching proxy must be configured. I'll only show
a basic configuration. You can adjust this to your needs. Many Web
sites have configuration examples, so you can search those for inspiration.
So, let's say we have a LAN, 192.168.10.0/24, and all
users on it are allowed to surf the Web. In /usr/local/etc/squid/squid.conf,
add or change the following lines:
http_port 192.168.10.1:3128
acl lan_users src 192.168.10.0/255.255.255.0
http_access allow lan_users
You can define the string "lan_users" yourself. Just pick something
appropriate, so you'll remember later what it is. Port 3128 will be
the port on which Squid listens, on address 192.168.10.1 (the LAN
side of your box).
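To make the effect of the lan_users ACL concrete, here is a minimal sketch of how a network address plus netmask selects clients. It is not Squid's own code; the helper names ipToInt and inNetwork are made up for this illustration, and only IPv4 dotted-quad addresses are handled.

```javascript
// Illustration only: a simplified model of matching a client address
// against a network/netmask pair such as 192.168.10.0/255.255.255.0.
// ipToInt and inNetwork are hypothetical helper names, not Squid functions.

// Convert a dotted-quad IPv4 string to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip
    .split(".")
    .reduce((acc, octet) => ((acc << 8) | Number(octet)) >>> 0, 0);
}

// True when ip and network agree on every bit that is set in the mask.
function inNetwork(ip, network, mask) {
  const m = ipToInt(mask);
  return ((ipToInt(ip) & m) >>> 0) === ((ipToInt(network) & m) >>> 0);
}

// A client on the example LAN matches; a client on a neighboring net does not.
console.log(inNetwork("192.168.10.42", "192.168.10.0", "255.255.255.0")); // true
console.log(inNetwork("192.168.11.5", "192.168.10.0", "255.255.255.0"));  // false
```

With a 255.255.255.0 mask, only the first three octets are compared, which is why every host from 192.168.10.1 through 192.168.10.254 counts as a LAN user.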
Configuring the Web Accelerator
Installing Squid in front of your Web server means that users
will connect to Squid, as if it were the Web server. So, www.yourcompany.tld
should resolve to the IP address of the Squid server. In this setup,
your Web server needs only an internal IP address; it doesn't
have to be reachable from the Internet.
Because the client's browser is exchanging traffic with the Squid
server, Squid has to listen on port 80. Squid also needs to know
where to find the real content, so we must configure the internal
IP address of the Web server. Finally, Squid needs to be told that it
should act as both a proxy and a Web accelerator; this is not the default
mode. Add the following lines to squid.conf:
http_port 80
httpd_accel_host <ip address of web server>
httpd_accel_with_proxy on
httpd_accel_single_host on
This last line ensures minimal changes in the HTTP request.
Additionally, you must make sure the Internet users cannot use
your proxy setup as a stepping stone; Squid should only cache the
corporate Web site for them and not the rest of the Internet. ACLs
within squid.conf will accomplish this task:
acl webserver dst <ip address of web server>/255.255.255.255
http_access allow webserver
Testing the Configuration
To test the saved configuration, run /usr/local/sbin/squid
-k parse. This checks the configuration for errors. If
there's no output, the configuration file is valid.
Before Squid can be started for the first time, run /usr/local/sbin/squid
-z, which will initialize the cache directories. Then start
Squid with no options; Squid will put itself in the background.
Now you can test your setup. Surfing from the local net should work
(remember to enter the proxy IP and port in the client's browser
configuration). If you get "access denied" messages in your browser,
there might be an ACL error. ACLs are processed from top to bottom
in squid.conf, so make sure your "http_access allow" statements
are set before the "http_access deny all".
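The importance of rule order can be shown with a small model of first-match evaluation. This is a simplified sketch, not Squid's implementation: the Web server address 192.168.10.80, the request objects, and the checkAccess helper are assumptions made for the example, and the LAN test is reduced to a string prefix check.

```javascript
// Simplified, hypothetical model of an ordered http_access list.
// Not Squid's code; all names and addresses here are assumptions.
const WEB_SERVER_IP = "192.168.10.80"; // assumed internal Web server address

const acls = {
  lan_users: (req) => req.clientIp.startsWith("192.168.10."), // src 192.168.10.0/24
  webserver: (req) => req.destIp === WEB_SERVER_IP,           // dst <web server>
  all: () => true,                                            // matches everything
};

// Return the action of the first rule whose ACL matches the request.
function checkAccess(rules, req) {
  for (const [action, aclName] of rules) {
    if (acls[aclName](req)) return action;
  }
  return "deny"; // simplification; the lists below always end in "all"
}

const goodOrder = [["allow", "lan_users"], ["allow", "webserver"], ["deny", "all"]];
const badOrder = [["deny", "all"], ["allow", "lan_users"], ["allow", "webserver"]];

const lanToInternet = { clientIp: "192.168.10.23", destIp: "203.0.113.7" };
const internetToSite = { clientIp: "198.51.100.9", destIp: WEB_SERVER_IP };
const internetToOther = { clientIp: "198.51.100.9", destIp: "203.0.113.7" };

console.log(checkAccess(goodOrder, lanToInternet));   // allow
console.log(checkAccess(goodOrder, internetToSite));  // allow
console.log(checkAccess(goodOrder, internetToOther)); // deny
console.log(checkAccess(badOrder, lanToInternet));    // deny
```

In the second ordering, the "deny all" rule matches first, so the allow rules below it are never reached. That is the situation that produces unexpected "access denied" pages.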
The next thing to test is your corporate Web site, as seen from
the Internet. If you already changed the DNS configuration to reflect
the new situation, you can surf to www.yourcompany.tld, but you
can also surf to the IP address of the Squid server. Be sure that
you are not accessing the Web site from your LAN. If everything
goes as planned, you'll see your corporate Web site appearing on
your screen.
Not from Your LAN?
The solution presented has one major flaw -- if you now try to
access the corporate Web site from your LAN through the Squid server,
you'll get an "Access Denied" message. If you look in your cache.log file,
you'll see "Forwarding loop detected" errors. This is inherent to the
setup: the proxy ends up asking itself for
a page, which results in a forwarding loop.
There are several ways to work around this. If you use an internal
DNS view that differs from the DNS view on the Internet,
you could have www.yourcompany.tld resolve internally to the Web server instead
of the Squid server. Alternatively, you can exclude www.yourcompany.tld
from proxying in the browser's settings, though this requires manually reconfiguring
each browser. Some browsers also support proxy auto-config (PAC) scripts as a means
of setting their proxy configuration automatically. Here's an example of such
a PAC script:
function FindProxyForURL(url, host)
{
    if (isInNet(host, "192.168.0.0", "255.255.0.0"))
        return "DIRECT";
    else
        return "PROXY 192.168.10.1:3128";
}
This will configure the browser to bypass the proxy (on port 3128)
when the host's address falls within 192.168.0.0/16, which includes the
LAN segment. Of course, your client
is still connecting to the Squid server (on port 80) which proxies
the corporate Web server.
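If you want to check the PAC logic before handing it to browsers, one option is to run it outside a browser with a stand-in for isInNet. The sketch below assumes a plain JavaScript runtime such as Node.js; the stub handles only IPv4 literals, whereas a browser's isInNet also resolves host names, so treat it as a rough check rather than a faithful emulation. The test addresses are made up.

```javascript
// Stand-in for the browser-provided isInNet(). Assumption: the host is
// already an IPv4 literal; a real browser would resolve host names first.
function isInNet(host, pattern, mask) {
  const toInt = (ip) =>
    ip.split(".").reduce((acc, o) => ((acc << 8) | Number(o)) >>> 0, 0);
  const m = toInt(mask);
  return ((toInt(host) & m) >>> 0) === ((toInt(pattern) & m) >>> 0);
}

// The PAC function from the article, unchanged in behavior.
function FindProxyForURL(url, host) {
  if (isInNet(host, "192.168.0.0", "255.255.0.0"))
    return "DIRECT";
  else
    return "PROXY 192.168.10.1:3128";
}

// An internal address bypasses the proxy; an external one goes through it.
console.log(FindProxyForURL("http://192.168.10.5/", "192.168.10.5")); // DIRECT
console.log(FindProxyForURL("http://203.0.113.7/", "203.0.113.7"));   // PROXY 192.168.10.1:3128
```

Only FindProxyForURL belongs in the file served to browsers; the isInNet stub exists solely for testing and should be left out of the deployed script.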
Conclusion
Using Squid as both a caching proxy for your LAN clients and an
off-loading mechanism for your corporate Web server can help you
save money on your Internet connection, and it also helps your Web
server survive the Slashdot effect.
References
PAC file format -- http://wp.netscape.com/eng/mozilla/2.0/relnotes/demo/proxy-live.html
Slashdot effect -- http://en.wikipedia.org/wiki/Slashdotted
Squid homepage -- http://www.squid-cache.org
Ralf van Dooren works as a consultant for Snow (http://snow.nl),
a leading Unix consultancy company based in the Netherlands. He
holds various certifications for Unix (SCNA, LPIC2) and networking
(CCNP) and is challenged by clients to find the right answer (42)
for their problems. He can be reached at: r.vdooren@snow.nl.