Global Web Site Performance Improvement
Jeffrey Fulmer
For more than a decade, programmers have been refining the art
of Web development while systems administrators have been honing
configurations, adding hardware, and fattening network pipes. Upgrades
at peering points and transport backbones have dramatically improved
network performance. Despite all these enhancements, however, many
users still register the same complaint -- "Your site is too slow!"
One source of frustration shared by site administrators is varied
perception. One user may deem performance to be slow while another
thinks it's fantastic. To further complicate this situation, both
users may be right. As I'll show later, your Web site may be both
slow and fast. Because few customers contact your company with thank-you
notes after a wonderful Internet experience, however, your inboxes
are more likely to fill with complaints than with compliments.
Those gripes may incorrectly skew the perception of your Web site's
performance as they darken your boss's view of your administration
skills. A comprehensive picture is necessary to clarify the situation
and provide benchmarks against which to measure future improvements.
To gauge the validity of customer complaints and judge the extent
to which users experience latency, it is necessary to measure your
Web site's performance. To ensure an accurate measurement, multiple
benchmarking agents should be positioned in a manner that reflects
your customer base. If you have customers scattered throughout the
globe and a single Perl script taking measurements, then your data
hardly reflects a random customer's experience. If your company
has numerous access points, you may be able to leverage its existing
infrastructure to build a comprehensive model. For administrators
lacking multiple access points, there are other options. You can
rent server accounts and collect your own data from various access
points, or you can hire a Web monitoring service that provides data
points that reflect your user base. Either way, it's important to
get a comprehensive picture of your site's performance.
Many administrators are lulled into complacency by either incomplete
or inaccurate measurements. One of the more telling views of your
data is performance by geographic area (see Figure 1). The most
striking thing about this view is the variance from city to city.
Now we can see different perceptions for the same Web site. For
a customer in San Jose, performance is exceptional. For the unlucky
Frankfurt resident, performance is "too slow". How can two people
have such different experiences on the same site? Certainly
they may have different hardware, but this data was collected with
identical configurations. The answer is distance. The customer in
San Jose sits fewer than 30 miles from the Web site, while the customer
in Frankfurt is half a world away. Despite our best efforts, geographic
discrepancies add latency. Thomas Friedman's world may be flat,
but its continents are separated by thousands of miles. As a result,
your site is still slow.
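A view like Figure 1 comes from grouping each agent's measurements by location and averaging them. The Python sketch below shows that aggregation. The `mean_by_city` helper and the sample timings are hypothetical illustrations, not the data behind Figure 1.

```python
from collections import defaultdict
from statistics import mean


def mean_by_city(samples):
    """Group (city, seconds) download samples and return {city: mean seconds}."""
    by_city = defaultdict(list)
    for city, seconds in samples:
        by_city[city].append(seconds)
    return {city: mean(times) for city, times in by_city.items()}


# Hypothetical measurements from two benchmarking agents (seconds per page).
samples = [
    ("San Jose", 0.8), ("San Jose", 1.0),
    ("Frankfurt", 5.0), ("Frankfurt", 5.5),
]

means = mean_by_city(samples)
fastest = min(means, key=means.get)
slowest = max(means, key=means.get)
print(f"{slowest} waits {means[slowest] / means[fastest]:.1f}x longer than {fastest}")
```

Keeping the raw samples per city, rather than a single global average, is what exposes the city-to-city variance.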
Before I continue, it's important to note what I mean by "latency".
For the purposes of this article, it is the time that elapses between an HTTP
request and the completed download of the content. It is commonly defined as the lag
between request and response, but that doesn't serve us well for
a number of reasons. For one thing, humans generally don't read
HTTP response headers; they read Web content. A Web site customer
continues to wait after the headers come over the wire. As far as
they're concerned, the page didn't load until it rendered inside
their browser. Our goal is to improve the experience of customers,
not Internet crawlers. I chose the definition for another reason as well:
it matches our measurement tool. The mean times in Figure 1 represent
the time it took to download the Web page and all its elements.
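Under this definition, the clock starts with the page request and stops only when the last element has arrived. Here is a minimal Python sketch of such a measurement. The `page_latency` helper, the injected `fetch` callable, and the default of four worker threads are assumptions for illustration; this is not the author's measurement tool.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def page_latency(fetch, page_url, element_urls, threads=4):
    """Seconds from the page request until the page and all its elements are downloaded.

    `fetch` is any callable that downloads one URL. The HTML page is fetched
    first, then its elements are pulled in parallel by `threads` workers.
    """
    start = time.perf_counter()
    fetch(page_url)                          # the HTML document itself
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fetch, element_urls))  # images, stylesheets, scripts
    return time.perf_counter() - start


def fake_fetch(url):
    """Stand-in for a real download: pretend every transfer takes 10 ms."""
    time.sleep(0.01)
    return b""


elements = [f"/images/{i}.gif" for i in range(8)]
elapsed = page_latency(fake_fetch, "/index.html", elements)
print(f"page plus {len(elements)} elements: {elapsed:.3f} s")
```

To time a real site, `fetch` could be `lambda url: urllib.request.urlopen(url).read()`, with the element URLs parsed from the downloaded page.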
Latency is added as a result of performance degradation in each
of three major segments of an HTTP transaction. They are listed
in order from the Web server to the customer:
- First mile -- Page generation to Internet access point
- Middle mile -- The Internet backbone
- Last mile -- From Internet backbone through the user's ISP
To understand how latency is accumulated in each of these areas,
it's important to examine them in greater detail.
Most developers and administrators concentrate their efforts on
the first mile. It is, after all, the area in which they maintain
the greatest control. Any reasonably competent team with a respectable
budget can build a site that can render a page in less than a second.
Given slightly deeper pockets, our reasonably competent team should
be able to launch that site inside a quality data center with fast
Internet access. While there is always room to improve first mile
performance, a time will come when you can no longer squeeze blood from
the stone, and the return no longer justifies the investment.
The middle mile spans the Internet "backbone", which is an archaic
term for a collection of very fast networks linked globally from city
to city by high-speed lines. Internet service providers
connect to the "backbone" through a series of TCP/IP routers. Throughout
the middle mile, latency is added by several factors. The main contributing
factor is distance. The farther a packet must travel, the longer
its round-trip time (RTT). Lengthy round trips
usually pass through a series of routers, and the store-and-forward
nature of routers contributes latency: each device must read the entire
packet into memory, parse its header, and then determine
where to send it next. Another problem is TCP/IP itself. Each packet
must be acknowledged by the receiver. If no acknowledgement arrives in time,
the packet is re-sent after a timeout. If packet loss is especially
high, the middle mile will be characterized by high latency.
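The store-and-forward cost can be put into numbers. Before a router can parse a packet's header, the whole packet must arrive, which takes the packet's size in bits divided by the link speed, and that cost is paid again at every hop. The sketch below is a simplified model under that assumption. The 1,500-byte packet, the T1 and gigabit link speeds, and the six links are illustrative choices, and propagation and queueing delays are deliberately left out.

```python
def store_and_forward_delay(packet_bytes, link_bps, links, processing_s=0.0):
    """Delay, in seconds, added by store-and-forward transmission of one packet.

    At every hop the whole packet must arrive (packet_bits / link_bps seconds)
    before the router can parse the header and forward it (processing_s).
    Propagation and queueing delays are not included.
    """
    per_hop = packet_bytes * 8 / link_bps + processing_s
    return links * per_hop


# A 1,500-byte packet crossing six links:
slow = store_and_forward_delay(1500, 1_544_000, 6)       # T1-speed links
fast = store_and_forward_delay(1500, 1_000_000_000, 6)   # gigabit links
print(f"T1: {slow * 1000:.2f} ms, gigabit: {fast * 1e6:.0f} us")
```

On fast backbone links the serialization term almost vanishes, so the per-router `processing_s` term and the sheer number of hops dominate.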
The third major area in which latency is accrued is the final
mile, from the ISP's peering point to the end user's computer. For
a Web site whose clientele consists of thousands or millions of
unique visitors per month, it isn't practical or cost-effective
to upgrade each user's connection speed. While there are actions
we can take to improve final mile performance, most latency accrued
there is beyond our capacity to improve.
Performance Degradation
Now that we understand where latency is amassed, let's revisit
the data displayed in Figure 1. The Web site whose performance was
measured resides in the San Francisco Bay area. The page we monitored
was relatively light; it included just 15 elements. Given what we
know about performance degradation, it is not shocking that customers
in California experience the best performance while those on other
continents experience the worst. Our unfortunate customers in China
were forced to wait six times longer for the same content. According
to market research, if a page requires more than eight seconds to
load, customers will leave the site. As our Asian customers click
their way through meatier portions of the site, they will have little
trouble reaching that threshold. Since the business sponsor was
committed to delivering content to a global audience, it was necessary
to get international performance in line with North America.
While global performance improvement was deemed a priority, most
latency was added in areas outside of our control. If we were granted
the resources and opportunity to stock the middle mile with unlimited
bandwidth and the most robust routers available, latency would still
be a problem. As the crow flies, the distance between Frankfurt
and San Francisco is 5693 miles. The speed of light in a vacuum is roughly
186,000 miles per second. In perfect conditions, a signal could
follow our crow's flight path in about 31 milliseconds, and the round trip
would be complete in about 61 milliseconds. Under such conditions, the multithreaded
agent in Frankfurt should complete its task in 1.15 seconds (101
TCP packets handled by four threads). In reality, the task took
5-1/4 seconds. Where did we amass the extra time?
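The crow-flight arithmetic is easy to check: 5693 miles divided by 186,000 miles per second is about 0.0306 seconds one way. The sketch below also repeats the calculation for optical fiber, assuming light in glass travels at roughly two-thirds of its vacuum speed. That factor is an approximation added here, not a figure from the measurements above.

```python
SPEED_OF_LIGHT_MPS = 186_000  # miles per second, in a vacuum


def propagation_delay(miles, speed_mps=SPEED_OF_LIGHT_MPS):
    """One-way time, in seconds, for a signal to cover `miles` at `speed_mps`."""
    return miles / speed_mps


FRANKFURT_TO_SAN_FRANCISCO = 5693  # miles, as the crow flies

vacuum_rtt = 2 * propagation_delay(FRANKFURT_TO_SAN_FRANCISCO)
# Assumption: light in optical fiber travels at roughly two-thirds of c.
fiber_rtt = 2 * propagation_delay(FRANKFURT_TO_SAN_FRANCISCO, SPEED_OF_LIGHT_MPS * 2 / 3)
print(f"round trip, vacuum: {vacuum_rtt * 1000:.1f} ms")
print(f"round trip, fiber:  {fiber_rtt * 1000:.1f} ms")
```

Even this floor is paid once per sequential round trip, so it grows with every request that cannot be overlapped with another.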
As detailed earlier, packets traverse the middle mile through
a series of cables and routers. Each router adds latency. In this
scenario, each TCP packet must traverse 12 routers -- 6 per leg.
If each router adds 2 milliseconds of latency, then another 2.2 seconds
are added to our total. The final discrepancy is explained by
packet loss. TCP is designed to recover from packet loss: the sender must
receive acknowledgement for every packet sent. If a sender does
not receive verification before a set timeout (200 ms in most cases),
then it resends the packet and every one after it. If 20 packets
were sent and packet 13 was lost, then packets 13 through 20 must
be retransmitted.
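Both sources of extra time in this paragraph can be modeled in a few lines. The sketch below computes the fixed per-round-trip router overhead (12 routers at 2 ms each) and the set of packets resent under go-back-N recovery, which is the behavior described above. The function names are hypothetical. Note that TCP stacks that support selective acknowledgement (SACK, RFC 2018) can often resend only the missing segments instead.

```python
def router_overhead(routers=12, per_router_s=0.002):
    """Extra seconds per round trip if every router on the path adds a fixed delay."""
    return routers * per_router_s


def packets_to_resend(sent, lost):
    """Packets retransmitted after a timeout under go-back-N recovery.

    The sender starts over at the earliest lost packet, so that packet and
    every packet sent after it go out again, even ones that arrived safely.
    """
    if not lost:
        return []
    first_lost = min(lost)
    return [seq for seq in sent if seq >= first_lost]


print(f"router overhead per round trip: {router_overhead() * 1000:.0f} ms")
print(packets_to_resend(list(range(1, 21)), {13}))  # packets 13 through 20
```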
Solutions
Products and services that correct Internet latency tend to address
it in one of the three areas I described: the first, middle, or
final mile. While there are products that address the final mile,
most are beyond our control to implement. We can't ask customers
to upgrade their Internet connection speeds or increase the sizes
of their local caches. For the most part, I will focus on first and
middle mile solutions. In some cases, first mile solutions will
improve final mile performance.
First Mile Solutions
The products in this category all have one thing in common --
they are implemented in the data center. The idea is to improve overall
performance by greatly improving the first mile. The faster you
can get content out the door and the lighter you make it, the quicker
you can deliver it.
The competitors in this area provide similar offerings based on
similar technologies. There is only so much blood you can squeeze
from a stone. Appliances by F5, NetScaler, Red Line Networks,
and FineGround provide inline compression and TCP optimization.
Some offer HTTP protocol optimization and caching.
According to Gartner Research, you can expect a twofold performance
improvement from inline compression and TCP optimization.
With a product such as FineGround's, which adds HTTP optimization,
you can expect a three- to sixfold performance improvement. On the surface,
those numbers sound impressive, but most administrators won't experience
this type of performance enhancement. Why? If you care enough to
read articles such as this one, then you are probably already doing
many of the same things the appliances are doing.
Good Web systems administrators are already using mod_gzip to
compress everything that is compressible. They already use mod_expires
or mod_headers to set explicit cache directives on heavily requested
elements. Their content is quick to generate and light over the
wire. If you've already implemented many of the features offered
by optimization appliances, then your performance enhancement will
fall short of the promises. This is not to say these solutions are
without merit. By offloading compression to an appliance, you could free CPU cycles
and extend the life of your servers. But at the same time, you could
add at least one high-end Unix server for the price of these devices.
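To see why compressing "everything that is compressible" pays off, the sketch below gzip-compresses a block of markup with Python's standard `gzip` module, which uses the same gzip format that mod_gzip sends to browsers. The sample markup is invented and unusually repetitive, so real pages will compress less dramatically. Formats that are already compressed, such as JPEG images, gain little.

```python
import gzip

# Hypothetical, highly repetitive HTML such as a product table.
row = "<tr><td class='item'>Widget</td><td class='price'>$9.99</td></tr>\n"
html = ("<table>\n" + row * 200 + "</table>\n").encode("utf-8")

compressed = gzip.compress(html)
assert gzip.decompress(compressed) == html  # lossless round trip

print(f"{len(html)} bytes -> {len(compressed)} bytes "
      f"({len(compressed) / len(html):.1%} of the original)")
```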
Middle Mile Solutions
As mentioned above, latency accrues with distance even under perfect
conditions. Middle mile solutions that address this matter take
one of two forms. One is to move content closer to customers. Another
is to correct inherent problems associated with TCP/IP and Internet
traffic routing. In this section, we'll consider several options
that allow us to improve performance on the middle mile.
Distance delays can be reduced simply by distributing content
closer to users. If our Frankfurt customers pull content from London
rather than San Francisco, then 10,598 miles are shaved off their
round trip. The theoretical download time is reduced from 1.15 seconds
to seven-tenths of a second.
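The 10,598-mile figure implies that London sits roughly 394 miles from Frankfurt. Each round trip therefore saves about 57 ms of propagation time, and the saving repeats for every sequential round trip. The sketch below works through that arithmetic. The count of ten round trips is an arbitrary illustration, since the real number depends on the page and the client.

```python
SPEED_OF_LIGHT_MPS = 186_000  # miles per second, in a vacuum

to_san_francisco = 5693        # miles, Frankfurt to San Francisco
to_london = 5693 - 10_598 / 2  # miles implied by the 10,598-mile saving

saved_miles = 2 * (to_san_francisco - to_london)
saved_per_rtt = saved_miles / SPEED_OF_LIGHT_MPS


def total_saving(round_trips):
    """Propagation time saved when `round_trips` sequential round trips go to London."""
    return round_trips * saved_per_rtt


print(f"London is about {to_london:.0f} miles from Frankfurt")
print(f"saved per round trip: {saved_per_rtt * 1000:.1f} ms")
print(f"saved over 10 round trips: {total_saving(10):.2f} s")
```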
Caching reverse proxies can be employed to move content closer
to customers. To implement such a solution, you will need to position
servers regionally. Geographic DNS will allow you to map host names
by region. If a user in Melbourne requests your server in San Francisco,
geographic DNS could send them instead to a caching server on the
Pacific Rim. The proxy serves static content from its cache and
dynamic content directly from San Francisco. The middle mile is
dramatically reduced along with its latency. Unfortunately, this
solution is not cheap. If your company has a global infrastructure,
you could leverage it to position redundant inexpensive caching
servers on all major continents. Geographic DNS servers, on the other
hand, are expensive. Indeed, their price tag may kill the project before
it gets started. Fortunately, there is an open source player in
this field, PowerDNS. While it lacks some features offered by its
commercial counterparts, it may be more than adequate for your needs.
You may also consider outsourcing CNAMEs to a geographic DNS provider.
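The two decisions in this design, which cache a client should reach and what that cache answers itself, can be sketched in a few lines of Python. Everything here is hypothetical: the `example.com` host names, the approximate coordinates, and the list of static file suffixes. The sketch also assumes the client's location is already known. A real geographic DNS service infers location from the resolver's IP address using a geolocation database, and nothing here reflects how PowerDNS or any commercial product is implemented.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3959

# Hypothetical cache host names; coordinates are approximate (lat, lon).
CACHES = {
    "sf.cache.example.com": (37.77, -122.42),    # origin, San Francisco
    "tokyo.cache.example.com": (35.68, 139.69),  # Pacific Rim
    "london.cache.example.com": (51.51, -0.13),  # Europe
}
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def great_circle_miles(a, b):
    """Haversine distance between two (latitude, longitude) points, in miles."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))


def resolve(client_location):
    """Answer a lookup with the cache host nearest to the client."""
    return min(CACHES, key=lambda host: great_circle_miles(client_location, CACHES[host]))


def proxy_route(path):
    """Serve static elements from the regional cache; send the rest to the origin."""
    return "cache" if path.split("?", 1)[0].lower().endswith(STATIC_SUFFIXES) else "origin"


print(resolve((-37.81, 144.96)))  # a client in Melbourne
print(resolve((50.11, 8.68)))     # a client in Frankfurt
print(proxy_route("/images/logo.gif"), proxy_route("/cart?item=42"))
```

A Melbourne client resolves to the Pacific Rim cache, and a Frankfurt client resolves to London. Dynamic requests such as the cart still travel to the origin.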
Rather than implementing a geographic caching system or distributing
load over regional Web servers, you could acquire the services of
a specialist. Two of the biggest players in this field are Netli
and Akamai. Both rely on geographic caching to move content closer
to end users, but Netli offers a proprietary protocol that reduces
some of the inherent weaknesses of TCP/IP. Akamai offers its own
networking refinements. It employs routing optimization to improve
its network performance. In Figure 2, we see the same Web server
from Figure 1, but this time its content is delivered through Netli's
Web accelerator. The curve appears similar, but now the average
download time from Beijing drops from more than six seconds to little
more than two.
Final Mile Solutions
For all practical purposes, final mile solutions will be implemented
in the first mile. Customer performance can be improved in the final
mile by compressing data in the first. We can reduce the number
of server requests with explicit caching directives on the server.
The lighter the pages, the faster they'll move through a customer's
ISP. While you may have implemented many of these solutions, there
may still be room for improvement. The savvy administrator will
recognize the point of diminishing returns.
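Explicit caching directives are ordinary HTTP response headers. The Python sketch below builds the kind of `Cache-Control` and `Expires` pair that mod_expires or mod_headers can be configured to emit. It illustrates the headers, not Apache configuration. The `cache_headers` helper and the one-day lifetime are hypothetical choices.

```python
import time
from email.utils import formatdate


def cache_headers(max_age_seconds, now=None):
    """Explicit caching directives for a response reusable for max_age_seconds.

    Cache-Control is the HTTP/1.1 directive; Expires carries the same
    lifetime as an absolute date for older HTTP/1.0 caches.
    """
    now = time.time() if now is None else now
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "Expires": formatdate(now + max_age_seconds, usegmt=True),
    }


# A logo that changes rarely: let browsers and proxies reuse it for a day.
for name, value in cache_headers(86_400, now=0).items():
    print(f"{name}: {value}")
```

While the lifetime lasts, a browser can reuse its cached copy without contacting the server at all, which removes that request from the customer's wait.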
Conclusion
The first step toward global performance improvement is comprehensive
monitoring. As indicated earlier, it is important that monitoring
agents are dispersed so that they mirror your customer base. Each
refinement should be evaluated by its effect on performance data.
A comprehensive picture that reflects your customer base will help
you make the business case for additional hardware or services.
A competent bean counter will never cut a check based on an administrator's
hunches. In the real world, you need data.
With adequate monitoring in place, you are ready to make refinements
to the area in which you have the most control. First mile tuning
can go a long way toward improving final mile performance. Make
sure your systems have adequate resources and they are tuned specifically
for delivering Web content. (O'Reilly's Web Performance Tuning
is a great place to start.) Compress all the content you can because
the smaller it is, the faster it travels over the wire. You can
decrease load and improve performance by setting explicit caching
directives. (See my article in Sys Admin, March 2005: http://www.samag.com/documents/s=9559/sam0503b/.)
It requires less time to pull an element from a local cache than
it does to pull it off a server half a world away.
As you hone your first mile configuration, check it against your
monitoring data. If performance measurements meet expectations for
your customer base, then you may not require further refinement.
Unfortunately, many administrators provide content to a global audience.
Unless your site is very lean, chances are that distant customers still
wait longer than the desired threshold.
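Checking monitoring data against expectations can be automated. The sketch below lists the regions whose mean download time exceeds a threshold and reports the percentage improvement after a refinement. The eight-second default echoes the market-research figure cited earlier. The helper names and the per-region numbers are hypothetical.

```python
def regions_over_threshold(mean_seconds, threshold=8.0):
    """Regions whose mean download time exceeds the threshold, slowest first."""
    slow = [region for region, seconds in mean_seconds.items() if seconds > threshold]
    return sorted(slow, key=mean_seconds.get, reverse=True)


def improvement(before, after):
    """Percentage reduction in mean download time after a refinement."""
    return 100 * (before - after) / before


# Hypothetical per-region means (seconds) for a heavier section of a site.
means = {"San Jose": 2.1, "New York": 3.4, "Frankfurt": 9.6, "Beijing": 12.8}
print(regions_over_threshold(means))  # regions still above eight seconds
print(f"{improvement(12.8, 4.3):.0f}% faster in Beijing after a change")
```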
In most cases, distance is a primary culprit in the conspiracy
to slow down the Web. Performance could be greatly enhanced if we
simply moved customers closer to the Web site, but it's more realistic
to move content closer to them. Serving European content in Europe
and Asian content in Asia creates logistical problems: it requires
replicating content to multiple sites, and it necessitates redundant hardware
and data center expenses. If your primary site was expensive, how
much will it cost to bring up another two? For this reason, it is
often better to rely on geographically distributed caching proxies
or accelerator services such as Netli or Akamai.
The world is getting flatter, but it doesn't have to be slow.
The tips in this article will help you deliver content quickly to
all your customers, not just the ones next to your data center.
References
Friedman, Thomas. 2005. The World Is Flat. New York: Farrar,
Straus and Giroux.
Fulmer, Jeffrey. 2005. "Save Bandwidth and Increase Performance
with Cache-Control Response." Sys Admin 14(3): 13-15. http://www.samag.com/documents/s=9559/sam0503b/
Killelea, Patrick. 2002. Web Performance Tuning. 2nd ed.
Sebastopol, CA: O'Reilly & Associates.
PowerDNS. PowerDNS Web site. http://www.powerdns.com/
Jeffrey Fulmer has administered enterprise computer systems
professionally since 1995. He is an open source software developer
and the primary author of siege. He currently resides in Pennsylvania
with his wife and English bulldog.