nov2005.tar

Global Web Site Performance Improvement

Jeffrey Fulmer

For more than a decade, programmers have been refining the art of Web development while systems administrators have been honing configurations, adding hardware, and fattening network pipes. Upgrades at peering points and transport backbones have dramatically improved network performance. Despite all these enhancements, however, many users still register the same complaint -- "Your site is too slow!"

One source of frustration shared by site administrators is varied perception. One user may deem performance to be slow while another thinks it's fantastic. To further complicate this situation, both users may be right. As I'll show later, your Web site may be both slow and fast. Because few customers contact your company with thank-you notes after a wonderful Internet experience, however, your inboxes are more likely to fill with complaints rather than compliments. Those gripes may incorrectly skew the perception of your Web site's performance as they darken your boss's view of your administration skills. A comprehensive picture is necessary to clarify the situation and provide benchmarks against which to measure future improvements.

To gauge the validity of customer complaints and judge the extent to which users experience latency, it is necessary to measure your Web site's performance. To ensure an accurate measurement, multiple benchmarking agents should be positioned in a manner that reflects your customer base. If you have customers scattered throughout the globe and a single Perl script taking measurements, then your data hardly reflects a random customer's experience. If your company has numerous access points, you may be able to leverage its existing infrastructure to build a comprehensive model. For administrators lacking multiple access points, there are other options. You can rent server accounts and collect your own data from various access points, or you can hire a Web monitoring service that provides data points that reflect your user base. Either way, it's important to get a comprehensive picture of your site's performance.

Many administrators are lulled into complacency by either incomplete or inaccurate measurements. One of the more telling views of your data is performance by geographic area (see Figure 1). The most striking thing about this view is the variance from city to city. Now we can see different perceptions for the same Web site. For a customer in San Jose, performance is exceptional. For the unlucky Frankfurt resident, performance is "too slow". How can two people have such very different experiences on the same site? Certainly they may have different hardware, but this data was collected with identical configurations. The answer is distance. The customer in San Jose sits fewer than 30 miles from the Web site, while the customer in Frankfurt is half a world away. Despite our best efforts, geographic discrepancies add latency. Thomas Friedman's world may be flat, but its continents are separated by thousands of miles. As a result, your site is still slow.

Before I continue, it's important to note what I mean by "latency". For the purposes of this article, it is the lapse between an HTTP request and content download. It is commonly defined as the lag between request and response, but that doesn't serve us well for a number of reasons. For one thing, humans generally don't read HTTP response headers; they read Web content. A Web site customer continues to wait after the headers come over the wire. As far as they're concerned, the page didn't load until it rendered inside their browser. Our goal is to improve the experience of customers, not Internet crawlers. The definition was chosen for another reason -- it matches our measurement tool. The mean times on Figure 1 represent the time it took to download the Web page and all its elements.

Latency is added as a result of performance degradation in each of three major segments of an HTTP transaction. They are listed in order from the Web server to the customer:

First mile -- Page generation to Internet access point
Middle mile -- The Internet backbone
Last mile -- From Internet backbone through the user's ISP

To understand how latency is accumulated in each of these areas, it's important to examine them in greater detail.

Most developers and administrators concentrate their efforts on the first mile. It is, after all, the area in which they maintain the greatest control. Any reasonably competent team with a respectable budget can build a site that can render a page in less than a second. Given slightly deeper pockets, our reasonably competent team should be able to launch that site inside a quality data center with fast Internet access. While there is always room to improve first mile performance, a time will come where you can no longer bleed the stone. The return on investment is not worth the cost.

The middle mile spans the Internet "backbone", which is an archaic term for a series of very fast networks linked globally from city to city by a series of high-speed lines. Internet service providers connect to the "backbone" through a series of TCP/IP routers. Throughout the middle mile, latency is added by several factors. The main contributing factor is distance. The further a packet must travel, the longer it takes to complete its round trip time (RTT). Lengthy round trips usually occur through a series of routers. The store and forward nature of routers contributes latency. After the entire packet is read into memory, the device must parse its header then determine where next to send it. Another problem is TCP/IP itself. Each packet must be verified. If delivery verification fails, then the packet is re-sent after a mandatory timeout. If packet loss is especially high, the middle mile will be characterized by high latency.

The third major area in which latency is accrued is the final mile, from the ISP's peering point to the end user's computer. For a Web site whose clientele consists of thousands or millions of unique visitors per month, it isn't practical or cost-effective to upgrade each user's connection speed. While there are actions we can take to improve final mile performance, most latency accrued there is beyond our capacity to improve.

Performance Degradation

Now that we understand where latency is amassed, let's revisit the data displayed in Figure 1. The Web site whose performance was measured resides in the San Francisco Bay area. The page we monitored was relatively light; it included just 15 elements. Given what we know about performance degradation, it is not shocking that customers in California experience the best performance while those on other continents experience the worst. Our unfortunate customers in China were forced to wait six times longer for the same content. According to market research, if a page requires more than eight seconds to load, customers will leave the site. As our Asian customers click their way through meatier portions of the site, they will have little trouble reaching that threshold. Since the business sponsor was committed to delivering content to a global audience, it was necessary to get international performance in line with North America.

While global performance improvement was deemed a priority, most latency was added in areas outside of our control. If we were granted the resources and opportunity to stock the middle mile with unlimited bandwidth and the most robust routers available, latency would still be a problem. As the crow flies, the distance between Frankfurt and San Francisco is 5693 miles. The speed of light is capped at 186,000 miles per second. In perfect conditions, a signal could follow our crow's flight pattern in 32 milliseconds. The round trip is complete in 64 milliseconds. Under such conditions, the multithreaded agent in Frankfurt should complete its task in 1.15 seconds (101 TCP packets handled by four threads). In reality, the task took 5-1/4 seconds. Where did we amass the extra time?

As detailed earlier, packets traverse the middle mile through a series of cables and routers. Each router adds latency. In this scenario, each TCP packet must traverse 12 routers -- 6 per leg. If each router adds 2 milliseconds latency, then another 2.2 seconds are added to our total. The final discrepancy is explained with packet loss. TCP is designed to avoid packet loss. The sender must receive acknowledgement for every packet sent. If a sender does not receive verification before a set timeout (200 ms in most cases), then it resends the packet and every one after it. If 20 packets were sent and packet 13 was lost, then packets 13 through 20 must be retransmitted.

Solutions

Products and services that correct Internet latency tend to address it in one of the three areas I described: the first, middle, or final mile. While there are products that address final mile solutions, most are beyond our control to implement. We can't ask customers to upgrade their Internet connection speeds or increase the sizes of their local cache. For the most part, I will focus on first and middle mile solutions. In some cases, first mile solutions will improve final mile performance.

First Mile Solutions

The products in this category all have one thing in common -- they are implemented in the datacenter. The idea is to improve overall performance by greatly improving the first mile. The faster you can get it out the door and the lighter you make it, the quicker you can deliver it.

The competitors in this area provide similar offerings based on similar technologies. There is really only so much you can do to bleed a stone. Appliance devices by F5, NetScaler, Red Line Networks, and FineGround provide inline compression and TCP optimization. Some offer HTTP protocol optimization and caching.

According to Gartner Research, you can expect a 2 to 1 performance enhancement provided by inline compression and TCP optimization. With a product like one from FineGround with its added HTTP optimization, you can expect a 3 to 6 times performance improvement. On the surface, those numbers sound impressive but most administrators won't experience this type of performance enhancement. Why? If you care enough to read articles such as this one, then you are probably already doing many of the same things the appliances are doing.

Good Web systems administrators are already using mod_gzip to compress everything that is compressible. They already use mod_expires or mod_headers to set explicit cache directives on heavily requested elements. Their content is quick to generate and light over the wire. If you've already implemented many of the features offered by optimization appliances, then your performance enhancement will fall short of the promises. This is not to say these solutions are without merit. With compression offline, you could free CPU cycles and extend the life of your servers. But at the same time, you could add at least one high-end Unix server for the price of these devices.

Middle Mile Solutions

As mentioned above, latency accrues with distance under perfect conditions. Middle mile solutions that address this matter take one of two forms. One is to move content closer to customers. Another is to correct inherent problems associated with TCP/IP and Internet traffic routing. In this section, we'll consider several options that allow us to improve performance on the middle mile.

Distance delays can be corrected by simply distributing content closer to users. If our Frankfurt customers pull content from London rather than San Francisco, then 10,598 miles are shaved off their round trip. The theoretical download time is reduced from 1.15 seconds to seven-tenths of a second.

Caching reverse proxies can be employed to move content closer to customers. To implement such a solution, you will need to position servers regionally. Geographic DNS will allow you to map host names by region. If a user in Melbourne requests your server in San Francisco, geographic DNS could send them instead to a caching server on the Pacific Rim. The proxy serves static content from its cache and dynamic content directly from San Francisco. The middle mile is dramatically reduced along with its latency. Unfortunately, this solution is not cheap. If your company has a global infrastructure, you could leverage it to position redundant inexpensive caching servers on all major continents. Geographic DNS servers are not inexpensive. Indeed, their price tag may kill the project before it gets started. Fortunately, there is an open source player in this field, PowerDNS. While it lacks some features offered by its commercial counterparts, it may be more than adequate for your needs. You may also consider outsourcing CNAMES to a Geographic DNS provider.

Rather than implementing a geographic caching system or distributing load over regional Web servers, you could acquire the services of a specialist. Two of the biggest players in this field are Netli and Akamai. Both rely on geographic caching to move content closer to end users, but Netli offers a proprietary protocol that reduces some of the inherent weaknesses of TCP/IP. Akamai offers its own networking refinements. It employs routing optimization to improve its network performance. In Figure 2, we see the same Web server from Figure 1, but this time its content is delivered through Netli's Web accelerator. The curve appears similar, but now the average download time from Beijing drops from more than six seconds to little more than two.

Final Mile Solutions

For all practical purposes, final mile solutions will be implemented in the first mile. Customer performance can be improved in the final mile by compressing data in the first. We can reduce the number of server requests with explicit caching directives on the server. The lighter the pages, the faster they'll move through a customer's ISP. While you may have implemented many of these solutions, there may still be room for improvement. The savvy administrator will recognize the point of diminished return.

Conclusion

The first step toward global performance improvement is comprehensive monitoring. As indicated earlier, it is important that monitoring agents are dispersed so that they mirror your customer base. Each refinement should be evaluated by its effect on performance data. A comprehensive picture that reflects your customer base will help you make the business case for additional hardware or services. A competent bean counter will never cut a check based on an administrator's hunches. In the real world, you need data.

With adequate monitoring in place, you are ready to make refinements to the area in which you have the most control. First mile tuning can go a long way toward improving final mile performance. Make sure your systems have adequate resources and they are tuned specifically for delivering Web content. (O'Reilly's Web Performance Tuning is a great place to start.) Compress all the content you can because the smaller it is, the faster it travels over the wire. You can decrease load and improve performance by setting explicit caching directives. (See my article in Sys Admin, March 2005,: http://www.samag.com/documents/s=9559/sam0503b/.) It requires less time to pull an element from a local cache than it does to pull it off a server half a world away.

As you hone your first mile configuration, check it against your monitoring data. If performance measurements meet expectations for your customer base, then you may not require further refinement. Unfortunately, many administrators provide content to a global audience. Unless your site is very lean, chances are that customers are still above the desired threshold.

In most cases, distance is a primary culprit in the conspiracy to slow down the Web. Performance could be greatly enhanced if we simply moved customers closer to the Web site, but it's more realistic to move content closer to them. Serving European content in Europe and Asian content in Asia creates logistical problems -- it requires multiple content replications and it necessitates redundant hardware and data center expenses. If your primary site was expensive, how much will it cost to bring up another two? For this reason, it is often better to rely on geographically distributed caching proxies or accelerator services such as Netli or Akamai.

The world is getting flatter, but it doesn't have to be slow. The tips in this article will help you deliver content quickly to all your customers, not just the ones next to your data center.

References

Friedman, Thomas. 2005. The World Is Flat. New York: Farrar, Straus and Giroux.

Fulmer, Jeffrey. 2005. "Save Bandwidth and Increase Performance with Cache-Control Response". Sys Admin 14(3): 13-15. http://www.samag.com/documents/s=9559/sam0503b/

Killelea, Patrick. 2002. Web Performance Tuning, 2nd. Ed. Sebastopol, CA: O'Reilly & Associates.

PowerDNS Web site -- http://www.powerdns.com/

Jeffrey Fulmer has administered enterprise computer systems professionally since 1995. He is an open source software developer and the primary author of siege. He currently resides in Pennsylvania with his wife and English bulldog.