Peter Scheuermann*, Junho Shim*,
*Department of Electrical and Computer Engineering
Evanston, IL 60208
400 Oracle Parkway, Box 659413
Redwood Shores, CA 94065
Caching at proxy servers plays an important role in reducing the latency of the user response, the network delays and the load on Web servers. The cache performance depends critically on the design of the cache replacement algorithm. Unfortunately, most cache replacement algorithms ignore the Web's scale. In this paper we argue for the design of delay-conscious cache replacement algorithms which explicitly consider the Web's scale by preferentially caching documents which require a long time to fetch to the cache. We present a new, delay-conscious cache replacement algorithm LNC-R-W3 which maximizes a performance metric called delay-savings-ratio. Subsequently, we test the performance of LNC-R-W3 experimentally and compare it with the performance of other existing cache replacement algorithms, namely LRU and LRU-MIN.
The World Wide Web has become the predominant client/server architecture. Its simplicity and incremental design led not only to its quick adoption, but unfortunately also to its poor performance. The high response time perceived by Web clients is caused primarily by the long communication delays, although other factors (such as slow service times) also contribute.
The communication delay to a certain extent can be reduced by buying links with a higher bandwidth and improving the efficiency of communication protocols . However, a significant fraction of the communication delay is due to propagation latency which can be reduced only up to the limits imposed by the speed of light. Consequently, one of the most effective ways of reducing the communication delay is by employing caching. The documents can be cached at clients as is done currently by most Web browsers , or by the Web servers themselves, which is most useful when a server contains many pointers to other servers . But both alternatives do little towards reducing the overall load on the network. Instead, caching proxies, or proxy servers, which act on behalf on a number of users can reduce the number of messages on the network and have been recognized as the preferred caching strategy on the Web [1, 4, 7, 8]. In an attempt to tune the performance of caching proxies several techniques have been used, such as avoiding caching of documents that originate at nearby servers, and hierarchies of caches .
Cache replacement algorithms play a central role in the design of any caching component; these algorithms have been extensively studied in the context of operating system virtual memory management and database buffer management [5, 12]. Cache replacement algorithms usually maximize the cache hit ratio by attempting to cache the data items which are most likely to be referenced in the future. Since the future data reference pattern is typically difficult to predict, a common approach is to extrapolate from the past by caching the data items which were referenced most frequently. This approach is exemplified by, e.g., the LRU cache replacement algorithm.
We argue in this paper, however, that maximizing the cache hit ratio alone does not guarantee the best client response time in the Web environment. In addition to maximizing the cache hit ratio, a cache replacement algorithm for Web documents should also minimize the cost of cache misses--i.e., the delays caused by fetching documents not found in the cache. Clearly, the documents which took a long time to fetch should be preferentially retained in the cache. For example, consider a caching proxy at Northwestern University. The cache replacement algorithm at the proxy found two possible candidates for replacement. Both documents have the same size and the same reference frequency, but one document originates from the University of Chicago while the other is from Seoul National University. The cache replacement algorithm should select for replacement the document from the University of Chicago and retain the document from Seoul National University because upon a cache miss the former can be fetched much faster than the latter.
We define a new performance metric called delay-savings ratio which generalizes the hit ratio metric by explicitly considering cache miss costs. Subsequently, we describe a new cache replacement algorithm LNC-R which maximizes the delay-savings ratio. The LNC-R cache replacement algorithm approximates the optimal cache replacement algorithm, which is NP-complete. Thus, the design of LNC-R relies on a solid theoretical foundation. LNC-R estimates the probability of a future reference to a document by using a moving average of the last K inter-arrival times of requests to the document. However, it has been observed that Web clients exhibit a strong preference for accessing small documents [6, 7, 9, 11]. Therefore, we take advantage of this domain-specific information and base the estimate of a probability of future reference both on the past reference pattern and also on the size of the document. We call the resulting cache replacement algorithm LNC-R-W3. We compare the performance of LNC-R-W3 against other existing algorithms, namely LRU and LRU-MIN, using a trace-driven simulation based on Web client requests.
The remainder of this paper is structured as follows: In Section 2 we describe our cache replacement algorithms. Section 3 contains a comparison of the performance of our algorithms with other existing cache replacement algorithms. Section 4 discusses the related work. Finally, Section 5 concludes the paper and describes the directions of future work.
We first describe the basic LNC-R (Least Normalized Cost Replacement) algorithm. LNC-R is a generic cache replacement algorithm suited for caching data items of different sizes with different costs of materializing in the cache. LNC-R was initially developed for caching query results in a data warehousing environment . We subsequently extend LNC-R by taking advantage of the preference of Web clients for for accessing small documents. We call the resulting cache replacement algorithm for proxy caching on the Web LNC-R-W3.
LNC-R aims at maximizing the delay-savings ratio(DSR) defined as
where di is the average delay to fetch document Di to cache, ri is the total number of references to Di and hi is the number of references to Di which were satisfied from the cache. All sums are defined over all referenced documents. The delay-savings ratio determines the fraction of communication and server delays which is saved by satisfying the references from cache instead of a remote server. If the time to satisfy a request from cache is smaller than the time to satisfy it from a remote server (what is the typical case) then a maximal delay-savings ratio guarantees also minimal client request response time.
LNC-R maintains the following statistics with each cached document Di:
In order to maximize the delay-savings ratio, the above statistics are combined into one performance metric, called profit, defined as
LNC-R selects for replacement the least profitable documents. Let us assume that document Dj with size sj needs to be cached and the amount of free space in the cache is less than sj. Then LNC-R sorts all documents held in the cache in ascending order of profit and selects the candidates for eviction in the sort order. To explain the rationale behind this algorithm, it is enough to realize that profit of a document Di gives the normalized contribution of this document to the reduction of the DSR, namely the expected delay saving obtained from caching each byte of Di. LNC-R simply tries to maximize the delay savings ratio (i.e. total cache "profitability") by maximizing the profit from each unit of storage
The average reference rate li can be estimated using the time of last reference as in the LRU cache replacement algorithm. However, it has been pointed out in  that LRU performs inadequately in presence of bursty workloads or workloads consisting of multiple classes with different characteristics- workload likely to appear on the Web. To deal with these difficulties,  proposed estimating the reference rates from the last K (K >= 1) samples of reference times. Following their rationale, we define the average reference rate li as
where t is the current time and tk is the time of the K-th reference. The incorporation of the current time into the estimate of reference rate li is important, because it guarantees "aging" of documents which are not referenced at all (as opposed to frequence-count based approaches similar to LFU). For performance efficiency, the estimate is updated only upon reference to Di or at fixed times in absence of the former.
Whenever fewer than K reference times are available for a document Di, the average reference rate li is estimated using the maximal number of available samples. However, such an estimate is statistically less reliable and thus the documents having fewer reference samples are more likely to be selected for replacement by LNC-R. In particular, the LNC-R algorithm first considers for replacement all documents having just one reference sample in their profit order, then all documents with two reference samples, etc. until sufficient cache space has been freed (see Figure 1).
|Input: s - space to be freed
Output: C - list of candidate documents for replacement
for i = 1 to K
D= list of all documents arranged in order D1 < D2 < ... < Dk
C = minimal prefix of D such that Sumj(sj) >= s
A straightforward implementation of the above algorithm may, however, lead to starvation of documents having fewer than K reference samples. Namely, if the reference samples are discarded when the corresponding document is evicted from cache, then they must be collected again from scratch the next time the document is cached. However, since the document is again likely to be selected for replacement, it may be impossible to collect the necessary K samples to cache it permanently, irrespective of its reference rate. To prevent the starvation, LNC-R retains the reference samples even after the corresponding document has been evicted from the cache. The reference samples are garbage-collected using a profit-based criterion described in . The average delay di is also estimated from a sample of the last K delays, whenever appropriate. This means that we keep track of the last K delays that involved retrieval from a remote server
We show in  that selecting the optimal set of documents for replacement so that the delay savings ratio is maximized is equivalent to solving the knapsack problem and is thus NP-complete. However, we also show that LNC-R approximates (with K growing to infinity) the optimal algorithm, provided that the internal cache fragmentation is negligible and the references to the individual documents are statistically independent with apriori known and fixed rates. Thus the design of LNC-R relies on a solid theoretical foundation. In  we also design a cache admission algorithm, LNC-A, which determines for every newly received document whether it should be cached. Documents are cached only if they improve the overall cache profit. Since it is impossible to estimate the reference rates of documents which were not previously referenced, the admission algorithm must be based on heuristics. We were, however, unable to find heuristics which would improve performance on the Web client traces. Therefore, we do not discuss the cache admission algorithms in this paper.
Several studies of Web reference patterns show that Web clients exhibit a strong preference for accessing small documents [6, 7, 9, 11]. In particular, it was shown in  that given document Di of size si, its average reference rate can be expressed with a high confidence as
b= 1.66 and c is a constant (not given in ).
Consequently, we can base the estimate of average reference rate not only on the past reference pattern as LNC-R, but also on the size of the document. The above estimate of li and the original history-based estimate can be combined in many ways. We have selected for our new Web-based cache replacement algorithm LNC-R-W3 a multiplicative combination which determines the average reference rate li of document Di as
After substituting (5) in equation (2) we obtain:
Since our trace exhibited different characteristics from that reported in , we performed some experimentation, explained below, in order to determine the value of the constant b.
We evaluated the performance of LNC-R-W3 on a client trace collected at Northwestern University. The trace represents a two day snapshot (November 96) of requests generated by clients on approximately 60 PC's in a public lab at Northwestern University. The trace contains about 20K requests. Browser cache hits and non-cacheable requests (e.g. URL's containing "bin"or "cgi-bin") were filtered out from the trace. The browsers were temporarily adjusted so that all requests were re-directed to a proxy where for each referenced URL, we recorded the size the of requested document and the difference between the time when the request for document Di arrives at the proxy and the time when Di is actually fetched to the proxy server.
To gain more insight into the nature of the client requests' characteristics, we capture some of the statistical properties of our traces. We concentrate on two aspects:
Previously published trace analyses [6, 7, 9, 11] show that small files are much more frequently referenced than large files. Our traces exhibit the same characteristic as Figure 2a indicates. For example, 25% of all client requests are for documents smaller than 1K.
As discussed in Section 2, other researchers have observed the hyperbolic dependence between document size and reference rate. In order to determine the skew of the hyperbola given by parameter b in equation (4) we found the least-squares fit for a hyperbola on the trace data, which determines the best value of b as 1.30. The comparison between the best-fit line and and the real trace data is shown in Figure 2b.
|(a) Request size distribution (log scale)||(b) Least squares fit (log scale)|
The correlation between the document size and the delay to fetch the document is defined as
where Cov(s, d) is the covariance between size and delay, Var(s) is the variance of size and Var(d) is the variance of delay. Cor(s, d) shows whether the delay to fetch a document to cache varies across documents of similar size. Cor(s, d) = 1 indicates that delay is linearly dependent on size. Consequently, there is no need for delay-sensitive caching as delay can be always determined from size. On the other hand, Cor(s, d) = 0 indicates that there is no relationship between size and delay and thus delay-sensitive caching is necessary. If most of the clients access primarily local documents, we expect Cor(s, d) to be close to 1. On the other hand, if the clients access a significant fraction of remote documents, we expect Cor(s, d) to be close to 0. We measured the value Cor(s,d) on our trace as 0.2145, which is relatively low. Therefore, delay-conscious caching is indeed necessary.
The delay saving ratio (DSR) defined in previous section is our primary performance metric in all experiments. We also use cache hit ratio (HR) as a secondary metric, which is defined as
where hi is the number of references to document Di which were satisfied from cache, and ri is the total number of references to document Di.
We compare the performance of LNC-R-W3 against plain LRU and LRU-MIN . Similarly to LNC-R-W3, LRU-MIN also exploits the preference of Web clients for accessing small documents. However, unlike LNC-R-W3, LRU-MIN does not consider the delays to fetch documents to the cache and estimates reference rate to each document using only the time of last reference. Whenever LRU-MIN needs to free space for a new document Di of size si, it first attempts to select for replacement any document larger than si (the documents are searched in the LRU order). If no document has been found, LRU-MIN considers the lists of documents larger than si/2, si/4, etc., in LRU order until enough space has been freed.
We ran experiments with unlimited cache size in order to study the potential of caching on the workloads in our trace. As the results in Table 1 show, our trace exhibits reference locality (given by hit ratios) similar to other previously published results [1, 7]. The results in Table 1 also reveal that local documents exhibit a higher degree of reference locality than remote documents, because the delay-saving ratios are lower than the corresponding hit ratios. Again, this is consistent with similar findings in .
Table 1. Performance with infinite cache
We study the impact of the selection of parameters K and b on the performance of LNC-R-W3. In the following two sections we concentrate on each of the two parameters.
Increasing the value of K improves the reliability of reference rate estimates. Clearly, the burstier the workload the larger value of K should be selected. On the other hand, large values of K result in higher spatial overhead to store the reference samples. The transition from K=1 to K=2 is particularly sharp, since LNC-R-W3 with K=1 does not need to retain any reference samples after eviction of corresponding documents. Thus we expect the best performance for intermediate values of K. The results(Figure 3) confirm our expectations. With cache size 2% of the total size of all documents, setting K=3 leads to best performance as shown in (Figure 3a). The improvement of delay saving ratio relative to K=1 is 19.4%. Figure 3b confirms that also for other cache sizes the best performance is achieved with intermediate values of K (marked by white bar). Therefore, we conjecture that K should be set to 2 or 3 to obtain the best performance.
|(a) 2.0 % cache size||(b) all cache sizes|
As explained in Section 2, parameter b determines the skew of dependence of reference rate on document size. The higher the value of b, the stronger the preference of clients to access small documents. In Section 3.1 we determined b = 1.30 as the best fit for the data in our trace. However, since we found the best fit of Section 3.1 relatively noisy, we validated the prediction also experimentally. Figure 4a confirms the prediction for cache size 2% of the total size of all documents. White bars in Figure 4b show the optimal values of parameter b for other cache sizes. In most cases the measured optimal values are close to the predicted optimum b = 1.30. Because other Web client traces exhibit similar skew of dependence of reference rate on document size , we conjecture that for best performance on most web workloads b should be set between 1 and 2.
|(a) b settings on 2 % cache size||(b) b on different cache size|
As Figure 3 and 4 show for 20% cache size, we achieve DSR close to 0.2, which is about 10% below the maximum possible for infinite cache.
We compared the performance of LNC-R-W3 with LRU and LRU-MIN. For LNC-R-W3 we used the optimal setting of b=1.3 and K=3. As Figure 5 indicates, LNC-R-W3 provides consistently better performance than LRU and LRU-MIN for all cache sizes. In terms of delay-savings ratio, LNC-R-W3 gives on average 29.3% improvement over LRU and 11.4% improvement over LRU-MIN(Figure 5a). The maximal improvement over LRU and LRU-MIN is 49.9% and 20.9% for cache size 2.0%.
Although LNC-R-W3 is not designed to maximize the cache hit ratio, it still provides an improvement over LRU and LRU-MIN as shown in Figure 5b. In particular, the average improvement is 37.6% over LRU and 4.4% over LRU-MIN.
|(a) Delay-Savings Ratio||(b) Hit Ratio|
Cache replacement algorithms have been extensively studied in the context of operating system virtual memory management  and database buffer pool management . The design of LNC-R-W3 is based on our earlier published cache replacement algorithm LNC-R  developed for caching of query results in data warehousing environment. The data warehousing environment bears many similarities with the Web environment described in this paper. In fact, Web client GET requests can be viewed as a very simple form of queries. The cache replacement algorithms in both environments must cope with managing data items of different sizes and non-uniform costs of cache misses due to different resource consumptions for various query executions. However, LNC-R-W3 exploits the preference of Web clients for accessing small documents.
The performance benefits of caching Web documents close to clients on proxy servers have been pointed out by several researchers [1, 2, 4, 7, 8, 14]. Several variants of LRU tailored to caching of Web documents have been published in . All algorithms in  exploit in some form the aforementioned preference of Web clients for accessing small documents. One of the algorithms, LRU-MIN, is used as a yardstick for LNC-R-W3. A modification of one of the algorithms appears in . None of these algorithms, however, explicitly considers the delay to fetch a document to cache as one of the criteria for cache replacement.
The importance of explicitly considering the delay to fetch documents has been observed in . Similarly to LNC-R-W3, the proposed algorithm is based on a rank function which takes the delay as one of its inputs. However, unlike LNC-R-W3, the design of the rank function is ad-hoc. The inputs of the function are combined using a set of unspecified weights. A concept of profit similar to that embodied in our equation (2) has been defined in , however no actual cache replacement algorithm based on it is developed. Our profit function used for LNC-R-W3 , i.e., equation (6) combines the effects of delays and preference for small documents in order to minimize the response times perceived by the clients. In addition, LNC-R-W3 obtains performance benefits from a more reliable estimate of document reference rates based on a moving average window of the last K references.
The main contribution of our paper is to show the importance of delay-conscious cache replacement algorithms for Web document cache management. Our trace analyses confirm that with improving quality of Web document content, the users will access remote documents more frequently.
To address the need for design of delay-conscious cache replacement algorithms, we developed a new cache replacement algorithm LNC-R-W3. LNC-R-W3 builds on our earlier published cache replacement algorithm LNC-R, which is shown to approximate the optimal, NP-complete, cache replacement algorithm . Therefore, the design of LNC-R-W3 benefits from a solid theoretical foundation. While LNC-R is a generic delay-conscious cache replacement algorithm, LNC-R-W3 exploits the specific characteristics of Web client workload.
We compared the performance of LNC-R-W3 against the standard LRU and a Web-tailored LRU-MIN using trace-driven simulations. We used our own trace recently collected at Northwestern University. In summary, the experimental results indicate that LNC-R-W3 provides consistently better performance than LRU and LRU-MIN. We also systematically studied the choices for tuning the various performance knobs of LNC-R-W3.