Pseudo-serving: A User-responsible Paradigm for Internet Access

Keith Kong
Electrical and Computer Engineering Department
University of California at Davis
Davis, California 95616, USA

Dipak Ghosal
Computer Science Department
University of California at Davis
Davis, California 95616, USA


As the demand for bandwidth on the Internet continues to rocket, it becomes increasingly important to expand the existing infrastructure and to fairly allocate scarce network resources to improve user welfare. This paper introduces a paradigm for accessing information on the Internet whereby previously unused resources, those of the users themselves, play a central role in allowing users to bypass network congestion under "flash-crowd" conditions, where a large number of users from many networks retrieve the same set of files from a server over a short period of time. Simulations demonstrate the feasibility of this approach: a user obtains speedy access to files on servers under flash-crowd conditions in exchange for holding the file for a short period of time, within which, she serves at most two other users.

1. Introduction

Although users are connected to the Internet through faster modems, a greater percentage of them cite downloading speed as the number one problem with accessing information on the Web [9]. The bottlenecks for transferring data are shifting away from modems to servers and to intermediate links of the global internetwork. This shift is one of many signs indicating the failure of the current expansion of the network infrastructure to keep pace with growing demand. It is clear that improvements in network technologies need to continue at all levels if the Internet is to sustain its hyper-exponential rate of growth.

It is also clear that unless technological advances outpace people's imagination for consuming ever greater amounts of bandwidth, some mechanism is needed to fairly allocate network resources so that preferred access is granted to those who are willing to pay for it. The literature frequently cites the "problem of the commons" faced by the Internet community, where public property, bandwidth in this case, is overused because each user overlooks the cost of her own actions imposed on others [11], [14]. There is no mechanism in place, for example, to discourage users from downloading large, often whimsical, files from a site half way around the world during peak hours. While there is certainly some benefit to the user in "surfing" the Web, there is also a high cost in surfing the Web irresponsibly.

We introduce a new paradigm for Internet access, called pseudo-serving, in which speedy access is granted to users who "contribute to the bandwidth they consume." It is therefore a mechanism that brings the cost of transferring data directly to the user. The more congested a site becomes, the more valuable preferential access to that site also becomes. Under such circumstances, the user may be willing to contribute some of her own resources in exchange for such preferred access. This contribution, in turn, makes it possible to grant speedy access to others. We describe how this can be done in this paper.

The paper is organized as follows. First, we describe various types of congestion and identify those types that can be eliminated or reduced by pseudo-serving. We then describe the principle that allows users to contribute useful bandwidth to the Internet. This is followed by a description of pseudo-serving itself, an application of it to dissipating "flash-crowds," a comparison of it to related work, a summary of future work, and finally a conclusion.

2. Types of Congestion

Broadly speaking, the bottlenecks experienced by data traversing through the links of the Internet can be categorized according to their location. We identify three types of bottlenecks: those that occur near the server, those that occur near the client, and those that occur at the intermediate links.

Server-side bottlenecks occur when a large number of clients are connected to the server simultaneously. For many applications, such as ftp, there is a limit to the number of connections that can be handled by the server simultaneously, called the server's concurrency. Once this limit has been reached, no new connections can be established without additional ones being terminated. Because the price/performance of computer systems continues to drop, concurrency is becoming less of an issue, and server-side bottlenecks are moving toward the server's link to the Internet, where it is much more expensive to add capacity [Footnote A]. When the server's link becomes the bottleneck before server concurrency is reached, the number of simultaneous transfers grows until the service rate of any individual transfer is lower than that which users can tolerate.

     Figure 1a. A traditional                     Figure 1b. A pseudo-server
     client/server system.                        system.

Figure 1a illustrates this phenomenon. The bandwidth of the server's link to the Internet is divided among many clients. As a result, the rate at which data is sent to most clients is much less than the rate that their links to the Internet can handle. This is a growing problem for clients who access globally popular sites, where the competition for bandwidth is growing rapidly.

A similar problem occurs at the client side. Typically, a large number of clients share the same link to the Internet through an Internet Service Provider (ISP). The bandwidth of this link is divided among all of the clients when they are using it to transfer files simultaneously. However, client-side bottlenecks differ from server-side ones because users can avoid them. One can choose different levels of user sharing by subscribing to different services. A person who only uses the Internet for e-mail, for example, may not mind sharing a T1 connection with many hundreds of other users. "Net surfers" who demand faster connections can subscribe to a service where the number of users per link is smaller. In fact, users can have their own dedicated links to the Internet if they are willing to pay for them.

Finally, bottlenecks can occur at the intermediate links of the Internet. This happens during peak hours of network usage during which many connections between servers and clients exist. Because packets belonging to many connections are handled by the same intermediate links, these links are often points of congestion. Figure 1a illustrates this point. Clients that are farther away from the server cross more links and therefore have a higher chance of crossing a congested link. As a result, such clients receive data at lower rates. As is the case with server-side bottlenecks, the user is powerless in avoiding bottlenecks arising from congested intermediate links. If the user insists on retrieving the file as soon as possible, she must be patient enough to download the file in the presence of congestion.

3. Trade-off Between Storage Space and Bandwidth

Although there is little that a user can do to sidestep bottlenecks occuring near the server and at the intermediate links of the Internet, she can often obtain faster access by retrieving files from replicates of the server closer to her own host. Many globally popular servers circumvent the problem of congestion occurring near the server and at the intermediate links through replication. Replicating, or "mirroring," the server on different networks on the Internet multiplies the server's concurrency. It also spreads the load on the server's link to the Internet onto many such links. Finally, having more than one site opens up the possibility of being able to retrieve files from a closer site. By retrieving files from the closest site, the client minimizes the probability of transfers crossing congested links.

One can think of replication as a spectrum in which storage space can be traded for the bandwidth of links. At one extreme, replication is not used so that a server has no mirror sites; clients must retrieve files directly from the server. Each retrieval uses bandwidth on the links connecting the client to the server. At the opposite extreme, the server's files are replicated on every host. In this case, no link bandwidth is used when a client wishes to retrieve a file from the server; it simply retrieves the file from its local disk. At some point, of course, data must be transferred to the mirror sites before they can be used. Fortunately, this can often be done during non-peak hours, during which bandwidth is essentially free [Footnote B].

Even so, mirroring data has other problems. It is still a waste of bandwidth if data is mirrored on many hosts but only a few of them are accessed by clients. This can become a significant problem if updates must occur often because the server's data changes frequently. A more important issue is the cost of mirror sites. Replicating servers also means replicating the cost of servers and, if the mirror sites are located in different network, the cost of those networks. Even for the most affluent of organizations, it may not be cost-effective to continue adding mirror sites to meet increasing demand. Finally, mirroring data is currently a server-initiated process. If a congested server's data is not mirrored, there is little that a user can do to improve her access to the server's data.

Pseudo-serving introduces a mechanism through which individual users can bypass congestion occurring near the server and at intermediate links of the Internet. This is made possible by the storage resources users can contribute to the Internet during periods of congestion. We expand the concept of mirror sites to include a "quality-of-service" dimension. In this framework, mirror sites provide the highest quality of service by mirroring all data on the parent server for an indefinite period of time and allowing an unlimited number of hosts to retrieve data from it. Because of the expense in disk space and network resources required, however, few hosts are willing to become mirror sites.

Pseudo-serving exploits the possibility that, while not all hosts are willing to provide nearly as much resources as mirror sites, most may be willing to contribute some resource in exchange for increased accessibility to information on the Internet. In fact, we will show that a user can often obtain speedy access if she is willing to upload (or convince a proxy to upload) a single file for every file she downloads. We believe this is a reasonable price to pay for speedy access to information on sites that were previously too overloaded to serve additional users. This is especially true when one considers that, because the upload and download portions of links are often bandwidth independent, uploading a file can be done at the same time that a file is being downloaded without significant effect on the download speed for most users [ Footnote C].

4. The Pseudo-server System

A pseudo-server system consists of two components: a super server and a set of pseudo-servers. The former grants the latter access to files in exchange for some amount of network and storage resources through a contract. This amount is zero under "low-demand" conditions, when the super server functions as a concurrent server and pseudo-servers function as clients. Under "high-demand" conditions, when the super server's concurrency limit has been reached, it may be possible to grant immediate access to the pseudo-server in exchange for temporary usage of its network and storage resources. Under these circumstances and subject to the condition that the contract is met, the super server gives the pseudo-server a referral to where the requested data may be obtained.

Figure 1b illustrates this process. Pseudo-server A wishes to retrieve a file from the super server, so it sends the request, and along with it, the resources it is willing to contribute (1). Upon determining that these resources are sufficient in light of current patterns of request, the super server gives a referral to A, directing it to retrieve the file from B (2). After acting on this referral (3), A finally obtains the file from B (4). It can then serve the file to other pseudo-servers in the same manner as B. A crucial point to note is that the overhead associated with the transaction is negligible when the request is for large files.

The following sections describe the components of a pseudo-server system in greater detail. For clarity, these descriptions are based on a specific implementation of such a system. For convenience, we sometimes refer to a pseudo-server before it has retrieved its requested file as a client. We distinguish such a client from the client of a traditional client/server system by referring to the latter as a traditional client, where necessary.

4.1 The Pseudo-server

The pseudo-server plays essentially the same role in accessing data as that of the traditional client. However, there are a few key differences in the messages the pseudo-server sends to the server and the responses expected. These differences are noted below:

At this point, it is worthwhile to briefly clarify what meeting a contract entails. In exchange for being served or given a referral, a pseudo-server guarantees to be in "standby mode" for some period of time. Within this file holding time, it agrees to serve the file it has just retrieved up to some number of times as specified in the contract. If no clients arrive within this period, the pseudo-server is released from its obligation. It is also released from its obligation as soon as it has served the number of clients it agreed to serve, regardless of whether it has been in standby mode for the agreed duration.

4.2 The Super Server

The super server answers requests. Requests are handled differently depending on the size of the file requested. If the file is small, the client is served immediately by the super server. Otherwise, the super server checks to see if a pseudo-server for the requested file exists. If one exists and the contract conditions are met, the super server tells the client the location of the nearest pseudo-server containing the file, along with the resources actually required from the client. Information regarding these resources and the IP address of the client are then stored in the super server's main memory, indexed by the file name and the location of the client based on its IP address. This information is used for the benefit of future requests for the same file.

If no pseudo-server for the file exists, but the limit of pseudo-servers the super server is concurrently serving has not been exceeded and the contract has been met, the super server itself serves the file to the client. If this limit has been reached, but the limit on the number of free requests the super server can serve concurrently has not been reached, then the super server also serves the file to the client. This "freebie" route is followed whenever a contract has not been met. This is to ensure that all hosts, regardless of ability to contribute resources to the Internet, are given nearly the same chance of gaining access as a traditional concurrent server would to a client. Finally, when it is not possible to give a referral to a client and it is not possible to serve a client because the server's concurrency for both have been reached, the client is told to try again later.

4.3 Contract Policies

The key to a super server's power is its ability to set contracts. The super server can set contract conditions differently according to different conditions of demand.

In order to guarantee speedy access to users, two conditions need to be satisfied. First, a pseudo-server must exist that can satisfy a request as it arrives. Secondly, the links between a requestee and the pseudo-server from which it retrieves the file must be uncongested.

The first condition can be satisfied by "growing" pseudo-servers as necessary until the number of pseudo-servers is sufficiently large to handle the rate at which requests come. This can be achieved by stipulating in the contract that pseudo-servers must handle more than one referral. If each pseudo-server handles N referrals before it leaves the "pseudo-server pool," then the number of pseudo-servers in the pool grows exponentially, at a rate of N^n, where the unit of n is the time it takes for a pseudo-server to serve N referrals. This "expansion" policy continues until the size of the pseudo-server pool is equal to rate of request x download time. When this happens a requestee can be given a referral almost as soon as it arrives, and the contract is reduced so that each pseudo-server need only handle one referral to maintain the size of the pool. Under such circumstances, the super server can be thought to be in a "steady-state" mode. If the rate of requests should drop, fewer pseudo-servers are needed, and therefore a "contraction" policy is imposed; clients who make requests during periods of reduced demand are simply given referrals without any need for their resources. If there are as many periods of contraction as there are periods of expansion, a pseudo-server handles on average only one referral.

In order to satisfy the second condition, it is necessary that clients can retrieve files from sites without traversing congested links. This requires knowledge of global network traffic and therefore can not be done easily without using a large amount of overhead. A more reasonable approach would be to relax the condition; rather than looking for a solution that guarantees data can be retrieved from sites that do not traverse congested links, we look for a solution that guarantees data can be retrieved from sites that are, on average, less congested than if the data were directly retrieved from the server. This can be done in the framework of pseudo-serving by setting contracts based on the pattern of requests coming from groups of closely-linked networks, or network clusters, and making referrals only to a pseudo-server located in the same network cluster as the client.

5. Flash-crowd Dissipation

We are developing a simulator to test the operation of a super server under different conditions of demand. It is sufficiently complete that important aspects of a pseudo-server system can be shown by it. We ran the simulator under "flash-crowd" conditions, and compared the performance of a pseudo-server system with that of a traditional client/server one. The results of this simulation are discussed in this section.

5.1 Flash-crowds

"Flash-crowd" conditions arise whenever a large number of requests are made over a short period of time for a small set of files contained on a server. This happens, for example, when the location of a server containing information of global interest is broadcast on national television. The unfortunate result is a sudden overload of the server's network and nearby routers and intolerably long service times. Users often exacerbate the problem by attempting to make connections again when connections are not established the first time.

5.2 Simulator Overview

The simulator is a C++ program that takes input describing the conditions of the simulation and produces output logs of important events organized into separate files. The input includes the parameters describing a super server, parameters describing the super server's link to the Internet, and parameters describing individual pseudo-servers, including the time at which they make their initial requests. The simulator assumes the Internet itself has infinite bandwidth and hence does not model delays due to congestion at intermediate links.

5.3 Link Model

It is worthwhile to discuss the link model in more detail because it plays a vital role in demonstrating the limitations of both the legacy and proposed paradigms for accessing data on the Internet. As mentioned previously, once the bottleneck due to server concurrency is removed, the server's link to the Internet becomes the new bottleneck; data can be transferred to clients only as fast as the link is able to move them. Similarly, a super server can only process messages as fast as they arrive from the clients through the link. Pseudo-serving, as it is currently proposed, reaches its limit for effectively handling arbitrary rates of requests when the rate at which requests are made exceeds the rate at which they can be processed by the server's link to the Internet.

We model a link as being composed of two independent portions, an uplink and a downlink. The downlink receives messages from clients and transfers them to the super server. Similarly, the uplink receives messages from the super server and transfers them to clients. Each link has a finite bandwidth that affects the rate at which messages are transferred. A link with bandwidth BW bits per second is able to completely transfer a message of M bits in M/BW seconds. This assumes, of course, that the link is not transferring any other messages, and that, for the duration of the transfer, no new messages arrive and no pending messages depart.

The link model accounts for the effects of message arrivals and departures by keeping track of message completion times. Using the link bandwidth, the number of messages currently served by the link, and the size of a newly arriving message, two operations are performed when a new message arrives. First, a message completion time is computed for the new message. Second, the message completion times of all pending messages are updated to reflect the additional bandwidth taken by the new message. This is also done whenever a message departs from the link; message completion times are updated to reflect the extra bandwidth "released" by message departures.

Although this model of a link does not reflect reality, it is a simple model that captures the average load on a link. In a real network, data are transferred as packets from buffer to buffer. Under conditions of heavy traffic, some packets are transferred at the expense of others, in which case, timeouts occur. The simulator, on the other hand, treats all requests with equal priority: a request handed to the link will be put in the link. A nice consequence of this is that the load on a link can be accounted for by simply noting the number of simultaneous messages in the link as a function of time.

5.4 Simulation Parameters

In the simulation, users connected to the Internet through 28.8Kbps links make requests for the same 100,000-byte file stored on a server which is connected to the Internet through a 1.544Mbps link. Requests are made at a rate of 10 every second for 1000 seconds, with each client request and server response taking 500 bytes. A user who cannot establish a connection the first time continues to make requests until she successfully establishes one. The time interval between attempts in seconds is is a random integer uniformly distributed between 1 and 100. This would be consistent with a scenario, for example, in media-stimulated events (e.g. pictures of Comet Shoemaker-Levy colliding with Jupiter) where a large number of users tries to access a site over a short period of time, and, to the misfortune of network administrators, each user's Web browser has an automatic "redial" capability.

Although the simulator was developed to demonstrate the general operation of pseudo-serving, we tried to make the simulation as realistic as the capabilities of the simulator allowed. Toward this end, we used statistics reported in [1] and [9] as the basis for setting parameters in the simulation. These statistics are:

User connection speed [9]
15% of the users surveyed report that they are connected to the Internet at speeds greater than 128 kbps, 73% at less than or equal to 128kbps, and 12% were unsure. We interpret these percentages conservatively as meaning 15% of Internet users are connected on-line permanently while the rest is connected only temporarily. Users permanently on-line were assumed to be able to hold their files much longer than users who are only temporarily on-line. For the simulation, we made the following assumptions. "Permanent users" can hold a file for 12 hours within which they are willing to serve 10 times. "Temporary users" can hold a file for 15 minutes within which they are willing to serve only 2 times.
Domain-based request patterns [1]
A typical server is accessed by thousands of domains, 10% of which account for 75% of the requests. We assumed a server is accessed by 5000 networks, 500 of which account for 75% of the requests (hot domains) and 4500 of which account for the remaining 25% of the requests (cold domains).

Unfortunately, preliminary experiments showed that the simulator, which is currently optimized for flexibility at the expense of speed and memory usage, cannot adequately handle parameters with these values. In order to reduce computation time, we assumed network clustering is in use, so that requests are accounted for in a per cluster basis where each cluster is a group of 10 networks. The effect of this assumption is to reduce the 5000 networks into 500 network clusters. The percentages of requests for hot and cold domains were applied to these network clusters unchanged, so that there were 50 "hot network clusters" and 450 "cold network clusters."

We also assume bottlenecks are occurring at the server side. That means bandwidth is limited by the server's concurrency as well as by the server's link to the internet. When the simulator was run in "traditional mode," the server's concurrency was set at 20. When it was run in "pseudo-serving mode," the 20 concurrencies were split evenly between serving only pseudo-servers that meet contracts and serving all clients, including those who do not meet contracts. It should be noted again that the simulator does not model the effects of congestion occurring at intermediate links connecting server to clients and pseudo-servers to each other.

5.5 Simulation Results and Discussions

While the simulator produced many interesting results, we are able to only briefly discuss some of the most important ones here. Further results and discussions will be put in a technical report.

Figure 2 shows the plot of the average download times for individual clients as a function of the time at which they make their first requests. A download time corresponds to the total time that a client waits before the file has been completely transferred to the client's host. It includes the time to successfully establish a connection in addition to the time to download the requested file. According to the figure, for example, a client requesting a file at time 300 seconds takes an average of about 1500 seconds before it has completely received the file.

           Figure 2. Download time with       Figure 3. Download time with
           concurrent server.                 pseudo-serving.

The download time for the concurrent server in the simulation was dominated by the connection time. To see why this is the case, at a link bandwidth of 1.544Mbps and a concurrency of 20, it takes about ten seconds for a client to download a file once it has established a connection with the server. Clearly, Figure 2 shows that the average download time over the entire simulation is about two orders of magnitude greater than this duration. Most of the time is therefore spent in trying to establish a connection.

At a rate of 10 requests per second and a concurrency of 20, the server quickly reaches its concurrency in a matter of a couple of seconds, after which the download time is dominated by competition for slots that open whenever a client has finished retrieving a file from the server. The more clients there are trying to make a connection to the server, the longer that each individual client has to wait before it can establish a successful connection. Clients who attempt to make a connection early in the simulation have a "head start" on everyone else, and hence take less time to download files than other clients. The average download time therefore increases as a function of time. At about 300 seconds after the start of the simulation, however, the average download time seems to stabilize and in fact actually drops. This is a manifestation of the fact that once the entire crowd has shown up, the average time for downloads actually decreases because the crowd is being dissipated by the server. This trend is somewhat difficult to see because it is masked by the relatively large client retry time in the simulation.

Figure 3 shows how a pseudo-server system handles the same flash-crowd conditions [Footnote D]. It has three aspects strikingly different from the previous figure. First, download times are more than an order of magnitude smaller. Secondly, the download time seems to go down exponentially with time. Thirdly, after about 200 seconds, the download time seems to reach a constant of around 25 seconds, about 30 times smaller than the smallest download time shown in Figure 2.

These striking aspects are immediately made clear by Figure 4. It shows that the number of pseudo-servers grows exponentially, peaks at around time 200 seconds, and finally settles to a constant value at around time 300 seconds. These phases correspond to the "expansion," "contraction," and "steady-state" policies, respectively, described in Section 4.3. Clearly, the number of pseudo-servers grows and shrinks as necessary to meet changing rates of demand. The appearance of a contraction stage in addition to an expansion stage and a steady-state one is a result of retransmissions made by clients. As in the case of the concurrent server, the super server's concurrency limit is quickly reached. When this happens, the request rate is augmented by retransmissions, and the super server does not stop its expansion policy until the number of pseudo-servers is sufficiently large to accommodate this new rate of request. Eventually the rate of retransmissions decreases because there are fewer clients making retransmissions, at which point, the super server changes its policy to a contracting one. Finally, when there are no more retransmissions, the super server settles on a steady-state policy, in which the number of pseudo-servers are sufficient to handle only the newly arriving clients.

Figure 4. Number of pseudo-servers vs. time.

Figure 5. File holding time for Figure 6. File holding time for temporary hot domain client. temporary cold domain client.

Figures 5 and 6 demonstrate the feasibility of pseudo-serving when transfers are restricted to within network clusters as described in Section 5.4. Restricting transfers in this way imposes greater burden on the pseudo-server in terms of requiring it to hold the file it just retrieved for a longer period of time. At the same time, because transfers are made only within networks close to each other, the probability of encountering congestion during transfers is reduced.

Figure 5 shows the average file holding time for the temporary hot domain case as a function of the time at which clients make their first request. As expected, the file holding time is small during the expansion stage near the beginning of the simulation. Clients that make their requests starting at around 250 seconds hold their files for about 4.5 seconds. By doing so, clients are guaranteed referrals are made to pseudo-servers within the same network cluster. Also as expected, it is more difficult for clients to satisfy contracts in cold domains, as can be seen in Figure 6. Clients that make requests starting at around 400 seconds need to hold their files for an increasingly longer period of time, reaching a peak of around 30 seconds in this simulation. But clearly, even this duration is reasonably satisfiable by clients only temporarily connected to the Internet.

6. Comparison with Similar Work

The novelty of pseudo-serving stems from its use of the user's resources, the server's knowledge, and a contract with the user to provide preferred access in exchange for granting access to other users. Of these, the exploitation of the client's resources and use of the server's knowledge are not new ideas.

In 1992, Blaze [5] describes a dynamic hierarchical distributed-file system that uses the client's cache to reduce load on the server. Because the Internet is significantly different from a distributed-file system, many of the assumptions applicable to file systems in this work does not apply to the Internet. It can not be assumed, for instance, that most clients can be relied upon to be connected to the network when requests arrive for their cached files.

Gwertzman's work with push-caching [8] is similar to our own work in that the server's knowledge of request patterns is used to "push" data onto caches located in geographically-strategic places. Bestavros has done similar work with his document dissemination system [4]. The problem with these approaches is that they assume cooperation from possible sites for mirroring data. Gwertzman says server-directed caching is therefore especially appropriate for centrally-administered networks such as the Microsoft Network and the AT&T network. But the Internet is not centrally administered and it is clearly the network that affects the most users. Moreover, because these works are caching schemes, cache space becomes the bottleneck. There is no guarantee that a cached file will not have been overwritten by some other file by the time a new request is made for it.

Pseudo-serving, on the other hand, is a reservation scheme. Data is guaranteed to be held by the pseudo-server for some period of time as dictated by the contract. Moreover, because the super server's knowledge of request patterns comes to bear in making the contract, the resources of the client is used more efficiently. This is in contrast to traditional caching schemes, where only the knowledge of request patterns of the clients served by the cache can be used. Knowledge concerning the global request pattern for files on a server is not exploited.

The work of Baentsch et al. [2] comes closest to our own. They have implemented an automated replication system called Caching goes Replication (CgR), in which, as the name suggests, caches are essentially turned into mirror sites for subsets of the files on different servers, as is done in pseudo-serving. CgR seems to be a fully scalable system with its own URL-like naming service. Pseudo-serving differs from CgR in several important respects. First, it is a light weight protocol in the sense that overhead is reduced at the expense of less functionality. It does not have, for example, a naming service and therefore requires that all requests go to the server. At the same time, because it has no naming service, problems associated with introducing a new naming service need not be addressed. If in fact a request to the server becomes the new bottleneck, then a naming service may need to be introduced. Second, CgR does not adequately address the issue of when caches should be turned into mirror sites and for how long. Gwertzman answers this in part by relegating such decisions to a central administration. Again, we argue that this would be difficult to do for the Internet.

Finally, pseudo-serving recognizes that proxy servers are themselves valuable resources and are therefore potential sites for bottlenecks. Mirroring data on proxy servers benefits other users, but this is done at the expense of caching data, which benefits the users originally intended to be served by the cache. Pseudo-serving addresses this issue by creating an environment in which the caches of the users themselves can be put to good use when it is possible to do so. Because there are far more clients than proxy servers, applying replication at the level of the client can significantly augment the effectiveness of replication when it is applied only at the level of the proxy server. This augmentation can be especially significant when the proxy cache becomes a bottleneck; Smith [15], for example, reports that it was at times actually faster for clients to retrieve files directly from the server rather than from a busy proxy cache.

7. Future Work

Because we are only beginning to explore pseudo-serving, there is much more work to be done. There are a number of issues that still need to be addressed before pseudo-serving becomes practical. We mention a few of them here.

While the effectiveness of pseudo-serving in reducing server-side bottlenecks have been demonstrated, more work needs to be done in order to determine the extent to which pseudo-serving is effective in reducing bottlenecks at the intermediate links. This requires using a more complex simulator that models the topology of the Internet in greater detail. It also requires more detailed knowledge of the willingness of users to hold their connections for a longer period of time- how much are users willing to contribute in order to gain faster access?

Pseudo-serving is primarily useful for request patterns in which the same document or set of documents are requested by users. This occurs in many situations, including retrievals of daily weather maps [a], Doctor Fun comic strips [b], pictures and videos of unusual cosmic occurrences [c], and the latest Linux distributions [d]. Even under "normal" circumstances, a 90/10 rule seems to hold in the patterns of requests, where 90% of the bytes transferred are due to 10% of the documents on a server [1], [3]. Pseudo-serving, however, is not suitable for "hot sites" with many files, where the rate of request of any individual file is small.

We are exploring ways to make pseudo-serving more general by putting files together into packages. While the rate of request for individual files may be small, the amalgamated rate of request of a number of such files may be high. By grouping files so that users can retrieve only packages, the rate at which these packages are retrieved can be made to be sufficiently high so that it becomes practical for users to satisfy contracts. Clearly, the disadvantage of forcing users to retrieve packages is that a user who wants only part of a package must retrieve the entire package; this makes the retrieval time longer and, in doing so, consumes more bandwidth. But if by doing so, it is easier to satisfy contracts, then more users are able to participate as pseudo-servers. More sites are therefore created for retrieval of files, and network locality can therefore be exploited to a higher degree to reduce the usage of bandwidth on congested intermediate links.

Finally, pseudo-serving only works if contracts are not widely broken. This can be ensured by transferring files in encrypted form to pseudo-servers and only sending the key after the pseudo-server has fulfilled its contract. This means, however, that pseudo-serving is only more attractive than the traditional client/server paradigm if a pseudo-server's file holding time is small. Figure 6 suggests this is in fact the case, where the file holding time is on the order of 5 seconds under high-demand conditions. Further work needs to be done to determine whether such an extreme measure for reducing contract violations is necessary. We suspect, however, that most users would not violate contracts simply because fulfilling them is so easy : the user simply lets another user or two use that portion of her link to the Internet that she does not use often anyway (the uplink).

8. Conclusion

The power of pseudo-serving lies in its introduction of a mechanism through which a user can barter for preferred access those resources she does not value highly but are precisely the ones in great demand by other users. Use of this mechanism increases the welfare of the Internet community on two fronts: resources are allocated with greater efficiency because the cost of resources during periods of scarcity is made clear to the user through a contract, and resources are added as necessary when demand for them grows, a benefit not present in pricing schemes. Preliminary simulations show that pseudo-serving is feasible under flash-crowd conditions where the download time is dominated by server-side congestion. Specifically, a user can reduce the total download time of a 100,000-byte file from a busy server by thirty-fold if she is willing to hold the file she just retrieved for less than a minute, within which, she serves on average one other user.


[A] A T1 link operating at 1.544 Mbps costs between $1000 to $3000 a month to lease [10], or about the the one-time cost of a server. Using such a link, a modest-sized image file of 20,000 bytes can be transferred in about one-tenth of a second. If the rate at which requests come is greater than ten every second, then the link becomes a point of congestion. Busy servers are now reporting rates of request in the tens of accesses per second. Infoseek [e], a popular search engine connected to the Internet through a 45 Mbps T3 line, for example, reports five million requests a day, or over fifty requests a second [6].

[B] On average, the bandwidth of the NSFNET backbone is used only 5% of the time, most of which is consumed during peak hours [12]. Although this September 1994 figure is dated, it still shows a huge gulf between average and peak usages, and is therefore a powerful justification for replication, which shifts transfers that take place during peak hours to hours when traffic is low.

[C] In fact, there is a modem protocol called HS/Link [f] that does just that; uploads occur at the same time as downloads. The original intent of HS/Link was to save on-line fees in the days when metered rates were the norm.

[D] In order to accentuate the expansion, steady-state, and contraction stages of pseudo-serving, clients were not separated into hot and cold domains, as it is done in the simulations that produced Figures 5 and 6.


[1] M. F. Arlitt, C. L. Williamson, "Web server workload characterization: the search for invariants." 1996 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Philadelphia, PA, USA, Vol. 24, No. 1:126-37, 1996.

[2] Michael Baentsch, Georg Molter, Pterm Sturm, "Introducing application-level replication and naming into today's Web." Computer Networks and ISDN Systems, Vol. 28, pp. 921-930, 1996.

[3] Azer Bestavros, "Demand-based Document Dissemination for the World-Wide Web," Technical Report TR-95-003, Boston U., CS Dept., Boston, MA 02215, February, 1995.

[4] Azer Bestavros, Robert L. Carter, Mark E. Crovella. "Application-Level Document Caching in the Internet." Technical Report BU-CS-95-002, Boston U., CS Dept., Boston, MA 02215, March, 1995.

[5] Matt Blaze, Rafael Alonso, "Toward Massive Distributed File Systems," Proc. 3rd Workshop on Workstation Operating Systems, April 1992.

[6] Adrian Cockroft, "Watching your Web server," SunWorld Online, Vol. 10, No. 12, Dec. 1996.

[7] David Dennis, The Internet Access FAQ. .

[8] James Gwertzman, Margo Seltzer, "The Case for Geographical Pushcaching," Proceedings of the 1995 Workshop on Hot Operating Systems.

[9] GVU's 5th WWW User Survey.

[10] Internet Provider Map.

[11] Jeffery K. Mackie-Mason, Hal R. Varian. "Pricing the Internet." In Brian Kahin and James Keller, editors, Public Access to the Internet. Prentice-Hall, Englewood Cliffs, New Jersey, 1995.

[12] Jeffrey K. Mackie-Mason, Hal R. Varian. "Some FAQS about Usage-Based Pricing," September 1994.

[13] Donald Neal, "The Harvest Object Cache in New Zealand," Computer Networks and ISDN Systems, Vol. 28, pp. 1415-1430, 1996.

[14] S. Shenker, D. Clark, D. Estrin, S. Herzog, "Pricing in Computer Networks: Reshaping the Research Agenda," Telecomunications Policy, Vol. 20, No. 3, pp. 183-201, 1996.

[15] Neil G. Smith, "The UK national Web cache -- The state of the art," Computer Networks and ISDN Systems, Vol. 28, pp. 1407-1414, 1996.


[a] weather maps

[b] Doctor Fun comic strips

[c] pictures and videos of unusual cosmic occurences

[d] latest linux distributions

[e] Infoseek

[f] HS/Link

Return to Top of Page
Return to Technical Papers Index