WWW94: a scalable HTTP server

presented by eric dean katz, michelle butler, robert mcgrath, NCSA, urbana-champaign
the WWW server at the National Center for Supercomputing Applications (NCSA) suffered from a dramatic increase in HTTP requests: during the last 40 weeks, the number of requests per week rose from 91'000 to 1'500'000. that is growth by a factor of more than 16, or roughly 7% per week compounded!
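the weekly growth rate follows from the two endpoints: if the request count grows by the same factor every week, that factor is the 40th root of 1'500'000 / 91'000. a small python check of this arithmetic (the constant and function names are only illustrative):

```python
# Figures from the report: weekly request counts at the start and end of the period.
START_REQUESTS = 91_000
END_REQUESTS = 1_500_000
WEEKS = 40


def weekly_growth_rate(start: float, end: float, weeks: int) -> float:
    """Average compound growth rate per week, i.e. r in start * (1 + r) ** weeks == end."""
    return (end / start) ** (1.0 / weeks) - 1.0


if __name__ == "__main__":
    factor = END_REQUESTS / START_REQUESTS
    rate = weekly_growth_rate(START_REQUESTS, END_REQUESTS, WEEKS)
    print(f"total growth factor: {factor:.1f}")    # 16.5
    print(f"average weekly growth: {rate:.1%}")    # 7.3%
    # For comparison: 11% per week sustained over 40 weeks would be a factor of about 65.
    print(f"11% per week for {WEEKS} weeks: factor {1.11 ** WEEKS:.0f}")
```

compounding matters here: a constant weekly percentage over 40 weeks multiplies quickly, so small differences in the weekly rate give very different totals.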

in the beginning, they tried to solve the problem by replacing the hardware with a more powerful system. but at a certain point, the people at NCSA realized that they had to take a different approach to keep up with the growing number of requests.

so they came up with the idea of sharing the load among multiple systems. one way to do this would be to split the document tree and distribute the documents over multiple systems. but this would break many links, because the network addresses (URLs) of all moved documents would change. it would also involve a great deal of work, because every document that refers to a moved document would have to be modified.

therefore, the people at NCSA decided to take a different approach: they set up what they call a "randomly distributed domain name resolution". the idea is simple: each time a client asks for a document, it first has to resolve the server's name to an IP address. they modified a BIND name server so that it returns a different IP address for each resolution request, chosen by a round robin algorithm. this makes it very easy to distribute the load over multiple hosts. the modifications to the BIND server were needed so that the round robin applies to the HTTP server only, while name resolution for other protocols is left untouched.
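the following python sketch is a toy model of the idea, not NCSA's actual BIND modification: a name table in which one name (the hypothetical www.example.org) maps to several addresses that are handed out in round robin order, while other names always get the same answer, as from an unmodified server. the addresses come from the 192.0.2.0/24 documentation range, and the class and method names are made up for illustration. add_host and remove_host show that machines can join or leave the pool by changing only the name table.

```python
class RoundRobinResolver:
    """Toy name table: names listed in `rotated_names` are answered in round robin
    order; every other name always gets its first address, like an unmodified server."""

    def __init__(self, rotated_names):
        self.rotated_names = set(rotated_names)
        self.addresses = {}   # name -> list of IP addresses (as strings)
        self.cursor = {}      # name -> index of the next address to hand out

    def add_host(self, name, address):
        """Add a machine to the pool behind `name`; it is used from the next lookup on."""
        self.addresses.setdefault(name, []).append(address)
        self.cursor.setdefault(name, 0)

    def remove_host(self, name, address):
        """Take a machine out of the pool; later lookups no longer return it."""
        pool = self.addresses[name]
        pool.remove(address)
        if not pool:
            del self.addresses[name]
            del self.cursor[name]

    def resolve(self, name):
        pool = self.addresses[name]  # unknown names raise KeyError to keep the sketch short
        if name not in self.rotated_names:
            return pool[0]
        i = self.cursor[name] % len(pool)  # modulo keeps the index valid after removals
        self.cursor[name] = (i + 1) % len(pool)
        return pool[i]


if __name__ == "__main__":
    resolver = RoundRobinResolver(rotated_names={"www.example.org"})
    for ip in ("192.0.2.1", "192.0.2.2", "192.0.2.3"):
        resolver.add_host("www.example.org", ip)
    resolver.add_host("ftp.example.org", "192.0.2.10")

    print([resolver.resolve("www.example.org") for _ in range(4)])
    # ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.1']
    print(resolver.resolve("ftp.example.org"))  # 192.0.2.10, not rotated
```

rotating per name rather than per protocol is a simplification: a name server only sees names, so restricting the round robin to the WWW server amounts to rotating only the name that clients use for it.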

this makes it possible to add systems to and remove systems from the "virtual WWW server" without interrupting the service, which makes it a truly scalable WWW server. to ensure that clients see the same data regardless of which system answers a request, the documents reside on an Andrew File System (AFS) shared by all systems.

the people at NCSA monitored the performance of their virtual WWW server and, to their surprise, noticed that the requests were not equally distributed over all systems: over time, some systems answered requests more often than others, and at times one server answered more than 50% of the incoming requests. one possible reason is that resolved addresses get cached (by clients or by intermediate name servers): subsequent requests then go to the same server because the cached address is used instead of querying the BIND server again. this behavior is the subject of further investigation at NCSA.
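the caching explanation can be illustrated with a small deterministic simulation in python. it assumes (an assumption for illustration, not NCSA's measurement setup) that each caching name server queries the round-robin server once and then reuses the answer for all of its clients; the request counts and addresses are invented. a single large site behind one cache is enough to push one server above 50% of the load, while uncached lookups spread the requests evenly.

```python
from collections import Counter


def hits_with_shared_caches(server_ips, requests_per_cache):
    """Each caching name server asks the round-robin server once, then sends all
    of its clients' requests to the address it received (no re-query meanwhile)."""
    hits = Counter()
    for query_no, requests in enumerate(requests_per_cache):
        ip = server_ips[query_no % len(server_ips)]  # round-robin answer to this query
        hits[ip] += requests
    return hits


def hits_without_caching(server_ips, total_requests):
    """Every single request triggers a fresh lookup, so the rotation is exact."""
    hits = Counter()
    for query_no in range(total_requests):
        hits[server_ips[query_no % len(server_ips)]] += 1
    return hits


if __name__ == "__main__":
    servers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]
    # Made-up load: one large site behind a single caching name server, five small ones.
    load = [600, 100, 100, 50, 50, 100]
    cached = hits_with_shared_caches(servers, load)
    total = sum(load)
    for ip in servers:
        print(ip, f"{cached[ip] / total:.0%}")   # 65%, 15%, 20%
    print(hits_without_caching(servers, total))  # 334 / 333 / 333
```

the round robin itself stays perfectly fair in both cases; the skew comes entirely from how many requests reuse each answer.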

i have no link information for this paper on the web.
1st_day_scal_server / 13-jun-94 (ra) / reto ambühler