WWW94: World Wide Web proxies


presented by ari luotonen, CERN and kevin altis, Intel
a WWW proxy server provides access to the web for people behind a firewall. in addition, the proxy server caches hypertext documents, so if many users request the same documents, repeated requests are answered from the cache and response time is significantly shorter.

explanation of the term FIREWALL

some sites don't want outsiders to access their internal network. therefore they set up a secure subnet which can be accessed only through a dedicated gateway. such a gateway is known as a "firewall machine", because it grants access to the secure subnet only to authorized users or systems.

client side issues

since proxying is a standard feature of the available WWW clients (built into libwww), there is no need for any special client software or extended clients. environment variables are used to configure proxying, with an individual variable for each protocol. e.g. proxying for HTTP is defined by setting the http_proxy variable.
once the variables are defined, the specified system is used as the proxy server, which means all network requests using the HTTP protocol are sent to the proxy server instead of directly to the remote server.
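
the per-protocol variables can be illustrated with a small sketch. this is not the libwww code, only an assumed reading of the convention described above: every variable named <protocol>_proxy selects the proxy for that protocol. the function name proxies_from_environment and the host proxy.example.com:8080 are hypothetical.

```python
import urllib.request


def proxies_from_environment(env):
    """Map each protocol to its proxy, based on variables named <protocol>_proxy."""
    suffix = "_proxy"
    proxies = {}
    for name, value in env.items():
        lowered = name.lower()
        # no_proxy is an exception list in many clients, not a protocol, so skip it.
        if lowered.endswith(suffix) and lowered != "no_proxy" and value:
            proxies[lowered[: -len(suffix)]] = value
    return proxies


# Hypothetical environment; proxy.example.com:8080 is a placeholder host.
example_env = {
    "http_proxy": "http://proxy.example.com:8080/",
    "ftp_proxy": "http://proxy.example.com:8080/",
    "HOME": "/home/user",
}
proxies = proxies_from_environment(example_env)

# Build a client that sends HTTP and FTP requests to the proxy.
# No network traffic happens until opener.open(...) is called.
opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
```

in a real session the dictionary passed in would be os.environ rather than a hand-written example.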

server side issues

the proxy server has to be able to act as both a client and a server. it acts as a server when accepting HTTP requests from a client, and as a client to the remote server when it actually retrieves the document.
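
the two roles can be sketched in a few lines. this is only a minimal illustration, not the server presented in the talk: it handles GET only, does no caching or access control, and the names ForwardingHandler and make_proxy are hypothetical. it relies on the fact that a proxied request carries the full URL in its request line.

```python
import http.server
import urllib.error
import urllib.request

# Opener with an empty proxy table, so the proxy itself contacts
# remote servers directly instead of going through another proxy.
_direct = urllib.request.build_opener(urllib.request.ProxyHandler({}))


class ForwardingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # Server role: accept the client's request; for a proxied
        # request the request line holds the full URL.
        url = self.path
        if not url.startswith("http://"):
            self.send_error(400, "proxy requests need a full http:// URL")
            return
        try:
            # Client role: retrieve the document from the remote server.
            with _direct.open(url, timeout=10) as upstream:
                status = upstream.status
                content_type = upstream.headers.get(
                    "Content-Type", "application/octet-stream"
                )
                body = upstream.read()
        except urllib.error.HTTPError as err:
            # The remote server answered with an error; pass the code on.
            self.send_error(err.code)
            return
        except OSError:
            # Connection failures and timeouts end up here.
            self.send_error(502, "could not reach remote server")
            return
        # Server role again: relay the document to the client.
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_proxy(port=0, host="127.0.0.1"):
    """Create the proxy server; port 0 lets the system pick a free port."""
    return http.server.HTTPServer((host, port), ForwardingHandler)
```

calling make_proxy(8080).serve_forever() would start the sketch on port 8080 (an arbitrary choice), after which a client with http_proxy pointing at that port would have its HTTP requests relayed.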

as an additional feature, caching has been introduced to the proxy server. the caching algorithm is quite simple: it stores a retrieved document in a local file. if a second client wants to access the same document, there is no need to re-transmit the document from the remote server. on the other hand, caching introduces a number of new problems, e.g. how does the proxy server know whether the document has changed on the remote server since the last transmission? this problem has been solved by adding a conditional GET request, which re-transmits a document only if it has been modified after the specified date and time.
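
the interplay of cache and conditional GET can be sketched as follows. several things here are assumptions made for illustration: the cache is an in-memory dictionary instead of local files, every request is revalidated with the remote server, and the names CachingProxy, Entry and fetch are hypothetical. fetch stands for "send a GET, with an If-Modified-Since header if a date is given" and returns the status, the Last-Modified date and the body; a status of 304 means the document has not been modified.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Entry:
    body: bytes
    last_modified: str  # date string as reported by the remote server


# fetch(url, if_modified_since) -> (status, last_modified, body)
Fetch = Callable[[str, Optional[str]], Tuple[int, str, bytes]]


class CachingProxy:
    def __init__(self, fetch: Fetch):
        self.fetch = fetch
        self.cache: Dict[str, Entry] = {}

    def get(self, url: str) -> bytes:
        entry = self.cache.get(url)
        # Conditional GET: pass the date of the cached copy, if there is one.
        since = entry.last_modified if entry else None
        status, last_modified, body = self.fetch(url, since)
        if status == 304 and entry is not None:
            # Not modified: nothing was re-transmitted, serve the cached copy.
            return entry.body
        if status == 200:
            # New or changed document: store it for later requests.
            self.cache[url] = Entry(body, last_modified)
        return body
```

with this policy the remote server is still contacted for every request, but the document itself crosses the network again only after it has changed.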


i think caching is a very important issue because many network administrators already get nightmares about the World Wide Web and the huge amount of data that gets transmitted over the net over and over again. caching is probably one of the most effective measures against network overload.
this paper is available on the web.
1st_day_proxies / 13-jun-94 (ra) / reto ambühler