WWW94: World Wide Web proxies
presented by Ari Luotonen, CERN, and Kevin Altis, Intel
a WWW proxy server provides access to the web for people behind a firewall.
in addition, the proxy server caches hypertext documents, so if many users
access the same documents, response time will be significantly reduced.
explanation of the term FIREWALL
some sites don't want outsiders to access their internal network.
therefore they set up a secure subnet which can be accessed only through a
dedicated gateway. such a gateway is known as a "firewall machine", because
it allows only authorized users or systems access to the secure subnet.
client side issues
since proxying is a standard feature of the available WWW clients (built into
libwww), there is no need for any special client software or extended
clients. environment variables are used to configure proxying, with a
separate variable for each protocol. e.g. proxying for HTTP is enabled by
setting the http_proxy variable.
once the variables are defined, the specified system will be used as a proxy
server, which means all network requests using the HTTP protocol will be
re-directed to the proxy server.
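
as an illustration of the environment-variable convention, here is a minimal sketch in python, whose standard library reads the same kind of per-protocol variables. the proxy host and port are made up for the example, and the ftp_proxy line only assumes that other protocols follow the same naming pattern as http_proxy.

```python
import os
import urllib.request

# hypothetical proxy host and port, for illustration only
os.environ["http_proxy"] = "http://proxy.example.com:8080/"
os.environ["ftp_proxy"] = "http://proxy.example.com:8080/"


def proxies_from_environment():
    """Return the protocol -> proxy URL mapping found by urllib."""
    return urllib.request.getproxies()


proxies = proxies_from_environment()
print("http requests go to:", proxies["http"])
print("ftp requests go to: ", proxies["ftp"])
```

no client code has to change: the same program talks directly to remote servers when the variables are unset and goes through the proxy when they are set.
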
server side issues
the proxy server has to be able to act as both a client and a server. it
acts as a server when accepting HTTP requests from a client, but it acts
like a client to the remote server when it actually retrieves a document.
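
the double role can be sketched in a few lines of python. this is not the CERN implementation, only a toy relay under stated assumptions: OriginHandler, RelayHandler, start and fetch_via_proxy are hypothetical names, both servers run on the local machine, and only successful GET requests are handled.

```python
import http.client
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class OriginHandler(BaseHTTPRequestHandler):
    """Stand-in for a remote web server outside the firewall."""

    def do_GET(self):
        body = b"hello from origin"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass


class RelayHandler(BaseHTTPRequestHandler):
    """Server towards the client, client towards the remote server."""

    relayed = []  # URLs this toy proxy has fetched, for inspection
    # an empty proxy map makes the proxy itself connect directly
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))

    def do_GET(self):
        # a proxied request carries the full URL in the request line
        self.relayed.append(self.path)
        with self.opener.open(self.path) as upstream:  # client role
            body = upstream.read()
        self.send_response(200)                        # server role
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass


def start(handler):
    """Run a server on a free local port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_via_proxy(proxy_port, url):
    """What a proxy-aware client does: ask the proxy for the full URL."""
    conn = http.client.HTTPConnection("127.0.0.1", proxy_port, timeout=5)
    conn.request("GET", url)
    body = conn.getresponse().read()
    conn.close()
    return body
```

a usage example: start both servers with start(OriginHandler) and start(RelayHandler), build a URL from the origin's server_address port, and pass it to fetch_via_proxy together with the proxy's port. the client only ever opens a connection to the proxy.
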
as an additional feature, caching has been introduced to the proxy server.
the caching algorithm is quite simple: it stores a retrieved document in
a local file. if a second client wants to access the same document, there
is no need to re-transmit the document from the remote server. on the other
hand, caching introduces a number of new problems, such as how the proxy
server knows whether the document has been changed on the remote server since
the last transmission. this problem has been solved by adding a conditional
GET request, which re-transmits a document only if it has been modified
after the specified date and time.
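
the conditional GET logic can be sketched as follows. this is a simulation under stated assumptions, not the proxy's actual code: the remote server is replaced by a function (origin_get), the cache is a dictionary instead of local files, and the names ORIGIN, CACHE and proxy_get as well as the dates are invented for the example. in HTTP the date travels in the If-Modified-Since request header, and the server answers with status 304 when the document is unchanged.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# hypothetical remote server state: path -> (last modification time, body)
ORIGIN = {
    "/index.html": (datetime(1994, 5, 25, 12, 0, tzinfo=timezone.utc), b"v1"),
}

# the proxy's cache: path -> (Last-Modified header value, body)
CACHE = {}


def origin_get(path, headers):
    """Stand-in for the remote server; honours If-Modified-Since."""
    modified, body = ORIGIN[path]
    since = headers.get("If-Modified-Since")
    if since is not None and modified <= parsedate_to_datetime(since):
        return 304, {}, b""  # not modified: no body is transmitted
    stamp = format_datetime(modified, usegmt=True)
    return 200, {"Last-Modified": stamp}, body


def proxy_get(path):
    """Return (body, transferred); transferred tells whether the
    remote server had to send the document body."""
    headers = {}
    if path in CACHE:
        # ask only for a copy newer than the one already cached
        headers["If-Modified-Since"] = CACHE[path][0]
    status, response_headers, body = origin_get(path, headers)
    if status == 304:
        return CACHE[path][1], False  # serve the cached copy
    CACHE[path] = (response_headers["Last-Modified"], body)
    return body, True
```

the first proxy_get("/index.html") transfers the document and fills the cache. a second call still contacts the remote server, but only to learn that nothing has changed, so the body comes from the cache. once the entry in ORIGIN gets a later modification time, the next call transfers the new version.
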
i think caching is a very important issue because many network administrators
already get nightmares about the World Wide Web and the huge amount of data
that gets transmitted over the net over and over again. caching is probably one
of the most effective methods against network overload.
this paper is available on the web.
13-jun-94 (ra)