Issues On Information Access Through the World-Wide-Web

Ching-Shan Peng, Jen-Yao Chung and Kwei-Jay Lin

Dept. of Electrical and Computer Engineering
University of California, Irvine 92697

IBM Thomas J. Watson Research Center
Yorktown Heights, NY 10598


The explosive popularity of Web browsers has created unprecedented demand for decentralized, communication-based business applications. As Web technology evolves, both the way people access information on the Net and the way organizations run their businesses through the Web are changing dramatically. This paper focuses on the issues key to the success of a Web server that must enable the enterprise to meet the requirements of today's network-centric computing environment: integration between the Web and legacy data, authoring and management tools, and scalable server technology.

1. Introduction

There is no doubt that the World Wide Web is the most popular and exciting medium for accessing the Internet and Intranets because of its widely adopted, platform-independent, and easy-to-use interface. The universal acceptance of Web technologies increases the demand for communication-based applications. In the past, all leading-edge commercial applications requiring high performance and reliability, such as online transaction processing (OLTP) systems, information processing systems, and decision support systems, ran on legacy systems. In fact, most of today's mission-critical real-time transactions still rely on legacy data, be it customer order records, financial information, or employee data. The challenge for today's enterprise Information Technology organizations is to integrate legacy systems with the Web interface. By doing so, they can efficiently broaden the scope of communication without the complexity and high cost associated with the traditional two- or three-tier client/server model. The deployment of Web enablement, or so-called "legacy linkage", is crucial for the enterprise to utilize the ubiquitous Web environment without re-engineering the databases and application code that reside in a reliable and scalable environment (such as a mainframe). To bridge the gap, database vendors, Web server vendors, and application developers have been providing various middleware solutions. These solutions share some common infrastructure but take different approaches. The first part of this paper discusses legacy system connectivity.

Once legacy data can be unleashed on the Web, a successful site comes down to manageability and serviceability. A Web server needs to work with excellent authoring and development tools in order to offer a high quality of service. These tools must be powerful enough to design and maintain all the static and dynamic information with ease, yet simple and straightforward enough to reduce development costs. Meanwhile, the Web community is exploding. Millions of people and families surf and shop on the Web in their everyday lives, and more and more large-scale commercial and e-commerce Web sites are going to be launched. The true test for the Internet is gathering on the horizon. Web servers must be able to serve a large number of concurrent users; that is, scalability will dominate the competition. We introduce some authoring tools as well as scalable Web server technologies in later sections.

Figure 1: System Overview

Figure 1 shows a high-level system model of a scaleable Web service with legacy linkage and authoring tool support. We discuss each of these pieces as follows. In Section 2, we analyze the Web middleware architecture and some currently available approaches. In Section 3, we introduce Web authoring tools and their components. A few scaleable Web service technologies are described in Section 4. The paper concludes in Section 5.

2. Web Middleware Architecture

Middleware is a gateway program between applications and databases. Web middleware connects the legacy Information System (IS) and the Web server by allowing them to exchange messages and data. Normally, the middleware architecture includes a manager component that allows applications and database-specific drivers to call upon a standard Application Programming Interface (API), which acts as a pipe for accessing these resources. Well-known industry standards include Open Database Connectivity (ODBC) from Microsoft, Distributed Computing Environment (DCE) from the Open Software Foundation (OSF), Distributed Relational Database Architecture (DRDA) from IBM, Enterprise Database Access (EDA) from Information Builders, and Java Database Connectivity (JDBC) from JavaSoft. Microsoft's ODBC interface, for example, defines a minimal common API for relational databases and is based on Object Linking and Embedding (OLE) technologies. In contrast, OSF's DCE is heavily based on Remote Procedure Call (RPC) for UNIX.
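The manager-and-driver arrangement described above can be illustrated with a minimal sketch. It is not the ODBC or JDBC API: the `DriverManager` class, the `scheme:name` source format, and the canned `fake_db2_driver` are all hypothetical names introduced only to show how one common call can be routed to database-specific drivers.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[str, ...]
# In this sketch a "driver" is any callable that takes a query and returns rows.
Driver = Callable[[str], List[Row]]


class DriverManager:
    """Hypothetical manager that routes one common API to specific drivers."""

    def __init__(self) -> None:
        self._drivers: Dict[str, Driver] = {}

    def register(self, scheme: str, driver: Driver) -> None:
        # Associate a scheme such as "db2" with the driver that handles it.
        self._drivers[scheme] = driver

    def query(self, source: str, sql: str) -> List[Row]:
        # "source" is assumed to look like "scheme:name"; the scheme picks the driver.
        scheme, _, _name = source.partition(":")
        if scheme not in self._drivers:
            raise LookupError(f"no driver registered for {scheme!r}")
        return self._drivers[scheme](sql)


def fake_db2_driver(sql: str) -> List[Row]:
    # Stand-in for a real driver: ignores the query and returns canned rows.
    return [("1001", "ACME Corp")]


manager = DriverManager()
manager.register("db2", fake_db2_driver)
rows = manager.query("db2:orders", "SELECT id, customer FROM orders")
```

The application code only ever calls `manager.query`; swapping the back-end database means registering a different driver, which is the property the standards listed above aim to provide.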

Figure 2: Web Access Model

In a general Web access model, as depicted in Figure 2, the Web server must be able to interact with the legacy IS, where the precious data resides, in addition to communicating with front-end browsers through the HyperText Transfer Protocol (HTTP) over the Internet's TCP/IP protocols. The fundamental techniques for interacting with front-end browsers include the Common Gateway Interface (CGI), Server Side Includes (SSI), server Application Programming Interfaces (APIs), and mobile or component-based code (e.g., Java, ActiveX). On the back end, a connection gateway is usually required to deal with network protocols (shown in Figure 3), such as Systems Network Architecture (SNA) or Advanced Program-to-Program Communication (APPC) for IBM hosts. The Web middleware can provide either a data gateway or an interface gateway for the back end, depending on the functionality. It can be a stand-alone server or be integrated as part of the Web server. Note that TCP/IP has always been the native communication protocol for UNIX systems. Therefore, legacy UNIX systems and mainframes can support generating HTML and linking with Internet browser clients. IBM already supports TCP/IP as well as HTTP on both MVS and VM, which makes the mainframe itself a Web server. IBM System/390 thus transforms the mainframe into a powerful client/server platform.
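As an illustration of the CGI technique named above, the following sketch shows the general shape of a CGI program: the Web server passes the request's query string in the `QUERY_STRING` environment variable, and the program writes a header, a blank line, and an HTML body to standard output. The `LEGACY_ORDERS` table and the `order` parameter are hypothetical stand-ins for data that a real gateway would fetch from the legacy IS.

```python
import html
import os
import sys
from urllib.parse import parse_qs

# Hypothetical stand-in for legacy data reached through a gateway.
LEGACY_ORDERS = {"1001": "ACME Corp", "1002": "Globex"}


def render_response(query_string: str) -> str:
    """Build a complete CGI response: header, blank line, HTML body."""
    params = parse_qs(query_string)
    order_id = params.get("order", [""])[0]
    customer = LEGACY_ORDERS.get(order_id)
    if customer is None:
        body = "<p>Order not found.</p>"
    else:
        # Escape values before embedding them in HTML.
        body = f"<p>Order {html.escape(order_id)}: {html.escape(customer)}</p>"
    return "Content-Type: text/html\r\n\r\n<html><body>" + body + "</body></html>"


if __name__ == "__main__":
    # Under CGI, the server sets QUERY_STRING before launching the program.
    sys.stdout.write(render_response(os.environ.get("QUERY_STRING", "")))
```

Because a new process is started for every request, this model is simple but costly, which is one reason the server API and component-based alternatives listed above exist.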

Figure 3: Web Server to Host Gateways

Web middleware solutions fall mainly into the following categories. The first is a Web server that supports major databases. All the major Web server vendors have integrated their servers with proprietary APIs (e.g., NSAPI from Netscape and ISAPI from Microsoft) so that they can support one or more commercial DBMSs directly. Examples are Microsoft's Internet Information Server, IBM's Internet Connection Secure Server, Netscape's LiveWire Pro, and Oracle's WebServer. These servers support popular database servers such as Microsoft's SQL Server, IBM's DB2, Informix OnLine Server, and Oracle7. Second, most popular commercial DBMS vendors, such as Informix, Sybase, Microsoft, Illustra, and Microrim, supply tools for Web connectivity. These range from early facilities such as Informix-ESQL/C (a CGI interface kit) and Sybase's web.sql, which is based on SSI and embedded SQL, to Microrim's R:WEB, which integrates automatic generation of HTML pages from the R:BASE database.

Many vendors have been working on gateway solutions for accessing legacy applications and databases, with different focuses. Microsoft's SNA Server specializes in leveraging investments in IBM hosts by providing a LAN-to-SNA connection gateway and an ODBC/DRDA driver. Sybase's CONNECT family delivers a middleware solution to link any data source or application into the IS through APIs. XDB's HeatShield and IBM's Net.Data are implemented for mainframe DB2. HeatShield is designed to offer a PC-based DRDA link with support for ODBC and JDBC driver managers. Net.Data builds on the strong database access and reporting capabilities of DB2 WWW Connection, a CGI-BIN application that uses a Web macro file approach combined with language environment support. NEON's Shadow Direct servers also target mainframe data sources, integrating client ODBC applications with CICS (Customer Information Control System), IMS (Information Management System), and DB2 on the MVS server. NeXT's WebObjects supports direct mainframe database access through a "Data Source Adapter". Its Conextions Builders Adapter Web-enables mainframe 3270 and 5250 applications. IBM's CICS Internet Gateway, Simware's Salvo Server, Teubner & Associates' Corridor, and Attachmate's Emissary Host Publishing System all take a similar approach, using a "screen scraper" to translate the 3270 data stream into HTML on the fly.
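The "screen scraper" idea can be sketched in a greatly simplified form. The example below does not parse a real 3270 data stream, which is EBCDIC-encoded and carries field orders and attributes; it assumes the screen has already been decoded into lines of text in a `LABEL: value` layout. The function names and the sample screen are hypothetical and are not taken from any of the products named above.

```python
import html
from typing import List, Tuple


def parse_screen(screen: str) -> List[Tuple[str, str]]:
    """Extract (label, value) pairs from lines shaped like 'LABEL: value'."""
    fields = []
    for line in screen.splitlines():
        label, sep, value = line.partition(":")
        # Skip blank lines and lines without a label/value separator.
        if sep and label.strip():
            fields.append((label.strip(), value.strip()))
    return fields


def screen_to_html(screen: str) -> str:
    """Render the extracted fields as a two-column HTML table."""
    rows = "".join(
        f"<tr><th>{html.escape(label)}</th><td>{html.escape(value)}</td></tr>"
        for label, value in parse_screen(screen)
    )
    return f"<table>{rows}</table>"


sample_screen = "CUSTOMER: ACME & Co\nBALANCE :  120.50\n\n*** PRESS PF3 TO EXIT ***"
page = screen_to_html(sample_screen)
```

The appeal of this approach is that the host application is left untouched; its weakness is that the translation depends on the screen layout, so a layout change on the host can break the generated pages.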

Figure 4: Middleware Architecture

Figure 4 summarizes the middleware architecture conceptually. The components included in different middleware approaches are by no means mutually exclusive. More innovative and integrated techniques are sure to come. At the same time, the Intranet revolution is emerging behind company firewalls. An Intranet is a cost-effective and versatile way to enhance group collaboration and productivity in the enterprise. By integrating Web browsing, legacy database and application access, and services such as groupware and email, an Intranet can efficiently coordinate department projects and workflow. Netscape, Microsoft, and Lotus all provide their own integrated server suites to meet Intranet requirements. Based on ONE (Open Network Environment), Netscape offers platform-independent, open-standards-based services in its full-service Intranet suite, SuiteSpot. Cross-platform database access is supported through ODBC connectivity in addition to native support for Informix, Sybase, and Oracle databases. Microsoft, on the other hand, exploits its dominance in PC operating systems and continues to push a Windows-centric platform. Its BackOffice is designed to streamline business processes with a choice of client/server solutions that work on the Internet. It employs SNA Server for host integration, which extends the company LAN on a scaleable Windows NT platform. Lotus' Notes Server is another next-generation Web server that fully supports Web browsers and Notes clients. It uses ODBC as a standard interface between Notes and DBMSs and also includes specific integration with Oracle, DB2, and Sybase SQL Server. Tight integration with CICS-based transaction processing systems is achieved through IBM's MQSeries for Notes.

3. Web Authoring Tool

Visual presentation and site organization play important roles in a successful commercial Web site. Advanced Web publishing and authoring tools help create an attractive Web presence with powerful functionality. They provide a WYSIWYG design environment that requires little or no knowledge of HTML. One example is Netscape's Navigator Gold, which eases a design task that used to be tedious and time-consuming. A word-processor-style page editor removes the need to specify tags by hand throughout the document. One simply enters the contents of the pages using conventional formatting styles, such as variable font sizes and text alignment, and the editor transparently translates them into the corresponding HTML markup. Images can be inserted into or removed from pages as easily as pushing a button and selecting a file name. It also integrates uploading and downloading pages to and from the Web server, without a separate FTP program. Microsoft's Internet Assistant for Word offers similar functionality by adding Web features on top of Microsoft Word.

However, the page-oriented HTML layout editor is just the first step. Because Web pages embed many hyperlinks, to local files or to remote URLs, an e-commerce site usually has a very complicated road map in which pages reference one another. To maintain the site hierarchy well, a much more powerful authoring tool is necessary. Emphasis must be put on managing the relationships between pages in the site. The easier it is to create pages from scratch and to update them afterward, the more productive the site will be. In this regard, systems like Adobe's PageMill/SiteMill and SGI's WebMagic are essentially enhanced, media-rich page design tools. NetObjects Fusion from NetObjects takes a visual, site-oriented approach to Web design that treats each Web site as a whole and allows users to edit all of its aspects, from overall structure to individual page components, without any HTML coding. It provides centralized control over external files such as images, sounds, and applets as well as site-wide elements like headers and footers, buttons, and links. If a user edits any one piece, NetObjects Fusion automatically updates the entire site. Microsoft's FrontPage 97 also keeps site design in mind. Like NetObjects Fusion, it offers some advanced professional templates. It lacks NetObjects Fusion's tight integration between Web pages, but a separate bonus package includes some useful tools, such as Image Composer and Publishing Wizard.
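To make the idea of site-wide link maintenance concrete, the sketch below renames one page and rewrites every reference to it across the site. It is not how NetObjects Fusion or FrontPage work internally; the `rename_page` function, the in-memory site dictionary, and the assumption that links appear exactly as `href="name"` are all simplifications introduced for illustration.

```python
import re
from typing import Dict


def rename_page(site: Dict[str, str], old: str, new: str) -> Dict[str, str]:
    """Return a copy of the site with page `old` renamed and all hrefs updated."""
    # Match only exact href="old" attributes; real HTML would need a parser.
    pattern = re.compile(r'href="' + re.escape(old) + r'"')
    updated = {}
    for name, markup in site.items():
        target = new if name == old else name
        # A function replacement avoids any special handling of backslashes.
        updated[target] = pattern.sub(lambda match: f'href="{new}"', markup)
    return updated


site = {
    "index.html": '<a href="products.html">Products</a>',
    "products.html": '<a href="index.html">Home</a>',
}
renamed = rename_page(site, "products.html", "catalog.html")
```

Doing this by hand on a site with hundreds of cross-referencing pages is error-prone, which is the case for the site-oriented tools described above.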

The trend in Web authoring is toward more user-friendly, visual design tools with drag-and-drop, point-and-click, and automatic management. Flexible and intelligent page and file maintenance keeps business-related information up to date and consistent. As sites get bigger and fancier and their reference links are updated more frequently, employing a professional authoring tool makes a real difference.

4. Scaleable Web Service

Given the exponential growth of the Internet and the Web community, it is increasingly difficult for organizations to predict what a Web server will need in the future in terms of resources and hardware. On the other hand, large business Web sites need far-flung, intense demand in order to be profitable. As a result, a Web server must be able to grow with a seemingly endless increase in the number of user requests. In other words, a scaleable Web service architecture is needed for high availability and quality of service. A scaleable system can also eliminate the single point of failure inherent in a single-server configuration. In this section, we discuss the issues in building a scaleable Web service.

The National Center for Supercomputing Applications (NCSA) has defined a combination of one-to-many mapping facilities necessary for building a scaleable Web service.

Generally speaking, Web scalability rests on the flexibility to change the number of Web servers dynamically without interrupting Web service. To achieve that, the corresponding name mapping must be resolved dynamically too. NCSA's implementation allows for dynamic scaleability by rotating through a pool of HTTP servers that are alternately mapped to the hostname alias of the Web server. It includes a cluster of identically configured servers, a round-robin DNS for distributing requests across the cluster, a distributed file system (AFS) to maintain a synchronized set of documents across the cluster, and a method to administer the cluster. Because these technologies are inherently platform-independent, new types of server architectures can be easily integrated into the HTTP service. In addition, the configuration is transparent to the client.
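The rotation at the heart of this scheme can be sketched as follows. This is only a model of the behavior, not NCSA's implementation or a real DNS server: the `RoundRobinResolver` class, the alias `www.example.org`, and the addresses are hypothetical, and a real round-robin DNS rotates the address records it returns rather than handing out a single address.

```python
from typing import List


class RoundRobinResolver:
    """Hypothetical resolver mapping one hostname alias to a rotating server pool."""

    def __init__(self, alias: str, addresses: List[str]) -> None:
        self.alias = alias
        self._pool = list(addresses)
        self._next = 0

    def resolve(self, hostname: str) -> str:
        if hostname != self.alias:
            raise LookupError(f"unknown host {hostname!r}")
        if not self._pool:
            raise LookupError("server pool is empty")
        # The modulo keeps the index valid even after the pool has shrunk.
        address = self._pool[self._next % len(self._pool)]
        self._next = (self._next + 1) % len(self._pool)
        return address

    def add_server(self, address: str) -> None:
        # Servers can join while the service keeps answering requests.
        self._pool.append(address)

    def remove_server(self, address: str) -> None:
        self._pool.remove(address)


resolver = RoundRobinResolver("www.example.org", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first_four = [resolver.resolve("www.example.org") for _ in range(4)]
```

Clients keep using the single alias throughout, which is why the configuration stays transparent to them. Plain rotation spreads requests evenly but takes no account of server load or failures.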

Oracle's WebServer is based on a secure, scaleable architecture. The core of WebServer is the Web Request Broker (WRB), a high-speed mechanism for dispatching requests, balancing load, and adding third-party server extensions. Through its independent processing architecture, the WRB is designed to keep third-party server extensions from affecting other parts of the system, improving reliability for users. An Oracle7 server may optionally be integrated for increased data processing power and scaleability. Oracle WebServer can translate and dispatch client information requests directly to the Oracle7 applications server using PL/SQL, Oracle's procedural language for Oracle7. This approach is intended to make dynamic, data-driven Web applications, such as Oracle Applications for the Web, run much faster with Oracle WebServer than with conventional CGI-based Web servers.
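The general idea of a broker that dispatches requests to extensions while containing their failures can be sketched generically. This is not Oracle's WRB or its API: the `RequestBroker` class, the prefix-based routing, and the status codes are assumptions made for illustration, and isolation here is only exception handling within one process rather than separate processes.

```python
from typing import Callable, Dict, Tuple

# In this sketch an extension is a callable from the remaining path to a body.
Handler = Callable[[str], str]


class RequestBroker:
    """Hypothetical broker that routes requests to extensions by path prefix."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def mount(self, prefix: str, handler: Handler) -> None:
        self._routes[prefix] = handler

    def dispatch(self, path: str) -> Tuple[int, str]:
        matches = [prefix for prefix in self._routes if path.startswith(prefix)]
        if not matches:
            return 404, "not found"
        # The longest matching prefix wins.
        prefix = max(matches, key=len)
        try:
            return 200, self._routes[prefix](path[len(prefix):])
        except Exception:
            # A failing extension produces an error response; the broker survives.
            return 500, "extension failed"


def order_extension(rest: str) -> str:
    return f"order {rest}"


def broken_extension(rest: str) -> str:
    raise RuntimeError("simulated extension fault")


broker = RequestBroker()
broker.mount("/orders/", order_extension)
broker.mount("/broken/", broken_extension)
```

With these routes mounted, a fault in `/broken/` does not prevent `/orders/` from being served, which is the property the paragraph above attributes to keeping extensions independent.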

With the proliferation of multimedia applications and the need to support thousands of concurrent video streams, highly scaleable and available multimedia servers are necessary. IBM's solution for a robust, highly scaleable Web server is scaleable parallel computing. This technology offers flexibility by joining together from two to hundreds of processors to break down complex, data-intensive jobs and speed their completion. A parallel architecture can achieve computing power once available only in high-end mainframes while retaining the flexibility to scale. The IBM Scaleable and Highly Available Web Server offers services via a Scaleable POWERparallel System SP2 or a cluster of RS/6000 workstations. This server is built to support a large number of concurrent users and provides scaleability, high bandwidth, real-time multimedia delivery, fine-grained load balancing, and high availability. Some of these technologies have already been used in the Web server for the 1996 Summer Olympics.

Moreover, IBM's high-end parallel processing system, the S/390 Parallel Sysplex, is another candidate platform for scaleable Web serving. In the S/390 Parallel Sysplex environment, processing capacity can be added in granular increments, from the addition of a single processor within an existing system to the introduction of one or more data-sharing systems. New systems can be introduced into the Parallel Sysplex in a non-disruptive manner. Parallel Sysplex data-sharing technology enables systems to be added to the configuration with near-linear scaleability and nearly unlimited capacity. In addition, IBM's commitment to open systems has made MVS compatible with UNIX and able to support HTTP. It is a promising platform on which to build a scaleable Web server architecture on a legacy host, utilizing the highly reliable and available environment.

5. Conclusion

Web protocols make it simple to standardize the process of information access. However, running a successful commercial site requires well-designed integration, including legacy linkage, powerful management, and scaleability. In this paper, we have highlighted these issues and illustrated them with current vendor solutions. They are certain to be essential parts of the future of the Web.
