MS WebScout: Web Navigation Aid and Personal Web History Explorer

Natasa Milic-Frayling
Microsoft Research Ltd
7 J J Thomson Avenue
Cambridge CB3 0FB, United Kingdom
+44 (0) 1223 479 700
natasamf@microsoft.com
Ralph Sommerer
Microsoft Research Ltd
7 J J Thomson Avenue
Cambridge CB3 0FB, United Kingdom
+44 (0) 1223 479 700
som@microsoft.com
Robert Tucker
Microsoft Research Ltd
7 J J Thomson Avenue
Cambridge CB3 0FB, United Kingdom
+44 (0) 1223 479 700
i-robert@microsoft.com

ABSTRACT

WebScout is a prototype application that creates a complete archive of the Web pages accessed by the user and a rich record of the user’s navigation, including user and system annotations of seen or previewed pages. In addition, WebScout enhances the Browser with (1) natural language processing of text and image analysis, (2) indexing and search capabilities for text and images, and (3) a module for creating visual representations of structured data. On top of this foundational layer we have built three features: LinkInspector, SessionNavigator, and HistoryExplorer.

The LinkInspector allows the user to preview hyperlinked content in a zoomed-out Browser placed next to the inspected link. The content of the previewed page can be highlighted with respect to a recent search query or another active context. The SessionNavigator provides easy access to pages seen during a browsing session. It organizes the pages into WebTrails and provides a linear view of the navigation by transforming the navigation hierarchy into a sequence of ‘branches’. The HistoryExplorer provides textual search over the contents and annotations of the pages and color search over page thumbnails. It enables the user to browse the archive through graphical representations of past navigation patterns, e.g., WebTrails or visited Web sites.

Keywords

Navigation history, personal Web archive, link preview, Web history search, navigation aid, Web browser

1. INTRODUCTION

The capabilities of Web Browsers have evolved to meet users’ increasing demand for access to a variety of content and services on the Web. They have also changed as our understanding of how users communicate via the Internet has improved. We expect the increasing affordability of local storage and high processing power to shape the further development of the Browser, as will the encouraging prospects for widely available fast Internet access. With this in mind, we explored ways to address three problems: managing the user’s attention and browsing context, providing access to recent navigation history, and storing and accessing a personal Web archive.

We have built WebScout, a prototype application that extends the capabilities of Microsoft Internet Explorer (IE) with three features: LinkInspector, SessionNavigator, and HistoryExplorer. In the following sections we briefly describe the main features of WebScout v1.0 and refer to related research. Our future research will involve extensive evaluation of the underlying concepts and ideas.

2. WEBSCOUT FEATURES

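An essential function of WebScout is its mechanism for capturing and storing locally the Web pages seen or previewed by the user, together with a rich record of the user’s navigation and the user and system annotations of those pages.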

In addition to this rich data store, WebScout incorporates natural language processing capabilities and a searchable index of both textual and image data. In the current implementation, pages are archived independently of the standard IE Cache. Since the primary purpose of the IE Cache is to speed up page loading, it is not suitable as a permanent archive without significant redesign.
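To make the distinction between a page cache and a permanent archive concrete, the following minimal Python sketch keeps every visit as its own snapshot instead of overwriting a single entry per URL. The class and field names (`PageSnapshot`, `PageArchive`) are hypothetical and do not describe WebScout’s actual storage format, which the paper leaves to future publications.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime


@dataclass(frozen=True)
class PageSnapshot:
    """One archived copy of a page as the user saw it (hypothetical schema)."""
    url: str
    accessed: datetime
    html: str
    annotations: tuple[str, ...] = ()  # user or system notes attached to this visit


@dataclass
class PageArchive:
    """Append-only store in which every visit keeps its own snapshot.

    A browser cache is usually keyed by URL alone, so a fresh download replaces
    the previous copy and entries may be evicted at any time. Here snapshots are
    never overwritten or evicted, so earlier versions of a page survive.
    """
    _by_url: dict[str, list[PageSnapshot]] = field(default_factory=dict)

    def add(self, snapshot: PageSnapshot) -> None:
        self._by_url.setdefault(snapshot.url, []).append(snapshot)

    def versions(self, url: str) -> list[PageSnapshot]:
        # Oldest first, so the user can see how the page changed between visits.
        return sorted(self._by_url.get(url, []), key=lambda s: s.accessed)

    def latest(self, url: str) -> PageSnapshot | None:
        history = self.versions(url)
        return history[-1] if history else None


archive = PageArchive()
archive.add(PageSnapshot("http://example.com/", datetime(2002, 1, 7), "<p>old</p>"))
archive.add(PageSnapshot("http://example.com/", datetime(2002, 3, 1), "<p>new</p>"))
print([s.html for s in archive.versions("http://example.com/")])
```

Both versions of the page remain retrievable, which a URL-keyed cache would not guarantee.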

 

Figure 1: LinkInspector showing the content of a linked document, highlighted with respect to the query terms

 

Technical details of WebScout data archiving and search capabilities will be presented in future publications. In the following three subsections we describe the basic concepts behind the three features built on this foundation: LinkInspector, SessionNavigator, and HistoryExplorer.

2.1 LinkInspector - Attention and Context Management

During Web browsing, users continuously assess the risk of following a hyperlink on a page and decide whether or not to follow it. To help users with this task, researchers have devised intelligent agents that proactively search the Web and inform users about the relevance of both immediately linked and more distant information (see [1], [2], [3]). We fully recognize the benefit of this approach. However, we also note that a significant gain can be achieved by enabling the user to preview the hyperlinked content without obstructing the view of the current page.

We implemented LinkInspector, a Browser feature that presents a hyperlinked page in a zoomed-out browser of a specified thumbnail size (see Figure 1), displayed within the current page next to the inspected link. This lets the user inspect the content and format of the linked page and decide whether to commit to the full Browser view of it. In the current implementation, LinkInspector downloads the page on demand, either when the user hovers the mouse over the link or when the user clicks a tool-tip icon that appears above the link.
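The placement of the preview can be illustrated with a small geometric sketch. Assuming the linked page is rendered at some layout width and then scaled down to the thumbnail width, the following Python code computes that zoom factor and positions the preview beside the link, preferring the right-hand side and keeping the box inside the visible viewport. The names `Rect`, `preview_zoom`, and `place_preview`, the 8-pixel gap, and the right-side preference are illustrative assumptions, not details of LinkInspector.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float


def preview_zoom(page_width: float, thumb_width: float) -> float:
    """Scale factor that fits the page's layout width into the thumbnail (never enlarges)."""
    if page_width <= 0:
        return 1.0
    return min(1.0, thumb_width / page_width)


def place_preview(link: Rect, viewport: Rect, thumb_w: float, thumb_h: float,
                  gap: float = 8.0) -> Rect:
    """Position a thumbnail-sized preview next to the link, inside the viewport."""
    right_x = link.x + link.width + gap
    left_x = link.x - gap - thumb_w
    if right_x + thumb_w <= viewport.x + viewport.width:
        x = right_x  # preferred: to the right of the link
    elif left_x >= viewport.x:
        x = left_x   # otherwise: to the left of the link
    else:
        # Neither side fits: align with the viewport's right edge,
        # or its left edge if the thumbnail is wider than the viewport.
        x = max(viewport.x, viewport.x + viewport.width - thumb_w)
    # Align the top with the link, then clamp so the box stays on screen.
    y = max(viewport.y, min(link.y, viewport.y + viewport.height - thumb_h))
    return Rect(x, y, thumb_w, thumb_h)
```

For a 1024×768 viewport and a 240×180 thumbnail, a link near the bottom-right corner gets its preview on the left and shifted up so that it remains fully visible.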

As in [3] and [4], WebScout enhances the Browser with natural language processing capabilities, which enables the user to preview links on a search result page with matches of the query terms highlighted.
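A minimal sketch of query-term highlighting follows. It wraps whole-word, case-insensitive matches of the query terms in `<mark>` tags; this plain word matching is an assumption standing in for WebScout’s natural language processing, which may also handle stemming or phrases. The sketch operates on extracted text rather than on the HTML markup of the previewed page.

```python
import re


def highlight_terms(text: str, query: str,
                    open_tag: str = "<mark>", close_tag: str = "</mark>") -> str:
    """Wrap whole-word, case-insensitive matches of the query terms in tags."""
    terms = sorted(set(re.findall(r"\w+", query.lower())))
    if not terms:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, terms)) + r")\b",
                         re.IGNORECASE)
    # Keep the original casing of each match as it appears in the page text.
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text)


print(highlight_terms("Web browsing history and Browsing aids", "browsing history"))
# Web <mark>browsing</mark> <mark>history</mark> and <mark>Browsing</mark> aids
```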

2.2 SessionNavigator - Recent Navigation History

One dominant feature of Web navigation is the user’s ‘linear’ experience of it, determined by the order in which Web pages are accessed.

 

Figure 2: SessionNavigator Toolbar showing the WebTrails (indicated by black and red arrows) and the graph view of the current WebTrail

 

However, to navigate effectively, the user has to keep a mental note of the hierarchical structure and the access sequence of the Web pages. This mental overhead is, to a large extent, due to the type of navigation support provided in commercially available Browsers. To address this issue, we follow the strategy of previous work (see [5], [6]): we capture user navigation events and use thumbnails as visual representations of pages. In addition, we explore two novel ideas.

First, we partition the user’s navigation into logical units, referred to as WebTrails. Our hypothesis is that a navigation session can be automatically divided into sequences of page visits that form groupings meaningful to the user. In the current implementation, a WebTrail is a sequence of pages that begins when the user requests a page by specifying its URL, either explicitly by typing it or implicitly by activating a link from the Bookmarks. Alternatively, the user can choose finer-grained trails, in which each search query also marks the beginning of a trail. Each WebTrail is labeled with the title of the initiating page or with the search query.
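The segmentation rule above can be expressed in a few lines. In this Python sketch each visit records the kind of user action that produced it; the `Trigger` values, the `Visit` and `WebTrail` types, and the `split_on_search` flag are hypothetical names chosen for illustration. A new trail starts on a typed URL or a Bookmarks link, and also on a search query when finer-grained trails are requested.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Trigger(Enum):
    TYPED_URL = auto()  # user typed a URL into the address bar
    BOOKMARK = auto()   # user activated a link from the Bookmarks
    LINK = auto()       # user followed a hyperlink on a page
    SEARCH = auto()     # page is the result of a submitted search query


@dataclass(frozen=True)
class Visit:
    url: str
    title: str
    trigger: Trigger
    query: str | None = None  # set only for SEARCH visits


@dataclass
class WebTrail:
    label: str
    visits: list[Visit]


def segment_trails(visits: list[Visit], split_on_search: bool = False) -> list[WebTrail]:
    """Split a time-ordered list of visits into WebTrails."""
    starters = {Trigger.TYPED_URL, Trigger.BOOKMARK}
    if split_on_search:
        starters.add(Trigger.SEARCH)
    trails: list[WebTrail] = []
    for visit in visits:
        # The first visit always opens a trail, whatever its trigger.
        if not trails or visit.trigger in starters:
            is_search = visit.trigger is Trigger.SEARCH and visit.query
            label = visit.query if is_search else visit.title
            trails.append(WebTrail(label, [visit]))
        else:
            trails[-1].visits.append(visit)  # extend the current trail
    return trails
```

With `split_on_search=False`, a search issued in the middle of a trail simply extends that trail; with `True`, it opens a new trail labeled with the query.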

Second, we provide a linear view of navigation by ‘flattening’ the navigation hierarchy into a sequence of ‘branches’. Whenever a new branch is shown, its branching point is repeated at the start of the branch, which reinforces the user’s linear experience of the navigation and provides easy access to pages that serve as ‘hubs’.
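The flattening step can be sketched as a depth-first walk over the navigation tree. The sketch assumes the session has already been reconstructed as a tree in which each page’s children are the pages reached from it, ordered by time of access; `PageNode` and `flatten_branches` are hypothetical names. The first child continues the current branch, and every later child starts a new branch that begins with a repeated copy of the branching page.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PageNode:
    title: str
    children: list[PageNode] = field(default_factory=list)  # in order of access


def flatten_branches(root: PageNode) -> list[list[str]]:
    """Turn a navigation tree into a sequence of linear branches."""
    branches: list[list[str]] = []

    def walk(node: PageNode, branch: list[str]) -> None:
        branch.append(node.title)
        if not node.children:
            branches.append(branch)  # a leaf closes the current branch
            return
        # The first child continues the current branch ...
        walk(node.children[0], branch)
        # ... and each later child starts a new branch that repeats the branching point.
        for child in node.children[1:]:
            walk(child, [node.title])

    walk(root, [])
    return branches


# A -> B -> C, back to B -> D, back to A -> E
session = PageNode("A", [PageNode("B", [PageNode("C"), PageNode("D")]), PageNode("E")])
print(flatten_branches(session))  # [['A', 'B', 'C'], ['B', 'D'], ['A', 'E']]
```

Because the walk visits children in access order, the branches come out in the order in which the user first entered them, and hub pages such as A and B reappear at the start of each branch that leaves them.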

The basic view of the navigation is provided by the SessionNavigator Toolbar, which shows a sequence of thumbnails in order of page access, with WebTrails clearly demarcated. As the user navigates the Web, thumbnail images are appended to the current WebTrail. The user can also choose to view a graphical representation of individual WebTrails (see Figure 2).

2.3 HistoryExplorer - Storage and Access to the Personal Web Archive

One significant drawback of relying on the Web as a source of information is the transient nature of Web page content. The ability to store and access a faithful copy of the content seen by the user therefore has many benefits. The most obvious is the ability to re-examine information at a later date; in addition, the archive can support more reliable user profiling than navigation patterns (see [7]) or search topics (see [3], [4]) alone.

To illustrate possible uses of the WebScout archive, we implemented a search facility that enables the user to filter pages by date, pose text queries over the content of the Web pages and over stored search queries, and specify a predominant color on the page.
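The three filters can be combined as in the following Python sketch, which scans a list of archived pages and keeps those that fall within a date range, contain all query terms in either the page text or the stored search query, and have the requested predominant color. The `ArchivedPage` record and the conjunctive term matching are assumptions; a real archive of any size would use the searchable index mentioned in Section 2 rather than a linear scan.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivedPage:
    url: str
    visited: date
    text: str                 # text extracted from the archived page
    search_query: str = ""    # stored query that led to the page, if any
    dominant_color: str = ""  # palette name derived from the thumbnail


def _terms(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))


def search_archive(pages: list[ArchivedPage], start: date | None = None,
                   end: date | None = None, query: str = "",
                   color: str | None = None) -> list[ArchivedPage]:
    """Return matching pages, most recently visited first."""
    wanted = _terms(query)
    hits = []
    for page in pages:
        if start is not None and page.visited < start:
            continue
        if end is not None and page.visited > end:
            continue
        if color is not None and page.dominant_color != color:
            continue
        # Every query term must occur in the page text or the stored query.
        if wanted and not wanted <= (_terms(page.text) | _terms(page.search_query)):
            continue
        hits.append(page)
    return sorted(hits, key=lambda p: p.visited, reverse=True)
```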

The color search is based on analysis of the page thumbnail images, and the color query is specified by choosing from a predefined color palette. We also support browsing pages organized by Web site or by WebTrail.
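One plausible way to derive a predominant color from a thumbnail, sketched below in Python, is to map each pixel to the nearest palette entry by squared RGB distance and report the entry that covers the largest share of pixels. The palette values, the 30% threshold, and the function names are hypothetical; the paper does not specify WebScout’s palette or color analysis.

```python
from __future__ import annotations

from collections import Counter

# Hypothetical palette; WebScout's actual palette is not given in the paper.
PALETTE: dict[str, tuple[int, int, int]] = {
    "black": (0, 0, 0),
    "white": (255, 255, 255),
    "gray": (128, 128, 128),
    "red": (220, 30, 30),
    "green": (30, 160, 60),
    "blue": (30, 70, 200),
    "yellow": (240, 220, 40),
}


def nearest_palette_color(rgb: tuple[int, int, int]) -> str:
    """Name of the palette entry closest to rgb in squared Euclidean RGB distance."""
    return min(PALETTE, key=lambda name: sum((a - b) ** 2 for a, b in zip(PALETTE[name], rgb)))


def color_profile(pixels: list[tuple[int, int, int]]) -> dict[str, float]:
    """Fraction of thumbnail pixels assigned to each palette color."""
    counts = Counter(nearest_palette_color(p) for p in pixels)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}


def predominant_color(pixels: list[tuple[int, int, int]], min_share: float = 0.3) -> str | None:
    """Palette color covering the largest share of pixels, if that share is large enough."""
    profile = color_profile(pixels)
    if not profile:
        return None
    name, share = max(profile.items(), key=lambda kv: kv[1])
    return name if share >= min_share else None
```

A color query chosen from the palette then reduces to comparing the selected name with the value stored for each archived page.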

Our future work will include the evaluation of these search strategies for retrieving archived information.   

3. REFERENCES

  1. Lieberman, H. Letizia: An Agent That Assists Web Browsing. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI 95), 1995.
  2. Lieberman, H. Autonomous Interface Agents. In Proceedings of the ACM Conference on Human Factors in Computing Systems (CHI 97), 1997.
  3. Milic-Frayling, N., and Sommerer, R. MS-Read: Context Sensitive Document Analysis in the WWW Environment. User Modeling in the Web Environment. Microsoft Research Technical Report MSR-TR-2001-63, http://research.microsoft.com/scripts/pubs/trpub.asp, 2001.
  4. Milic-Frayling, N., and Sommerer, R. MS-Read: User Modeling in the Web Environment. In Proceedings of the 24th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2001.
  5. Hightower, R., Ring, L., Helfman, J., Bederson, B., and Hollan, J. Graphical Multiscale Web Histories: A Study of PadPrints. In Proceedings of the ACM Conference on Hypertext (Hypertext '98), ACM Press, pp. 58-65, 1998.
  6. Ayers, E. Z., and Stasko, J. T. Using Graphic History in Browsing the World Wide Web. Technical Report GIT-GVU-95-12, May 1995.
  7. Fu, X., Budzik, J., and Hammond, K. J. Mining Navigation History for Recommendation. In Proceedings of the Intelligent User Interfaces Conference (IUI’00), pp. 106-112, 2000.