Multimodal Presentation as a Solution to Access a Structured Document

Philippe Truillet, Bernard Oriola, Nadine Vigouroux
IRIT, UMR CNRS 5505
118, Route de Narbonne
F-31062 TOULOUSE Cedex
FRANCE

e-mail: {truillet,oriola,vigourou}@irit.fr

Abstract

The World Wide Web (WWW) is an opportunity to revolutionize access to information for blind people. The various tools available to access the WWW are beginning to be used and appreciated [ICC 96]. However, although their number is growing, these tools still do not sufficiently consider the importance of structure within electronic documents in providing navigation capabilities. This paper first reports some current solutions for accessing the WWW and then explores how the multimodal presentation concept could be a solution. The paper concludes with a description of a multimodal presentation system for HTML documents.

Keywords: Structured Electronic Document, Multimodal Presentation, Non-Visual Consultation, Blind People.


1. Introduction

This article shows how, by way of multimodal presentation, the blind can efficiently and effectively access and read World-Wide Web documents. Unfortunately, current browser tools heavily rely on visual methods of interaction and presentation, a fact that makes the task of providing alternatives to the blind extremely difficult.

In fact, a major problem encountered by blind people is that visual information, whose textual objects (making up a document) are perceived "quasi-simultaneously" by sighted people, is only sequentially accessible to the blind. Therefore, blind people cannot rely on the same outer reality as the one perceived by sighted people.

On the other hand, several works [BAU 94], [THO 93] have pointed out that the structure of a document carries meaning and increases comprehension in a reading task. Under these conditions, we consider that neither the accessibility of the WWW nor the viewing of a document can be reduced to a simple reading (cursory glance) of the content.

In fact the World Wide Web document is divided into two different layouts:

and

HTML (HyperText Markup Language) allows for the representation of both text and structure of the document by putting tags. Our SMART user interface interprets these tags so as to present the document to the user in the best way possible according to this user's sensory capabilities.

The non visual consultation of HTML documents will be carried out by the SMART user interface:

After a brief survey of some existing tools, we will discuss advantages and limitations of the current systems available on the market. Then we turn to our concept of information accessibility which is based on a multimodal presentation designed in the SMART user interface.


2. Access To The WWW by Blind People

Offering access to the Internet gives blind people a challenging opportunity to get information. Internet browsers give access to huge databases in electronic form (electronic newspapers, electronic teaching materials, presentations of remote lectures, tourist and travel information, ...) related to professional as well as socio-cultural activities. This is a great opportunity for people who cannot access printed documents. In order to enable blind people to access and view information in electronic form, an effective user interface has to be designed.
Since 1994, several user access tools have been developed. Though all such tools share the purpose of allowing blind people to access the World-Wide Web, three kinds of approaches have been worked out:

2.1. pwWebSpeak™ v. 1.3

This application is a speaking Internet browser which uses TTS synthesis. While running this program, the user can reach any web page; pwWebSpeak™ [PWW 97] then reads the content of this page. The user navigates from link to link by means of the tabulation key. At any time, the speech output device synthesizes the name of the active link; validating the current link opens a new HTML file.

This system is relatively easy to use, even though two remaining weaknesses have been identified:


2.2. DosLynx 0.8 alpha, Netscape Navigator 3.01 and MS-Internet Explorer 3.0

DosLynx [DOS 95] is a DOS browser, while Netscape Navigator [NET 96] and Internet Explorer (IE) [MIC 96] are window-based browsers with standard browser functionalities ("Open URL", "Save", etc.). They can be used by blind people, allowing them to read documents with a screen reader running under the DOS or Windows 95 operating systems.

When one uses TTS synthesis, the application works practically as pwWebSpeak does. The main advantage is that the computer is still accessible when one leaves the Internet browser.

2.3. W3 Access For The Blind

W3-Access For The Blind [PER 95] is another approach. "W3-Access" may be used with any browser and any layout. A proxy is placed between the W3 Server (where documents are located) and the W3 browser. The proxy is a filter: it transforms the original HTML file before sending it out to the client. Therefore, no specific material is necessary. Users (whether sighted or not) can continue to read documents with their own browser. The blind user just needs a Windows screen-reader like Jaws [JAW 96] or Slimware Windows Bridge [WIN 96].

The main drawback with these three access tools to WWW is the fact that all the tags of the document structure are displayed, whether the presentation is vocal or braille, or both. For example, the portion of HTML code, "<A HREF="http://....>", will be "displayed" as "Anchor http://...".
Another weakness of existing access tools is the unacceptable cognitive load they impose on the user.
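The drawback described above can be illustrated with a minimal sketch of a filter that linearizes HTML for speech or braille output and announces every anchor literally, as the existing tools are reported to do. The class and function names (`NaiveLinearizer`, `linearize`) and the sample URL are assumptions made for illustration; this is not the code of any of the tools surveyed.

```python
from html.parser import HTMLParser


class NaiveLinearizer(HTMLParser):
    """Hypothetical filter: turns HTML into a flat word stream and
    announces each anchor tag literally ("Anchor <url>")."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "a":
            href = dict(attrs).get("href") or ""
            self.parts.append(f"Anchor {href}")

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def linearize(html):
    """Return the text a speech or braille device would receive."""
    parser = NaiveLinearizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


spoken = linearize('<P>See <A HREF="http://example.org/">the index</A> now.</P>')
```

Here the raw address interrupts the sentence being read, which is one source of the cognitive load mentioned above.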

It is in this context that the multimodal presentation concept has been defined. The main justification for this concept is to support multimedia presentation of both document content and interpretation of structure tags.


3. Towards a Presentation Model

The aim of this section is to design a user interface which affords efficient and effective access to HTML documents. We mention, as does [BAU 94], that material layout (visual presentation, i.e., typographic and semantic structure) carries meaning. Many representation models of electronic information, such as SGML and HTML used by the World-Wide Web, offer this kind of representation power. Moreover, studies in the field of cognitive research demonstrate the importance of visual structure in a document comprehension task [THO 93].

Providing a non-visual interface to access structured electronic documents means, on the one hand, replacing the visual modality by several audio and tactile modalities and, on the other hand, working out solutions to materialize a document layout [VIG 94]. The goal is to provide blind "readers" with maximum information (both text and material layout) at minimum cognitive effort on their part. The design of a multimodal presentation relies on effective cooperation between several output modalities. Here cooperation is based on the results of Bernsen's study of output modalities [BER 94] for the representation of information in the acoustic and tactile modes.

Another problem for our presentation model stems from the hardware features of TTS synthesis. Two main difficulties occur:

The SMART model of presentation aims to reduce these weaknesses, if not to eliminate them.


4. The SMART User Interface: Towards a Multimodal Presentation Model

To enable blind people to use electronic information, effective methods of interacting and of viewing need to be developed. The SMART user interface includes these facilities both to navigate and explore an HTML document by use of the multimodal presentation concept.

4.1. Description of the system

The specifications of the SMART user interface [VIG 95] have been defined, on the one hand, to give blind people the semantic and pragmatic content of documents through the replacement of visual modalities by audio and/or tactile ones and, on the other hand, to offer the same "action" functions for the consultation (fast scanning, annotation processes, etc.).

The SMART System is not only multimodal but also multilingual: its TTS synthesis can read documents in French, English or German to blind users.

Figure 1. Architecture of the System

The hardware configuration (see Figure 1) can include, according to the user's preferences:

Four strategies to access and view HTML documents are defined:

4.2. Functions

The main characteristics of the SMART user interface are the multimodal presentation and the feedback notification. The multimodal approach consists in translating the visual layout of documents through several non-visual modalities. To allow reading, marking and annotating processes, three kinds of functions were defined: navigation, notification and annotation.

4.2.1. Navigation Functions

Classical reading functions are available, such as reading the complete document, the text summary, a paragraph, a sentence or a single word. The reader can stop the reading at any time. Thus, these functions allow non-sighted persons to "scan" the document as quickly and easily as a sighted user would.

Moreover, blind people can activate and deactivate the presentation models of the document structure.

4.2.2. Notification Functions

Whenever they want, users can ask the SMART System for their current location within the document as well as the title and author(s) of the document they are reading. They can also insert bookmarks in the document in order to quickly come back to the same location.

These functions are useful for a better memorization of the documents read.

4.2.3. Annotating Functions

Users can annotate documents (textual and vocal notes). They can also read, modify and delete notes.

These functions allow blind people to personalize documents and also make memorization easier.

4.3. Multimodality as a solution to provide material layout information

4.3.1. Document Structure and Multimodal Strategy

The SMART user interface interprets the HTML DTD (Document Type Definition). This structure is mainly used to present information in a visual form. Few systems developed for blind people take into account the document structure except for the hypertext links.
The structure enables sighted users to get an overview of the document, since visually parsing the headers is enough. This same "parsing" concept could perhaps be applied to blind people's reading habits by synthesizing the headers.
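As an illustration of the header "parsing" idea, the following minimal sketch collects the headers of an HTML document into an outline that could then be sent to a speech synthesizer. It is an assumption about how such an overview might be built, not a description of the SMART implementation; the names `HeadingOutline` and `outline_of` are hypothetical.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingOutline(HTMLParser):
    """Hypothetical helper: gathers (level, text) pairs for H1..H6."""

    def __init__(self):
        super().__init__()
        self.outline = []
        self._level = None   # level of the heading being read, if any
        self._buffer = []    # text fragments of the current heading

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._level = int(tag[1])
            self._buffer = []

    def handle_data(self, data):
        if self._level is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self._level is not None:
            # Collapse runs of whitespace inside the heading text.
            text = " ".join("".join(self._buffer).split())
            self.outline.append((self._level, text))
            self._level = None


def outline_of(html):
    """Return the list of headers, in document order."""
    parser = HeadingOutline()
    parser.feed(html)
    parser.close()
    return parser.outline


overview = outline_of("<H1>Introduction</H1><P>Body</P><H2>Access <B>tools</B></H2>")
```

Reading only the collected headers gives a listener an overview comparable to a sighted reader's glance down the page, and the level could be conveyed by a different voice or sound.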

To better distinguish the text from its typographic features, during a reading task by blind people, various multimodal presentation strategies are available on the SMART System. Either:

or or or

These presentation strategies can be chosen by blind users according to their preferences, their cognitive load and the interaction context of the reading. This is an advantage over the existing systems described in Section 2.

Moreover, making use of the structure can be a solution to the handling of multilingual documents. In fact, blind people are often confused when the language in which the links are expressed is different from the language in which the document is written.

Two solutions can be provided. Either an additional tag (e.g. <LANG="FR">) is inserted into the structure to identify the language in which the document is written, or a lexicon is used to identify the language by means of morpho-syntactic information.

A multilingual TTS synthesis device is then necessary to switch quickly to the language of the newly linked document. The user interface will switch whenever the change in language is detected.
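The two solutions can be sketched together as follows: a declared language, when present, is trusted, and otherwise a small lexicon is consulted. This is a deliberately simplified assumption: it counts common function words rather than using real morpho-syntactic information, and the word lists and the name `guess_language` are hypothetical.

```python
import re

# Tiny illustrative lexicons of frequent function words (assumed, not exhaustive).
LEXICON = {
    "fr": {"le", "la", "les", "des", "est", "et", "une", "dans"},
    "en": {"the", "of", "and", "is", "in", "to", "a", "that"},
    "de": {"der", "die", "das", "und", "ist", "nicht", "ein", "mit"},
}


def guess_language(text, declared=None):
    """Return a language code for `text`, or None if nothing matches.

    `declared` stands for a language given explicitly in the document
    structure; when present it takes priority over the lexicon.
    """
    if declared:
        return declared.lower()
    # Split into alphabetic words, keeping accented letters.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {
        lang: sum(word in vocab for word in words)
        for lang, vocab in LEXICON.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


language = guess_language("Le document est dans la bibliothèque")
```

A user interface could call such a function on each newly linked document and switch the synthesizer's language only when the result differs from the current one.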

4.3.2. Notification Functions

As mentioned earlier, the main problem concerning the reading of a document with existing tools is the lack of audio and/or vocal feedback notification.
Often, blind people are lost [CHE 96]: they have problems keeping track of their current location within the document being read by the synthesis devices.

In the SMART user interface, blind people are notified with a sound whenever an event occurs, such as the end of the text, the beginning of the text, the possible activation of a hypertext anchor, etc.
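A minimal sketch of such sound notification is given below. The event names, the sound file names and the `Notifier` and `read_aloud` helpers are all assumptions introduced for illustration; the actual cues used by SMART are not specified here.

```python
from enum import Enum


class ReadingEvent(Enum):
    START_OF_TEXT = "start_of_text"
    END_OF_TEXT = "end_of_text"
    ANCHOR_AVAILABLE = "anchor_available"


# Hypothetical mapping from events to sound files.
SOUND_CUES = {
    ReadingEvent.START_OF_TEXT: "start.wav",
    ReadingEvent.END_OF_TEXT: "end.wav",
    ReadingEvent.ANCHOR_AVAILABLE: "anchor.wav",
}


class Notifier:
    """Plays the cue associated with an event through a `play` callback."""

    def __init__(self, play):
        self._play = play

    def notify(self, event):
        self._play(SOUND_CUES[event])


def read_aloud(tokens, speak, notifier):
    """Speak (kind, text) tokens, signalling events with sounds.

    `kind` is "text" or "anchor"; anchors are preceded by a cue
    instead of having their address read out.
    """
    notifier.notify(ReadingEvent.START_OF_TEXT)
    for kind, text in tokens:
        if kind == "anchor":
            notifier.notify(ReadingEvent.ANCHOR_AVAILABLE)
        speak(text)
    notifier.notify(ReadingEvent.END_OF_TEXT)


played, spoken = [], []
read_aloud([("text", "See"), ("anchor", "the index")],
           spoken.append, Notifier(played.append))
```

Passing the playback and speech functions as callbacks keeps the sketch independent of any particular audio or TTS device.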


5. Conclusion

Even if the World-Wide Web represents an extraordinary challenge to blind people, significant and effective methods of interacting with the WWW must be worked out. They will have to be based on multimodal presentation and notification functions.

SMART is an example of this new challenge in the area of user interface for disabled people, based on document structure interpretation. The SMART platform is currently being assessed by users with varying needs, abilities and preferences.


6. Acknowledgment

The authors are grateful to Prof. J.F. Malet, CSU Sacramento, for a prompt translation of the original text from French.


7. References

[BAU 94] Bauwens B., Engelen J., Evenepoel F., Tobin C. and Wesley T., "Structuring documents: the key to increasing access to information for the print disabled" in 4th International Conference on Computers for Handicapped People, Vienna, September 1994, Lecture Notes in Computer Science, 860, pp. 214-221, Springer Verlag, Berlin, ISBN 3 540 58476 5.

[BER 94] Bernsen N. O., "A Revised Generation Of the Taxonomy of Output Modalities", The AMODEUS Project, ESPRIT Basic Research Action 7040, TM/WP11.

[CHE 96] Chen Chaomei, Rada Roy, "Interacting With Hypertext: A Meta-Analysis of Experimental Studies", in Human-Computer Interaction, 1996, Volume 11, pp. 125-156.

[DOS 95] DosLynx v. 0.8 Alpha Release Information.

[ICC 96] 5th International Conference, ICCHP'96, Linz, Austria, July 1996, Sessions "Access to Documents".

[JAW 96] JAWS for Windows - Henter-Joyce Inc. - 2100 62nd Ave. N., St-Petersburg, FL 33702 (USA).

[MIC 96] MS-Internet Explorer v. 3.0.

[NET 96] Netscape Navigator v. 3.01.

[PER 95] Perrochon L., Kennel A., "World Wide Web Access for Blind People", IEEE Symposium on Data Highway, Bern, October 1995.

[PWW 97] pwWebSpeak ™ - The Productivity Works Inc. - 7, Belmont Circle, Trenton, New Jersey 08618 (USA).

[THO 93] Thon B., Marque J.C., Maury P., “Le texte, l'image et leurs traitements cognitifs.”, Colloque Interdisciplinaire du CNRS, “Images et Langages”, Multimodalité et Modélisation Cognitive, 1993, Paris, pp. 29-39.

[VIG 94] Vigouroux N., Oriola B., "Multimodal Concept for a New Generation of Screen Reader", 4th International Conference (1994), Computers for Handicapped Persons, Springer-Verlag, Vienna, 1994, pp. 154-161.

[VIG 95] Vigouroux N., Seiler F.P., Oriola B., Truillet Ph., "SMART - System for Multimodal and Multilingual Access, Reading and Retrieval for Electronic Documents.", 2nd TIDE Congress, Paris, 26-28 April 1995.

[WIN 96] Slimware Windows Bridge Inc. - Stoney Creek - Ontario (Canada).




