Developing a Web-based Question Answering System

Zhiping Zheng
School of Information
University of Michigan
zzheng@umich.edu

Abstract

AnswerBus is an open-domain question answering system based on sentence-level Web information retrieval. It accepts users' natural-language questions in English, German, French, Spanish, Italian and Portuguese and provides answers in English. Five search engines and directories are used to retrieve Web pages that are relevant to user questions. From these Web pages, AnswerBus extracts sentences that are determined to contain answers. Its current rate of correct answers to TREC-8's 200 questions is 70.5%, with an average response time of seven seconds per question. In terms of both accuracy and response time, AnswerBus performs better than other similar systems.

Keywords

question answering, open-domain, QA specific dictionary, information retrieval

1. Introduction

Research on automated Question Answering (QA) as a mechanism of information retrieval has been undertaken since the 1960s. At the initial stage, the systems developed were often confined to specific domains ([6], [3]). Recently, researchers have been attracted to the task of developing open-domain QA systems based on collections of real-world documents, especially the World Wide Web. Examples of such systems include LCC ([2]), QuASM, IONAUT ([1]), Mulder and Webclopedia ([4]). QA technology is rapidly approaching practical application.

As an open-domain question answering system based on the Web, AnswerBus endeavors to enhance existing techniques and to adopt new ones in order to improve both the accuracy and the speed of question answering. After receiving a user's question in natural language, it uses five search engines and directories (Google, Yahoo, WiseNut, AltaVista, and Yahoo News) to retrieve Web pages that potentially contain answers. From these Web pages, AnswerBus extracts the sentences that are determined to contain answers.

Figure 1 describes the working process of AnswerBus. A simple language recognition module determines whether the question is in English or in one of the other five supported languages. If the question is not in English, the module sends the original question and its language to AltaVista's translation tool BabelFish and obtains an English translation of the question.
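The paper does not spell out how the language recognition module works; the sketch below assumes a simple cue-word counting scheme, which is one plausible way to distinguish the six supported languages. The cue-word lists and the function name guess_language are illustrative assumptions, not taken from AnswerBus.

```python
# Minimal sketch of a cue-word-based language guesser (assumed approach, for
# illustration only; the actual AnswerBus module is not described in this paper).

CUE_WORDS = {
    "en": {"the", "is", "what", "who", "where", "when", "how"},
    "de": {"der", "die", "das", "ist", "wie", "wer", "wo", "wann"},
    "fr": {"le", "la", "est", "quel", "quelle", "qui", "comment"},
    "es": {"el", "es", "que", "quien", "donde", "cuando", "cual"},
    "it": {"il", "che", "chi", "dove", "quando", "come", "cosa"},
    "pt": {"o", "que", "quem", "onde", "quando", "como", "qual"},
}

def guess_language(question: str) -> str:
    """Return the language whose cue-word list overlaps the question the most."""
    tokens = set(question.lower().replace("?", " ").split())
    scores = {lang: len(tokens & cues) for lang, cues in CUE_WORDS.items()}
    return max(scores, key=scores.get)

# e.g. guess_language("Wer ist der Praesident von Frankreich?") -> "de"
```

A question identified as non-English would then be passed to the translation step before query formulation.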


Figure 1 Working process of AnswerBus

The rest of the process consists of four main steps: 1) select two or three of the five search engines and form engine-specific queries from the question; 2) contact the selected search engines and retrieve the documents referenced at the top of their hit lists; 3) extract sentences that potentially contain answers from the documents; 4) rank the answers and return the top choices with contextual URL links.
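The sketch below gives a schematic view of these four steps. The helper functions and their trivial bodies are illustrative placeholders only, not the actual AnswerBus implementation; in particular, fetch_top_pages stands in for whatever HTTP layer contacts the engines.

```python
# Schematic sketch of the four-step pipeline (placeholder implementations).

ENGINES = ["Google", "Yahoo", "WiseNut", "AltaVista", "Yahoo News"]

def select_engines(question):
    """Step 1a: pick two or three of the five engines for this question."""
    return ENGINES[:3]                          # placeholder policy

def build_query(question, engine):
    """Step 1b: form an engine-specific query from the question."""
    return question                             # placeholder: no reformulation

def fetch_top_pages(engine, query):
    """Step 2: download the pages referenced at the top of the hit list."""
    return []                                   # placeholder: no network access here

def candidate_sentences(page, question):
    """Step 3: split the page into sentences and keep likely answers."""
    return [s.strip() for s in page.split(".") if s.strip()]

def score(sentence, question):
    """Step 4: rank candidates (see Section 4 for the matching rule)."""
    return sum(w in sentence.lower() for w in question.lower().split())

def answer(question, top_n=5):
    candidates = []
    for engine in select_engines(question):
        query = build_query(question, engine)
        for page in fetch_top_pages(engine, query):
            candidates.extend(candidate_sentences(page, question))
    candidates.sort(key=lambda s: score(s, question), reverse=True)
    return candidates[:top_n]
```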

2. Answer Forms

AnswerBus returns sentences as answers to user questions. As listed in Table 1, different QA systems return answers in different forms.

Table 1 Answer Forms Adopted by QA Systems
QA System      Output
AnswerBus      Sentences
AskJeeves      Documents
IONAUT         Passages
LCC            Sentences
Mulder         Extracted answers
QuASM          Document blocks
START          Mixture
Webclopedia    Sentences

START ([5]) returns a mixture of sentences, links and, sometimes, images. This form may suit different user needs; nonetheless, it requires the support of a special knowledge base. Building and maintaining a large knowledge base requires tremendous effort, and the knowledge base also appears to limit START's ability to answer questions outside its domain.

Mulder tries to extract exact answers and reached a 34% correct rate on top-1 answers for the TREC-8 questions. However, 34% may still be far from real users' expectations. Extracting exact answers can lead to a loss of precision and leaves out contextual information that is important for users to judge whether an answer is correct.

IONAUT returns a list of passages. This provides rich contextual information but demands considerable user effort to dig the answer out of a whole passage.

QuASM returns large blocks of text whose boundaries are not clearly defined. AskJeeves returns a list of documents, as general search engines do. As with passages, users of these two systems must extract the answers themselves.

Based on these observations, AnswerBus chooses to return sentences, with the goal of minimizing the user effort needed to extract answers while providing enough contextual information for users to make a quick judgment on the validity of the answers. Several other QA systems, including Webclopedia and LCC, use the same output form as AnswerBus. A more detailed description of the sentence segmentation and extraction modules used in AnswerBus can be found in [7].

3. Web Resources

Most QA systems listed in Table 1 try to take advantage of the wealth of the Web. The most effective way to use Web resources for question answering may be to index the whole Web specifically for QA tasks. At the current stage, no one seems to have accomplished this.

START uses several selected Web sites as part of its knowledge base. Webclopedia uses a locally stored TREC corpus. Other QA systems use one or several search engines to retrieve related documents. AnswerBus currently uses Google, Yahoo, Yahoo News, AltaVista and WiseNut. Using multiple search engines covers more knowledge domains and, at the same time, makes the system more tolerant of errors from any single engine. Among the five, Google, AltaVista and WiseNut are general-purpose search engines used for answering general questions. Yahoo News indexes current news from around the world and is intended for time-sensitive questions. Yahoo is a human-edited Web directory, which assures a higher quality of selected Web sites; it is also used to answer general questions.
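How AnswerBus decides which two or three engines to use for a particular question is not detailed in this paper. The sketch below assumes a simple rule that routes apparently time-sensitive questions to Yahoo News, purely as an illustration; the cue list and the selection policy are assumptions.

```python
# Assumed engine-selection policy, for illustration only.

TIME_CUES = {"today", "yesterday", "latest", "recent", "currently", "this week"}

def select_engines(question: str) -> list[str]:
    """Pick two or three of the five engines for a given question (assumed policy)."""
    q = question.lower()
    if any(cue in q for cue in TIME_CUES):
        # Apparently time-sensitive questions lean on the news index.
        return ["Yahoo News", "Google", "AltaVista"]
    # General questions go to the general-purpose engines plus the Yahoo directory.
    return ["Google", "WiseNut", "Yahoo"]
```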

Aiming to cover more specific domains, AnswerBus is considering adding MedlinePlus as an additional search engine. In addition, since Yahoo News does not index every news story, AnswerBus is considering modifying International News Connection ([8]) and adding it as another Web resource.

4. Query-Sentence Matching Formula

In any language, a sentence is not a simple combination of a set of words; it is a string of words with logical and lexical structure. Thus, in real life, a person often does not need to hear every word in a sentence in order to effectively capture its meaning. AnswerBus uses the following experimental matching rule to determine whether a sentence is potentially an answer to the question:

It is not necessary for a sentence to match every word in the question in order to be an answer candidate.

More specifically, AnswerBus uses a formula to select sentences that may be answers, in which Q is the number of words in the query and q is the number of matching words in a retrieved sentence.

A more detailed explanation of this formula can be found in [7].
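As an illustration of the rule above, the sketch below treats a sentence as a candidate when it matches at least half of the question's content words, rounded up. That threshold is an assumption made here for illustration only; the actual formula used by AnswerBus is the one described in [7].

```python
# Illustration of the query-sentence matching rule with an assumed threshold.

import math
import re

STOPWORDS = {"the", "a", "an", "of", "is", "are", "was", "were", "in", "on",
             "what", "who", "where", "when", "why", "how", "did", "do", "does"}

def content_words(text: str) -> set[str]:
    """Lowercase tokens with common function words removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}

def is_candidate(sentence: str, question: str) -> bool:
    """True if the sentence matches enough question words to be an answer candidate."""
    query = content_words(question)               # Q = number of query words
    if not query:
        return False
    matched = query & content_words(sentence)     # q = number of matching words
    return len(matched) >= math.ceil(len(query) / 2)  # assumed threshold, not the paper's formula
```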

5. Evaluation

TREC-8's 200 questions were used to evaluate AnswerBus's question answering performance and to compare it with that of four other similar systems. Table 2 presents the performance of AnswerBus and the four other systems. It lists the number of correct answers among each system's top five and top one answers, the standard NIST scores, the maximum, minimum and average response times in seconds, the standard deviation of the response times, and the average length of the returned answers.

Table 2 shows that AnswerBus outperforms the other systems in terms of both accuracy and response time. AnswerBus also returns more concise answers than the other systems.

Table 2 Performance of Online Question Answering Systems

System      Correct Top 5   Correct Top 1   NIST Score   Tmax (s)   Tmin (s)   Tmean (s)   Tstd (s)   Lmean (bytes)
AnswerBus   141             120             64.18%       15.06      3.79       7.20        3.07       141
IONAUT      -               -               -            44.88      2.78       12.51       6.81       1312
LCC         97              75              41.73%       342.52     4.30       44.24       32.63      178
QuASM       13              7               4.45%        284.29     2.61       20.72       33.92      1766
START       29              29              14.50%       62.07      2.02       9.84        7.45       -

No comparable data were obtained for the two other well-known QA systems, Mulder and Webclopedia, because they were offline at the time this evaluation was conducted.

In conclusion, compared with other Web-based QA systems currently accessible on the Web, AnswerBus demonstrates higher question answering accuracy and faster response. The enhanced performance can be attributed mainly to three features of the system: its sentence-level answer extraction, its use of multiple Web search engines and directories, and its query-sentence matching formula.

6. References

  1. Steven Abney, Michael Collins, and Amit Singhal. Answer Extraction. Proceedings of ANLP 2000. Seattle, WA. April 29 - May 3, 2000.
  2. Sanda Harabagiu, Dan Moldovan, Marius Pasca, Mihai Surdeanu, Rada Mihalcea, Roxana Girju, Vasile Rus, Finley Lacatusu, Paul Morarescu and Razvan Bunescu. Answering Complex, List and Context Questions with LCC's Question-Answering Server. Tenth Text REtrieval Conference (TREC-10). Gaithersburg, MD. November 13-16, 2001.
  3. Lynette Hirschman and Robert Gaizauskas. Natural Language Question Answering: The View from Here. Natural Language Engineering, 2001.
  4. Eduard Hovy, Laurie Gerber, Ulf Hermjakob, Michael Junk and Chin-Yew Lin. Question Answering in Webclopedia. Ninth Text REtrieval Conference (TREC-9). Gaithersburg, MD. November 13-16, 2000.
  5. Boris Katz. From Sentence Processing to Information Access on the World Wide Web. AAAI Spring Symposium on Natural Language Processing for the World Wide Web. Stanford, CA. 1997.
  6. Cody C. T. Kwok, Oren Etzioni and Daniel S. Weld. Scaling Question Answering to the Web. Tenth World Wide Web Conference. Hong Kong, China. May 1-5, 2001.
  7. Zhiping Zheng. AnswerBus Question Answering System. Proceedings of the Human Language Technology Conference (HLT 2002). San Diego, CA. March 24-27, 2002.
  8. Zhiping Zheng. International News Connection: A Real-time Online News Filtering and Classification System. Proceedings of Workshop on Mathematical/Formal Methods in Information Retrieval (ACM/SIGIR MF/IR 2001). New Orleans, LA. September 13, 2001.