Facing Uncertainty in Link Recommender Systems

Jean-Yves Delort
Laboratoire d'Informatique de Paris 6, POLE IA
Université Pierre et Marie Curie
8, rue du Capitaine Scott
75015 Paris, France
Jean-Yves.Delort@lip6.fr

Bernadette Bouchon-Meunier
Laboratoire d'Informatique de Paris 6, POLE IA
Université Pierre et Marie Curie
8, rue du Capitaine Scott
75015 Paris, France
Bernadette.Bouchon-Meunier@lip6.fr

ABSTRACT

Most of prefetching and web recommender systems require a learning phase on a users behavior database. In most of the situations data are the outcome of a preprocessing task of HTTP log files which contain information intrinsically uncertain. This paper deals with modelization of this uncertainty in a link recommendation perspective. A new algorithm based on evidence theory is presented. In addition, new general characterizations of recommender systems are introduced.

Keywords

Link recommender systems, profiling, evidence theory

1. INTRODUCTION

Today, as web designers are more and more willing to integrate innovating technologies of services based on user profiles, researches on recommender systems (RS) have quickly increased. Online recommenders can be distinguished in two categories. Object recommender systems (ORS) aim at providing suggestions on a set of resources such as products or news, and are intended to motivate or encourage users to follow them. ORS algorithms come from collaborative filtering which take as an input a user preference database of items and try to discover associations. Then a set (or a list) of related items matching current interests of the user is proposed. Preferences are often, but not necessarily, explicit votes. Indeed, when behavior is not ambiguous (i.e. clearly the user likes what he sees) it will be interpreted as a user vote in favor of the object. For instance, if a user buys a product, it will be interpreted as an implicit vote in favor of it. In this study we focus on the second kind of RS, called Link recommender systems (LRS). Their purpose is to facilitate the navigation of the users on a web site and help them not to get lost when they are browsing through very large librairies of hypertext documents. In general, building user profiles and extracting preferences require to preprocess the transactions recorded into HTTP log files. This leads, because of structural drawbacks of this protocol, to an uncertain representation of user behaviors on a web site during a given time. The sequence of transactions associated with a user behavior during a given time is called a user session.
In this paper we put forward an original link recommendation algorithm that handles uncertainty in user sessions. It is based on Dempster-Shafer theory of evidence [1]. Also we show that, with classical performance criteria used to assess LRS, it is possible to turn prefetching algorithms into efficient link recommendation algorithms. However we underline that additional constraints should be introduced to describe the expected behahior of a link recommender system. Thus we propose new measures modelizing these constraints. The proposed algorithm has good preliminary outcomes with both classical and new criteria.

2. PREDICTION AND RECOMMENDATION

In order to yield predictions, prefetching algorithms find sequences of pages that are likely to be requested immediatly after the last page of the active session. Examples of such algorithms are variations around the N-GRAM model [3]. From the information of an active session, a recommender system suggests a set or a list of pages that can interest the user in some ways. For link recommender systems, it is usually supposed that, when a user accesses a resource, he implicitly votes in favor of it [5]. These systems, as well as ORS, are often assessed and compared thanks to classical measures of precision, recall and coverage [4]. Such measures imply that RS efficiency rely only on their ability to anticipate next user choices. However, in a link recommendation perspective the purpose is different. Pages can be suggested provided that they are useful for a user at a given time and they make easier his browsing. It doesn't necessarily coincide with his access to the page. Thus these measures are not suitable for LRS. Unlike prefetching systems, LRS are not required to anticipate pages immediatly after the last one accessed by a user. In fact, suppose that some pages are recommended and that the user first decides to skip them. If eventually he accesses these pages, then the system will be considered reliable because it will have proven its ability to anticipate his interests.

	Order	Consecutivity
Prediction	Yes	Yes
Recommendation	No	No

Summary of the differences between recommendation and prediction processes

The previous table summarizes the differences between recommendation and prediction processes when the main criterion of efficiency is the ability of LRS to anticipate next resources accessed by the users.

3. Suggestions by Cumulative Evidence ALGORITHM

Evidence theory introduced by Demspter and Shafer [1] is a powerful tool to handle problems with uncertain and imprecise data. We propose a new link recommender system the SCE (Suggestions by Cumulative Evidence) derived from this theory. The proposed model is intuitively based on the idea that all previously seen pages and their combinations must play a role in the decision process. Thus, to aggregate all evidence suggesting that a resource is connected to others will increase the global knowledge on each resource. For each admissible resource, i.e. that has a chance (statistically determined) to interest the user, a degree of belief that this page can interest the user according to his history, is computed. Then pages are ranked according to their degree of belief and a classical technique (support pruned criterion) is used to get rid of the pages that have low support in the learning database (see [3] for more details).

4. EVALUATION METRICS

As mentioned in section 2, link recommender systems suffer from a deep lack of suitable criteria to compare them and assess their performances. Of course such criteria depend on what is exactly expected from the functionalities of the link recommender. We present three new measures that should help numerically assessing general characteristics of link recommendation algorithms. First, we postulate that suggested pages should not vary too much between two consecutive recommendations after a few clicks of the user. In fact, if two consecutive recommended page sets are too different then the user may consider the system as unstable and will not pay any more attention to it after a while. If the system had understood what the user was looking for, it should have targetted a set of pages. Two other criteria are introduced to compare the whole set of recommended pages with implicit preferences deduced from users behavior. The first criterion evaluates the rank of popularity of the accessed pages while the other one compares the distributions of proposed and accessed pages on a web site.

5. CONCLUSION

We investigated the modelling of uncertainty of input data in a link recommender systems. This uncertainty is the main drawback that prevents the deployment of LRS. The link recommendation algorithm we propose is flexible because it is not based on strong hypotheses and it integrates all available information in the decision process. Three new characterizations of LRS efficiency have been introduced and their general scope makes possible their use for assessing LRS as well as ORS. We are currently testing our algorithm and studying its behavior under several situations. Preliminary results are encouraging showing excellent robustness and acceptable precision rate. Beside, as we are dealing with online algorithms, we are currently investigating different ways of optimizing the average computation time of our method.

6. REFERENCES

Shafer, G. A Mathematical Theory of Evidence. Princeton University Press, 1974
Mobasher, B., Cooley, R., and Srivastava, J. Automatic Personalization Based on Web Usage Mining. Communications of the ACM, 43(8):142-151, 2000
Deshpande, M., and Karypis, G. Selective Markov Models for Predicting Web-Page Accesses. Proceedings SIAM Int. Conference on Data Mining (SDM'2001), Apr 2001
Salton, G., and McGill, M. J. Introduction to Modern Information Retrieval. McGraw-Hill, 1983
Breese, J. S., Heckerman, D. and Kadie C. Empirical Analysis of Predictive Algorithms for Collaborative Filtering Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, July, 1998