Ioannis Antonellis
Computer Science Dept.
Stanford University
CA, 94305, USA
H.4.3Information Search and RetrievalRetrieval models
Algorithms, Experimentation, Theory
sponsored search, link analysis, similarity metric, click graph
In sponsored search, paid advertisements (ads) relevant to a user's query are shown above or along-side traditional web search results. The placement of these ads is in general related to a ranking score which is a function of the semantic relevance to the query and the advertiser's bid.
Ideally, a sponsored search system has access to a database of available ads and a set of bids. Conceptually, each bid consists of a query 4#4, an ad 5#5, and a price 6#6. With such a bid, the bidder offers to pay if the ad 5#5 is both displayed and clicked when a user issues query 4#4. For many queries, there are not enough direct bids, so the sponsored search system attempts to find other ads that may be of interest to the user who submitted the query. Even though there is no direct bid, if the user clicks on one of these ads, the search engine will make some money (and the advertiser will receive a customer). The challenge is then to find ads related to incoming queries that may yield user click throughs.
For a variety of practical and historical reasons, the sponsored search system is often split into two components. A front-end takes an input query 4#4 and produces a list of re-writes, i.e., of other queries that are ``similar'' to 4#4. The query and its rewrites are then considered by the back-end, which displays ads that have bids for the query or its rewrites. The split approach reduces the complexity of the back-end, which has to deal with rapidly changing bids. The work of finding relevant ads, indirectly through related queries, is off-loaded to the front-end.At the front-end, queries can be rewritten using a variety of techniques developed for document search. However, these techniques often do not generate enough useful rewrites. Part of the problem is that in our case ``documents'' (the ads) have little text, and queries are very short, so there is less information to work with, as compared with larger documents. Another problem is that there are relatively few queries in the bid database, so even if we found all the textually related ones, we may not have enough. Thus, it is important to generate additional rewrites, using other techniques.
Our focus is on query rewrites based on the recent history of ads displayed and clicked on. The back-end generates a historical click graph that records the clicks that were generated by ads when a user inputs a given query. The click graph is a weighted bi-partite graph, with queries on one side and ads on the other. The schemes we present analyze the connections in the click graph to identify rewrites that may be useful. Our techniques identify not only queries that are directly connected by an ad but also queries that are more indirectly related. Our techniques are based on the notion of SimRank [2], which can compute query similarity based on the connections in a bi-partite click-graph. However, in our case we need to extend SimRank to take into account the specifics of our sponsored search application.
Figure 1 illustrates two sample click graphs. The left nodes are queries issued by users and the right nodes correspond to ads. An edge between a query (left node) and an ad (right node) indicates that at least someone clicked on the ad after issuing that query. If we look at the similarity scores that Simrank computes for the two query pairs car - automobile and car - motorcycle, we can see that sim(car, automobile) is always less than sim(car, motorcycle) no matter how many iterations of the Simrank computation we perform. Table 1 tabulates these scores for the first 7 iterations. In fact, we can prove that sim(car, automobile) becomes eventually equal to sim(car, motorcycle) as we include more iterations.
In this example, the SimRank similarity of car to motorcycle is less than that of car to automobile (if we run limited iterations) or at best they are equal (if we pay the high price of running to convergence). Either result is counterintuitive since there are more ads that connect car and automobile. To give car and automobile higher similarity, we introduce the notion of ``evidence of similarity.''
Iteration | sim(``car'', | sim(``car'', |
``automobile'') | ``motorcycle'') | |
1 | 0.4 | 0.8 |
2 | 0.56 | 0.8 |
3 | 0.624 | 0.8 |
4 | 0.6496 | 0.8 |
5 | 0.65984 | 0.8 |
6 | 0.663936 | 0.8 |
7 | 0.6655744 | 0.8 |
We also modify the underlying random walk model of Simrank so that it exploits the edge weights of the click graph. Again we use the evidence scores, but now we perform a different random walk that utilizes the edge weights in its transition probabilities. More details can be found on the extended version of the paper [1].
The first is a manual evaluation, carried out by professional members of Yahoo!'s editorial evaluation team. Each query - rewrite pair is considered by an evaluator, and is given a score on a scale from 1 to 4, based on their relevance judgment. The query rewrites that were more relevant with the original query assigned a score of 1, and the least related assigned a score of 4. The judgment scores are solely based on the evaluator's knowledge, and not on the contents of the click graph. The evaluation metrics we used were precision/recall (based on the editorial scores), the query coverage (number of queries for which each method manages to provide at least one rewrite) as well as the query rewriting depth (number of query rewrites that a method provides for a given query). In summary, evidence-based simple Simrank outperforms simple Simrank and Pearson both in query coverage, rewriting depth and precision/recall. Weighted Simrank maintains the query coverage and rewriting depth of evidence-based Simrank but substantially boosts the precision of the rewrites.
Our second evaluation method addresses the question of whether our methods made the ``right'' decision based on the evidence found in the click graph. The basic idea is to remove certain edges from the click graph and to see if using the remaining data our schemes can still make useful inferences related to the missing data (desirability prediction). Weighted Simrank outperforms all other alternatives here.
More details on the evaluation method and the results can be found on the extended version of the paper [1].
We have discussed the problem of query rewriting for sponsored search. We propose Simrank to exploit the click graph structure and we introduce two extensions: one that takes into account the weights of the edges in the click graph, and another that takes into account the ``evidence'' supporting the similarity between queries. Our experimental results show that weighted-based Simrank is the overall best method for generating rewrites based on a click graph.
Even though our new schemes were developed and tested for query rewriting based on a click graph, we suspect that the weighted and evidence-based Simrank methods could be of use in other applications that exploit bi-partite graphs. We plan to experiment with these schemes in other domains, including collaborative filtering.