1st day (wednesday, may 11)

opening ceremony - keynote by tim berners lee - semantic similarity between search engine queries - duplicate detection in click streams - Thresher - keynote by yuji inoue - current trends in the integration of search and browsing - poster reception

opening ceremony:

tatsuya hagino of keio university, co-chair of WWW2005, welcomed the 900 attendees from all over the world. the conference was then officially opened by koichi iida of chiba prefecture.

[ opening ceremony at the conference hall ] [ tatsuya hagino ]


keynote by tim berners lee:

this year, tim berners lee (tbl) mentioned the semantic Web only briefly. the first part of his keynote was about the main problem real people have with the Web. tbl said there is a strong analogy between the Web and literature: according to ted nelson, literature has two gods, the writer and the reader, and there is a constant battle between them over the information space. the Web offers a number of good examples of this battle, such as cascading style sheets (CSS), where the author tries to lay out the document exactly the way she or he wants, and the reader tries to override these settings with his or her own style sheet or browser preferences. there are also some unwanted parties, such as banners and pop-ups. tbl said the browser is called a "user agent", so it should act on behalf of the user. as a user, he doesn't want pop-ups, so a browser that opens pop-ups acts not on his behalf but on behalf of the author, abusing the user's pixels on the screen.
tbl said we must teach users not to trust anyone on the Web. according to surveys, the users' top problems with the Web are spam, phishing, pop-ups, viruses and computer system destruction.
in the second part of his talk, tbl announced the "W3C mobile Web initiative", which will provide best practices for mobile Web content.

the slides of his talk are available on the Web.

[ tim berners lee delivers his keynote ] [ tim berners lee ]


semantic similarity between search engine queries using temporal correlation:

this session was about finding semantically related search engine queries based on their temporal correlation; for example, certain terms are looked up on particular holidays, and the names of people who appear in the news are looked up around the same time. the authors developed methods to reduce the time and resources needed for the statistical analysis and ran experiments with real data from the MSN search engine to validate their methods.
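
to make the idea a bit more concrete, here is a minimal sketch of how temporal correlation could be measured. it is not the method presented in the session: i simply assume that each query has a series of daily counts and compare two series with the plain pearson correlation coefficient. the function names and the numbers in `daily_counts` are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """pearson correlation coefficient of two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat series carries no temporal signal
    return cov / sqrt(var_x * var_y)


# hypothetical daily query counts over one week (invented numbers)
daily_counts = {
    "halloween": [1, 2, 3, 20, 2, 1, 1],
    "pumpkin carving": [0, 1, 2, 15, 1, 0, 0],
    "tax forms": [5, 5, 6, 5, 6, 5, 5],
}


def most_related(query, counts):
    """return the other query whose time series correlates best with `query`."""
    base = counts[query]
    scores = {
        other: pearson(base, series)
        for other, series in counts.items()
        if other != query
    }
    return max(scores, key=scores.get)


print(most_related("halloween", daily_counts))
```

with these invented numbers, the two queries that peak on the same day come out as the most related pair, even though they share no words.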

duplicate detection in click streams:

commercial online advertising may follow various business models. one way to monetize advertising is to count the number of clicks on a particular ad. in a standard setting, an advertising commissioner acts as a middle person between the publisher and the advertiser. the latter pays the former based on the number of clicks on its ad. this mechanism may tempt a publisher to click on the advertiser's ad excessively itself in order to make more money. the advertising commissioner is therefore interested in ways to detect duplicate clicks in click streams in order to prevent fraud. the session explained various methods to detect such duplicates based on advanced filter techniques and statistical analysis.
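
as an illustration of what a filter-based approach can look like, here is a small sketch using a bloom filter. this is my own assumption about the kind of technique meant by "filter", not the algorithm from the talk; the class, the `flag_duplicates` helper and the click fields (`visitor`, `ad`) are all hypothetical.

```python
import hashlib


class BloomFilter:
    """fixed-size bit array with several hash functions (illustrative only)."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        # derive several bit positions by salting one hash with a seed
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def flag_duplicates(clicks, bloom):
    """return one boolean per click: True if the click was probably seen before."""
    flags = []
    for click in clicks:
        key = f"{click['visitor']}|{click['ad']}"  # hypothetical click identity
        flags.append(key in bloom)
        bloom.add(key)
    return flags


clicks = [
    {"visitor": "a", "ad": "x"},
    {"visitor": "b", "ad": "x"},
    {"visitor": "a", "ad": "x"},  # same visitor clicks the same ad again
]
flags = flag_duplicates(clicks, BloomFilter())
print(flags)
```

a bloom filter never misses a repeated click, but it can occasionally flag a new click as a duplicate; the bit array size and the number of hash functions control how often that happens.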

Thresher: automating the unwrapping of semantic content from the World Wide Web:

Thresher is an extension to the existing user interface to the Web (browser) that allows even non-technical users to extract semantically related information from websites by simply highlighting one set of data and adding descriptions. Thresher then analyses the content of the website, finds similar data sets and applies the same semantic rules.

keynote by yuji inoue (NTT):

the afternoon keynote, titled "innovation for a human-centered network", was given by yuji inoue, senior vice president at NTT. he started with some impressive numbers about telecommunication in japan: of the country's 120 million inhabitants, 107 million have Internet access; there are 91 million mobile phones and 59 million fixed phones; and 19 million have broadband Internet access (optical + ADSL > 10 MBit). there are already more new broadband subscriptions than new ADSL subscriptions.

NTT wants to support japan's government in moving from e-Japan to u-Japan (u = ubiquitous) by providing high-speed or super-high-speed services to everyone.

the five most important aspects are:

  1. conversion to broadband (IP-based and optical)
  2. convergence of communication and broadcasting
  3. fixed-mobile convergence
  4. safety and security
  5. reduction of environmental loads (solar energy)

another goal is to replace 50% of the conventional phone connections with IP-based optical fiber connections by 2010.
NTT will provide triple-play services - upstream + downstream + TV (1 GBit + 100 TV stations) - based on do-it-yourself installable optical cable.

NTT supports the deployment of new media such as high definition television (HDTV) and digital cinema by developing new technologies, such as data reduction and compression devices, to transfer the large amounts of data these media require.

current trends in the integration of search and browsing:

this panel was about the three schools that have emerged over the last 10 years from the basic browsing and searching discovery paradigms on the Web:
  1. the search-centric school: they argue that free text search is so effective and widely accepted that users can basically satisfy all their needs via simple queries.
  2. the taxonomy navigation school: they claim that users have difficulties expressing informational needs, so browsing is more effective when users don't quite know what they are looking for.
  3. the metadata-centric school: they say that the use of metadata for narrowing large sets of results, known as "multi-faceted search", is the best solution.
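
to illustrate what the third school means by narrowing results with metadata, here is a tiny sketch. it is only my own toy example, not something shown by the panel; the documents, the facet names (`topic`, `format`, `year`) and the two helper functions are invented.

```python
from collections import Counter

# hypothetical result set; every document carries a few metadata facets
documents = [
    {"title": "query logs over time", "topic": "search", "format": "paper", "year": 2005},
    {"title": "click fraud basics", "topic": "advertising", "format": "paper", "year": 2005},
    {"title": "wrapper induction demo", "topic": "semantic web", "format": "poster", "year": 2005},
    {"title": "faceted browsing", "topic": "search", "format": "poster", "year": 2004},
]


def facet_counts(results, facet):
    """how many results fall under each value of one facet."""
    return Counter(doc[facet] for doc in results if facet in doc)


def narrow(results, **selected):
    """keep only the results that match every selected facet value."""
    return [
        doc
        for doc in results
        if all(doc.get(facet) == value for facet, value in selected.items())
    ]


print(facet_counts(documents, "topic"))
print([doc["title"] for doc in narrow(documents, topic="search", format="poster")])
```

the counts tell the user which facet values are worth clicking, and each click narrows the result set further without the user having to phrase a better query.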

even though the five panelists wore different colored hats, there was not much disagreement between them. they advocated their positions, but at the same time i sensed some agreement that the world is not black and white, which means both paradigms, searching and browsing, have their pros and cons. in my opinion, susan dumais, senior researcher at Microsoft Research, summed it up when she said that there is an enormous redundancy of information on the Web, and in many cases it doesn't really matter which document you find, as long as you find one that provides the information you need.

poster reception:

the poster reception was held in hall 8 - the same place where we got our food. the posters are available from our Web server.
i'd especially like to point out the poster by erik wilde, ETH zürich.

[ poster reception in hall 8 ] [ poster booth ] [ erik wilde ]


