Metadata Mediation: Representation and Protocol

Tsuyoshi Sakata, Hiroyuki Tada, Tomohisa Ohtake
Digital Vision Laboratories
7-3-37, Akasaka, Minato, Tokyo, Japan
sakata@dvl.co.jp, tadaa@dvl.co.jp, otaket@dvl.co.jp

Abstract

We are developing an electronic commerce mediator (ECM), a system that enables a consumer to retrieve data about merchandise sold on the WWW by designating the features of the merchandise. In this paper we present (1) the Multi-Schema Metadata Format (MMF), a logical structure for metadata sets, and (2) the Metadata Mediation Protocol (MMP), a protocol set for interchanging metadata, both designed for the ECM.

1. Introduction

When a consumer looks for merchandise on the WWW, he has in mind the characteristics of, or constraints on, what he wants, such as "the color should be red", "the size should be XL", or "the price should be less than 50 dollars". This suggests that a retrieval service based on such characteristics is desirable, and many electronic commerce groups are in fact taking this approach. For example, the Electronic Commerce Promotion Council (ECOM)[1] of Japan has a working group establishing a set of merchandise characteristics for electronic commerce. However, there is not, and will not be, a single standard metadata schema covering all sorts of merchandise, although several schemata will be designed and used by groups of industries and manufacturers. Moreover, shops may add their own attributes to an original schema in order to appeal to their customers. A retrieval system therefore has to transform retrieval formulae between schemata. To achieve such inter-transformability of metadata, we have designed a logical form and syntax for metadata, the Multi-Schema Metadata Format (MMF). We have also designed a metadata-exchange protocol for agents, the Metadata Mediation Protocol (MMP), which enables distributed agents to store metadata, to resolve retrieval formulae cooperatively, and to maintain the consistency of metadata among agents.

We plan to start our commerce mediation service based on MMF and MMP by April 1997. We will also provide a metadata input system that simplifies the task of adding MMF metadata to merchandise pages.

In this paper, the Multi-Schema Metadata Format is described in Chapter 2, the Metadata Mediation Protocol in Chapter 3, and the structure of the Commerce Mediator that we are now implementing in Chapter 4.

2. Multi-Schema Metadata Format

In this chapter, we describe the structure and syntax of the Multi-Schema Metadata Format (MMF). In online shopping, the metadata schemata used for merchandise will be diverse, so metadata and retrieval formulae written in one schema must be transformable into other schemata. To achieve this inter-transformability, we add to the metadata instance an ontology of schema attributes and a machine-readable definition of the schema. We propose the resulting metadata structure as the "Multi-Schema Metadata Format (MMF)". MMF consists of four parts, the "metadata instance", "schema definition", "schema ontology", and "core ontology", as shown in Fig. 1.

Figure 1. The structure of MMF.

Each part may be distributed across the network, and the parts are linked to one another.

2.1 Metadata Instance

The metadata instance contains the metadata of an object, such as an item of merchandise. Metadata is represented by a set of attributes and their values. Note that a value may have a structure and/or may be coded in an existing coding system. A structured value is represented by a set of values with "aspects". A value coded in an existing coding system, such as a currency code or the URL syntax of RFC1738, is represented with a "system" identifier.

Fig. 2 shows an example of the metadata instance in HTML format. In this example, the value of the "maker" attribute has two aspects, "name" (the name of the company) and "tel" (its telephone number). The value of the "price" attribute of the first item is 55 with system "USD".
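To make the qualifier notation concrete, here is a minimal Python sketch that splits a META content string from Fig. 2 into its aspect, system and value. It assumes that a qualifier is either empty, as in "()Formula 1", or holds exactly one "aspect=" or "system=" pair, and that whitespace after the closing parenthesis is insignificant; the names `QualifiedValue` and `parse_content` are hypothetical, and the authoritative grammar is the MMF syntax referenced in this section.

```python
import re
from typing import NamedTuple, Optional


class QualifiedValue(NamedTuple):
    aspect: Optional[str]  # e.g. "tel" in "(aspect=tel)+81-3-5411-9800"
    system: Optional[str]  # e.g. "USD" in "(system=USD)55"
    value: str


# "(" [ "aspect=X" | "system=X" ] ")" optional-whitespace value  (assumed grammar)
_QUALIFIED = re.compile(r"^\((?:(aspect|system)=([^)]*))?\)\s*(.*)$", re.DOTALL)


def parse_content(content: str) -> QualifiedValue:
    """Split a META content string into its qualifier and its value."""
    m = _QUALIFIED.match(content)
    if m is None:
        # No leading "(...)": treat the whole string as an unqualified value.
        return QualifiedValue(None, None, content)
    kind, qualifier, value = m.groups()
    return QualifiedValue(
        aspect=qualifier if kind == "aspect" else None,
        system=qualifier if kind == "system" else None,
        value=value,
    )


print(parse_content("(aspect=tel)+81-3-5411-9800"))
print(parse_content("(system=RFC1738) http://www.dvl.co.jp/purchase/aaa.htm"))
```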

The metadata instance is described in the HTML header using META tags. Several formats have been proposed for the metadata of Internet resources, such as the IAFA template[2], URC[3] and SOIF[4]. More recently, workshops supported by OCLC proposed the Dublin Core[5] and the Warwick Framework[6]. Since the Warwick Framework deals with metadata in multiple schemata, we adopted it as the syntax of the metadata instance. The extensions we have made are

  1. delimiter-line, and
  2. aspect and system described above.
When two or more items of merchandise are described in one HTML page, the metadata of each item are separated by delimiter-lines. "Aspects" and "systems" are introduced as qualifiers, as in the Dublin Core. The syntax of MMF is available at http://www.dvl.co.jp/mediator/index.html.

<meta name="-----" content="begin">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.category" content="()Game Playstation">
<meta name="CN.name" content="()Formula 1">
<meta name="CN.purchase_page" content="(system=RFC1738) http://www.dvl.co.jp/purchase/aaa.htm">
<meta name="CN.sample" content="(system=RFC1738) http://www.dvl.co.jp/sample/aaa.htm">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.maker" content="(aspect=tel)+81-3-5411-9800">
<meta name="CN.price" content="(system=USD)55">
<meta name="-----" content="separate">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.category" content="()Game Playstation">
<meta name="CN.name" content="()Rage Racer">
<meta name="CN.purchase_page" content="(system=RFC1738) http://www.dvl.co.jp/purchase/bbb.htm">
<meta name="CN.sample-1" content="(system=RFC1738) http://www.dvl.co.jp/sample/scene1.htm">
<meta name="CN.sample-2" content="(system=RFC1738) http://www.dvl.co.jp/sample/scene2.htm">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.maker" content="(aspect=tel)+81-3-5411-9800">
<meta name="CN.price" content="(system=JPY)55000">
<meta name="-----" content="end">

Figure 2. An example of the metadata instance in HTML header.
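The delimiter-lines can be handled with a small state machine. The Python sketch below, using the standard `html.parser` module, groups the META and LINK tags of a Fig. 2-style header into one record per item of merchandise. It is an illustration under our own assumptions (a record opens on "begin" or "separate", closes on "separate" or "end", and tags outside that bracket are ignored); `MetadataInstanceParser` and `parse_metadata_instance` are hypothetical names, and the qualified values are kept as raw strings.

```python
from html.parser import HTMLParser

DELIMITER_NAME = "-----"  # META name used by the delimiter-lines in Fig. 2


class MetadataInstanceParser(HTMLParser):
    """Collect MMF metadata records from META/LINK tags in an HTML header."""

    def __init__(self):
        super().__init__()
        self.records = []     # one record per item of merchandise
        self._current = None  # record being filled; None outside begin/end

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == DELIMITER_NAME:
            marker = a.get("content")
            # "separate" and "end" close the record that is currently open.
            if marker in ("separate", "end") and self._current is not None:
                self.records.append(self._current)
            # "begin" and "separate" open a fresh record.
            if marker in ("begin", "separate"):
                self._current = {"schemas": {}, "attributes": {}}
            else:
                self._current = None
        elif self._current is None:
            return  # tags outside a begin/end bracket are ignored
        elif tag == "link" and (a.get("rel") or "").startswith("SCHEMA."):
            # <link rel=SCHEMA.CN href=...> binds the prefix "CN" to a schema URL.
            prefix = a["rel"][len("SCHEMA."):]
            self._current["schemas"][prefix] = a.get("href")
        elif tag == "meta" and a.get("name"):
            # Repeated names (one per aspect) are kept in document order.
            self._current["attributes"].setdefault(a["name"], []).append(a.get("content", ""))


def parse_metadata_instance(html_text):
    parser = MetadataInstanceParser()
    parser.feed(html_text)
    parser.close()
    return parser.records


SAMPLE = """
<meta name="-----" content="begin">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.name" content="()Formula 1">
<meta name="CN.maker" content="(aspect=name)DVL">
<meta name="CN.maker" content="(aspect=tel)+81-3-5411-9800">
<meta name="-----" content="separate">
<link rel=SCHEMA.CN href=http://www.dvl.co.jp/tags/cn.scm>
<meta name="CN.name" content="()Rage Racer">
<meta name="CN.price" content="(system=JPY)55000">
<meta name="-----" content="end">
"""

records = parse_metadata_instance(SAMPLE)
for record in records:
    print(record["attributes"]["CN.name"])
```

Keeping each attribute's values as a list matters because an attribute with several aspects, such as "CN.maker", appears once per aspect.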

2.2 Schema Definition

The schema definition gives the frame of the metadata, i.e., a set of attributes and their aspects. Restrictions on "systems" may be declared in the schema definition when necessary. The schema definition is described in SOIF (the Summary Object Interchange Format)[4]. An example of the schema definition is shown in Fig. 3.

We suppose that the schema definition is written by a schema designer, who may be different from the metadata author. For example, an industry expert, a mall, or a shop designs schemata, and metadata authors select an appropriate schema from them. Hence, the schema definition need not employ the same syntax as the metadata instance.

@SCHEMADEFINITION { http://cm.dvl.co.jp/schema/default.scm
Schema-ontology{x}: http://cm.dvl.co.jp/ontology/default.sot
Number-of-entries{x}: 6
Attribute-1{x}: name
Description-1{x}: name of the merchandise

Attribute-2{x}: purchase_page
Description-2{x}: url of the page selling the merchandise
System-2{x}: RFC1738

Attribute-3{x}: price
Description-3{x}: price of the merchandise. It should be in JPY or USD.
System-3{x}: JPY
System-3{x}: USD

Attribute-4{x}: maker
Description-4{x}: data of the maker of the merchandise
Number-of-aspects-4{x}: 2

Aspect-5{x}: name
Parent-attribute-5{x}: maker
Description-5{x}: name of the maker of the merchandise

Aspect-6{x}: tel
Parent-attribute-6{x}: maker
Description-6{x}: telephone number of the maker of the merchandise
}
Figure 3. An example of the schema definition.
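As an illustration of how a "system" restriction declared in a schema definition might be enforced, the following Python sketch reads a Fig. 3-style definition line by line and checks whether a value's system is allowed for an attribute. It is a simplification, not a full SOIF parser: it ignores the value sizes in braces (written {x} in Fig. 3), supports only single-line values, and treats an attribute with no System entries as unrestricted. All function names are hypothetical.

```python
import re
from collections import defaultdict

# "Key-N{size}: value" or "Key{size}: value"; Fig. 3 writes the size as {x}.
_FIELD = re.compile(r"^([A-Za-z-]+?)(?:-(\d+))?\{[^}]*\}:\s*(.*)$")


def parse_schema_definition(text):
    """Return (header fields, numbered entries) from a Fig. 3-style definition."""
    header, entries = {}, defaultdict(lambda: defaultdict(list))
    for line in text.splitlines():
        m = _FIELD.match(line.strip())
        if m is None:
            continue  # "@SCHEMADEFINITION { ...", "}" and blank lines
        key, index, value = m.groups()
        if index is None:
            header[key] = value
        else:
            entries[int(index)][key].append(value)  # e.g. "System-3" may repeat
    return header, entries


def allowed_systems(entries, attribute):
    for fields in entries.values():
        if attribute in fields.get("Attribute", []):
            return set(fields.get("System", []))
    raise KeyError(f"attribute not in schema: {attribute}")


def system_is_valid(entries, attribute, system):
    allowed = allowed_systems(entries, attribute)
    return not allowed or system in allowed  # no restriction declared: accept


SCHEMA = """@SCHEMADEFINITION { http://cm.dvl.co.jp/schema/default.scm
Schema-ontology{x}: http://cm.dvl.co.jp/ontology/default.sot
Attribute-1{x}: name
Attribute-3{x}: price
System-3{x}: JPY
System-3{x}: USD
}"""

header, entries = parse_schema_definition(SCHEMA)
print(system_is_valid(entries, "price", "USD"), system_is_valid(entries, "price", "EUR"))
```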

2.3 Schema Ontology

The schema ontology contains the conceptual relations between attributes in different schemata, which make metadata and retrieval formulae transformable among schemata.

For example, suppose there are two video rental shops: shop A uses schema A and shop B uses schema B. Schema A defines the attribute "cast", whose value is a list of the names of the persons playing in the video movie. Schema B, on the other hand, defines the attribute "leading_actress", which is more specific than "cast". A movie in which Ms. Maedchen Amick plays will be retrieved at shop A by the retrieval formula "A.cast == Maedchen Amick", but that formula is not valid at shop B. Conversely, the movie will be retrieved at shop B by the formula "B.leading_actress == Maedchen Amick", which is not valid at shop A.

It is therefore desirable to transform a retrieval formula in one schema into valid formulae in other schemata. When searching with the formula "A.cast == Maedchen Amick" based on schema A, it is desirable to rephrase it as "B.leading_actress == Maedchen Amick" to search in shop B. Conversely, when searching with "B.leading_actress == Maedchen Amick" based on schema B, it may be desirable to stretch it to "A.cast == Maedchen Amick" to search in shop A. The conceptual relation between "A.cast" and "B.leading_actress" is described in the schema ontology to make these transformations possible. The schema ontology uses the following two relations.

  1. Equality
    Two attributes denote the same relation between the object and the value.
    Ex.: "manufacturer" and "maker". Described as "manufacturer = maker"
  2. Inclusion
    The relation denoted by one attribute is a special case of (is included in) the relation denoted by another; the wider attribute is written on the left.
    Ex.: "cast" and "leading_actress". Described as "cast > leading_actress"
A single schema ontology can define relations among the attributes of more than two schemata. An example of the schema ontology is shown in Fig. 4.

@SCHEMAONTOLOGY { http://cm.dvl.co.jp/ontology/movie.sot
Last-modified{x}: Wed, 11 Dec 1996 17:26:00 GMT
MMF-version{x}: 1.0
Description-of-schema{x}: Ontology for the movie schema
Schema-definition-1{x}: http://cm.dvl.co.jp/schema/movie.scm
Id-of-schema-1{x}: MVS
Schema-definition-2{x}: http://cm.dvl.co.jp/schema/image.scm
Id-of-schema-2{x}: PCS
Schema-definition-3{x}: http://cm.dvl.co.jp/schema/defaults.scm
Id-of-schema-3{x}: CMS

Parent-attribute-4{x}: PCS.cast
Child-attribute-4{x}: MVS.leading_actress

Equal-attribute-5{x}: PCS.title
Equal-attribute-5{x}: CMS.name
}
Figure 4. An example of the schema ontology.

There are two processes for transforming a retrieval formula: "Rephrase" and "Stretch". "Rephrase" transforms an attribute in the formula into another attribute of equal or narrower meaning. "Stretch" transforms an attribute into a wider attribute that might have the value designated in the formula. The conceptual relation in the preceding example is "A.cast > B.leading_actress". An item of merchandise whose metadata in shop B contains the record "B.leading_actress : Maedchen Amick" satisfies the retrieval formula "A.cast == Maedchen Amick". Hence the formula "attribute1 == value" can be transformed into the formula "attribute2 == value" when "attribute1 > attribute2" or "attribute1 = attribute2"; this transformation is called "Rephrase". On the other hand, it is unclear whether an item of merchandise whose metadata in shop A contains the record "A.cast : Maedchen Amick" satisfies the formula "B.leading_actress == Maedchen Amick", since she may not play the leading role. The process of transforming the formula "attribute1 == value" into "attribute2 == value" when "attribute1 < attribute2" is called "Stretch". "Stretch" is useful for a search service that needs more results when few merchandise items satisfy the retrieval formula designated by the user.
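The two transformations can be sketched in a few lines of Python. The ontology fragment below is hypothetical: it reuses the "cast > leading_actress" example and attaches the "manufacturer = maker" example to schemata A and B purely for illustration. The sketch applies a single relation step and does not follow chains of relations; a fuller implementation would compute their transitive closure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Formula:
    """A retrieval formula of the form "attribute == value"."""
    attribute: str
    value: str


# Hypothetical ontology fragments, written as in the text:
# EQUAL holds pairs (x, y) meaning "x = y";
# INCLUDES holds pairs (wider, narrower) meaning "wider > narrower".
EQUAL = [("A.maker", "B.manufacturer")]
INCLUDES = [("A.cast", "B.leading_actress")]


def rephrase(f, equal=EQUAL, includes=INCLUDES):
    """Formulas over an equal or narrower attribute: every match also satisfies f."""
    out = set()
    for x, y in equal:
        if f.attribute == x:
            out.add(Formula(y, f.value))
        elif f.attribute == y:
            out.add(Formula(x, f.value))
    for wider, narrower in includes:
        if f.attribute == wider:
            out.add(Formula(narrower, f.value))
    return out


def stretch(f, includes=INCLUDES):
    """Formulas over a wider attribute: matches may or may not satisfy f."""
    return {Formula(wider, f.value) for wider, narrower in includes if f.attribute == narrower}


print(rephrase(Formula("A.cast", "Maedchen Amick")))
print(stretch(Formula("B.leading_actress", "Maedchen Amick")))
```

Rephrased formulae can be issued as they are, whereas stretched formulae yield candidates that a search service might present separately as approximate matches.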

2.4 Core Ontology

When there are only a few schemata, the relations between them can be defined directly. As the number of schemata increases, this becomes impractical, since the number of relations grows with the square of the number of schemata. Schemata multiply because shops extend attributes or design their own schemata to draw attention to themselves among many other shops. We therefore propose the core ontology as a central, standard ontology of attributes. A designer of a new schema has only to define a set of relations from the new schema to the core ontology, instead of defining relations from the new schema to every other schema.
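The following Python sketch illustrates both the scaling argument and how two schemata could be related through the core ontology alone. Each schema attribute is mapped to a core attribute with one of the relations "=", ">" or "<" (the left side is wider for ">"); the relation between two schema attributes is then obtained by composing one mapping with the inverse of the other. The mappings and core attribute names such as "core.cast" are hypothetical, and the sketch only relates attributes mapped to the same core attribute.

```python
# Relation codes: "=" equality; ">" left attribute is wider; "<" left is narrower.
INVERSE = {"=": "=", ">": "<", "<": ">"}


def compose(r1, r2):
    """Relation of x to z given (x r1 y) and (y r2 z); None if it cannot be inferred."""
    if r1 == "=":
        return r2
    if r2 == "=":
        return r1
    return r1 if r1 == r2 else None  # ">" followed by "<" (or vice versa) says nothing


def relate(a, b, core_map):
    """Derive the relation between two schema attributes via the core ontology."""
    rel_a, core_a = core_map[a]
    rel_b, core_b = core_map[b]
    if core_a != core_b:
        return None  # this sketch only relates attributes mapped to the same core attribute
    return compose(rel_a, INVERSE[rel_b])


def relation_sets(n_schemata):
    """Relation sets needed pairwise vs. via a single core ontology."""
    return n_schemata * (n_schemata - 1) // 2, n_schemata


# Hypothetical per-schema mappings: attribute -> (relation to core, core attribute)
CORE_MAP = {
    "A.cast": ("=", "core.cast"),
    "B.leading_actress": ("<", "core.cast"),
    "A.maker": ("=", "core.manufacturer"),
    "B.manufacturer": ("=", "core.manufacturer"),
}

print(relate("A.cast", "B.leading_actress", CORE_MAP))  # prints >
print(relation_sets(20))  # prints (190, 20)
```

With 20 schemata, pairwise definition needs 190 relation sets, against 20 when every schema is related only to the core ontology.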

3. Metadata Mediation Protocol

3.1 Overview

We propose the metadata mediator as a means of storing metadata and supplying it on request. The Metadata Mediation Protocol (MMP) is used for communication between metadata mediators. A metadata mediator will be provided as a stand-alone unit, as part of a WWW server, or as part of a commerce management database. It responds to requests for the registration, revision and deletion of metadata, and also answers queries. As shown in Fig. 5, metadata mediators cooperate to answer queries and maintain metadata. To cooperate, a metadata mediator needs to advertise its capabilities to other mediators and to request other mediators to transfer certain kinds of metadata to it. This chapter describes MMP, which enables metadata mediators to perform these functions.


Figure 5. Cooperation between metadata mediators.

Harvest[7] is an example of a system in which metadata repositories cooperate to manage metadata. In Harvest, gatherers collect metadata and a broker compiles the collected metadata. SOIF[4] is proposed as the format of the metadata exchanged between the broker and each gatherer. Netscape has also proposed the Resource Description Messages (RDM)[8], based on Harvest. Using RDM, a broker can exchange metadata, schema descriptions and taxonomy descriptions with gatherers or other brokers. However, since our metadata mediator works not only as a manager of metadata but also as a distributed problem solver, MMP needs more functions than RDM, such as telling other mediators about a mediator's own abilities or interests. KQML[9] has been proposed as an inter-agent communication language; it has a protocol set that enables agents to converse and thereby cooperate in solving tasks. We referred to KQML in designing MMP. Syntactically, an MMP message is written in a format almost identical to SOIF and transmitted wrapped in HTTP. The MMP we have designed is based on the following scenarios.

3.2 Messages in MMP

Messages used in each scenario are described in the following tables.

3.3 Query Brokering

Mediator A, shown in Fig. 6, may need metadata having specific features. In this case, mediator A sends a query-request to a mediator B whose existence it knows of. Mediator B adds a query-id to the query-request and sends a query-response back to mediator A as an acknowledgement of receipt. If mediator B has no metadata with the specified features and the number of allowable hops written in the query-request is positive, mediator B forwards the query-request to mediator C, which it has learned from C's advertising message may hold metadata suitable for the query. The query-request has an attribute "from" that indicates the mediator to which the response must be sent. The value of "from" in the forwarded query-request depends on the value of the attribute "transport-policy" of the original query-request. If this value is "broker", mediator B enters its own address as the value of "from" in the forwarded query-request; if an answer is found at mediator C, the messages flow as illustrated in Fig. 6. If "transport-policy" of the original query-request has the value "recruit", the value of "from" in the original query-request, i.e., mediator A, is entered into "from" of the new query-request, and the messages flow as depicted in Fig. 7.
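The difference between the two transport policies is easiest to see in a small in-process simulation. The Python sketch below logs every message as (sender, receiver, type). It omits the query-response acknowledgement and the query-id, and it assumes, although the text does not say so, that each forwarding step decrements the hop count; the class and function names are hypothetical.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class QueryRequest:
    feature: str           # the feature being searched for
    reply_to: str          # MMP "from": the mediator the answer must be sent to
    hops: int              # remaining allowable hops (decrement is an assumption)
    transport_policy: str  # "broker" or "recruit"


@dataclass
class Mediator:
    metadata: dict                                  # feature -> item ids held locally
    advertised: dict = field(default_factory=dict)  # peer -> features from its advertising message


def handle(network, receiver, request, log):
    """Process a query-request at `receiver`, appending every message it causes to `log`."""
    hits = network[receiver].metadata.get(request.feature, [])
    if hits:
        log.append((receiver, request.reply_to, "query-answer"))
        return hits
    if request.hops <= 0:
        return []
    for peer, features in network[receiver].advertised.items():
        if request.feature in features:
            # "broker": answers come back through this mediator;
            # "recruit": answers go straight to the original "from".
            reply_to = receiver if request.transport_policy == "broker" else request.reply_to
            log.append((receiver, peer, "query-request"))
            forwarded = replace(request, reply_to=reply_to, hops=request.hops - 1)
            hits = handle(network, peer, forwarded, log)
            if hits and request.transport_policy == "broker":
                log.append((receiver, request.reply_to, "query-answer"))  # relay back
            # Under "recruit" the returned hits are only a simulation convenience;
            # no message is sent by this mediator.
            return hits
    return []


def send_query(network, origin, first_hop, feature, hops, policy):
    log = [(origin, first_hop, "query-request")]
    hits = handle(network, first_hop, QueryRequest(feature, origin, hops, policy), log)
    return hits, log


network = {
    "B": Mediator(metadata={}, advertised={"C": {"monochrome-1940s"}}),
    "C": Mediator(metadata={"monochrome-1940s": ["movie-001"]}),
}
for policy in ("broker", "recruit"):
    print(policy, send_query(network, "A", "B", "monochrome-1940s", 1, policy)[1])
```

With "broker" the log shows the answer travelling from C to B and then to A, as in Fig. 6; with "recruit" it goes from C straight to A, as in Fig. 7.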

Figure 6. A route of messages in "broker."
Figure 7. A route of messages in "recruit."

3.4 Transmission of Changes in Metadata

Each mediator can inform any other mediator of the objects in which it is interested by sending a subscription request. For instance, while mediator A is collecting metadata about a particular feature (e.g., monochrome movies made in the 1940s), it can request mediator B to transfer such metadata to it by using the subscription message. On receiving the subscription-request, mediator B registers the matching metadata it holds with mediator A, using the registry message. If the "time-to-kill" date in the subscription message is set, mediator B continues until that date to transfer to mediator A the registered metadata having that feature.

Using a maintenance message, a mediator can request another mediator to transfer any changes that occur to a metadata item. Assume that mediator A holds a metadata item transferred from mediator B and needs to know of any changes to the original item on mediator B. Mediator A can request this by sending a maintenance message to mediator B. If the metadata designated by the maintenance-request are revised or deleted by the date set in "time-to-kill", data representing the change are transferred from mediator B to mediator A.
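The two scenarios above can be sketched from mediator B's side in Python. The message kinds ("registry", "revision", "deletion") are placeholders rather than MMP message names, the class name is hypothetical, and two behaviours are our reading of the text: a subscription without a time-to-kill delivers only the metadata held at subscription time, and a change is reported when it happens on or before the time-to-kill date.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class PublishingMediator:
    """Mediator B: holds metadata and pushes registrations and changes to others."""
    items: dict = field(default_factory=dict)          # item id -> metadata (attribute -> value)
    subscriptions: list = field(default_factory=list)  # (subscriber, predicate, time_to_kill)
    maintenance: list = field(default_factory=list)    # (subscriber, item id, time_to_kill)
    outbox: list = field(default_factory=list)         # (receiver, message kind, item id)

    def subscribe(self, subscriber, predicate, time_to_kill=None):
        # Register the matching metadata already held with the subscriber at once.
        for item_id, metadata in self.items.items():
            if predicate(metadata):
                self.outbox.append((subscriber, "registry", item_id))
        # Assumption: only a subscription with a time-to-kill keeps delivering later items.
        if time_to_kill is not None:
            self.subscriptions.append((subscriber, predicate, time_to_kill))

    def request_maintenance(self, subscriber, item_id, time_to_kill):
        self.maintenance.append((subscriber, item_id, time_to_kill))

    def register(self, item_id, metadata, now):
        self.items[item_id] = metadata
        for subscriber, predicate, time_to_kill in self.subscriptions:
            if now <= time_to_kill and predicate(metadata):
                self.outbox.append((subscriber, "registry", item_id))

    def revise(self, item_id, metadata, now):
        self.items[item_id] = metadata
        self._notify(item_id, "revision", now)

    def delete(self, item_id, now):
        del self.items[item_id]
        self._notify(item_id, "deletion", now)

    def _notify(self, item_id, kind, now):
        for subscriber, watched_id, time_to_kill in self.maintenance:
            if watched_id == item_id and now <= time_to_kill:
                self.outbox.append((subscriber, kind, item_id))


def monochrome_1940s(metadata):
    return metadata.get("color") == "monochrome" and 1940 <= metadata.get("year", 0) < 1950


b = PublishingMediator(items={"m1": {"color": "monochrome", "year": 1946}})
deadline = datetime(1997, 4, 1)
b.subscribe("A", monochrome_1940s, time_to_kill=deadline)
b.register("m2", {"color": "monochrome", "year": 1941}, now=datetime(1997, 1, 10))
b.register("m3", {"color": "monochrome", "year": 1942}, now=datetime(1997, 5, 1))  # too late
b.request_maintenance("A", "m1", time_to_kill=deadline)
b.revise("m1", {"color": "monochrome", "year": 1947}, now=datetime(1997, 2, 1))
print(b.outbox)
```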

3.5 Implementation

MMP is implemented on HTTP, since MMP exchanges fit the request/response model of HTTP. An MMP message is written in the body of an HTTP message whose Content-Type is application/x-mmp. Each MMP message has three layers, each written in SOIF. A message concerning metadata manipulation, e.g., a registry message, can have several content layers so as to transfer metadata in bulk. An example is shown in Fig. 8.

@COMMUNICATION{ -
Sender{x}: 192.168.1.1:8000
Receiver{x}: 205.120.1.1:8000
Date{x}:
}
@MESSAGE{ -
Type{x}:metadata-registry-request
Content-number{x}: 2
Expire{x}:
}
@METADATAREGISTRYBODY{ -
Schema{x}:CN http://www.dvl.co.jp/tags/cn.scm *
Content{x}:
CN.category{x}:()Game Playstation
CN.name{x}:()Formula 1
}
@METADATAREGISTRYBODY{ -
Schema{x}:CN http://www.dvl.co.jp/tags/cn.scm *
Content{x}:
CN.category{x}:()Game Playstation
CN.name{x}:()Rage Racer
}

Figure 8. An example of the message "metadata-registry."
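The layering of Fig. 8 can be reproduced with a short Python sketch using only the standard library. It writes each value's byte length in the braces, which is our understanding of Harvest's SOIF convention that Fig. 8 abbreviates as {x}; it flattens the metadata attributes of each item directly into its content layer instead of nesting them under "Content", and it leaves out the empty Date and Expire fields. The request path "/" and all function names are hypothetical, and the request is built but not sent.

```python
import urllib.request


def soif_template(template_type, pairs, url="-"):
    """Serialize one layer as a SOIF-style template: Name{byte-size}:<TAB>value."""
    lines = [f"@{template_type}{{ {url}"]
    for name, value in pairs:
        size = len(value.encode("utf-8"))
        lines.append(f"{name}{{{size}}}:\t{value}")
    lines.append("}")
    return "\n".join(lines)


def registry_message(sender, receiver, items):
    """Build a metadata-registry-request: communication, message and content layers."""
    layers = [
        soif_template("COMMUNICATION", [("Sender", sender), ("Receiver", receiver)]),
        soif_template("MESSAGE", [("Type", "metadata-registry-request"),
                                  ("Content-number", str(len(items)))]),
    ]
    for schema, attributes in items:
        # One content layer per item; its metadata attributes are flattened into it.
        layers.append(soif_template("METADATAREGISTRYBODY", [("Schema", schema), *attributes]))
    return "\n".join(layers)


def wrap_in_http(message, receiver):
    """Prepare (but do not send) an HTTP POST carrying the MMP message."""
    return urllib.request.Request(
        f"http://{receiver}/",  # hypothetical path
        data=message.encode("utf-8"),
        headers={"Content-Type": "application/x-mmp"},
        method="POST",
    )


CN = "CN http://www.dvl.co.jp/tags/cn.scm *"
items = [
    (CN, [("CN.category", "()Game Playstation"), ("CN.name", "()Formula 1")]),
    (CN, [("CN.category", "()Game Playstation"), ("CN.name", "()Rage Racer")]),
]
message = registry_message("192.168.1.1:8000", "205.120.1.1:8000", items)
request = wrap_in_http(message, "205.120.1.1:8000")
print(message)
```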

4. Future Plan

4.1 Commerce Mediator

Commerce Mediator enables consumers to retrieve data about the merchandise available at online shopping sites, based on the characteristics of the merchandise written in MMF. Commerce Mediator will come into service in April 1997. A schematic view of Commerce Mediator is shown in Figure 9.

Figure 9. The flow in Commerce Mediator

In its simplest form, Commerce Mediator comprises a metadata mediator and a commerce mediation proxy. The commerce mediation proxy functions as the gateway between the metadata mediator and a WWW browser.

The Commerce Mediator System has three ways to gather metadata. The first is a WWW page for inputting metadata, managed by the commerce mediation proxy. The second is the Metadata-Input System, an authoring tool that simplifies the seller's effort to add metadata to the HTML documents in which the seller describes the merchandise, and to prepare files that describe the metadata in MMF. The Metadata-Input System will be distributed free of charge. The third is the web robot on the Commerce Mediator site. The Metadata-Description-Rule Extracting System extracts patterns of the positions in an HTML document where the features of merchandise are described. The Automatic Metadata-Collecting System works as a web robot and collects metadata from shops' HTML pages, based on the metadata-description rules extracted by the Metadata-Description-Rule Extracting System.

5. Conclusion

We have designed the metadata format (MMF) and the metadata mediation protocol (MMP), both of which are used in Commerce Mediator, a system that enables consumers to retrieve data about the merchandise available at WWW online shopping sites. MMF carries the metadata together with descriptions of the relationships between attributes and of the attribute ontology, which ensure the inter-transformability between different schemata. MMP enables data communication among distributed metadata mediators. The protocol makes it possible to register, modify, delete and retrieve metadata at any one of the metadata mediators, so metadata can be managed in the same way at all metadata mediators and the mediators can cooperate with one another. The detailed specifications of MMF and MMP can be obtained from http://www.dvl.co.jp/mediator/index.html.

6. Acknowledgment

I would like to thank Hiroyuki Suzuki-san for his invaluable comments on earlier drafts of this paper.

7. References

  1. Electronic Commerce Promotion Council (ECOM), (homepage), http://www.ecom.or.jp/eng/index.htm
  2. Deutsch P., Emtage A., "Publishing Information on the Internet with Anonymous FTP", http://info.webcrawler.com/mak/projects/iafa/iafa.txt
  3. Daniel R., "An SGML-based URC Service", http://www.nlc-bnc.ca/ifla/documents/libraries/cataloging/metadata/urc3.txt
  4. Hardy D., Schwartz M., Wessels D., "Harvest User's Manual", http://harvest.transarc.com/afs/transarc.com/public/trg/Harvest/user-manual/
  5. Weibel S., Godby J., et al., "OCLC/NCSA Metadata Workshop Report", http://www.oclc.org:5046/conferences/metadata/dublin_core_report.html
  6. Lagoze C., Lynch C., et al., "The Warwick Framework: A Container Architecture for Aggregating Sets of Metadata", http://cs-tr.cs.cornell.edu:80/Dienst/Repository/2.0/Body/ncstrl.cornell%2fTR96-1593/html
  7. Bowman C., Danzig P., et al., "Harvest: A Scalable, Customizable Discovery and Access System", Technical Report CU-CS-732-94, Univ. Colorado (1995)
  8. Hardy D., "Resource Description Messages (RDM)", http://www.netscape.com/people/dhardy/rdm.html
  9. Finin T., Weber J., "Draft Specification of the KQML Agent-Communication Language", http://www.cs.umbc.edu/kqml/kqmlspec/spec.html


