1 Introduction
Networked
knowledge organization systems typically contain objects of mixed media
types which are described using a multitude of diverse metadata schemas.
Hence machine understanding of metadata descriptions which conform to
schemas from different domains is a fundamental requirement for access to
information within networked knowledge organization systems. In
particular, there are three main scenarios in which interoperability
among metadata descriptions is required:
- To enable a single search interface
across heterogeneous metadata descriptions;
- To enable the integration or merging of
descriptions which are based on complementary but possibly
overlapping metadata schemas or standards;
- To enable different views of the one
underlying and complete metadata description, depending on the
user's particular interest, perspective or
requirements.
Metadata
descriptions from different domains are not semantically distinct but
overlap and relate to each other in complex ways. Achieving
interoperability between such metadata descriptions via
manually-generated one-to-one crosswalks [6] is useful, but this approach
does not scale to the many metadata vocabularies that will develop. A
more scalable and cost-effective approach is to exploit the fact that
many entities and relationships - for example, people, places, creations,
organisations, events, etc. - occur across all
of the domains. The Harmony project [1] has been investigating this more
general approach towards metadata interoperability and in the process has
developed the ABC model and vocabulary [2].
The
hypothesis is that such an approach will lead to more efficient, scalable
machine-translations between heterogeneous metadata descriptions. To test
this hypothesis and to evaluate the interoperability capabilities of the
ABC model, we applied it to some real multimedia examples and analyzed
the results of mapping from the ABC model to various different metadata
domains using XSLT [3]. This work revealed serious limitations in XSLT's ability to support flexible dynamic semantic
mapping. To overcome this, we developed MetaNet
[4], a metadata term thesaurus which provides the additional semantic
knowledge that is non-existent within declarative XML-encoded metadata
descriptions.
This paper
describes the optimum metadata mapping approach determined from applying
the ABC model to a small test set of multimedia examples. This approach
combines:
- the ABC event-aware metadata model,
developed within the Harmony project, as the underlying model for
scalable generic mappings between domain-specific vocabularies;
- XSLT for parsing XML descriptions and
performing structural and syntactic mapping; and
- MetaNet, a metadata term thesaurus, to provide
the semantic knowledge required to enable semantic mapping between
metadata terms from different domains or standards.
2 Definitions of Terms
This section
defines the key terms used throughout the remainder of the paper:
- Metadata - data about data - or more commonly
"descriptive information about Web resources". The use of
standardized descriptive metadata can substantially improve the
discovery and retrieval of relevant networked resources. Different communities
or domains have developed their own standardized metadata vocabularies to meet their specific needs.
- Vocabularies - shared terminologies with commonly
agreed-upon semantics for a domain. Common vocabularies enable search
engines, agents, authors and users to communicate within a domain.
- Schemas - a standard way of defining
domain-specific vocabularies by specifying a common set of elements,
their semantics and the relationships between the elements.
- Ontology - a formal description of the concepts,
roles and relationships that exist for an agent or community of
agents. Ontologies provide a shared and
common understanding of a domain that can be communicated across
people and applications, and play a major role in supporting
information exchange and discovery.
- Thesaurus - the vocabulary of a controlled
indexing language, formally organized so that the a priori
relationships between concepts (for example "broader" and
"narrower") are made explicit. [7]
- Metadata Thesaurus - a thesaurus (defined according to the ISO
2788 standard for monolingual thesauri [7]) which defines the
relationships between metadata terms from different domain
vocabularies.
3 Related Work
Thesauri have
been used to improve the precision and recall of information retrieval
systems for over 30 years. The introduction of automated information
retrieval has caused a dramatic increase in the demand for vocabulary
control, particularly in the last decade. Examples of well known thesauri
used to provide authority control over the terms used for indexing
documents in the bibliographic, medical and cultural domains respectively
are: the Library of Congress Subject Headings (LCSH) [9], the Medical
Subject Headings (MeSH) [10] and the Art and
Architecture Thesaurus (AAT). [11] In addition, thesauri have been used
within information retrieval systems to improve retrieval effectiveness
by providing semantic roadmaps. [12], [13], [14]
Since the
emergence of the Internet, a great deal of effort has been invested in
the development of metadata vocabularies to enable the exchange and
discovery of information across different applications and domains.
Metadata vocabularies such as Dublin Core [15], USMARC [16], INDECS [17],
MPEG-7 [18], FGDC [19], IEEE LOM [20] and CIDOC CRM [21] provide
standardized sets of descriptive elements to enable the exchange of
resources for specific applications or domains. Although these standards
enable interoperability within domains, they introduce the problem of
incompatibility between disparate and heterogeneous metadata descriptions
or schemas across domains.
A
literature survey reveals many different proposals for improving
interoperability between domain-specific vocabularies, thesauri and ontologies in the context of information retrieval
and exchange. These range from database schema integration [22], to the
use of ontologies in organizing and integrating
networked information systems (e.g. OBSERVER [23], InfoSleuth
[24], OntoSeek [25])
to the merging of monolingual [26] and multilingual thesauri. [8] Two of
the major research issues have been categorizing the complex kinds of interthesaurus semantic relationships which exist
[27] and automating the detection of these relationships during the
merging process. [28]
More
recently the approach to merging thesauri has been to represent them
formally using RDF Schemas [29] and to use inference engines to automate
the merging - such as has been proposed in the Ontology Inference Layer
(OIL). [30]
In this
paper we are not so much concerned with the specific process by which MetaNet is generated or with expressing the complete
set of possible term relations (as described in ISO 2788) in MetaNet. Our primary objective is to generate a
thesaurus which specifies an (albeit simplified)
set of semantic relationships between metadata terms from a number of
different domain schemas relative to the ABC underlying vocabulary (the
preferred terms) and hence also to each other. Our goal is then to
demonstrate how this semantic knowledge can be represented in a
machine-readable format (RDF Schema) and extracted and combined with the
syntactic and structural mapping capabilities of XSLT to enable the
implementation of flexible dynamic mappings between metadata descriptions
from different domains.
4 Overview of the ABC Underlying Metadata Model
The Harmony
Project [1] is investigating a generic approach to metadata
interoperability through the development of an event-aware metadata
model. The ABC model [2] defines a set of fundamental classes which
provide the building blocks for expression (through sub-classing) of
application-specific or domain-specific metadata vocabularies. The base
classes, shown below, were determined by analysing
commonalities between different communities' metadata models, including
Dublin Core [15], INDECS [17], MPEG-7 [18], CIDOC CRM [21] and IFLA [31]:
- Resources
- Events
- Inputs and Outputs
- Acts
- Context
- Event Relations
ABC adopts an
event-aware view for modeling the relationship between the various
manifestations of a creation. This event-aware view provides semantically
clear attachment points for the association of properties among the
various manifestations, events and contributors (agents) involved in a
resource's lifecycle. In addition, ABC provides a multiple views
philosophy for metadata modeling and recipes for inter-conversion between
those views. If life-cycle information is required, the event model can
be used. When single resource metadata is needed, a resource-centric
state model is used. Figure 1 shows the UML representation for the ABC
metadata model.
Figure 1. UML representation of the ABC metadata model (figure omitted)
5 A Simple Example
To test the ABC
model and evaluate XSLT for metadata mapping, we considered the following
simple illustrative example:
"A
resource which is a 130 min audio (MP3) recording of a 'Live at Lincoln
Center'
performance. The Orchestra is the New York Philharmonic. The performance
was on April 7, 1998 at 8 pm Eastern Time. The musical score performed is
'Concerto for Violin'. Copyright for the entire performance is held by Lincoln
Center
for the Performing Arts."
First we
describe this resource using the ABC model. We then attempt to map from
the ABC description to Dublin Core, MPEG-7 and ID3 [32] descriptions
respectively, using XSLT. Figure 2 illustrates the two steps involved in
mapping from the ABC metadata model to resource-centric models such as
Dublin Core, MPEG-7 and ID3:
- The structural mapping step involves
transferring event properties to the output resource and creating a
relationship between the output and input resources associated with
the event.
- The semantic mapping step involves
mapping the properties attached to the output resource to semantically-equivalent
properties in the output domain.
Appendix A contains the corresponding ABC,
Resource-centric, Dublin Core, ID3 and MPEG-7 descriptions.

Figure
2. Transformation from the ABC event-aware model to three different
resource-centric models
5.1 Structural Mapping Rules
For events
which generate an output resource from an input resource, the
transformation from an event-aware metadata model to a simple
resource-centric metadata model consists of the following steps:
- The Date, Time and Place properties
within the Event's Context node can be qualified using the Event
Type and transferred to the target output resource, e.g. Date.Performance, Time.Performance,
Place.Performance;
- The Role property of each Act associated
with an event becomes a qualifier on the Agent property which is
attached to the target output resource and its value is the Act's
Agent Name, e.g. the Agent.Orchestra
property has value "New York Philharmonic";
- A Relation property arc is generated from
the event type (e.g. Performance -> Relation.isPerformanceOf)
and is attached to the target output resource. The value of this
property is the patient input resource of the event (e.g.
"comp523").
- All other existing properties of the
input and output resource remain the same.
Other
inheritance and metadata derivation rules may be possible but these
require further investigation.
For example, a Description property for the output resource can be
generated from the Event Type and the input resource's Title e.g.
"Performance of 'Concerto for Violin'". Or in many cases, the Title
property can be inherited by the output resource directly from the Title
property of either the input resource or the event.
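The structural mapping rules listed in this section can be sketched in a few lines of code. The fragment below is an illustration only and is not the XSLT given in Appendix B: it is written in Python, the dictionary layout for the event and the function name to_resource_centric are assumptions made for the example, and the "Format" property name is hypothetical. The property values are taken from the example in Section 5.

```python
def to_resource_centric(event, output_resource):
    """Flatten one event onto its output resource using the four structural rules."""
    event_type = event["type"]
    # Rule 4: existing properties of the output resource are kept unchanged.
    flat = dict(output_resource)

    # Rule 1: context properties (Date, Time, Place) are qualified by the event type.
    for name, value in event.get("context", {}).items():
        flat[f"{name}.{event_type}"] = value

    # Rule 2: each Act's Role qualifies an Agent property whose value is the agent name.
    for act in event.get("acts", []):
        flat[f"Agent.{act['role']}"] = act["agent"]

    # Rule 3: a Relation arc derived from the event type points at the input resource.
    flat[f"Relation.is{event_type}Of"] = event["input"]
    return flat


# Hypothetical in-memory form of the event in the Section 5 example.
performance = {
    "type": "Performance",
    "context": {"Date": "April 7, 1998", "Time": "8 pm Eastern Time"},
    "acts": [{"role": "Orchestra", "agent": "New York Philharmonic"}],
    "input": "comp523",
}
recording = {"Format": "MP3"}  # "Format" is an assumed property name

flat = to_resource_centric(performance, recording)
for key, value in flat.items():
    print(f"{key}: {value}")
```

The result carries the qualified properties Date.Performance, Time.Performance, Agent.Orchestra and Relation.isPerformanceOf alongside the original properties of the recording, which is the resource-centric form that the semantic mapping step then operates on.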
6 An Evaluation of XSLT for Metadata Mappings
The Extensible
Stylesheet Language (XSL) [3] consists of a transformation language (XSLT) and
a formatting language. The transformation language XSLT (which acts
independently of the formatting language) provides elements that define
rules for how one XML document is transformed into another XML document.
The transformed XML document may use the markup and DTD of the original
document or it may use a completely different set of tags. The ability of
XSLT to transform data from one XML representation to another makes it
appear to be ideal for metadata interchange applications.
An XSL
document contains a list of templates and rules. A template rule has a
pattern specifying the trees it applies to and a template to be output
when the pattern is matched. When an XSL processor formats an XML
document using an XSL style sheet, it scans the XML document tree looking
through each sub-tree in turn. As each tree in the XML document is read,
the processor compares it with the pattern of each template rule in the
style sheet. When the processor finds a tree that matches a template
rule's pattern, it outputs the rule's template. This template generally
includes some markup, some new data and some data copied out of the tree
from the original XML document.
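The matching process described above can be imitated in a few lines to make the control flow concrete. The following Python sketch is a simplified analogy and not an XSLT processor: patterns are reduced to plain element names plus a "*" wildcard, and the rule table and the helper names (apply_templates, copy_children, rename_to) are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def apply_templates(node, rules):
    """Pick the rule whose pattern matches this node and return its output."""
    rule = rules.get(node.tag, rules["*"])  # an exact name match wins over the wildcard
    return rule(node, rules)


def copy_children(node, rules):
    # Wildcard rule: emit nothing for the node itself, just process its children.
    return [out for child in node for out in apply_templates(child, rules)]


def rename_to(new_tag):
    # Rule factory: output a new element that carries the source node's text.
    def rule(node, rules):
        out = ET.Element(new_tag)
        out.text = node.text
        return [out]
    return rule


rules = {
    "*": copy_children,
    "Agent": rename_to("contributor"),
    "Title": rename_to("title"),
}

source = ET.fromstring(
    "<ABC><Title>Concerto for Violin</Title>"
    "<Agent>New York Philharmonic</Agent></ABC>"
)
result = ET.Element("record")
result.extend(apply_templates(source, rules))
print(ET.tostring(result, encoding="unicode"))
```

Note that the output element names ("contributor", "title") are fixed inside the rule table. This hardwiring is what the rest of this section identifies as the limitation for semantic mapping.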
Using XSLT
and the Xalan [33] XSLT processor we developed
XSL programs for transforming the ABC description above to DC, ID3 and
MPEG-7 descriptions, respectively. Appendix B
shows the resulting XSL files.
The
mapping implementations in Appendix B revealed
that although XSLT works well for the structural mapping from an event
model to a resource-centric model based on the set of rules described in
Section 5.1, it is inadequate for implementing flexible dynamic semantic
mappings between metadata vocabularies. This is due to:
- XSLT's limited capabilities for handling
variable input descriptions based on schemas which are not tightly
constrained;
- The absence of
machine-understandable semantic information in declarative
XML-encoded metadata descriptions;
- Processor-dependent handling of input
parameters and procedural code extensions;
- Limited string manipulation and
comparison functions, e.g. XSLT 1.0 provides no built-in
case-insensitive string comparison.
The mappings
revealed that if the input XML descriptions are relatively fixed and
tightly constrained, then the semantic mappings can be hardwired and XSLT
is adequate. But if the input descriptions are at all variable or
unpredictable (e.g. undefined domain-specific sub-classing and attributes)
then XSLT cannot cope. Cawsey
investigated the use of XSLT for customizing RDF descriptions, reaching
similar conclusions. [34]
Several
approaches to the semantic mapping problem are possible. The choice among
them is a trade-off between simplicity on the one
hand, and flexibility or scalability on the
other. The wider the targeted scope of interoperability, the more
difficult it is to achieve accurate, precise mappings. The approaches are
listed below in increasing order of both scope and difficulty:
1. Hardwire crosswalks between metadata
terms from specific metadata domains (easy, but only works for fixed
input);
2. Extract mappings from a pre-defined
multiple-domain mapping matrix;
3. Determine the semantic mappings from a
metadata term ontology;
4. Determine the semantic mappings from a
generic ontology such as WordNet;
5. Determine the semantic mappings from a
dynamically generated ontology created by using inferencing
to merge multiple domain-specific ontologies.
Once the
scope of the problem is reduced to interoperability between existing metadata
standards, the fully generic approaches (the fourth and fifth above) become
unnecessarily complex. Hence in the remainder of this paper we
investigate the less complex but still moderately flexible approaches (the
second and third), based on a mapping matrix and a metadata
term ontology, respectively.
7 Semantic Mapping via a Mapping Matrix
The second
approach in the list above involves linking a mapping matrix to the XSLT
processor. The mapping matrix explicitly defines the semantic mappings
between a fixed set of metadata vocabularies from a number of different
domains. Figure 3 illustrates such a mapping matrix. If XPath [35] is used to specify the elements, then to
some extent both the structural and semantic mappings can be defined.
(Mapping matrix illustration omitted)
This
approach has serious limitations, however. A matrix can only
specify fairly simple one-to-one
mappings, and a two-dimensional matrix will only work if the mappings are
symmetrical in both directions across all the domains. If the mappings
are asymmetrical then the matrix becomes highly
complex and multi-dimensional. Above all, this
approach does not scale: as the number of domains
grows and the mappings become asymmetrical, the matrix becomes
excessively complex and unwieldy.
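A small sketch makes both the idea and its limits concrete. The Python fragment below is illustrative only: the row layout, the function name map_term and the particular term pairings are assumptions chosen for the example (dc:contributor as the counterpart of abc:Agent follows Section 9; the ID3 frame identifiers TIT2 and TPE2 are used here merely as plausible column entries).

```python
# Each row is one concept; each column is one domain.
# None marks "no equivalent term in that domain".
MATRIX = [
    {"abc": "Title", "dc": "title", "id3": "TIT2"},
    {"abc": "Agent", "dc": "contributor", "id3": "TPE2"},
    {"abc": "Event", "dc": None, "id3": None},
]


def map_term(term, source, target):
    """Return the target-domain term in the same row as `term`, or None."""
    for row in MATRIX:
        if row.get(source) == term:
            return row.get(target)
    return None


print(map_term("Agent", "abc", "dc"))
print(map_term("TPE2", "id3", "abc"))
print(map_term("Event", "abc", "dc"))
```

Every lookup is a symmetric, one-to-one row match, so a single table of this kind cannot state that dc:creator maps to abc:Agent while abc:Agent maps back to dc:contributor. Supporting a further domain also means adding a column to every row.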
8 Development of MetaNet, a Metadata Term Thesaurus
Rather than
limiting the semantic mapping to a fixed number of domains/vocabularies
(i.e. the number of columns in the mapping matrix), a more generic
approach is to extract the mapping dynamically from a thesaurus of
metadata terms, generated by formally defining relationships between
metadata terms from a number of different domains' standardized
vocabularies.
8.1 Intrathesaurus and Interthesaurus Relations
The ISO 2788
standard for the identification and documentation of monolingual thesauri
[7] identifies the following types of intrathesaurus
relations:
- hierarchical
- associative
- equivalence
The
hierarchical relation occurs between concepts having
"broader/narrower" meanings. This can be further specialized
into the generic (BTG/NTG), whole-part (BTP/NTP) and instance (BT/NT)
relations. For the sake of simplicity, we have chosen only to model the
BTG/NTG relation (a common practice among thesauri developers) and the
equivalence relation, and not to include associative relations within MetaNet.
The
ISO 5964 standard for the documentation and establishment of multilingual
thesauri [8] identifies the following types of interthesaurus
relations:
- exact equivalence
- partial equivalence
- single to multiple equivalence
- inexact equivalence.
These relations
indicate that the semantic relations between terms from different
metadata vocabularies are likely to be much more complex than one-to-one
exact equivalence and that even "exact equivalence" will be an
approximation. However, because the scope of our problem is limited to
relations between terms in a number of standardized English metadata
vocabularies, we can expect more complex mappings
to be less frequent than in general natural language thesauri. For the first
draft of MetaNet, we decided only to consider
exact and partial equivalence relations and to combine them in the ET
relation which defines equivalent/overlapping terms. If two different
domains use two different metadata terms which are ETs
in our thesaurus then we make the assumption that the domains are
referring to semantically equivalent concepts.
Consequently the metadata term
thesaurus which we have developed, MetaNet
[4], contains only preferred terms (the ABC core vocabulary),
equivalent/overlapping terms (ET), narrower terms (NT) and broader terms
(BT), and attempts to encompass terms from the most significant and
widely-used metadata vocabularies (Dublin Core, IFLA, IEEE LOM, INDECS).
8.2 Description of MetaNet
The objective
of the MetaNet thesaurus is to provide
the semantic knowledge required to enable machine understanding of equivalence
and hierarchical (subtyping) relationships
between metadata terms from different domains. The scope of this
thesaurus is limited to the most significant metadata models/vocabularies
used for describing attributes and events associated with resources and
their life cycles. This encompasses metadata vocabularies from the
bibliographic, museum, archival, record keeping and rights management
communities. It has been developed by performing WordNet
[36] searches using the core terms from the ABC vocabulary, and
extracting those synonyms and hyponyms which could conceivably be used in
a metadata scheme to represent the original core term. In addition, the
results have been compared with the DC, INDECS, IFLA,
IMS and CIDOC CRM vocabularies to check that the majority of the terms
used in these metadata dictionaries have been incorporated into the
thesaurus.
A
machine-readable RDF Schema representation of this thesaurus has been
developed. [37] The RDF and RDF Schema elements Class, subClassOf, Property and subPropertyOf are used to define the
hierarchical/subtyping and entity/attribute
relationships between metadata elements. The RDFS label element is
used to specify semantically equivalent terms which may be used. The ABC
core vocabulary is used as the top-level set of preferred terms. Although
this thesaurus has been generated manually, it could conceivably be
generated automatically by using inferencing
mechanisms to merge RDF Schemas from different domains, as has been
proposed in the Ontology Inference Layer (OIL). [30]
For
example, consider "Agent", which is a core term of the ABC
vocabulary and hence a preferred term in the MetaNet
thesaurus. Semantically equivalent terms for "Agent", commonly
used within other metadata vocabularies, include:
actor, contributor, creator, player, doer, worker, performer
Possible
narrower terms or hyponyms for "Agent" include:
author, composer, artist, musician, etc.
Table 2 is
an excerpt from the RDF Schema which illustrates the representation for
the "Agent" metadata term as well as its equivalent terms and a
partial hierarchy of its narrower terms.
Table 2. Excerpt from the RDF Schema
<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Agent">
    <rdfs:comment xml:lang="en">The resources which contribute to or act in an event. Typically agents are people, groups of people, organisations or instruments.</rdfs:comment>
    <rdfs:label xml:lang="en">Actor</rdfs:label>
    <rdfs:label xml:lang="en">Contributor</rdfs:label>
    <rdfs:label xml:lang="en">Creator</rdfs:label>
    <rdfs:label xml:lang="en">Player</rdfs:label>
    <rdfs:label xml:lang="en">Doer</rdfs:label>
    <rdfs:label xml:lang="en">Worker</rdfs:label>
    <rdfs:label xml:lang="en">Performer</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Author">
    <rdfs:label xml:lang="en">Writer</rdfs:label>
    <rdfs:label xml:lang="en">Wordsmith</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Agent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Journalist">
    <rdfs:label xml:lang="en">Columnist</rdfs:label>
    <rdfs:label xml:lang="en">Reporter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Author"/>
  </rdfs:Class>
</rdf:RDF>
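To show how the relations encoded in such a schema can be recovered by a program, the following Python sketch reads a shortened copy of the excerpt and builds an in-memory table of equivalent (ET), broader (BT) and narrower (NT) terms. It is an illustration under stated assumptions, not the MetaNet implementation: the function name load_thesaurus and the dictionary layout are hypothetical, only local "#..." references are followed, and only direct (not transitive) broader and narrower terms are recorded.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Shortened copy of the Table 2 excerpt (fewer labels, same structure).
EXCERPT = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="Agent">
    <rdfs:label>Contributor</rdfs:label>
    <rdfs:label>Creator</rdfs:label>
    <rdfs:subClassOf rdf:resource="{RDFS}Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Author">
    <rdfs:label>Writer</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Agent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Journalist">
    <rdfs:label>Reporter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Author"/>
  </rdfs:Class>
</rdf:RDF>"""


def load_thesaurus(xml_text):
    """Return {term: {"ET": [...], "BT": [...], "NT": [...]}} from an RDF Schema string."""
    root = ET.fromstring(xml_text)
    classes = root.findall(f"{{{RDFS}}}Class")

    # First pass: every rdfs:Class is a term; its rdfs:label values are its ETs.
    thesaurus = {}
    for cls in classes:
        term = cls.get(f"{{{RDF}}}ID")
        labels = [label.text for label in cls.findall(f"{{{RDFS}}}label")]
        thesaurus[term] = {"ET": labels, "BT": [], "NT": []}

    # Second pass: rdfs:subClassOf gives the direct broader/narrower links.
    for cls in classes:
        term = cls.get(f"{{{RDF}}}ID")
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = parent.get(f"{{{RDF}}}resource", "")
            # Only local references such as "#Agent" are followed in this sketch.
            if target.startswith("#") and target[1:] in thesaurus:
                thesaurus[term]["BT"].append(target[1:])
                thesaurus[target[1:]]["NT"].append(term)
    return thesaurus


thesaurus = load_thesaurus(EXCERPT)
for term, relations in thesaurus.items():
    print(term, relations)
```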
A Web
search and browse interface to MetaNet has also
been developed. [4]
Users can search on any common metadata term and retrieve a list of
equivalent terms, broader terms and narrower terms. Figure 3 shows the
results of a search on the term "author".

Figure
3. Results of MetaNet search
9 Linking MetaNet to XSLT
Using XSLT it
is possible to parse an input XML description and for each element
encountered call a Java procedural code extension which determines the
equivalent term in the output domain from the semantic relationships specified in the MetaNet
thesaurus.
For
example, suppose the Java program, Mapping.java,
contains an extension function readMetaNet.
For each element encountered during parsing of the input metadata
description, the input element name (e.g. abc:Agent) and the output domain schema definition
(e.g. the Dublin Core schema) are passed to the readMetaNet
function. This function searches the MetaNet
RDF Schema file for an element in the output schema definition that is
equivalent to the input element name (e.g. dc:contributor), and returns this value. The XSLT processor then
creates a new output element with this name in the output description.
Figure 4 illustrates the program flowchart.

Figure
4. Program flow for metadata description mappings
The XSL code in Table 3 illustrates
how to call a Java extension function, readMetaNet,
from the main XSL file.
Table 3. XSL code to call a Java extension function, readMetaNet, from the main XSL file
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:mapping="Mapping"
    extension-element-prefixes="mapping">
  <lxslt:component prefix="mapping" elements="*" functions="readMetaNet">
    <lxslt:script lang="javaclass" src="Mapping"/>
  </lxslt:component>
  <xsl:template match="ABC">
    <xsl:apply-templates/>
  </xsl:template>
  <xsl:template match="*">
    <!-- The braces make the name an attribute value template, so the value
         returned by readMetaNet becomes the name of the output element. -->
    <xsl:element name="{mapping:readMetaNet(., 'dc')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Below is a high-level, simplified
algorithm describing the mapping process that is performed within the readMetaNet Java function in Figure 4.
Table 4. Algorithm describing the mapping process within the readMetaNet Java function in Figure 4
For each element in the input description {
    Search for the input element name in the output domain schema;
    if (found) {
        Map the input element to the equivalent output domain element;
    } else {
        Extract the Equivalent Terms (ETs) for the input element from MetaNet;
        Search the output domain schema for each of the ETs;
        if (an ET is found) {
            Map the input element to the equivalent output domain element;
        } else {
            Extract the broader terms (BTs) for the input element from MetaNet;
            Search for each BT in the output domain namespace;
            if (a BT is found) {
                Map the input element to the broader output domain element;
            } else {
                Extract the narrower terms (NTs) for the input element from MetaNet;
                Search for each NT in the output domain namespace;
                if (a NT is found) {
                    Map the input element to the narrower output domain element;
                }
            }
        }
    }
} endFor
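The fallback order in Table 4 (exact name, then ETs, then BTs, then NTs) can be expressed compactly in code. The sketch below is written in Python for brevity rather than in the Java of Mapping.java, and it is an assumption-laden illustration rather than the actual readMetaNet function: the name read_metanet, the in-memory thesaurus dictionary, the case-insensitive matching and the second return value naming the relation that matched are all choices made for this example. The thesaurus entries follow the Table 2 excerpt.

```python
def read_metanet(term, output_schema, thesaurus):
    """Return (output_term, relation) for `term`, or (None, None) if nothing matches."""
    # Index the output schema's element names for case-insensitive lookup.
    schema = {name.lower(): name for name in output_schema}

    def find(candidates):
        for candidate in candidates:
            if candidate.lower() in schema:
                return schema[candidate.lower()]
        return None

    entry = thesaurus.get(term, {})
    # Same order as Table 4: the element name itself, then ETs, BTs and NTs.
    steps = [("exact", [term])]
    steps += [(relation, entry.get(relation, [])) for relation in ("ET", "BT", "NT")]
    for relation, candidates in steps:
        match = find(candidates)
        if match is not None:
            return match, relation
    return None, None


# Direct relations only, taken from the Table 2 excerpt.
THESAURUS = {
    "Agent": {"ET": ["Actor", "Contributor", "Creator"], "BT": [], "NT": ["Author"]},
    "Author": {"ET": ["Writer", "Wordsmith"], "BT": ["Agent"], "NT": ["Journalist"]},
    "Journalist": {"ET": ["Columnist", "Reporter"], "BT": ["Author"], "NT": []},
}
DC = ["title", "creator", "contributor", "publisher", "date"]  # a few Dublin Core elements

print(read_metanet("Agent", DC, THESAURUS))
print(read_metanet("Journalist", DC, THESAURUS))
```

Returning the relation alongside the term lets a caller distinguish an equivalent mapping from a broader or narrower one, which matters because the latter two lose or assume information. With only direct relations recorded, "Journalist" finds no Dublin Core counterpart here; following broader terms transitively up to "Agent" would be one way to extend the search.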
10 Conclusions, Limitations and Future Work
10.1 Conclusions and Limitations
Our evaluation
of XSLT for mapping between metadata descriptions from different domains
revealed that although XSLT is good for syntactical and structural
mapping, semantic mappings need to be hardwired into the code. Flexible
semantic mapping is only possible with the assistance of semantic
knowledge bases provided by ontologies or
thesauri such as the MetaNet thesaurus
described above.
The MetaNet thesaurus described here is a first draft
English version, based on the vocabulary of the ABC model. Although it
has only been applied to a relatively small sample set, some of the
limitations of this thesaurus are already evident. These include its
inability to support metadata vocabularies which use:
- Tokens, e.g. ID3 tags such as TPE2, which
are semantically meaningless. This limitation can be overcome by
either explicitly including such tags in the thesaurus or searching
the definitions (rather than element names) in the output namespace
for the input element name or its semantically equivalent terms;
- Abbreviations, e.g. acc.no.;
- Qualifiers or hybrid words joined by a variety of
connectors, e.g. UserClass, Assistant
Editor, Art_Director, Time-span. This
problem can be solved to some extent by including "associated
terms" in the thesaurus and by ignoring typical
"connectors".
In addition,
the inherently ambiguous nature of language leads to the following
problems:
- Metadata terms with multiple possible
meanings, e.g. "condition" - this
could be the current state of an object or it could be a restriction
on the permissible use of a resource. This
can be overcome by the use of unambiguous metadata terms by schema
designers.
- Multiple possible spellings for the same
word, e.g. artefact/artifact, colour/color.
- This thesaurus is based on nouns, e.g.
"creator", "publisher", and does not search for
related verbs, adverbs, adjectives in various tenses which could be
used to express the same semantics, e.g. "created_by",
"published_by". This problem
could, to some extent, be overcome through the use of stemming.
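As a small illustration of the "ignoring typical connectors" remedy mentioned above, the following Python sketch splits an element name into its component words before any thesaurus lookup. It is only a sketch: the function name split_term and the choice of connectors (underscore, hyphen, dot, whitespace and a lower-to-upper case change) are assumptions, and it does not address abbreviations, tokens or stemming.

```python
import re


def split_term(name):
    """Split a metadata element name into lower-case words."""
    # Treat underscores, hyphens, dots and whitespace as connectors.
    spaced = re.sub(r"[_\-.\s]+", " ", name)
    # Insert a break at each lower-to-upper case change, e.g. "UserClass".
    spaced = re.sub(r"(?<=[a-z])(?=[A-Z])", " ", spaced)
    return spaced.lower().split()


for example in ["UserClass", "Assistant Editor", "Art_Director", "Time-span", "created_by"]:
    print(example, "->", split_term(example))
```

Each resulting word can then be looked up separately, so that "Art_Director" and "ArtDirector" lead to the same candidate terms.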
Currently only
English is supported. However, we believe that this thesaurus could be
extended to provide equivalent or overlapping terms for the ABC
vocabulary in other languages by following the recommendations specified
in ISO 5964. [8]
10.2 Future Work
So far the ABC
model has only been tested on a relatively small sample set. We intend
to carry out a more extensive evaluation of both the ABC model and the
hybrid mapping approach, by applying them to metadata transformations
between large sets of sample records provided by a number of different
CIMI [38]
member organisations. The plan is to build a testbed using multimedia museum resources and
metadata descriptions provided by CIMI members and to use this testbed to implement and evaluate metadata
interoperability between different museums' descriptions.
The
Harmony ABC model exhibits many similarities with the CIDOC Conceptual
Reference Model (CRM) [21], a
domain ontology developed by the CIDOC Committee of the International
Council of Museums. In the near future we plan to investigate the possible
merging of the Harmony model and the CIDOC CRM model into a single
ontology. We plan to use the CIMI testbed
described above to evaluate the "super" ontology resulting from
harmonization of these two models.
The
mapping implementations above have all involved mapping from an
event-aware metadata model to a resource-centric metadata model. We are
also interested in the rules and mechanisms required for machine
translations between metadata descriptions based on different event-aware
metadata models, e.g. from ABC to INDECS or CIDOC CRM.
We would
also like to investigate mapping between "application profiles"
or schemas which mix metadata elements imported from multiple different
namespaces. The test examples considered so far only present the problem
of mapping from a single domain's metadata description to another single
domain's metadata description, e.g. pure DC to pure MPEG-7. A situation
that will become increasingly common in the future is the need to map
from a schema which imports elements from multiple namespaces to another
schema which imports a different set of elements from multiple
namespaces. In addition, each schema may impose its own local
- structural constraints, e.g. parent/child
relationships
- cardinality/occurrence constraints
- datatyping, enumeration and formatting constraints
on the element values.
We believe the
approach proposed in this paper will support mapping between mixed-domain
"application profiles", but this needs to be tested through further
research involving machine translations between metadata descriptions
which conform to both complex local usage constraints (defined by XML
Schemas [39])
and namespace-specific semantic definitions (defined by RDF
Schemas).
Acknowledgements
The author
acknowledges the valuable contributions which discussions with Dan Brickley, Carl Lagoze,
Martin Doerr and Sigge
Lundberg have made to this work.
The work reported
in this paper has been funded by the Cooperative Research Centre for
Enterprise Distributed Systems Technology (DSTC) through the Australian
Federal Government's CRC Programme (Department
of Industry, Science and Resources).
References
[1] The Harmony Project Home Page http://www.ilrt.bris.ac.uk/discovery/harmony/
[2] C. Lagoze, J. Hunter and D. Brickley (2000) "An Event-Aware Model for
Metadata Interoperability". ECDL 2000, Lisbon,
September
[3] XSL Transformations (XSLT) Version 1.0 (1999) W3C
Recommendation, 16 November http://www.w3.org/TR/xslt.html
[4] MetaNet Search Page http://sunspot.dstc.edu.au:8888/Metanet/Top.html#
[5] RDF Schema Specification 1.0 (2000) W3C Candidate
Recommendation, 27 March http://www.w3.org/TR/rdf-schema/
[6] Dublin Core/MARC/GILS Crosswalk (1999) November http://lcweb.loc.gov/marc/dccross.html
[7] ISO 2788 (1986) Documentation -- Guidelines for the
Development and Establishment of Monolingual Thesauri
[8] ISO 5964 (1985) Documentation -- Guidelines for the
Development and Establishment of Multilingual Thesauri
[9] Library of Congress Subject Headings, Cataloging
Distribution Service, Library of Congress http://lcweb.loc.gov/cds/lcsh.html
[10] Medical Subject Headings home page http://www.nlm.nih.gov/mesh/meshhome.html
[11] Art and Architecture Thesaurus Browser, Getty Research
Institute http://shiva.pub.getty.edu/aat_browser/
[12] C. Paice (1991) "A Thesaural Model of Information Retrieval". Information
Processing and Management, 27(5):433-447
[13] A.R. Aronson (1994) "Exploiting a Large Thesaurus for Information
Retrieval". RIAO 94, New York, October
[14] W. Bruce Croft and J. Yufeng (1994) "An Association Thesaurus for
Information Retrieval". RIAO 94, New York, October
[15] Dublin Core Metadata Initiative http://purl.org/dc/
[16] MARC Standards, Library of Congress Network Development
and MARC Standards Office http://lcweb.loc.gov/marc/marc.html
[17] G. Rust and M. Bide (1999) "The indecs Metadata Schema Building
Blocks". Indecs Metadata Model, November http://www.indecs.org/pdf/schema.pdf
[18] MPEG-7 Home Page http://www.darmstadt.gmd.de/mobile/MPEG7/index.html/
[19] Content Standard for Digital Geospatial Metadata (CSDGM) http://www.fgdc.gov/metadata/contstan.html
[20] IEEE Learning Technology Standards Committee's Learning
Object Meta-data Working Group, Approved Working Draft WD5 Learning
Object Meta-data Scheme http://ltsc.ieee.org/wg12/
[21] ICOM/CIDOC Documentation Standards Group (1999) Revised Definition
of the CIDOC Conceptual Reference Model, September http://www.geneva-city.ch/musinfo/cidoc/oomodel
[22] C. Batini, M. Lenzerini and S.B. Navathe (1986) "A comparative
analysis of methodologies for database schema integration". ACM Computing
Surveys, 18(4):323-364, December
[23] E. Mena, V. Kashyap, A. Sheth and A. Illarramendi (1996) "OBSERVER:
An Approach for Query Processing in Global Information Systems based on
Interoperation across Pre-existing Ontologies". Proceedings of the 1st
IFCIS International Conference on Cooperative Information Systems
(CoopIS'96), Brussels, Belgium, June (IEEE Computer Society Press)
[24] R. Bayardo, et al. (1997) "InfoSleuth: Agent-based Semantic
Integration of Information in Open and Dynamic Environments". Proceedings
of ACM SIGMOD Conference on Management of Data, Tucson, Arizona, May,
pp. 195-206
[25] N. Guarino, C. Masolo and G. Vetere (1999) "Ontoseek: Content-based
Access to the Web". IEEE Intelligent Systems, Vol. 14, No. 3, May/June,
pp. 70-80
[26] H. Mili and R. Rada (1988) "Merging Thesauri: Principles and
Evaluation". IEEE Transactions on Pattern Analysis and Machine
Intelligence, 10(2):204-220
[27] M. Doerr and I. Fundulaki (1998) "A proposal on extended
interthesaurus links semantics". Technical Report TR-215, Institute of
Computer Science-FORTH, March
[28] M. Sintichakis and P. Constantopoulos (1997) "A Method for
Monolingual Thesauri Merging". Proceedings of the 20th ACM International
Conference on Research and Development in Information Retrieval (ACM
SIGIR), Philadelphia, PA, USA, July
[29] B. Amann and I. Fundulaki (1999) "Integrating Ontologies and Thesauri
to Build RDF Schemas". In ECDL'99: Research and Advanced Technologies for
Digital Libraries, Paris, France, September, Lecture Notes in Computer
Science (Springer-Verlag), pp. 234-253
[30] Ontology Inference Layer http://www.ontoknowledge.org/oil/
[31] International Federation of Library Associations and
Institutions (IFLA) (1998) Functional Requirements for Bibliographic
Records, March http://www.ifla.org/VII/s13/frbr/frbr.pdf
[32] ID3 Tag Version 2.3.0 http://www.id3.org/id3v2.3.0.html
[33] Xalan-Java Overview http://xml.apache.org/xalan/overview.html
[34] A. Cawsey (2000) "Presenting tailored resource descriptions: Will
XSLT do the job?". WWW9 conference, Amsterdam, May http://www.cee.hw.ac.uk/~alison/www9/paper.html
[35] XML Path Language (XPath) Version 1.0 (1999) November http://www.w3.org/TR/xpath
[36] WordNet - a Lexical Database for English http://www.cogsci.princeton.edu/~wn/online/
[37] RDF Schema Representation of the MetaNet Thesaurus (2000) October http://archive.dstc.edu.au/maenad/metanet.rdf
[38] CIMI Consortium for Interchange of Museum Information http://www.cimi.org/
[39] XML Schema Language http://www.w3.org/XML/Schema
Appendix A
A.1 ABC Description of Example Resource
<?xml version="1.0"?>
<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>20:00</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio821"/>
    <Rights> Lincoln Center for Performing Arts </Rights>
  </Event>
  <Resource id="comp523">
    <Type>Musical Score</Type>
    <Title>Concerto for Violin</Title>
  </Resource>
  <Resource id="audio821">
    <Type>audio</Type>
    <Format>MP3</Format>
    <Length units="mins"> 130 </Length>
  </Resource>
</ABC>
A.2 Simple Resource-centric Description of Example Resource
<?xml version="1.0"?>
<Resource id="audio821">
  <Title>Live At Lincoln Center</Title>
  <Date.Performance>1998-07-04</Date.Performance>
  <Time.Performance>20:00</Time.Performance>
  <Place.Performance>Lincoln Centre</Place.Performance>
  <Agent.Orchestra>New York Philharmonic</Agent.Orchestra>
  <Relation.isPerformanceOf>comp523</Relation.isPerformanceOf>
  <Description>Performance of 'Concerto for Violin'</Description>
  <Rights> Lincoln Center for Performing Arts </Rights>
  <Type>audio</Type>
  <Format>MP3</Format>
  <Length units="mins"> 130 </Length>
</Resource>
A.3 Dublin Core Description of Simple Example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="audio821">
    <dc:Title>Live At Lincoln Center</dc:Title>
    <dc:Date.Performance>1998-07-04T20:00-05:00</dc:Date.Performance>
    <dc:Coverage>Lincoln Centre</dc:Coverage>
    <dc:Contributor.Orchestra>New York Philharmonic</dc:Contributor.Orchestra>
    <dc:Relation.isPerformanceOf>comp523</dc:Relation.isPerformanceOf>
    <dc:Description>Performance of 'Concerto for Violin'</dc:Description>
    <dc:Rights> Lincoln Center for Performing Arts </dc:Rights>
    <dc:Type>audio</dc:Type>
    <dc:Format>MP3</dc:Format>
  </rdf:Description>
</rdf:RDF>
A.4 MPEG-7 Description
<?xml version="1.0"?>
<MPEG-7 id="audio821">
  <CreationMetaInformation>
    <Creation>
      <Title>Live at Lincoln Center</Title>
      <Creator>
        <Role>Orchestra</Role>
        <Name>New York Philharmonic</Name>
      </Creator>
      <CreationDate>
        <day>7</day>
        <month>4</month>
        <year>1998</year>
      </CreationDate>
      <Location>
        <PlaceName>Lincoln Center</PlaceName>
      </Location>
    </Creation>
    <Classification>
      <Genre>Performance</Genre>
    </Classification>
  </CreationMetaInformation>
  <MediaInformation>
    <MediaProfile>
      <MediaFormat>
        <Medium>MP3</Medium>
        <Length><m>130</m></Length>
      </MediaFormat>
    </MediaProfile>
  </MediaInformation>
  <UsageMetaInformation>
    <Rights>
      <RightsId IdOrganization='Lincoln Center'/>
    </Rights>
  </UsageMetaInformation>
</MPEG-7>
A.5 ID3 Description
<?xml version="1.0"?>
<ID3>
  <!-- Unique Identifier -->
  <UFID>audio821</UFID>
  <!-- Title -->
  <TIT2>Live At Lincoln Center</TIT2>
  <!-- Orchestra -->
  <TPE2>New York Philharmonic</TPE2>
  <!-- Type or Genre -->
  <TCON>Performance</TCON>
  <!-- Media Type sound originated from-->
  <TMED>Audio/MP3</TMED>
  <!-- Date Recorded -->
  <TDAT>7/4/98</TDAT>
  <!-- Time Recorded -->
  <TIME>2100</TIME>
  <!-- Length in millisecs-->
  <TLEN>7800000</TLEN>
  <!-- Original recording or source -->
  <TOAL>comp523</TOAL>
  <!-- Copyright Message -->
  <TCOP>Lincoln Center of Performing Arts</TCOP>
</ID3>
Appendix B
B.1 XSL for Transforming from ABC to DC
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ABC">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <xsl:apply-templates select="Event"/>
        <xsl:apply-templates select="Resource"/>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>

  <!-- Output is processed first so that the "about" attribute is added
       to rdf:Description before any child elements are created -->
  <xsl:template match="Event">
    <xsl:apply-templates select="Output"/>
    <xsl:apply-templates select="Context"/>
    <xsl:apply-templates select="Act"/>
    <xsl:apply-templates select="Input"/>
    <xsl:apply-templates select="Title"/>
    <xsl:apply-templates select="Rights"/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:attribute name="about">
      <xsl:value-of select="@id"/>
    </xsl:attribute>
    <xsl:copy-of select="*"/>
  </xsl:template>

  <xsl:template match="Context">
    <xsl:apply-templates select="Date"/>
    <xsl:apply-templates select="Place"/>
  </xsl:template>

  <xsl:template match="Date">
    <xsl:element name="dc:Date.{../../@Type}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Place">
    <xsl:element name="dc:Coverage">
      <xsl:value-of select='.'/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Act">
    <xsl:element name="dc:Contributor.{Role}">
      <xsl:value-of select='Agent'/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Input">
    <xsl:element name="dc:Relation.is{../@Type}Of">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="dc:Title">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Rights">
    <xsl:element name="dc:Rights">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Resource">
    <!-- The resource produced by the event supplies Type and Format -->
    <xsl:if test="@id=../Event/Output/@id">
      <xsl:apply-templates select="Type"/>
      <xsl:apply-templates select="Format"/>
    </xsl:if>
    <!-- The resource consumed by the event supplies the Description -->
    <xsl:if test="@id=../Event/Input/@id">
      <xsl:element name="dc:Description">
        <xsl:value-of select="../Event/@Type"/>
        <xsl:text> of "</xsl:text>
        <xsl:value-of select="Title"/>
        <xsl:text>"</xsl:text>
      </xsl:element>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Type">
    <xsl:element name="dc:Type">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="dc:Format">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
B.2 XSL for Transforming from ABC to ID3
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:id3="http://www.id3.org/id3v2.3.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ABC">
    <id3:ID3 xmlns:id3="http://www.id3.org/id3v2.3.0">
      <xsl:apply-templates select="Event"/>
      <xsl:apply-templates select="Resource"/>
    </id3:ID3>
  </xsl:template>

  <xsl:template match="Event">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:element name="id3:UFID">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Context">
    <id3:TDAT>
      <xsl:value-of select="Date"/>
    </id3:TDAT>
    <id3:TIME>
      <xsl:value-of select="Time"/>
    </id3:TIME>
  </xsl:template>

  <xsl:template match="Act">
    <xsl:element name="id3:TPE2">
      <xsl:value-of select='Agent'/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Input">
    <xsl:element name="id3:TOAL">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="id3:TIT2">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Rights">
    <xsl:element name="id3:TCOP">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Resource">
    <xsl:if test="@id=../Event/Output/@id">
      <xsl:apply-templates select="Format"/>
      <xsl:apply-templates select="Length"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="id3:TMED">
      <xsl:value-of select="."/>/<xsl:value-of select="../Type"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Length">
    <xsl:element name="id3:TLEN">
      <xsl:value-of select=".*60*1000"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
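For readers without an XSLT processor to hand, the field-by-field mapping performed by the B.2 stylesheet can be approximated in a few lines of Python. This is only an illustrative sketch using the standard library: the function name abc_to_id3 and the dictionary result are our own assumptions, and the input is a trimmed copy of the A.1 description.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the ABC description from Appendix A.1.
ABC = """<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context><Date>7/4/98</Date><Time>20:00</Time></Context>
    <Act id="Act1"><Agent>New York Philharmonic</Agent><Role>Orchestra</Role></Act>
    <Input id="comp523"/>
    <Output id="audio821"/>
  </Event>
  <Resource id="audio821">
    <Type>audio</Type><Format>MP3</Format><Length units="mins"> 130 </Length>
  </Resource>
</ABC>"""

def abc_to_id3(abc_xml):
    """Hypothetical helper: map an ABC description to ID3 frame values."""
    root = ET.fromstring(abc_xml)
    event = root.find("Event")
    out_id = event.find("Output").get("id")
    # Only the resource that is the event's Output contributes Format and Length.
    resource = next(r for r in root.findall("Resource") if r.get("id") == out_id)
    minutes = float(resource.findtext("Length"))
    return {
        "UFID": out_id,
        "TIT2": event.findtext("Title"),
        "TPE2": event.findtext("Act/Agent"),
        "TDAT": event.findtext("Context/Date"),
        "TIME": event.findtext("Context/Time"),
        "TOAL": event.find("Input").get("id"),
        "TMED": f'{resource.findtext("Format")}/{resource.findtext("Type")}',
        # Minutes to milliseconds, as in the stylesheet's ".*60*1000".
        "TLEN": str(int(minutes * 60 * 1000)),
    }

tags = abc_to_id3(ABC)
print(tags["TLEN"])  # 7800000
```

The length conversion is the only computed value: 130 minutes x 60 x 1000 gives the 7800000 milliseconds shown in A.5. All other frames are copied unchanged, which is why the date and time keep their ABC formatting.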
B.3 XSL for Transforming ABC to MPEG-7
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mpeg7="http://www.mpeg7.org/2000/MPEG7_schema/">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ABC">
    <MPEG-7>
      <xsl:apply-templates select="Event"/>
      <xsl:apply-templates select="Resource"/>
    </MPEG-7>
  </xsl:template>

  <xsl:template match="Event">
    <xsl:apply-templates select="Output"/>
    <CreationMetaInformation>
      <Creation>
        <xsl:apply-templates select="Title"/>
        <xsl:apply-templates select="Act"/>
        <xsl:apply-templates select="Context"/>
        <xsl:apply-templates select="Input"/>
      </Creation>
      <Classification>
        <xsl:element name="Genre">
          <xsl:value-of select="@Type"/>
        </xsl:element>
      </Classification>
    </CreationMetaInformation>
    <xsl:apply-templates select="Rights"/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:attribute name="id">
      <xsl:value-of select="@id"/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="Title">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Act">
    <Creator>
      <xsl:element name="Role">
        <xsl:value-of select='Role'/>
      </xsl:element>
      <xsl:element name="Name">
        <xsl:value-of select='Agent'/>
      </xsl:element>
    </Creator>
  </xsl:template>

  <xsl:template match="Context">
    <CreationDate>
      <xsl:variable name="date" select='Date'/>
      <xsl:variable name="my" select="substring-after($date,'/')"/>
      <xsl:element name="day">
        <xsl:value-of select="substring-before($date,'/')"/>
      </xsl:element>
      <xsl:element name="month">
        <xsl:value-of select="substring-before($my,'/')"/>
      </xsl:element>
      <xsl:element name="year">
        <xsl:value-of select="substring-after($my,'/')"/>
      </xsl:element>
    </CreationDate>
    <Location>
      <xsl:element name="PlaceName">
        <xsl:value-of select='Place'/>
      </xsl:element>
    </Location>
  </xsl:template>

  <xsl:template match="Input"/>

  <xsl:template match="Rights">
    <UsageMetaInformation>
      <Rights>
        <xsl:element name="RightsId">
          <xsl:value-of select="."/>
        </xsl:element>
      </Rights>
    </UsageMetaInformation>
  </xsl:template>

  <xsl:template match="Resource">
    <xsl:if test="@id=../Event/Output/@id">
      <MediaInformation>
        <MediaProfile>
          <MediaFormat>
            <xsl:apply-templates select="Format"/>
            <xsl:apply-templates select="Length"/>
          </MediaFormat>
        </MediaProfile>
      </MediaInformation>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="Medium">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Length">
    <Length>
      <xsl:if test="@units='mins'">
        <xsl:element name="m">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:if>
    </Length>
  </xsl:template>
</xsl:stylesheet>
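The Context template in B.3 splits the ABC date purely by string position, using the XPath functions substring-before and substring-after. The Python sketch below mimics those two functions to show what the split yields for the example date. The helper names are our own, and the behaviour for a missing separator (an empty string) follows the XPath 1.0 definition.

```python
def substring_before(text, sep):
    """Mimic XPath substring-before: empty string if sep is absent."""
    head, found, _ = text.partition(sep)
    return head if found else ""

def substring_after(text, sep):
    """Mimic XPath substring-after: empty string if sep is absent."""
    _, found, tail = text.partition(sep)
    return tail if found else ""

def split_date(date):
    """Split a slash-separated date the way the Context template does."""
    my = substring_after(date, "/")  # everything after the first slash
    return {
        "day": substring_before(date, "/"),
        "month": substring_before(my, "/"),
        "year": substring_after(my, "/"),
    }

parts = split_date("7/4/98")
print(parts)  # {'day': '7', 'month': '4', 'year': '98'}
```

Two points follow from this sketch. The first field is always treated as the day, so the result depends on the field order used by the source description. The year is also copied as written (98), so producing a four-digit year such as the 1998 shown in A.4 would need an additional rule that the stylesheet does not contain.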
|