1 Introduction
Networked
knowledge organization systems typically contain objects of mixed media
types which are described using a multitude of diverse metadata schemas.
Hence machine understanding of metadata descriptions which conform to
schemas from different domains is a fundamental requirement for access to
information within networked knowledge organization systems. In
particular, there are three main scenarios in which interoperability
among metadata descriptions is required:
- To enable a single search interface
across heterogeneous metadata descriptions;
- To enable the integration or merging of
descriptions which are based on complementary but possibly
overlapping metadata schemas or standards;
- To enable different views of the one
underlying and complete metadata description, depending on the
user's particular interest, perspective or requirements.
Metadata
descriptions from different domains are not semantically distinct but
overlap and relate to each other in complex ways. Achieving
interoperability between such metadata descriptions via
manually-generated one-to-one crosswalks [6] is useful, but this approach
does not scale to the many metadata vocabularies that will develop. A
more scalable and cost-effective approach is to exploit the fact that
many entities and relationships - for example, people, places, creations,
organisations, events, etc. - occur across all of the domains. The Harmony
project [1] has been investigating this more general approach towards
metadata interoperability and in the process has developed the ABC model
and vocabulary [2].
The
hypothesis is that such an approach will lead to more efficient, scalable
machine-translations between heterogeneous metadata descriptions. To test
this hypothesis and to evaluate the interoperability capabilities of the
ABC model, we applied it to some real multimedia examples and analyzed
the results of mapping from the ABC model to various different metadata
domains using XSLT [3]. This work revealed serious limitations in XSLT's
ability to support flexible dynamic semantic mapping. To overcome this,
we developed MetaNet [4], a metadata term thesaurus which provides the
additional semantic knowledge that is non-existent within declarative
XML-encoded metadata descriptions.
This paper
describes the optimum metadata mapping approach determined from applying
the ABC model to a small test set of multimedia examples. This approach
combines:
- the ABC event-aware metadata model,
developed within the Harmony project, as the underlying model for
scalable generic mappings between domain-specific vocabularies,
with;
- XSLT for parsing XML descriptions and
performing structural and syntactic mapping, and;
- MetaNet, a metadata term thesaurus, to
provide the semantic knowledge required to enable semantic mapping
between metadata terms from different domains or standards.
2 Definitions of Terms
This section
defines the key terms used throughout the remainder of the paper:
- Metadata - data about data - or more commonly
"descriptive information about Web resources". The use of
standardized descriptive metadata can substantially improve the
discovery and retrieval of relevant networked resources. Different communities
or domains have developed their own standardized metadata
vocabularies to meet their specific needs.
- Vocabularies - shared terminologies with commonly
agreed-upon semantics for a domain. Common vocabularies enable
search engines, agents, authors and users to communicate within a
domain.
- Schemas - provide a standard way of defining standard
domain-specific vocabularies by defining a common set of elements,
their semantics and the relationships between the elements.
- Ontology - a formal description of the concepts,
roles and relationships that exist for an agent or community of
agents. Ontologies provide a shared and common understanding of a
domain that can be communicated across people and applications, and
play a major role in supporting information exchange and discovery.
- Thesaurus - the vocabulary of a controlled
indexing language, formally organized so that the a priori
relationships between concepts (for example "broader" and
"narrower") are made explicit. [7]
- Metadata Thesaurus - a thesaurus (defined according to ISO
2788 standard for monolingual thesauri [7]) which defines the
relationships between metadata terms from different domain
vocabularies.
3 Related Work
Thesauri have
been used to improve the precision and recall of information retrieval
systems for over 30 years. The introduction of automated information
retrieval has caused a dramatic increase in the demand for vocabulary
control, particularly in the last decade. Examples of well known thesauri
used to provide authority control over the terms used for indexing
documents in the bibliographic, medical and cultural domains respectively
are: the Library of Congress Subject Headings (LCSH) [9], the Medical
Subject Headings (MeSH) [10] and the Art and Architecture Thesaurus
(AAT). [11] In addition, thesauri have been used within information
retrieval systems to improve retrieval effectiveness by providing
semantic roadmaps. [12], [13], [14]
Since the
emergence of the Internet, a great deal of effort has been invested in
the development of metadata vocabularies to enable the exchange and
discovery of information across different applications and domains.
Metadata vocabularies such as Dublin Core [15], USMARC [16], INDECS [17],
MPEG-7 [18], FGDC [19], IEEE LOM [20] and CIDOC CRM [21] provide
standardized sets of descriptive elements to enable the exchange of
resources for specific applications or domains. Although these standards
enable interoperability within domains, they introduce the problem of
incompatibility between disparate and heterogeneous metadata descriptions
or schemas across domains.
A
literature survey reveals many different proposals for improving
interoperability between domain-specific vocabularies, thesauri and
ontologies in the context of information retrieval and exchange. These
range from database schema integration [22], to the use of ontologies in
organizing and integrating networked information systems (e.g. OBSERVER
[23], InfoSleuth [24], OntoSeek [25]) to the merging of monolingual [26]
and multilingual thesauri. [8] Two of the major research issues have been
categorizing the complex kinds of interthesaurus semantic relationships
which exist [27] and automating the detection of these relationships
during the merging process. [28]
More
recently the approach to merging thesauri has been to represent them
formally using RDF Schemas [29] and to use inference engines to automate
the merging - such as has been proposed in the Ontology Inference Layer
(OIL). [30]
In this
paper we are not so much concerned with the specific process by which
MetaNet is generated or with expressing the complete set of possible term
relations (as described in ISO 2788) in MetaNet. Our primary objective is
to generate a thesaurus which specifies an (albeit simplified) set of
semantic relationships between metadata terms from a number of different
domain schemas relative to the ABC underlying vocabulary (the preferred
terms) and hence also to each other. Our goal is then to demonstrate how
this semantic knowledge can be represented in a machine-readable format
(RDF Schema) and extracted and combined with the syntactic and structural
mapping capabilities of XSLT to enable the implementation of flexible
dynamic mappings between metadata descriptions from different domains.
4 Overview of the ABC Underlying Metadata Model
The Harmony
Project [1] is investigating a generic approach to metadata
interoperability through the development of an event-aware metadata
model. The ABC model [2] defines a set of fundamental classes which
provide the building blocks for expression (through sub-classing) of
application-specific or domain-specific metadata vocabularies. The base
classes, shown below, were determined by analysing commonalities between
different communities' metadata models (including: Dublin Core [15];
INDECS [17]; MPEG-7 [18]; CIDOC CRM [21]; IFLA [31].)
- Resources
- Events
- Inputs and Outputs
- Acts
- Context
- Event Relations
ABC adopts an
event-aware view for modeling the relationship between the various
manifestations of a creation. This event-aware view provides semantically
clear attachment points for the association of properties among the
various manifestations, events and contributors (agents) involved in a
resource's lifecycle. In addition, ABC provides a multiple views
philosophy for metadata modeling and recipes for inter-conversion between
those views. If life-cycle information is required, the event model can
be used. When single resource metadata is needed, a resource-centric
state model is used. Figure 1 shows the UML representation for the ABC
metadata model.
Figure 1. UML representation of the ABC metadata model (figure omitted)
5 A Simple Example
To test the ABC
model and evaluate XSLT for metadata mapping, we considered the following
simple illustrative example:
"A
resource which is a 130 min audio (MP3) recording of a 'Live at Lincoln Center'
performance. The Orchestra is the New York Philharmonic. The performance
was on April 7, 1998 at 8 pm Eastern Time. The musical score performed is
'Concerto for Violin'. Copyright for the entire performance is held by
Lincoln Center for the Performing Arts."
First we
describe this resource using the ABC model. We then attempt to map from
the ABC description to Dublin Core, MPEG-7 and ID3 [32] descriptions
respectively, using XSLT. Figure 2 illustrates the two steps involved in
mapping from the ABC metadata model to resource-centric models such as
Dublin Core, MPEG-7 and ID3:
- The structural mapping step involves
transferring event properties to the output resource and creating a
relationship between the output and input resources associated with
the event.
- The semantic mapping step involves
mapping the properties attached to the output resource to semantically-equivalent
properties in the output domain.
Appendix A contains the corresponding ABC,
Resource-centric, Dublin Core, ID3 and MPEG-7 descriptions.

Figure 2. Transformation from the ABC event-aware model to three different
resource-centric models
5.1 Structural Mapping Rules
For events
which generate an output resource from an input resource, the
transformation from an event-aware metadata model to a simple
resource-centric metadata model consists of the following steps:
- The Date, Time and Place properties
within the Event's Context node can be qualified using the Event
Type and transferred to the target output resource, e.g.
Date.Performance, Time.Performance, Place.Performance;
- The Role property of each Act associated
with an event becomes a qualifier on the Agent property which is
attached to the target output resource and its value is the Act's
Agent Name, e.g. the Agent.Orchestra property has value "New
York Philharmonic";
- A Relation property arc is generated from
the event type (e.g. Performance -> Relation.isPerformanceOf) and
is attached to the target output resource. The value of this
property is the patient input resource of the event (e.g.
"comp523").
- All other existing properties of the
input and output resource remain the same.
Other
inheritance and metadata derivation rules may be possible but these
require further investigation.
For example, a Description property for the output resource can be
generated from the Event Type and the input resource's Title e.g.
"Performance of 'Concerto for Violin'". Or in many cases, the Title
property can be inherited by the output resource directly from the Title
property of either the input resource or the event.
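The four structural rules listed in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the XSLT implementation used in the paper: the class names `Act` and `Event`, the function `flatten_event`, the date and time encodings and the `Format` value are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Act:
    role: str        # e.g. "Orchestra"
    agent_name: str  # e.g. "New York Philharmonic"


@dataclass
class Event:
    event_type: str                           # e.g. "Performance"
    context: dict                             # Date, Time and Place of the event
    acts: list = field(default_factory=list)  # Acts associated with the event
    input_id: str = ""                        # identifier of the input resource


def flatten_event(event, output_props):
    """Apply the structural mapping rules to one event and its output resource."""
    # Rule 4: existing properties of the output resource are kept unchanged.
    result = dict(output_props)

    # Rule 1: qualify the context properties with the event type.
    for name in ("Date", "Time", "Place"):
        if name in event.context:
            result[f"{name}.{event.event_type}"] = event.context[name]

    # Rule 2: the role of each Act becomes a qualifier on an Agent property.
    for act in event.acts:
        result[f"Agent.{act.role}"] = act.agent_name

    # Rule 3: a Relation arc named after the event type points at the input.
    if event.input_id:
        result[f"Relation.is{event.event_type}Of"] = event.input_id

    return result


# Illustrative values loosely based on the example in Section 5.
performance = Event(
    event_type="Performance",
    context={"Date": "1998-04-07", "Time": "20:00"},
    acts=[Act("Orchestra", "New York Philharmonic")],
    input_id="comp523",
)
record = flatten_event(performance, {"Format": "audio/mpeg"})
```

The result is a flat, resource-centric property set with keys such as `Date.Performance`, `Agent.Orchestra` and `Relation.isPerformanceOf`, which is the input expected by the semantic mapping step.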
6 An Evaluation of XSLT for Metadata Mappings
The Extensible
Stylesheet Language (XSL) [3] consists of a transformation language (XSLT) and
a formatting language. The transformation language XSLT (which acts
independently of the formatting language) provides elements that define
rules for how one XML document is transformed into another XML document.
The transformed XML document may use the markup and DTD of the original
document or it may use a completely different set of tags. The ability of
XSLT to transform data from one XML representation to another makes it
appear to be ideal for metadata interchange applications.
An XSL
document contains a list of templates and rules. A template rule has a
pattern specifying the trees it applies to and a template to be output
when the pattern is matched. When an XSL processor formats an XML
document using an XSL style sheet, it scans the XML document tree looking
through each sub-tree in turn. As each tree in the XML document is read,
the processor compares it with the pattern of each template rule in the
style sheet. When the processor finds a tree that matches a template
rule's pattern, it outputs the rule's template. This template generally
includes some markup, some new data and some data copied out of the tree
from the original XML document.
Using XSLT
and the Xalan [33] XSLT processor we developed XSL programs for
transforming the ABC description above to DC, ID3 and MPEG-7
descriptions, respectively. Appendix B shows
the resulting XSL files.
The
mapping implementations in Appendix B revealed
that although XSLT works well for the structural mapping from an event
model to a resource-centric model based on the set of rules described in
Section 5.1, it is inadequate for implementing flexible dynamic semantic
mappings between metadata vocabularies. This is due to:
- XSLT's limited capabilities for handling
variable input descriptions based on schemas which are not tightly
constrained;
- The non-existence of
machine-understandable semantic information in declarative
XML-encoded metadata descriptions;
- Processor-dependent handling of input
parameters and procedural code extensions;
- Limited string manipulation and
comparison functions, e.g. XSLT 1.0 provides no built-in
case-insensitive string comparison.
The mappings
revealed that if the input XML descriptions are relatively fixed and
tightly constrained, then the semantic mappings can be hardwired and XSLT
is adequate. But if the input descriptions are at all variable or
unpredictable (e.g. undefined domain specific sub-classing and attributes)
then XSL simply cannot cope. Cawsey investigated the use of XSLT for
customizing RDF descriptions, reaching similar conclusions. [34]
A number of
approaches to the semantic mapping problem are possible.
The approach chosen is a balance between simplicity on the one
hand, and flexibility or scalability on the other. The wider the targeted
scope of interoperability, the more difficult it is to achieve accurate,
precise mappings. The approaches are listed below in increasing order
of both scope and difficulty:
1. Hardwire crosswalks between metadata
terms from specific metadata domains (easy, but only works for fixed
input);
2. Extract mappings from a pre-defined
multiple-domain mapping matrix;
3. Determine the semantic mappings from a
metadata term ontology;
4. Determine the semantic mappings from a
generic ontology such as WordNet;
5. Determine the semantic mappings from a
dynamically generated ontology created by using inferencing to merge
multiple domain-specific ontologies.
If the
scope of the problem is reduced to interoperability between existing metadata
standards, the fully generic approaches (e.g., 4 and 5 above) become
unnecessarily complex. Hence in the remainder of this paper we
investigate the less complex but still moderately flexible approaches (2
and 3) based on a mapping matrix and a metadata term ontology,
respectively.
7 Semantic Mapping via a Mapping Matrix
The second
approach in the list above involves linking a mapping matrix to the XSLT
processor. The mapping matrix explicitly defines the semantic mappings
between a fixed set of metadata vocabularies from a number of different
domains. Table 1 illustrates such a mapping matrix. If XPath [35] is
used to specify the elements, then to some extent both the structural and
semantic mappings can be defined.
Table 1. Metadata mapping matrix

| ABC Element | DC Element | ID3 Element | MPEG-7 Path |
|---|---|---|---|
| Resource/Title | Title | TIT2 | CreationMetaInformation/Creation/Title/TitleText (@TitleType="original") |
| Event/Act/Agent | Creator | TPE1 | CreationMetaInformation/Creation/Creator (@role="creator") |
| Event/Act/Agent | Publisher | TPUB | UsageMetaInformation/Publication/Publisher |
| Event/Act/Agent | Contributor | IPLS (Involved People List), TCOM (Composer), TENC (Encoder), TEXT (Lyricist), TOLY (Original Lyricist), TOPE (Original Artist), TPE2 (Band, Orchestra, Accompaniment), TPE3 (Conductor), TPE4 (Interpreter, Remixer, Modifier) | CreationMetaInformation/Creation/Creator (@role) |
| Resource/Subject | Subject | TIT1 | CreationMetaInformation/Creation/Classification/PackagedType |
| Resource/Description | Description | TIT3 | CreationMetaInformation/Creation/CreationDescription |
| Event/Context/Date | Date.Creation | - | CreationMetaInformation/Creation/CreationDate |
| Event/Context/Date | Date.Publication | - | UsageMetaInformation/Publication/PublicationDate |
| Event/Context/Date | Date.Recording | TRDA | - |
| Resource/Type | Type | TCON | CreationMetaInformation/Classification/Genre |
| Resource/Format | Format | TFLT | MediaInformation.MediaProfile/MediaFormat/FileFormat |
| Resource/Format | Format.length | TLEN | - |
| Resource/Format | Format.size | TSIZ | - |
| Resource/Identifier | Identifier | UFID | MediaInformation/MediaIdentification/Identifier |
| Event/Input | Source | TOAL (Title of original recording or source) | - |
| Event/Context/Place | Coverage.Place | - | - |
This
approach has serious limitations, however. A matrix can only
specify fairly simple one-to-one mappings,
and a two-dimensional matrix will only work if the mappings are
symmetrical in both directions across all the domains. If the mappings
are asymmetrical then the matrix becomes highly complex and
multi-dimensional. The primary limitation of this approach, however, is
that it does not scale: as the number of domains grows and the
mappings become asymmetrical, the matrix becomes excessively complex
and unwieldy.
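To make the matrix approach concrete, the following Python sketch holds a few rows of the mapping matrix in a dictionary and performs lookups in both directions. It is an illustration only: the names `MAPPING_MATRIX`, `lookup`, `reverse_lookup` and `translate` and the domain keys `"dc"` and `"id3"` are hypothetical, and only rows with simple one-to-one mappings are included.

```python
# A few one-to-one rows of the mapping matrix, keyed by ABC element path.
# None marks a cell for which no mapping is defined.
MAPPING_MATRIX = {
    "Resource/Title": {"dc": "Title", "id3": "TIT2"},
    "Resource/Subject": {"dc": "Subject", "id3": "TIT1"},
    "Resource/Description": {"dc": "Description", "id3": "TIT3"},
    "Resource/Identifier": {"dc": "Identifier", "id3": "UFID"},
    "Event/Context/Place": {"dc": "Coverage.Place", "id3": None},
}


def lookup(abc_element, domain):
    """Return the term for abc_element in the given domain, or None."""
    row = MAPPING_MATRIX.get(abc_element)
    if row is None:
        return None
    return row.get(domain)


def reverse_lookup(term, domain):
    """Return the ABC elements whose entry for the domain equals term."""
    return [abc for abc, row in MAPPING_MATRIX.items() if row.get(domain) == term]


def translate(term, source, target):
    """Map a term from one domain to another by going through ABC."""
    for abc in reverse_lookup(term, source):
        mapped = lookup(abc, target)
        if mapped is not None:
            return mapped
    return None
```

A single-valued cell cannot represent a row such as Contributor, which corresponds to several ID3 frames, without becoming a list. That is the point at which the matrix stops being a simple symmetric lookup table.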
8 Development of MetaNet, a Metadata Term Thesaurus
Rather than
limiting the semantic mapping to a fixed number of domains/vocabularies
(i.e. the number of columns in the mapping matrix), a more generic
approach is to extract the mapping dynamically from a thesaurus of
metadata terms, generated by formally defining relationships between
metadata terms from a number of different domains' standardized
vocabularies.
8.1 Intrathesaurus and Interthesaurus Relations
The ISO 2788
standard for the identification and documentation of monolingual thesauri
[7] identifies the following types of intrathesaurus relations:
- hierarchical
- associative
- equivalence
The
hierarchical relation occurs between concepts having
"broader/narrower" meanings. This can be further specialized
into the generic (BTG/NTG), whole-part (BTP/NTP) and instance (BT/NT) relations.
For the sake of simplicity, we have chosen only to model the BTG/NTG
relation (a common practice among thesauri developers) and the
equivalence relation, and not to include associative relations within
MetaNet.
The
ISO 5964 standard for the documentation and establishment of multilingual
thesauri [8] identifies the following types of interthesaurus relations:
- exact equivalence
- partial equivalence
- single to multiple equivalence
- inexact equivalence.
These relations
indicate that the semantic relations between terms from different
metadata vocabularies are likely to be much more complex than one-to-one
exact equivalence and that even "exact equivalence" will be an
approximation. However, because the scope of our problem is limited to
relations between terms in a number of standardized English metadata
vocabularies, we can expect more complex mappings
to be less frequent than in general natural language thesauri. For the first
draft of MetaNet, we decided only to consider exact and partial
equivalence relations and to combine them in the ET relation which
defines equivalent/overlapping terms. If two different domains use two
different metadata terms which are ETs in our thesaurus then we make the
assumption that the domains are referring to semantically equivalent
concepts.
Consequently the metadata term
thesaurus which we have developed, MetaNet [4], contains only
preferred terms (the ABC core vocabulary), equivalent/overlapping terms
(ET), narrower terms (NT) and broader terms (BT), and attempts to
encompass terms from the most significant and widely-used metadata
vocabularies (Dublin Core, IFLA, IEEE LOM, INDECS).
8.2 Description of MetaNet
The objective
of the MetaNet thesaurus is to provide the semantic knowledge
required to enable machine understanding of equivalence and hierarchical
(subtyping) relationships between metadata terms from different domains.
The scope of this thesaurus is limited to the most significant metadata
models/vocabularies used for describing attributes and events associated
with resources and their life cycles. This encompasses metadata
vocabularies from the bibliographic, museum, archival, record keeping and
rights management communities. It has been developed by performing
WordNet [36] searches using the core terms from the ABC vocabulary, and
extracting those synonyms and hyponyms which could conceivably be used in
a metadata scheme to represent the original core term. In addition, the
results have been compared with the DC, INDECS, IFLA,
IMS and CIDOC CRM vocabularies to check that the majority of the terms
used in these metadata dictionaries have been incorporated into the
thesaurus.
A
machine-readable RDF Schema representation of this thesaurus has been
developed. [37] The RDF and RDF Schema elements, Class, subClassOf,
Property, subPropertyOf are used to define the
hierarchical/subtyping and entity/attribute relationships between
metadata elements. The RDFS label element is used to specify
semantically equivalent terms which may be used. The ABC core vocabulary
is used as the top-level set of preferred terms. Although this thesaurus
has been generated manually, it could conceivably be generated
automatically by using inferencing mechanisms to merge RDF Schemas from
different domains, as has been proposed in the Ontology Inference Layer
(OIL). [30]
For
example, consider "Agent", which is a core term of the ABC
vocabulary and hence a preferred term in the MetaNet thesaurus.
Semantically equivalent terms for "Agent", commonly used within
other metadata vocabularies, include:
actor, contributor, creator, player, doer, worker, performer
Possible
narrower terms or hyponyms for "Agent" include:
author, composer, artist, musician, etc.
Table 2 is
an excerpt from the RDF Schema which illustrates the representation for
the "Agent" metadata term as well as its equivalent terms and a
partial hierarchy of its narrower terms.
Table 2. Excerpt from the RDF Schema

<?xml version="1.0"?>
<rdf:RDF xml:lang="en"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Agent">
    <rdfs:comment xml:lang="en">The resources which contribute to or act in
      an event. Typically agents are people, groups of people, organisations
      or instruments.</rdfs:comment>
    <rdfs:label xml:lang="en">Actor</rdfs:label>
    <rdfs:label xml:lang="en">Contributor</rdfs:label>
    <rdfs:label xml:lang="en">Creator</rdfs:label>
    <rdfs:label xml:lang="en">Player</rdfs:label>
    <rdfs:label xml:lang="en">Doer</rdfs:label>
    <rdfs:label xml:lang="en">Worker</rdfs:label>
    <rdfs:label xml:lang="en">Performer</rdfs:label>
    <rdfs:subClassOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#Resource"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Author">
    <rdfs:label xml:lang="en">Writer</rdfs:label>
    <rdfs:label xml:lang="en">Wordsmith</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Agent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Journalist">
    <rdfs:label xml:lang="en">Columnist</rdfs:label>
    <rdfs:label xml:lang="en">Reporter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Author"/>
  </rdfs:Class>
</rdf:RDF>
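As a rough illustration of how the relations encoded in such a schema could be read by a program, the Python sketch below parses a shortened copy of the excerpt with the standard library and builds a table of equivalent (ET), broader (BT) and narrower (NT) terms. The function name `load_thesaurus` and the dictionary layout are assumptions made for this example; they are not part of MetaNet.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Shortened copy of the Table 2 excerpt (fewer labels, no comment).
SCHEMA = """<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:ID="Agent">
    <rdfs:label>Actor</rdfs:label>
    <rdfs:label>Contributor</rdfs:label>
    <rdfs:label>Creator</rdfs:label>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Author">
    <rdfs:label>Writer</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Agent"/>
  </rdfs:Class>
  <rdfs:Class rdf:ID="Journalist">
    <rdfs:label>Reporter</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Author"/>
  </rdfs:Class>
</rdf:RDF>"""


def load_thesaurus(xml_text):
    """Return {term: {"ET": [...], "BT": [...], "NT": [...]}} from RDF Schema XML."""
    root = ET.fromstring(xml_text)
    thesaurus = {}
    for cls in root.findall(f"{{{RDFS}}}Class"):
        term = cls.get(f"{{{RDF}}}ID")
        entry = thesaurus.setdefault(term, {"ET": [], "BT": [], "NT": []})
        # Each rdfs:label is treated as an equivalent term.
        entry["ET"] = [label.text for label in cls.findall(f"{{{RDFS}}}label")]
        for parent in cls.findall(f"{{{RDFS}}}subClassOf"):
            target = parent.get(f"{{{RDF}}}resource", "")
            if target.startswith("#"):  # follow only references local to the file
                broader = target[1:]
                entry["BT"].append(broader)
                broader_entry = thesaurus.setdefault(
                    broader, {"ET": [], "BT": [], "NT": []}
                )
                broader_entry["NT"].append(term)
    return thesaurus


thesaurus = load_thesaurus(SCHEMA)
```

Narrower terms are derived by inverting the subClassOf arcs, so the schema only needs to state each hierarchical relation once.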
A Web
search and browse interface to MetaNet has also been developed. [4] Users can
search on any common metadata term and retrieve a list of equivalent terms,
broader terms and narrower terms. Figure 3 shows the results of a search
on the term "author".

Figure 3. Results of MetaNet search
9 Linking MetaNet to XSLT
Using XSLT it
is possible to parse an input XML description and for each element
encountered call a Java procedural code extension which determines the
equivalent term in the output domain from the semantic relationships
specified in the MetaNet thesaurus.
For
example, suppose the Java program, Mapping.java, contains an extension
function readMetaNet. For each element encountered during parsing
of the input metadata description, the input element name (e.g. abc:Agent)
and the output domain schema definition (e.g. the Dublin Core schema) are
passed to the readMetaNet function. This function searches the
MetaNet RDF Schema file for an element in the output schema definition
that is equivalent to the input element name (e.g. dc:contributor), and
returns this value. XSL creates a new output element with this name in
the output description. Figure 4 illustrates the program flowchart.

Figure 4. Program flow for metadata description mappings
The XSL code in Table 3 illustrates
how to call a Java extension function, readMetaNet, from the main
XSL file.
Table 3. XSL code to call a Java extension
function, readMetaNet, from the main XSL file

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:lxslt="http://xml.apache.org/xslt"
    xmlns:mapping="Mapping"
    extension-element-prefixes="mapping">

  <!-- Bind the "mapping" prefix to the Java class Mapping -->
  <lxslt:component prefix="mapping" elements="*" functions="readMetaNet">
    <lxslt:script lang="javaclass" src="Mapping"/>
  </lxslt:component>

  <xsl:template match="ABC">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- For every element, ask readMetaNet for the equivalent Dublin Core
       term. The name attribute is an attribute value template, so the
       extension function call is wrapped in braces. -->
  <xsl:template match="*">
    <xsl:element name="{mapping:readMetaNet(., 'dc')}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
Below is a high-level, simplistic
algorithm describing the mapping process that is performed within the
readMetaNet Java function in Figure 4.
Table 4. Algorithm describing the mapping
process within the readMetaNet Java function in Figure 4

For each element in the input description {
    Search for the input element name in the output domain schema;
    if (found) {
        Map the input element to the equivalent output domain element;
    } else {
        Extract the Equivalent Terms (ETs) for the input element from MetaNet;
        Search the output domain schema for each of the ETs;
        if (an ET is found) {
            Map the input element to the equivalent output domain element;
        } else {
            Extract the broader terms (BTs) for the input element from MetaNet;
            Search for each BT in the output domain namespace;
            if (a BT is found) {
                Map the input element to the broader output domain element;
            } else {
                Extract the narrower terms (NTs) for the input element from MetaNet;
                Search for each NT in the output domain namespace;
                if (an NT is found) {
                    Map the input element to the narrower output domain element;
                }
            }
        }
    }
} endFor
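The fallback order in the algorithm above (exact name, then ETs, then BTs, then NTs) can be sketched in Python as follows. This is not the Java code of readMetaNet, which is not reproduced in this paper. The function name `read_metanet`, the in-memory `THESAURUS` dictionary, the list `DC_TERMS` and the case-insensitive matching are all assumptions made for the illustration.

```python
def read_metanet(input_term, output_schema, thesaurus):
    """Return (output term, relation used), or (None, None) if nothing matches."""
    # Index the output vocabulary by lower-cased name for case-insensitive lookup.
    index = {term.lower(): term for term in output_schema}

    def find(candidates):
        for candidate in candidates:
            match = index.get(candidate.lower())
            if match is not None:
                return match
        return None

    entry = thesaurus.get(input_term, {})
    # Try each relation in the order given by the algorithm.
    steps = [
        ("exact", [input_term]),
        ("ET", entry.get("ET", [])),
        ("BT", entry.get("BT", [])),
        ("NT", entry.get("NT", [])),
    ]
    for relation, candidates in steps:
        match = find(candidates)
        if match is not None:
            return match, relation
    return None, None


# Hypothetical in-memory thesaurus using terms from the Table 2 excerpt.
THESAURUS = {
    "Agent": {"ET": ["Actor", "Contributor", "Creator"], "BT": [], "NT": ["Author"]},
    "Author": {"ET": ["Writer"], "BT": ["Agent"], "NT": ["Journalist"]},
}
# A handful of Dublin Core element names, used as the output vocabulary.
DC_TERMS = ["title", "creator", "subject", "contributor", "publisher"]

agent_result = read_metanet("Agent", DC_TERMS, THESAURUS)
author_result = read_metanet("Author", DC_TERMS, THESAURUS)
```

With this data "Agent" resolves to "contributor" through the ET relation, because the equivalent terms are tried in the order in which they are listed. "Author" finds no match, since the sketch, like the algorithm as written, compares only the names of the broader and narrower terms and does not expand their equivalent terms.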
10 Conclusions, Limitations and Future Work
10.1 Conclusions and Limitations
Our evaluation
of XSLT for mapping between metadata descriptions from different domains
revealed that although XSLT is good for syntactical and structural
mapping, semantic mappings need to be hardwired into the code. Flexible
semantic mapping is only possible with the assistance of semantic
knowledge bases provided by ontologies or thesauri such as the MetaNet
thesaurus described above.
The
MetaNet thesaurus described here is a first draft English version, based
on the vocabulary of the ABC model. Although it has only been applied to
a relatively small sample set, some of the limitations of this thesaurus
are already evident. These include its inability to support metadata
vocabularies which use:
- Tokens, e.g. ID3 tags such as TPE2, which
are semantically meaningless. This limitation can be overcome by
either explicitly including such tags in the thesaurus or searching
the definitions (rather than element names) in the output namespace
for the input element name or its semantically equivalent terms;
- Abbreviations, e.g. acc.no.;
- Qualifiers or hybrid words joined by a
variety of connectors, e.g. UserClass, Assistant Editor,
Art_Director, Time-span. This problem can be solved to some extent
by including "associated terms" in the thesaurus and by
ignoring typical "connectors".
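One way of "ignoring typical connectors" is to split a hybrid term into its component words before looking each word up in the thesaurus. The Python sketch below shows the idea. The function name `split_term` and the choice of connectors (underscore, hyphen, dot, whitespace and lower-to-upper case boundaries) are assumptions made for this example, and the sketch does not expand abbreviations.

```python
import re


def split_term(term):
    """Split a hybrid metadata term into lower-case component words."""
    # Insert a space at lower-to-upper case boundaries, e.g. "UserClass".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", term)
    # Treat underscores, hyphens, dots and whitespace as connectors.
    return [word.lower() for word in re.split(r"[\s_\-.]+", spaced) if word]


words = [split_term(t) for t in ("UserClass", "Assistant Editor", "Art_Director", "Time-span")]
```

Each resulting word can then be searched for separately, so that "Art_Director" can still be related to a thesaurus entry for "director".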
In addition,
the inherently ambiguous nature of language leads to the following
problems:
- Metadata terms with multiple possible
meanings, e.g. "condition" - this could be the current
state of an object or it could be a restriction on the permissible
use of a resource. This can be overcome by the use of unambiguous
metadata terms by schema designers.
- Multiple possible spellings for the same
word, e.g. artefact/artifact, colour/color.
- This thesaurus is based on nouns, e.g.
"creator", "publisher", and does not search for
related verbs, adverbs, adjectives in various tenses which could be
used to express the same semantics, e.g. "created_by",
"published_by". This problem could, to some extent, be
overcome through the use of stemming.
Currently only
English is supported. However, we believe that this thesaurus could be
extended to provide equivalent or overlapping terms for the ABC
vocabulary in other languages by following the recommendations specified
in ISO 5964. [8]
10.2 Future Work
So far the ABC
model has only been tested on a relatively small sample set. We intend
to carry out a more extensive evaluation of both the ABC model and the
hybrid mapping approach, by applying them to metadata transformations
between large sets of sample records provided by a number of different
CIMI [38]
member organisations. The plan is to build a testbed using multimedia
museum resources and metadata descriptions provided by CIMI members and
to use this testbed to implement and evaluate metadata interoperability
between different museums' descriptions.
The Harmony
ABC model exhibits many similarities with the CIDOC Conceptual Reference
Model (CRM) [21],
a domain ontology developed by the CIDOC Committee of the International
Council of Museums. In the near future we plan to investigate the
possible merging of the Harmony model and the CIDOC CRM model into a
single ontology. We plan to use the CIMI testbed described above to
evaluate the "super" ontology resulting from harmonization of
these two models.
The
mapping implementations above have all involved mapping from an
event-aware metadata model to a resource-centric metadata model. We are
also interested in the rules and mechanisms required for machine
translations between metadata descriptions based on different event-aware
metadata models, e.g. from ABC to INDECS or CIDOC CRM.
We would
also like to investigate mapping between "application profiles"
or schemas which mix metadata elements imported from multiple different
namespaces. The test examples considered so far only present the problem
of mapping from a single domain's metadata description to another single
domain's metadata description, e.g. pure DC to pure MPEG-7. A situation
that will become increasingly common in the future is the need to map
from a schema which imports elements from multiple namespaces to another
schema which imports a different set of elements from multiple
namespaces. In addition, each schema may impose its own local
- structural constraints, e.g. parent/child
relationships
- cardinality/occurrence constraints
- datatyping, enumeration and formatting
constraints on the element values.
We believe the
approach proposed in this paper will support mapping between mixed-domain
"application profiles", but this needs to be tested through further
research involving machine translations between metadata descriptions
which conform to both complex local usage constraints (defined by XML
Schemas [39]) and namespace-specific semantic definitions (defined by RDF
Schemas).
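As a rough illustration of the local usage constraints discussed above, the following XML Schema fragment sketches how a structural constraint, cardinality constraints, a datatype constraint and an enumeration might be declared. It is only a sketch: the element names (Recording, Performance, Date, Place, Format) and the enumerated values are hypothetical and are not taken from any of the schemas discussed in this paper, and the namespace URI is that of the later XML Schema 1.0 Recommendation, which is an assumption relative to the drafts current at the time of writing.

```xml
<?xml version="1.0"?>
<!-- Hypothetical application-profile fragment; all names are illustrative. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Recording">
    <xs:complexType>
      <!-- Structural constraint: Title, Performance and Format are
           children of Recording and must appear in this order. -->
      <xs:sequence>
        <!-- Cardinality constraint: exactly one Title (the default). -->
        <xs:element name="Title" type="xs:string"/>
        <!-- Cardinality constraint: zero or more Performance children. -->
        <xs:element name="Performance" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <!-- Datatype/formatting constraint: an ISO 8601 date
                   such as 1998-07-04. -->
              <xs:element name="Date" type="xs:date"/>
              <xs:element name="Place" type="xs:string" minOccurs="0"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <!-- Enumeration constraint: Format is limited to a closed list. -->
        <xs:element name="Format">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="MP3"/>
              <xs:enumeration value="WAV"/>
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

A mapping into a profile of this kind would have to satisfy these declarations as well as preserve the semantics of each element; for example, a source date of 7/4/98 would need to be reformatted before it could pass the xs:date check.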
Acknowledgements
The author
acknowledges the valuable contributions which discussions with Dan
Brickley, Carl Lagoze, Martin Doerr and Sigge Lundberg have made to this
work.
The work
reported in this paper has been funded by the Cooperative Research Centre
for Enterprise Distributed Systems Technology (DSTC) through the
Australian Federal Government's CRC Programme (Department of Industry,
Science and Resources).
References
[1] The
Harmony Project Home Page http://www.ilrt.bris.ac.uk/discovery/harmony/
[2] C. Lagoze, J. Hunter and D. Brickley (2000) "An
Event-Aware Model for Metadata Interoperability". ECDL 2000, Lisbon,
September
[3] XSL Transformations (XSLT) Version 1.0 (1999) W3C
Recommendation, 16 November http://www.w3.org/TR/xslt.html
[4] MetaNet Search Page http://sunspot.dstc.edu.au:8888/Metanet/Top.html#
[5] RDF Schema Specification 1.0 (2000) W3C Candidate
Recommendation, 27 March http://www.w3.org/TR/rdf-schema/
[6] Dublin Core/MARC/GILS Crosswalk (1999) November http://lcweb.loc.gov/marc/dccross.html
[7] ISO 2788 (1986) Documentation -- Guidelines for the
Development and Establishment of Monolingual Thesauri
[8] ISO 5964 (1985) Documentation -- Guidelines for the
Development and Establishment of Multilingual Thesauri
[9] Library of Congress Subject Headings, Cataloging
Distribution Service, Library of Congress http://lcweb.loc.gov/cds/lcsh.html
[10] Medical Subject Headings home page http://www.nlm.nih.gov/mesh/meshhome.html
[11] Art and Architecture Thesaurus Browser, Getty Research
Institute http://shiva.pub.getty.edu/aat_browser/
[12] C. Paice (1991) "A Thesaural Model of Information
Retrieval". Information Processing and Management,
27(5):433-447
[13] A.R. Aronson (1994) "Exploiting a Large Thesaurus
for Information Retrieval". RIAO 94, New York,
October
[14] W. Bruce Croft and J. Yufeng (1994) "An Association
Thesaurus for Information Retrieval". RIAO 94, New York,
October
[15] Dublin Core Metadata Initiative http://purl.org/dc/
[16] MARC Standards, Library of Congress Network Development
and MARC Standards Office http://lcweb.loc.gov/marc/marc.html
[17] G. Rust and M. Bide (1999) "The indecs Metadata
Schema Building Blocks". Indecs Metadata Model, November http://www.indecs.org/pdf/schema.pdf
[18] MPEG-7 Home Page http://www.darmstadt.gmd.de/mobile/MPEG7/index.html/
[19] Content Standard for Digital Geospatial Metadata (CSDGM) http://www.fgdc.gov/metadata/contstan.html
[20] IEEE Learning Technology Standards Committee's Learning
Object Meta-data Working Group, Approved Working Draft WD5 Learning
Object Meta-data Scheme http://ltsc.ieee.org/wg12/
[21] ICOM/CIDOC Documentation Standards Group (1999) Revised
Definition of the CIDOC Conceptual Reference Model, September http://www.geneva-city.ch/musinfo/cidoc/oomodel
[22] C. Batini, M. Lenzerini and S.B. Navathe (1986) "A comparative
analysis of methodologies for database schema integration". ACM
Computing Surveys, 18(4):323-364, December
[23] E. Mena, V. Kashyap, A. Sheth and A. Illarramendi (1996)
"OBSERVER: An Approach for Query Processing in Global Information
Systems based on Interoperation across Pre-existing Ontologies". Proceedings
of the 1st IFCIS International Conference on Cooperative Information
Systems (CoopIS'96), Brussels, Belgium,
June (IEEE Computer Society Press)
[24] R. Bayardo, et al. (1997) "InfoSleuth:
Agent-based Semantic Integration of Information in Open and Dynamic
Environments". Proceedings of ACM SIGMOD Conference on Management
of Data, Tucson, Arizona, May,
pp. 195-206
[25] N. Guarino, C. Masolo and G. Vetere (1999)
"Ontoseek: Content-based Access to the Web". IEEE
Intelligent Systems, Vol. 14, No. 3, May/June, 70-80
[26] H. Mili and R. Rada (1988) "Merging Thesauri:
Principles and Evaluation". IEEE Transactions on Pattern
Analysis and Machine Intelligence, 10(2):204-220
[27] M. Doerr and I. Fundulaki (1998) "A proposal on
extended interthesaurus links semantics". Technical Report TR-215, Institute
of Computer
Science-FORTH, March
[28] M. Sintichakis and P. Constantopoulos (1997) "A
Method for Monolingual Thesauri Merging". Proceedings of the 20th
ACM International Conference on Research and Development in Information
Retrieval (ACM SIGIR), Philadelphia, PA, USA, July
[29] B. Amann and I.
Fundulaki (1999) "Integrating Ontologies and Thesauri to Build RDF
Schemas". In ECDL'99: Research and Advanced Technologies for Digital
Libraries, Paris, France,
September, Lecture Notes in Computer Science (Springer-Verlag), pp.
234-253
[30] Ontology Inference Layer http://www.ontoknowledge.org/oil/
[31] International Federation of Library Associations and
Institutions (IFLA) (1998) Functional Requirements for Bibliographic
Records, March http://www.ifla.org/VII/s13/frbr/frbr.pdf
[32] ID3 Tag Version 2.3.0 http://www.id3.org/id3v2.3.0.html
[33] Xalan-Java Overview http://xml.apache.org/xalan/overview.html
[34] A. Cawsey (2000) "Presenting tailored resource
descriptions: Will XSLT do the job?". WWW9 conference, Amsterdam,
May http://www.cee.hw.ac.uk/~alison/www9/paper.html
[35] XML Path Language (XPath) Version 1.0 (1999) November http://www.w3.org/TR/xpath
[36] WordNet - a Lexical Database for English http://www.cogsci.princeton.edu/~wn/online/
[37] RDF Schema Representation of the MetaNet Thesaurus (2000)
October http://archive.dstc.edu.au/maenad/metanet.rdf
[38] CIMI Consortium for Interchange of Museum Information http://www.cimi.org/
[39] XML Schema Language http://www.w3.org/XML/Schema
Appendix A
A.1 ABC Description of Example Resource
<?xml version="1.0"?>
<ABC>
  <Event id="E1" Type="Performance">
    <Title>Live At the Lincoln Centre</Title>
    <Context>
      <Date>7/4/98</Date>
      <Time>20:00</Time>
      <Place>Lincoln Centre</Place>
    </Context>
    <Act id="Act1">
      <Agent>New York Philharmonic</Agent>
      <Role>Orchestra</Role>
    </Act>
    <Input id="comp523"/>
    <Output id="audio821"/>
    <Rights> Lincoln Center for Performing Arts </Rights>
  </Event>
  <Resource id="comp523">
    <Type>Musical Score</Type>
    <Title>Concerto for Violin</Title>
  </Resource>
  <Resource id="audio821">
    <Type>audio</Type>
    <Format>MP3</Format>
    <Length units="mins"> 130 </Length>
  </Resource>
</ABC>
A.2 Simple Resource-centric Description of Example Resource
<?xml version="1.0"?>
<Resource id="audio821">
  <Title>Live At Lincoln Center</Title>
  <Date.Performance>1998-07-04</Date.Performance>
  <Time.Performance>20:00</Time.Performance>
  <Place.Performance>Lincoln Centre</Place.Performance>
  <Agent.Orchestra>New York Philharmonic</Agent.Orchestra>
  <Relation.isPerformanceOf>comp523</Relation.isPerformanceOf>
  <Description>Performance of 'Concerto for Violin'</Description>
  <Rights> Lincoln Center for Performing Arts </Rights>
  <Type>audio</Type>
  <Format>MP3</Format>
  <Length units="mins"> 130 </Length>
</Resource>
A.3 Dublin Core Description of Simple Example
<?xml version="1.0"?>
<!DOCTYPE rdf:RDF SYSTEM "http://purl.org/dc/schemas/dcmes-xml-20000714.dtd">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description about="audio821">
    <dc:Title>Live At Lincoln Center</dc:Title>
    <dc:Date.Performance>1998-07-04T20:00-05:00</dc:Date.Performance>
    <dc:Coverage>Lincoln Centre</dc:Coverage>
    <dc:Contributor.Orchestra>New York Philharmonic</dc:Contributor.Orchestra>
    <dc:Relation.isPerformanceOf>comp523</dc:Relation.isPerformanceOf>
    <dc:Description>Performance of 'Concerto for Violin'</dc:Description>
    <dc:Rights> Lincoln Center for Performing Arts </dc:Rights>
    <dc:Type>audio</dc:Type>
    <dc:Format>MP3</dc:Format>
  </rdf:Description>
</rdf:RDF>
A.4 MPEG-7 Description
<?xml version="1.0"?>
<MPEG-7 id="audio821">
  <CreationMetaInformation>
    <Creation>
      <Title>Live at Lincoln Center</Title>
      <Creator>
        <Role>Orchestra</Role>
        <Name>New York Philharmonic</Name>
      </Creator>
      <CreationDate>
        <day>7</day>
        <month>4</month>
        <year>1998</year>
      </CreationDate>
      <Location>
        <PlaceName>Lincoln Center</PlaceName>
      </Location>
    </Creation>
    <Classification>
      <Genre>Performance</Genre>
    </Classification>
  </CreationMetaInformation>
  <MediaInformation>
    <MediaProfile>
      <MediaFormat>
        <Medium>MP3</Medium>
        <Length><m>130</m></Length>
      </MediaFormat>
    </MediaProfile>
  </MediaInformation>
  <UsageMetaInformation>
    <Rights>
      <RightsId IdOrganization='Lincoln Center'/>
    </Rights>
  </UsageMetaInformation>
</MPEG-7>
A.5 ID3 Description
<?xml version="1.0"?>
<ID3>
  <!-- Unique Identifier -->
  <UFID>audio821</UFID>
  <!-- Title -->
  <TIT2>Live At Lincoln Center</TIT2>
  <!-- Orchestra -->
  <TPE2>New York Philharmonic</TPE2>
  <!-- Type or Genre -->
  <TCON>Performance</TCON>
  <!-- Media Type sound originated from-->
  <TMED>Audio/MP3</TMED>
  <!-- Date Recorded -->
  <TDAT>7/4/98</TDAT>
  <!-- Time Recorded -->
  <TIME>2100</TIME>
  <!-- Length in millisecs-->
  <TLEN>7800000</TLEN>
  <!-- Original recording or source -->
  <TOAL>comp523</TOAL>
  <!-- Copyright Message -->
  <TCOP>Lincoln Center of Performing Arts</TCOP>
</ID3>
Appendix B
B.1 XSL for Transforming from ABC to DC
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <xsl:output method="xml" indent="yes"/>

  <!-- Root: wrap the whole ABC description in a single rdf:Description -->
  <xsl:template match="ABC">
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description>
        <xsl:apply-templates select="Event"/>
        <xsl:apply-templates select="Resource"/>
      </rdf:Description>
    </rdf:RDF>
  </xsl:template>

  <!-- Output is processed first so that the "about" attribute is added
       before any child elements of rdf:Description -->
  <xsl:template match="Event">
    <xsl:apply-templates select="Output"/>
    <xsl:apply-templates select="Context"/>
    <xsl:apply-templates select="Act"/>
    <xsl:apply-templates select="Input"/>
    <xsl:apply-templates select="Title"/>
    <xsl:apply-templates select="Rights"/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:attribute name="about">
      <xsl:value-of select="@id"/>
    </xsl:attribute>
    <xsl:copy-of select="*"/>
  </xsl:template>

  <xsl:template match="Context">
    <xsl:apply-templates select="Date"/>
    <xsl:apply-templates select="Place"/>
  </xsl:template>

  <!-- The event type becomes a qualifier, e.g. dc:Date.Performance -->
  <xsl:template match="Date">
    <xsl:element name="dc:Date.{../../@Type}">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Place">
    <xsl:element name="dc:Coverage">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <!-- The agent's role becomes a qualifier, e.g. dc:Contributor.Orchestra -->
  <xsl:template match="Act">
    <xsl:element name="dc:Contributor.{Role}">
      <xsl:value-of select="Agent"/>
    </xsl:element>
  </xsl:template>

  <!-- The event input becomes a relation, e.g. dc:Relation.isPerformanceOf -->
  <xsl:template match="Input">
    <xsl:element name="dc:Relation.is{../@Type}Of">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="dc:Title">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Rights">
    <xsl:element name="dc:Rights">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <!-- Output resource: copy Type and Format.
       Input resource: summarise it in dc:Description. -->
  <xsl:template match="Resource">
    <xsl:if test="@id=../Event/Output/@id">
      <xsl:apply-templates select="Type"/>
      <xsl:apply-templates select="Format"/>
    </xsl:if>
    <xsl:if test="@id=../Event/Input/@id">
      <xsl:element name="dc:Description">
        <xsl:value-of select="../Event/@Type"/> of <xsl:text>"</xsl:text>
        <xsl:value-of select="Title"/>
        <xsl:text>"</xsl:text>
      </xsl:element>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Type">
    <xsl:element name="dc:Type">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="dc:Format">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
B.2 XSL for Transforming from ABC to ID3
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:id3="http://www.id3.org/id3v2.3.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ABC">
    <id3:ID3 xmlns:id3="http://www.id3.org/id3v2.3.0">
      <xsl:apply-templates select="Event"/>
      <xsl:apply-templates select="Resource"/>
    </id3:ID3>
  </xsl:template>

  <!-- Process every child of the event in document order -->
  <xsl:template match="Event">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:element name="id3:UFID">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <!-- Date and time are copied through without reformatting -->
  <xsl:template match="Context">
    <id3:TDAT>
      <xsl:value-of select="Date"/>
    </id3:TDAT>
    <id3:TIME>
      <xsl:value-of select="Time"/>
    </id3:TIME>
  </xsl:template>

  <xsl:template match="Act">
    <xsl:element name="id3:TPE2">
      <xsl:value-of select="Agent"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Input">
    <xsl:element name="id3:TOAL">
      <xsl:value-of select="@id"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="id3:TIT2">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Rights">
    <xsl:element name="id3:TCOP">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <!-- Only the resource produced by the event contributes media details -->
  <xsl:template match="Resource">
    <xsl:if test="@id=../Event/Output/@id">
      <xsl:apply-templates select="Format"/>
      <xsl:apply-templates select="Length"/>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="id3:TMED">
      <xsl:value-of select="."/>/<xsl:value-of select="../Type"/>
    </xsl:element>
  </xsl:template>

  <!-- Convert the length from minutes to milliseconds -->
  <xsl:template match="Length">
    <xsl:element name="id3:TLEN">
      <xsl:value-of select=".*60*1000"/>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
B.3 XSL for Transforming ABC to MPEG-7
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:mpeg7="http://www.mpeg7.org/2000/MPEG7_schema/">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="ABC">
    <MPEG-7>
      <xsl:apply-templates select="Event"/>
      <xsl:apply-templates select="Resource"/>
    </MPEG-7>
  </xsl:template>

  <!-- Output is processed first so that the "id" attribute is added to
       MPEG-7 before any child elements -->
  <xsl:template match="Event">
    <xsl:apply-templates select="Output"/>
    <CreationMetaInformation>
      <Creation>
        <xsl:apply-templates select="Title"/>
        <xsl:apply-templates select="Act"/>
        <xsl:apply-templates select="Context"/>
        <xsl:apply-templates select="Input"/>
      </Creation>
      <Classification>
        <xsl:element name="Genre">
          <xsl:value-of select="@Type"/>
        </xsl:element>
      </Classification>
    </CreationMetaInformation>
    <xsl:apply-templates select="Rights"/>
  </xsl:template>

  <xsl:template match="Output">
    <xsl:attribute name="id">
      <xsl:value-of select="@id"/>
    </xsl:attribute>
  </xsl:template>

  <xsl:template match="Title">
    <xsl:element name="Title">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Act">
    <Creator>
      <xsl:element name="Role">
        <xsl:value-of select="Role"/>
      </xsl:element>
      <xsl:element name="Name">
        <xsl:value-of select="Agent"/>
      </xsl:element>
    </Creator>
  </xsl:template>

  <!-- Split the slash-separated date into its three components -->
  <xsl:template match="Context">
    <CreationDate>
      <xsl:variable name="date" select="Date"/>
      <xsl:variable name="my" select="substring-after($date,'/')"/>
      <xsl:element name="day">
        <xsl:value-of select="substring-before($date,'/')"/>
      </xsl:element>
      <xsl:element name="month">
        <xsl:value-of select="substring-before($my,'/')"/>
      </xsl:element>
      <xsl:element name="year">
        <xsl:value-of select="substring-after($my,'/')"/>
      </xsl:element>
    </CreationDate>
    <Location>
      <xsl:element name="PlaceName">
        <xsl:value-of select="Place"/>
      </xsl:element>
    </Location>
  </xsl:template>

  <!-- The event input produces no MPEG-7 output -->
  <xsl:template match="Input">
  </xsl:template>

  <xsl:template match="Rights">
    <UsageMetaInformation>
      <Rights>
        <xsl:element name="RightsId">
          <xsl:value-of select="."/>
        </xsl:element>
      </Rights>
    </UsageMetaInformation>
  </xsl:template>

  <!-- Only the resource produced by the event contributes media details -->
  <xsl:template match="Resource">
    <xsl:if test="@id=../Event/Output/@id">
      <MediaInformation>
        <MediaProfile>
          <MediaFormat>
            <xsl:apply-templates select="Format"/>
            <xsl:apply-templates select="Length"/>
          </MediaFormat>
        </MediaProfile>
      </MediaInformation>
    </xsl:if>
  </xsl:template>

  <xsl:template match="Format">
    <xsl:element name="Medium">
      <xsl:value-of select="."/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="Length">
    <Length>
      <xsl:if test="@units='mins'">
        <xsl:element name="m">
          <xsl:value-of select="."/>
        </xsl:element>
      </xsl:if>
    </Length>
  </xsl:template>
</xsl:stylesheet>
|