Introduction:
The development of the Web and the application of scanning technology to library and museum collections offer unprecedented opportunities for increasing access to library and museum special collections and unique resources. In the fall of 1998, Colorado's archives, historical societies, libraries, and museums began developing a collaborative whose vision was to create a virtual collection of these unique resources and special collections, expanding access to information on Colorado's history, culture, and scientific heritage through digitization.
The library and museum leaders who envisioned this initiative developed it on several assumptions. First, to complete the digital library for the people of Colorado, the special collections held by Colorado's libraries had to be added to the existing finding tools, online catalogs, online databases, and Web resources, and digitization offered new opportunities for providing access to these collections. Second, because the materials in special collections are unique and are held by a range of cultural heritage institutions, the collection could not represent the holdings of libraries alone. It had to include the collections of archives, historical societies, and museums, so the collaborative had to include these institutions from the beginning. Third, the collection would be a distributed virtual collection rather than a single centralized database of images, taking advantage of the Internet while allowing maximum flexibility for local distribution options. Fourth, the Project would use the emerging standards and best practices for metadata and scanning. Lastly, the K-12 community would incorporate the digital objects into lessons that meet the state education standards.
An important aspect of the Project was the environment in which the collaborative began its work. As in many other states and regions, Colorado's cultural heritage institutions hold significant amounts of primary source material. These collections are widely dispersed, difficult to access, sometimes poorly organized, and generally underutilized. Pioneering projects, including those of the Library of Congress, Cornell University, and the Denver Public Library, have increased public demand for access to and use of primary source materials. Librarians and archivists indicated a need to bring together traditional library materials and digital content while reducing physical use of the original materials.
Like many other states, Colorado had many well-intentioned but undertrained individuals digitizing primary source materials and putting them on the Web. As a result, image quality was inconsistent and the metadata was frequently inadequate. Although we were creating a new collaborative, Colorado already had a strong collaborative environment among libraries and archives. However, there was limited collaboration among museums and historical societies, and few standards were generally adopted there. In addition, standards for scanning and metadata were only just emerging, so the Project had to look to best practices or community standards to guide its standards initiative. Lastly, the traditional commercial vendors serving the cultural heritage communities had not developed software to support metadata for digital content. Successfully addressing these environmental issues was crucial for the Project.
During the first year, the Project focused on developing the collaborative and establishing standards and guidelines to inform the future digitization projects of the partner institutions. In a distributed networked environment, standards are the key to success. As the International Federation of Library Associations and Institutions notes in Digital Libraries: Definitions, Issues and Challenges:
The technical
architecture will be a collection of disparate systems and resources
connected through the Internet and integrated within one interface - a
Web enabled interface. Common standards are needed to allow digital
libraries to interoperate and share resources. [1]
Without standards, the goals of the Project could not be met. However, the standards environment could best be described as challenging and ever-changing.
CDP Environment
As we examined the 15 digitization projects (http://coloradodigital.coalliance.org/browse.html) that were underway in the fall of 1998, we found that different types of cultural heritage institutions have different standards and different levels of adoption. Libraries have a long tradition of standards that are broadly adopted by all types of libraries for a wide range of materials. Archives have a tradition of standards for preservation and conservation and of best practices for finding aids. Within the museum community, some segments, such as art museums, have standards; across the museum community as a whole, however, there are few commonly adopted standards. Additionally, museums and historical societies deal with a wide range of non-textual, three-dimensional objects that are more difficult to describe and scan. Historical societies frequently have a library, an archives, and a museum, and each area is likely to use the standards for its own function.
Because the Colorado Digitization Project includes all four types of cultural heritage institutions, we found a wide range of standards and practices. Many of the museums and historical societies took an exhibit approach to displaying their digital objects, with no search capability that would let a user locate an individual image. Several of the libraries used AACR-2-based MARC records to describe their digital objects, linking from the Web site to the library's online public access catalog to offer searching that retrieves individual objects. Others converted their finding aids to HTML documents, linking from the collection-level record in their online catalog to the HTML page on the Web site. The libraries do both item-level and collection-level cataloging, frequently depending on how the materials are organized. Item-level metadata records were preferred where more in-depth description of individual items was available, while collection-level cataloging linked to finding aids was offered where the finding aid was in a format that could be converted to HTML. One museum, the Colorado Springs Pioneer Museum, offered a database of metadata describing its 40,000 three-dimensional artifacts, which was unique among Colorado's museums.

Because the CDP was based on a strategy of distributed metadata and distributed images, Web search capability was a major concern. The wide range of approaches taken by the Colorado institutions compromised Web searching further. For many individual collections, retrieval via a Web search was unsatisfactory because Web search engines cannot access local databases or online catalogs. In other cases, a Web search took the user to the top level of a Web site when the digital content was five levels lower. To meet the CDP vision of enhancing access to this virtual collection of digital images, another approach to searching beyond the Web had to be developed. The other major issue the CDP faced was the limited software available to support the new metadata standards. [2]
CDP and Metadata
The metadata working group, one of five working groups established to develop the Project, began addressing ways to improve access to digital objects within the current metadata environment. It had to address several issues:
- How do we improve on current Web searching options?
- How do we realize the goal of improved access in a distributed network environment?
- How do we deal with diverse standards, communities, clientele, missions, and knowledge bases?
- What approach will realize the goal of increased access while allowing for local flexibility and autonomy?
After several months of exploration, the CDP metadata working group developed a set of assumptions on which to base its decisions. These included:
- The CDP could not mandate one metadata standard; rather, it had to build on the standards already adopted by each community and support a variety of standards.
- The CDP could not rely on Web search engines to provide access at the desired level.
- The CDP wanted to offer searching across print and digital collections.
To address these assumptions, the CDP metadata working group recommended developing a union catalog of metadata: bringing together the metadata from the various projects into a single physical union catalog and providing enriched access to the digital objects through it. The union catalog would provide expanded search capabilities that were not available via Web searching, because users would be searching a specialized catalog rather than the general Web. Metadata records from local online catalogs or databases would be loaded onto a system offering a physical union catalog. The system would have to support cross-database searching, allowing the user to search the online catalogs of library and archive collections, Web resources, and the CDP union catalog.
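A minimal sketch can illustrate why a physical union catalog helps. It assumes each institution's records have already been normalized to Dublin Core-style element names (a dict of element name to list of values). The class and method names (`UnionCatalog`, `load`, `search`) and the sample institutions are hypothetical, and this is not how SiteSearch works internally. The point is only that records copied into one store can be searched together through a single keyword index, which a general Web search engine cannot do for records held in local catalogs and databases.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphanumeric keyword tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class UnionCatalog:
    """Hypothetical single physical store for metadata contributed by many institutions."""

    def __init__(self):
        self.records = []               # list position = record id; value = (institution, record)
        self.index = defaultdict(set)   # keyword -> ids of records containing it

    def load(self, institution, records):
        """Copy a batch of normalized records into the catalog and index every value."""
        for record in records:
            record_id = len(self.records)
            self.records.append((institution, record))
            for values in record.values():
                for value in values:
                    for token in tokenize(value):
                        self.index[token].add(record_id)

    def search(self, query):
        """Return sorted (institution, title) pairs for records containing every query keyword."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self.index.get(t, set()) for t in tokens))
        # Records without a title element are shown with a placeholder label.
        return sorted(
            (self.records[i][0], self.records[i][1].get("title", ["[untitled]"])[0])
            for i in hits
        )


catalog = UnionCatalog()
catalog.load("Library A", [{"title": ["Mining camp, Leadville"], "subject": ["Mining towns"]}])
catalog.load("Museum B", [{"description": ["Iron candle holder used by miners"],
                           "subject": ["Mining equipment"]}])
print(catalog.search("mining"))
# [('Library A', 'Mining camp, Leadville'), ('Museum B', '[untitled]')]
```

One search returns the library record and the untitled museum artifact together, even though they came from different sources.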
Following the policy of not requiring a single standard, the union catalog would have to be able to load records based on various metadata standards, including AACR-2/MARC and SGML/XML-based formats (e.g., Dublin Core (DC)), as well as records created in individual databases. To make this approach a reality, the metadata group believed that the CDP needed to define a set of mandatory metadata elements (http://coloradodigital.coalliance.org/standards.html). Using the Dublin Core framework as the basis, the working group designated seven of the 15 DC elements as mandatory: creator, title, subject, description, identifier, date, and format. The remaining eight elements are optional but desirable. In addition to loading records from a variety of sources and in a variety of formats, the union catalog software also had to allow online input of MARC or Dublin Core records, support Z39.50 searching, and be a production product with product support. In the future, it would also have to support Encoded Archival Description (EAD) and output MARC or Dublin Core records.
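The following sketch shows how records from different standards could be checked against the seven mandatory elements. It pairs a simplified MARC-to-Dublin Core crosswalk with a check for missing mandatory elements. The tag and subfield mapping is a small illustrative subset, loosely modeled on the Library of Congress MARC-to-Dublin Core crosswalk. The record layout, a list of `(tag, {subfield: value})` pairs, is an assumption and is not a real MARC parser. The function names are hypothetical.

```python
# The seven mandatory CDP elements, in the order given in the text.
MANDATORY = ("creator", "title", "subject", "description", "identifier", "date", "format")

# Illustrative subset of a MARC-to-Dublin Core mapping: (tag, subfield) -> DC element.
MARC_TO_DC = {
    ("100", "a"): "creator",      # main entry, personal name
    ("245", "a"): "title",        # title statement
    ("650", "a"): "subject",      # topical subject heading
    ("520", "a"): "description",  # summary note
    ("856", "u"): "identifier",   # electronic location: URI
    ("260", "c"): "date",         # date of publication
    ("856", "q"): "format",       # electronic format type
}


def marc_to_dc(marc_fields):
    """Map assumed (tag, {subfield: value}) pairs to a dict of DC element -> list of values."""
    dc = {}
    for tag, subfields in marc_fields:
        for code, value in subfields.items():
            element = MARC_TO_DC.get((tag, code))
            if element is not None:
                dc.setdefault(element, []).append(value.strip())
    return dc


def missing_mandatory(dc_record):
    """List mandatory elements that are absent or contain only blank values."""
    return [e for e in MANDATORY
            if not any(v.strip() for v in dc_record.get(e, []))]


record = marc_to_dc([
    ("245", {"a": "Main Street, Central City "}),
    ("650", {"a": "Mining towns"}),
    ("856", {"u": "http://example.org/images/0001.jpg", "q": "image/jpeg"}),
])
print(missing_mandatory(record))  # ['creator', 'description', 'date']
```

A Dublin Core record from another project could skip the crosswalk step and go straight to `missing_mandatory`. Because the check runs on normalized records, one check serves every source format.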
As of the summer of 1999, OCLC's SiteSearch software was the only system that met these requirements. The CDP worked with the Colorado State Library to expand the Library's existing SiteSearch contract to include the SiteSearch database builder software and additional simultaneous users. As of April 2000, the Project has defined the indexed fields for AACR-2/MARC and Dublin Core and has determined screen displays and qualifiers. Testing of records from the museum, archive, and library communities will begin soon.
We know that we will encounter new problems as we begin to load these individual databases. For example, many museums and historical societies do not assign titles to their three-dimensional artifacts and rely instead on extensive descriptions of the physical objects for retrieval. In contrast, the library and archival communities frequently supply titles for untitled items. Keyword searching across all fields may resolve this issue, but a user doing a title keyword search will not retrieve objects that lack titles. The CDP is looking at various options for dealing with this issue. Another key issue will be subject terminology. Different types of cultural heritage institutions use different terminologies, and several of our projects also use specialized terminology because subject experts operate the databases. For example, the paleontologist at Florissant Fossil Beds National Monument developed that collection's database. The terms used in it are very specific to the field and probably unknown to most general users. The CDP metadata working group will have to address this issue in the near future.
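Both problems can be seen in a small sketch that compares three kinds of search. The first searches the title element only, the second searches all elements, and the third adds query expansion from general terms to specialist terms. The records, their identifiers, and the `SYNONYMS` mapping are hypothetical. A mapping of this kind is one possible option, not a solution the CDP has adopted.

```python
import re


def words(text):
    """Return the set of lowercase alphanumeric tokens in a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def field_words(record, fields=None):
    """Collect tokens from the named fields, or from every field when fields is None."""
    chosen = record.keys() if fields is None else fields
    return set().union(*(words(v) for f in chosen for v in record.get(f, [])))


# Hypothetical mapping from general-user terms to specialist terms.
SYNONYMS = {"insects": {"insecta"}}


def search(records, query, fields=None, expand=False):
    """Return sorted ids of records that contain every query word.

    With expand=True, a query word also matches if any of its SYNONYMS is present.
    """
    hits = []
    for rec_id, record in records.items():
        available = field_words(record, fields)
        if all(word in available or (expand and SYNONYMS.get(word, set()) & available)
               for word in words(query)):
            hits.append(rec_id)
    return sorted(hits)


records = {
    "lib-1": {"title": ["Saddle from the Overland Trail"], "subject": ["Saddles"]},
    "mus-7": {"description": ["Leather saddle with tooled horn"], "subject": ["Horse tack"]},
    "fos-3": {"title": ["Specimen 3"], "subject": ["Insecta", "Eocene"]},
}
print(search(records, "saddle", fields=["title"]))  # ['lib-1']  (untitled artifact missed)
print(search(records, "saddle"))                    # ['lib-1', 'mus-7']
print(search(records, "insects"))                   # []
print(search(records, "insects", expand=True))      # ['fos-3']
```

The title-only search misses the untitled museum saddle, and the search for "insects" finds the specialist "Insecta" record only when expansion is enabled.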
CDP and Scanning
In addition to metadata standards, the CDP had to address scanning standards (http://coloradodigital.coalliance.org/standards.html). The major issue in developing standards or best practices for scanning was a lack of understanding of what decisions needed to be made. Our first task was to get participants to see digitization as more than a means of increasing access. Several projects had scanned images at a very low resolution, appropriate for thumbnails but inadequate even for some Web viewing, which left them with images of limited use. On the other hand, the scanning working group understood that we could not 'sell' participants on higher resolutions solely because the scans could serve as archival versions, since many of them did not see their digitization initiatives as preservation projects. Like others, the Colorado participants were skeptical about the viability of digitization as a preservation medium.

Key to establishing standards was the adoption of a set of principles for scanning. The CDP incorporated the following principles into its guidelines:
- Scanning at the highest resolution
appropriate to the informational content of the originals
- Scanning at an appropriate level of
quality to avoid rescanning and re-handling of the originals in the
future - scan once
- Creating and storing a master image file
that can be used to produce derivative image files and serve a
variety of current and future user needs
- Using system components that are
non-proprietary
- Using image file formats and compression
techniques that conform to industry standards
- Creating backup copies of all files on a
stable medium
- Creating meaningful metadata for image
files or collections
- Storing media in an appropriate environment
- Monitoring and recopying data as
necessary
- Outlining a migration strategy for
transferring data across generations of technology
- Anticipating and planning for future
technological developments
These principles are derived from recommendations by Howard Besser, "Best Practices for Image Capture," California Digital Library, www.cdlib.org/standards/moaa=bp71w95.doc
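The principle of "monitoring and recopying data as necessary" is commonly carried out with fixity checks. A checksum is recorded for each master file when it is created, and the checksums are later recomputed and compared. The CDP guidelines as described here do not prescribe a method, so the following is a sketch that assumes SHA-256 checksums and a directory of master files. The function names are illustrative.

```python
import hashlib
from pathlib import Path

# Sketch only: assumes SHA-256 fixity values for a directory of master image files.


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_manifest(directory):
    """Map every file under directory (as a relative path string) to its checksum."""
    root = Path(directory)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def audit(directory, manifest):
    """Compare the files now on disk with a saved manifest.

    Files reported as "missing" or "changed" would need to be recopied from a backup.
    """
    current = make_manifest(directory)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(k for k in set(manifest) & set(current)
                          if manifest[k] != current[k]),
        "new": sorted(set(current) - set(manifest)),
    }
```

In this design, the manifest would be saved with the backup copies when the masters are created. `audit` would then be rerun on a schedule, and any damaged or lost file would be recopied from a good copy. The same manifest can also confirm that a migration to new media copied every file intact.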
Once these principles were in place, we reviewed the best practices of institutions such as the Library of Congress, the National Archives and Records Administration, OhioLINK, and others. As a result of this review, the CDP established a set of best practices and minimum recommended standards. Areas addressed by these best practices and minimum standards include:
- projects must create master, access, and thumbnail versions of each image
- the CDP established different standards for different formats of materials, i.e., text, transparent images, and opaque images
- the CDP established minimum standards, with the caveat that scanning be done at the level appropriate to the individual item
- the CDP established quality standards
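The arithmetic behind the master, access, and thumbnail versions is simple. A master's pixel dimensions are the original's physical size in inches multiplied by the scanning resolution, and the derivatives are scaled down from the master. The sketch below assumes placeholder minimum resolutions for each material class and placeholder derivative sizes. These numbers are not the CDP's published values, which are on the CDP standards page. The names `Item` and `plan_versions` are hypothetical.

```python
from dataclasses import dataclass

# Placeholder minimum resolutions (pixels per inch) per material class.
# These are NOT the CDP's published values; substitute the real minimums.
MIN_PPI = {"text": 300, "opaque": 400, "transparent": 2400}

# Placeholder long-edge sizes, in pixels, for the two derivative versions.
ACCESS_LONG_EDGE = 1024
THUMB_LONG_EDGE = 150


@dataclass
class Item:
    material: str     # "text", "opaque" or "transparent"
    width_in: float   # physical width of the original, in inches
    height_in: float  # physical height of the original, in inches


def scaled(width, height, long_edge):
    """Resize (width, height) so the longer side equals long_edge, keeping the aspect ratio."""
    factor = long_edge / max(width, height)
    return round(width * factor), round(height * factor)


def plan_versions(item, ppi=None, bit_depth=24):
    """Return master/access/thumbnail pixel sizes and the master's uncompressed byte count.

    The minimum for the material class is a floor; a higher ppi may be requested
    when the individual item warrants it.
    """
    floor = MIN_PPI[item.material]
    ppi = floor if ppi is None else max(ppi, floor)
    master = (round(item.width_in * ppi), round(item.height_in * ppi))
    return {
        "master": master,
        "access": scaled(*master, ACCESS_LONG_EDGE),
        "thumbnail": scaled(*master, THUMB_LONG_EDGE),
        "master_bytes": master[0] * master[1] * bit_depth // 8,
    }


print(plan_versions(Item("opaque", 8, 10)))
# {'master': (3200, 4000), 'access': (819, 1024), 'thumbnail': (120, 150), 'master_bytes': 38400000}
```

Take an 8 x 10 inch opaque print at the placeholder 400 ppi floor. Its master is 3200 x 4000 pixels, or about 38.4 MB uncompressed at 24 bits per pixel. Both derivatives come from that one scan, in keeping with the "scan once" principle.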
To ensure that institutions had access to equipment that supported these standards, the CDP established five regional scan centers. These centers give Colorado institutions relatively easy access to scanning equipment, assistance from staff trained in scanning, and access to the union catalog and local databases via the Web. Each institution does its own scanning. Training sessions on scanning and metadata are being held at these regional scan centers throughout the spring and summer of 2000. We hope that the combination of scanning consultation, training, and quality equipment will produce images of consistent quality and build expertise at the local institution level.
Conclusion:
What Have We Learned?
What has been the result of our work to date? In general, the institutions are pleased to have these standards and best practice guidelines and find them useful as they develop their own digitization plans. Key to the adoption of these standards and best practices was showing, through the working groups' activities, that the concerns of all parties had been met. We would not have succeeded had we developed standards with one group, for example the libraries, and then dictated that all other institutions use them. Museums had to feel that the standards addressed their issues, and the same was true for archives and historical societies. It was critical to have representatives from all types of cultural heritage institutions at the table from day one when determining the standards and best practices.
The standards had to accommodate all formats of materials: individual photographs, collections of letters, three-dimensional artifacts, textiles, and so on. This was particularly important if we were going to include archives, museums, and historical societies, whose collections contain many of these resources.

We found that the state had the expertise to undertake this type of project. The knowledge of catalogers and registrars transferred to the description and subject analysis of digital objects. We also recognized the need to describe these digital objects in some new ways. We needed to consider functional and administrative metadata, which is usually not relevant for print materials, to ensure ongoing management of these digital assets. Some of the new or emerging metadata standards capture this data, but methods for incorporating it into MARC or Dublin Core records are still under development.
More importantly, we realized that we could learn much from the representatives of the different institution types. Library catalogers gained a broader understanding of the descriptive and analytical needs of three-dimensional objects. The approaches museums and historical societies take to describing their collections were new to many participants and had to be accommodated in the metadata. Similarly, the scanning needs of the different institutions had to be considered. For example, a digital image of a painting held by an art museum, intended for scholarly research or commercial publishing, required a different set of standards and skills than photographs from a library collection scanned for general user access.
After all this work, the true test will be whether institutions undertaking digitization projects adopt the standards, and whether the standards prove appropriate, meet the institutions' needs, and help us reach our goal of increased access. Through the IMLS grant, the CDP is awarding 19 grants to 27 institutions. These grantees must adopt the CDP standards for metadata and scanning and must contribute records to the union catalog of metadata. They will serve as pilot projects for our standards. Based on their experience and on the continued development of standards by the standards-setting communities, the CDP will modify its standards and guidelines.
Notes
1. International Federation of Library Associations and Institutions. 1998. Digital Libraries: Definitions, Issues and Challenges. The Hague, Netherlands: IFLA.
2. The metadata standards considered by the CDP included Dublin Core, Encoded Archival Description, VRA, AACR-2/MARC, and others.