Introduction
This article reports on
work designed to extend a preliminary investigation (Craven, 2000a) of
how people and organizations summarize their own web pages and,
specifically, how and to what extent they make use of meta tags,
especially those with the NAME attribute equal to DESCRIPTION.
The background to this
investigation was research conducted over a number of years directed
toward developing a prototype computerized abstractor's assistant
(Craven, 1988, 1991, 1993, 1996, 1998). As a kind of writer's assistant,
such a software package includes a simple word processor and other
general writer's tools (Kozma, 1991) with their functioning adapted to
fit the needs of abstractors and writers of other kinds of short
summaries. In addition, the package integrates tools related specifically
to the task of summarizing, such as an automatic extractor.
Paice (1994) has provided a list of desirable features for such a package.
A hybrid system, in
which some tasks are performed by human abstractors and others by software,
appears to be an appropriate short-term goal since purely automatic
abstracting methods (Paice, 1990, 1994; Endres-Niggemeyer, 1998, pp.
297-366; Pinto & Galvez, 1999) do not show immediate promise of
totally superseding human effort.
At least two possible
benefits are expected from the study of web-page authors' actual practice
in summarizing their documents in meta tags. The first of these is in the
design of software to assist authors in tag generation. This expectation is based in part
on an assumption that author-created descriptions will reflect features
that authors and other users consider desirable. The second anticipated
benefit is in browser design: to know whether introducing a feature to
display the description, as, for example, a title is commonly displayed
in the caption bar, would benefit visitors to a substantial number of web
sites (compare Beagle, 1999).
Other aspects of the
content of web pages have been studied by various researchers. King
(1998), for example, studied page layout of library home pages; Haas and
Grams (2000) concentrated on characteristics of the anchors found in
randomly selected pages; Almind and Ingwersen (1997) applied informetric
measures; and Harter and Ford (2000) studied links to e-journals and
their articles. Little investigation has been done of the meta tag with
NAME='DESCRIPTION' (hereafter referred to as the DESCRIPTION tag). Turner
and Brackbill (1998) do report the results of a small experiment that
showed that the addition of a DESCRIPTION tag did not improve
retrievability of web pages on Infoseek and Altavista.
Advice provided in both
printed and web-based sources on the function, content, structure, and
style of the DESCRIPTION tag has been reviewed elsewhere (Craven,
submitted). What follows is a brief summary of the main recommendations
garnered from that review.
1. The DESCRIPTION tag can be used for
an abstract.
2. Tag contents should not be
deceptive.
3. A description is particularly
useful for documents with little text.
4. The
description should be no more than two hundred characters (though
recommended maximums range from one hundred to 256). There is also a
firmer technical upper bound of around one thousand characters.
5. The description should be concise.
6. The description should reflect the
single page, a whole site, or both (advice varies).
7. A number of keywords should be
included in the description.
8. The most important words should be
near the beginning of the description.
9. The description should not be the
same as the title.
Sample descriptions in
the sources showed various patterns. Some contained what appear to be
formulaic elements such as "Home page for"; others did not.
It has been noted that
there is also a DESCRIPTION element in the Dublin Core (2000), defined as
"an account of the content of the resource," with the further
comment that it "may include but is not limited to: an abstract,
table of contents, reference to a graphical representation of content or
a free-text account of the content." As a meta tag, this element
appears in standard form with the name DC.DESCRIPTION.
Sample
Methodology
In order to estimate
the total variety of use of the DESCRIPTION tag, it is desirable to
obtain a broad, representative sample of publicly available web pages,
especially of those pages employing meta tags, in particular the
DESCRIPTION tag. A survey of forty-two search engines (in December 1999)
(Craven, 2000) revealed no feature on any that permitted searching for
specific meta tags. In terms of sampling web pages in general, Askjeeves
and Webcrawler permitted peeking at sample queries, OpenDirectory
selected random categories if an empty query was entered, and
All-In-One's "What's New Too" option showed the day's
announcements of new pages.
Since a sufficient
proportion of pages indexed by Yahoo! used meta tags, including the
DESCRIPTION tag (twenty-six percent), it was feasible to proceed by
sampling from these pages. At that time, an application program was
created in Delphi 3 to access sample web pages and to log results. The
NEWT ActiveX control included with Delphi 3 was employed to handle the
hypertext transfer protocol (HTTP) and to display pages as they were
downloaded.
In the preliminary
study, only pages returned directly by Yahoo!'s random page service were
used (level-one pages). Since such pages were presumably registered with
Yahoo!, it may well be that they were also registered with other web
search services. Although Yahoo! ignores meta tags, the pages' creators
may have been particularly sensitized to the value of meta tags for some
of these other search services. Thus, in order to investigate possible
differences in non-registered pages, the present study added two other
types of page: a page reachable by following a random link from a page
returned by the random-page service (level-two page) and a page reachable
by following a random link from a level-two page (level-three page).
Requests for random pages were submitted to Yahoo! by a research
assistant over a period of twenty-five days. The aim was to retrieve at
least eight hundred pages at each level. Because the initial set of pages
at level one was gathered in a mode that included page rendering and
downloading of inline image files while the pages at levels two and three
were retrieved with these options off, the assistant also collected an
additional set of pages at level one with the mode matching that of
levels two and three. This
was performed about one month after the original level-one set.
Within each set, a test
for duplication of uniform resource locators (URLs) was carried out for
those pages containing DESCRIPTION tags. The number of links on each page
was also recorded. Included in this count for purposes of this study were
not only links defined by hypertext reference (HREF) values in anchor (A)
and area tags but also links defined by source (SRC) values in frame tags.
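The link-counting rule just described can be sketched as follows. This is a minimal illustration in Python, not the Delphi program used in the study; the class name LinkCounter and the function count_links are hypothetical, and the sketch assumes that only non-empty HREF and SRC values count as links.

```python
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts HREF values in A and AREA tags and SRC values in FRAME tags."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        attr_map = dict(attrs)
        if tag in ("a", "area") and attr_map.get("href"):
            self.count += 1
        elif tag == "frame" and attr_map.get("src"):
            self.count += 1


def count_links(html):
    """Return the number of links on a page under the rule above."""
    parser = LinkCounter()
    parser.feed(html)
    parser.close()
    return parser.count
```

Under this rule, a named anchor without an HREF value and an image with a SRC value are both left out of the count.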
Results
A natural multiplier
effect meant that the proportion of successful downloads decreased as
level increased. For example,
to obtain the first two hundred level-one pages required only 274
requests, a success rate of seventy-three percent, but to get the first
two hundred level-two pages required 448 requests, for a success rate of
44.6 percent. It had been established in the preliminary study that the
most common reason for failure (nearly four out of five) was timing out,
usually with a "not found" or "404" message displayed
in the hypertext markup language (HTML) viewer. The number of successful
downloads was 833 for the first set at level one, 821 at level two, 817
at level three, and 1039 for the second set at level one.
As shown in Figure 1,
the proportion of pages containing meta tags held very close to
sixty-seven percent at all three levels; the proportion in the
preliminary study had been somewhat lower at fifty-seven percent. The
proportion containing the DESCRIPTION tag specifically was noticeably
higher at level one (38.4 percent and 39.2 percent) than at level two
(26.1 percent) and level three (27.9 percent).

Only one case of
duplication of a URL was found among the pages with DESCRIPTION tags, at
level three.
Pages with descriptions
typically contained a number of links, with means somewhat above the
fifteen calculated in the preliminary study (19.7 at level one, 23.9 at
level two, 22.7 at level three). The maximum number of links on a page
was 326 (at level three). As in the preliminary study, few pages had no
links (two and three at level one, one at level two, and one at level
three).
Repeating the findings
of the preliminary study, very few pages used Dublin Core elements (two
and two at level one, one at level two, and one at level three). Three of
these included a Dublin Core description, in one case without a
DESCRIPTION tag, in one case with a DESCRIPTION tag with the same value,
and in one case with a slight difference in the wording of the two tag
values.
Discussion
The proportion of pages
using meta tags was even more noticeably above that of the 24.4 percent
reported by Qin & Wesley (1998) for pages in polymer chemistry than
had been the case for the preliminary study. The proportion of pages
using the DESCRIPTION tag at level one was noticeably greater than the
proportion at levels two and three, which were close to the twenty-six
percent found in the preliminary study and much above the figure of about
twenty-one percent using meta tags with both the names KEYWORDS and
DESCRIPTION cited by Clark (2000). The higher rate of page tagging at
level one would seem to support the hypothesis that developers are more
likely to pay attention to tagging for pages that they will be submitting
to search services.
Errors excepted, it
appears that the pages returned by requests to the Yahoo! random-page
generator are generally home pages. A rough confirmation of this
conclusion can be obtained from statistics on the forms of the URLs. Only
178 and 243, or 21.4 percent and 23.4 percent, of the URLs for the level
one pages contain the string ".htm" (which would include both
the ".htm" and the ".html" extensions). Although some
of the others will represent other file formats, most of them are simply
references to the default pages delivered by servers for their root or
other directories. By contrast, 611, or seventy-four percent, of the
level-two pages and 621, or 76.1 percent, of the level-three pages
contain ".htm" in their URLs.
The extremely low level
of duplication of URLs confirms that the Yahoo! random-page service gives
access to a large sample of web pages and suggests that page duplication
as such would not be a significant concern in future studies involving
this method of selection. On the other hand, there was some evidence of a
minor concentration of results on certain sites, most noticeably that of
the Toronto National Post, for
which the service, at various levels, returned a total of thirty-three
pages with the same description.
Comparison of Descriptions with Visible Text
Purpose
Two main questions were
posed regarding the relationship of the description to what the user
would actually see in the browser window. First, are pages with little
visible text more likely to be given descriptions as is recommended by
some of the sources? Second, to what extent do the descriptions merely
repeat words or phrases that are visible on the pages?
Methodology
Visible text was defined as all
text that was not part of a tag. This would typically include the page's
caption title and most text normally displayed as text in the viewer.
Visible textual material that would be excluded would include button
captions in forms and any text loaded as part of a frame. Text appearing
in graphic form would also be excluded as would any alternate text (ALT)
values given in the image (IMG) tag.
A description was defined as the value given to a DESCRIPTION
tag in a downloaded file.
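A minimal sketch of these two definitions, assuming Python's standard html.parser module rather than the program actually used in the study. The names PageText and extract are hypothetical. Text fragments are joined with a space, which is an assumption, and, following the definition literally, the contents of script or style elements would be counted as visible text here.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects text outside tags and the first DESCRIPTION meta value."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self.description = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.description is not None:
            return
        attr_map = dict(attrs)
        # The NAME attribute is compared without regard to case.
        if (attr_map.get("name") or "").lower() == "description":
            self.description = attr_map.get("content") or ""

    def handle_data(self, data):
        # Anything that is not part of a tag, including the title text.
        self.parts.append(data)


def extract(html):
    """Return (visible_text, description); description is None if absent."""
    parser = PageText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts), parser.description
```

Because ALT values and form button captions live inside tag attributes, they never reach handle_data and so are excluded, as in the definition above. Frame contents are excluded simply because the framed files are not fetched.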
Degree of match of each
description to the corresponding visible text was calculated for words
and phrases. The measure used for words was the density of
case-insensitive matches of visible-text words within the description. The
measure used for phrasing was the density of case-insensitive matches of
visible-text two-word sequences within the description. A word was
defined as any sequence of alphabetic characters delimited by other types
of characters. In addition, the longest visible-text word sequence found
in each description was logged; in the case of a tie, the tied sequences
were all logged, separated by a delimiter, in a single record.
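The two density measures might be computed roughly as follows. This sketch rests on an interpretation that the text does not spell out: density is taken to be the fraction of the description's words (or adjacent two-word sequences) that also occur somewhere in the visible text. The function names are hypothetical.

```python
import re


def words(text):
    """Lowercased sequences of alphabetic characters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def word_density(description, visible_text):
    """Fraction of description words that also occur in the visible text."""
    desc = words(description)
    if not desc:
        return 0.0
    vocab = set(words(visible_text))
    return sum(w in vocab for w in desc) / len(desc)


def pair_density(description, visible_text):
    """Fraction of adjacent word pairs in the description that also
    occur as adjacent pairs in the visible text."""
    desc = words(description)
    desc_pairs = list(zip(desc, desc[1:]))
    if not desc_pairs:
        return 0.0
    vis = words(visible_text)
    vis_pairs = set(zip(vis, vis[1:]))
    return sum(p in vis_pairs for p in desc_pairs) / len(desc_pairs)
```

For instance, against the visible text "Octopus Design are a team of graphic design professionals", the made-up description "A team of design professionals" has a word density of 1.0 but a pair density of 0.75, since "of design" does not occur in the visible text.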
Results
Lengths of the visible
text varied between 0 and 178,637 bytes. The median length increased
somewhat with level (565.5 and 647 for level one, 764.5 for level two,
and 896 for level three). Shorter visible texts were significantly less
likely to be associated with descriptions, even leaving aside those pages
with no visible text (with p < 0.0001 in t-tests applied to the logarithm
of number of characters in visible text).
As shown in Figure 2,
the descriptions were generally relatively short. The mean length was
just over twenty words at all levels (21.9 and 24.2 at level one and 21.3
and 20.9 at the other two levels). But they could be fairly long, with
the longest being 173 words (the preliminary study had found one at 294
words). The shortest that contained any words consisted of the single
words "strong" and "test." Using a different measure, 181 and
238, or 56.6 percent and 58.5 percent, were no more than 150 characters
in length at level one; 128, or 59.8 percent, at level two; 142, or 62.3
percent, at level three; and 243 and 310, or 75.9 percent and 76.2
percent, were no more than two hundred characters at level one; 165, or
77.1 percent, at level two; and 178, or 78.1 percent, at level three. The
longest was 839 characters.

The density of
visible-text words in the descriptions ranged from zero percent to one
hundred percent, with means of 87.1 percent and 91.3 percent at level one
and 84.9 percent and 82.2 percent at the other two levels. As is clear
from Figure 3, the distribution was heavily skewed toward the high end.
This was especially true at level one, where 186 and 277, or 58.1 percent
and 68.2 percent, of the descriptions had a visible-text word density
greater than ninety percent, figures somewhat higher than the fifty-two
percent observed in the preliminary study. The skew was also found, to a
declining degree, at the other levels, where 105, or 49.1 percent, and
ninety-three, or 40.8 percent, showed a density of greater than ninety
percent. Visible-text word density was in fact one hundred percent for
117 and 168 descriptions, or 36.6 percent and 41.4 percent, at level one,
sixty-eight descriptions, or 31.8 percent, at level two, and sixty-two
descriptions, or 27.2 percent, at level three.

In contrast, the
distribution of density of visible text phrasing, as shown in Figure 4,
appears to be bimodal, as noted also in the preliminary study, with one
local maximum somewhere in the twenty to fifty percent range and another
in the ninety to one hundred percent range. In a number of cases
(thirty-six and forty at level one, thirteen at level two, and eighteen
at level three), the density was one hundred percent, meaning that the
entire description was word-for-word a sequence also found in the visible
text. Many such descriptions were short, but the longest (at level three)
was sixty-two words, exceeding the forty-six-word record set in the
preliminary study.

This entire description
of sixty-two words (http://octopus-design.co.uk:80/octopus/home-nos.htm)
was also the longest word sequence in a description that exactly matched
one in the visible text:
Octopus Design are a team of graphic design professionals
utilising the latest computer technology to assist a wide range of skills
effectively and creatively. If you are looking for a design company who
are committed to supplying top quality graphic design solutions on time
and on budget, why not view our portfolio here, e-mail us for more
information or telephone. UK
0118 934 4209
Discussion
As in the preliminary
study, the suggestion of one source that descriptions are especially
important to pages with little visible text was not reflected in
practice: pages with less visible text were actually less likely to
contain the DESCRIPTION tag at all levels. This observation does not, of
course, invalidate the advice.
The great majority of
the descriptions conformed to the common maximum-length guideline of two
hundred characters. A smaller majority conformed to the more restrictive
guideline of 150 characters given by HotBot and others. The fact that
more than twenty percent of descriptions at all levels, as in the
preliminary study, exceeded the two-hundred-character limit, in one case
reaching more than four times that number of characters, may suggest that
there is a need on the part of some authors for a way of including much
longer descriptive information. The one description in the preliminary
study that was almost nine times the recommended maximum length seems,
however, to have been particularly anomalous, exceeding even the more
technical limit of one thousand characters.
The bimodal appearance
of the phrasing density distribution clearly suggests at least two
approaches to authoring descriptions: on the one hand, production of an
expression that is original but that substantially echoes wording found
in the visible text and, on the other hand, exact copying of an entire
expression from somewhere in the visible text. It is expected that future
research will examine which parts of the visible text tend to be
duplicated in the latter situation. For example, is it the first two
hundred words as suggested by some of the existing guidelines?
Since most descriptions
are not word-for-word repetitions of information provided in the visible
text, a browser feature to display the description might be of value to
some users. Such a feature would need to take into account that some
descriptions are quite long. A single line, like the caption bar used for
titles, would not be sufficient (indeed, even titles of web pages are
sometimes too long to fit into the space available in the caption bar).
The increasing complexity of HTML is making it more and more difficult
for a user to access non-display text by the alternative expedient of
viewing the page source.
Given the number of
links on the typical page, it seems reasonable to assume that, in many
cases, the descriptions are intended to apply to a site or collection of
pages rather than to a single page. A related study, not yet reported on,
used the variation of appending the visible text of linked pages to the
visible text of the initial random page. As hypothesized, both word-match
and phrase-match density within the descriptions increased substantially
with the addition of the linked-page texts, suggesting that the
descriptions are indeed intended to apply to multiple pages. Another
test, to be carried out in the future, will compare the results of
following only local links to see whether authors are more likely to use
descriptions for sites rather than for multi-site groupings of pages.
Conciseness and Structure
Purpose
Questions to be
addressed under the heading of conciseness and structure included the
following. Are descriptions generally concise, as recommended, and how
does their conciseness compare with that of scholarly abstracts? Do
descriptions use complete sentences, as recommended for abstracts, or do
they tend to consist of title-like noun phrases or other syntactic
structures? Are home-page descriptions more likely to use complete
sentences than descriptions for other pages? Is formulaic phrasing, as
suggested by some of the samples provided by the sources, at all common?
Methodology
Simpson's λ (Simpson,
1949), a measure of concentration or repetition of vocabulary equal to
the probability that the words occurring at two different random
locations in a text are the same, was computed for each description as a
possible negative indicator of conciseness.
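As a rough illustration, Simpson's measure can be computed from word counts. The sketch below assumes that the two random locations are drawn without replacement and that words are compared case-insensitively; neither detail is stated explicitly above, and the function name is hypothetical.

```python
import re
from collections import Counter


def simpson_index(text):
    """Probability that the words at two different random positions
    in the text are the same (sampling without replacement)."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    total = len(tokens)
    if total < 2:
        return 0.0
    counts = Counter(tokens)
    same = sum(n * (n - 1) for n in counts.values())
    return same / (total * (total - 1))
```

Under these assumptions, the keyword list "sturgis, sturgis rally, STURGIS, STURGIS RALLY, Sturgis, Sturgis Rally" quoted later in this article contains nine words (six occurrences of one word, three of another) and yields 36/72 = 0.5, which agrees with the highest value reported for keyword lists.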
All descriptions were
analyzed for general syntactic structure. For this purpose, each
description was considered to be divided into segments by sentence-level
punctuation marks: periods, exclamation marks, and question marks. Each
segment was then categorized as a noun phrase or sequence of noun phrases
(n), a verb phrase or sequence of verb phrases (v), an adjectival or
adverbial phrase or sequence (m), a sentence in the indicative mood (s),
a sentence in the imperative mood (c), or other (o). For example, the
description "Marshall Media produces high-quality CD-ROMS for
children and adults. Order online from our shop." would be coded sc.
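The segmentation step, though not the categorization itself, which was a human judgment, can be sketched as below. The function name is hypothetical, and treating a run of repeated marks such as "!!!" as a single break is an assumption.

```python
import re


def segments(description):
    """Split a description at periods, exclamation marks, and question marks."""
    parts = re.split(r"[.!?]+", description)
    return [p.strip() for p in parts if p.strip()]


example = ("Marshall Media produces high-quality CD-ROMS for "
           "children and adults. Order online from our shop.")
# Codes assigned by a human coder, one per segment, are then concatenated.
codes = ["s", "c"]
pattern = "".join(codes)
```

A splitter this simple would also break at periods inside abbreviations or URLs, so its output would need checking by the coder.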
To estimate inter-rater
reliability, the research assistant was first asked to re-code sixty-five
descriptions previously coded in the preliminary study. Consistency on
this test was 90.8 percent on all codings and rose to 95.4 percent when
all codings except n, nn, s, and ss were collapsed into a single
"other" category. Reliability was thus deemed more than
sufficient for the assistant to proceed with coding the main description
sets independently.
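The consistency figures above are simple percent agreement, which might be computed as in the following sketch. The function name and the sample codings in the usage note are hypothetical; only the set of categories kept distinct (n, nn, s, ss) comes from the text.

```python
KEPT = {"n", "nn", "s", "ss"}


def agreement(codes_a, codes_b, collapse=False):
    """Percentage of descriptions given the same coding by both coders.

    With collapse=True, every coding other than n, nn, s, and ss is
    replaced by a single "other" category before comparison.
    """
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty codings of equal length")
    if collapse:
        codes_a = [c if c in KEPT else "other" for c in codes_a]
        codes_b = [c if c in KEPT else "other" for c in codes_b]
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100 * matches / len(codes_a)
```

With the made-up codings ["n", "s", "sc", "nn"] and ["n", "s", "sv", "n"], agreement is 50 percent on all codings and 75 percent after collapsing, because "sc" and "sv" both fall into the "other" category.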
Results
Values for Simpson's λ
were similar across the levels, with means of 0.0127 for both level one
sets and 0.0108 and 0.0136 for the other two levels, and ranged from
0.0000 to 0.2273. The most common value was zero, generally in shorter
descriptions, though the longest such description was thirty words: "Desertcom
- Just what you've come to expect!!! A 15-year history of tradition with Southern California businesses.
Panasonic Digital Business System, PanaVOICE, Active Voice, Pacific Bell
Network Services and more!" (http://www.desertcom.com:80/telecom.htm).
The most common words
in the descriptions (at least one occurrence for every thirteen
descriptions), apart from obvious stopwords, were as shown in Table 1.
Table 1: Common words in descriptions (number of occurrences follows each word)
Level 1a: DESIGN 27, INFORMATION 30, INTERNET 35, NEWS 27, ONLINE 25, SERVICE 25, SERVICES 30, SITE 35, WEB 44
Level 1b: BUSINESS 37, INFORMATION 33, NATIONAL 51, NEWS 43, SERVICES 34, WEB 41, WORLD 36
Level 2: BUSINESS 23, ESTATE 19, INTERNET 31, NATIONAL 19, NEWS 22, ONLINE 17, REAL 20, SERVICES 25, SITE 24, WEB 33, WORLD 25
Level 3: DESIGN 20, INFORMATION 20, INTERNET 22, NATIONAL 27, NEWS 22, ONLINE 22, SERVICES 19, SITE 21, WEB 33, WORLD 26
Common
syntactic patterns are shown in Table 2.
[Table 2 omitted]
In the "other" category, eighty-three descriptions
contained imperative (c) segments, almost always at the end, in eighteen instances
after a single indicative-mood (s) sentence. The largest number of
segments in a description was eleven, where the segments all consisted of
brief noun phrases separated by periods.
Discussion
The average values for
Simpson's λ for descriptions were similar to those in the preliminary
study and only slightly lower than those observed in a previous study of
abstracts produced with computer assistance (Craven, 2000b).
ONLINE, SERVICE,
INFORMATION, INTERNET, and SERVICES had all been noted as among the most
common words in the preliminary study. All the occurrences of ESTATE and
all but one occurrence of REAL at level two were in the phrase REAL
ESTATE, which was fairly concentrated in two descriptions but appeared in
others as well. The word pair REAL+ESTATE was noted as the eighth most
common in Wolfram's (1999) study of term co-occurrence in Excite queries.
NUDE, PICS, and XXX were among words that were common in the queries
analyzed by Wolfram but occurred only a couple of times each in the
descriptions.
Noun phrases or
sequences of noun phrases were much more common than one might expect in
an abstract, especially as one progressed from level one to levels two
and three. The increasing preference for noun phrases with level would be
consistent with some tendency to apply abstract-like descriptions more to
home pages or pages to be registered with search services and to apply
subject-heading or title-like descriptions more to other, especially
subordinate, pages on a web site.
Imperative-mood
sentences would not be expected in abstracts, and they were relatively
rare in the web-page descriptions in spite of the presumably promotional
nature of many of the sites and the use of imperatives in some of the descriptions
provided as models.
Comparison with Keywords
Purpose
The main questions to
be addressed in comparing keywords were the extent to which keywords were
in fact found in descriptions and whether the advice to place keywords
near the beginning was followed. The amount of repetition within keyword
lists was also of interest.
Methodology
Wording and phrasing of
descriptions were compared to the contents of any meta tag with a name
attribute of KEYWORDS in a fashion similar to that applied for the
visible text. In addition, the mean position of keyword matches within
descriptions was calculated. Simpson's λ was used to measure repetition
within keyword lists.
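One way the mean position of keyword matches could be computed is sketched below. The scaling is an assumption: each word's position in the description is mapped linearly onto a 0-100 scale, with the first word at 0 and the last at 100. The function names are hypothetical.

```python
import re


def words(text):
    """Lowercased sequences of alphabetic characters."""
    return re.findall(r"[^\W\d_]+", text.lower())


def mean_keyword_position(description, keywords):
    """Mean position (0 = first word, 100 = last word) of description
    words that also occur in the keyword list; None if undefined."""
    desc = words(description)
    key_vocab = set(words(keywords))
    if len(desc) < 2:
        return None
    last = len(desc) - 1
    positions = [100 * i / last for i, w in enumerate(desc) if w in key_vocab]
    if not positions:
        return None
    return sum(positions) / len(positions)
```

A value below 50 indicates that keyword matches fall, on average, in the first half of the description. For the made-up description "Sturgis rally camping near the Black Hills" with keywords "sturgis, rally, camping", the matches occupy the first three of seven word positions and the mean is about 16.7.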
Results
The density of keywords
in the descriptions ranged from zero percent to one hundred percent but
averaged close to the thirty-eight percent value observed in the
preliminary study (36.9 percent and 36.7 percent at level one, 38.9
percent at level two, 39.9 percent at level three).
As in the preliminary
study, density in the descriptions of two-word sequences from the
keywords showed a local maximum in the zero-ten percent range at all
levels. No keyword word pairs were found in just under one-third of the
descriptions at all levels (ninety-three and 110 at level one, sixty-six
at level two, and sixty-five at level three). Again, a small number
consisted entirely of keyword word pairs (five at level one, four at
level two, and four at level three); the longest of these was 147 words
(http://www.transdev.com:80).
The mean position of
keyword matches was significantly more likely to be nearer the beginning
of the description than the end at all levels (p=0.0000 and p=0.0001 at
level one, p=0.0015 at level two, p=0.0001 at level three using a
chi-squared test), with the average position being around forty-five on a
0-100 scale. As in the preliminary study, length of the keywords had a
low positive correlation with length of description (0.2262 and 0.2146 at
level one, 0.1962 at level two, 0.2934 at level three).
For keyword lists,
Simpson's λ was generally higher than for the descriptions, with means of
0.0301 and 0.0269 at level one and 0.0222 and 0.0260 at the other two
levels, and ranged from 0.0000 to 0.5000. The highest value was observed
in the clearly repetitive "sturgis, sturgis rally, STURGIS, STURGIS
RALLY, Sturgis, Sturgis Rally" (http://www.sturgiscamping.com:80).
Discussion
The results on average
position of keyword matches represented an addition to the preliminary
study where statistical significance had not been attained. They do
supply some support for the hypothesis that description developers/writers
are following the advice to put keywords near the beginnings of their
page descriptions, but the tendency does not appear to be very strong.
The slightly higher but
still modest values of Simpson's λ for the keyword lists are not
consistent with a view of developers as engaged in widespread "word
stuffing" to increase retrievability of their pages. There are
obviously some exceptions.
Conclusion
From this study, the
following main findings have emerged:
1. Pages
with little visible text are actually less likely to be given
descriptions, contrary to what is recommended by some of the sources.
2. Descriptions
vary greatly in their repetition of words and phrases that are visible on
the pages: some repeat word for word; others repeat selectively.
3. Descriptions generally appear about
as concise as scholarly abstracts.
4. Unlike
abstracts, many descriptions tend to use noun phrases rather than
complete sentences. Use of complete sentences appears to be slightly more
characteristic of descriptions on home pages.
5. While
some words are found fairly frequently in descriptions, there is little
indication of widespread adoption of formulas.
6. Keywords
are found in descriptions to various extents. They have a slight tendency
to appear nearer the beginnings than the ends of descriptions, reflecting
in a very weak fashion the advice to place them up front.
7. On average, keyword lists are not
highly repetitive.
In future research, it
would be useful to rate a sample of descriptions for quality, using either
objective criteria or subjective human judgments or both. Even scholarly
abstracts have sometimes been found to be of poor quality. Pitkin,
Branagan, & Burmeister (1999), for example, demonstrated
inconsistencies and other defects in published author abstracts.
Inter-rater reliability might, however, be expected to be low. In a study
in which subjects rated different abstracts of the same document on
various criteria, agreement was sometimes very poor (Craven, 2000b).
A very simple kind of
assistance for web-page developers is already provided in WordPerfect,
namely, copying any abstract into the DESCRIPTION tag when exporting to
HTML. Conceivably, the abstract or description might also be
automatically copied to the corresponding Dublin Core tag. Dublin Core
elements were, however, rarely encountered in this study and the
DC.DESCRIPTION meta tag appears to be redundant, especially if it merely
duplicates the DESCRIPTION tag.
If more advanced tools
are to be produced to assist in the adding of appropriate meta tags to
HTML documents, it is likely that different tools will suit different
types of users. That individuals use quite different approaches in
writing abstracts has been noted in studies involving think-aloud
protocols (Endres-Niggemeyer, Waumans, & Yamashita, 1991); similar
observations are to be expected regarding the writing of other kinds of
summary. For composing web-page descriptions specifically, results of the
present study suggest that some authors might want a tool for copying
text from elsewhere on the page while others might find automatically
generated lists of key words or phrases to be helpful.
Acknowledgments
Research reported in
this article was supported in part by individual operating grant A9228 of
the Natural Sciences and Engineering Research Council of Canada.
The extensive
assistance of research assistant Michael Dub in data gathering and
categorization is also acknowledged.
References
• Almind, T.C., & Ingwersen, P. (1997). Informetric analyses
on the World Wide Web: Methodological approaches to 'Webmetrics'. Journal of Documentation, 53 (4), 404-426.
• Beagle, D. (1999). Visualization of metadata. Information Technology and Libraries,
18 (4), 192-199.
• Clark,
S. (2000). Back to basics: META
tags / WebDeveloper.com. Retrieved January 20, 2000 from the World Wide
Web: http://www.webdeveloper.com/html/html_metatags_part2.html.
• Craven, T.C. (1988). Text network display editing with
special reference to the production of customized abstracts. Canadian Journal of Information
Science, 13 (1/2), 59-68.
• Craven, T.C. (1991). Algorithms for graphic display of
sentence dependency structures. Information
Processing and Management, 27
(6), 603-613.
• Craven, T.C. (1993). A computer-aided abstracting tool
kit. Canadian Journal of
Information Science, 18
(2), 19-31.
• Craven, T.C. (1996). An experiment in the use of tools for
computer-assisted abstracting. In Hardin, S., ed., ASIS '96: Proceedings
of the 59th ASIS Annual Meeting 1996 (Volume 33), Baltimore, Maryland,
October 21-24, 1996. (pp. 203-208). Medford, New Jersey:
Information Today.
• Craven, T.C. (1998). Human creation of abstracts with
selected computer-assistance tools. Information
Research, 3 (4), paper 47.
On the World Wide Web: http://www.shef.ac.uk/~is/publications/infres/paper47.html.
• Craven, T.C. (2000a). Features of DESCRIPTION META tags in
public home pages. Journal of
Information Science, 26
(5), 303-311.
• Craven, T.C. (2000b). Abstracts produced using computer
assistance. Journal of the American
Society for Information Science, 51
(8), 245-256.
• Craven, T.C. (submitted). 'DESCRIPTION' META Tags in Locally Linked Web Pages.
Submitted for publication.
• Dublin Core Metadata Initiative / documents / proposed
recommendations / Dublin Core Element Set, version 1.1. (2000). Retrieved
April 24, 2000 from the World Wide Web: http://purl.oclc.org/dc/documents/rec-dces-19990702.htm.
• Endres-Niggemeyer, B. (1998). Summarizing information. Berlin:
Springer.
• Endres-Niggemeyer, B., Waumans, W., & Yamashita, H.
(1991). Modelling summary writing by introspection: A small-scale
demonstrative study. Text, 11 (4), 523-552.
• Haas, S.W., & Grams, E.S. (2000). Readers, authors,
and page structure: A discussion of four questions arising from a content
analysis of Web pages. Journal of
the American Society for Information Science, 51 (2), 181-192.
• Harter, S.P., & Ford, C.E. (2000). Web-based analyses
of e-journal impact: Approaches, problems, and issues. Journal of the American
Society for Information Science, 51
(13), 1159-1176.
• King, D.L.
(1998). Library home page design: A comparison of page layout for
front-ends to ARL library Web sites. College
and Research Libraries, 59
(5), 458-465.
• Kozma, R.B. (1991). The impact of computer-based tools and
embedded prompts on writing processes and products of novice and advanced
college writers. Cognition and
Instruction, 8 (1), 1-27.
• Paice, C. (1990). Constructing literature abstracts by
computer: Techniques and prospects. Information
Processing and Management, 26
(1), 171-186.
• Paice, C.D. (1994). Automatic abstracting. In Kent, A.,
& Hall, C.M., eds.,
Encyclopedia of Library and Information Science (Vol. 53 [supplement
16], pp. 16-27). New York:
Dekker.
• Pinto, M., & Galvez, C. (1999). Paradigms for
abstracting systems. Journal of
Information Science, 25 (5),
365-380.
• Pitkin, R.M., Branagan, M.A., & Burmeister, L.F.
(1999). Accuracy of data in abstracts of published research articles. JAMA, 281 (12), 1110-1111.
• Qin, J., & Wesley, K. (1998). Web indexing with meta
fields: A survey of Web objects in polymer chemistry. Information Technology and Libraries,
17 (3), 149-156.
• Simpson, E.H. (1949). Measurement of diversity. Nature, 163, 688.
• Turner, T.P., & Brackbill, L. (1998) Rising to the
top: Evaluating the use of the HTML meta tag to improve retrieval of
World Wide Web documents through Internet search engines. Library Resources and Technical
Services, 42 (4), 258-271.
• Wolfram, D. (1999). Term co-occurrence in Internet
queries: An analysis of the Excite data base. Canadian Journal of Information and Library Science, 24 (2/3), 12-33.