[Page 63]
A moment's reflection on our literary heritage brings to mind a plentiful supply of gatekeepers, and rarely is the role an unsullied one: from Guillaume de Lorris to John Lydgate and beyond, medieval gatekeepers cheerfully admit wanderers into morally duplicitous bowers of earthly beauty; Shakespeare's equivocating devil-porter, pissed and pissing, staggers at Macbeth's Hell-Gate; and the execrable shapes of Sin and Death loom large and ghastly at the entrance to Milton's Chaos. As a modern-day gatekeeper to a virtual garden of etext delights I hope to spare my patrons the frustration, shocks, and struggles that literary adventurers often experience in such places; and if -- like their medieval counterparts -- the modern users find the etexts to be dizzying at first, I hope to be able to parlay this initial headiness into lasting enthusiasm and appropriate use.
Since opening in August 1992, the University of Virginia's Electronic Text Center has tried to exemplify what can now be achieved on a University-wide scale with electronic texts. The Center combines an on-line archive of thousands of texts with a library-based collection of hardware and software suitable for the creation and analysis of text. Through ongoing training sessions and support of individual teaching and research projects, the Library is building a diverse and expanding user community locally and providing a potential model for similar enterprises at other institutions.
The Center is staffed by the Coordinator and a team of graduate assistants, all currently drawn from various humanities departments at the University of Virginia. The staff members have backgrounds in bibliography, undergraduate teaching, textual editing, Special Collections, and graduate research. These skills reflect and support the needs of our patrons, and help us to provide nurturing and training appropriate to users familiar with the texts but often not familiar at all with the computer as a tool for textual inquiry.
Our etext endeavor comprises what may conveniently be thought of as an on-line and an on-site sphere of activity. The on-line component consists of a growing collection of electronic full-text databases, all accessible 24 hours a day by any University of Virginia student, faculty or staff member from anywhere in the world (contractual obligations prevent access by users who lack a University of Virginia affiliation). The Archive currently includes the following items:
These texts are not only on-line and available to multiple simultaneous users, but they all use a single common piece of search software. Having been taught to use one database, a user then has the knowledge necessary to search any of our databases. This fact has significant training implications and does much to overcome the frustration and inefficiency involved with CD-ROM based etexts, where each disk typically has a different search tool. Rather than teaching a patron to read a single electronic book, whose rules for access may well differ from the next electronic text collection he or she uses, we are able to teach the user how to negotiate a single software package through which all our electronic texts can be reached.
By buying the data alone, we can also create conglomerations of electronic books that can be searched together: for example, the collection of British philosophy available from InteLex Corp. is a valuable product, but the value is enhanced when these works exist, as they do at UVa, in a much larger collection of modern English texts. A user can choose to limit enquiries to Hume or to all the InteLex texts, but the user can just as easily remove this limitation and trace an image or concept out into other literary, historical, and philosophical works.
Figure 1 shows an example of such a search, in this case of the portion of the English Poetry Full-Text Database that has been released (about 1,500 works). The search window in the upper left-hand corner of the screen includes a record of one's past searches, a Key Word In Context (KWIC) concordance of the results (with SGML tags visible), and -- in the column headed "Components" -- a list of the categories that are marked with SGML tags and that can be used in building a search. The first six searches identify various forms of "gatekeeper" and "doorkeeper." Then, to widen the search, we ask for "gate" plus "door" near "keeper" (by default, near means within 80 characters). These searches are added together, and the "gatekeeper/doorkeeper" set is then limited to 19th-century works only. Running clockwise from the top right-hand corner are display windows showing a variety of 19th-century poems in which the idea of a gatekeeper is represented. The window in the lower left-hand corner shows the poem "First Impressions" with all the SGML tagging made visible.
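The logic of the Figure 1 search can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search software we use: the sample records, the field names, and the function names `near` and `search` are invented for the example. So is the reading of "within 80 characters" as the distance between the starting positions of the two words, and the definition of the 19th century as 1801 to 1900.

```python
import re

# Invented sample records; titles, years, and texts are placeholders.
WORKS = [
    {"title": "Sample A", "year": 1820, "text": "The keeper of the gate stood by."},
    {"title": "Sample B", "year": 1650, "text": "A doorkeeper in the house."},
    {"title": "Sample C", "year": 1855,
     "text": "The gate was shut. " + "x" * 100 + " No keeper came."},
]

# Literal forms such as "gatekeeper", "gate-keeper", "door keeper".
LITERAL = re.compile(r"(gate|door)[- ]?keeper", re.IGNORECASE)


def near(text, left_terms, right_term, window=80):
    """True if any left term starts within `window` characters of right_term."""
    lowered = text.lower()
    right_starts = [m.start() for m in re.finditer(re.escape(right_term), lowered)]
    for term in left_terms:
        for m in re.finditer(re.escape(term), lowered):
            if any(abs(m.start() - r) <= window for r in right_starts):
                return True
    return False


def search(works, first_year=1801, last_year=1900):
    """Combine the literal and proximity result sets, then limit by date."""
    literal = {i for i, w in enumerate(works) if LITERAL.search(w["text"])}
    proximity = {i for i, w in enumerate(works)
                 if near(w["text"], ["gate", "door"], "keeper")}
    combined = literal | proximity  # "these searches are added together"
    return [works[i]["title"] for i in sorted(combined)
            if first_year <= works[i]["year"] <= last_year]
```

With these invented records, `search(WORKS)` keeps only "Sample A": "Sample B" matches but falls outside the century, and in "Sample C" the two words lie more than 80 characters apart.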
All the electronic texts are encoded with Standard Generalized Markup Language (SGML). The large-scale electronic text databases -- the OED, the Chadwyck-Healey items -- come fully marked up, and increasingly we are seeing producers of individual titles (such as Oxford University Press) also offering them in SGML form. The SGML markup not only means that texts can be added together in conglomerations but also that the data, with all its structural and typographic information, is not inherently wedded to a piece of software. It is, in a real sense, data that will outlive the software we currently use to explore and present it.
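One practical consequence of keeping the markup separate from the software is that a tag can carry information that a search tool uses while the display hides it, for instance a regularized form of a word that the transcription preserves in its original broken or misspelled state. The following Python sketch illustrates the idea under loud assumptions: the sample sentence is invented (it is not a transcription of any letter), the tag name `reg` is a placeholder chosen for illustration and not necessarily the one used in our files, and the regular expressions handle only this simple, well-formed case rather than SGML in general.

```python
import re

# Invented sample: a word broken across a line end with double hyphens,
# followed by a hypothetical <reg> tag holding its regularized form.
SAMPLE = "<p>I have settled the ac-\n-count<reg>account</reg> in full.</p>"


def display_text(tagged):
    """What a reader sees: regularized forms hidden, other tags stripped."""
    without_reg = re.sub(r"<reg>.*?</reg>", "", tagged, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", "", without_reg)


def searchable_text(tagged):
    """What a search tool indexes: tags stripped, regularized forms kept."""
    spaced = re.sub(r"<reg>(.*?)</reg>", r" \1 ", tagged, flags=re.DOTALL)
    return re.sub(r"<[^>]+>", "", spaced)
```

Here `display_text(SAMPLE)` preserves the broken form "ac-" / "-count" exactly as transcribed, while `searchable_text(SAMPLE)` also contains the whole word "account", so a search for that word succeeds without altering what the reader sees.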
Those texts that come to us without any markup receive a basic level of tagging at the Etext Center, a task that is aided by the involvement of volunteers from various Library departments under a Staff Sharing program for cross-training. The use of volunteers from within the library in the creation of the etext archive is also an effective way of incorporating this new electronic data service into the fabric of the library. Because there is a danger with such enterprises that they exist in the library but are not really "of the library", we wanted to do what we could from the beginning to integrate electronic texts and print texts. To this end, the catalogers and bibliographers apply their professional skills to the acquisition and bibliographical control of the electronic texts in much the same way that they do with print items. The willingness with which the library as a whole has incorporated the etext initiative has contributed noticeably to the early success of the Etext Center.
The Electronic Text Center provides a place in which to use those texts not available on-line outside the Library. These include CD-ROM products such as The Global Jewish Database, Perseus (a collection of Greek texts and images), Immanuel Kant's Gesammelte Schriften,
Figure 2 shows an example of an electronic text that comprises a digitized manuscript and an SGML transcription. An original Jefferson letter from the Special Collections Department was digitized in the Etext Center (on screen, the image is in color), and appears here with an enlarged detail in the lower left-hand corner. The image-viewing software gives one the ability to alter the color balance in an image and to enlarge details (the amount of enlargement is dictated by the resolution at which the digital image was scanned). Alongside the image is a searchable transcription of the text (here shown both as it would usually appear and also -- bottom left-hand corner -- in its "raw" SGML state). The searchable text maintains the lineation and typographic peculiarities of the manuscript (such as the double hyphens for line-end hyphenated words). However, this causes a problem for searching. If the searchable text existed here simply as the transcription, one could not search for the line-end hyphenated words as whole words, because (to take the first one) "account" would exist in this letter only as "ac- -count" and would need to be searched for in that form. However, by using a tag called , which encloses a normalized form of the line-end hyphenated word -- account -- but does not show up on-screen in the text-viewing window (unless the user asks to see all the tags), one can have a fully searchable version of the manuscript without the need to regularize (and therefore lose) characteristic period or author details such as double line-end hyphens. One could do the same with spelling variants and grammatical errors -- in this letter, Jefferson misspells "received" and misuses "it's," and both could be followed by a tag. In this letter, SGML tags are being employed to mark off structure, to facilitate searching by the inclusion of regularized forms in addition to the transcribed forms, and (not visible in this example) to record in the searchable text the name and location of the digital image file of the manuscript page.
From its inception, the Electronic Text Center has been alert to the need for ongoing user education. It became clear very quickly that it was not enough simply to announce our services and wait for users to arrive, especially as the tools and methodologies offered are still generally unfamiliar to faculty and students. The assumption that "if you build it, they will come" is only partially true. For many users it is more accurately (if more clumsily) stated as "if you expose them to it and support their use, they will come back." In light of this, we contact faculty, feature open houses, teach general and advanced training sessions, offer classes tailored to particular courses, and provide short ad hoc sessions to walk-in users in order to train them in some aspect of the service. The training sessions structured to a particular course have been particularly successful, and they mean that the decision to use etexts in a class -- often for the first time -- does not obligate the faculty member to learn immediately how to teach an unfamiliar set of skills.
The Center continues to give significant time to the creation of on-line and in-print documentation (increasingly, these items are available on the University of Virginia gopher and World Wide Web servers). These documents include introductions to the Etext Center, to the use of the on-line archive (both beginning and advanced sessions), to the off-line texts, and to text creation (including OCR scanning), text formatting (including SGML markup), and text analysis software. We also have something of an education role beyond the University. Because we have received considerable regional and national publicity, scores of librarians and scholars from other institutions have phoned and e-mailed with queries, and we have seen many on-site visitors, including parties from the following: Harvard, Indiana, Johns Hopkins, Iowa, Duke, Yale, the University of Nottingham, Virginia Tech, Emory, Kentucky, the University of Richmond, UNC Chapel Hill, William & Mary, Oxford, Groningen, Leiden, Macquarie University (Sydney), University College, London, and the British Library. We hope that this activity will help foster the development of electronic text services elsewhere, and by so doing help build a marketplace for etexts that in turn should encourage publishers to make available more electronic versions of texts for use on-line.
Usage of the etexts and the Center has been heavier and more diverse than we had any right to expect, a testament to the breadth of the initial holdings and the manner in which the services have been introduced. In 1993 there were over 7,500 remote logins from over 1,600 on-line users, and the Center itself has seen a steadily increasing number of users. A sampling of the on-line and off-line projects undertaken by our users this year is listed below:
As this service develops and matures, we are seeing electronic texts and related technologies become an increasingly valuable and valued pedagogic and scholarly resource. Scholars quickly understand that electronic documents have several obvious benefits: they can be searched quickly for phrases, words, and combinations of words, allowing one to try out notions and hypotheses with great speed; they encourage large-scale searches over oeuvres, genres, and centuries, searches which are difficult and time-consuming with printed texts alone; they can provide access to texts otherwise unavailable; and they allow such work to be done from one's home or office.
As a gatekeeper to this new realm, I play variously the role of guide, collaborator, cheerleader, and teacher. The lessons of the past 18 months have been clear: that a body of etexts delivered to the user on-line, through a common interface, spurs use in a way that a collection of texts on CD-ROMs in a library cannot hope to do; that the integration of this new service into the fabric of the library enhances its ability to establish itself quickly; that choosing, handling, and presenting etexts is a textual as much as a technical endeavor and needs to be done by people with textual and bibliographic skills; and that the users will come and will find increasing use for the various etext services, but they need the ongoing support of gatekeepers who can identify the means of entrance to this garden of delights, demonstrate the enduring value of the contents therein, and facilitate the growth of new uses.
For further information, contact:
David Seaman, University of Virginia Library
phone: 804-924-3230.
E-mail: etextcenter@virginia.edu