Gateways, Gatekeepers, and Roles in the Information Omniverse
Proceedings of the Third Symposium held November 13-15, 1993 (Washington Vista Hotel, Washington, DC), edited by Ann Okerson & Dru Mogge. Washington, DC: Association of Research Libraries, 1994.
Gate-Keeping A Garden of Etext Delights: Electronic Texts and the Humanities at the University of Virginia Library
David Seaman, Coordinator
Electronic Text Center
University of Virginia Library
A moment's reflection on our literary heritage brings to mind a plentiful supply of gatekeepers, and rarely is the role an unsullied one: from Guillaume de Lorris to John Lydgate and beyond, medieval gatekeepers cheerfully admit wanderers into morally duplicitous bowers of earthly beauty; Shakespeare's equivocating devil-porter, pissed and pissing, staggers at Macbeth's Hell-Gate; and the execrable shapes of Sin and Death loom large and ghastly at the entrance to Milton's Chaos. As a modern-day gatekeeper to a virtual garden of etext delights I hope to spare my patrons the frustration, shocks, and struggles that literary adventurers often experience in such places; and if -- like their medieval counterparts -- the modern users find the etexts to be dizzying at first, I hope to be able to parlay this initial headiness into lasting enthusiasm and appropriate use.
Since opening in August 1992, the University of Virginia's Electronic Text Center has tried to exemplify what can now be achieved on a University-wide scale with electronic texts. The Center combines an on-line archive of thousands of texts with a library-based collection of hardware and software suitable for the creation and analysis of text. Through ongoing training sessions and support of individual teaching and research projects, the Library is building a diverse and expanding user community locally and providing a potential model for similar enterprises at other institutions.
The Center is staffed by the Coordinator and a team of graduate assistants, all currently drawn from various humanities departments at the University of Virginia. The staff members have backgrounds in bibliography, undergraduate teaching, textual editing, Special Collections, and graduate research. These skills reflect and support the needs of our patrons, and help us to provide nurturing and training appropriate to users familiar with the texts but often not familiar at all with the computer as a tool for textual inquiry.
Our etext endeavor comprises what may conveniently be thought of as an on-line and an on-site sphere of activity. The on-line component consists of a growing collection of electronic full-text databases, all accessible 24 hours a day by any University of Virginia student, faculty or staff member from anywhere in the world (contractual obligations prevent access by users who lack a University of Virginia affiliation). The Archive currently includes the following items:
Note: those of our texts that are publicly available can now be accessed by anyone through our World Wide Web server.
- The Oxford English Dictionary, second edition
- The entire corpus of Old English writings (c. 3,000 works).
- Selected Middle English titles (including the Riverside Chaucer, and works by Henryson, Gower, and the Gawain-Poet).
- Hundreds of Modern English literary, social, historical, religious, and philosophical works, from 1500 to the present.
- Smaller selections of French, Latin, and German works.
- The currently released parts of two massive databases from Chadwyck-Healey: J-P. Migne's Patrologia Latina, and the English Poetry Full-Text Database.
These texts are not only on-line and available to multiple simultaneous users; they are all searchable through a single common piece of software. Having been taught to use one database, a user then has the knowledge necessary to search any of our databases. This fact has significant training implications and does much to overcome the frustration and inefficiency involved with CD-ROM-based etexts, where each disk typically has a different search tool. Rather than teaching a patron to read a single electronic book, whose rules for access may well differ from the next electronic text collection he or she uses, we are able to teach the user how to negotiate a single software package through which all our electronic texts can be reached.
By buying the data alone, we can also create conglomerations of electronic books that can be searched together: for example, the collection of British philosophy available from InteLex Corp. is a valuable product, but the value is enhanced when these works exist, as they do at U Va, in a much larger collection of modern English texts. A user can choose to limit enquiries to Hume or to all the InteLex texts, but the user can just as easily remove this limitation and trace an image or concept out into other literary, historical, and philosophical works.
Figure 1 shows an example of such a search, in this case of the portion of the English Poetry Full-Text Database that has been released (about 1,500 works). The search window in the upper left-hand corner of the screen includes a record of one's past searches, a Key Word In Context (KWIC) concordance of the results (with SGML tags visible), and -- in the column headed "Components" -- a list of the categories that are marked with SGML tags and that can be used in building a search. The first six searches identify various forms of "gatekeeper" and "doorkeeper." Then, to widen the search, we ask for "gate" plus "door" near "keeper" (by default, near means within 80 characters). These searches are added together, and the "gatekeeper/doorkeeper" set is then limited to 19th-century works only. Running clockwise from the top right-hand corner are display windows showing a variety of 19th-century poems in which the idea of a gatekeeper is represented. The window in the lower left-hand corner shows the poem "First Impressions" with all the SGML tagging made visible.
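The proximity logic described above -- one term within 80 characters of another, with results presented as a KWIC concordance -- can be sketched in a few lines of modern code. This is an illustrative reconstruction only, not the interface or behavior of the search software the Center actually uses:

```python
import re

def near_matches(text, term_a, term_b, distance=80):
    """Find positions where term_a occurs within `distance` characters
    of term_b (the default mirrors the 80-character 'near' above)."""
    hits = []
    for m in re.finditer(term_a, text, re.IGNORECASE):
        start = max(0, m.start() - distance)
        window = text[start:m.end() + distance]
        if re.search(term_b, window, re.IGNORECASE):
            hits.append(m.start())
    return hits

def kwic(text, term, context=30):
    """Return Key Word In Context lines, one per occurrence of term."""
    lines = []
    for m in re.finditer(term, text, re.IGNORECASE):
        left = text[max(0, m.start() - context):m.start()]
        right = text[m.end():m.end() + context]
        lines.append(f"{left:>{context}}[{m.group(0)}]{right}")
    return lines
```

Given a sentence such as "the keeper of the gate stood by the door", `near_matches` reports the position of "gate" because "keeper" falls inside its 80-character window, and `kwic` formats each hit with its surrounding context, bracketing the keyword.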
All the electronic texts are encoded with Standard Generalized Markup Language (SGML). The large-scale electronic text databases -- the OED, the Chadwyck-Healey items -- come fully marked up, and increasingly we are seeing producers of individual titles (such as Oxford University Press) also offering them in SGML form. The SGML markup not only means that texts can be added together in conglomerations but also that the data, with all its structural and typographic information, is not inherently wedded to a piece of software. It is, in a real sense, data that will outlive the software we currently use to explore and present it.
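To make the idea of structural markup concrete, here is a minimal SGML-style fragment; the element names and the verse are invented for illustration and do not reproduce the DTD or text of any of the products named above:

```sgml
<poem>
 <title>A Gatekeeper's Song</title>
 <stanza>
  <line>At the garden gate the keeper stood,</line>
  <line>And watched the wanderers pass.</line>
 </stanza>
</poem>
```

Because the structure (poem, title, stanza, line) is recorded explicitly rather than implied by typography, search software can limit a query to titles or to verse lines, and the same file can be reformatted or migrated to new software without loss.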
Those texts that come to us without any markup receive a basic level of tagging at the Etext Center, a task that is aided by the involvement of volunteers from various Library departments under a Staff Sharing program for cross-training. The use of volunteers from within the library in the creation of the etext archive is also an effective way of incorporating this new electronic data service into the fabric of the library. Because there is a danger with such enterprises that they exist in the library but are not really "of the library", we wanted to do what we could from the beginning to integrate electronic texts and print texts. To this end, the catalogers and bibliographers apply their professional skills to the acquisition and bibliographical control of the electronic texts in much the same way that they do with print items. The willingness with which the library as a whole has incorporated the etext initiative has contributed noticeably to the early success of the Etext Center.
The Electronic Text Center provides a place in which to use those texts not available on-line outside the Library. These include CD-ROM products such as The Global Jewish Database, Perseus (a collection of Greek texts and images), Immanuel Kant's Gesammelte Schriften, Thomas Aquinas's Opera Omnia, the CETEDOC Latin texts, and the ICAME Collection of English Language Corpora; and other non-CD texts such as Hegel's The Phenomenology of Mind, and The Tale of Genji (in Japanese). The Center also makes available hardware and software that permit the creation and analysis of electronic texts, and it provides guidance and training for these new scholarly tools. At present we have MS-DOS machines, a NeXT, a Macintosh, an IBM RS/6000, scanners that turn printed text into computer-readable forms or produce digitized images, CD-ROM drives, large color monitors, and text-analysis software that can generate indices, collations, concordances, word-lists, statistical analyses, and hypertexts. Image-viewing software allows one to work with color and grayscale digital images alongside the searchable databases.
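As a small illustration of one of the analyses mentioned above, a word-frequency list can be generated in a few lines; this is a sketch of the general technique, not the interface of any of the packages the Center offers:

```python
import re
from collections import Counter

def word_frequencies(text, top=10):
    """Tokenize on letters (allowing a simple internal apostrophe)
    and return the most common words with their counts."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(words).most_common(top)
```

Run over "the gate and the keeper and the door", this returns `('the', 3)` first and `('and', 2)` second; the same counting, applied across an entire oeuvre, underlies the word-lists and statistical profiles that the text-analysis packages produce.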
Figure 2 shows an example of an electronic text that comprises a digitized manuscript and an SGML transcription. An original Jefferson letter from the Special Collections Department was digitized in the Etext Center (on screen, the image is in color), and appears here with an enlarged detail in the lower left-hand corner. The image-viewing software gives one the ability to alter the color balance in an image and to enlarge details (the amount of enlargement is dictated by the resolution at which the digital image was scanned). Alongside the image is a searchable transcription of the text (here shown both as it would usually appear and also -- bottom left-hand corner -- in its "raw" SGML state). The searchable text maintains the lineation and typographic peculiarities of the manuscript (such as the double hyphens for line-end hyphenated words). However, this causes a problem for searching. If the searchable text existed here simply as the transcription, one could not search for the line-end hyphenated words as whole words, because (to take the first one) "account" would exist in this letter only as "ac--" and "count".
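One plausible way to handle the double-hyphen problem just described -- a sketch of a general technique, not necessarily the Center's actual solution -- is to search a normalized copy of the transcription in which line-end breaks are rejoined, while the displayed transcription keeps its original lineation:

```python
import re

def rejoin_hyphenated(transcription):
    """Rejoin words split across lines with the manuscript's double
    hyphen, so that 'ac--' at line end followed by 'count' indexes
    as the whole word 'account'."""
    return re.sub(r"--\s*\n\s*", "", transcription)
```

With this normalization, `rejoin_hyphenated("on ac--\ncount of")` yields `"on account of"`, and a whole-word search for "account" succeeds against the normalized copy even though the display text preserves the manuscript's line break and double hyphen.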
From its inception, the Electronic Text Center has been alert to the need for on-going user education. It became clear very quickly that it was not enough simply to announce our services and wait for users to arrive, especially as the tools and methodologies offered are still generally unfamiliar to faculty and students. The assumption that "if you build it, they will come" is only partially true. For many users it is more accurately (if more clumsily) stated as "if you expose them to it and support their use, they will come back." In light of this, we contact faculty, feature open houses, teach general and advanced training sessions, offer classes tailored to particular courses, and provide short ad hoc sessions to walk-in users in order to train them in some aspect of the service. The training sessions structured to a particular course have been particularly successful, and they mean that the decision to use etexts in a class -- often for the first time -- does not obligate the faculty member to learn immediately how to teach an unfamiliar set of skills.
The Center continues to give significant time to the creation of on-line and in-print documentation (increasingly, these items are available on the University of Virginia gopher and World Wide Web servers). These documents include introductions to the Etext Center, to the use of the on-line archive (both beginning and advanced sessions), to the off-line texts, and to text creation (including OCR scanning), text formatting (including SGML markup), and text-analysis software. We also have something of an educational role beyond the University. Because we have received considerable regional and national publicity,
scores of librarians and scholars from other institutions have phoned and e-mailed with queries, and we have seen many on-site visitors, including parties from the following: Harvard, Indiana, Johns Hopkins, Iowa, Duke, Yale, the University of Nottingham, Virginia Tech, Emory, Kentucky, the University of Richmond, UNC Chapel Hill, William & Mary, Oxford, Groningen, Leiden, Macquarie University (Sydney), University College, London, and the British Library. We hope that this activity will help foster the development of electronic text services elsewhere and, by so doing, help build a marketplace for etexts that in turn should encourage publishers to make available more electronic versions of texts for use on-line.
Usage of the etexts and the Center has been heavier and more diverse than we had any right to expect, a testament to the breadth of the initial holdings and the manner in which the services have been introduced. In 1993 there were over 7,500 remote logins from over 1,600 on-line users, and the Center itself has seen a steadily increasing number of users. A sampling of the on-line and off-line projects undertaken by our users this year is listed below:
- An English professor has added Mrs Sheridan's Lady Sidney Bidulph to the electronic Frances Brooke novel she created last year, for use in a course on 18th-century women writers. These two long out-of-print works are available to her students only as electronic texts.
- Scholars attending an NEH summer seminar made heavy use of the Hebrew Bible, the Talmud, and hundreds of books of rabbinical responsa on the Global Jewish Database CD-ROM.
- Graduate bibliography students have used collating software, image scanning, and digitized sound while preparing and presenting editing projects.
- In Spring 1993, both the Computer Science and English Departments used the Center while teaching (both for the first time) etext-related courses.
- A French graduate student has generated cumulative sum analyses and word-frequency lists as part of a study of an Ivory Coast writer. Preliminary results were presented at a conference in the Spring of 1993.
- An Education School professor has scanned in sections from dozens of children's textbooks as part of an on-going study of how children are taught language.
- A Religious Studies professor has worked with the InteLex texts of David Hume's writings while researching a book; the InteLex texts have also been used in a philosophy course.
- The English Poetry Database, although incomplete, has already brought into the classroom texts not available in print in the library.
- The Center has worked with an English professor to introduce students to the possibilities of hypertext as a tool for presenting and encountering literary texts.
As this service develops and matures, we are seeing electronic texts and related technologies become an increasingly valuable and valued pedagogic and scholarly resource. Scholars quickly understand that electronic documents have several obvious benefits: they can be searched quickly for phrases, words, and combinations of words, allowing one to try out notions and hypotheses with great speed; they encourage large-scale searches over oeuvres, genres, and centuries, searches which are difficult and time-consuming with printed texts alone; they can provide access to texts otherwise unavailable, and they allow such work to be done from one's home or office.
As a gatekeeper to this new realm, I play variously the role of guide, collaborator, cheerleader, and teacher. The lessons of the past 18 months have been clear: that a body of etexts delivered to the user on-line, through a common interface, spurs use in a way that a collection of texts on CD-ROMs in a library cannot hope to do; that the integration of this new service into the fabric of the library enhances its ability to establish itself quickly; that the choosing, handling, and presentation of etexts is as much a textual as a technical endeavor and needs to be done by people with textual and bibliographic skills; and that the users will come and will find increasing use for the various etext services, but they need the ongoing support of gatekeepers who can identify the means of entrance to this garden of delights, demonstrate the enduring value of the contents therein, and facilitate the growth of new uses.