[Electronic Text Center]

Procedures for Transcribing and Tagging Manuscripts

Lisa Spiro and Carolyn Fay, Electronic Text Center, University of Virginia

[Note: This document is intended as a supplement to the Electronic Text Center's helpsheet on Transcriptional Work.]

The Electronic Text Center offers online access not only to thousands of books, poems, and short prose works but also to hundreds of rare manuscripts. By placing these manuscripts online, the Center makes rich historical and literary documents readily available to a range of users.

Preparing manuscripts for the web requires particular care, since they must be transcribed and then marked up with the TEI tags designed for primary sources. For most manuscripts, we also include full-color digital images so that users may get a sense of the document as a physical object.

For examples of how the Electronic Text Center creates and presents manuscripts, see the following collections, several of which were created by participants in Rare Book School:

For both the Brooks and the Bitner projects, we have collaborated with the Valley of the Shadow project.

Goals

In preparing letters, diaries, and other manuscripts for the Electronic Text Center's collections, we aim to meet two related goals:

Work Flow

In transcribing and tagging a document, we go through a series of steps:

  1. Checking Transcriptions: When we receive a new manuscript project, typically the first step has already been completed--someone has already done the preliminary transcription. But often the transcriber has overlooked or mistranscribed some crucial words. To correct these errors, and to get to know the texts, we begin our project by comparing the transcription to the digital images of the original document--and to the document itself, if possible.


  2. Researching Confusing Passages: If we have questions about unclear words or phrases, we will do the research necessary to answer them, turning to resources such as the Oxford English Dictionary and Encyclopaedia Britannica.


  3. Tagging: With the transcription completed, we are ready to do the basic tagging. First we mark up the overall structure of the document with divisional tags (e.g. <div1>); if we are tagging a letter, we also mark up the <opener> and <closer>. Next we tag paragraphs (<p>) and line breaks (<lb>), abbreviations (<abbr>), deletions (<del>), additions (<add>), and regularized spellings (<orig reg=>).


  4. Adding Informational Notes: When a document is full of obscure references to people, places, and events, we often add short informational notes to aid the reader's comprehension.


  5. Processing Images: Typically, Special Collections has created a digital image of each manuscript page. To process the images, we use editing tools such as ImageMagick and XV. We insert the images into the file using the <figure> tag, and we include descriptions of each image under the <figDesc> tag.


  6. Creating the Header: We prepare the TEI header using the Electronic Text Center's web-based form, recording such information as the names of the author and the recipient, the date, and so forth.


  7. Parsing the File: After the header has been joined to the body of the text (through the "cat" command), we check over the file. If the text looks like it is in good shape, we check whether the tagging is correct by parsing. First we make sure that each tag opens and closes by running multidocs; then we check to see if the tags meet the guidelines of TEI Lite by using our "parse" program. If the file passes both of these "tests," it's ready to go online.


  8. Proofreading the File: Once the text is put online, we proofread it carefully to make sure that the images load, that line breaks appear, and that the transcription is complete.
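
The first check in step 7 (making sure that each tag opens and closes) can be illustrated with a short sketch. This is not multidocs or the Center's "parse" program, whose internals are not described here; it is a simplified stand-in written in Python. The function name find_unbalanced_tags and the set of elements treated as empty (<lb> and <pb>) are assumptions made for the illustration, and the sketch does not validate against TEI Lite.

```python
import re

# Elements treated as empty (no closing tag) in the SGML examples in this
# document. This set is an assumption and would need extending for real files.
EMPTY_ELEMENTS = {"lb", "pb"}

# Matches an opening or closing tag and captures the slash and the element name.
TAG_PATTERN = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^<>]*>")


def find_unbalanced_tags(text):
    """Return a list of problems; an empty list means every tag pairs up."""
    problems = []
    stack = []  # names of the elements that are open at this point
    for match in TAG_PATTERN.finditer(text):
        is_close = match.group(1) == "/"
        name = match.group(2).lower()
        if name in EMPTY_ELEMENTS:
            continue  # empty elements never need a closing tag
        if not is_close:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()  # closing tag matches the most recently opened element
        else:
            problems.append(f"unexpected </{name}>")
    # Anything still on the stack was opened but never closed.
    problems.extend(f"<{name}> is never closed" for name in reversed(stack))
    return problems


if __name__ == "__main__":
    for problem in find_unbalanced_tags("<p>I write you a few lines<lb>"):
        print(problem)
```

For a fragment such as <p>I write you a few lines<lb>, the function reports that <p> is never closed, while the <lb> is skipped because it is listed as an empty element. A real SGML parser does considerably more, including checking the tags against the DTD.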


Some examples of tagging and transcription at work

In working on manuscript projects, we have come across several difficult cases that present both tagging and transcriptional challenges. Three of these cases follow.

The Case of the Mysterious Place Name

When we first confronted this scrawled place name (which is taken from John Booker's letter of December 22, 1863), we were utterly lost:

[Image: Booker letter in which the place name is difficult to discern]

Since it was important to establish where Booker was writing from, we tried a variety of techniques to figure out what these words said. We traced them out on our own paper; we compared these characters to other characters in the letter; we called in others to consult with us; we looked at maps of North Carolina. Ultimately, two rather obvious clues enabled us to figure out the solution: first, G. Howard Gregory's 38th Virginia Infantry told us that Booker's company was encamped at Kinston, North Carolina during the winter of 1863-1864; second, James Booker's letter of January 1, 1864 was written from Kinston.

Once we determined what the correct spelling of the place name was, we were able to tag the dateline as follows:

<opener>
<dateline>
<name type="place"> Camp Near <orig reg="Kinston"> Kiston</orig>,
<abbr expan="North Carolina">N. C. </abbr></name> <lb>
<date n="1863-12-22"> <abbr expan="December">Dec. </abbr>
the 22<hi rend="superscript"> <orig reg="nd">th </orig> </hi> 1863</date>
</dateline>
<salute>Dear Cousin Unity</salute>
</opener>

We could have used the <sic> or the <corr> tags to mark or correct the misspelling of Kinston, but opted instead to use the <orig reg> tag and to make a note offering additional information about Kinston.

The modernized version of the dateline would appear as follows:

Camp Near Kinston, North Carolina
December the 22nd 1863

Dear Cousin Unity

The Case of Remembering Memory
[taken from the helpsheet on "Transcriptional Work"]

As noted above, researching the context of an unclear passage in a manuscript can often help one determine the content of the passage.

Example: John and James Booker Collection. Letter to Chloe Unity Blair from John Booker, December 22, 1863, page 3. UVa Special Collections: MSS 11237.

[Image: John Booker letter excerpt of Memory Inman]

Upon initial reading, the above passage was difficult to transcribe. Our first attempt yielded:

I exspect thare will be a <lb>
weding near you in the christmas Memory <lb>
I <unclear>must</unclear> start home in the morning on furlow<lb>

The proximity of "christmas" and "Memory" and the lack of any punctuation between them led us to believe that the two words went together. However, it was difficult to make sense of the following sentence, and what we had rationalized as the verb "must" looked more like "man." A little research cleared up the confusing words. First of all, John Booker's military service records did not indicate that he received a furlough in December of 1863. Then, in consulting the regimental roster for the 38th Virginia Infantry, we discovered that a soldier named Memory Inman had enlisted in the 38th, Company D, along with John and James Booker. The passage should thus be tagged:

I <orig reg="expect">exspect</orig>
<orig reg="there">thare</orig> will be a <lb>
<orig reg="wedding">weding</orig> near you in the
<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb>
Inman</name> starts home in the morning on
<orig reg="furlough">furlow</orig><lb>

When put through the TEI filter, the passage will appear as follows:

Original version
I exspect thare will be a
weding near you in the christmas Memory
Inman starts home in the morning on furlow


Modernized version
I expect there will be a
wedding near you in the Christmas. Memory
Inman starts home in the morning on furlough
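
How one set of tags yields both readings can be sketched in a few lines of Python. This is not the Center's TEI filter, whose implementation is not described here; it is a minimal illustration that assumes only the conventions used in the examples above: <orig reg="..."> holds the original spelling as content and the regularized form in reg, <abbr expan="..."> holds the abbreviation as content and the expansion in expan, and <lb> marks a line break. The function name render and the constant PASSAGE are hypothetical, and nested <orig> or <abbr> elements are not handled.

```python
import re

# <orig reg="...">...</orig>: original spelling as content, regularized form in reg.
ORIG = re.compile(r'<orig\s+reg="([^"]*)">(.*?)</orig>', re.DOTALL)
# <abbr expan="...">...</abbr>: abbreviation as content, expansion in expan.
ABBR = re.compile(r'<abbr\s+expan="([^"]*)">(.*?)</abbr>', re.DOTALL)
LINE_BREAK = re.compile(r"\s*<lb>\s*")
ANY_TAG = re.compile(r"<[^<>]+>")

# The Memory Inman passage as tagged above.
PASSAGE = """I <orig reg="expect">exspect</orig>
<orig reg="there">thare</orig> will be a <lb>
<orig reg="wedding">weding</orig> near you in the
<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb>
Inman</name> starts home in the morning on
<orig reg="furlough">furlow</orig><lb>
"""


def render(markup, modernized):
    """Return plain text for the original or the modernized reading."""
    group = 1 if modernized else 2  # attribute value vs. element content
    text = markup
    for pattern in (ORIG, ABBR):
        text = pattern.sub(lambda m: m.group(group).strip(), text)
    text = " ".join(text.split())      # line ends in the source file are not significant
    text = LINE_BREAK.sub("\n", text)  # only <lb> produces a line break
    text = ANY_TAG.sub("", text)       # drop remaining tags such as <name>
    lines = [" ".join(line.split()) for line in text.split("\n")]
    return "\n".join(lines).strip()


if __name__ == "__main__":
    print(render(PASSAGE, modernized=False))
    print()
    print(render(PASSAGE, modernized=True))
```

Called with modernized=False, the sketch keeps "exspect", "thare", "weding", "christmas", and "furlow"; with modernized=True it substitutes the reg values, so the period supplied in reg="Christmas." appears only in the modernized reading.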

The Case of the Multiple Correspondents

Several of the letters that we've edited were written by more than one correspondent. See, for instance, James and John Booker's letter of August 3, 1862. Initially, we were not sure how to handle this phenomenon: should we treat John Booker's additions as a postscript, or as a separate textual division?

In editing this letter, we decided to make use of two numbered divisions and to include a note about the long postscript. The tagging is as follows:

<closer>
<signed>James Booker</signed><lb>
<seg type="recipient">to <name>Miss C. U. Blair</name></seg>
</closer>
</div1>
<div1 type="letter">
<pb n="3">
<opener>
<seg>
<figure entity="F62AU3P3">
<figDesc>Third page of manuscript Civil War letter from James and John Booker to their cousin Chloe Unity Blair, dated August 3, 1862.</figDesc> </figure>
</seg>
<dateline>
<date n="1862-08-03">
Sunday <orig reg="evening">eavning</orig>
August the 3 <hi rend="supralinear">1862</hi></date>
</dateline>
<salute>Dear Cousin</salute>
</opener>
<p>
I write you a few lines<lb>

Tagging Envelopes

Letters are often accompanied by envelopes. Although no TEI standards for tagging envelopes exist, we have decided to include them in the front matter, on the assumption that an envelope is not part of the letter proper but is what a reader would probably have encountered first. We mark relevant information such as <name> and <date>, and we include digital images of the envelope so that users can see such features as postmarks, sealing wax, and so forth. Consider the following example, taken from the Liberia Letters: William Douglass to Dr. James H. Minor, 1857 February 5.

<front>
<pb>
<div2 type="envelope">
<head>Envelope</head>
<p> <figure entity="page image name goes here"></figure>
29 24<lb>
<name type="person">Dr. James Minor</name><lb>
<name type="place">Cobham Depot Albemarle <lb> <abbr expan="County">Co.</abbr><lb>
Virginia <abbr expan="United States">States</abbr></name><lb>
Via <name type="place">England</name>
</p>
</div2>
</front>

This tagging would produce the following text in the modernized version:

29 24
Dr. James Minor
Cobham Depot Albemarle
County
Virginia United States
Via England