Guidelines for SGML Text Mark-up at the Electronic Text Center

David Seaman, Carolyn Fay and Lisa Spiro, Electronic Text Center, University of Virginia

Transcriptional Work

The Electronic Text Center collection includes a growing number of manuscripts--letters, diaries and other documents--most of which belong to the University of Virginia Library's Special Collections. These texts are processed either by Etext staff or by participants in the Rare Book School at UVa.

Our goal is to provide electronic manuscripts that are not only attractive and easy to read, but also accurate and useful to scholars, teachers and students, whose use of the documents may differ significantly. For example, American Civil War scholars may consult our collection of Civil War letters for uses of specific words or for particular rhetorical strategies or styles, whereas high school students may read the same letters for thematic projects on religion or family in the Civil War. To allow a variety of users to view and search the texts in different ways, we process both the images and text of an electronic manuscript using the following procedures:

See also Lisa Spiro and Carolyn Fay's Procedures for Transcribing and Tagging Manuscripts


So that users can experience the full flavor of the manuscript, our electronic editions include color digital images of the manuscript pages, scanned by the UVa Library's Special Collections, usually in 24-bit color at 400 dpi. These images appear in the electronic text as in-line gifs that are linked to larger jpeg versions. When possible, images of the entire leaf, both verso and recto, are included, as well as images of the individual pages in order. In addition, we often offer each image in a range of versions: "small," "medium" and "large." The small images load most quickly, while the large images may be easier to read but take longer to load.

Example: The John and James Booker Civil War Letters


Our SGML-encoded electronic manuscripts use tags which allow readers not only to search for specific key fields (dates, names, places), but also to view different versions of the same text. Using the <orig> tag, we can record both period and modernized spelling, capitalization, and punctuation. A modern and original version can then be generated "on-the-fly" from the same SGML transcription, accommodating users who desire to read the documents exactly as written as well as users who prefer a modernized text.
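
The idea of generating two versions from one transcription can be sketched in a few lines of Python. This is only an illustration, not the Center's actual filter: the function name render and the regular expression are our own assumptions, and the sketch assumes that <orig> elements are never nested inside one another.

```python
import re

# Matches <orig reg="...">...</orig>. DOTALL lets the element content
# span a line break, as it does when a word is split across two lines.
ORIG = re.compile(r'<orig\s+reg\s*=\s*"([^"]*)"\s*>(.*?)</orig>', re.DOTALL)

def render(sgml, modern):
    """Return the modernized reading (the reg attribute) when modern is True,
    otherwise the original reading (the element content)."""
    group = 1 if modern else 2
    return ORIG.sub(lambda m: m.group(group), sgml)

sample = ('I <orig reg="expect">exspect</orig> '
          '<orig reg="there">thare</orig> will be a '
          '<orig reg="wedding">weding</orig>')

print(render(sample, modern=False))  # I exspect thare will be a weding
print(render(sample, modern=True))   # I expect there will be a wedding
```

Because both readings live in the same element, the transcription never has to be duplicated; only the rendering step changes.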

Core TEI Tags for Transcription

Structural Elements:

Marking Data in the Text

Recording Corrections, Regularizations, Abbreviations, Omissions, Additions and Editorial Changes

Notes on Transcription

In order to ensure that our transcriptions are as accurate as possible, the electronic text is checked several times against the best digital image we have of the manuscript. Spelling, grammar, lineation and hyphenation are recorded exactly as they appear in the manuscript. We also note the content and location of deletions and additions; one could also mark non-textual features of the manuscript including watermarks, stamps, type of paper, etc.

How to Handle Difficult Words and Passages

Notes on Annotations and Research

Annotations to the electronic manuscript are made using the <note> tag, which takes the attributes "target" and "id." References to people, places and events may be annotated, as well as any physical features of the manuscript that would not be otherwise apparent in the electronic text.

Example: Angelica Schuyler Church Papers. Letter to Angelica Schuyler Church from Alexander Hamilton, November 8, 1789. UVa Special Collections MSS 11245.

The Baron little Phillip<note target="n4">4</note>
and myself, with her consent, walked down<lb />
to the Battery, where with aching hearts and anxious eyes we<lb />
saw your vessel,

<note id="n4">[4] Philip Hamilton (1782-1801) was the eldest son of Alexander and Elizabeth Hamilton.</note>
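
Since each note reference points to its note through a matching "target" and "id," the pairing can be checked mechanically. The following Python sketch is one possible way to do so; the function name unresolved_note_targets is hypothetical, and the check assumes the attributes are written exactly as in the example above.

```python
import re

def unresolved_note_targets(sgml):
    """Return the set of note targets that have no <note id="..."> to point to."""
    targets = set(re.findall(r'<note\s+target\s*=\s*"([^"]+)"', sgml))
    ids = set(re.findall(r'<note\s+id\s*=\s*"([^"]+)"', sgml))
    return targets - ids

sample = '''The Baron little Phillip<note target="n4">4</note>
and myself, with her consent, walked down<lb />
<note id="n4">[4] Philip Hamilton (1782-1801) was the eldest son of Alexander and Elizabeth Hamilton.</note>'''

print(unresolved_note_targets(sample))                        # set()
print(unresolved_note_targets('<note target="n9">9</note>'))  # {'n9'}
```

An empty result means every reference in the text has a corresponding note.
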

Annotations not only situate the manuscript in context, but are also useful in clearing up transcription problems. As noted above, researching the context of an unclear passage in a manuscript can often help one determine the content of the passage.

Example: John and James Booker Collection. Letter to Chloe Unity Blair from John Booker, December 22, 1863, page 3. UVa Special Collections: MSS 11237.

[Image: excerpt from John Booker's letter mentioning Memory Inman]

Upon initial reading, the above passage was difficult to transcribe. Our first attempt yielded:

I exspect thare will be a <lb />
weding near you in the christmas Memory <lb />
I <unclear>must</unclear> start home in the morning on furlow<lb />

The proximity of "christmas" and "Memory" and the lack of any punctuation between them led us to believe that the two words went together. However, it was difficult to make sense of the following sentence, and what we had rationalized as the verb "must" looked more like "man." A little research cleared up these mysteries. First, John Booker's military service records did not indicate that he received a furlough in December of 1863. Then, in consulting the regimental roster for the 38th Virginia Infantry, we discovered that a soldier named Memory Inman had enlisted in Company D of the 38th along with John and James Booker. What we had read as "I must" was therefore the surname "Inman," and "Memory" belonged with it rather than with "christmas." The passage should thus be tagged:

I <orig reg="expect">exspect</orig>
<orig reg="there">thare</orig> will be a <lb />
<orig reg="wedding">weding</orig> near you in the
<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb />
Inman</name> starts home in the morning on
<orig reg="furlough">furlow</orig><lb />
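
Tagging names in this way is what makes key-field searching possible. As a rough sketch of how tagged names might be pulled out of a transcription, the Python below collects each <name> element together with its type. The function name extract_names is our own invention, and the approach (regular expressions rather than a real SGML parser) is an assumption made for brevity.

```python
import re

# Matches <name type="...">...</name>, allowing the content to span lines.
NAME = re.compile(r'<name\s+type\s*=\s*"([^"]*)"\s*>(.*?)</name>', re.DOTALL)
TAG = re.compile(r'<[^>]+>')

def extract_names(sgml):
    """Return (type, text) pairs for every <name> element, with inner tags
    such as <lb /> removed and runs of whitespace collapsed."""
    results = []
    for name_type, content in NAME.findall(sgml):
        text = TAG.sub(' ', content)   # drop <lb /> and other inner tags
        text = ' '.join(text.split())  # collapse line breaks and spaces
        results.append((name_type, text))
    return results

sample = '''<orig reg="Christmas.">christmas</orig>
<name type="person">Memory <lb />
Inman</name> starts home in the morning on'''

print(extract_names(sample))  # [('person', 'Memory Inman')]
```

Note that the line break inside the name does not prevent it from being found as "Memory Inman." Where a name contains an <orig> element, this sketch returns the original spelling rather than the regularized one.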


Sample Letter: James Booker to Chloe Unity Blair, October 8, 1861

To see how one might use the core transcriptional tags to mark up a manuscript, consider the following example, taken from the Booker Collection. To show each stage of the transcription and tagging process, we provide page images of the letter, a faithful transcription of the manuscript in which line breaks are preserved, the complete tagging for the letter, and commentary on our tagging decisions.

Page Images

Page 1 Page 2


Manassas junction
Oct. 8th 1861

Dear Cousin

I write afew lines this
morning to inform you that I am well
at this time and hopeing that it
may find you all injoying the same
blesing, the health of our company
is better at this time than it has
bin for some time,

I have no news of intrust to write
to you, it is thought that we
will have a battle in a few days, its
reported that thay was fighting
yesterday at fawls Church I dont [know] weth
er it was so or not, one of the Dan
ville Grays was upto see us last night
he said the yankees was in four
miles of them thay are stationed at
Farfax Court House six miles a head of
us, it is thought that we will
have a verry hard battle when it
does come off, I received a letter from
Addie [add note 1] last eavning it [ [unclear: ] ] afforded me
great pleasure to hear that he was
improveing so fast,

I will ad no more at [unclear: present] so good bye

[Page 2]

write soon to your affectionate Cousin

James Booker

To Miss C. U. Blair


[1] "Addie" probably refers to Drury Addison Blair (1839-1864), the
Bookers' cousin. Blair joined Company D when it was formed in May of
1861, but was discharged due to chronic bronchitis in August of 1861
(Gregory 81). See James Booker's letter of July 14, 1861, in which "A.
Blair" includes a postscript to Chloe Unity Blair.

Tagged Version of the Letter

[TEI Header goes here]

<text id="Boo1j08">
<div1 type="letter" n="1861-10-08">
<pb n="1" />

<name type="place">Manassas
<orig reg="Junction">junction</orig></name>
<date n="1861-10-08">
<abbr expan="October">Oct.</abbr>
8<hi rend="superscript">th</hi> 1861</date>
<salute>Dear Cousin</salute>

<p>I write
<orig reg="a few">afew</orig>
lines this <lb />
morning to inform you that I am well <lb />
at this time and <orig reg="hoping">hopeing</orig> that it <lb />
may find you all <orig reg="enjoying">injoying</orig> the same <lb />
<orig reg="blessing. The">blesing, the</orig> health of our company<lb />
is better at this time than it has <lb />
<orig reg="been">bin</orig> for some <orig reg="time.">time,</orig> </p>

<p>I have no news of <orig reg="interest">intrust</orig> to write <lb />
to <orig reg="you. It">you, it</orig> is thought that we <lb />
will have a battle in a few <orig reg="days. It's">days, its</orig><lb />
reported that <orig reg="there">thay</orig> was fighting <lb />
yesterday at <name type="place"><orig reg="Falls Church.">fawls
Church</orig></name> I <orig reg="don't">dont</orig>
<add n="editor">know</add>
<orig reg="whether">weth <lb />
er</orig> it was so or <orig reg="not. One">not, one</orig>
of the <orig reg="Danville">Dan <lb />
ville</orig> Grays was <orig reg="up to">upto</orig> see us last
<orig reg="night.">night</orig> <lb />
<orig reg="He">he</orig> said the yankees was in four <lb />
miles of <orig reg="them.">them</orig>
<orig reg="They">thay</orig> are stationed at <lb />
<name type="place"><orig reg="Fairfax">Farfax</orig> Court House</name>
six miles <orig reg="ahead">a head</orig> of <lb />
<orig reg="us. It">us, it</orig> is thought that we will <lb />
have a <orig reg="very">verry</orig> hard battle when it <lb />
does come <orig reg="off.">off,</orig> I received a letter from <lb />
<name type="person">Addie</name><note target="n1">[1]</note>
last <orig reg="evening. It">eavning it</orig>
afforded me <lb />
great pleasure to hear that he was <lb />
<orig reg="improving">improveing</orig> so <orig reg="fast.">fast,</orig></p>

<salute>I will <orig reg="add">ad</orig> no more at
<unclear reason="under folded page edge">present</unclear>
so <orig reg="goodbye.">good bye</orig></salute>

<pb n="2" />
<orig reg="Write">write</orig> soon to your affectionate
<signed><name type="person">James Booker</name></signed>
<seg type="recipient">To Miss C. U. Blair</seg>
</div1>

<div1 type="notes">
<note id="n1">[1] "Addie" probably refers to Drury Addison Blair (1839-1864),
the Bookers' cousin. Blair joined Company D when it was formed in May of
1861, but was discharged due to chronic bronchitis in August of 1861 (Gregory 81).
See James Booker's letter of July 14, 1861, in which "A. Blair" includes a postscript to Chloe Unity Blair.</note>
</div1>
</text>


In any tagging project, one makes particular choices based on the goals of the project, the problems posed by the documents to be tagged, in-house procedures, and the standards of TEI Lite. With the Booker Collection, we were presented with letters written by men who had rather rudimentary writing skills; for instance, many words are misspelled, and the Bookers use commas rather than periods to separate sentences. For some users, such errors might impede comprehension, so we wanted to present a more readable version while also preserving the original features of the document. By using the <orig> tag and designing a special SGML-to-HTML filter, we were able to make two versions of the letter accessible: the original version, a transcription that retains all of the period spelling, capitalization, and punctuation, and the modernized version, a transcription with modernized spelling, capitalization, and punctuation.

In adding <orig> tags, we were fairly light-handed; we included the standard spelling of words, and standardized punctuation by replacing the sentence-ending commas with periods. In addition to regularizing spelling and punctuation, we tagged places and names with the <name> tag, and we added an informational note about a soldier mentioned in the letter. Although our tagging was quite comprehensive, we could have tagged even more information if we had decided it was important to our project; for instance, we might have marked the "Danville Grays," a company in the Confederate Army, with a tag such as <name type="company">, and we might have standardized capitalization by tagging "yankee" with <orig reg="Yankee">.