
Quick Start Guide for New Etexters


III.

Tagging Your Text in TEI

TEI stands for Text Encoding Initiative, whose guidelines define the specific rules of XML encoding used at Etext. All TEI tags used in encoding texts are defined in the Document Type Definition, or .dtd file. For more information on the major TEI text file divisions and other TEI topics, go to the Etext Homepage, click on Standards, and under TEI on the page that appears, click on The Electronic Text Center Introduction to TEI and Guide to Document Preparation.

A.

Basic Steps in Preparing the Text File

  1. Getting a physical copy of the text.
  2. Determining the text ID for the book being prepared.
  3. Determining basic divisions of text.
  4. Using search & replace commands and macros to enter tags.
  5. Inserting major division tags.
  6. Inserting tags for images.
  7. Inserting tags for front & back matter.

Where the phrases text file or text file name are used here, they refer to the text(s) located in your working directory, which often have a .txt extension. For example, babsubdeb.txt was the text file name for the file containing the uncoded text of the book Bab: A Sub-Deb, by Mary Roberts Rinehart. The text file name is different from the unique text ID that you will give to the text, which identifies the text once it has been fully prepared (with TEI encoding and header, images, etc.). The unique text ID is always made from part of the author's name and part of a distinctive word from the title of the book, in this case RinSubd.

  1. Getting a physical copy of the text

    1. Open the text file in the jove editor, in your working directory. To do this, type jove filename (e.g. jove babsubdeb.txt).
    2. Get the basic information about the specific edition from which the text was scanned: author, date of publication, publisher, etc.
    3. Launch an internet browser such as Netscape or Internet Explorer. The default home page is the Electronic Text Center Home Page; in its upper left-hand corner is a link to the UVA Library Home Page and the Virgo catalogue.
    4. Search Virgo for your text. If the library does not have it, click on Reference Sources (in the left-hand column) and then Library Catalogs & Other Book-Finders in the next window. Select WorldCat (VIRGO).
    5. Using WorldCat, find the text that most closely matches the information from your text file, and request the item from ILL, using the Request Item button. Print the information for your records.

  2. Determining the Text ID for the book being prepared

    The text file name (with the .txt extension) is different from the unique text ID you will give to the text you are preparing. While the text file name follows no particular convention, the unique text ID follows an Etext convention made up of two parts and a capitalization rule:

    • The first three (3) letters of the author's last name, and
    • The first four (4) letters of a distinctive word from the book's title.
    • The first letter of each part (the author's name and the word from the book title) is capitalized.

    Some examples are:

    Author and title                                        Text ID
    Mark Twain, Huck Finn                                   TwaFinn
    Jane Austen, Emma                                       AusEmma

    NB: for multi-volume works, the convention is to include the volume number as the first character of the second part:

    Henry S. Williams, A History of Science, Volume II      Wil2Sci

    NB: before assigning a unique text ID to your text, be sure it does not duplicate the text ID of another text already in our collection. To do this, from the Etext Home Page, click on Collections, English, On-Line Holdings, The Modern English Collection and then on the first letter of your author's last name. When you have found the listing of the author's works, first make sure we do not already have the text you have been assigned. Next, click on any title which might have a similar text ID.

    For example, imagine you have been given a text file of Fyodor Dostoyevsky’s Notes From A Dead House to mark up. You click on "D" on the Modern English Collection page, and scroll down to Dostoyevsky. You see we have two books by him in our collection, including Notes From The Underground.

    Click on the link for Notes From The Underground. This will take you to the page showing the major divisions of the text. Next, look at the URL address in the browser: http://etext.lib.virginia.edu/toc/modeng/public/DosNote.html

    The section of the URL reading DosNote indicates the unique text ID assigned to this work, and you will know not to use that as the text ID for Notes From A Dead House. When you open your text file, the text ID tag should be the first item; the tag is: <text id="XxxYyyy">
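
    The naming convention described in this step can be sketched in a few lines of Python. This is only an illustration and not part of the Etext toolset: the function name make_text_id and its parameters are hypothetical, and the handling of volume numbers (the volume digit takes the place of one of the four title letters) is an assumption inferred from the Wil2Sci example.

```python
import re


def make_text_id(author_last_name, title_word, volume=None):
    """Build a 7-character text ID such as TwaFinn or Wil2Sci.

    Hypothetical helper: the rules follow the convention described in the
    guide; the volume handling is inferred from the Wil2Sci example.
    """
    # Keep letters only, so that a word like "Sub-Deb" loses its hyphen.
    author = re.sub(r"[^A-Za-z]", "", author_last_name)
    word = re.sub(r"[^A-Za-z]", "", title_word)

    # capitalize() upper-cases the first letter and lower-cases the rest.
    author_part = author[:3].capitalize()
    if volume is None:
        title_part = word[:4].capitalize()
    else:
        # Assumption: a single-digit volume number replaces one title letter.
        title_part = str(volume) + word[:3].capitalize()
    return author_part + title_part


print(make_text_id("Twain", "Finn"))
print(make_text_id("Austen", "Emma"))
print(make_text_id("Williams", "Science", volume=2))
```

    The sketch only builds a candidate ID. It does not check for duplicates, so the collection listing still has to be consulted as described above.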

  3. Determining basic divisions of the text

    <TEI.2 id="Wil2Sci">

          <teiHeader>
          </teiHeader>

          <text id="Wil2Sci">

               <body>

                    <div1 type="chapter" n="1"> <head>Chapter 1</head>
                    </div1>
                    <div1 type="chapter" n="2"> <head>Chapter 2</head>

                         <div2 type="section"> <head>ASTRONOMICAL SCIENCE</head>
                         </div2>

                         <div2 type="section"> <head>FOOTNOTES</head>
                         </div2>

                    </div1>

                    <div1 type="chapter" n="3"> <head>Chapter 3</head>
                    </div1>

               </body>

          </text>

    </TEI.2>


    For more information, go to The Electronic Text Center Introduction to TEI and Guide to Document Preparation web page, and under A Practical Introduction to the TEI Tag Set, click on The Major Structural Divisions. The diagram above is based on the one there.

    A few rules of thumb about divs:

    • Although you should understand what the basic divisions of your text will be at this point, you will probably want to use search & replace and macros before actually tagging them.

    • All tags, except "empty tags" such as line break <lb/> or page break <pb/>, take a corresponding closing tag: for example, the opening tag <div1> requires a closing tag </div1>.

    • Tags are hierarchical: within the <body></body> tags, a <div1 type="chapter"> must be closed with </div1> before the body can be closed with </body>. That is,

            <body>
                 <div1 type="chapter">
                 </div1>
            </body>


      is correct;

            <body>
                 <div1 type="chapter">
            </body>
                 </div1>


      is not.

    • Every <div> tag takes a <head></head>. Even if there is no text to be inserted in the <head></head>, these tags must be entered.

    • Every opening <div> tag requires that the type be specified, for example <div1 type="chapter">.

    • The lowest-numbered (outermost) level of division is <div1>; it used to be <div0>.

    • If you use a <div2> somewhere within <div1> tags, everything within the <div1> tags must also be enclosed within <div2> tags. For example, if you have a long poem cited within a <div1> chapter, and want to give the poem its own <div2>, you must designate all the preceding and following text within that <div1> chapter as <div2 type="section">.
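
    Several of these rules of thumb can be checked mechanically. The following Python sketch uses the standard library XML parser; the function names is_well_formed and div_problems are hypothetical, and the checks are a simplified reading of the rules in this list, not an Etext validation tool. It assumes a bare fragment without entity references (an undeclared entity such as &mdash; would be rejected by the parser), and it would also flag an empty tag such as <pb/> placed between nested divs.

```python
import re
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if every tag is closed, and closed in the right order."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def is_div(tag):
    """True for numbered division tags such as div1 or div2."""
    return re.fullmatch(r"div\d", tag) is not None


def div_problems(markup):
    """List violations of the div rules of thumb in well-formed markup."""
    problems = []
    for el in ET.fromstring(markup).iter():
        if not is_div(el.tag):
            continue
        if "type" not in el.attrib:
            problems.append(f"{el.tag}: missing type attribute")
        if el.find("head") is None:
            problems.append(f"{el.tag}: missing <head>")
        # If any child is a nested div, every child except <head> must be one.
        children = [child.tag for child in el if child.tag != "head"]
        if any(is_div(t) for t in children) and not all(is_div(t) for t in children):
            problems.append(f"{el.tag}: content outside nested divs")
    return problems


correct = '<body><div1 type="chapter"><head>Chapter 1</head></div1></body>'
incorrect = '<body><div1 type="chapter"></body></div1>'
print(is_well_formed(correct), is_well_formed(incorrect))
print(div_problems('<body><div1><p>text</p></div1></body>'))
```

    The two sample strings mirror the correct and incorrect nesting shown above; only the first one parses.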

  4. Using search & replace commands and macros to enter TEI tags

    Once you are working on a book-length text, start your tagging with the search & replace commands and macros. These are especially helpful for putting in the <p></p> tags that enclose every paragraph, and for removing "unambiguous end-of-line hyphens." See "Useful Commands in the Jove Editor" below (STEP III-B).

    NB: If at any point you replace something you shouldn't have, immediately close your document without saving, reopen it, and try again.
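
    For readers who want to see the logic of these two search & replace tasks outside the editor, here is a minimal Python sketch. It is not the jove procedure itself: the function names wrap_paragraphs and eol_hyphen_lines are hypothetical, and the sketch assumes that paragraphs in the text file are separated by blank lines.

```python
import re


def wrap_paragraphs(text):
    """Enclose each blank-line-separated block of text in <p></p> tags."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(f"<p>{block.strip()}</p>" for block in blocks if block.strip())


def eol_hyphen_lines(text):
    """Return the 1-based numbers of lines that end in a hyphen."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.rstrip().endswith("-")
    ]


sample = "She says she has come for a busi-\nness letter.\n\nIs there a letter?"
print(wrap_paragraphs(sample))
print(eol_hyphen_lines(sample))
```

    The second function only reports where end-of-line hyphens occur. Deciding whether a given hyphen is unambiguous, and so safe to remove, is still left to the person doing the tagging.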

  5. Inserting major division tags

    See STEP III-A #3, above.

  6. Inserting special tags for images

    Images are identified by tags within the text. Insert the image names in the text file before scanning the images, and MAKE A LIST OF THE IMAGE NAMES, since you will need to have them at two subsequent points. The naming convention for images is based on the unique text ID. It should be an 8-character ID (instead of 7, as for text IDs), consisting of:

    • the first 3 letters of the author's name (the same as in the unique text ID)

    • plus the first 2-4 letters of a distinctive word from the title (as many as possible from the four used in the unique text ID), and

    • either the page number where the image occurs in the text, or

    • a special convention name for images of the cover, spine, and title page of the book.

    If we use the text ID for Dostoyevsky, Notes from the Underground, DosNote, we have:

    Description                              Notation
    cover (where there is no back cover)     DosNocov
    front cover (if back cover exists)       DosNofco
    back cover                               DosNobco
    image of book spine                      DosNospi
    image of title page                      DosNotit
    image of frontispiece                    DosNofro
    image found on p. 113                    DosNo113

    After determining the names for your images, write them on the back of your Virgo record. Basic tagging for images in the body is:

         <p>
              <figure entity="DosNo113">
                   <figDesc> </figDesc>
              </figure>
         </p>
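
    The image naming convention and the basic figure tagging above can be sketched in Python as follows. The names image_name, figure_tag and SPECIAL_SUFFIXES are hypothetical. The truncation rule is an assumption: the title letters are cut back so that the name is exactly 8 characters, which matches DosNo113 and RinSub68, and suffixes longer than three characters are rejected because they would leave fewer than two title letters.

```python
from xml.sax.saxutils import escape

# Special suffixes for cover, spine, title page and frontispiece images.
SPECIAL_SUFFIXES = {"cov", "fco", "bco", "spi", "tit", "fro"}


def image_name(text_id, suffix):
    """Build an 8-character image name from a 7-character text ID.

    suffix is a page number or one of the special suffixes.
    """
    suffix = str(suffix)
    if not suffix.isdigit() and suffix not in SPECIAL_SUFFIXES:
        raise ValueError(f"unknown image suffix: {suffix!r}")
    if not 1 <= len(suffix) <= 3:
        raise ValueError("suffix must be 1 to 3 characters long")
    # Keep as many leading characters of the text ID as will fit.
    return text_id[: 8 - len(suffix)] + suffix


def figure_tag(name, description):
    """Return the basic <p><figure> tagging for an image in the body."""
    return (
        "<p>\n"
        f'     <figure entity="{name}">\n'
        f"          <figDesc>{escape(description)}</figDesc>\n"
        "     </figure>\n"
        "</p>"
    )


print(image_name("DosNote", 113))
print(image_name("DosNote", "cov"))
print(figure_tag(image_name("RinSubd", 68), "A man leans over a girl at a desk."))
```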


    • All images must include the figure description <figDesc> tag. This is for users browsing in text-only mode. You should include a brief description of the image in these tags.

    • All images must be "declared" near the beginning of the text, but do this only after basic tagging (see STEP VII).

    • All images must be bracketed by <p></p> tags.

    • Images for front matter (cover, spine, frontispiece, etc.) have special tag sets, including some that do not take the <p> tag; refer to the Etext Standards page and use the examples there as templates.

    • Images must (almost) always be bracketed by <p> tags, but the <p> tags do not have to be immediately adjacent to the <figure> tag. So you can also place a <figure> inside a paragraph of text:

            <p> "Well, you don't, as a matter of fact.
            Suppose you take my word for that, and I agree
            to believe what you say about the wrong
            apartment, Even then it's rather

                 <figure entity="RinSub68">
                 <head>"NOW," HE SAID, "I WISH YOU WOULD
                 TELL ME SOME GOOD REASON WHY I SHOULD NOT
                 HAND YOU OVER TO THE POLICE."</head>
                 <figDesc>An imposing man leans over a
                 frightened girl, sitting at a
                 desk.</figDesc>
                 </figure>

            <pb n="69"/>

            unusual. I find a pale and determined looking
            young lady going through my desk in a business-
            like manner. She says she has come for a
            Letter. Now the question is, is there a
            Letter? If so, what Letter?" </p>

      Use this tagging when an image comes in the middle of a paragraph.

  7. Inserting special tags for front & back matter

    For templates/examples on tagging, go to The Electronic Text Center Introduction to TEI pages on Front Matter and Back Matter.

B.

Useful Commands in the Jove Editor

COMMAND                 PURPOSE
CTRL x CTRL s           Save changes to a document
CTRL x CTRL c           Leave the jove editor & return to UNIX (without saving)
ESC SHIFT <             Move to the beginning of the document
ESC SHIFT >             Move to the end of the document
ESC v                   Move cursor a text screen up
CTRL v                  Move cursor a text screen down
CTRL e                  Move to the end of a line
CTRL a                  Move to the beginning of a line
ESC f                   Move cursor forward one word-group
ESC b                   Move cursor backward one word-group
CTRL s characters       Search down for characters
CTRL s characters$      Search for characters occurring at the end of a line. Useful for removing unambiguous hyphens at the end of lines: CTRL s -$
CTRL s ^characters      Search for characters at the beginning of a line
^$                      Indicates an empty line. You can, for example, search for empty lines and replace them with </p><p> or other tags
ESC r characters        Reverse search for characters
CTRL k                  "Kill": deletes text from the cursor point until the end of the line
CTRL y                  "Yank": re-inserts text cut by consecutive kill commands
CTRL g                  Aborts command (before ENTER)
ESC CTRL e TextToReplace ENTER TextToInsert ENTER
                        Universal search and replace (will not prompt to confirm at each occurrence); use to automate tagging
ESC q TextToReplace ENTER TextToInsert ENTER
                        Search and replace; at each occurrence, will ask for confirmation; type Y or N
CTRL SHIFT @            Sets the mark at the point where you begin marking text. Use CTRL v or another method of moving through the text to arrive at the point where you wish the cut to stop; CTRL w will then "wipe," or cut, the marked section
CTRL y                  Inserts "wiped" or "killed" text back into the file where the cursor is located
CTRL x CTRL i textfile  Inserts textfile into the current file where the cursor is located

C.

Common TEI Tags

TEI TAG                              PURPOSE
<p></p>                              Paragraph; the default setting includes an initial indent.
<hi></hi>                            Italics is the default rendering for this tag, but it may be changed with the rend attribute, e.g.:
<hi rend="bold"></hi>                Changes the rendering from the default (italics) to bold.
<pb/>                                Page break. This is an empty tag (i.e. no closing tag is necessary).
<pb n="3"/>                          The <pb/> tag can also record the page number, using the n attribute.
<note target="n1">[1]</note>         Tags the end/footnote marker in the body of the text.
<note id="n1">[1] TextOfNote</note>  Tags the end/footnote information that appears at the end of the chapter, book, etc.
<list>                               Use to render lists.
<item></item>
</list>
<list type="ordered">                Use to render numbered lists.
<item n="1"></item>
</list>
<lg> </lg>                           Line group; used for rendering poetry. <l></l> marks the lines within line groups.
<lb/>                                Line break. This is an empty tag and thus contains the final / .
<q> </q>                             Quote, for block quote citations. Use with caution.
<foreign lang="fre"> </foreign>      Indicates text is in French (or another language, e.g. "ger", "ita", "lat").
                                     For other language codes, go to the Standards page, and under Special Characters and Language Codes, click on ISO 639 Language Codes.
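
Because each footnote marker (note with a target attribute) has to be paired with the note text (note with a matching id attribute), a quick cross-check can catch markers that point nowhere. The Python sketch below is one possible way to do this; the function name unmatched_notes is hypothetical, and the regular expressions assume that attribute values are enclosed in straight double quotes.

```python
import re


def unmatched_notes(markup):
    """Compare footnote markers with footnote texts.

    Returns two sorted lists: targets with no matching id, and ids that
    no marker points to.
    """
    targets = set(re.findall(r'<note\s+target="([^"]+)"', markup))
    ids = set(re.findall(r'<note\s+id="([^"]+)"', markup))
    return sorted(targets - ids), sorted(ids - targets)


sample = (
    '<p>First claim<note target="n1">[1]</note> and second '
    'claim<note target="n2">[2]</note>.</p>'
    '<note id="n1">[1] TextOfNote</note>'
)
print(unmatched_notes(sample))
```

In the sample, the marker n2 has no note text, so it is reported in the first list.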

D.

Common ISO Tags

In addition to the TEI tag set, there are other tag sets regularly used in Etext documents to describe specific characters. These are the ISO character tags, used for specific characters not represented by the basic keyboard set. As with TEI, there are two elements to the tag set: 1) declarations, which occur at the top of a file, and 2) the tags for specific characters, within the file itself. Just as a file encoded in TEI has to be declared as such at the top of the file, so too the use of ISO characters in a file must be declared. For more on the declarations that go at the top of the file, see Step VII: Concatenate / Final Tagging, below. The discussion here is mainly limited to the tags used within the file to describe specific characters.

For convenience, the TEI header includes the declarations ISOlat1, ISOlat2, ISOnum, ISOpub and ISOtech – the ISO character sets used most commonly in tagging files. NB—if your text contains non-English languages, you will need to declare them in the TEI Header (see below). For languages rendered in non-Roman characters such as Greek, you will need to make special ISO declarations after you concatenate the TEI header onto your text file; see also VII. Concatenate / Final Tagging, #10.

All ISO tags begin with & and end with ; . For general references on specific tags, go to the Standards page, and click on ISO Special Characters.

TAG          REPRESENTATION
&mdash;      em dash
&lsquo;      left single quotation mark
&rsquo;      right single quotation mark
&ldquo;      left double quotation mark
&rdquo;      right double quotation mark
&aelig;      æ ligature. Any ligature will follow this form.
&eacute;     é
&egrave;     è. Similar tags work for other accented vowels.
&uuml;       ü (u with umlaut)
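
If a text file already contains the literal characters, the tags in this table can be substituted in one pass. The following Python sketch shows the idea; the names ENTITY_FOR_CHAR and to_entities are hypothetical, the mapping covers only the characters listed in this table, and any other character is passed through unchanged.

```python
# Map each literal character to the ISO tag given in the table.
ENTITY_FOR_CHAR = {
    "\u2014": "&mdash;",   # em dash
    "\u2018": "&lsquo;",   # left single quotation mark
    "\u2019": "&rsquo;",   # right single quotation mark
    "\u201c": "&ldquo;",   # left double quotation mark
    "\u201d": "&rdquo;",   # right double quotation mark
    "\u00e6": "&aelig;",   # ae ligature
    "\u00e9": "&eacute;",  # e with acute accent
    "\u00e8": "&egrave;",  # e with grave accent
    "\u00fc": "&uuml;",    # u with umlaut
}


def to_entities(text):
    """Replace each character found in the table with its ISO tag."""
    return "".join(ENTITY_FOR_CHAR.get(ch, ch) for ch in text)


print(to_entities("\u201cCaf\u00e9\u201d"))
```

Characters from sets that are not declared in the TEI header would need additional entries and the corresponding declarations.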

E.

Macros (in Jove)

A macro is a way of accomplishing several commands using only one command in JOVE. For an advanced tutorial on macros, see Creating and Saving Macros in JOVE by Johnnie Wilcox.

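To define a macro (so it can be used more than once):

CTRL x SHIFT (             Begins the macro definition
CommandsForMacroToMimic    The commands the macro should repeat
CTRL x SHIFT )             Concludes the macro definition

To execute the macro:

CTRL x e                   Executes macro once
ESC number CTRL x e        Executes macro number times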

So, let’s define a basic macro:

1. CTRL x               Starts macro (or other command)
2. SHIFT (              Command line reads, “Defining…”
3. CTRL s ^characters   CTRL s searches for characters at the beginning of a line (^ indicates the beginning of a line).
4. ENTER                This will move your cursor from the command line to the first instance of a line beginning with characters in the file.
5. CTRL k               Kills (deletes) all text from the cursor to the end of the line.
6. CTRL x SHIFT )       Concludes macro definition.
7. CTRL x e             Executes macro once. It is usually a good idea to execute your macros individually several times, to make sure they don’t have unintended results, before executing them multiple times at once with ESC number CTRL x e .




By Andrew Rouner and Matthew Gibson
Revised by Cindy Speer, 2004