III. Tagging Your Text in TEI
TEI stands for Text Encoding Initiative, whose guidelines
define the specific rules of XML encoding used at
Etext. All TEI tags used in encoding texts are defined in
the Document Type Definition (DTD), or .dtd file. For more
information on the major TEI text file divisions and other
TEI topics, go to the Etext Homepage, click on Standards and, under
TEI on the page that appears, click on The Electronic Text Center Introduction to TEI and Guide to Document Preparation.
A. Basic Steps in Preparing the Text File
1. Getting a physical copy of the text.
2. Determining the text ID for the book being prepared.
3. Determining basic divisions of text.
4. Using search & replace commands and macros to enter tags.
5. Inserting major division tags.
6. Inserting tags for images.
7. Inserting tags for front & back matter.
Where the phrases text file or text file name are used here,
they refer to the text(s) located in your working directory,
which often have a .txt extension, for example babsubdeb.txt.
babsubdeb.txt was the text file name for the file containing
the uncoded text of the book Bab: A Sub-Deb, by Mary Roberts
Rinehart. The text file name is different from the unique
text ID that you will give to the text, which identifies the
text once it has been fully prepared (with TEI encoding and
header, images, etc.). The unique text ID is always made from
part of the author's name and part of a distinctive word
from the title of the book, in this case RinSubd.
1. Getting a physical copy of the text
- Open the text file in the jove editor, in your working
directory. To do this, type jove filename (e.g. jove
babsubdeb.txt).
- Get the basic information about the specific edition from which
the text was scanned: author, date of publication, publisher, etc.
- Launch an internet browser such as Netscape or Internet
Explorer. The default home page is the Electronic Text Center
Home Page; in its upper left-hand corner are links to the UVA
Library Home Page and the Virgo catalogue.
- Search Virgo for your text. If the library does not have it,
click on Reference Sources (in the left-hand column) and then
Library Catalogs & Other Book-Finders in the next window. Select
WorldCat (VIRGO).
- Using WorldCat, find the text that most closely matches the
information from your text file, and request the item through
Interlibrary Loan (ILL), using the Request Item button. Print the
information for your records.
2. Determining the text ID for the book being prepared
The text file name (with the .txt extension) is different
from the unique text ID you will give to the text you
are preparing. While the text file name follows no
particular convention, the unique text ID follows an
Etext convention made up of two parts:
- The first three (3) letters of the author's last name, and
- The first four (4) letters of a distinctive word from the book's title.
The first letter of each part is capitalized.
Some examples are:
| Author, Title | Text ID |
| Mark Twain, Huck Finn | TwaFinn |
| Jane Austen, Emma | AusEmma |
NB: for multi-volume works, the convention is to include
the volume number as the first character of the second
part:
| Author, Title | Text ID |
| Henry S. Williams, A History of Science, Volume II | Wil2Sci |
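The text ID convention can be sketched in a few lines of Python. This is only an illustration: the function name make_text_id and its parameters are hypothetical, you still choose the distinctive title word yourself, and the sketch assumes a single-digit volume number.

```python
from typing import Optional


def _letters(value: str) -> str:
    """Keep only alphabetic characters (drops hyphens, spaces, etc.)."""
    return "".join(ch for ch in value if ch.isalpha())


def make_text_id(author_last_name: str, title_word: str,
                 volume: Optional[int] = None) -> str:
    """Build a 7-character text ID: 3 author letters + 4 title characters."""
    author_part = _letters(author_last_name)[:3].capitalize()
    word = _letters(title_word)
    if volume is None:
        title_part = word[:4].capitalize()
    else:
        # Multi-volume works: the volume number takes the first slot
        # of the second part, leaving room for three title letters.
        title_part = f"{volume}{word[:3].capitalize()}"
    return author_part + title_part


print(make_text_id("Twain", "Finn"))
print(make_text_id("Williams", "Science", volume=2))
```

The function does not check the ID against the collection, so the duplicate check described in this section still has to be done by hand.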
NB: before assigning a unique text ID to your text, be
sure it does not duplicate the text ID of another text
already in our collection. To do this, from the Etext
Home Page, click on Collections, English, On-Line
Holdings, The Modern English Collection and then on the
first letter of your author's last name. When you have
found the listing of the author's works, first make sure
we do not already have the text you have been assigned.
Next, click on any title which might have a similar text
ID. For example, imagine you have been given a text file
of Fyodor Dostoyevsky’s Notes From A Dead House to mark
up. You click on "D" on the Modern English Collection
page, and scroll down to Dostoyevsky. You see we have
two books by him in our collection, including Notes From
The Underground.
Click on the link for Notes From The Underground. This
will take you to the page showing the major divisions of
the text. Next, look at the URL in the browser:
http://etext.lib.virginia.edu/toc/modeng/public/DosNote.html
The section of the URL reading DosNote is the unique text ID assigned to this work, so you know not to use it as the text ID for Notes From A Dead House.
In your own text file, the text ID tag should be the first item; the tag is: <text id="XxxYyyy">
3. Determining basic divisions of the text
<TEI.2 id="Wil2Sci">
  <teiHeader>
  </teiHeader>
  <text id="Wil2Sci">
    <body>
      <div1 type="chapter" n="1"> <head>Chapter 1</head>
      </div1>
      <div1 type="chapter" n="2"> <head>Chapter 2</head>
        <div2 type="section"> <head>ASTRONOMICAL SCIENCE</head>
        </div2>
        <div2 type="section"> <head>FOOTNOTES</head>
        </div2>
      </div1>
      <div1 type="chapter" n="3"> <head>Chapter 3</head>
      </div1>
    </body>
  </text>
</TEI.2>
For more information, go to The Electronic Text Center Introduction
to TEI and Guide to Document Preparation web page, and under A
Practical Introduction to the TEI Tag Set, click on The Major Structural Divisions.
The diagram above is based on the one there.
A few rules of thumb about divs:
- Although you should understand what the basic divisions of
your text will be at this point, you will probably want to run search & replace
commands and macros before actually tagging them.
- All tags, except "empty tags" such as line break <lb/> or page break <pb/>, take a
corresponding closing tag: for example, the open tag <div1> requires a closing tag </div1>.
- Tags are hierarchical: within the <body></body> tags, the <div1 type="chapter"> must be closed with </div1> before the body tag can be closed with </body>. That is,
<body>
<div1 type="chapter">
</div1>
</body>
is correct;
<body>
<div1 type="chapter">
</body>
</div1>
is not.
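If you want to confirm the nesting rule mechanically, any XML parser will reject the second form. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is hypothetical, and the check covers nesting only, not validity against the TEI DTD.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if every tag in the fragment is closed in the right order."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        # Raised for mismatched or unclosed tags, among other errors.
        return False
    return True


correct = '<body><div1 type="chapter"></div1></body>'
incorrect = '<body><div1 type="chapter"></body></div1>'

print(is_well_formed(correct))
print(is_well_formed(incorrect))
```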
- All <div> tags take a <head></head>. Even if there is no text to be inserted in the <head></head>, these tags must be entered.
- All opening <div> tags require that the type be specified, such as <div1 type="chapter"></div1>.
- The lowest-numbered (outermost) division is <div1> (formerly <div0>).
- If you use a <div2> somewhere within <div1> tags, everything within the <div1> tags must also be enclosed within <div2> tags. For example, if you have a long poem cited within a <div1> chapter, and want to give the poem its own <div2>, you must designate all the preceding and following text within that <div1> chapter as <div2 type="section">.
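The rules of thumb about <head>, type, and <div2> can be checked with a short script once a fragment is well-formed. This is a rough sketch, not an Etext tool: the function name check_divs and the wording of its messages are hypothetical, and it only looks at the three rules named in its comments.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

DIV_TAG = re.compile(r"^div\d$")  # matches div1, div2, ...


def check_divs(markup: str) -> List[str]:
    """Return a list of rule-of-thumb violations found in a well-formed fragment."""
    problems = []
    root = ET.fromstring(markup)
    for elem in root.iter():
        if not DIV_TAG.match(elem.tag):
            continue
        label = f"<{elem.tag}>"
        # Rule: every opening div tag specifies a type.
        if "type" not in elem.attrib:
            problems.append(f"{label} has no type attribute")
        # Rule: every div takes a <head>, even an empty one.
        if elem.find("head") is None:
            problems.append(f"{label} has no <head>")
        # Rule: once a <div1> holds a <div2>, all its other content
        # (apart from its own <head>) must sit inside some <div2>.
        children = list(elem)
        if elem.tag == "div1" and any(c.tag == "div2" for c in children):
            stray = [c.tag for c in children if c.tag not in ("head", "div2")]
            text_bits = [elem.text] + [c.tail for c in children]
            loose_text = any((t or "").strip() for t in text_bits)
            if stray or loose_text:
                problems.append(f"{label} mixes <div2> with content outside any <div2>")
    return problems


sample = (
    '<body><div1 type="chapter"><head>Chapter 2</head>'
    '<p>Intro text</p>'
    '<div2><lg><l>A line of verse</l></lg></div2>'
    '</div1></body>'
)
for problem in check_divs(sample):
    print(problem)
```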
4. Using search & replace commands and macros to enter TEI tags
Once you are working on a book-length text, start your
tagging with the search & replace command and macros.
These are especially helpful for putting in the paragraph
<p></p> tags that enclose every paragraph, and for removing
"unambiguous end-of-line hyphens." See "Useful Commands in
the Jove Editor" below (STEP III-B).
NB: If at any point you replace something you shouldn't
have, immediately close your document without saving,
re-open it and try again.
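For readers who want to see what these two replacements amount to, here is a minimal Python sketch of the same idea outside the editor. It is not the Jove procedure itself: the function names are hypothetical, it assumes empty lines separate paragraphs, and it treats every end-of-line hyphen between two word characters as unambiguous, which you would need to verify in a real text.

```python
import re


def join_eol_hyphens(text: str) -> str:
    """Rejoin words split by a hyphen at the end of a line.

    Assumes every such hyphen is unambiguous (a line-break artifact).
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def wrap_paragraphs(text: str) -> str:
    """Treat runs of empty lines as paragraph breaks and wrap each paragraph in <p></p>."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)


sample = "It was a dark and stor-\nmy night.\n\nThe rain fell."
print(wrap_paragraphs(join_eol_hyphens(sample)))
```

As with the editor commands, work on a copy of the file so that a bad replacement can be discarded.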
5. Inserting major division tags
See STEP III-A #3, above.
6. Inserting special tags for images
Images are identified by tags within the text. Insert
the image names in the text file before scanning the
images, and MAKE A LIST OF THE IMAGE NAMES, since you
will need to have them at two subsequent points. The
naming convention for images is based on the unique text
ID. It should be an 8-character ID (instead of 7, as
for text IDs), consisting of:
- the first 3 letters of the author's name (the same as in the unique text ID)
- plus the first 2-4 letters of a distinctive word from the title (as many as possible from the four used in the unique text ID), and
- either the page number where the image occurs in the text, or
- a special convention name for images of the cover, spine, and title page of the book.
If we use the text ID for Dostoyevsky, Notes from the Underground, DosNote, we have:
| Description | Notation |
| cover (where there is no back cover) | DosNocov |
| front cover (if back cover exists) | DosNofco |
| back cover | DosNobco |
| image of book spine | DosNospi |
| image of title page | DosNotit |
| image of frontispiece | DosNofro |
| image found on p. 113 | DosNo113 |
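The image naming convention can be sketched as follows. The function name image_name and the SPECIAL_CODES dictionary keys are hypothetical; the three-letter codes come from the table above. Two behaviors are assumptions rather than stated Etext rules: short page numbers keep up to four title letters (so names with one-digit page numbers reach 8 characters), and page numbers too long to leave room for two title letters raise an error.

```python
SPECIAL_CODES = {
    "cover": "cov",
    "front cover": "fco",
    "back cover": "bco",
    "spine": "spi",
    "title page": "tit",
    "frontispiece": "fro",
}


def image_name(text_id: str, where) -> str:
    """Build an image name from a 7-character text ID.

    `where` is either a page number (int) or a key of SPECIAL_CODES.
    """
    suffix = str(where) if isinstance(where, int) else SPECIAL_CODES[where]
    author_part, title_part = text_id[:3], text_id[3:]
    # Keep as many title letters as fit in 8 characters (at most 4).
    room = 8 - len(author_part) - len(suffix)
    if room < 2:
        raise ValueError("suffix leaves fewer than 2 title letters")
    return author_part + title_part[:min(room, 4)] + suffix


print(image_name("DosNote", 113))
print(image_name("DosNote", "cover"))
print(image_name("RinSubd", 68))
```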
After determining the names for your images, write them on the back of your Virgo record. Basic tagging for images in the body is:
<p>
<figure entity="DosNo113">
<figDesc> </figDesc>
</figure>
</p>
- All images must include the figure description <figDesc> tag. This is for users browsing in text-only mode. You should include a brief description of the image in these tags.
- All images must be "declared" near the beginning of the text, but do this only after basic tagging (STEP VII).
- Images in the body of the text must be bracketed by <p></p> tags.
- Images for front matter (cover, spine, frontispiece, etc.) have special tag sets, including some that do not take the <p> tag; refer to the Etext Standards page, and use the examples there as templates.
- Although images must (almost) always be bracketed by <p> tags, the <p> tags do not have to be immediately adjacent to the <figure> tag. So you can also encode <figure> tags inside a paragraph of text:
<p> "Well, you don't, as a matter of fact.
Suppose you take my word for that, and I agree
to believe what you say about the wrong
apartment. Even then it's rather
<figure entity="RinSub68">
<head>"NOW," HE SAID, "I WISH YOU WOULD
TELL ME SOME GOOD REASON WHY I SHOULD NOT
HAND YOU OVER TO THE POLICE."</head>
<figDesc>An imposing man leans over a
frightened girl, sitting at a
desk.</figDesc>
</figure>
<pb n="69"/>
unusual. I find a pale and determined looking
young lady going through my desk in a business-
like manner. She says she has come for a
Letter. Now the question is, is there a
Letter? If so, what Letter?" </p>
Use this tagging when an image comes in the middle of a paragraph.
7. Inserting special tags for front & back matter
For templates/examples on tagging, go to The Electronic Text Center Introduction to TEI pages on Front Matter and Back Matter.
B. Useful Commands in the Jove Editor
| COMMAND | PURPOSE |
| CTRL x CTRL s | Save changes to a document |
| CTRL x CTRL c | Leave the jove editor & return to UNIX (without saving) |
| ESC SHIFT < | Move to the beginning of the document |
| ESC SHIFT > | Move to the end of the document |
| ESC v | Move cursor a text screen up |
| CTRL v | Move cursor a text screen down |
| CTRL e | Move to the end of a line |
| CTRL a | Move to the beginning of a line |
| ESC f | Moves cursor forward one word-group |
| ESC b | Moves cursor backward one word-group |
| CTRL s characters | Search down for characters |
| CTRL s characters$ | Search for characters occurring at the end of a line. Useful to remove unambiguous hyphens at the end of lines: CTRL s -$ |
| CTRL s ^characters | Search for characters at the beginning of a line |
| ^$ | Indicates an empty line. You can search for empty lines and replace them with </p><p> or other tags, for example |
| ESC r characters | Reverse search for characters |
| CTRL k | "Kill": deletes text from the cursor point until the end of the line |
| CTRL y | "Yank": re-inserts text cut from consecutive commands |
| CTRL g | Aborts command (before ENTER) |
| ESC CTRL e TextToReplace ENTER TextToInsert ENTER | Universal search and replace (will not prompt to confirm at each occurrence); use to automate tagging |
| ESC q TextToReplace ENTER TextToInsert ENTER | Search and replace; at each occurrence, will ask for confirmation; type Y or N |
| CTRL SHIFT @ | Marks the point at which you begin marking text. Use CTRL v or another method of moving through the text to arrive at the point where you wish to stop the cut; CTRL w will then "wipe," or cut, the defined section |
| CTRL y | Inserts "wiped" or "killed" text back into the file where the cursor is located |
| CTRL x CTRL i textfile | Inserts textfile into the current file where the cursor is located |
C. Common TEI Tags
| TEI TAG | PURPOSE |
| <p></p> | Paragraph; the default setting includes an initial indent. |
| <hi></hi> | Italics is the default rendering for this tag, but it may be changed by setting the rend attribute, e.g.: |
| <hi rend="bold"></hi> | Changes the rendering from the default (italics) to bold. |
| <pb/> | Page break. This is an empty tag (i.e. no closing tag is necessary). |
| <pb n="3"/> | The <pb/> tag can also record the page number in its n attribute. |
| <note target="n1">[1]</note> | Tags the end/footnote marker in the body of the text. |
| <note id="n1">[1] TextOfNote</note> | Tags the end/footnote information that appears at the end of the chapter, book, etc. |
| <list> <item></item> </list> | Use to render lists. |
| <list type="ordered"> <item n="1"></item> </list> | Use to render numbered lists. |
| <lg> </lg> | Line group; used for rendering poetry. <l></l> marks the lines within line groups. |
| <lb/> | Line break. This is an empty tag and thus ends with / . |
| <q> </q> | Quote, for block quote citations. Use with caution. |
| <foreign lang="fre"> </foreign> | Indicates text is in French (or other language). |
| "ger" "ita" "lat" etc. | For other language codes, go to the Standards page, and under Special Characters and Language Codes, click on ISO 639 Language Codes. |
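Because each note marker (target="n1") has to point at a note with the matching id, a quick cross-check can catch markers left without a note. The following Python sketch assumes the note convention shown in the table above and a well-formed fragment; the function name unmatched_note_targets is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List


def unmatched_note_targets(markup: str) -> List[str]:
    """Return note targets that have no <note id="..."> with the same value."""
    root = ET.fromstring(markup)
    notes = list(root.iter("note"))
    targets = {n.get("target") for n in notes if n.get("target")}
    ids = {n.get("id") for n in notes if n.get("id")}
    return sorted(targets - ids)


sample = (
    '<div1 type="chapter"><head>Chapter 1</head>'
    '<p>First claim<note target="n1">[1]</note> and '
    'second claim<note target="n2">[2]</note>.</p>'
    '<p><note id="n1">[1] TextOfNote</note></p>'
    '</div1>'
)
print(unmatched_note_targets(sample))
```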
D. Common ISO Tags
In addition to the TEI tag set, there are other tag sets
regularly used in Etext documents to describe specific
characters. These are the ISO character tags, used for
specific characters not represented by the basic keyboard
set. As with TEI, there are two elements to the tag set: 1)
declarations, which occur at the top of a file, and 2) the
tags for specific characters, within the file itself. Just as
a file encoded in TEI has to be declared as such at the top
of the file, so too the use of ISO characters in a file must
be declared. For more on the declarations that go at the top
of the file, see Step VII: Concatenate / Final Tagging,
below. The discussion here is mainly limited to the tags used
within the file to describe specific characters.
For convenience, the TEI header includes the declarations
ISOlat1, ISOlat2, ISOnum, ISOpub and ISOtech – the ISO
character sets used most commonly in tagging files. NB—if
your text contains non-English languages, you will need to
declare them in the TEI Header (see below). For languages
rendered in non-Roman characters such as Greek, you will need
to make special ISO declarations after you concatenate the
TEI header onto your text file; see also VII. Concatenate /
Final Tagging, #10.
All ISO tags begin with & and are concluded with ; . For
general references on specific tags, go to the Standards
page, and click on ISO Special Characters.
| TAG | REPRESENTATION |
| &mdash; | dash |
| &lsquo; | left single quotation mark |
| &rsquo; | right single quotation mark |
| &ldquo; | left double quotation mark |
| &rdquo; | right double quotation mark |
| &aelig; | æ ligature. Any ligature will follow this form. |
| &eacute; | é |
| &egrave; | è Similar tags work for other accented vowels. |
| &uuml; | ü (u with umlaut) |
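As an illustration of how these tags stand in for characters, the sketch below replaces a handful of special characters with their ISO tags. It covers only the characters in the table above; the dictionary and function names are hypothetical, and the characters are written as Unicode escapes so that the script itself stays plain ASCII.

```python
# Map each special character to its ISO character tag.
ISO_ENTITIES = {
    "\u2014": "&mdash;",   # dash
    "\u2018": "&lsquo;",   # left single quotation mark
    "\u2019": "&rsquo;",   # right single quotation mark
    "\u201c": "&ldquo;",   # left double quotation mark
    "\u201d": "&rdquo;",   # right double quotation mark
    "\u00e6": "&aelig;",   # ae ligature
    "\u00e9": "&eacute;",  # e with acute accent
    "\u00e8": "&egrave;",  # e with grave accent
    "\u00fc": "&uuml;",    # u with umlaut
}


def to_iso_entities(text: str) -> str:
    """Replace each character listed in ISO_ENTITIES with its tag; leave the rest alone."""
    return "".join(ISO_ENTITIES.get(ch, ch) for ch in text)


print(to_iso_entities("\u201cCaf\u00e9\u201d"))
```

Characters missing from the dictionary pass through unchanged, so a real text would need the dictionary extended from the ISO Special Characters reference.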
E. Macros (in Jove)
A macro is a way of accomplishing several commands using only one
command in JOVE. For an advanced tutorial on macros, see Creating and Saving
Macros in JOVE by Johnnie Wilcox.
| COMMAND | PURPOSE |
| CTRL x SHIFT ( CommandsForMacroToMimic CTRL x SHIFT ) | Defines a macro (so it can be used more than once) |
| CTRL x e | Executes macro once |
| ESC number CTRL x e | Executes macro number times |
So, let's define a basic macro:
| 1. CTRL x | Starts macro (or other command) |
| 2. SHIFT ( | Command line reads, "Defining…" |
| 3. CTRL s ^characters | ^ indicates beginning of lines with characters. CTRL s searches for characters at the beginning of a line. |
| 4. ENTER | This will move your cursor from the command line to the first instance of a line beginning with characters in the file. |
| 5. CTRL k | Kills (deletes) all text from the cursor to the end of the line |
| 6. CTRL x SHIFT ) | Concludes macro definition. |
| 7. CTRL x e | Executes macro once. It is usually a good idea to execute your macros individually several times, to make sure they don't have unintended results, before executing them multiple times at once with ESC number CTRL x e. |
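To make the effect of this macro concrete, here is a rough Python model of it, run once or several times. It is only an approximation of the editor's behavior: the function name run_macro is hypothetical, and it assumes the search leaves the cursor just after the matched characters, so that CTRL k removes the remainder of that line.

```python
from typing import List


def run_macro(lines: List[str], prefix: str, times: int = 1) -> List[str]:
    """Model of: search for the next line starting with `prefix`, then kill to end of line."""
    result = list(lines)
    row = 0  # the cursor starts at the top of the file
    for _ in range(times):
        # Search forward for the next line beginning with the prefix.
        while row < len(result) and not result[row].startswith(prefix):
            row += 1
        if row == len(result):
            break  # no further match, nothing left to do
        # Kill from just after the matched prefix to the end of the line.
        result[row] = prefix
        row += 1
    return result


lines = ["NOTE: first", "body text", "NOTE: second", "NOTE: third"]
print(run_macro(lines, "NOTE:", times=1))
print(run_macro(lines, "NOTE:", times=2))
```

Running the model once before running it many times mirrors the advice in step 7: check a single execution for unintended results first.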