
A Practical Introduction to the Tag Set

Guidelines for SGML Text Mark-up at the Electronic Text Center
David Seaman, Electronic Text Center, University of Virginia

Large Scale Divisions and Small Scale Elements

The TEI guidelines draw a distinction between two broadly defined classes of structure: the major "structural division" markers (such as a chapter) and smaller "elements" (such as paragraphs, italicized phrases, verse lines, or individual speeches in a play).



The Major Structural Divisions

This category includes the major partitions of the text. We begin with the largest: the TEI document itself, the <teiHeader>, the <text>, the <body> of the text, and the front and back matter. We then describe partitions within the body itself: volumes, chapters, sections, acts, and so on.

  • The TEI Tag and Its Two Divisions

    All of the texts we prepare share the same basic set of large-scale divisions. Each text is enclosed in its entirety by a pair of tags, <TEI.2> and </TEI.2>, that mark it as conforming to the Text Encoding Initiative rules. The <TEI.2> element, which must be given an "id" identical to the id of the <text> element, encloses two major sections: a <teiHeader> and a <text>. The <teiHeader> records information about the print source, about the creator of the electronic version, about changes we have made, and so on. (The <teiHeader> is generated by a Header Template rather than encoded by hand.) The skeleton of a text looks like this:
    <TEI.2 id="AusEmma">
    <teiHeader>
    [Source and processing information goes here]
    </teiHeader>
    <text id="AusEmma">
    [All of the material that is part of Emma goes here]
    </text>
    </TEI.2>



  • The Text Tag and Its Major Divisions

    • The Body
      Within the <text> boundaries, the work is divided into its major sections. Every text has a <body>, in which the main part of the text is found. Amongst other things, this arrangement allows one to search for items only in the <body> of the text, filtering out the text in the <teiHeader>.
      <TEI.2 id="AusEmma">
      <teiHeader>
      [Source and processing information goes here]
      </teiHeader>
      <text id="AusEmma">
      <body>
      [text goes here]
      </body>
      </text>
      </TEI.2>



    • Front and Back Matter
      Within the <text>, some texts have two further main sections alongside the <body>: front matter, tagged <front>, and back matter, tagged <back>. The former typically encloses prefatory matter such as an introduction or table of contents. The latter typically marks off appendices or indexes.
      <TEI.2 id="AusEmma">
      <teiHeader>
      [Source and processing information goes here]
      </teiHeader>
      <text id="AusEmma">
      <front>
      [preface, etc. goes here]
      </front>
      <body>
      [main body of the text goes here]
      </body>
      <back>
      [appendices, etc. goes here]
      </back>
      </text>
      </TEI.2>



    • Textual Divisions within the Body
      In the body of the text, we number our major divisions consecutively according to their place in the work's hierarchy. In our usage, the largest structural division is tagged <div1>, the next largest <div2>, then <div3>, and so on. In a novel, for example, the chapter is usually the largest structural division and is marked <div1>:
      <TEI.2 id="AusEmma">
      <teiHeader>
      [Source and processing information goes here]
      </teiHeader>

      <text id="AusEmma">
      <body>

      <div1 type="chapter" n="1">
      <head>Chapter 1</head>
      [text of Chapter 1 goes here]
      </div1>

      <div1 type="chapter" n="2">
      <head>Chapter 2</head>
      [text of Chapter 2 goes here]
      </div1>

      <div1 type="chapter" n="3">
      <head>Chapter 3</head>
      [text of Chapter 3 goes here]
      </div1>

      <div1 type="chapter" n="4">
      <head>Chapter 4</head>
      [text of Chapter 4 goes here]
      </div1>

      </body>
      </text>
      </TEI.2>

      A chapter is not a <div1> in every work, however. In a multi-volume novel, the volume rather than the chapter is the largest structural division. The volume therefore becomes the <div1> and the chapter becomes the <div2>:
      <TEI.2 id="xxxxxxx">
      <teiHeader>
      [Source and processing information goes here]
      </teiHeader>
      <text id="xxxxxxx">
      <body>

      <div1 type="volume" n="1"> [Volume 1 starts here, including chapters]
      <div2 type="chapter" n="1.1"> [Chapter 1.1 goes here] </div2>
      <div2 type="chapter" n="1.2"> [Chapter 1.2 goes here] </div2>
      <div2 type="chapter" n="1.3"> [Chapter 1.3 goes here] </div2>
      </div1>

      <div1 type="volume" n="2"> [Volume 2 starts here, including chapters]
      <div2 type="chapter" n="2.1"> [Chapter 2.1 goes here] </div2>
      <div2 type="chapter" n="2.2"> [Chapter 2.2 goes here] </div2>
      <div2 type="chapter" n="2.3"> [Chapter 2.3 goes here] </div2>
      </div1>

      </body>
      </text>
      </TEI.2>
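The nesting rule above can be checked mechanically. The following Python sketch is a hypothetical helper and is not part of the Center's toolchain. It assumes that tags are written exactly as in these examples, with a bare <body> tag, no comments and no ">" inside attribute values. It uses regular expressions rather than a real SGML parser. It looks only inside <body>, so header text is ignored. It then reports any <divN> that is not directly inside a <div(N-1)> and any division that closes out of order.

```python
import re

# Matches opening or closing numbered division tags such as <div1 ...> or </div2>.
# Assumption: markup is written as in the examples above; this is not an SGML parser.
DIV_TAG = re.compile(r"<(/?)div(\d)\b[^>]*>")


def body_text(document: str) -> str:
    """Return only the material between <body> and </body>, ignoring the teiHeader."""
    match = re.search(r"<body>(.*?)</body>", document, re.DOTALL)
    return match.group(1) if match else ""


def check_div_nesting(document: str) -> list[str]:
    """Report <divN> tags inside <body> that break the numbered hierarchy."""
    problems = []
    open_levels = []  # levels of the divisions currently open, outermost first
    for m in DIV_TAG.finditer(body_text(document)):
        closing, level = m.group(1) == "/", int(m.group(2))
        if closing:
            if open_levels and open_levels[-1] == level:
                open_levels.pop()
            else:
                problems.append(f"unexpected </div{level}>")
        else:
            expected = open_levels[-1] + 1 if open_levels else 1
            if level != expected:
                problems.append(f"<div{level}> found where <div{expected}> was expected")
            open_levels.append(level)
    problems.extend(f"<div{level}> never closed" for level in open_levels)
    return problems


# Usage: a chapter placed directly in <body> of a multi-volume work is flagged.
sample = """<TEI.2 id="AusEmma">
<teiHeader>[header]</teiHeader>
<text id="AusEmma">
<body>
<div1 type="volume" n="1">
<div2 type="chapter" n="1.1"> [Chapter 1.1] </div2>
</div1>
<div2 type="chapter" n="2.1"> [misplaced chapter] </div2>
</body>
</text>
</TEI.2>"""
print(check_div_nesting(sample))
# prints ['<div2> found where <div1> was expected']
```

Because the checker works from the tag numbers alone, it cannot tell whether a volume or a chapter should be the <div1>. That remains an editorial decision made for each work.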

Attributes

Tags can be further qualified and defined through "attributes": descriptive name and value pairs placed inside the opening tag.

  • A List of Commonly Used Attributes

    • "type": defines the "type" of structure you are talking about.
      Example: <div1 type="chapter">


    • "n": defines the "number" of a particular structure or group and helps to further individualize an "id" or "type." The "n" can be alphabetic or numerical data -- n="1", n="1a", or n="VIII" are all valid.
      Example: <div1 type="chapter" n="1">


    • "id": gives the element a name that identifies it, so that other tags can point to it. The "id" value must begin with a letter, not a numeral, and can contain only letters, numbers, the period, and the hyphen. The id must be unique to that element. It is often used on a <note>, so that a matching <note target=> value elsewhere in the text can link to it.
      Example: <note id="n1" n="1">


    • "rend": Indicates how the element in question is to be rendered typographically.
      Example: <epigraph rend="bold">


    • "lang": Indicates the language of the material within the tag, using the language abbreviation code from ISO 639.
      Example: <foreign lang="fre">Zut Alors!</foreign>

      Formally, the lang attribute is an IDREF: a reference to the id value of a <language> element in the TEI header. The TEI scheme requires every lang value to point to a <language> element, so each language used in the document must be declared in the TEI header with a <language> element. The Header Template has an "AdBlock" button that brings up additional language fields. Once filled out, these fields are inserted into the header and look like this:

      <langUsage>
      <language id="fre">French</language>
      </langUsage>
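The id rules and the lang requirement described in this list can be checked automatically. The Python sketch below is hypothetical. Its function name and regular-expression approach are our own assumptions and are not features of OpenText or the Header Template. It treats an id as valid if it starts with a letter and contains only letters, digits, periods and hyphens. It flags repeated id values and reports any lang value that has no matching <language id=...> declaration. Because this guide gives <TEI.2> the same id as <text>, the sketch leaves <TEI.2> out of the duplicate check.

```python
import re

# Hypothetical checker for the id and lang conventions described above.
# Assumption: attribute values are double-quoted, as the guidelines require.
TAG = re.compile(r"<([A-Za-z][\w.]*)([^>]*)>")  # opening tags only
ATTR = re.compile(r'(\w+)="([^"]*)"')
VALID_ID = re.compile(r"[A-Za-z][A-Za-z0-9.\-]*")


def check_ids_and_langs(document: str) -> list[str]:
    problems = []
    seen_ids = set()
    declared_langs = set()
    used_langs = set()
    for tag_name, attr_text in TAG.findall(document):
        attrs = dict(ATTR.findall(attr_text))
        if "id" in attrs:
            value = attrs["id"]
            if not VALID_ID.fullmatch(value):
                problems.append(f'invalid id "{value}"')
            # This guide gives <TEI.2> the same id as <text>, so skip it here.
            if tag_name != "TEI.2":
                if value in seen_ids:
                    problems.append(f'duplicate id "{value}"')
                seen_ids.add(value)
            if tag_name == "language":
                declared_langs.add(value)
        if "lang" in attrs:
            used_langs.add(attrs["lang"])
    # Every lang value must refer to a <language> declared in the header.
    for code in sorted(used_langs - declared_langs):
        problems.append(f'lang="{code}" has no <language id="{code}"> in the header')
    return problems


# Usage: one malformed id and one undeclared language.
sample = """<TEI.2 id="AusEmma"><teiHeader><langUsage>
<language id="fre">French</language>
</langUsage></teiHeader>
<text id="AusEmma"><body>
<p><foreign lang="fre">Zut alors!</foreign> <foreign lang="ger">Ach!</foreign></p>
<note id="n1" n="1">First note</note> <note id="1n" n="2">Bad id</note>
</body></text></TEI.2>"""
for problem in check_ids_and_langs(sample):
    print(problem)
# prints:
# invalid id "1n"
# lang="ger" has no <language id="ger"> in the header
```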

General Guidelines for Attribute Usage

  • Like divisional markers, attributes must be given in a fixed hierarchical order. The basic rule is that broader, more global information comes before the attributes that further qualify or narrow it:

    • "type": If you need to specify the "type" of structure you are marking, the "type" attribute must always be declared first. The reasoning is simple: a thing cannot sensibly be given a number or an individual identity until we know what kind of thing it is.
      Example: <div1 type="volume">


    • "n": An "n" (number) attribute is sometimes used by itself, as in the case of page breaks:
      Example: <pb n="456">
      However, whenever a tag carries several attributes, the "n" is the most specific and comes after the broader "type" or "id":
      Example: <div1 type="volume" n="2">


    • "id": If an element must be uniquely identified so that it can be referenced from another location in one or more texts, the "id" attribute must always be declared first.
      Example: <note id="n5" n="5">


    • "target": follows the same rules as the "id" attribute. In fact, "target" and "id" are often used together, as in footnotes. There, the <note target="n5" n="5"> at a specific place in the text points to the <note id="n5" n="5"> that contains the text of the footnote itself.


    • "entity": Figures identify themselves with the "entity" attribute rather than with "id".
      Example: <figure entity="TwaFifrn">


    • "rend": Although "rend" is mostly used by itself to describe typographical rendering, it is sometimes combined with other attributes. In those cases, it is listed last.
      Example: <figure entity="TwaFifrn" rend="inline">
      Used alone, it looks like this:
      Example: <epigraph rend="italics">


  • In our texts, ALL attribute values must be enclosed in double quotation marks. For example, <div1 type="chapter" n="1"> cannot be written as <div1 type='chapter' n='1'>. Our current search and display software, OpenText, does not accept single quotation marks around attribute values.
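The ordering and quoting rules above can also be checked by a script. The Python sketch below is a hypothetical helper, and its rank table is our own reading of the guidelines. Identifying or classifying attributes ("type", "id", "target", "entity") come first, "n" comes after them and "rend" comes last. Attributes that the guidelines do not rank, such as "lang", are ignored for ordering. The sketch uses regular expressions rather than an SGML parser, so it assumes simple tags like the ones shown in this guide.

```python
import re

# Hypothetical ordering ranks distilled from the guidelines above:
# identifying/classifying attributes first, "n" after them, "rend" last.
# Attributes the guidelines do not rank (e.g. "lang") are not checked.
RANK = {"type": 0, "id": 0, "target": 0, "entity": 0, "n": 1, "rend": 2}

TAG = re.compile(r"<([A-Za-z][\w.]*)([^>]*)>")  # opening tags only
ATTR = re.compile(r"""(\w+)=(["'])""")  # attribute name and its opening quote


def check_attributes(document: str) -> list[str]:
    problems = []
    for name, attr_text in TAG.findall(document):
        attrs = ATTR.findall(attr_text)
        # OpenText needs double quotes, so flag any single-quoted value.
        for attr, quote in attrs:
            if quote == "'":
                problems.append(f"<{name}>: {attr} uses single quotes")
        # Ranks must never decrease from left to right.
        ranks = [RANK[attr] for attr, _ in attrs if attr in RANK]
        if ranks != sorted(ranks):
            order = " ".join(attr for attr, _ in attrs)
            problems.append(f"<{name}>: attributes out of order ({order})")
    return problems


# Usage: the first and third tags follow the guidelines; the others do not.
sample = """<div1 type="volume" n="2">
<div1 n='3' type='volume'>
<figure entity="TwaFifrn" rend="inline">
<figure rend="inline" entity="TwaFifrn">"""
for problem in check_attributes(sample):
    print(problem)
# prints:
# <div1>: n uses single quotes
# <div1>: type uses single quotes
# <div1>: attributes out of order (n type)
# <figure>: attributes out of order (rend entity)
```

Ranking attributes, rather than listing every allowed sequence, keeps the sketch short. A tag with both "type" and "id" is not checked against itself, because the guidelines do not say which of the two comes first.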


Common Attribute Definitions for Major Structural Divisions

The following are typical "type" attributes for major divisional structures in prose and poetry:

  • <div1 type="volume"> a division for a multi-volume work.

  • <div1 type="book"> a division of a text into books, commonly used in epics and the Bible.

  • <div1 type="part"> a division of a text into groups of smaller divisions, commonly used in non-fiction works.

  • <div1 type="chapter"> the basic structural division of a novel or novella.

  • <div1 type="canto"> a division of a book of poetry organized into multiple "chapters", such as Dante's Divine Comedy.


  • <div1 type="poem"> a division of a work of poetry.


  • <div1 type="act"> a division of a multi-act drama.


  • <div1 type="drama"> a division of a single-act play.


  • <div1 type="letter"> a division for a work such as an epistolary novel or an actual collection of letters.



Tagging of Smaller-Scale Elements

Unlike the tags that mark the larger structural hierarchy, this second class of tags identifies individual features of a text. These tags are not part of a numbered hierarchy. Common examples include tags that mark typographical features, titles, paragraphs or lines. Examples of element tags are listed in the following sections of this guide. A far more extensive list can be found in the Text Encoding Initiative's Guidelines for Electronic Text Encoding and Interchange.

Examples of the use of these smaller-scale elements can be found throughout the other sections of this guide.

