
Quick Start Guide for New Etexters


VII.

Concatenate / Final Tagging

Concatenation prefixes the header file you created to the text file you tagged, and assigns it a new name – the unique text ID. In this step, you will enter tags which you were asked to ignore earlier, because the TEI header provides templates for completing those tags.

  1. Return to your home directory (in UNIX/Exceed): simply type cd [ENTER] and you return to your default directory.

  2. Go to your working directory: type cd working. Type ls to list all your files. Check to make sure your header was copied successfully into your directory. The file will be the unique text ID plus the .xml extension, e.g. RinSubd.xml.

  3. Concatenate the files. Type:

          cat header.xml filename > UniqueTextID

    For example:

          cat RinSubd.xml babsubdeb.txt > RinSubd

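    NB: do not write the output to a file with the same name as your header or text file! The shell empties the output file before cat reads anything, so the contents of that input file would be lost.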

  4. Open the new file in jove. It now begins with the TEI header.

  5. Go to the tag <TEI.2 id="">

  6. Change the tag to read: <TEI.2 id="UniqueTextID">, for example, <TEI.2 id="RinSubd">.

  7. If you did not previously add the </TEI.2> tag to the end of your file, insert it now. This tag closes the <TEI.2 id="XxxYyyy"> tag.

  8. Next, make your jpg declarations. To do this, go to the line:

          <!ENTITY filename SYSTEM "filename.jpg" NDATA jpg>

    This line is a template for your declarations. Use the CTRL k and CTRL y commands to cut and paste this line once for each image you have.

  9. For each image you have, insert the name of the image in both instances where the template reads filename. So, for the text Mother by Kathleen Norris, NorMoth, the beginning of the file is changed to:

          <!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd' [
          <!NOTATION jpg SYSTEM "JPEG">
          <!ENTITY NorMocov SYSTEM "NorMocov.jpg" NDATA jpg>
          <!ENTITY NorMospi SYSTEM "NorMospi.jpg" NDATA jpg>
          <!ENTITY NorMotit SYSTEM "NorMotit.jpg" NDATA jpg>
          <!ENTITY NorMo001 SYSTEM "NorMo001.jpg" NDATA jpg>
          <!ENTITY NorMo029 SYSTEM "NorMo029.jpg" NDATA jpg>
          <!ENTITY NorMo054 SYSTEM "NorMo054.jpg" NDATA jpg>
          <!ENTITY NorMo083 SYSTEM "NorMo083.jpg" NDATA jpg>
          <!ENTITY NorMo095 SYSTEM "NorMo095.jpg" NDATA jpg>
          <!ENTITY NorMo108 SYSTEM "NorMo108.jpg" NDATA jpg>
          <!ENTITY NorMo137 SYSTEM "NorMo137.jpg" NDATA jpg>
          <!ENTITY % ISOlat1 SYSTEM "ISOlat1.pen"> %ISOlat1;
          <!ENTITY % ISOlat2 SYSTEM "ISOlat2.pen"> %ISOlat2;
          <!ENTITY % ISOnum SYSTEM "ISOnum.pen"> %ISOnum;
          <!ENTITY % ISOpub SYSTEM "ISOpub.pen"> %ISOpub;
          <!ENTITY % ISOtech SYSTEM "ISOtech.pen"> %ISOtech;
          ]>
          <TEI.2 id="NorMoth">


  10. If you have a text with non-Roman characters, you will need to add ISO declarations for those characters, which is a process similar to adding jpg declarations. From the Etext home page, go to Standards, and under Special Characters and Language Codes, click on ISO Special Characters.

    Here, under iso num and iso pub, you can find how to tag special characters such as ampersand, degree, fractions, etc., in addition to finding the character sets for Greek, Cyrillic, etc.

    To declare the character sets for Greek, for example, add them in after the character sets already declared in the header:

          <!DOCTYPE TEI.2 SYSTEM 'teixlite.dtd' [
          <!NOTATION jpg SYSTEM "JPEG">
          <!ENTITY % ISOlat1 SYSTEM "ISOlat1.pen"> %ISOlat1;
          <!ENTITY % ISOlat2 SYSTEM "ISOlat2.pen"> %ISOlat2;
          <!ENTITY % ISOnum SYSTEM "ISOnum.pen"> %ISOnum;
          <!ENTITY % ISOpub SYSTEM "ISOpub.pen"> %ISOpub;
          <!ENTITY % ISOtech SYSTEM "ISOtech.pen"> %ISOtech;
          <!ENTITY % ISOgrk1 SYSTEM "ISOgrk1.pen"> %ISOgrk1;
          <!ENTITY % ISOgrk2 SYSTEM "ISOgrk2.pen"> %ISOgrk2;
          <!ENTITY % ISOgrk3 SYSTEM "ISOgrk3.pen"> %ISOgrk3;
          <!ENTITY % ISOgrk4 SYSTEM "ISOgrk4.pen"> %ISOgrk4;
          ]>


  11. Find the <text id="XxxYyyy"> tag in the file. The ID here must not be identical to the ID used in the <TEI.2 id="XxxYyyy"> tag, because every ID in an XML document has to be unique. To differentiate the two, add a T to the end of the ID in the <text id="XxxYyyy"> tag: for example, if your text were NorMoth, you would use

          <TEI.2 id="NorMoth">

    and

          <text id="NorMothT">

  12. Save, and exit the jove editor.
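
Steps 3, 8, and 9 can also be scripted. The following is a minimal sketch, not part of the Etext toolset: the function names concat_etext and jpg_entities are hypothetical, and it assumes a POSIX shell and that your images sit in one directory with names ending in .jpg.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not Etext commands.

# concat_etext HEADER TEXT OUTPUT
# Prefix the header file to the tagged text file, writing the result to OUTPUT.
# Refuses to run when OUTPUT is spelled the same as either input, because the
# shell would empty that file before cat reads it. (Names are compared as
# typed, so ./RinSubd.xml and RinSubd.xml count as different.)
concat_etext() {
    header=$1
    text=$2
    out=$3
    if [ "$out" = "$header" ] || [ "$out" = "$text" ]; then
        echo "refusing to overwrite input file: $out" >&2
        return 1
    fi
    cat "$header" "$text" > "$out"
}

# jpg_entities DIR
# Print one ENTITY declaration per .jpg file in DIR, using the file name
# without its extension as the entity name.
jpg_entities() {
    for img in "$1"/*.jpg; do
        [ -e "$img" ] || continue    # no .jpg files: the pattern did not match
        name=$(basename "$img" .jpg)
        printf '<!ENTITY %s SYSTEM "%s.jpg" NDATA jpg>\n' "$name" "$name"
    done
}
```

With these defined, concat_etext RinSubd.xml babsubdeb.txt RinSubd reproduces the example in step 3, and jpg_entities . prints declarations for the images in the current directory, which could replace the template line from step 8.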

VIII.

Checking TEI Tags

Usually when you check your text, the errors you will find are either those where you have failed to close an open tag, or have improperly coded a tag. Look for failure to include both <> characters, failure to add the / at the beginning of a closing tag or at the end of an empty tag, or failure to add quotation marks in a tag that requires them.
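
Before running a parser, a rough count of angle brackets can catch the first kind of mistake. This is only a sketch under stated assumptions: check_brackets is a hypothetical helper, not an Etext command, and because a literal > is legal in XML text, a mismatch is a hint to look more closely rather than proof of an error.

```shell
#!/bin/sh
# check_brackets FILE   (hypothetical helper, for illustration only)
# Compare the number of "<" and ">" characters in FILE.
# A mismatch suggests a tag that is missing one of its angle brackets;
# equal counts say nothing about whether the tags are valid TEI.
check_brackets() {
    # tr -cd keeps only the named character; wc -c counts what is left.
    # The final tr strips the padding some versions of wc add.
    opens=$(tr -cd '<' < "$1" | wc -c | tr -d ' ')
    closes=$(tr -cd '>' < "$1" | wc -c | tr -d ' ')
    if [ "$opens" -ne "$closes" ]; then
        echo "$1: $opens '<' but $closes '>'"
        return 1
    fi
}
```

Running check_brackets RinSubd prints nothing when the counts agree. The file still has to be parsed afterwards, since this check knows nothing about closing tags or quotation marks.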

There are several ways of checking the tags in your TEI document, or parsing it:

A.

Parsing in UNIX:

The UNIX parser is paxer. At the UNIX prompt, type

      paxer filename

and wait. If there are errors in your file, paxer will spit them all out in a list. (If there are more errors than can fit on the screen at a time, you can view them screen by screen by first typing paxer filename | more at the UNIX prompt, and hitting ENTER to view each successive screen of error messages.)

The errors generated by paxer will give you the line number and character position of each of the errors it encounters. To view and fix errors, open your file in another jove window and move to the appropriate line by typing ESC linenumber ESC g.

One benefit of parsing with paxer is that it shows you all error messages for the file at once. If the same error occurs over and over, you can see every instance and correct them in one pass, rather than meeting the same error message time and time again as you would in NoteTab (see below). However, parsing with paxer can be slow, and you may find it easier to view and fix file errors in the Windows environment, in which case you can parse using NoteTab.

B.

Parsing in NoteTab Pro:

  1. To parse using NoteTab, you will have to transfer your file from the Etext server to your local PC. To do this, open up SecureFX and log in to etext.lib.virginia.edu using your user ID and password. Locate your file in your working directory on the Etext server, and drag it into the C:\etext\document folder on your local machine.

  2. From the Windows desktop, click to open NoteTab Pro. When the program opens, go into the File menu and select Open. Locate your file in C:\etext\document, click on it, and then hit Open.

  3. To parse the file, click on XML parse in the clip library on the left of the screen. A new window containing the file error.txt will open, looking something like this:

          C:\etext\document\filename
               Error Message
               URL: file:///C:/etext/document/filename
               Line 00013: Text where error occurs
               Pos 00020: Position of error on line


    Thus, in this file at line 13, position 20, you will see your file's first error. To move to this line of the text, go back into the window with your file, open up the Search menu, select Go to Line, type the line number and hit OK.

  4. NoteTab only shows you one error at a time, so after correcting your first error, hit XML parse again and the next one will show up in the error.txt window. Once you have corrected all the errors in your file, clicking on XML parse in NoteTab will produce only a line that contains the file path and name of your text, i.e. C:\etext\document\filename. At this point, re-upload the file to your home directory on the Etext server using SecureFX.
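
If you keep a copy of error.txt somewhere with a UNIX shell, the line and position can be pulled out of it automatically. The sketch below assumes the report is laid out exactly like the sample in step 3, which the guide itself describes only as "something like this". The function name error_location is hypothetical.

```shell
#!/bin/sh
# error_location FILE   (hypothetical helper, for illustration only)
# Print "LINE POSITION" from an error report laid out like the sample in
# step 3, with the zero padding removed. Prints nothing when the report
# contains no "Line" and "Pos" entries, as with a file that parsed cleanly.
error_location() {
    awk '
        $1 == "Line" { line = $2 + 0 }   # adding 0 keeps only the leading digits
        $1 == "Pos"  { pos  = $2 + 0 }
        END { if (line != "" && pos != "") print line, pos }
    ' "$1"
}
```

For the sample report, error_location error.txt prints the line number followed by the position, which are the two values you need for Go to Line in NoteTab or for ESC linenumber ESC g in jove.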

IX.

Share Your Joy

When you have eliminated all parsing errors, email the assistant director, Cindy Speer, informing her that the text is ready. In your message, include the text ID and location, and the names and location of any associated jpg and gif images.

Cindy will compress the file for you to check in an internet browser. You can view the file by going to the Modern English Collection page and clicking on any publicly available text. Then replace the text ID in the URL with your file's text ID, and hit ENTER. Check for any additional errors, correct them, and email again when the text is finalized.





By Andrew Rouner and Matthew Gibson
Revised Cindy Speer, 2004