Etext Center Guidelines for the Creation of Archivable Illustrations

Guidelines for SGML Text Mark-up at the Electronic Text Center
David Seaman, Electronic Text Center, University of Virginia

Etext staff: if in doubt see David Seaman for guidance on the settings at which to scan an image.

We have eight years of experience in creating digital copies of book illustrations, typescripts, and manuscripts, so don't try to "go it alone". The .tiff files may be sizeable, but don't be put off, and especially don't be tempted to scan at too low a resolution (or, God forbid, in 8-bit colour) just because a tiff is a big file: we don't want to have to re-scan at a later date. The tiffs go off onto a CD as soon as we have made jpeg and gif versions for current everyday use.

Rules and Regulations of Image Scanning and Encoding

The following list explains the items we typically scan, their specifications for scanning, and how to name them for our electronic text database at the Etext Center:

  • What Typically Warrants Scanning
    • Images of the spine, front cover, end-papers (ONLY if visually interesting), frontispiece (if there is one), and title-page.
    • All other images in the text, and anything else of visual interest, including ornamental capitals and small images embedded in the text itself.
  • Scanning Specifications
    • At the Etext Center, when we say an "image" we mean the entire page upon which the image is placed even if it is something as small as an ornamental capitalization. When you draw your box around the image that you want to scan, leave a few millimeters on each side of the page so the viewer can better appreciate the three-dimensionality of the book as a physical object.
    • All images are scanned and saved as 400 dpi (dots per inch), 24-bit color tiffs. See Special Collections Image Scanning for more information.
  • Image Naming Conventions
    • Again, all images will be saved in uncompressed tif format.
    • An image name can have no more than 8 characters as some of our work is done in the MS-DOS environment. These characters can only be numbers and letters--no punctuation.
    • At Etext, we typically name images so that they will correspond to the texts they are a part of.
      Example: if you are tagging the frontispiece, the titlepage, and an illustration on page 122 in Booth Tarkington's The Flirt (a work whose UVa ID is TarFlir) you would name these images as follows:

      Frontispiece: "TarFfpc"
      Titlepage: "TarFttl"
      Page 122: "TarFl122"
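
The naming rules above (no more than eight characters, letters and numbers only) are easy to check mechanically before a batch of scans is filed. What follows is a minimal sketch in POSIX shell, not an existing Etext tool; the function name valid_image_name is our own invention for illustration.

```shell
# valid_image_name NAME
# Hypothetical helper: succeeds if NAME (given without a suffix such as
# .tif) is 1-8 characters long and contains only letters and digits.
valid_image_name() {
    name=$1
    [ -n "$name" ] || return 1            # empty names are rejected
    [ "${#name}" -le 8 ] || return 1      # MS-DOS limit of eight characters
    case $name in
        *[!A-Za-z0-9]*) return 1 ;;       # punctuation or other characters
    esac
    return 0
}

# The three names from the example pass; an over-long name and a
# punctuated name do not.
for n in TarFfpc TarFttl TarFl122 TarFlir122 Tar-122; do
    if valid_image_name "$n"; then
        echo "$n: ok"
    else
        echo "$n: rejected"
    fi
done
```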



Illustration and Image Tags

The following tags are used to tag illustrations and information that goes with illustrations.

  • <figure> </figure>

    The <figure> tag pair indicates the location of a graphic, illustration, or figure. The filename for the digital image is given as the value of an entity= attribute.

  • <figure entity="FILENAME">

    "Entity" specifies the file in which the graphic image of the figure is stored. Do not include a suffix denoting the image type (e.g. FILENAME.gif). Usually, we will name the image file using as much of the work's unique ID as possible, and the page number on which the illustration occurs. As some of our work is done in the MS-DOS environment, the image filename should not be longer than eight characters.

    So, for an illustration from Booth Tarkington's The Flirt (a work whose UVa ID is TarFlir) the entity value for an illustration on page 122 would read as follows:

    <figure entity="TarFl122"> </figure>

  • <head> </head>

    The <head> tag may be used to transcribe (or supply) a heading or title for the graphic itself:
    Example:



    <figure entity="TarFl122">
    <head> "Kiss me some more darl----"</head>
    </figure>
  • <figDesc> </figDesc>

    The <figDesc> tag is important. It contains a brief prose description of the appearance or content of a graphic figure, and it is this description that allows the user to search for information within a particular illustration.
    Example:

    <figure entity="TarFl122">
    <head>"Kiss me some more darl----"</head> <figDesc>Grayscale illustration of a young girl trying to kiss a boy, under moonlight. </figDesc> </figure>


  • Note: where possible, use terms from the following controlled vocabulary: the Thesaurus for Graphic Materials I (TGM I), consisting of 5,997 terms and numerous cross-references for indexing visual materials. TGM I is a companion document to the Thesaurus for Graphic Materials II: Genre and Physical Characteristic Terms.

  • You may also have one or more paragraphs following the <head> and preceding the <figDesc> to transcribe any additional text relating to the figure found in the print source.

    The <head> and <figDesc> fields are valuable sets of information for PAT searches: as the set of etext images grows, they will allow a user to search image captions and the descriptions of those images. For a WWW user coming to the data through a VT100 client such as Lynx, it should be possible to send these fields as a text alternative to the graphical image.
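
To give a rough idea of what searching captions and descriptions makes possible, here is a small sketch that uses awk as a stand-in; it is not PAT and does not reproduce PAT's query language. The function name find_figures is hypothetical, and the sketch assumes that each <figure entity="..."> start tag sits on one line and that no two figures share a line.

```shell
# find_figures TERM FILE
# Hypothetical helper: print the entity value of every <figure> in FILE
# whose <head> or <figDesc> text contains TERM (case-insensitive).
find_figures() {
    awk -v term="$1" '
        BEGIN { term = tolower(term) }
        /<figure entity="/ {
            # Pull the entity value out of the start tag.
            entity = $0
            sub(/.*<figure entity="/, "", entity)
            sub(/".*/, "", entity)
            text = ""
            inside = 1
        }
        inside {
            # Collect the text of the figure with tags stripped out.
            line = $0
            gsub(/<[^>]*>/, " ", line)
            text = text " " tolower(line)
        }
        /<\/figure>/ {
            if (inside && index(text, term) > 0) print entity
            inside = 0
        }
    ' "$2"
}

# Example (the file name is a placeholder):
#   find_figures moonlight TarFlir.sgm
```

Only tags that open and close on the same line are stripped, so a tag broken across lines would leak into the searched text; for a sketch of this size that seemed an acceptable limitation.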


Other Simple Examples

<figure entity="EliMid10">
<head>Dorothea</head>
<figDesc>
An engraved portrait of Dorothea posed thoughtfully at a writing table. Three stacked books stand in the right foreground. Dorothea's right hand holds a quillpen.</figDesc>
</figure>


<figure entity="EliMid50">
<head>Mr. Casaubon and Dorothea</head>
<figDesc>An engraving by W.L. Taylor showing Mr. Casaubon and Dorothea, presumably in their "hour's <hi>tête-à-tête</hi>." Casaubon sits in an upholstered wooden chair in the left background corner, facing the viewer, with Dorothea's right hand in his own. Dorothea sits on a footstool at center-right, turned towards Casaubon. The left quarter of her face is visible to the viewer. The setting is a sunny room with one curtained window and one uncurtained, open window behind the figures. </figDesc>
</figure>


SGML Text Embedded in Image Files

A growing number of our electronic texts have book illustrations and other book-related images along with the tagged ASCII text. To include an attribution record in these book illustrations we bury a version of the TEI header into the binary code of the image. The user who saves an image from a text on our etext server now gets -- in Trojan Horse fashion -- a tagged full-text record of the creation of that image as part of the single image file they save. The image header and related <figDesc> information gives us a searchable SGML text database for our images.

For a description of an early implementation of "text in images", see David Seaman: "Campus Publishing in Standardized Electronic Formats: HTML and TEI." in Scholarly Publishing on the Electronic Networks, 1994.




Specific Procedures for Adding Image Headers

Image Processing on Unix: ImageMagick

The mogrify part of this impressive Unix tool allows us to perform batch image conversions from one format to another (e.g., TIFF to JPEG) and to add tagged text headers into the images as we convert.

ImageMagick is available from
ftp.x.org/contrib/applications/ImageMagick/
and is on the UVa etext machines. See the ImageMagick README file for more information.

For an interactive on-line implementation of ImageMagick, see the Image Machine at:

http://www.vrl.com/Imaging/


Overview

  1. To change formats from TIF to JPEG, type:

    mogrify -format jpg -quality 50 *.tif

    This will convert all the tif images within that directory to jpg files at a quality setting of 50 (on a scale that runs to 100). You can use the same command syntax for other formats.

    To resize them as well, add the -geometry option:

    mogrify -format jpg -quality 50 -geometry 30% *.tif

  2. To add text into the image comments field, type:

    mogrify -comment @text.file image.file

    NOTE: the text.file should have each line of text preceded by a hash mark and a space; this enables programs such as JPEGView for the Macintosh to read the comments as well.

    You can batch process this as well:

    mogrify -comment @text.file *.jpg

  3. It is possible to convert formats and add a text header with a single command:

    mogrify -format jpg -quality 50 -comment @text.file *.tif

    All tif files will be converted to jpgs that contain the text comments in the text.file.
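
Since a text.file without the leading hash marks will still be embedded but may not display in some viewers, it can be worth checking the file before running a batch. The following is a minimal sketch, not part of ImageMagick; the function name check_header_file is hypothetical, and accepting a bare "#" on an otherwise empty line is our assumption.

```shell
# check_header_file FILE
# Hypothetical helper: succeeds only if FILE is readable and every line
# starts with "# " or consists of a bare "#".
check_header_file() {
    [ -r "$1" ] || return 1
    # grep -v selects lines matching neither pattern; -q only reports
    # whether any such line exists.
    if grep -v -q -e '^# ' -e '^#$' "$1"; then
        return 1
    fi
    return 0
}

# Example, guarding the combined command from step 3:
#   check_header_file text.file &&
#       mogrify -format jpg -quality 50 -comment @text.file *.tif
```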

Step by Step Instructions for UVa Etext processors

1. Use the new TEI header template in etext/Done; it has several new fields:

  1. Just after the "Creation of machine-readable version:" field, there are two lines to indicate who created the digital images.


  2. The first note field should be used to indicate the existence of images; also note if the images come from a different source than the print text.


  3. In the <editorialDecl> section, there's now a standard indication about how we store the images.


  4. There's an extra <textClass> section which includes keywords and terms to indicate the artist, the type of visual work, and the type and dpi of the digital image; modify those fields as appropriate (i.e., if you have a 24-bit color image at 400 dpi, that's the only information that should appear in that field).


To add a header to an image:

  1. Make a copy of the completed TEI header for the text in question.

  2. Put a hash mark and a space at the beginning of every line:

    # <titleStmt>
    # <title>blah [a machine-readable transcription]</title>
    # <author>blah</author>
    # <respStmt>

    The hash marks are necessary for some image viewers. This text is now ready to go into the image(s).

    You can now simultaneously convert your tifs to jpgs and add in the header information above to those jpgs.

    If the header text file is called AutWork.header, and your various tiff files are image1.tif, image2.tif, image3.tif, and image4.tif, then this is what you do:


  3. Make sure the tiff files are in the same directory in which you are doing your mogrification.


  4. Type the following command:

    mogrify -format jpg -quality 50 -geometry 30% -comment @AutWork.header image*.tif


You have now converted all the image*.tif files into image*.jpg files, and those .jpg files have the textual information from the header embedded within them; the .tif files have remained unchanged. (You can view the text in the images by viewing the .jpg files in xv, calling up the control window, and choosing the "comments" button.)
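
Adding the hash mark and space to every line of the header copy need not be done by hand. A minimal sketch using sed follows; the function name hash_prefix and the file name AutWork.tei are placeholders of ours, not part of the Etext setup.

```shell
# hash_prefix INPUT OUTPUT
# Hypothetical helper: copy INPUT to OUTPUT with "# " added to the start
# of every line, leaving INPUT untouched.
hash_prefix() {
    sed 's/^/# /' "$1" > "$2"
}

# Example (file names are placeholders):
#   hash_prefix AutWork.tei AutWork.header
```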

If you want textual information that's specific to one particular image, you need only do the following:

  1. Repeat step 1 above.

  2. Repeat step 2, but add the following into the text after <text id=XXXXXXX>:

    <body>
    <div0>
    <p>
    <figure entity="XXX">
    <head>XXX</head>
    <figDesc>
    XXXXXXXX
    </figDesc>
    </figure>
    </p>
    </div0>
    </body>
    </text>
    </TEI.2>


  3. Fill in the fields with the information appropriate to the individual image. (These tags will also need the hash mark and space before them.)


  4. Repeat steps 3 and 4 above.
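
When many images each need their own description, the <figure> portion of that per-image text can be generated rather than typed. The sketch below prints only the <figure> element, already hash-prefixed; the enclosing tags from step 2 still have to be present in the header copy. The function name figure_comment is hypothetical.

```shell
# figure_comment ENTITY HEAD DESCRIPTION
# Hypothetical helper: print a hash-prefixed <figure> block for one image
# on standard output.
figure_comment() {
    printf '# <figure entity="%s">\n' "$1"
    printf '# <head>%s</head>\n' "$2"
    printf '# <figDesc>\n'
    printf '# %s\n' "$3"
    printf '# </figDesc>\n'
    printf '# </figure>\n'
}

# Example, using the illustration from The Flirt:
#   figure_comment TarFl122 '"Kiss me some more darl----"' \
#       'Grayscale illustration of a young girl trying to kiss a boy, under moonlight.'
```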


Image Processing on the Mac: ADDJFIFcomment

  1. Move a jpg to the Mac; save it again as a jpg using JPEGView -- this process will only work with a Mac-conformant jpg.

  2. Once you have a Mac jpg, call up the ADDJFIFcomment application; type in your text, and select "add"; then select the jpg file to which you would like to add comments.

  3. NOTE: if you want comments in a gif as well, follow steps 1 and 2, and then call that new jpg into JPEGView and save as a gif; ADDJFIFcomment won't take anything but a Mac-conformant jpg.



Alternative, and much less preferable, methods used before ImageMagick

  1. Call up the image in xv, and save it in PBM (ASCII) format; it will assign either a .ppm or .pgm suffix depending on whether the file is color or greyscale.

  2. Issue the following command:

    csplit -f pnum file.pgm 02

    or

    csplit -f pnum file.ppm 02

    This will result in two output files: pnum00 and pnum01. These two files are your original file split in two: the first line, and everything following the first line. We want to insert the header after the first line in the .ppm or .pgm file.

  3. Concatenate the header and the two "pnum" files in the following order, to create a new file (here called "file-2.pgm"):

    cat pnum00 text.header pnum01 >file-2.pgm

  4. Call up file-2.pgm in xv and save back to JPEG, or convert to GIF; the text remains embedded.

NOTE: The text header must have a pound symbol and a space at the beginning of every line:

    #
    # text of header goes here
    #
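
The split-and-concatenate procedure above can also be done in one pass without the intermediate pnum files. This is a sketch of an equivalent using head and tail, not the method described in the steps; the function name insert_header is hypothetical. It relies on the same fact as the csplit route: in the ASCII .pgm and .ppm formats, lines beginning with a pound symbol are treated as comments.

```shell
# insert_header IMAGE HEADER OUT
# Hypothetical helper: write to OUT a copy of the ASCII .pgm or .ppm file
# IMAGE with the lines of HEADER inserted after its first line.
insert_header() {
    image=$1 header=$2 out=$3
    {
        head -n 1 "$image"      # first line of the image file
        cat "$header"           # hash-prefixed header text
        tail -n +2 "$image"     # everything after the first line
    } > "$out"
}

# Example, matching the file names used in the steps:
#   insert_header file.pgm text.header file-2.pgm
```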
