Image Scanning: A Basic Helpsheet
Electronic Text Center
University of Virginia
Charlottesville, VA 22904
The Electronic Text Center currently has three Epson 4870 and two Fujitsu ScanPartner 15C scanners, connected to Pentium PCs. We use Epson Scan or ScandAll software for graphics scanning and OmniPage Pro or ABBYY FineReader for text scanning (optical character recognition, or OCR). PDF creation is also available using Adobe Acrobat. Additionally, the two Fujitsu scanners have an automatic document feeder attachment for processing large numbers of pages.
Image Scanning: The Process
The initial two decisions to make about any image scan concern the image type you need to create (greyscale, color), and the resolution at which you want it to be created, measured in dots per inch (dpi). In essence, you are determining how many dots per linear inch the scanner will record, and how much information each dot will record. The more dots per inch, and the more information in each dot, the bigger the file.
These decisions will be affected by the use you intend to make of the image. An item that has long-term viability -- a scan of an original document that is part of an ongoing project -- will need to be scanned at higher settings than a piece of clipart for a web page.
- 1-bit black and white (each dot can be either black or white)
There is little reason ever to use 1-bit black and white scanning. The visual quality is poor -- the image looks stark and edges of lines tend to be jagged -- and the 1-bit file is also not amenable to JPEG, the best of the image compression schemes (see below).
- 8-bit greyscale (each dot can be one of 256 grey shades)
8-bit greyscale works well for most non-color images, and gives a good, clear image. For archival images, you are better served scanning non-color images in 24-bit color (see Archival Imaging, below).
- 8-bit color (each dot can be one of 256 colors)
8-bit color ("color photo") gives a less photo-realistic image than 24-bit color ("millions of colors"), and can look a little grainy at times. However, the 8-bit color file will be much smaller than the 24-bit color file in an uncompressed form, and you may be working with viewers and programs that do not allow you to use a 24-bit color image. Increasingly, image viewing software that does not support 24-bit color (or cannot display it through your monitor/color board combination) will translate the 24-bit image into an 8-bit one. 8-bit color may well be suitable for "clip art" and web page images.
- 24-bit color (each dot can be one of 16.8 million colors)
24-bit color gives a much more photorealistic image, but results in a much larger file than 8-bit color. However, the JPEG process will reduce the filesize dramatically.
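The counts quoted for each image type follow directly from the bit density: a dot stored in n bits can take 2 to the power n distinct values. A minimal sketch of that arithmetic (the function and dictionary names are illustrative only):

```python
# Bit density of each image type discussed above.
BIT_DEPTHS = {
    "1-bit black and white": 1,
    "8-bit greyscale": 8,
    "8-bit color": 8,
    "24-bit color": 24,
}


def values_per_dot(bits):
    """Return how many distinct shades or colors a dot of `bits` bits can hold."""
    return 2 ** bits


for name, bits in BIT_DEPTHS.items():
    print(f"{name}: {values_per_dot(bits):,} possible values per dot")
```

The 24-bit figure works out to 16,777,216, which is the "16.8 million colors" quoted above.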
Resolution -- Dots Per Inch
The choice of dpi is ruled often by practical considerations. The higher the dpi number, the more information in the file, and the greater the ability to enlarge a detail from that image (if your viewing software supports such a feature). Note, though, that if the original image does not have much detail to enlarge, a high dpi setting may gain you little.
Raising the dpi value also increases the file size, sometimes beyond a size that your viewing software can cope with (or that you can store). To take an extreme case, a 400 dpi, 24-bit color TIFF image that is as big as the bed of the scanner (8.5 x 14 inches) would be about 55 megabytes in size (uncompressed -- see the note on compression below). So choosing a dpi value involves a degree of experimentation, and of tailoring the resolution to the purpose of the scan.
If the image will have its principal life on screen (such as an image for a web page), as opposed to being printed out, and if you do not need to enlarge details from it, there is no reason, archival concerns aside, to scan at more than 100 dpi, because screen resolution is typically lower than this.
So, just as with image type, you need to match resolution to the purpose of the scan. A "clipart" image for a web page is fine at 100 dpi; an archival scan of a manuscript is not.
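To see what a dpi choice means in practice, it can help to convert it into pixel dimensions: the scanner records dpi dots for every linear inch of the original. The sketch below assumes a hypothetical 4 x 6 inch original; the function name is illustrative only.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return (width, height) in dots for an original of the given size in inches."""
    return round(width_in * dpi), round(height_in * dpi)


# A hypothetical 4 x 6 inch original at a web-page setting and an archival setting.
for dpi in (100, 600):
    width, height = pixel_dimensions(4, 6, dpi)
    print(f"{dpi} dpi: {width} x {height} dots")
```

At 100 dpi the scan is 400 x 600 dots, which fits comfortably on screen at full size; at 600 dpi it is 2400 x 3600 dots, which leaves room to enlarge details.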
The following chart shows the size of an uncompressed 1" x 1" image in different types and resolutions:
Resolution (dpi)            400x400   300x300   200x200   100x100
1-bit black and white       20K       11K       5K        1K
8-bit greyscale or color    158K      89K       39K       9K
24-bit color                475K      267K      118K      29K
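The figures in the chart can be approximated from first principles: the number of dots is the width times the height in inches, each multiplied by the dpi, and each dot takes the bit density divided by 8 bytes. The sketch below is a rough estimate only. It ignores file headers and any row padding a real format may add, so its results may differ from the chart by a percent or two.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_dot):
    """Estimate the raw image data size in bytes, ignoring format overhead."""
    dots = round(width_in * dpi) * round(height_in * dpi)
    # Round up to a whole number of bytes.
    return (dots * bits_per_dot + 7) // 8


# One square inch at each resolution and bit density in the chart.
for bits in (1, 8, 24):
    sizes = [uncompressed_bytes(1, 1, dpi, bits) / 1024 for dpi in (400, 300, 200, 100)]
    print(f"{bits:>2}-bit:", "  ".join(f"{size:6.1f}K" for size in sizes))

# The extreme case from the text: a full 8.5 x 14 inch bed at 400 dpi, 24-bit color.
full_bed = uncompressed_bytes(8.5, 14, 400, 24)
print(f"Full bed: {full_bed / (1024 * 1024):.1f} MB")
```

The full-bed case comes to 57,120,000 bytes, in line with the 55 megabytes quoted earlier.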
Image File Formats
At the scanner you are likely to create an image in an uncompressed format such as TIFF (works on all platforms), BMP (MS Windows only) or PICT (Mac only). The TIFF file has long-term archival use, but is usually too large as a file to work with effectively, especially if you want an image as part of an HTML document on the Web.
There are several digital image formats that save a file in a compressed form -- GIF, Group IV FAX compression, and JPEG being the most common. Group IV FAX is of use if you have black and white drawings (not greyscale or color); GIF gives moderate compression on greyscale or 8-bit color (256 colors); the most useful of all is JPEG, which gives extraordinary compression on greyscale, 8-bit, and 24-bit color images. Note, however, that JPEG compression does not simply store information in an abbreviated fashion; it also deletes (loses) information from the file. If you are working with large color files, a practical working method may be to archive the original 24-bit color images in a rich but large format such as TIFF and work with JPEG versions that will be a fraction of the size. At normal size it is difficult to tell a JPEG from a TIFF, even though the JPEG file may be 10-40 times smaller than the TIFF. You will see, however, that as you begin to enlarge the two files the JPEG image begins to "break down" much sooner than the TIFF (its constituent pixels become visible).
Note About JPEG
You are better off scanning at 24-bit and then making a JPEG than scanning at 8-bit. This does not, of course, mean that you need to keep the 24-bit uncompressed file -- just use it as a stage towards making the JPEG.
Archival Imaging
This section is an excerpt from the Archival Digital Image Creation helpsheet.
At the scanner
- Scan at 600 dpi by default. There may be cases where you will vary this depending on the amount of detail in the original, its physical size, and the predictable uses.
- Scan at 24-bit colour by default. Even greyscale book illustrations and engravings look much more realistic at 24-bit colour than at 8-bit greyscale, and the JPEG file produced from the 24-bit original is rarely any larger in KB than that made from an 8-bit original. You can always create a greyscale copy of the colour original if needs dictate.
- Create a TIFF file at the scanner -- an uncompressed format that is as close as we've got to an archival form. The TIFF uncompressed archival copy is large (which means that it has a lot of information in it, which is good). Filesize should not be a deciding factor in image resolution or bit density, although at the upper extremes it may be necessary to use 8-bit greyscale. In our case currently, this off-line storage is on a tape archiving system.
- Use the automatic colour and contrast balance on the scanner. Do no additional colour correction on the archival TIFF: better to have them archived with a consistent and known bias -- the bias imposed by a particular device (e.g. an Epson flatbed scanner). We need to avoid unrecorded and ad hoc correction of the originals, especially as the best we can do is to correct for a particular monitor. We might well consider the inclusion of a standard colour reference strip at the margin of each image.
- Before the TIFF is archived off-line, create one or more JPEG images for current use -- you might decide on a high-detail (low loss) and a low-detail (high loss) version. The precise settings are determined by the type of image -- as a rule of thumb, aim to have the better copy come in at 300-500 KB and the poorer copy at under 100 KB. Do whatever colour correction is necessary on the JPEGs.
- Taking the British Library's Beowulf Project as an example, I have turned a 21 MB TIFF file into a pair of JPEGs of 600 KB and 65 KB -- the former allows a lot of flexibility of use (details can be enlarged several times without pixelation); the latter allows little flexibility, but is very usable at regular size and loads quickly even on low-end graphical systems.
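The Beowulf figures above give a feel for how much JPEG reduces an archival TIFF. A minimal sketch of the ratio arithmetic, assuming 1 MB = 1024 KB (the function name is illustrative only):

```python
KB = 1024
MB = 1024 * KB


def compression_ratio(original_bytes, compressed_bytes):
    """Return how many times smaller the compressed file is than the original."""
    return original_bytes / compressed_bytes


tiff_size = 21 * MB
for label, jpeg_size in (("high-detail JPEG", 600 * KB), ("low-detail JPEG", 65 * KB)):
    ratio = compression_ratio(tiff_size, jpeg_size)
    print(f"{label}: about {ratio:.0f} times smaller than the TIFF")
```

The high-detail copy is roughly 36 times smaller than the TIFF and the low-detail copy roughly 330 times smaller.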
I strongly encourage us also to fill in a short tagged header template for each image or related group of images, saying how, when, and by whom it was created. The Etext Center has modified the TEI header to fill this role; at worst, a set of HTML <meta> tags in the <head> section would be better than nothing.
I would also suggest that this header should be added into the binary code of the image file itself. The Etext Center does this routinely now with book illustrations and Special Collections items, and is streamlining the process.
This data control adds to the creation time, but means that we have a searchable record of the item, a bibliographical header for future cataloging, and we keep track of what we have got. We should think of ourselves as building a text database to our images as we create the images. For some groups of images, a single header may do for all the images in a group -- you may not need a different header for each specific image.
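As an illustration of the fallback option mentioned above (a set of HTML <meta> tags), the sketch below builds a <head> section from a simple record of how, when, and by whom an image was made. The field names and the sample values are hypothetical, chosen for illustration; they are not the Etext Center's modified TEI header or any established metadata scheme.

```python
from html import escape


def meta_header(record):
    """Build an HTML <head> section with one <meta> tag per field in `record`."""
    lines = ["<head>"]
    for name, content in record.items():
        # Escape both parts so quotes or ampersands cannot break the markup.
        safe_name = escape(str(name), quote=True)
        safe_content = escape(str(content), quote=True)
        lines.append(f'  <meta name="{safe_name}" content="{safe_content}">')
    lines.append("</head>")
    return "\n".join(lines)


# Hypothetical record; every field name and value here is an assumption.
record = {
    "creator": "J. Doe",
    "date-created": "2004-01-15",
    "scanner": "Epson flatbed",
    "resolution-dpi": 600,
    "image-type": "24-bit color",
    "master-format": "TIFF (uncompressed)",
}
print(meta_header(record))
```

Where one header serves a whole group of images, the same record could be written once and associated with each file in the group.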
For more information, see: Introduction to Imaging: Issues in constructing an image database by Howard Besser and Jennifer Trant.
Summary of questions to ask yourself when scanning an image
1) What is the primary purpose of this image? An item on a web page? An item in a printed document? If the former, the dpi can be lower (100 dpi) without the user seeing much difference.
2) Do I need to extract details and enlarge them? If so, I need to scan at a higher dpi to give the image this elasticity.
3) Do I need color or greyscale?
4) Do I need high or low color content? If I am not able to use JPEG compression, can I afford to have high color content? That is, can I cope with the size of the file that 24-bit color produces?
5) What image file formats does my end-use software support? TIFF and BMP or PICT only? GIF and JPEG? If the latter, do I need to keep a TIFF master copy too? (Probably not if the image does not have long-term archival value, such as a picture of your pets to put on your Web homepage.)
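The questions above can be read as a small decision procedure. The sketch below encodes one possible reading of this helpsheet's guidance: 100 dpi for screen-only images, 600 dpi 24-bit TIFF for archival scans, and 24-bit color ahead of JPEG conversion. The function name, the dictionary keys, and in particular the 300 dpi placeholder for "a higher dpi" are assumptions, not recommendations from the text.

```python
# Placeholder for "a higher dpi"; the helpsheet gives no single figure for this case.
HIGHER_DPI_PLACEHOLDER = 300


def suggest_settings(for_screen, enlarge_details, archival, needs_color=True):
    """Suggest scan settings from the answers to the summary questions."""
    if archival:
        # Archival defaults: 600 dpi, 24-bit color, uncompressed TIFF master.
        return {
            "dpi": 600,
            "image_type": "24-bit color",
            "scan_format": "TIFF",
            "keep_tiff_master": True,
        }
    if for_screen and not enlarge_details:
        dpi = 100
    else:
        dpi = HIGHER_DPI_PLACEHOLDER
    image_type = "24-bit color" if needs_color else "8-bit greyscale"
    return {
        "dpi": dpi,
        "image_type": image_type,
        "scan_format": "TIFF",
        "keep_tiff_master": False,
    }


# Example: a picture for a web page, with no need to enlarge details.
print(suggest_settings(for_screen=True, enlarge_details=False, archival=False))
```

Experimentation with the actual original remains necessary, since the right resolution also depends on how much detail it contains.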
Image Scanning: The Software
The PCs at the Electronic Text Center use Epson Scan or ScandAll and Photoshop to scan images.
- Open Photoshop by double-clicking on its desktop icon.
- From the File menu, select Import... TWAIN_32. This will prompt the scanner software to open automatically and preview what's on the scanner bed.
- After the initial preview, you will most likely want to change the settings for the Image Type and the Resolution.
Setting the Image Type (the bit density)
DeskScan's simplified scanning terminology can cause problems, because it is not always obvious which image type to use. "Black and white drawing" is the most descriptive of all -- use it for an image that has no shading -- just black and white (a pen and ink sketch, perhaps).
Image Type             400x400   300x300   200x200   100x100   Bit density
B & W Drawing          20        11        5         1         (1-bit B&W)
Color Drawing          158       89        39        9         (1-bit color)
B & W Photo            158       89        39        9         (8-bit grayscale)
Color Photo            158       89        39        9         (8-bit color)
Millions of Colors     475       267       118       29        (24-bit color)
Note: The Halftone settings are infrequently used for specialized printing purposes, and the low quality of this image type does not provide any advantage. We don't recommend this setting.
Setting the Image Resolution
Because of the simplification of scanning terminology DeskScan uses, you need to remember the following if you decide to change the dots per inch setting:
-- To change the dpi (up to 600 dpi), go to the Custom menu, choose the Print Path option, and set the value manually.
Starting the Scanning Process
After you have specified dots per inch and image type, click on the Preview button. This will scan your item and send the output to the screen.
- You must now select an area to be scanned (using the mouse to draw a box around all or part of your image)
- Click on the Zoom button to see if your box is drawn accurately;
- Click on the "yin/yang" button between the contrast and brightness slider bars to perform an automatic contrast/brightness adjustment.
Note that above the Preview button there is a counter that shows the size your file will be in an uncompressed format.
- Once the image is ready to save, click on Final. This will prompt DeskScan to scan the image with your adjusted settings and send the file into Photoshop for processing and/or conversion to another format (such as GIF or JPEG).
It is usually most efficient to do all your scanning in DeskScan, then all your processing in Photoshop. You can keep several files open at once in Photoshop while you continue to scan. To scan your next item, simply place it on the scanner bed, click on Preview, and repeat the process described above. After you have finished scanning, exit DeskScan.
Quirks of the Program
DeskScan tends to run a little dark, especially on darker images and photographs. If this happens, nudge the brightness control up a little. DeskScan can also slightly favor green in color images with a lot of white, cream, or yellow in them (a manuscript page, for example). To compensate for this, go to the Tools menu, select the "Color Adjustment" option, and move the pointer away from the green a little.
Saving Your Files
- From the File menu in Photoshop, select Save As...
- Name your file.
- Select an image format from the Save As drop-down menu.
- Be sure to save only in C:\data.
Note about saving GIFs:
- If you have scanned an image as an 8-bit image type (such as Sharp Color Photo or Sharp Black and White Photo), you can select CompuServe GIF as a format from the Photoshop Save As menu.
- If you have scanned an image as a 24-bit image type (i.e. Millions of Colors), you will need to export the file as a GIF. To do this, select Export and GIF89a Export... from the File menu.