ABBYY FineReader: A Basic Helpsheet
Electronic Text Center
University of Virginia
Charlottesville, VA 22904
One brand of OCR software we use in the Electronic Text Center is ABBYY FineReader. It provides many of the same features as OmniPage Pro (including the ability to learn new characters, process discrete zones in a document, and check spelling), but it is especially well-suited to OCR work involving certain non-Western scripts, such as Cyrillic. Text generated by FineReader can be saved in a variety of formats.
The FineReader desktop
The FineReader desktop consists of several toolbars and a viewing area divided into four frames:
- The leftmost frame, labeled "Batch," displays a thumbnail image of each page scanned into a given file. Move between the pages in your file by clicking the appropriate thumbnail.
- The frame to the immediate right of the "Batch" window, labeled "Image," displays an image of the page currently selected (once scanned) and any zones within it that have been defined.
- Below the "Image" frame is the "Zoom" frame, which shows a close-up of part of the scanned page to help you correct the OCR output.
- The rightmost window, labeled "Text," displays the results of the OCR process as a word-processor document that may be manually corrected.
You can adjust the size and display of the viewing panels by selecting options from the View drop-down menu.
Quick Help: Using OCR Wizard
For most OCR work, FineReader's default settings will produce satisfactory results. If the original document contains clear, readable text (such as a printed book or laser-printer output), is laid out in a standard single- or multi-column format, and uses a typeface of roughly 8 pt. or larger, use either the fully automated process or the OCR Wizard.
For a fully automated OCR process, click the first large process button on the "Wizard" toolbar, labeled "Scan&Read Multiple Images." No further input is required once this option is selected.
To be guided through the OCR process step by step, click the small arrow next to the first process button, then choose "Scan&Read Wizard" from the drop-down menu.
FineReader's OCR Wizard will take you through the scanning process step-by-step, prompting you to answer several questions about your document:
- How will you get your image?
  - Select "From scanner."
  - FineReader indicates which settings it considers optimal and then opens the scanner dialog box. If your document is relatively clear, choose "Line Art." If you expect the OCR program to encounter more difficult text, choose "Grayscale," which preserves more textual information for FineReader to draw upon.
- Select a recognition language
  - Select the appropriate option.
  - FineReader then determines the page layout and starts the OCR process.
- Check the recognized text in the editor and evaluate it
  - Judge whether the text produced by the OCR program has many or few errors. If you report many errors, FineReader will try to troubleshoot the problem.
- Check the recognized text before saving it?
  - This option lets you decide whether to run the integrated spell-checker before saving the results of the OCR process.
- Select a saving mode
  - Select the appropriate option.
  - Be sure to save your files ONLY in C:\data.
Using the Process Buttons (controlling FineReader manually)
For more control over the OCR process, you can forgo the OCR Wizard and run the OCR process manually, using the large process buttons.
There are three main steps in the OCR process: scanning, drawing zones, and character recognition.
Scanning: Press the process button labeled "1. Scan" on the "Wizard" toolbar to begin. A dialog box will appear in which you can set the scanning properties. Place your text on the scanner bed, adjust the settings, and click OK. A progress bar near the bottom of the screen shows the progress of the scan. When the scan is complete, FineReader displays a thumbnail image of the page in the leftmost ("Batch") panel and a larger page image in the panel to its immediate right ("Image").
Drawing zones: In the center ("Image") panel, draw a box (or multiple boxes) around the parts of the page that you want FineReader to recognize. The boxes, or "zones," are numbered in the order in which you draw them, so draw your zones in the order in which you want the text to appear. To resize a zone, click and drag one of its four corners. To delete a zone, right-click once inside it and choose Delete Block from the menu that appears.
Character recognition: After you've drawn the zone(s), press the third process button, labeled "2. Read All." FineReader will attempt to recognize each character of text (hence the term "optical character recognition") within the zones you drew in the previous step. A progress bar indicates the progress of recognition. When recognition is complete, FineReader displays the text in the right window. Text that FineReader suspects may contain an error is highlighted in blue.
You can edit the text in the right window at this stage, or you can save the text to a file and edit it later in a word processor.
Once you have begun scanning, it's wise to save your work every ten pages or so.
- Select "Save Text As" from the File menu. (Alternatively, click the fifth, rightmost process button, labeled "4. Save.")
- Be sure that you save only in C:\data.
- Select a file type. You can save the file in a wide range of word-processing formats (WordPerfect, MS Word, etc.) as well as in ASCII. FineReader often "over-formats" a text; that is, it tries to reproduce the exact look of the printed page and, as a result, fills the output document with word-processing codes. Rich Text Format (RTF) is usually a good choice for saving scanned text: it preserves bold, italics, and some fonts, yet can be opened by virtually any word processor.
- Click "OK"
Training FineReader to Recognize Special Characters
A useful FineReader feature is the ability to train the program to recognize special characters that it would otherwise miss, such as ligatures or the Middle English thorn (þ) and yogh (ȝ). Training can reduce the error rate even if your text contains no special characters, since some standard characters resemble each other, such as the capital letter O and the numeral zero.
- To start the training process, choose "Options" from the Tools menu. In the dialog box that appears, click the "Recognition" tab and enable training by selecting "Train user pattern." Click "OK" to return to the program.
- Once training is enabled, FineReader will prompt you to correct what it perceives as mistakes in the OCR process. It continues to do so until you decide it has compiled a large enough database of common problems.
- When you feel FineReader is sufficiently trained, close the training window and, when prompted, let FineReader apply the corresponding changes to the text.
- In all subsequent scanning, FineReader recognizes text using this training file until you either select a new training file or turn the option off.
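Training reduces look-alike errors such as O/zero, but some can still survive into a saved file. As a rough supplementary check, the Python sketch below scans text saved in ASCII format and lists tokens that mix letters with the digits 0 or 1, a common sign of O/zero or l/one confusion. The helper `flag_confusables` and its regular expression are hypothetical illustrations, not a FineReader feature, and they will produce some false positives (for example, a legitimate label such as "B1").

```python
import re

# Hypothetical post-OCR check (not part of FineReader): a token is suspect if it
# contains at least one ASCII letter AND at least one of the look-alike digits
# 0 or 1, e.g. "C0lumn" (zero for O) or "He1lo" (one for l).
SUSPECT = re.compile(r"\b(?=\w*[A-Za-z])(?=\w*[01])\w+\b")


def flag_confusables(text):
    """Return (line_number, token) pairs for tokens that mix letters with 0 or 1."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in SUSPECT.finditer(line):
            hits.append((lineno, match.group()))
    return hits


if __name__ == "__main__":
    sample = "The C0lumn was 10 feet high.\nHe1lo from page 1999."
    for lineno, token in flag_confusables(sample):
        print(f"line {lineno}: {token}")
```

Pure numbers such as "10" or "1999" are not flagged, because a token must contain a letter to match; the flagged lines can then be checked against the page image.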
A More Detailed Explanation of FineReader's Settings
For maximum control over the OCR process, you may adjust the process settings manually.
From the Tools drop-down menu, choose "Options." A dialog box will appear containing seven panels:
- General - This panel includes basic options involving the display and function of FineReader. The default settings should be sufficient in most cases.
- View - This panel has options affecting the appearance of the OCR display, including a way to change highlighting colors.
- Scan/Open Image - These options control the interaction of FineReader with the scanner and should be left at their defaults.
- Formatting - One of the more useful panels, the options here allow the user to determine how faithfully the OCR process retains the original document's formatting.
- Check Spelling - This panel contains options for enabling automatic proofreading.
- Recognition - This panel contains several options controlling features of the OCR process:
  - Document type lets the user specify the layout of the scanned text in advance.
  - Print type lets FineReader compensate for source documents of differing print quality.
  - Training lets the user teach FineReader to read ligatures and other special characters (see the more detailed description of this process above).