Text Collation Software: A variety of methods
WordPerfect has a feature that compares two versions of a single document and displays the differences between them. Each differing phrase is listed twice: once in redline, indicating the phrase as it appears in the first document; and once in strikeout, indicating the phrase as it appears in the second document.
To use the "Compare" feature in WordPerfect 8, open the primary document, then:
- Select "File", then "Document", then "Add Compare Markings".
- Fill in the complete pathname for the second document, and select the "Word" compare option.
- WordPerfect will mark up the on-screen document with the differences.
Hint: Go to the Tools menu and make sure that the "Spell-As-You-Go" feature is turned off.
The diff utility in UNIX (the operating system run by most of the UVA servers) will compare two input files and create an output file containing the lines differing between the two. At the UNIX prompt, the diff command is:
diff [-b] [-i] [-w] input1 input2 > output
where "input1" is the first file, "input2" the second file, and "output" the name you wish diff to give the file containing the differences. In addition, the following optional switches can be included in the command:
-b Ignores trailing spaces and tab characters and considers other strings of spaces and tabs to compare as equal.
-i Ignores the case of letters.
-w Ignores all spaces and tab characters.
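As a minimal sketch of how these switches change the result, the session below builds three small sample files and compares them. The file names (old.txt, new.txt, joined.txt) and their contents are invented for illustration, and mktemp is assumed to be available on your system. diff exits with status 0 when it finds no differences and status 1 when the files differ.

```shell
# Work in a scratch directory so the sample files do not clobber anything.
# File names and contents are made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'The quick brown fox\njumps over the lazy dog\n' > old.txt
printf 'the quick   brown fox\njumps over the lazy dog\n' > new.txt
printf 'Thequickbrownfox\njumpsoverthelazydog\n' > joined.txt

# Plain diff: line 1 differs in case and spacing, so it is reported.
plain_status=0
diff old.txt new.txt > plain.out || plain_status=$?

# -i ignores case and -b treats runs of spaces as equal: no differences remain.
relaxed_status=0
diff -b -i old.txt new.txt > relaxed.out || relaxed_status=$?

# -b still sees a difference when spaces are missing entirely...
b_status=0
diff -b old.txt joined.txt > b.out || b_status=$?

# ...while -w ignores all spaces and tabs, so the files compare as equal.
w_status=0
diff -w old.txt joined.txt > w.out || w_status=$?

echo "plain=$plain_status relaxed=$relaxed_status b=$b_status w=$w_status"
cat plain.out
```

The file plain.out holds the one changed line from each file, while relaxed.out and w.out are empty because the switches hide the differences.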
UNIX features a number of related utilities addressing special problems or requirements:
-- bdiff works like diff but is designed for very large files.
-- diff3 works like diff but is designed to compare three files.
-- comm compares two sorted files and generates output that contains, depending on the options chosen, either: all lines common to the files; or all lines unique to one or the other file.
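The comm utility can be sketched as follows. The file names (list1.txt, list2.txt) and their contents are invented for illustration, and mktemp is assumed to be available. comm expects both inputs to be sorted; its numeric switches suppress output columns, where column 1 holds lines unique to the first file, column 2 lines unique to the second, and column 3 lines common to both.

```shell
# Work in a scratch directory; file names and contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# comm expects each input to be sorted, one item per line.
printf 'apple\nbanana\ncherry\n' > list1.txt
printf 'banana\ncherry\ndate\n' > list2.txt

# -12 suppresses columns 1 and 2, leaving only lines common to both files.
common=$(comm -12 list1.txt list2.txt)

# -23 leaves only lines unique to the first file.
only_first=$(comm -23 list1.txt list2.txt)

# -13 leaves only lines unique to the second file.
only_second=$(comm -13 list1.txt list2.txt)

echo "common to both:"
echo "$common"
echo "only in list1.txt: $only_first"
echo "only in list2.txt: $only_second"
```

If the input files are not already sorted, run each through sort first and compare the sorted copies.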
Collate will take multiple documents and produce a document detailing the textual variants, with a number of different possible settings and apparatuses. Its complex set of variables means that Collate takes time to master, but certain settings must be included for even the simplest files to run. The software needs a set of rules to obey, so a "Prepare" file containing collating parameters must be created. This file must include:
- MS List: a list of the input files to be collated
- Block Markers: a list of tags that tell the software about the texts' basic structural features.
- In addition, the input files must have at least one tag at the top of the file telling the software where the text begins.