
Collate 2

Collate 2 is a text collation program for the Macintosh that compares multiple witnesses of a work and produces a variety of reports detailing textual differences. The description below outlines how Collate works and, more importantly, on what sorts of texts it will produce usable results.

[The first two sections of the following are taken from Collate 2: A User Guide, by Peter Robinson (Oxford: 1994)]

1. How Collate Works

Firstly, Collate expects that the scholar, not the program, will decide what broad sequences of text correspond with one another and should be collated against one another. This is a significant difference from some other collation programs (notably CASE) which are able to look through the text for paragraphs (for example) beginning or ending with the same words and which should be collated against one another. Rather, Collate expects that corresponding blocks of text in the various files should be identically marked ('<CH 3>' or '<L 15>' or '<ACT I><SC ii><L 1 >' etc.) The scholar, not the program, must define and mark the corresponding blocks in each file. Collate will then find these corresponding blocks in each file and then collate them against one another. Indeed, the Block Maps collation facility in Collate 2 means that the Witness Files can have blocks in any order and Collate 2 will find them and tell you also where it found them: what was before and after this block. Thus: <ST 5> could occur in some files after <ST 4>, in others after <ST 3>, in others it could come first. Collate 2 will find all these and tell you also where <ST 5> stands in each file.
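The block-finding behaviour described above can be sketched in a few lines of Python. This is only an illustration, not Collate's own code: the marker pattern, the function names `block_map` and `locate`, and the treatment of each marker as a separate block (so a compound marker such as '<ACT I><SC ii>' is not handled) are all assumptions made for the example.

```python
import re

# Assumed marker shape: an upper-case tag name, a space, then a value, e.g. <ST 5>.
MARKER = re.compile(r"<([A-Z]+\s+[^<>]+?)\s*>")

def block_map(text):
    """Return [(label, block_text), ...] in the order the blocks occur in one witness."""
    matches = list(MARKER.finditer(text))
    blocks = []
    for i, m in enumerate(matches):
        # A block runs from the end of its marker to the start of the next marker.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        blocks.append((m.group(1), text[m.end():end].strip()))
    return blocks

def locate(label, text):
    """Find one block in a witness and report which blocks stand before and after it."""
    blocks = block_map(text)
    labels = [name for name, _ in blocks]
    if label not in labels:
        return None
    i = labels.index(label)
    return {
        "text": blocks[i][1],
        "before": labels[i - 1] if i > 0 else None,
        "after": labels[i + 1] if i + 1 < len(labels) else None,
    }

witness_a = "<ST 3> alpha <ST 4> beta <ST 5> gamma"
witness_b = "<ST 5> gamma two <ST 3> alpha"
print(locate("ST 5", witness_a))
print(locate("ST 5", witness_b))
```

Because each witness is mapped independently, the order of blocks in one file places no constraint on the order in another.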

Secondly, Collate works on the principle that all the various witnesses (up to a hundred at once) should be collated against a single 'base text'. Every other collation program I know also accepts this concept. Many scholars have vigorously advocated 'base-free' collation, on the principle that presenting any one text as the 'base' or 'master' prejudices choices about the 'best text', etc. For simply practical reasons, however, it is very difficult to implement collation of many witnesses without choosing one of these as a base. The base may provide no more than a sequence of pegs from which all the variants hang. The ease in Collate 2 with which one can switch from one base text to another, by a single point-and-click operation in a dialogue box even during a collation, itself undermines the nominal authority of any one base-text. In Collate 2, there are as many possible base texts as there are witnesses to collate. If this is not enough, the Lineated Apparatus style comes close to doing away altogether with a base-text: the variants are here seen as stacked one above another in the various files, with little or no privilege for the base text.

Thirdly, Collate believes that the best collation will break the text into the shortest possible segments of variants. Imagine you are collating 'The very good wife of Oxford' against 'The verry GOOD man of Ox ford'. It is possible to see the whole phrase 'The verry GOOD man of Ox ford' as a variant on the whole phrase 'The very good wife of Oxford', and you could indeed instruct Collate to do this. But it is likely that a scholar will see each word in the phrases as a separate variant, thus:

           The] The
           very] verry
           good] GOOD
           wife] man
           of] of
           Oxford] Ox ford

Collate is, indeed, capable of exactly this collation. Once Collate has located the block of text to be collated in the various files (as here) it proceeds by comparing the two or more texts a word at a time. Firstly, it checks to see whether the words are identical. Here, it finds that the first word of each text, 'The', is identical and so it will output these as follows:

           The] The

Collate now moves to the next word in each text, 'very' and 'verry'. These two words are not identical. However, Collate has been programmed to test how different words are. If, as in this case, they differ only by one or two letters then it will find immediately that these words are variants of one another. This is what is called a 'Fuzzy Match Test', and depends on a mathematical calculation of just how alike any two words are. As 'very' and 'verry' differ in only one letter the program will see them as variants of one another:

           very] verry
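The guide does not say which calculation the Fuzzy Match Test uses. As a stand-in, the sketch below uses Levenshtein edit distance (the number of single-letter insertions, deletions and substitutions needed to turn one word into the other) with an assumed threshold of two letters. The function names and the threshold are illustrative, not Collate's.

```python
def edit_distance(a, b):
    """Levenshtein distance between two words, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete a letter from a
                cur[j - 1] + 1,            # insert a letter into a
                prev[j - 1] + (ca != cb),  # substitute, or keep if equal
            ))
        prev = cur
    return prev[-1]

def fuzzy_match(a, b, max_distance=2):
    """Treat two words as variants if they differ by at most max_distance letters."""
    return edit_distance(a, b) <= max_distance

print(edit_distance("very", "verry"))
print(fuzzy_match("very", "verry"))
print(fuzzy_match("wife", "man"))
```

A fixed threshold is crude for very short words (any two two-letter words fall within it), so a real test would presumably scale the allowance to word length.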

It now proceeds to the next word in each text, 'good' and 'GOOD'. In this case, it will see that they are not identical. However, you can load a Fuzzy Match File into Collate which tells Collate that words which differ only by certain prescribed orthographic equivalencies (as in upper and lower case forms of the one letter, thus 'g' for 'G' etc.) are actually variants of one another. Here, we have loaded a Fuzzy Match File which explains that the forms 'g'/'G', 'o'/'O', and 'd'/'D' are orthographic equivalents and words which differ by just these letters are actually variants of one another. Now, the words 'good' and 'GOOD' differ only in this way. Collate will therefore see that these two words are variants of one another, thus:

           good] GOOD
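One way to picture the effect of a Fuzzy Match File is as a table that folds each listed form onto a single canonical letter before the words are compared. The format of the actual file is not described here, so the list of pairs and the names `EQUIVALENTS`, `build_table` and `equivalent` below are hypothetical.

```python
# Hypothetical stand-in for a loaded Fuzzy Match File: (canonical, variant) letter pairs.
EQUIVALENTS = [("g", "G"), ("o", "O"), ("d", "D")]

def build_table(pairs):
    """Build a translation table mapping each variant letter to its canonical form."""
    return str.maketrans({variant: canonical for canonical, variant in pairs})

def equivalent(a, b, pairs=EQUIVALENTS):
    """True if the words are identical once the listed equivalences are folded together."""
    table = build_table(pairs)
    return a.translate(table) == b.translate(table)

print(equivalent("good", "GOOD"))
print(equivalent("wife", "WIFE"))  # 'w'/'W' etc. are not in the list
```

Only the listed equivalences count: 'wife' and 'WIFE' are not matched by this table, because no pair for those letters has been supplied.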

Collate now moves to the next pair of words, 'wife' and 'man'. To identify the previous words in each text as variants Collate had only to look at the words themselves. But to the machine, 'wife' and 'man' have nothing in common. In this case Collate discovers that 'wife' and 'man' are variants by looking at the next word in each text. Here, it finds that the next word is 'of' in both texts. It then deduces that if the next word in each is identical (or a variant), then this word must be a variant. Collate uses this 'contextual' searching to determine cases of addition or omission, and replacements of phrases by phrases. Thus:

           wife] man
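The contextual step can be sketched as a one-word lookahead. The function below is a simplified guess at the logic, with a hypothetical name and return format: it reports a replacement when the following words agree, and an omission or addition when one text's current word lines up with the other's next word.

```python
def contextual_step(base, wit, i, j, same=lambda a, b: a == b):
    """Classify the unmatched words base[i] and wit[j] by peeking one word ahead.

    `same` decides whether two words count as identical or as variants.
    """
    next_b = base[i + 1] if i + 1 < len(base) else None
    next_w = wit[j + 1] if j + 1 < len(wit) else None
    if next_b is not None and next_w is not None and same(next_b, next_w):
        return ("replacement", base[i], wit[j])   # following words agree
    if next_b is not None and same(next_b, wit[j]):
        return ("omission", base[i], "")          # witness lacks base[i]
    if next_w is not None and same(base[i], next_w):
        return ("addition", "", wit[j])           # witness has an extra word
    return ("unresolved", base[i], wit[j])

base = "The very good wife of Oxford".split()
wit = "The very good man of Oxford".split()
print(contextual_step(base, wit, 3, 3))
```

Replacements of phrases by phrases would need a longer lookahead than the single word used here.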

Finally, after dealing with 'of', Collate will calculate that 'Ox ford' and 'Oxford' are variants of one another by a process of concatenation. It sees that 'Ox' is much shorter than 'Oxford', and so it adds the next word 'ford' on to 'Ox' to get 'Oxford', and discovers that 'Ox ford' is a variant of 'Oxford':

           Oxford] Ox ford
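The concatenation step might look something like the following. The function name, the limit of three joined words, and the requirement of an exact match after joining are assumptions made for the sketch.

```python
def concatenation_match(word, other, start, max_join=3):
    """Try joining consecutive words of `other`, from index `start`, to spell `word`.

    Returns the number of words joined (2 or more) on success, otherwise 0.
    """
    joined = ""
    for n in range(1, max_join + 1):
        if start + n > len(other):
            break                      # ran out of words to join
        joined += other[start + n - 1]
        if n > 1 and joined == word:
            return n
        if len(joined) >= len(word):
            break                      # already as long as the target; give up
    return 0

witness = "The verry GOOD man of Ox ford".split()
print(concatenation_match("Oxford", witness, 5))
```

A return value of 2 here means that two witness words ('Ox' and 'ford') together correspond to the single base word.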

This sketch shows that the identification of just what is a variant on what in Collate is actually a rather complex process. It can be quite difficult at times to determine exactly what routine in Collate has found what variant. This matters less than it might appear because usually Collate is very good at finding the right variant. However, it is (after all) a mindless computer program quite capable of finding the wrong variant, or presenting it the wrong way. This is why the program includes so many tools to allow you to control the way it works, how it presents its results, and even a tool ('SetVariants') which allows you to overrule a particular collation and impose your own choice of what should collate with what and how in the program.

2. Is Collate For You?

Before you set out on the considerable labour of preparing all your texts for processing by Collate, you should make sure that your texts are actually close enough or long enough for collation to be possible and useful.

The description of Collate in the previous sub-section showed that the program is designed for word-by-word collation of many versions of the one text. It is well suited for the situation often obtaining in medieval manuscript traditions, where many manuscripts descend by copying from a single original. It can cope with authorial or scribal revision, even differing recensions, as long as the variations are relatively local. In general: if it would make sense to collate a text by hand, Collate can do it.

Collate will not produce useful results if the versions vary so much that word-by-word collation is not reasonable. Radical authorial revisions, distinct translations of a foreign original, separate oral versions of a single tale, are among cases where Collate is unlikely to help. In general: if it would not make sense to collate a text by hand, Collate cannot do it.

For example: Collate will cope readily with collation of the Ellesmere and Hengwrt manuscripts of the Canterbury Tales; collation of the three versions of Piers Plowman, or the two versions of King Lear, would present difficulties, but might be worthwhile at the points where the versions touch; collation of Pope's Iliad with Chapman's would be very dubious; collation of Dryden's All for Love with Shakespeare's Antony and Cleopatra would be senseless.

No assumptions about methods of textual criticism are built into Collate. It is a tool to find where witnesses agree or disagree with one another and with any given base text. How this tool is used is your choice.

The introduction of the Block Maps collation facility in Collate 2 greatly extends the variety of texts which Collate can process. Previous versions of Collate would start at the beginning of all the texts and work their way through them all, a block at a time, and then a word at a time within each block. For collation to be successful, all the blocks in all the computer files had to be in the same order. With Block Map collation, it does not matter what order they are in, and different files can have quite different selections of blocks in quite different orders. By using different base texts and the commands on the Set Bounds sub-menu, you can have Collate 2 single out from a text exactly which blocks you wish to collate. Thus, with Collate 2 you can identify in various longer texts exactly which parts are close enough to collate and have the program collate those parts and only those parts, wherever they occur.

Finally, Collate 2 imposes very few restrictions on the kinds of text it can collate. It can cope with prose or poetry, and with text divided into very short or very long units. The one theoretical restriction is that any one collateable block should contain no more than 32,768 words.

3. An Extremely Brief and Informal Walk-Through

Open the Collate 2 program. Choose the "Prepare" option and then create or load a prepare file by choosing either New Prepare File... or Load Prepare File... from the menu as appropriate. This file lets Collate know what it should process. Once you have a prepare file, specify the witnesses that need to be collated using the Make Witness List... function under the "Witness Files" subheading in the "Prepare" drop-down menu. Follow the onscreen prompts as they appear to complete this process.

Choose the Collate command in the "Collate" drop-down menu (it is the last option in the list). It is quite likely that the program will ask if it may discard some text. Select OK to continue the process (this will not compromise the integrity of the results) and your collation will be ready in moments.

There are several more advanced features governing the "fuzzy match" system, display options, and block mapping function. Please consult the help file for more information on these features.