The Voynich Manuscript: Non-random word sequences as a byproduct of hoaxing

By Gordon Rugg

This article shows how non-random sequences of words are likely to be produced as an unintended side-effect of the table and grille method for producing hoaxed text.

This mechanism can be expected to produce non-random correlations at the level of:

  • Sequences of consecutive words
  • Sequences of words within a line
  • Sequences of words within a page
  • Sequences of words within a multi-page section of the manuscript
  • Sequences of words between different multi-page sections of the manuscript

These effects would not need to be planned by the hypothetical hoaxer(s). They would arise as a side-effect of the table and grille mechanism, and would probably not have been noticed when the manuscript was produced.

The basic table and grille method

The table and grille method works by creating a large table of meaningless syllables, grouped into sets of three columns: prefix, root and suffix syllables. The hoaxer then uses a card with holes cut into it to reveal a set of three syllables, which form a “word” for the hoax text. Once this word has been written down on the page, the hoaxer moves the card across to a different set of three columns. It’s important to note that the card has to be moved in a non-systematic way, so as to disrupt any patterns in sequences of syllables which would betray that the text had been produced as a hoax.
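The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not a reconstruction of any historical table: the names `read_word`, `generate_words`, `demo_table`, `straight_card` and `stepped_card` are hypothetical, the tiny table is made up for the demonstration, a pseudo-random generator stands in for the hoaxer's non-systematic card movement, and letting the card wrap around at the bottom of the table is a simplification.

```python
import random

def read_word(table, row, group, offsets):
    """Read one word through a card with three holes.

    table[row][group] is a (prefix, root, suffix) triple. offsets says how
    many rows below the card's top edge each hole sits (an assumed card
    design). Wrapping past the last row is a simplification.
    """
    n_rows = len(table)
    return "".join(
        table[(row + off) % n_rows][group][part]
        for part, off in enumerate(offsets)
    )

def generate_words(table, offsets, n_words, rng):
    """Move the card to arbitrary positions and write down each revealed word."""
    words = []
    for _ in range(n_words):
        row = rng.randrange(len(table))
        group = rng.randrange(len(table[0]))
        words.append(read_word(table, row, group, offsets))
    return words

# Made-up demonstration table: 3 rows, 2 groups of (prefix, root, suffix).
# An empty string stands for an empty cell.
demo_table = [
    [("qo", "ch", "dy"), ("qo", "sh", "y")],
    [("", "ke", "dy"), ("", "sh", "dy")],
    [("qo", "ch", "y"), ("qo", "ke", "dy")],
]
straight_card = (0, 0, 0)  # all three holes on the same row
stepped_card = (0, 1, 2)   # each hole one row lower than the previous one

print(read_word(demo_table, 0, 0, straight_card))  # qochdy
print(read_word(demo_table, 0, 0, stepped_card))   # qokey
print(generate_words(demo_table, stepped_card, 5, random.Random(0)))
```

The two cards produce different words from the same card position, which is the sense in which one table can yield several different bodies of text.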

The illustration below shows a small section of table, with a card over it, revealing a word.


The advantage of this method is that the hoaxer can use different cards, each with a different pattern of holes, to produce different combinations of syllables from a single table. However, there’s a limit to the number of different card patterns that can be used on the same table. When this limit is reached, the hoaxer has to produce a new table, with the syllables in a different order.

The way in which the table is filled in has a significant effect on the properties of the text that is produced, as described in the article by Laura Aylward and myself summarised on this site.

If the process is highly structured (e.g. a “4o” syllable in every fourth prefix slot) then the words produced will be shorter on average, though with quite a few longer words, and a comparatively high proportion of “blank” words where the card shows only empty cells on the table. If the process is very loosely structured, then the converse effect occurs.

One way of keeping track, while keeping a fairly even distribution of a syllable across the entire batch of relevant columns, is by using basic arithmetic combined with a fair degree of structure. Suppose, for instance, that you want to put 50 instances of a particular prefix syllable into a table that contains 480 prefix cells. An easy way to do that would be to put that syllable into every tenth prefix cell (giving a total of 48) and then to add two more instances of that syllable in other, arbitrarily chosen prefix cells, to bring the total up to the desired 50. This means that there will be non-random regularities in the distribution of that syllable, which are likely to interact with non-random regularities in the distribution of other syllables, as described below.
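The arithmetic in this example can be checked with a short Python sketch. The function name `fill_every_nth`, the syllable "qo" and the two extra cell positions are assumptions chosen purely for illustration.

```python
from collections import Counter

def fill_every_nth(n_cells, syllable, step, extra_positions):
    """Place a syllable in every step-th cell, then in a few extra cells."""
    column = [None] * n_cells
    for i in range(step - 1, n_cells, step):  # cells 10, 20, 30, ... (1-based)
        column[i] = syllable
    for pos in extra_positions:               # arbitrarily chosen extras
        if column[pos] is not None:
            raise ValueError(f"cell {pos} is already filled")
        column[pos] = syllable
    return column

# 480 prefix cells, every tenth cell filled, plus two hypothetical extras.
column = fill_every_nth(480, "qo", 10, extra_positions=[3, 247])
positions = [i for i, cell in enumerate(column) if cell == "qo"]
gaps = Counter(b - a for a, b in zip(positions, positions[1:]))

print(len(positions))  # 50
print(gaps[10])        # 46 of the 49 gaps are exactly ten cells
```

Nearly every gap between successive instances is exactly ten cells, which is the kind of regularity that can line up with regularities in neighbouring columns.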

It’s very easy to make mistakes when filling in a table that you are trying to populate with a list of pre-chosen syllables in chosen frequencies. For instance, it’s easy to lose track of how many times you’ve written a particular syllable, or to forget to include a particular syllable at all. So, the new table is likely to have different syllable frequencies from the previous table, as well as having a different set of syllable distributions within the columns.

We would therefore expect that a manuscript produced using this method would show noticeable differences in syllable frequencies between the sections produced using different tables. That’s just what does occur, as described in my article about textual structures in the Voynich Manuscript, on this site. The illustration below shows the frequencies of some common syllables in the Voynich Manuscript, which change abruptly at different places in the manuscript. Many of these changes are greater than the changes between English and German in terms of syllables occurring in both languages.

[Illustration: frequencies of common syllables in the Voynich Manuscript]

Word correlations

The process of filling in the hypothetical table is unlikely to have been truly random, since the concept of randomness was unknown when the Voynich Manuscript was created, and true randomness is hard to achieve even today. We would therefore expect non-random clusterings of some syllables. These clusterings would make some words much more likely to occur than would be expected by chance. The illustration below shows the section of a hand-generated table that I will use as a demonstration of concept.


The blue rectangle on the top left is a card, with three holes cut in it. The red, yellow and green sections in the top right show groups of three columns, where the first column in each section contains prefixes (column headed “P”). The second column in each section contains roots, and the third column in each section contains suffixes.

The illustration below shows a closeup of these three sections, with EVA transliterations of the Voynichese characters into Roman characters.


The next illustration shows the relative frequencies of syllables within a single column. The lighter the shade, the more common the syllable. For instance, in the leftmost column, the two most common syllables are [qo] and an empty cell []. In the second column from the left, the syllables [ke] and [tch] occur once each, and the syllable [ch] occurs twice. In the third column, the syllable [y] occurs once, and the syllable [dy] occurs three times.
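The counts in this paragraph can be tabulated with a small Python sketch. The root and suffix counts are taken from the text; the prefix counts (two "qo" and two empty cells) and the order of cells within each column are assumptions, since the text only says that those two prefixes are the most common. The names `columns` and `relative_frequencies` are hypothetical.

```python
from collections import Counter

# One group of three columns; "" stands for an empty cell.
columns = {
    "prefix": ["qo", "", "qo", ""],      # counts assumed, not stated in the text
    "root": ["ch", "ke", "ch", "tch"],   # ch twice, ke and tch once each
    "suffix": ["dy", "dy", "y", "dy"],   # dy three times, y once
}

def relative_frequencies(cells):
    """Return each syllable's share of the cells in one column."""
    counts = Counter(cells)
    return {syllable: n / len(cells) for syllable, n in counts.items()}

for name, cells in columns.items():
    print(name, relative_frequencies(cells))
```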


Consecutive word correlations

These column frequencies have implications for which words are likely to be generated from this section of table, as described below.

The first set of three columns is disproportionately likely to produce the word “qochdy” and the word “chdy”. Note how the regular spacing of the “qo” and the blank cell in the first column correspond with the regular spacing of the “ch” in the second column, as a by-product of this table having been completed using a regular spacing as opposed to a random spacing.

The second set of columns is likely to produce a word beginning with “qosh” and equally likely to end with any of the four suffixes.

The third set of columns is likely to produce a word beginning with an empty cell and ending with “dy”. Any of the four root syllables is equally likely to occur in the middle of the word produced by this set.

This section of the table would therefore produce a weak correlation between successive words; for instance, the sequence “qochdy qocheealy kedy” will probably occur more often than the sequence “qokey keealy rokeor” across different pages, within the section of each page produced using this part of the table.

Within-page effects

At a higher level, there are likely to be several “hot spots” within a table where particular words are disproportionately likely to occur, as a result of non-random clustering of particular syllables, due to regularities in the structure of the table.

The illustration below shows some “hot spots” in the entire table.


The first of these, shown by the white rectangle, is disproportionately likely to produce the words “qochdy” and “chdy”. The second is likely to produce the word “okeey”. The third, in the middle of the table, is likely to produce “chedy” and “chey”. The blue one in the bottom left is likely to produce “chedy” and the final green one, in the bottom right, is likely to produce “ky”.

Taken together, these effects mean that the pages produced from this table are all disproportionately likely to contain the following distributed sequence of words at successive points in the page:

  • qochdy or chdy
  • okeey
  • chedy or chey
  • chedy
  • ky

We would expect this correlation to occur across an entire section of the manuscript, where that section has been produced from the same table.
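One way to look for such a distributed sequence is to test whether a page contains the hot-spot words in the expected order, with any number of other words in between. The Python sketch below is a minimal illustration; `contains_in_order`, `hot_spot_stages` and the sample `page` are hypothetical, and the page is invented for the example rather than taken from the manuscript.

```python
def contains_in_order(page_words, stages):
    """True if the page has a word from each stage, in order, with gaps allowed."""
    remaining = iter(page_words)
    # Each stage consumes words from the shared iterator until it finds a
    # match, so later stages can only match words further down the page.
    return all(any(word in stage for word in remaining) for stage in stages)

hot_spot_stages = [
    {"qochdy", "chdy"},
    {"okeey"},
    {"chedy", "chey"},
    {"chedy"},
    {"ky"},
]

# Invented page: hot-spot words interleaved with filler words.
page = ["daiin", "chdy", "shol", "okeey", "chey", "qokey", "chedy", "ol", "ky"]

print(contains_in_order(page, hot_spot_stages))                  # True
print(contains_in_order(list(reversed(page)), hot_spot_stages))  # False
```

Counting how many pages of a section pass this test, compared with pages whose words have been shuffled, would give a rough measure of whether the ordering is above chance.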

The Voynich dialects; a possible case of different tables

One unusual feature of the Voynich Manuscript is that it’s written in at least two different “dialects”, known as Voynich A and Voynich B, which broadly correspond to two different handwritings. This isn’t a clear-cut distinction between A and B; some sections show a mixture of features from both dialects.

This is hard to explain via the “natural language” hypothesis, but is easy to explain as an unintended side-effect of a hoaxer working with an assistant, where each of them has produced their own table to generate text, and they have ended up with different syllable frequencies in the two tables.

Because of the limited number of pages that can be produced from a single table before there’s a risk of repeating previously generated text sequences, one or both of these hypothetical hoaxers would need to generate more tables to complete the manuscript. My guesstimate is that about six to ten tables would be needed. The distribution of syllable frequencies in the syllable frequency illustration earlier in this article is consistent with text produced using about that number of tables, where the tables have varied in how consistently they have maintained the same number of instances of each syllable.

If a hoaxer and an accomplice were using variants of two initial tables, then it’s plausible that the text produced using variants of the same table would contain similar word sequences, as a side-effect of using the same list of syllables and of syllable frequencies to complete the tables (there are some noticeable differences between syllable frequencies in Voynich A and Voynich B). This would produce similarities and differences in word sequences between different sections of the output text, depending on which “dialect” of table had been used for each section.

Summary and conclusion

Because of the non-random nature of the table and grille method, we would expect to find above-chance correlated sequences of words within text produced using this method. Some of these sequences would be at the level of consecutive words; some would be at the level of an entire page. In the case of sequences spanning an entire page, we would also expect to see those sequences occurring across different pages produced using the same table.

It is highly unlikely that whoever created the manuscript would have noticed, or even understood, any such effects.

We would therefore expect to see non-random word sequences of the type described by Montemurro and Zanette as a likely and unintended side-effect of the table and grille hoaxing method.


7 thoughts on “The Voynich Manuscript: Non-random word sequences as a byproduct of hoaxing”

  3. What I don’t understand is why you think less than half the manuscript would be faked when the greater part of it – the imagery – is patently genuine. I can demonstrate the antiquity of the underlying sources, the presence of details reflecting the intermediate environment, and then the routes by which it evidently came into the Mediterranean again in the century from mid-twelfth to mid-thirteenth. Moreover, Panofsky’s first opinion seems amply justified, and I’ve yet to find a single detail suggesting the imagery ‘faked’ in any way. On the contrary, it has been copied so very carefully that the original (oldest) strata remain clear.

    So from my point of view, you’ve done the equivalent of demonstrating how monkeys might type Shakespeare, while ignoring the whole Shakespearean corpus.

    • I’ll be dealing with the issues of the images and their relationship to the text in a future post. It’s a complicated issue, that can’t be fully explored in a single comment.


