By Gordon Rugg and Gavin Taylor

We’ve recently had an article published in Cryptologia about our work on the Voynich Manuscript, which was discussed in New Scientist. The Cryptologia article is behind a paywall, so in this article we’ve summarised the key points, for anyone who wants some more detail.

The background

Our involvement with the Voynich Manuscript started when Gordon needed a test of concept for the Verifier method that he had developed with Jo Hyde, for detecting errors in previous research into hard, unsolved problems.

The Voynich Manuscript is a book in a unique script, with odd illustrations, which had previously been believed to be an undeciphered text, either in an unidentified language or in an uncracked code. There were serious problems with both those explanations for the manuscript. If it was an unidentified language, then it was an extremely strange one. If it was an uncracked code, then it was either astonishingly sophisticated, or was based on a very unusual set of principles. The third main possibility, namely that the manuscript contained only meaningless gibberish, had been generally discounted, because there are numerous odd statistical regularities in the text of the manuscript, which everyone believed were much too complex to have been hoaxed.

Gordon’s work showed that this belief was mistaken, and that the most distinctive qualitative features of the Voynich Manuscript could be replicated using low-tech hoaxing methods. This resulted in an article in Cryptologia in 2004.

Gordon’s initial work, however, did not address the quantitative statistical regularities of the text in the manuscript.

Our recent article in Cryptologia addresses this issue, and shows how the most distinctive quantitative features of the Voynich Manuscript (VMS) can be replicated using the same low-tech hoaxing methods as Gordon’s previous work. These features arise as unintended consequences of the technology being used, which produces statistical regularities as unplanned but inevitable side-effects.

Taken together, these two articles show that the key unusual features of the Voynich Manuscript can be explained as the products of a low-tech mechanism for producing meaningless gibberish.


The Voynich Manuscript: Background and overview

The Voynich Manuscript is usually described as having been found by Wilfrid Voynich in 1912. It’s a hand-written book of about 240 pages (some pages appear to have been cut out, and others are fold-outs, so different definitions will result in different numbers of pages). The document is illustrated on most pages; the illustrations are usually described as a mixture of the prosaic (e.g. a water lily) and the bizarre. Radiocarbon dating gives a date of around 1420 for the vellum on which the manuscript was written.

All of the key points above are open to varying levels of debate.

The circumstances in which Voynich obtained the manuscript are debatable; he probably found it in a seminary in Italy, but there have been suspicions that this was just a cover story that he used to conceal his actual source, for commercial or other reasons.

The missing pages may well have been cut out because they featured striking artwork, but Gordon has argued in another article that they may have been removed as part of a hoax.

Many of the illustrations appear bizarre to most modern readers, but if you compare them with the illustrations in mediaeval manuscripts, particularly the unofficial illustrations in the margins of those manuscripts, then the Voynich Manuscript illustrations are actually not remarkably odd. The banner image above shows two examples of marginalia from other mediaeval manuscripts. Here are some more from the Liber Floridus, of about 1090 to 1110, juxtaposed with similar images from the Voynich Manuscript.


We’re not claiming that the Voynich Manuscript was a hoax based on the Liber Floridus, but we are pointing out that the Voynich Manuscript illustrations are far less unusual than is often claimed.

We don’t have any arguments with the carbon dating for the vellum, but for anyone trying to work out when the manuscript was produced, it’s important to understand that carbon dating gives a probable date with a range of uncertainty. It doesn’t give an absolute earliest possible date and latest possible date (e.g. “between 1400 and 1440”). Instead, the further you move from the most likely date, the lower the chances that the item actually comes from that date; for instance, if the most likely date from carbon dating is 1420, then you can calculate how likely it is that the artefact actually dates from 1440, or 1460, or 1500, or whenever.
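As a rough illustration of that kind of calculation, the sketch below treats the date as a normal distribution centred on 1420. The standard deviation of 20 years is a placeholder chosen purely for illustration, not a figure from the radiocarbon report, and real radiocarbon calibration curves are not simple normal distributions. The function names are hypothetical.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_later_than(year, mean=1420.0, sd=20.0):
    """Probability that the true date is later than `year`.

    `mean` is the most likely date quoted in the text.
    `sd` is an ASSUMED, illustrative standard deviation.
    """
    return 1.0 - normal_cdf(year, mean, sd)

if __name__ == "__main__":
    for year in (1440, 1460, 1500):
        print(year, f"{prob_later_than(year):.6f}")
```

Under these assumptions the probability falls steadily as the candidate year moves away from 1420, but it never reaches exactly zero, which is the point made above about there being no hard cut-off date.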

There’s also the issue of when the manuscript was actually created, which may have been long after the date when the vellum was produced, as Gordon has argued here.

That’s the brief background of the manuscript itself. There are numerous odd statistical features in the text of the manuscript, some of which Gordon addressed in his 2004 article. In that article, he also addressed the feasibility of producing the manuscript as a hoax. There’s more about that issue in this series of blog articles.

His conclusion was that a hoax was completely feasible, both logistically and financially. However, these articles did not address all of the statistical features of the text in the manuscript. Our recent article addresses those remaining features. The key points we made in the article are as follows.

The quantitative features addressed in our Cryptologia article

Gordon has blogged previously about some of the quantitative features of the Voynich Manuscript in the articles below.

Word structure, and statistical properties of words

This part of our Cryptologia article pulls together various evidence about how the word structure and statistical properties of text in the Voynich Manuscript could be hoaxed.

The word structure of [optional prefix] [optional root] [optional suffix] is similar to Latin and other Indo-European languages, apart from the odd feature of the root not always being present in Voynichese text.

In our article, we show how creating syllables semi-systematically by increasing their length will lead to a binomial distribution of word lengths, which occurs in Voynichese. For instance, the following syllables all quite commonly occur in Voynichese:

Prefix: o, ol, olo

Root: k, ke, kee

Suffix: y, dy, ldy

If you combine these syllables in all the possible permutations, the resulting distribution is the same, on a small scale, as the distribution of word lengths in Voynichese.
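The combination exercise can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis from the Cryptologia article: it uses only the nine syllables listed above, and it assumes that each slot can also be left empty (modelled as an empty string), since the prefix, root and suffix are all described as optional. The function name is hypothetical.

```python
from collections import Counter
from itertools import product

# Syllables quoted in the text; "" is an ASSUMPTION modelling an omitted slot.
PREFIXES = ["", "o", "ol", "olo"]
ROOTS = ["", "k", "ke", "kee"]
SUFFIXES = ["", "y", "dy", "ldy"]

def word_length_counts(prefixes, roots, suffixes):
    """Count how many prefix+root+suffix combinations have each length."""
    counts = Counter()
    for p, r, s in product(prefixes, roots, suffixes):
        word = p + r + s
        if word:  # skip the one combination where every slot is empty
            counts[len(word)] += 1
    return dict(sorted(counts.items()))

if __name__ == "__main__":
    # Print a crude text histogram of combinations per word length.
    for length, n in word_length_counts(PREFIXES, ROOTS, SUFFIXES).items():
        print(f"{length}: {'#' * n}")
```

With these inputs the counts rise from a few very short words to a peak at four and five letters and then tail off again towards nine letters, giving a single-humped shape. The counts are of combinations, not distinct words, because some combinations spell the same word (for example "ol" + "dy" and "o" + "ldy").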

Line structure, and statistical properties of lines

This part of our article shows how different ways of populating a table with syllables will affect various statistical features of the text produced using the table. A key point to note is that the table and grille method does not produce truly random output. This is a key feature of the method, and addresses the long-recognised point that the text in the Voynich Manuscript is not a random assemblage.
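To make the idea concrete, here is a minimal sketch of a table and grille in Python. It is not the procedure used in the article: the table size, the rate at which slots are left blank, the three-hole grille geometry and all function names are assumptions made for illustration, and the syllables are just the examples quoted earlier.

```python
import random

# Example syllables from the text; the layout below is hypothetical.
PREFIXES = ["o", "ol", "olo"]
ROOTS = ["k", "ke", "kee"]
SUFFIXES = ["y", "dy", "ldy"]

def build_table(n_rows, n_cols, blank_rate, rng):
    """Fill a table whose cells each hold a (prefix, root, suffix) triple.

    Each slot is left blank with probability `blank_rate` to model
    optional syllables. Random filling is an ASSUMPTION; a hoaxer could
    equally fill the table by hand in some semi-systematic way.
    """
    def pick(options):
        return "" if rng.random() < blank_rate else rng.choice(options)
    return [[(pick(PREFIXES), pick(ROOTS), pick(SUFFIXES))
             for _ in range(n_cols)] for _ in range(n_rows)]

def read_with_grille(table, offsets=(0, 1, 2)):
    """Slide a three-hole grille over the table and read off words.

    The holes expose the prefix, root and suffix slots of three rows,
    offset from the grille's top row by `offsets`.
    """
    n_rows, n_cols = len(table), len(table[0])
    words = []
    for top in range(n_rows - max(offsets)):
        for col in range(n_cols):
            word = (table[top + offsets[0]][col][0]
                    + table[top + offsets[1]][col][1]
                    + table[top + offsets[2]][col][2])
            if word:  # all three exposed slots may be blank
                words.append(word)
    return words

if __name__ == "__main__":
    table = build_table(6, 8, 0.3, random.Random(1))
    print(" ".join(read_with_grille(table)))
```

In this sketch, once the table has been filled in, the text that comes out is fully determined by the table and the grille, so any regularities in how the table was populated carry through into the output.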

Zipf’s law

A common criticism of Gordon’s previous work was that it did not address the way that text in the Voynich Manuscript follows Zipf’s law (a curve for word frequency which is followed by real languages).

In this part of our article, we were able to show via Gavin’s work that three different sets of meaningless gibberish produced using the table and grille technique all follow Zipf’s law, and are comparable to a range of real natural language texts in their Zipf’s law curves.

This is arguably the most significant part of the article. Zipf’s law had previously been regarded as strong evidence for the text of the Voynich Manuscript containing meaningful content. Our findings demonstrate that this argument is actually weak.
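For readers who want to try this kind of comparison on their own text samples, a generic way of checking for a Zipf-like curve is sketched below. It is not the analysis used in the article, and the function names are hypothetical: it simply ranks words by frequency and fits a straight line to log(frequency) against log(rank), where a slope close to -1 is the classic Zipf pattern.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return word frequencies sorted from most to least common."""
    return sorted(Counter(words).values(), reverse=True)

def zipf_slope(frequencies):
    """Least-squares slope of log(frequency) against log(rank).

    Needs at least two distinct words, otherwise there is no line to fit.
    """
    xs = [math.log(rank) for rank in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

if __name__ == "__main__":
    sample = "the cat saw the dog and the dog saw the cat".split()
    print(rank_frequency(sample), zipf_slope(rank_frequency(sample)))
```

A single slope is only a crude summary, so comparing the whole curve against curves from natural language texts of similar length is more informative than the number alone.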

Distribution of syllables within the VMS

In this section, we demonstrate that the distribution of common syllables within the Voynich Manuscript is very different from the distribution of common syllables in real human languages, but very similar to what would be predicted from the production of meaningless gibberish produced using tables and grilles, with several changes of table for logistical reasons.

One of Gordon’s previous blog articles also discusses this feature.

Closing thoughts

In conclusion, we argue that all the most striking features of the Voynich Manuscript can be produced by using tables and grilles to produce meaningless gibberish.

This set of results does not prove that the Voynich Manuscript is a hoax, but it does show that a hoax would be easily feasible. Our estimate, based on Gordon’s use of the method to produce hand-written illustrated pages, is that the entire manuscript could have been produced in ten weeks by one person acting alone, or in about half that time working with an accomplice. Gordon examined the economic feasibility of a hoax in this blog article and concluded that it was well within the range of costs and expected payoffs from documented art hoaxes.

When was it made?

This method of producing gibberish has some implications for estimating the date when the Voynich Manuscript was produced.

The carbon date for the vellum of the manuscript is around 1420, with the usual range of confidence that comes with radiocarbon dates. However, this does not say when the manuscript itself was produced. Rich SantaColoma has shown that obtaining large quantities of old vellum is quite easy, and was easy in antiquity. It is therefore perfectly possible that the manuscript was written on already-old vellum, either as an innocent use of available material, or as part of a deliberate attempt to make the manuscript look aged.

The nature of the hoaxing method described here is consistent with a hoax significantly after the 1420s because of the features that the method is producing, such as some letters, some syllables and some words being more common than others.

These features are consistent with what a relatively sophisticated cryptographer would want to reproduce, which implies a date of 1470s or later. Cryptography in the 1420s was comparatively crude, so a hoaxer at the time when the vellum was produced would be unlikely to pay much attention to these features. In the 1470s, however, cryptography went through a first golden age, with significant advances driven by conflict between the Italian city states.

This mismatch between the carbon date and the features being produced might be due to a 1470s hoaxer using old vellum just because it was available. It might also, however, be a significantly later (e.g. 1580s) hoaxer using old vellum and textual features to make the manuscript look like something from the 1470s, and accidentally overshooting with the age of the vellum.

There are some other features of the manuscript which could throw light on whether or not it is likely to be a meaningless hoax. A key question is whether there are significant numbers of erasures and corrections in the text. If the text contains meaningful content, then we would expect there to be numbers of erasures and corrections comparable with other documents of the fifteenth or sixteenth century. If, however, it contains only meaningless gibberish, then we would expect few or no erasures and corrections. As far as we know, the pages that have so far been examined show no erasures or corrections, so the balance of evidence is consistent with a meaningless hoax. In anything relating to the Voynich Manuscript, however, few things stay constant for long…

Notes and links

We’ve used the images from the Liber Floridus under fair use terms, as low-resolution images that are already in the public domain, being used as part of an academic study.

Other resources:

  1. Good day!
    I don’t agree with Gordon Rugg.
    The manuscript is not written with letters and characters denoting letters of the alphabet one of the ancient languages. I picked up the key, which in the first section I could read the following words: hemp, wearing hemp; food, food (sheet 20 at the numbering on the Internet); to clean (gut), knowledge, perhaps the desire, to drink, sweet beverage (nectar), maturation (maturity), to consider, to believe (sheet 107); to drink; six; flourishing; increasing; intense; peas; sweet drink, nectar, etc. Is just the short words, 2-3 sign. To translate words with more than 2-3 characters requires knowledge of this ancient language. The fact that some signs correspond to two letters. Thus, for example, a word consisting of three characters can fit up to six letters of which three. In the end, you need six characters to define the semantic word of three letters. Of course, without knowledge of this language make it very difficult even with a dictionary.
    If you are interested, I am ready to send more detailed information, including scans of pages showing the translated words.

