Hoaxing the Voynich Manuscript, part 2: The attractions of a mysterious language

By Gordon Rugg

Imagine that you’ve gone back in time, and that you want to produce the Voynich Manuscript as a hoax. How could you do that, and what problems would you need to solve?

The previous article looked at possible motivations, and at the issues involved if you’re doing it primarily for the money. One major issue was making the hoax convincing enough to pass expert scrutiny. If someone is going to spend a lot of money on a mysterious manuscript, they’re going to have it checked by some relevant experts before they part with any hard cash, so you’ll need to fool those experts somehow.

This article looks at the problems you’d face with one particular set of experts, namely the experts on languages.


Why experts on languages?

If you’re going to hoax a document for money, then you have several options available. Three common choices are:

  • Hoaxing a long-lost specific document known to have existed once, such as one of Aristotle’s lost books
  • Hoaxing an item of an established type, such as a fragment of an early biblical manuscript
  • Hoaxing something unique

All of these get you into problems with language. There are also a lot of potential problems with the content, which we’ll examine in a later article.

Hoaxing a long-lost real document

Here’s an example of a long-lost document that’s accepted as real.


It’s a page of a lost book by Archimedes, rediscovered in a palimpsest (a book written on re-used old parchment). The original text runs left to right; after the parchment had been washed, some of the original text remained, and darkened gradually with time until it became readable again. The palimpsest had been in a monastic library for centuries, so it has a well-documented source; Archimedes was known to have written a book matching its description, and everything about its style and its content is consistent with its being the lost book, so there’s no serious argument about its authenticity.

Hoaxing something like this is feasible, but difficult. You’ll need to speak the language in question fluently, and you’ll also need to use the right dialect of that language. Suppose, for instance, that you wanted to hoax Aristotle’s lost book on comedy. Would his dialect use the word thalassa or the word thalatta to mean “sea”? You’d need to get that sort of detail right all the way through, and if you’re trying to hoax an entire book, then that’s a real challenge.

You’d also need to get his personal style right. It’s easy enough to copy his best-known phrases, but getting style right goes far beyond that. For instance, are there some phrasings that he deliberately avoids using on stylistic grounds, the equivalent of a pedantic English writer who hates split infinitives such as “to boldly go”? Getting those significant absences right needs a lot of knowledge, and it’s the sort of thing that experts on Aristotle know a lot about, so you’ll need to be good to fool them.

You’ll need to be good, but you don’t necessarily have to be perfect. If your hoax is good enough, then a classic human bias starts to work in your favour. It’s known as confirmation bias. This involves people selectively favouring the information that is consistent with their personal beliefs, and selectively ignoring or attacking information that is inconsistent with those beliefs.

So, if your hoaxed document is good enough to convince someone, then they’ll start looking for evidence supporting their belief that it’s genuine, and they’ll start looking for plausible explanations for any mistakes that you’ve made. The more they do this, the harder it is for them to change their mind and admit that they were wrong.

This issue is particularly visible with the next type of hoax.

Hoaxing something fictitious from a known category

This faces some similar problems to hoaxing a known item, but it gives you a bit more room for manoeuvre, because you don’t need to worry about copying the writing style of a known historical figure. In addition, you can include features that provide ready-made explanations for at least some of the mistakes you make.

The “known category” refers to things like “A soldier’s diary from the campaign of…” or “A fragment of a Roman census record”, where the campaign or the census is a real historical event. The key feature is that, unlike the previous type of hoax, the item isn’t linked to a particular known person associated with that event, or to a particular known missing text.

Here’s an example.


This is the Kensington Runestone, allegedly found in 1898 in Kensington, Minnesota. Its text is a description of a Norse expedition by eight Geats and 22 Norwegians to North America in 1362. There are numerous authentic runestones in Scandinavia, and the Kensington Runestone appeared to fit into that well-established category, except for being in North America.

Experts in Scandinavian linguistics and historical runes have almost unanimously declared it to be a hoax, because of numerous odd features in its language and the form of the runic letters.

However, the nature of the runestone provides plenty of ready-made explanations to support the people who believe that it is genuine. It’s a short inscription, describing how ten men from the expedition were found dead from violence. In that context, it’s plausible that anyone carving a memorial stone might not pay as much attention to formal grammar and spelling as they normally would. The inscription also mentions that two different groups of people were on the expedition, so any oddities in the language within the inscription could be attributed to the mix of dialects being spoken. Similarly, any oddities in the spelling or writing style or runic style could be attributed to the writer being a poorly educated adventurer, rather than an academic.

Once people start looking for pieces of evidence consistent with the runestone being genuine, there’s a fair chance that they’ll turn up something. The world isn’t a tidy place in which all the evidence points unequivocally one way or the other. Where experts differ from amateurs in such cases, however, is that experts tend to look at the overall pattern of evidence, whereas amateurs are more likely to go with a handful of cherry-picked pieces of evidence that match their pet theory, and to discount the bulk of the evidence that doesn’t. This issue reappears in the next part of this article, which covers the linguistic features of the manuscript.

So, in summary, a lot of hoaxes of ancient texts use the following approach:

  • keep the text short, since it’s less work and also there’s less chance of making a major mistake
  • make it informal, so that any mistakes can be blamed on a poorly educated writer; and
  • make the text appear to be evidence of something that a lot of people want

The Voynich Manuscript matches two of these criteria; it’s written in an informal style, rather than on neatly-drawn lines in beautiful handwriting, and it appears to be an ancient, mysterious text, which appeals to a lot of people. However, it’s very different in the other respect; it’s a long document, well over two hundred pages. That’s interesting, and it leads us into the third approach to hoaxing documents.

Hoaxing something unique

The Voynich Manuscript is unique. Unique items tend to be more valuable than non-unique items, and “more valuable” is a tempting phrase to anyone perpetrating a hoax to make money.

However, if it is a hoax, then why would anyone want to hoax something that’s over two hundred pages long, when they could produce something much shorter and simpler that could still sell for respectable amounts?

One possible answer involves simple economics. An entire mysterious book is likely to be worth more than a single mysterious page. If you can produce an entire book’s worth of text cheaply and quickly, your initial costs will be higher than for a small hoax, but the increased revenue from selling a big hoax should more than outweigh them.

Simple economics would also explain why the manuscript is written in what looks like an unknown language. A useful research tip is to turn a difficult problem round. In this case, imagine that the Voynich Manuscript was written in a known language. Now imagine trying to sell it to a rich collector, who could get the contents translated. How much would the manuscript be worth in that situation? Probably nowhere near as much as it would fetch while it was still a mystery. Mysteries were worth money in the past, just as they are now.

So, there’s a good financial incentive for a hoaxer to opt for producing an entire book in an apparently unknown language, if they can produce it cheaply enough.

This is where the chain of reasoning starts to get really interesting.

Hoaxing an unknown language

The language of the Voynich Manuscript is unique in several ways.

  • It looks like an unknown language
  • It looks as if it doesn’t belong to any known language family
  • It doesn’t have any of the features that linguists are on the look-out for when they encounter a language for the first time

The first two of these are good from a hoaxer’s point of view, but the third is much more problematic, as described below.

Looking like an unknown language

This is a good strategy for a hoaxer because it means that they don’t need to worry about getting the grammar and spelling and content right. It also adds to the potential sale value of the document. It’s easy to see the advantages of this strategy, even without any knowledge of linguistics.

Looking like an unknown language family

This is also a good strategy for a hoaxer. However, understanding what it means, and why it’s a good strategy, requires some background knowledge about how languages work.

Most human languages can be grouped into language families. There are some exceptions, such as Basque and Burushaski, but the majority of languages belong to identifiable language families. In Europe, for instance, the Romance languages such as French and Spanish and Italian are all descendants of Latin. This common descent is reflected in systematic similarities in vocabulary. Within the Germanic family of languages, for instance, English has the word hound, German has Hund and Dutch has hond, all with similar meanings.

Most people have heard of language families, and know the general principle of languages within a family having similar vocabulary and other features.

What’s less well known is how systematic many of the similarities are within a language family, and between related language families.

One of the major discoveries of nineteenth-century linguistics was the relationship between a geographically widespread set of families now grouped together as the Indo-European languages. This work was based on identifying systematic correspondences between languages, as opposed to the occasional similarities in vocabulary that arise through coincidence or loanwords. For instance, there are systematic correspondences between the consonants in Latin and the consonants in the Germanic languages, to the point where you can predict fairly accurately what the Germanic word will be if you know the Latin one. In fact, it’s often possible to tell that a particular word is a later loanword simply because it doesn’t follow these systematic trends.
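The best-known set of these Latin and Germanic consonant correspondences is usually called Grimm’s Law. The sketch below is a heavily simplified illustration, not a linguist’s tool: it works on English spelling rather than on sounds, looks only at the first consonant of a word, and uses a small hand-picked table. The table and the function names are hypothetical, chosen for this example. It also shows the loanword point: an English word that keeps the Latin consonant, instead of the shifted one, gets flagged as a probable later borrowing.

```python
# Simplified, spelling-based sketch of Grimm's Law for word-initial consonants.
# Maps a Latin initial letter to the English spelling(s) predicted for a
# native Germanic cognate. Real correspondences involve sounds, not letters,
# and have conditioning factors this table ignores (illustrative only).
LATIN_TO_ENGLISH_INITIAL: dict[str, tuple[str, ...]] = {
    "p": ("f",),      # pater : father, piscis : fish
    "t": ("th",),     # tres : three, tu : thou
    "c": ("h",),      # cornu : horn, canis : hound
    "d": ("t",),      # duo : two, dens : tooth
    "g": ("k", "c"),  # genu : knee, granum : corn
    "f": ("b",),      # frater : brother
}


def predicted_initials(latin: str) -> tuple[str, ...]:
    """Return the English initial spellings predicted for a native cognate."""
    return LATIN_TO_ENGLISH_INITIAL.get(latin[0].lower(), ())


def classify(latin: str, english: str) -> str:
    """Label an English word as fitting the correspondence or as a likely loan."""
    expected = predicted_initials(latin)
    if not expected:
        return "no rule"
    if english.lower().startswith(expected):
        return "fits correspondence"
    if english[0].lower() == latin[0].lower():
        # The Latin consonant survived unshifted: a hint that the word
        # was borrowed after the sound shift had already happened.
        return "likely later loan"
    return "unexplained"


pairs = [("pater", "father"), ("canis", "hound"), ("frater", "brother"),
         ("pater", "paternal"), ("dens", "dental")]
for latin, english in pairs:
    print(f"{latin} / {english}: {classify(latin, english)}")
```

With these pairs, father, hound and brother fit the correspondence, while paternal and dental, which English borrowed from Latin, are flagged as likely later loans. A hoaxer inventing a “Germanic” text would need their vocabulary to pass checks of this kind, only far more detailed ones.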

Why would this matter to a hoaxer? It would matter to a modern hoaxer because if you tried to hoax a text that looked like a Germanic language, for instance, then you’d need to produce a set of systematic correspondences against other Indo-European languages that would look plausible to an expert in that field. That’s a lot more difficult than it looks. If you’re wondering why it’s difficult, then you could try looking at the discussions about the language of the Kensington Runestone, and see the level of linguistic detail that those go into, for a document less than a hundred words long. Now imagine the level of discussion that you’d get about a document over two hundred pages long…

So, if you’re trying to hoax something that looks like an unidentified language, then it’s a good idea to make it look as if it doesn’t belong to any known language family. That’s something that a present-day hoaxer would probably do as a conscious decision once they’d done some background reading. A hoaxer working before the nineteenth century almost certainly wouldn’t do this as a conscious decision, simply because the concept of language families wasn’t properly formulated before then. However, an early hoaxer might easily have produced the same effect by accident, as a side-effect of some hoaxing mechanisms which will be discussed in a later article.

That takes us into a third way in which the text of the Voynich Manuscript is unique.

Having unique linguistic features

Linguistics students learn a lot about features to look out for when they encounter a new language.

Some of those features help you work out whether that language belongs to a known language family. For instance, if you meet a language that forms plurals by changing the first syllable of the word, then it may belong to the Bantu language family, which is widespread in Africa, and it probably won’t be Indo-European. Conversely, if it forms plurals by changing the last syllable of a word, then it may be Indo-European, but it probably won’t be Bantu. As you’ve probably already guessed, the text of the Voynich Manuscript doesn’t contain any features that fit with any known language family. That’s unusual, but it’s far from unique: there are quite a few languages around the world that don’t belong to any known language family.
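The start-versus-end distinction can be illustrated with a toy check: compare a singular with its plural, and see whether the two words share more of their ending (so the change happened at the start, prefix-style) or more of their beginning (so the change happened at the end, suffix-style). This is a minimal sketch under simplifying assumptions, and the function names are hypothetical. The Swahili pairs mtu/watu (“person/people”) and kitabu/vitabu (“book/books”) are standard examples of Bantu plural prefixes; irregular plurals such as English man/men come out as “unclear”.

```python
def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters the two words have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def shared_suffix_len(a: str, b: str) -> int:
    """Number of trailing characters the two words have in common."""
    return shared_prefix_len(a[::-1], b[::-1])


def plural_marking(singular: str, plural: str) -> str:
    """Guess where the plural is marked, from which end of the word stays stable."""
    head = shared_prefix_len(singular, plural)
    tail = shared_suffix_len(singular, plural)
    if tail > head:
        return "prefix"   # the ending is stable, so the change is at the start
    if head > tail:
        return "suffix"   # the beginning is stable, so the change is at the end
    return "unclear"


for singular, plural in [("mtu", "watu"), ("kitabu", "vitabu"),
                         ("dog", "dogs"), ("rosa", "rosae"), ("man", "men")]:
    print(f"{singular} -> {plural}: {plural_marking(singular, plural)}")
```

A real analysis would work with many pairs and with morpheme boundaries rather than raw letters, but even this crude test separates the Swahili pairs from the English and Latin ones.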

Other features occur across a range of language families. An example is distinctive use of tone, which occurs in a lot of East Asian languages, but also in some West African and Central American languages, across a wide range of unrelated language families. That’s why the writing systems of languages like Vietnamese have so many diacritics (accents and twiddly symbols) above, and sometimes below, their vowels, as in these examples of tonal tongue twisters (from Wikipedia).

Pinyin (Chinese): māma mà mǎ de má ma?

Vietnamese: bấy nay bây bầy bẩy bẫy bậy
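One way to see what those diacritics are doing is to strip the tone marks from the two tongue twisters: what remains is a run of near-identical syllables, which differ only in tone. The sketch below does this with Python’s standard unicodedata module. The set of marks treated as tones is an assumption that covers only Hanyu Pinyin and Vietnamese; Vietnamese vowel-quality marks such as the circumflex are deliberately kept.

```python
import unicodedata

# Combining marks treated as tone marks (assumption: Pinyin and Vietnamese only).
# Vietnamese vowel-quality marks such as the circumflex (U+0302), breve (U+0306)
# and horn (U+031B) are not in this set, so they survive.
TONE_MARKS = {
    "\u0300",  # grave
    "\u0301",  # acute
    "\u0303",  # tilde
    "\u0304",  # macron
    "\u0309",  # hook above
    "\u030c",  # caron
    "\u0323",  # dot below
}


def strip_tones(text: str) -> str:
    """Remove tone diacritics, leaving the underlying letters."""
    # NFD splits each accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    # NFC recombines what is left, e.g. a + circumflex back into â.
    return unicodedata.normalize("NFC", kept)


print(strip_tones("māma mà mǎ de má ma?"))
print(strip_tones("bấy nay bây bầy bẩy bẫy bậy"))
```

The Pinyin line reduces to “mama ma ma de ma ma?” and the Vietnamese line to “bây nay bây bây bây bây bây”, which is exactly why a tonal language needs extra marks to keep its words apart in writing.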

Many of the features you look out for are rarer. For example, a feature that crops up in several unrelated languages is vowel harmony. In these languages, there are restrictions on which vowels occur within the same word. A common pattern is that if a word contains an i or an e, then it can’t contain an o or a u.
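The simplified pattern just described is easy to state as a test. The sketch below treats i and e as one group and o and u as the other, and treats a as neutral, which is an assumption for this example; the function names and sample words are invented for illustration. In a language with harmony of this kind, you would expect nearly all native words to pass, with exceptions mainly among loanwords and compounds, whereas a text whose words freely mix the two groups shows no sign of harmony.

```python
# The simplified pattern from the text: a word may contain i/e or o/u,
# but not both. 'a' is treated as neutral (an assumption for this sketch).
FRONT_VOWELS = frozenset("ie")
BACK_VOWELS = frozenset("ou")


def is_harmonic(word: str) -> bool:
    """True if the word does not mix the i/e group with the o/u group."""
    letters = set(word.lower())
    has_front = bool(letters & FRONT_VOWELS)
    has_back = bool(letters & BACK_VOWELS)
    return not (has_front and has_back)


def harmony_rate(words: list[str]) -> float:
    """Fraction of words that obey the harmony rule."""
    if not words:
        return 0.0
    return sum(is_harmonic(w) for w in words) / len(words)


# Invented sample words: only "kelu" mixes e with u, so the rate is 0.75.
print(harmony_rate(["kelin", "toru", "sami", "kelu"]))
```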

As you might have already guessed, the text of the Voynich Manuscript doesn’t show any of these features that you learn to look out for.

That’s an interesting set of absences. Each one on its own wouldn’t be a major issue, but taken together, the absence of all these features is starting to look odd, to anyone familiar with linguistics. From a hoaxer’s point of view, that’s bad news, since it increases the risk of experts becoming suspicious.

What’s even more interesting is that it would be quite easy to hoax some of these features in a way that would look plausible to a sceptical expert, and so reduce suspicion about a hoax. For instance, it’s very easy to hoax vowel harmony using the table and grille method that I have described elsewhere. So why didn’t that happen with the text in the Voynich Manuscript, if it’s a hoax?
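To see why this would be easy, consider a toy version of a table-and-grille generator. If every syllable in a given table uses vowels from only one harmony group, then any word assembled from that table obeys the harmony rule automatically, without the hoaxer needing any knowledge of linguistics. The tables, the simulated grille (random cell positions rather than a physical card with holes in it) and the function names below are hypothetical illustrations, not a reconstruction of any specific published tables.

```python
import random

# Hypothetical syllable tables. Every cell in FRONT_TABLE uses only i/e,
# and every cell in BACK_TABLE uses only o/u.
FRONT_TABLE = [
    ["ke", "li", "ner"],
    ["di", "sel", "ti"],
    ["mi", "ek", "ve"],
]
BACK_TABLE = [
    ["ko", "lu", "nor"],
    ["du", "sol", "tu"],
    ["mo", "ok", "vu"],
]


def word_from_grille(table: list[list[str]], holes: list[tuple[int, int]]) -> str:
    """Join the table cells exposed by the grille holes, read in order."""
    return "".join(table[row][col] for row, col in holes)


def hoax_words(n: int, seed: int = 0) -> list[str]:
    """Generate n words; each word is drawn from a single harmony table."""
    rng = random.Random(seed)
    words = []
    for _ in range(n):
        table = rng.choice([FRONT_TABLE, BACK_TABLE])
        # Simulated grille placement: 2 or 3 randomly chosen cells.
        holes = [(rng.randrange(3), rng.randrange(3))
                 for _ in range(rng.randint(2, 3))]
        words.append(word_from_grille(table, holes))
    return words


print(hoax_words(6))
```

Because the table is chosen once per word, harmony falls out of how the tables are laid out; nothing in the generation step has to enforce it.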

One possible explanation is that the hypothetical hoaxer either didn’t know or didn’t care about these features. This will be a recurrent theme in later articles in this series.

Another explanation that has been proposed is that the text in the manuscript isn’t a hoax, but a real human language. However, that explanation doesn’t stand up well, because of another set of linguistic features that don’t occur in the Voynich Manuscript. These features are at the heart of the debate about whether the Voynich Manuscript contains a real, unidentified human language, or a code, or meaningless gibberish.

They’ll be the topic of the next article in this series.
