Is the Voynich Manuscript written in an unidentified language? Part 1

By Gordon Rugg

The short answer to the question in the title: Almost certainly not.

Linguists have been identifying previously undeciphered languages for a long time, and they’re pretty good at it now. There are well-established methods that let you take an unidentified language and work out what it’s likely to be. When you apply those methods to the Voynich Manuscript, the results are very, very odd. In this article, I’ll give a brief overview of the methods. In the next article, I’ll look at what happens when you apply them to the Voynich Manuscript.

A simple example

Let’s imagine that a linguist in the far future is trying to make sense of the text below, recovered from the ruins of a long-abandoned city. Where would they start? I’ll work through the steps in some detail, since they involve some important points; if you’re already familiar with the principles, you might prefer to skip to the end of this section.

[Image: wolf detail]

One quick and dirty place to start is what’s shown in the picture. Pictures aren’t always directly related to the text next to them, but they’re well worth trying as a starting place. In this example, the linguist gets off to a promising start.

The picture looks like a wolf, and the second word of the text beneath the picture is “wolf”. However, the rest of that text clearly isn’t in English, so we’re not dealing with an English text.

There’s also the problem that a lot of names for objects are borrowed across different and unrelated languages, so just because you can identify one or two words as being from language A, that doesn’t mean that the rest of the text is in that same language.

So what could the linguist do next? One useful rule of thumb is that small words in a language are often the most frequently used words, like “a” and “the” in English. Another useful principle is that languages can be grouped into families, where a lot of the vocabulary is pretty similar across languages within a family.

If we try the idea that the unidentified language is from the same language family as English, and that “is” in that language means the same as “is” in English, then we get the following sentence (original language in italics, English in normal face).

De wolf is de voorvader van de hond.

That looks plausible so far. For the next step, we can look at words that are similar to English, but not identical. Could “de” be the equivalent of English “the”? If our future linguist has a German dictionary, then they would find that German has several words corresponding to “the”. All of those German words start with “d” and one of them is “der”. So it’s plausible that our unidentified language has “de” for “the”.
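The rule of thumb about short, frequent words is easy to check mechanically. Here’s a minimal Python sketch that counts the words in the caption; the function name `word_counts` is just something I’ve made up for illustration, and a real analysis would need far more text than one sentence.

```python
from collections import Counter

# The caption from the example above.
caption = "De wolf is de voorvader van de hond."

def word_counts(text):
    """Lower-case the text, strip full stops and commas, and count each word."""
    words = [w.strip(".,").lower() for w in text.split()]
    return Counter(w for w in words if w)

counts = word_counts(caption)

# Show the most frequent words first; "de" occurs three times in this caption.
for word, n in counts.most_common():
    print(word, n)
```

The most frequent word is also one of the shortest, which is what you’d expect of a word like “the”.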

That gives us “The wolf is the voorvader van the hond”.

The word “hond” looks like the English “hound” and there’s a German word “Hund” that means “dog”. That means that the similarity is unlikely to be a coincidence; more likely, we’re dealing with a language related to English and German, and the word really does mean something like “dog”.

If we plug in that provisional translation, we get: “The wolf is the voorvader van the dog”.

Another check with the German dictionary gives us “von” for English “of” and “Vater” for English “father”, so we now have: “The wolf is the voorfather of the dog” and from that, we can get: “The wolf is the forefather of the dog”. That is a reasonable translation, and the language involved actually is related to English and German; it’s Dutch, and the image above is from the Dutch Wikipedia site.
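The step-by-step substitution above can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not of how the guesses are made: the glosses are the ones proposed in the walkthrough, and the names `steps` and `apply_glossary` are my own.

```python
caption = "De wolf is de voorvader van de hond."

# Provisional glosses, in the order they were proposed in the walkthrough.
steps = [
    {"de": "the"},
    {"hond": "dog"},
    {"van": "of", "voorvader": "forefather"},
]

def apply_glossary(text, glossary):
    """Replace every word that has a provisional gloss; leave the rest as-is."""
    words = [w.strip(".").lower() for w in text.split()]
    return " ".join(glossary.get(w, w) for w in words)

glossary = {}
stages = []
for step in steps:
    glossary.update(step)  # accept the new guesses
    stages.append(apply_glossary(caption, glossary))

for stage in stages:
    print(stage)
```

Each printed line has fewer untranslated words than the one before, and the last line is the finished translation.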

I’ve laboured the point in this example deliberately, because it’s important, but easy to overlook. Each of the steps above involves cross-checking the provisional translations against a related language, to see whether we’re likely to be dealing with more than just accidental coincidences. Linguists are very wary of individual words in one language that resemble words in another language, because coincidences and borrowings are very common, and can lead the amateur horribly astray.

As an example of coincidences within related languages, there’s a German word “Mist” that looks identical to the English word “mist” but that actually means “manure”. As an example of borrowings, English has the word “taboo” and German has the equivalent “Tabu”. In both cases, the word is ultimately a borrowing from a Polynesian language (Tongan), but that doesn’t mean that either German or English is a Polynesian language.

So, linguists are a lot more comfortable with language identifications that involve widespread patterns of similarities between languages than with identifications that rest on a few isolated words in common. In the case of English and German, for instance, there’s a widespread pattern of English “th” matching a German “t” or “d”, as in “father” and “Vater”; it’s not just something that occurs once or twice.
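To show what “widespread pattern” means in practice, here’s a rough Python sketch that checks the “th” correspondence across a handful of English and German word pairs. The word list is my own choice for illustration, and the position check (a “t” or “d” within one letter of where the English “th” sits) is a crude stand-in for the careful sound-by-sound alignment that linguists actually use.

```python
# English/German word pairs chosen for this sketch (not an exhaustive list).
PAIRS = [
    ("father", "Vater"),
    ("brother", "Bruder"),
    ("mother", "Mutter"),
    ("thing", "Ding"),
    ("three", "drei"),
    ("thick", "dick"),
    ("thin", "dünn"),
    ("bath", "Bad"),
]

def th_matches(english, german):
    """True if the German word has 't' or 'd' near the position of English 'th'."""
    i = english.lower().find("th")
    if i < 0:
        return False  # no "th" to compare
    window = german.lower()[max(0, i - 1): i + 2]
    return any(c in "td" for c in window)

hits = sum(th_matches(e, g) for e, g in PAIRS)
print(f"{hits} of {len(PAIRS)} pairs fit the pattern")
```

A single hit would prove nothing; it’s the fact that the same correspondence keeps turning up across many unrelated words that makes it evidence.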

A harder example

Here’s another example, one that will be a lot harder for anyone who only speaks English.

This is a short explanation; if you’d like to know more about the principles, I’ve blogged earlier and in more detail about how a linguist can identify features within this text that narrow down the range of likely languages involved.


In this particular case, the text appears to show a feature called vowel harmony, in which there are restrictions on which vowels can occur within the same word; it occurs in a comparatively small number of languages. Another feature is that several of the words share the same final syllable (e.g. “ata” and “oinen”), which suggests that this is a language which uses the ends of words for grammatical purposes. When you put these two features together, the list of likely languages becomes even smaller, and one of the languages in that list is Finnish, which is in fact the language of this text.
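A vowel harmony check is simple enough to sketch in Python. The version below assumes the usual description of Finnish, where ä, ö and y are front vowels, a, o and u are back vowels, and e and i are neutral. The sample words are ordinary Finnish words I’ve picked for illustration, not the words from the text above.

```python
FRONT = set("äöy")  # front vowels
BACK = set("aou")   # back vowels; e and i are treated as neutral

def is_harmonic(word):
    """True if the word does not mix front and back vowels."""
    letters = set(word.lower())
    return not (letters & FRONT and letters & BACK)

def harmony_ratio(words):
    """Fraction of words that respect vowel harmony."""
    return sum(is_harmonic(w) for w in words) / len(words)

# Illustrative sample: house, village, table, dog, forest.
sample = ["talo", "kylä", "pöytä", "koira", "metsä"]
print(harmony_ratio(sample))

# Compounds can mix vowel classes: työ (work) + paikka (place).
print(is_harmonic("työpaikka"))
```

The compound word is a reminder that this is a tendency to measure across a whole text, not a rule to test on single words; loanwords break it too.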

Another harder example

To someone who only speaks English, it might appear obvious that languages use the ends of words for grammatical purposes (e.g. “walks” versus “walked” or “walking” in English). In fact, although that’s what happens in many languages, there are also a lot of languages which don’t do that. Here’s an example.


Here, the start of the word is being used for grammatical purposes – in this case, to show whether the word is singular or plural, as well as showing the grammatical class of the nouns. Other Bantu languages use a similar approach to this Swahili example.
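One way to spot which end of the word is doing the grammatical work is to compare related forms and see where they differ. Here’s a minimal Python sketch; the Swahili singular and plural pairs are common textbook examples that I’ve supplied myself (they aren’t necessarily the ones in the example above), and the function names are made up for illustration.

```python
def shared_start(a, b):
    """Number of leading characters the two words have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def variation_site(a, b):
    """Report whether two related word forms differ at the start or the end."""
    prefix = shared_start(a, b)
    suffix = shared_start(a[::-1], b[::-1])  # shared ending, via reversed words
    if suffix > prefix:
        return "start"
    if prefix > suffix:
        return "end"
    return "unclear"

swahili = [("kitabu", "vitabu"), ("mtu", "watu"), ("mtoto", "watoto"), ("kiti", "viti")]
english = [("walks", "walked"), ("walked", "walking")]

for a, b in swahili + english:
    print(a, b, variation_site(a, b))
```

The Swahili pairs all vary at the start and the English pairs at the end. Note that this only locates the variation; it doesn’t find the true boundary between prefix and stem.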


Identifying a language is a well-understood task, and you can narrow down the possibilities pretty swiftly if you know what to look for. For instance, if the language shows vowel harmony, that gives you one shortlist of likely candidates; if it shows systematic patterns in the initial syllables of words, then that gives you another shortlist of likely candidates; if a lot of its vocabulary shows systematic correspondences with the vocabulary of a known language, then that gives you yet another shortlist, and so forth.

Even if you can’t get an exact match, you’ll probably be able to identify the likely language family, and you’ll also probably be able to rule out a whole batch of languages and language families.

When you use this approach on the Voynich Manuscript, the results quickly start to look odd. That’s the topic of my next article.

