Is the Voynich Manuscript in an unidentified language? Part 2

By Gordon Rugg

In the first part of this pair of articles, I looked at the general principles that linguists use when trying to identify a previously unknown language.

In this article, I’ll look at what happens when you apply those principles to the Voynich Manuscript.

In brief, it doesn’t end well for the idea that the manuscript is written in an unidentified language. That idea was tried and rejected by the specialists decades ago, for very good reasons. Anyone trying to resurrect the “unidentified language” theory needs to show that they’ve found a convincing set of counter-arguments to those reasons for rejection. So far, nobody’s come close; instead, the recent theories simply ignore the show-stopping problems.

Here are some of those reasons.


(Image courtesy of the Beinecke Library.)

Looking for names

One obvious place to begin is the illustrations. It looks reasonable to assume that if the manuscript is a notebook in an unknown language, then the text on each page will relate to the illustration on that page. It’s also a fairly reasonable-looking assumption that, on the pages showing a plant, the name of that plant will be at the start of the page.

It’s an obvious approach, and it was tried half a century ago by the earliest Voynich Manuscript researchers. They soon abandoned it. Here’s why.

The two images below are close-ups of the first few words of two often-reproduced pages.

voynich unidentified first lines

(Images courtesy of the Beinecke Library.)

In both cases, the first letter on the page is a tall, distinctive letter, though the letters are different between the two pages. They’re known among Voynich researchers as gallows letters.

If you look at the first words of the pages with pictures of plants on them, you notice something odd that Voynich researchers spotted decades ago. In half of the pages with plant illustrations, the first word on the page begins with one or other of those two gallows letters.

This means that either half of the plant names in the manuscript begin with one of just two letters, or that the first word on a plant page often isn’t the name of the plant, or some combination of those two features.

Even if we accept that sometimes a plant page won’t begin with a plant name, there’s still the question of why so many of the first words on those pages begin with one of those two letters. It isn’t because they begin with something equivalent to “The…” in English, because the opening words are different on each page. We’re seeing different words starting each page, but half of those words begin with one of two letters. This is something that recent claims of decipherment fail to mention or explain.
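The check described above is easy to sketch in Python. This is only an illustration: the words below are made-up placeholders rather than real transcriptions, the letters "P" and "T" are arbitrary stand-ins for the two gallows letters, and the function name is hypothetical.

```python
def initial_letter_share(first_words, letters):
    """Return the fraction of words whose first character is in `letters`."""
    if not first_words:
        return 0.0
    hits = sum(1 for word in first_words if word and word[0] in letters)
    return hits / len(first_words)


# Made-up page-opening words; "P" and "T" stand in for the two gallows letters.
toy_first_words = ["Pabo", "Tiru", "sado", "Poka", "rema", "Tode", "olka", "quni"]

share = initial_letter_share(toy_first_words, {"P", "T"})
distinct_words = len(set(toy_first_words))
```

In this toy data every opening word is different, yet half of them start with one of the two stand-in letters, which is the combination that the “first words are names” idea struggles to explain.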

In summary, the “first words are names” idea has some major problems.

Leaves and roots

Another reasonable idea is to assume that pages with pictures of plants will contain descriptions of plants, and of what those plants can be used for. You’d expect those pages to contain descriptions of leaves and flowers and roots and seeds; you’d expect to find the names of illnesses that the plants were supposed to cure. You’d expect to find those words repeatedly on the pages about plants, but nowhere near so frequently on the pages with other pictures, such as pictures of zodiacs or stars.

It’s a reasonable idea, but again, the early Voynich researchers tried it, and again, they abandoned it.

There just wasn’t any pattern of particular words showing up mainly within particular sections of the manuscript, in the way that you’d expect if the plant pages really were about plants, and so forth. Again, this is something that recent claims of decipherment don’t mention, other than as future work, even though it was done repeatedly years ago and came up empty.

Unwelcome patterns

Many of the early Voynich researchers were world-class code breakers, and one of the things they were very good at was finding statistical regularities in texts. That’s a classic way of breaking a code. Within any given language, some letters occur more often than others in written text, so if you know those frequency patterns, you have a good chance of getting through a code.
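For readers who haven’t met frequency analysis before, here is a minimal Python sketch of the basic counting step. It uses a short English sentence as a stand-in text; the function name and the sample are illustrative assumptions, and real code breaking involves far more than this.

```python
from collections import Counter


def letter_frequencies(text):
    """Return each letter's share of all letters in `text`, most common first."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: n / total for ch, n in counts.most_common()}


# Illustrative sample only; any sizeable plain text would do.
sample = "the quick brown fox jumps over the lazy dog and the cat"
freqs = letter_frequencies(sample)
top_three = list(freqs)[:3]
```

Comparing a table like `freqs` for an enciphered text against the known table for a candidate language is the starting point for the kind of statistical attack described above.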

When those researchers looked at the statistics in the Voynich Manuscript’s text, they soon found some very odd regularities.

We can see some of those regularities in the opening lines we looked at earlier.

voynich unidentified first lines

(Images courtesy of the Beinecke Library.)

In the upper image, four of the five lines begin with a gallows letter, but none of the following words within the image start with a gallows letter. There’s a similar pattern in the lower image, where none of the second words start with a gallows letter. You might wonder whether the gallows letters were simply the equivalent of capital letters, but then you see that there are a lot of words with a gallows character in the middle.

It soon became clear to the early researchers that within the Voynich Manuscript, the line breaks weren’t just arbitrary points where the writer ran out of space and started a new line. Instead, each line was a separate unit, and there were noticeable regularities in what happened within each line.

For example, some letters are very rare at the start of a line, but very common elsewhere within a line. There are some features of poetry that look superficially like that, such as alliteration in Old English poetry, but those resemblances are only superficial, for reasons that become very clear once you start looking at the detailed facts.

There’s a good description of this in a paper by Currier, reproduced on René Zandbergen’s site. Currier was Director of Research, Naval Security Group, within the US military.

Currier described the significance of this line effect as follows.

“The Line Is a Functional Entity.

“In addition to my findings about ‘‘languages’’ and hands, there are two other points that I’d like to touch on very briefly. Neither of these has, I think, been discussed by anyone else before. The first point is that the line is a functional entity in the manuscript on all those pages where the text is presented linearly. There are three things about the lines that make me believe the line itself is a functional unit. The frequency counts of the beginnings and endings of lines are markedly different from the counts of the same characters internally. There are, for instance, some characters that may not occur initially in a line. There are others whose occurrence as the initial syllable of the first ‘‘word’’ of a line is about one hundredth of the expected. This by the way, is based on large samples (the biggest sample is 15,000 ‘‘words’’), so that I consider the sample to be big enough so that these statistics are significant. “

(Currier, 1976/1992, on René Zandbergen’s site)

He went on to say the following:

“These Findings Should be Considered by Anyone Who Studies the Manuscript.

“These findings are definite enough, I think, to warrant much further study by anyone who is going to be involved in seriously attacking the text of the Voynich manuscript. I have no interpretations of them, by the way; I have no solutions. All I know is that they are significant — and damn significant. Anyone who attempts to work on the text without considering these, ignores them at his own peril. “

(Currier, ibid.)
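To make the idea of a line effect concrete, here is a rough Python sketch of one way to compare how often a character starts a line with how often it starts a word anywhere. This is not Currier’s actual procedure, which the quoted passage doesn’t spell out; the function name and the toy “lines” are assumptions made up for illustration.

```python
from collections import Counter


def line_initial_ratio(lines):
    """For each word-initial character, divide its share of line starts
    by its share of all word starts. A ratio near 1 means no line effect."""
    line_start = Counter()
    word_start = Counter()
    for line in lines:
        words = line.split()
        if not words:
            continue
        line_start[words[0][0]] += 1
        for word in words:
            word_start[word[0]] += 1
    n_lines = sum(line_start.values())
    n_words = sum(word_start.values())
    return {
        ch: (line_start[ch] / n_lines) / (word_start[ch] / n_words)
        for ch in word_start
    }


# Made-up toy lines, not real transcriptions.
toy_lines = ["ka do do", "ka da di", "to do ka"]
ratios = line_initial_ratio(toy_lines)
```

In the toy data, words beginning with “d” are the most common overall but never start a line, so their ratio is zero. A ratio of about one hundredth on a sample of thousands of words, as Currier reports, is the same kind of signal on a much larger scale.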

There are a lot of other odd statistical regularities within the text of the Voynich Manuscript. This is a subject that has been studied in depth for decades. Those regularities are so complex that it’s hard to imagine any human managing to produce them deliberately in a written text. Many of them involve aspects of statistics that weren’t invented until centuries after the manuscript’s likely date of origin.

In other words, experts in cracking mysterious texts tested the straightforward “unidentified language” idea in depth forty years ago, and found fatal flaws in it.

Once again, recent claims of decipherment simply don’t address this in anywhere near the depth that they need to.

Other routes

There are other ways to tackle an unidentified language. The obvious ones were tried on the Voynich Manuscript long ago, and they too produced odd findings that are inconsistent with any known language.

Word clusters

One example is looking for groups or patterns of words. For instance, in English the phrase “on top” is usually followed either by the word “of” or by the end of the sentence. On a larger scale, English prefers to have adjectives before nouns, as in “black cat”, whereas many other languages prefer to have adjectives after nouns. All known languages have rules about word order. Sometimes a writer will break one of those rules for literary effect, but only for the occasional sentence. However, the text in the Voynich Manuscript doesn’t appear to have any rules about word order.
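The “on top” example can be sketched in a few lines of Python. This is a toy illustration with a made-up English sentence; the function name and the `<END>` marker are assumptions for the example, and a real study would use much larger texts and more careful statistics.

```python
from collections import Counter


def followers(tokens, phrase):
    """Count which token follows each occurrence of `phrase` (a list of tokens).
    An occurrence at the very end of the text is counted under "<END>"."""
    n = len(phrase)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == phrase:
            nxt = tokens[i + n] if i + n < len(tokens) else "<END>"
            counts[nxt] += 1
    return counts


# Made-up sample sentence for illustration.
tokens = "the cat sat on top of the box and the hat sat on top of the cat".split()
after_on_top = followers(tokens, ["on", "top"])
```

In a text with word-order rules, a table like `after_on_top` is dominated by a small number of followers. The point made above is that the Voynich Manuscript’s text doesn’t appear to show that kind of constraint.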

Syllable frequencies

There are also oddities about syllable distributions across the different sections of the Voynich Manuscript.

If you plot the distribution of frequently occurring syllables within a normal human language text, those syllables remain constant in frequency throughout it.

macbeth common syllables

However, if you plot the distributions of four frequently occurring Voynich Manuscript syllables, their frequency varies dramatically across sections of the manuscript, often with very abrupt transitions.

common syllables in VMS no lines

It’s long been recognised that the Voynich Manuscript is written in at least two different “dialects”. These dialects are more different from each other in syllable distributions than English is from German. The illustration below shows the distributions of the syllables “er” and “de” within a German-language book that has endnotes in English. There’s a slight change in frequencies near the bottom of the image that marks the transition between languages. It’s much less of a transition than the transitions within the Voynich Manuscript.

german er de

This feature is very difficult to reconcile with the idea that the Voynich Manuscript is simply written in an unidentified language.
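For anyone who wants to try this kind of comparison themselves, here is a minimal Python sketch that counts a syllable in consecutive windows of a text. It is not the code used to produce the plots in this article; the function name, the window size and the toy text are all assumptions for illustration.

```python
def syllable_profile(text, syllable, window):
    """Count occurrences of `syllable` in consecutive chunks of `window`
    characters. Occurrences that straddle a chunk boundary are not counted."""
    return [
        text[i:i + window].count(syllable)
        for i in range(0, len(text), window)
    ]


# Toy text with an abrupt change halfway through, standing in for a
# transition between two "dialects".
toy_text = "erer" * 5 + "dede" * 5

er_profile = syllable_profile(toy_text, "er", 10)
de_profile = syllable_profile(toy_text, "de", 10)
```

In a normal text, the profile for a common syllable stays roughly level from window to window. An abrupt jump like the one built into the toy text is the kind of transition described above.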

The lack of corrections

Another oddity is that the Voynich Manuscript does not appear to contain any corrections in the form of erasures, strikings-out or insertions. There are a couple of over-writings, but those might well be accidental side effects from the oak gall ink used to write the manuscript. Oak gall ink can be transparent when first applied to a page, so if you lose your place when writing, it’s possible to over-write what you’ve just written before it darkens and becomes visible.

For a book of over two hundred pages to have no corrections is unusual, to say the least.


The idea that the Voynich Manuscript is written in an unidentified language just doesn’t hold up to the evidence. That idea was tried decades ago, and it was abandoned because it failed to provide any explanation for numerous features of the manuscript.

Anyone claiming that the Voynich Manuscript is simply written in a previously unidentified language, without any encoding, has to produce good explanations for the odd linguistic features listed above, and for all the other odd features described in the Voynich literature, particularly the statistics, if they want to be taken seriously. Simply ignoring those oddities, and focusing only on the evidence that agrees with a pet theory, is like a tourist in Africa claiming that a fast-approaching animal is a cow rather than an elephant because the animal has four legs and a tail, just like a cow does, while ignoring the contrary evidence of the large tusks and the fact that the animal is ten feet high. In both cases, ignoring unwelcome facts is unlikely to end well…


To keep this article short and (I hope) clear, I’ve not gone into detail about the statistical oddities of the text in the Voynich Manuscript. There’s plenty of further material in the Voynich Manuscript literature. Most of that literature is online, but some key articles are behind paywalls, as is often unfortunately the case in academic research.

For further reading, René Zandbergen’s site is a good place to start. It contains links to the sites of the other main Voynich Manuscript researchers.

Currier, 1976/1992: Here are the details from the site. Because the document went through various forms, I’ve included the full opening rubric from the paper for clarity, rather than just giving a Harvard-style reference.

“Papers on the Voynich Manuscript

“Captain Prescott H. Currier

“These papers and statistical tabulations by Prescott Currier originally appeared in New Research on the Voynich Manuscript: Proceedings of a Seminar. This privately circulated typewritten manuscript, dated 30 November 1976, Washington, D.C., was edited by M. E. D’Imperio, who served as moderator at the seminar. Jacques Guy and Jim Reeds transcribed Currier’s work into its present form in January 1992.”

