
Short tour: analysis of the text

Transcription of the Voynich MS

Nearly everyone who has studied the Voynich MS has realised the need to convert the text to Roman characters. The one exception was Fr. Th. Petersen of the Catholic University of America, who made a complete hand transcription of the MS using the original alphabet. This was in the 1930s, well before the introduction of the computer.

The transcription of the Voynich MS into a computer-readable form has already been addressed on a previous page. The transcription alphabet used in these web pages is primarily the EVA alphabet, for which a more detailed description is given on a dedicated page.

The following figure was created using the "Voynich EVA Hand 1" font created by Gabriel Landini. The EVA text representing this script is given below it. It is a sample transcription of the start of folio 1r of the manuscript.

fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy
sory.cthar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan
syaiir.sheky.or.ykaiin.shod.cthoary.cthes.daraiin.sa
o'oiin.okeey.oteor.roloty.cth*ar.daiin.otaiin.or.okan
sair.y.chear.cthaiin.cphar.cfhaiin - ydaraishy

The choice of the transcription alphabet has an impact on any numerical analysis of the Voynich MS text. This is most obvious for the calculation of the word length distribution, since the number of characters needed to represent one 'glyph' of the Voynich MS text is different for each alphabet.
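The effect can be illustrated with a minimal Python sketch that measures the word lengths of the first sample line above twice: once counting every EVA character, and once counting the EVA pairs 'ch' and 'sh' as single glyphs. The second convention is only an assumption made for the illustration and is not meant to reproduce any particular transcription alphabet. The function names are likewise hypothetical.

```python
from collections import Counter

# First line of the f1r sample given above, in EVA ('.' separates words).
SAMPLE = "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy"


def eva_words(text):
    """Split an EVA transcription into words, dropping empty items."""
    return [w for w in text.replace("\n", ".").split(".") if w]


def length_distribution(words, glyphs=()):
    """Count words by length.

    Each sequence listed in `glyphs` is treated as ONE glyph. The
    sequences are replaced in the order given, so overlapping
    sequences should be listed longest first.
    """
    counts = Counter()
    for word in words:
        for g in glyphs:
            word = word.replace(g, "#")  # '#' stands for one glyph
        counts[len(word)] += 1
    return dict(sorted(counts.items()))


words = eva_words(SAMPLE)
print(length_distribution(words))                 # {2: 1, 4: 3, 5: 1, 6: 4}
print(length_distribution(words, ("ch", "sh")))   # {2: 1, 3: 1, 4: 3, 5: 2, 6: 2}
```

Even on nine words the distribution shifts towards shorter words under the second convention, which is why word length statistics should always be quoted together with the alphabet used.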

Analysis results

Introduction

It is difficult to present the multitude of analyses that have been performed on the Voynich MS text in an orderly fashion. The approach adopted here is to start at the character-level and advance through analysis of single-word properties to analyses of syntax. This includes the following topics:

  1. Character-level topics
    • Character frequencies
    • Character entropy
    • Vowel/consonant identification
  2. Word-level topics
    • Morphology of the Voynich MS words
    • Word frequencies
    • Word entropy
    • Zipf's law
    • Are word spaces significant?
  3. Syntax-level topics
    • Cluster analyses
    • Long-range correlations

It is necessary to introduce a few concepts first.

Currier languages

In a famous presentation in 1976, Prescott Currier showed that, on the basis of counts of character pairs and words, each page of the Voynich MS appears to be written in one of two distinct 'languages' which he called A and B. He was careful to point out that these are not necessarily different languages, but could equally reflect different dialects, different subject matter or different encryption.

The Herbal section appeared to be a mixture of A-language and B-language pages, and the distribution of these pages was strictly according to the bifolio, i.e. entire bifolia are written in one language. He also saw two different handwriting styles which were fully correlated with the type of language. This has already been shown in a previous page.

Entropy

This was first studied by Bennett and by Krischer. An explanation of the meaning of entropy will be included here. In addition, see the following contributions:
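In brief, the first-order character entropy is the average amount of information (in bits) carried by one character, and the second-order or conditional entropy is the average information carried by a character once the preceding character is known. A minimal Python sketch of both quantities, estimated from plain character counts, could look as follows. The function names are hypothetical, and no correction for small sample sizes is attempted.

```python
import math
from collections import Counter


def _entropy(counts):
    """Shannon entropy in bits of a table of counts."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def char_entropy(text):
    """First-order entropy: H(X) estimated from single characters."""
    return _entropy(Counter(text))


def conditional_entropy(text):
    """Second-order entropy: H(X2 | X1) = H(X1, X2) - H(X1).

    Estimated from all adjacent character pairs in `text`.
    """
    pairs = Counter(zip(text, text[1:]))   # joint counts of (X1, X2)
    firsts = Counter(text[:-1])            # counts of X1 only
    return _entropy(pairs) - _entropy(firsts)


print(char_entropy("abab"))  # 1.0 (two equally likely characters)
```

A text in which each character is fully determined by its predecessor, such as "abababab", has a conditional entropy of zero under this estimate. As with word lengths, the values obtained for the Voynich MS depend on the transcription alphabet, since the alphabet decides what counts as one character.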

Word paradigms

Several authors have identified structures in the composition of words in the Voynich MS. These are, roughly in chronological order:

Tiltman's split in roots and suffixes
Tiltman observed that many words in the recipes section (which was the only sample he had available) were composed of two parts, which he called a root and a suffix, and he prepared a table of roots and suffixes.
Every combination of a 'root' and a 'suffix' results in a valid word.

Mike Roe's generic word
The following pattern was contributed to the Voynich MS mailing list by Mike Roe. It is represented here after a translation to the EVA alphabet. Each path from left to right represents a valid word, and Mike suggested that this could perhaps present evidence of grammar of the Voynich language:

  
                      +- o  --+  +- r -+
 o   --+           +--+       +--+     +--+
       |  +- t -+  |  +- cho -+  +- l -+  |
 qo  --+--+     +--+                      |
       |  +- k -+  |  +- e ---+           |
 cho --+           |  |       |           |
                   |  +- ee --+           |
                   |  |       |           |
                   +--+- che -+-- y ------+------>
                   |  |       |           |
                   |  +- ch --+           |
                   |  |       |           |
                   |  +- sh --+           |
                   |  |       |           |
                   |  +-------+           |
                   |                      |
                   |  +- al ---+          |
                   |  |        |          |
                   +--+- am ---+----------+
                      |        |
                      +- ain --+
                      |        |
                      +- aiin -+
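
One way to make the diagram concrete is to enumerate its paths. The Python sketch below encodes one reading of the diagram: a prefix (o, qo or cho), then t or k, then one of three kinds of ending. The grouping into three tuples and all names are assumptions made for this illustration and are not part of Mike Roe's original contribution.

```python
import itertools

# One reading of the diagram, from left to right.
PREFIXES = ("o", "qo", "cho")
MIDDLES = ("t", "k")
ENDINGS = tuple(
    # upper branch: (o | cho) followed by (r | l)
    [a + b for a in ("o", "cho") for b in ("r", "l")]
    # middle branch: optional e, ee, che, ch or sh, followed by y
    + [m + "y" for m in ("e", "ee", "che", "ch", "sh", "")]
    # lower branch
    + ["al", "am", "ain", "aiin"]
)


def generic_words():
    """Return the set of all words described by the diagram."""
    return {
        p + m + e
        for p, m, e in itertools.product(PREFIXES, MIDDLES, ENDINGS)
    }


words = generic_words()
print(len(words))          # 84
print("okeey" in words)    # True
print("otaiin" in words)   # True
print("daiin" in words)    # False
```

Under this reading the pattern describes 84 distinct words. Of the words in the f1r sample above, okeey and otaiin fit the pattern, while a word such as daiin does not, so the pattern covers only part of the vocabulary.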

Robert Firth's split into odd and even groups
A different approach was taken by another list member, Robert Firth, who, ignoring the word spaces, was able to define two lists of characters and character groups such that the text in the Voynich MS consists of alternating items from these lists. The split is not entirely unambiguous but reportedly it works for most of the MS. It is explained in his Note Nr. 24.

Jorge Stolfi's analyses
Jorge Stolfi has progressively analysed the word structure, culminating in the definition of a word composition grammar, which is presented at his web site. It clearly demonstrates that the text of the Voynich MS does not behave like 'normal' European languages.

Vowel/consonant detection

Early list member Jacques Guy experimented with Sukhotin's vowel/consonant detection algorithm. While the separation was not as clear as in languages like Latin or Spanish, certain characters did come out as vowels, namely the EVA characters o, a, e, and y.
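
Sukhotin's algorithm rests on the assumption that vowels and consonants tend to alternate, so that vowels are the characters most often found next to other characters. A minimal Python sketch of the algorithm as it is commonly described is given below. It is not Jacques Guy's implementation, the function name is hypothetical, and each transcription character is treated as one symbol.

```python
from collections import defaultdict


def sukhotin_vowels(words):
    """Return the set of characters classified as vowels."""
    # Symmetric table of adjacency counts within words.
    # Doubled characters are ignored (zero diagonal).
    adj = defaultdict(lambda: defaultdict(int))
    for word in words:
        for a, b in zip(word, word[1:]):
            if a != b:
                adj[a][b] += 1
                adj[b][a] += 1

    # Every character starts as a consonant with its row sum.
    sums = {c: sum(adj[c].values()) for c in adj}
    vowels = set()
    while True:
        consonants = {c: s for c, s in sums.items() if c not in vowels}
        if not consonants:
            break
        best = max(consonants, key=consonants.get)
        if consonants[best] <= 0:
            break  # no remaining character behaves like a vowel
        vowels.add(best)
        # Contacts with the new vowel now count against the others.
        for c in consonants:
            if c != best:
                sums[c] -= 2 * adj[best][c]
    return vowels


print(sukhotin_vowels(["baba", "dada"]))  # {'a'}
```

Because the input is a list of words in some transcription alphabet, the outcome for the Voynich MS again depends on how glyphs are split into characters.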

Quantitative studies

Character and word frequencies

This must still be included. The word frequencies are discussed also in some detail in the section about Zipf's law.

Entropy

Some discussions on the unusual entropy of the Voynich MS text:

Cluster analysis of the Currier languages

As a start, see the following contributions:

Zipf's laws

See:

Other approaches


Copyright René Zandbergen, 2010
Latest update: 2010/04/02