
Analysis of the text

Transcription of the Voynich MS

All those who studied the Voynich MS realised the need to be able to convert the text to Roman characters. The only exception to this rule was Fr. Th. Petersen of the Catholic University of America, who made a complete hand transcription of the MS using the original alphabet. This was in the 1930s, well before the introduction of the computer, of course.

The transcription of the Voynich MS into a computer-readable form has been described in detail in a previous page. The transcription alphabet used in these web pages is primarily the EVA alphabet, for which a more detailed description is given in a dedicated page (temporary copy of a page by Gabriel Landini).

A short example will do here:
The following figure was created using the "Voynich EVA Hand 1" font created by Gabriel Landini. The EVA text representing this script is given below. It is a sample transcription of the start of folio 1r of the manuscript.

fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy
sory.cthar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan
syaiir.sheky.or.ykaiin.shod.cthoary.cthes.daraiin.sa
o'oiin.okeey.oteor.roloty.cth*ar.daiin.otaiin.or.okan
sair.y.chear.cthaiin.cphar.cfhaiin - ydaraishy

The choice of the transcription alphabet has an impact on any numerical analysis of the Voynich MS text. This is most obvious for the calculation of the word length distribution, since the number of characters needed to represent one 'glyph' of the Voynich MS text is different for each alphabet.
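The effect on the word length distribution can be illustrated with a minimal sketch. It assumes that '.' marks a word space in the EVA sample, and, purely for illustration, that a second alphabet would write 'ch' and 'sh' as one character each. The function names are hypothetical and not part of any existing transcription tool.

```python
from collections import Counter

# First line of folio 1r in EVA, copied from the sample above.
SAMPLE = "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy"

def split_words(line):
    """Split an EVA line into words, assuming '.' marks a word space."""
    return [w for w in line.strip().split(".") if w]

def glyph_length(word, digraphs=()):
    """Count the units in a word, treating each listed digraph as one unit."""
    length, i = 0, 0
    while i < len(word):
        # Consume two characters if they form a listed digraph, else one.
        i += 2 if word[i:i + 2] in digraphs else 1
        length += 1
    return length

def length_distribution(line, digraphs=()):
    """Map word length to the number of words of that length."""
    return Counter(glyph_length(w, digraphs) for w in split_words(line))

# Plain EVA: every letter counts as one character.
eva_counts = length_distribution(SAMPLE)
# Illustrative assumption: 'ch' and 'sh' each count as a single glyph.
merged_counts = length_distribution(SAMPLE, digraphs=("ch", "sh"))

print(sorted(eva_counts.items()))
print(sorted(merged_counts.items()))
```

The same nine words give two different distributions, which is why word length statistics from different transcription alphabets cannot be compared directly.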

Analysis results

Introduction

This section is an overview of the analysis of the text of the Voynich MS that has been performed over the 90+ years since its discovery. It is not complete. A more detailed description is presented in the Analysis Section of this site, which equally cannot claim completeness.

It is difficult to present the multitude of analyses that have been performed on the Voynich MS text in an orderly fashion. The approach adopted here is to start at the character-level and advance through analysis of single-word properties to analyses of syntax, thus:

  1. Character-level topics
    • Character frequencies
    • Character entropy
    • Vowel/consonant identification
  2. Word-level topics
    • Morphology of the Voynich MS words
    • Word frequencies
    • Word entropy
    • Zipf's law
    • Are word spaces significant?
  3. Syntax-level topics
    • Cluster analyses
    • Long-range correlations

This does, however, require the introduction of a few concepts before anything else.

Currier languages

In a famous presentation in 1976, Prescott Currier showed, on the basis of counts of character pairs and words, that each page of the Voynich MS appears to be written in one of two distinct 'languages', which he called A and B. He was careful to point out that these are not necessarily different languages, but could reflect different dialects, subject matter or encryption. It should be added that this conclusion was based on his study of about half the pages of the MS.

The Herbal section appeared to be a mixture of A-language and B-language pages, and the distribution of these pages was strictly according to the bifolio, i.e. entire bifolia are written in one language. He also saw two different handwriting styles which were fully correlated with the type of language. This has already been shown in a previous page.

A >> complete copy of the presentation in PostScript, including the tables, has been prepared by J. Reeds (link refers to mirror by J. Stolfi).

Morphology, syntax, grammar

Several sources list 'odd features' of the MS text, in particular D'Imperio, Currier and Tiltman. These are presented in detail in the analysis section.

Word paradigms

Several authors have identified structures in the composition of words in the Voynich MS. These are, roughly in chronological order:

Tiltman's split in roots and suffixes
Tiltman observed that many words in the stars or recipes section (which was the only sample he had available) were composed of two parts, and he set up the following table:

Roots          Suffixes
ok    of       an    ain    aiin    aiiin
ot    op       ar    air    aiir    aiiir
qok   qof      al    ail    aiil    aiiil
qot   qop      or
ch             ol
Sh             ey    eey    eeey
d              edy   eedy   eeedy
s

Every combination of a 'root' and a 'suffix' gives a valid word.
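As a minimal sketch, the set of words implied by this table can be enumerated by pairing every root with every suffix. The lists below are copied from the table as printed, including the capital in 'Sh'. The function name is hypothetical. The sketch only generates the combinations and does not check them against the MS text.

```python
from itertools import product

# Roots and suffixes copied from Tiltman's table above.
ROOTS = ["ok", "ot", "qok", "qot", "of", "op", "qof", "qop",
         "ch", "Sh", "d", "s"]
SUFFIXES = ["an", "ain", "aiin", "aiiin",
            "ar", "air", "aiir", "aiiir",
            "al", "ail", "aiil", "aiiil",
            "or", "ol",
            "ey", "eey", "eeey",
            "edy", "eedy", "eeedy"]

def tiltman_words():
    """Return every root + suffix combination as a set of strings."""
    return {root + suffix for root, suffix in product(ROOTS, SUFFIXES)}

words = tiltman_words()
print(len(words))

# Words from the folio 1r sample that fit the scheme.
for candidate in ("daiin", "okan", "otaiin", "sair", "fachys"):
    print(candidate, candidate in words)
```

With 12 roots and 20 suffixes the table implies 240 distinct words, several of which (for example 'daiin' and 'otaiin') occur in the short folio 1r sample.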

Mike Roe's generic word
The following pattern was contributed to the Voynich MS mailing list by Mike Roe. His system is presented here translated to the EVA alphabet. Each path represents a valid word, and Mike suggested that this could perhaps represent evidence of a grammar of the Voynich language:

  
                      +- o  --+  +- r -+
 o   --+           +--+       +--+     +--+
       |  +- t -+  |  +- cho -+  +- l -+  |
 qo  --+--+     +--+                      |
       |  +- k -+  |  +- e ---+           |
 cho --+           |  |       |           |
                   |  +- ee --+           |
                   |  |       |           |
                   +--+- che -+-- y ------+------>
                   |  |       |           |
                   |  +- ch --+           |
                   |  |       |           |
                   |  +- sh --+           |
                   |  |       |           |
                   |  +-------+           |
                   |                      |
                   |  +- al ---+          |
                   |  |        |          |
                   +--+- am ---+----------+
                      |        |
                      +- ain --+
                      |        |
                      +- aiin -+
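One possible reading of this diagram can be written as a regular expression. This is only a sketch under stated assumptions: that one of the three prefixes and one of 't' or 'k' are always present, and that the three stacked groups on the right are alternatives, the middle one allowing an empty step before 'y'. The pattern and the function name are an interpretation made here, not Mike Roe's own formulation.

```python
import re

# Assumed reading of the diagram:
#   prefix (o | qo | cho), then (t | k), then one of
#     (o | cho) followed by (r | l)
#     optional (e | ee | che | ch | sh) followed by y
#     al | am | ain | aiin
ROE_PATTERN = re.compile(
    r"(?:o|qo|cho)"
    r"(?:t|k)"
    r"(?:(?:o|cho)(?:r|l)"
    r"|(?:e|ee|che|ch|sh)?y"
    r"|al|am|ain|aiin)"
)

def fits_roe_pattern(word):
    """Return True if the whole word follows one path of the assumed reading."""
    return ROE_PATTERN.fullmatch(word) is not None

for word in ("okeey", "otaiin", "qokeey", "okan", "daiin"):
    print(word, fits_roe_pattern(word))
```

Under this reading 'okeey' and 'otaiin' from the folio 1r sample are accepted, while 'okan' and 'daiin' are not, since neither 'an' nor a word without a prefix appears in the diagram.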

Robert Firth's split into odd and even groups
A slightly different approach was taken by another list member, Robert Firth, who, ignoring the word spaces, was able to define two lists of characters and character groups such that the text in the Voynich MS consists of alternating items from these lists. The split is not entirely unambiguous, but reportedly it works for most of the MS. It is explained in his >> Note Nr.24

Jorge Stolfi's split into 'soft' and 'hard' characters
Jorge Stolfi discovered another fascinating structure in the words of the Voynich MS, by grouping all characters into 'soft' and 'hard' and showing that the vast majority of words consists of one, two or three groups, which he calls prefix, stem or midfix, and suffix. The first and last consist of 'soft' characters, and the stem or midfix of hard characters.

This is explained in detail at his >> web site

Jorge Stolfi's fine structure of words in the MS
This is also known as the 'OKOKOKO' paradigm.
This is also explained in detail at his >> web site

Vowel/consonant detection

Included here will be at least:

Quantitative studies

Character and word frequencies

This must still be included. The word frequencies are discussed also in some detail in the section about Zipf's law.
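Pending that discussion, a minimal sketch of how such counts can be obtained from an EVA transcription is given here. It assumes that '.' marks a word space and uses only the first two lines of the folio 1r sample, so the numbers say nothing about the MS as a whole. The function names are hypothetical.

```python
from collections import Counter

# First two lines of folio 1r in EVA, copied from the sample above.
LINES = [
    "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy",
    "sory.cthar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan",
]

def words_of(lines):
    """Return all words, assuming '.' marks a word space."""
    return [w for line in lines for w in line.split(".") if w]

def word_frequencies(lines):
    """Count how often each word occurs."""
    return Counter(words_of(lines))

def char_frequencies(lines):
    """Count how often each EVA character occurs inside words."""
    return Counter(ch for word in words_of(lines) for ch in word)

word_freq = word_frequencies(LINES)
char_freq = char_frequencies(LINES)

print(word_freq.most_common(3))
print(char_freq.most_common(5))
```

Character counts obtained this way depend on the transcription alphabet, as noted above for word lengths.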

Entropy

This was first studied by Bennett and by Krischer. An explanation of the meaning of entropy will be included here. In addition, see the following contributions:

More to be included.
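Until that explanation is available, the following sketch shows how the quantities can be computed, assuming that the Shannon entropy is meant: the single-character entropy, and the conditional entropy of a character given the one before it. The function names are hypothetical, and the sample is far too short to give meaningful values for the MS.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits of a table of counts."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def char_entropy(text):
    """Entropy of the single-character distribution."""
    return entropy(Counter(text))

def conditional_entropy(text):
    """Entropy of a character given the preceding one: H(pair) - H(first)."""
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    return entropy(pairs) - entropy(firsts)

# First line of folio 1r in EVA, with the word separators removed.
sample = "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy".replace(".", "")

print(char_entropy(sample))
print(conditional_entropy(sample))
```

A text in which each character fully determines the next has a conditional entropy of zero, so a low value indicates a highly predictable character sequence.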

Cluster analysis of the Currier languages

As a start, see the following contributions:

More to be included.

Zipf's law

See:
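In its simplest form, Zipf's law states that the frequency of a word is roughly inversely proportional to its rank in the frequency table, so that a plot of log frequency against log rank has a slope near -1. The sketch below estimates that slope by least squares. The word list is a synthetic toy example built to follow the law exactly, not data from the MS, and the function names are hypothetical.

```python
import math
from collections import Counter

def rank_frequency(words):
    """Return (rank, frequency) pairs, with rank 1 for the most frequent word."""
    counts = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(counts, start=1))

def loglog_slope(rank_freq):
    """Least-squares slope of log(frequency) against log(rank)."""
    xs = [math.log(rank) for rank, _ in rank_freq]
    ys = [math.log(freq) for _, freq in rank_freq]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Synthetic toy data: frequencies 12, 6, 4, 3, i.e. exactly 12 / rank.
toy_words = ["w1"] * 12 + ["w2"] * 6 + ["w3"] * 4 + ["w4"] * 3

table = rank_frequency(toy_words)
print(table)
print(loglog_slope(table))
```

For a real transcription, the list of words would be obtained by splitting the EVA text at the word spaces before calling rank_frequency.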

Any other business

The remainder is still being written.

Copyright René Zandbergen, 2011
Latest update: 2011/02/26