Nearly all those who have studied the Voynich MS realised the need to be able to convert the text to Roman characters. The one exception was Fr. Th. Petersen of the Catholic University of America, who made a complete hand transcription of the MS using the original alphabet. This was in the 1930s, well before the introduction of the computer.
The transcription of the Voynich MS into a computer-readable form has been described in detail in a previous page. The transcription alphabet used in these web pages is primarily the EVA alphabet, for which a more detailed description is given in a dedicated page (temporary copy of a page by Gabriel Landini).
A short example will do here:
The following figure was created using the "Voynich EVA Hand 1" font created by Gabriel Landini. The EVA text representing this script is given below. It is a sample transcription of the start of folio 1r of the manuscript.
fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy
sory.cthar.or.y.kair.chtaiin.shar.are.cthar.cthar.dan
syaiir.sheky.or.ykaiin.shod.cthoary.cthes.daraiin.sa
o'oiin.okeey.oteor.roloty.cth*ar.daiin.otaiin.or.okan
sair.y.chear.cthaiin.cphar.cfhaiin - ydaraishy
The choice of transcription alphabet has an impact on any numerical analysis of the Voynich MS text. This is most obvious for the calculation of the word length distribution, since the number of characters needed to represent one 'glyph' of the Voynich MS text differs from one alphabet to another.
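The effect can be illustrated with a minimal Python sketch, using only the first line of the folio 1r sample above. The `MERGED` table is a hypothetical alternative alphabet introduced purely for illustration: it counts each of the EVA pairs `ch` and `sh` as a single glyph. The function name `word_lengths` is likewise an assumption, not part of any existing tool.

```python
from collections import Counter

# First line of the folio 1r sample, in EVA; '.' separates words.
SAMPLE = "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy"

# Hypothetical alternative alphabet (an assumption for illustration only):
# the EVA pairs 'ch' and 'sh' are each counted as a single glyph.
MERGED = {"ch": "C", "sh": "S"}


def word_lengths(text, merges=None):
    """Return a Counter mapping word length -> number of words."""
    lengths = Counter()
    for word in text.split("."):
        if not word:
            continue
        # Collapse each multi-character group into one placeholder symbol.
        for pair, single in (merges or {}).items():
            word = word.replace(pair, single)
        lengths[len(word)] += 1
    return lengths


eva = word_lengths(SAMPLE)          # lengths counted in EVA characters
alt = word_lengths(SAMPLE, MERGED)  # lengths counted in merged glyphs

print("EVA:   ", sorted(eva.items()))
print("merged:", sorted(alt.items()))
```

The same nine words give a different length distribution under the two conventions, so word length statistics are only comparable when the same alphabet is used throughout.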
This section is an overview of the analysis of the text of the Voynich MS that has been performed over the 90+ years since its discovery. It is not complete. A more detailed description is presented in the Analysis Section of this site, which equally cannot claim completeness.
It is difficult to present the multitude of analyses that have been performed on the Voynich MS text in an orderly fashion. The approach adopted here is to start at the character-level and advance through analysis of single-word properties to analyses of syntax, thus:
This does, however, require the introduction of a few concepts before anything else.
In a famous presentation in 1976, Prescott Currier showed, on the basis of counts of character pairs and words, that each page of the Voynich MS appears to be written in one of two distinct 'languages', which he called A and B. He was careful to point out that these are not necessarily different languages, but could reflect different dialects, subject matter or encryption. It should also be pointed out that this conclusion was based on his study of about half the pages of the MS.
The Herbal section appeared to be a mixture of A-language and B-language pages, and the distribution of these pages was strictly according to the bifolio, i.e. entire bifolia are written in one language. He also saw two different handwriting styles which were fully correlated with the type of language. This has already been shown in a previous page.
A complete copy in PostScript, including the tables, has been prepared by J. Reeds (the link refers to a mirror by J. Stolfi).
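Currier's argument rests on counts of character pairs and words. The Python sketch below shows one way such pair counts could be gathered; it is not Currier's own procedure. It assumes EVA text with `.` as the word separator, counts only pairs that fall inside a word, and uses the first line of the folio 1r sample as input. The function name `pair_counts` is hypothetical.

```python
from collections import Counter


def pair_counts(text, sep="."):
    """Count adjacent character pairs inside each word of the text."""
    counts = Counter()
    for word in text.split(sep):
        # zip(word, word[1:]) yields every pair of neighbouring characters.
        for a, b in zip(word, word[1:]):
            counts[a + b] += 1
    return counts


# First line of the folio 1r sample, in EVA.
sample = "fachys.ykal.ar.ataiin.shol.shory.cthres.ykor.sholdy"

counts = pair_counts(sample)
total = sum(counts.values())

# Show the most frequent pairs with their relative frequencies.
for pair, n in counts.most_common(5):
    print(pair, n, round(n / total, 3))
```

Running the same count on separate pages and comparing the relative frequencies is the kind of comparison on which a distinction such as A versus B can be based.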
Several sources list 'odd features' of the MS text, in particular D'Imperio, Currier and Tiltman. These are presented in detail in the analysis section.
Several authors have identified structures in the composition of words in the Voynich MS. These are, roughly in chronological order:
Tiltman's split in roots and suffixes
Tiltman observed that many words in the stars or recipes section (which was the only sample he had available) were composed of two parts, and he set up the following table:
|Roots||Suffixes|
|ok of||an ain aiin aiiin|
|ot op||ar air aiir aiiir|
|qok qof||al ail aiil aiiil|
|Sh||ey eey eeey|
|d||edy eedy eeedy|
Every combination of a 'root' and a 'suffix' gives a valid word.
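As a minimal sketch, the word list implied by the table can be generated in Python as follows. The roots and suffixes are copied from the table above with their spelling unchanged; the function name `tiltman_words` is hypothetical, and the sketch only enumerates the combinations without checking them against the MS text.

```python
from itertools import product

# Roots and suffixes as listed in the table (spelling kept as given).
ROOTS = ["ok", "of", "ot", "op", "qok", "qof", "Sh", "d"]
SUFFIXES = [
    "an", "ain", "aiin", "aiiin",
    "ar", "air", "aiir", "aiiir",
    "al", "ail", "aiil", "aiiil",
    "ey", "eey", "eeey",
    "edy", "eedy", "eeedy",
]


def tiltman_words(roots=ROOTS, suffixes=SUFFIXES):
    """Return every root + suffix combination, in table order."""
    return [root + suffix for root, suffix in product(roots, suffixes)]


words = tiltman_words()
print(len(words))   # number of roots times number of suffixes
print(words[:4])    # first few combinations for the root 'ok'
```

With the 8 roots and 18 suffixes listed, this yields 144 candidate words, for example `qokaiin` and `otedy`.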
Mike Roe's generic word
The following pattern was contributed to the Voynich MS mailing list by Mike Roe. His system is presented here translated to the EVA alphabet. Each path represents a valid word, and Mike suggested that this could perhaps be evidence of a grammar of the Voynich language:
+- o --+ +- r -+ o --+ +--+ +--+ +--+ | +- t -+ | +- cho -+ +- l -+ | qo --+--+ +--+ | | +- k -+ | +- e ---+ | cho --+ | | | | | +- ee --+ | | | | | +--+- che -+-- y ------+------> | | | | | +- ch --+ | | | | | | +- sh --+ | | | | | | +-------+ | | | | +- al ---+ | | | | | +--+- am ---+----------+ | | +- ain --+ | | +- aiin -+
Robert Firth's split into odd and even groups
A slightly different approach was taken by another list member, Robert Firth, who, ignoring the word spaces, was able to define two lists of characters and character groups such that the text in the Voynich MS consists of alternating items from these lists. The split is not entirely unambiguous, but reportedly it works for most of the MS. It is explained in his Note Nr. 24.
Jorge Stolfi's split into 'soft' and 'hard' characters
Jorge Stolfi discovered another fascinating structure in the words of the Voynich MS, by grouping all characters into 'soft' and 'hard' and showing that the vast majority of words consist of one, two or three groups, which he calls prefix, stem or midfix, and suffix. The first and last consist of 'soft' characters, and the stem or midfix of 'hard' characters.
This is explained in detail at his web site.
Jorge Stolfi's fine structure of words in the MS
This is also known as the 'OKOKOKO' paradigm.
This is also explained in detail at his web site.
Included here will be at least:
This must still be included. The word frequencies are also discussed in some detail in the section about Zipf's law.
This was first studied by Bennett and by Krischer. An explanation of the meaning of entropy will be included here. In addition, see the following contributions:
More to be included.
As a start, see the following contributions:
More to be included
The remainder is still being written.