A Well-kept Secret of Mediaeval Science:
the Voynich manuscript
Gabriel Landini (Lecturer in Analytical Pathology, School of Dentistry) and René Zandbergen (System Analyst and Consultant in the Space Sector, Darmstadt, Germany).
Note: this article first appeared in the July 1998 issue of 'Aesculapius', the journal of the University of Birmingham Medical and Dental Graduates Society, and is published here with the kind permission of the editor.
Introduction
Imagine a book written in an unknown alphabet, in an unknown language, at an unknown date and place. Could such a book be read? Could one retrieve the information it contains, if any? This is not a trivial question. In the case of one particular mediaeval document, known as "the Voynich manuscript", it has baffled historians and scientists alike for most of this century.
Brief history
In 1912, Wilfrid M. Voynich (a rare
book collector) bought a number of mediaeval manuscripts from
an undisclosed source in Europe. Among these was a 235-page manuscript
written in an unknown script and what appears to be an unknown
language or a cipher. Understandably, Voynich wanted to have the
mysterious manuscript deciphered and provided photographic copies
to a number of experts. However, the book still remains unread.
Since then, the manuscript has been known as the "Voynich
manuscript".
It later became known that Voynich bought the manuscript from the Jesuit college at the Villa Mondragone in Frascati, Italy, and that it had originally belonged to the Collegium Romanum. Attached to the manuscript was a letter in Latin, dated 1666 (or 1665), from Johannes Marcus Marci of Cronland, rector of the University of Prague, to Athanasius Kircher S.J., a Jesuit priest and scholar in Rome. In it Marci offered the manuscript for decryption and mentioned that it had been bought by Emperor Rudolf II of Bohemia (1552-1612) for 600 gold ducats. The letter also implied that Kircher already knew about the manuscript, and had exchanged a letter and some transcribed portions with its previous owner, who did not want to send him the whole manuscript. The letter further mentioned that Roger Bacon (the Franciscan friar who lived from 1214 to 1294) had been considered as the author, but that in any case the manuscript had not been read.
A few pieces of circumstantial evidence connect the manuscript with John Dee (1527-1608), who visited the court in Prague in 1582-1586 and was an admirer of Bacon and a collector of his manuscripts. Dee stated in his diary that he had 630 ducats in October 1586 (Keynes, 1931), and his son remembered that while in Bohemia his father owned a "booke containing nothing butt Hieroglyphicks, which booke his father bestowed much time upon: but I could not heare that hee could make it out" (Browne, in Keynes, 1931). Could this be the manuscript? No other known manuscript fits this description. Furthermore, according to A. G. Watson (Roberts & Watson, 1990), the folio numbering is in Dee's hand.
On the lower margin of the first folio of the manuscript, under special illumination, the erased ownership signature of Jacobus de Tepenecz and the word "Prag" were found. Tepenecz was the director of Rudolf's botanical gardens, and he must have owned the manuscript between 1608, when he received his title "de Tepenecz", and 1619, when he fled from Prague. It is uncertain whether he owned it before or after Emperor Rudolf (who abdicated in 1611).
At this point the story becomes unclear. The person who would not send the manuscript to Kircher is almost certainly the little-known alchemist Georg Barsch, who bequeathed his entire alchemical library to Marci. It is not fully certain what happened to the manuscript between Marci's gift and 1912. Most probably Kircher, lacking the time to study it, filed it with his other correspondence. It was found by P. Beckx S.J. (1795-1887, general of the Society of Jesus from 1853 to 1883), who seems to have rescued it from confiscation by Vittorio Emanuele's soldiers in 1870, together with other valuable manuscripts (among them Kircher's collected correspondence). This collection was finally sold in 1912, partly to the Bibliotheca Apostolica Vaticana and partly to W. Voynich.
In 1961 the book was bought by H. P. Kraus, a New York antiquarian bookseller, for $24,500; it was later valued at $160,000. Unable to find a buyer, Kraus donated it to Yale University in 1969, where it remains to this day in the Beinecke Rare Book and Manuscript Library under catalogue number MS 408.
Description of the manuscript
The manuscript originally had at least 116 folios, of which 104 remain. The folio size is 6 by 9 inches, but some folios are two or three times that size and are folded. There is also one large composite sheet six times this size (18 by 18 inches). Both the illustrations and the script of the manuscript are unique. As long as the script cannot be read, the illustrations are the only clue to the nature of the book. Judging from these illustrations, the manuscript appears to be a scientific book, mostly an illustrated herbal with some additional sections.
What is commonly called the Herbal
section fills about half the volume. It consists of page-filling
drawings of single plants with short paragraphs of text written
beside them. Occasionally, two plants are shown on a single page.
The layout is similar to that of traditional illustrated herbals.
While some of the drawings do resemble existing plants, most of
the drawings would appear to be fantastic compositions. Figure
1 shows a typical example (fol. 53r).
Next comes a section with Astronomical and so-called Cosmological drawings.
The astronomical pages feature
drawings of circular design, with images of the sun, the moon
and arrangements of stars. Cosmological drawings have a similar
layout but include other more abstract features such as rosettes,
tubes and pipes. A section of the astronomical pages (which is
usually called the Astrological section) has illustrations
of the zodiac, surrounded by circles of mostly nude female figures
holding stars.
The next apparent section of the manuscript has been called Biological, as it contains some odd, perhaps anatomical, drawings, including pipes and tubes resembling blood vessels, together with human figures, mostly nude females, similar to those in the astrological section. There have been suggestions that the illustrations represent medicinal baths. Figure 2, showing fols. 77v and 78r, is from this section.
These are followed by a few more herbal pages and a different section, called Pharmaceutical because it includes pictures of labelled containers and many small parts of plants, mainly roots and leaves.
Finally, the manuscript closes with what has been called the Recipes section, as it contains many (324) short paragraphs, each with a star in the margin (on average 15 per page). There have been suggestions that this section was some sort of calendar (or almanac), although if the two missing folios are added, the total number of paragraphs would probably be higher than 360 or 365: at about 15 paragraphs per page, the four missing pages would add some 60 paragraphs to the 324 that survive.
Figures 1 and 2 clearly show the style of the illustrations and the script used in the Voynich manuscript. Some characters resemble those of the roman alphabet (a, o, c, n, m), some look like numbers (2, 4, 8, 9), and others are similar to symbols used in the Middle Ages as Latin abbreviations or in alchemy. In addition, there are a few instances of extraneous writing, different from the main body of the manuscript, not in "Voynich script" and perhaps added later. Examples are the names of the months in the astrological section (in an unidentified Romance language) and three incomprehensible lines on the last folio, which may be a key to decryption, or an attempted decryption by one of the previous owners.
The character set is discussed in more
detail below.
Figure 1.
A photostat of folio 53r from the Friedman collection. Folio 53r is part of the "herbal" section of the manuscript. Note the similarity of some of the characters to roman letters (c, a, e, i, o) and numbers (4, 8, 9). The folio number (53) is supposedly in John Dee's hand. Reproduction by courtesy of the Marshall Foundation, Lexington, Virginia.
Figure 2.
A photostat of folios 77v and 78r from the Friedman collection. Folios 77v and 78r are part of the "biological" section. The drawings have been associated with sketches of anatomical organs: ovaries, uterus and blood vessels (top of f77v), intestinal tract (f77v, right) and male reproductive organs (left; after Guy). The line numbers next to the text were probably added to the photostats by Friedman's team; they do not appear in the original. Reproduction by courtesy of the Marshall Foundation, Lexington, Virginia.
Past attempts at decipherment
When the manuscript was first shown to expert cryptologists, they thought that solving it would be easy, as the text was composed of "words", some of which were more frequent than others and occurred in certain combinations (Kahn, 1967). This soon turned out to be a mistake: the text could not easily be converted into Latin, English, German or any of a host of other languages that might possibly underlie this document.
A first "solution" was announced in 1919 by William Romaine Newbold (Newbold, 1921), who caused a sensation by claiming that the manuscript did indeed contain the work of Roger Bacon, and that Bacon had known the use of the compound telescope and microscope, having seen the spiral structure of the Andromeda galaxy (!), visible only with modern telescopes, and cell structures unknown in the 13th century. This solution was eventually disproved (Manly, 1931).
The attempts to crack the code, however, were not over. In 1931, Mrs. Voynich took a photostat copy of the manuscript to the Catholic University in Washington, where Fr. Theodore Petersen reproduced it photographically and started a complete hand transcription of the manuscript, with a card index to the words and lists of concordances. The transcription alone was reported to have taken him four years. Unfortunately, it is not known what conclusion, if any, he reached.
In 1944, Hugh O'Neill, a renowned botanist at the Catholic University, identified various plants depicted in the manuscript as New World species, in particular an American sunflower and a red pepper (O'Neill, 1944). This would place the manuscript after 1493, when Columbus brought the first sunflower seeds to Europe. However, the identification is not certain: the red pepper is coloured green, and the sunflower identification is equally contested.
Other people involved in the study of the manuscript were prominent cryptologists such as W. Friedman and J. Tiltman, who independently arrived at the hypothesis that the manuscript was written in an artificial, constructed language. This hypothesis was based on the structure of the "words", described below. However, such artificial languages were devised at least a century after the probable date of the Voynich manuscript. The only exception, the 'Lingua Ignota' of Hildegarde of Bingen (1098-1179), predates the Voynich manuscript by several centuries, but this language does not exhibit the structure observed by Friedman and Tiltman, and it provides only nouns and a few adjectives.
Friedman came to know Petersen, who at some point gave him his hand transcription and other material. After Friedman's death, all this material went to the W.F. Friedman collection of the Marshall Foundation. Recently, electronic versions of the transcriptions made by Friedman's groups were produced from the typed sheets and made available on the Internet (Reeds, 1995).
Later proclaimed solutions have seen in the manuscript a simple substitution cipher that can decode only isolated words (Feely, 1943), the first use of a more or less sophisticated cipher (Strong, 1945; Brumbaugh, 1977), a text in vowel-less Ukrainian (Stojko, 1978) or the only surviving document of the Cathar movement (Levitov, 1987). None of them, however, has produced an acceptable plaintext.
Some interesting new insights into the manuscript were provided in the 1970s by Prescott Currier, who presented some of his results at an informal Voynich manuscript symposium at the National Security Agency in Washington (D'Imperio, 1978). Basing his findings on the statistical properties of the text, he showed that the manuscript is written in two distinct "languages", which he simply called A and B. Each bifolio was written in one of the two, and bifolios in the same "language" were generally grouped together. Only the herbal section contains a mixture of A and B folios. From the characteristics of the writing, he showed that the manuscript seems to have been written in two distinct "hands", and he even suggested there could be as many as five or even eight different hands. A significant feature is that the hand and the language used on each folio are fully correlated. Currier concluded that at least two people were involved in writing the Voynich manuscript (which he considered a point against the "hoax theory" summarised below), although the manuscript could alternatively have been written by one person in two distinct periods.
Because of the lack of success in deciphering it, a number of people have proposed that the manuscript is a "hoax". It could be either a 16th century forgery, made to be sold for a hefty sum to Emperor Rudolf II, who was interested in rare and unusual items (Brumbaugh, 1977, deriving from earlier unpublished theories), or a more recent one by W. Voynich himself (Barlow, 1986). The latter is effectively excluded both by expert dating of the manuscript and by the evidence of its existence before 1887.
One problem with the earlier hoax theory is that, as will be shown, certain word statistics (Zipf's laws) found in the manuscript are characteristic of natural languages. In other words, it is unlikely that a 16th century forgery would "by chance" produce a text that follows Zipf's laws, which were first postulated only in 1935.
Since 1990, a multidisciplinary group
of varying size, generally between 100 and 200 individuals, dispersed
all around the globe and connected through the Internet, has maintained
an electronic mail forum on the decipherment of the Voynich manuscript.
This has led to a lively exchange of ideas and the definition
of two main goals: a machine readable transcription of the manuscript
text and the study of the text through numerical experiments.
The following sections relate to these issues.
Transcription
To be able to analyse the text with
modern tools, a machine-readable representation of the text is
needed. Some partial transcriptions were produced in the past,
by study groups set up by Friedman (in particular his so-called
First Study Group or FSG) and by Currier and D'Imperio. The authors
decided to proofread the existing transcriptions and add the missing
parts from the Petersen copy (with the kind permission of the
Marshall Foundation) using a newly designed transcription alphabet.
To facilitate the transcription process, this new alphabet (called
EVA) assigns roman letters to the different characters such that
the combinations that appear in the manuscript are mostly pronounceable.
Transcription of the Voynich manuscript is made difficult by a few problems. Since the alphabet is not known, one cannot be certain which are the basic components, i.e. the single characters. For the same reason, it is never clear whether two similar characters are really different, or in fact the same character with a variation in the handwriting. Over-differentiation leads to an excessively large character set and an extremely difficult transcription exercise, especially since the variations in the way characters are written form a continuum of possibilities, and no clearly defined rules for distinguishing one from another can be given. Under-differentiation, on the other hand, must be avoided, because once two different characters have been transcribed the same way, the error cannot be undone. The transcription alphabet must therefore be carefully designed.
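The asymmetry between the two kinds of error can be made concrete with a toy Python sketch. The glyph labels `a1` and `a2` are invented for illustration and are not EVA letters: a fine-grained transcription can always be collapsed afterwards with a many-to-one mapping, but once two distinct glyphs share one label, no mapping can tell them apart again.

```python
# Toy illustration with hypothetical glyph labels: "a1" and "a2" stand for two
# variants that a careful transcriber chose to keep apart.
MERGE = {"a1": "a", "a2": "a", "o": "o"}


def coarsen(glyphs, merge):
    """Collapse a fine-grained transcription using a many-to-one mapping."""
    return [merge.get(g, g) for g in glyphs]


def is_reversible(merge):
    """A mapping can be undone only if no two labels map to the same letter."""
    return len(set(merge.values())) == len(merge)


fine = ["a1", "o", "a2"]
print(coarsen(fine, MERGE))   # ['a', 'o', 'a']
print(is_reversible(MERGE))   # False: each 'a' could have been a1 or a2
```

Over-differentiation therefore costs effort at transcription time, but no information.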
Another, smaller, problem is that the text includes obvious "ligatures" of basic characters, and some "rare" characters, which occur only a few times in the entire text (about 250,000 characters long). The EVA alphabet (designed by Zandbergen with the aid of Guy, priv. comm.) tends to break the text into smaller components than most of the previously existing alphabets. The basic alphabet, which allows the transcription of the vast majority of the manuscript, is displayed in Table 1, using a computer font designed by Landini. It includes some Voynich characters which are most probably single characters but are represented in the transcription alphabet by several letters. The rarer characters, some of which are embellishments or variations of the basic characters, have been added as an extension to the alphabet, so that it can reproduce the manuscript fully (see the on-line document at http://web.bham.ac.uk/G.Landini/evmt/eva.htm).
Table 1. The transcription alphabet used by the authors.
Note: to display the characters of the Voynich script,
the TrueType font 'EVA Hand 1' needs to be available to your browser
Ligatures are indicated by grouping several characters between parentheses, e.g. (c'y) or (ith), or by using capitals for characters which connect to the right, in these cases C'y or ITh; in the computer font these sequences are displayed as joined glyphs. Rare characters are coded as a numerical value between an ampersand and a semicolon, e.g. (c&179;h), which looks like c³h. Words are separated by a full stop, while a comma is used where the word break is uncertain.
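These conventions are regular enough to be handled by a few lines of code. The following Python sketch is not one of the authors' tools; its function names are hypothetical, and it assumes only the conventions listed above, applied to a line whose locus identifier (such as <f53r.1>) has already been stripped. The other punctuation visible in the folio 53r sample below (`-` and `=`) is not explained in this article, so the sketch simply treats it as a word break, and it keeps an apostrophe attached to the preceding letter.

```python
import re

# One glyph is a ligature group "(...)", a rare-character code "&NNN;", or a
# single letter, optionally followed by an apostrophe (as in "c'y").
GLYPH = re.compile(r"\([^)]*\)|&\d+;|[A-Za-z]'?")

# "." is a certain word break and "," an uncertain one. Treating space, "-"
# and "=" as plain breaks is an assumption of this sketch.
SEPARATORS = set(".,-= ")


def split_words(line):
    """Split one EVA line into (word, break_before_is_certain) pairs."""
    words, current, certain = [], "", True
    for ch in line.strip():
        if ch in SEPARATORS:
            if current:
                words.append((current, certain))
                current, certain = "", True
            if ch == ",":
                certain = False
        else:
            current += ch
    if current:
        words.append((current, certain))
    return words


def glyphs(word):
    """Break one EVA word into transcribed glyphs."""
    return GLYPH.findall(word)


for word, certain in split_words("dol.dain.s,cho,she.oty"):
    print(word, glyphs(word), "" if certain else "(uncertain break)")
```

Capitals are left as they are; a study that does not care about joined forms could lower-case each glyph before counting.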
To illustrate the use of the EVA alphabet, here is the first paragraph of folio 53r (Figure 1), transcribed in EVA and displayed using the computer font:
Table 2.
kodam,chocThody.oty
The new transcription using the roman alphabet:
<f53r.1> kodam,chocthody.oty
<f53r.2> dol.dain.s,cho,she.oty
<f53r.3> sho,os.chokan.ody
<f53r.4> ytchodaiin.yky.otchey.otod
<f53r.5> oksh.otol.cfhy.cphodol.ykody.qokchod. -otcho.qot.oty
<f53r.6> ykeodar.oqoor.o(cki).odor.chain.qokod. -ykchdy.chees.dal
<f53r.7> sodar.otos.qoy.tchy.otey.chos.okod. -ykchody.qokchy
<f53r.8> qotchol.dar.qoty.chtor.oltsho.(cto). -ykeeod.o.y,toyd
<f53r.9> otol.chol.ctheees.os.orol.chod.qoty=
Analysis of the text
In the past, analysis of the Voynich manuscript text has been based either on visual inspection of the manuscript pages or on numerical analysis of partial transcriptions. Expert analysis of the manuscript has yielded surprisingly few hard facts. This is of course due to the uniqueness of the manuscript and the absence of the usual indications of provenance and age. The current belief about its provenance locates it in Central Europe, while some experts narrow it down to either Germany or Italy. Its age is generally given as late 15th to early 16th century, although others would not exclude the 13th or even the late 16th century. Proposed writers of the manuscript include Roger Bacon, an unknown North Italian or German alchemist or quack, John Dee, Giordano Bruno and Anthony Askham.
It seems clear that the best hope of solving the mystery of this manuscript lies in a numerical analysis of the text, such as character and word statistics, concordances and correlations across the pages, as well as the identification of the captions (single words or 'labels') which occur near many of the illustrations. So far, there are some established facts:
Character statistics
The alphabet size appears to be of the order of 23 to 30 characters, depending on what one calls a single character. It has not been possible to identify Voynich characters which obviously represent numbers. The frequency distribution of the single characters is rather similar to that of 'normal' languages, and it is possible to identify vowels and consonants tentatively, such that the text is mostly pronounceable. Repeated single characters (doublets) are extremely rare, with the exception of i and e, which usually occur two or three times in a row and which perhaps represent single characters. For example, the double and triple i (in and iin) closely resemble a cursive n and m respectively.
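Counts of this kind are easy to reproduce on a machine-readable transcription. A minimal Python sketch, assuming the text has already been split into EVA words and counting plain EVA letters; the results therefore depend on the transcription alphabet (for instance, iin counts as two i and one n). The function name is hypothetical.

```python
from collections import Counter


def char_stats(words):
    """Return single-letter frequencies and doublet counts for EVA words.

    A doublet is the same letter twice in a row inside a word; a run of
    three (e.g. "eee") therefore counts as two overlapping doublets.
    """
    freq, doublets = Counter(), Counter()
    for word in words:
        freq.update(word)
        for a, b in zip(word, word[1:]):
            if a == b:
                doublets[a] += 1
    return freq, doublets


# A few words from folio 53r, as transcribed in this article.
sample = ["ytchodaiin", "chees", "oqoor", "ctheees", "otol", "qokchod"]
freq, doublets = char_stats(sample)
print(freq.most_common(3))
print(doublets)
```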
Character entropy
Entropy, in information-theory terms, is a numerical measure of the uncertainty in a sequence or string of characters. For example, the sequence "aaaaaaaaaaaa..." is very predictable (low entropy): the next character is very likely to be "a". The sequence "dkjtgarltsoy..." is not (high entropy). All natural languages have a certain degree of redundancy, and interestingly this is also the case for the manuscript text. However, its so-called second-order entropy (the amount of uncertainty about the next character, given that the current one is known) is unusually low compared with European languages (Bennett, 1976). This reflects the fact that the character sequences within words tend to follow set patterns, and it has been taken as an indication that the words are 'constructed' in a way similar to some of the artificial languages of the 17th century.
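Both quantities can be estimated from letter and letter-pair counts: the first-order entropy is H1 = -Σ p(x) log2 p(x), and the second-order (conditional) entropy is H2 = -Σ p(x,y) log2 p(y|x), where p(y|x) is the probability that y follows x. The Python sketch below is one straightforward estimator under these definitions, not necessarily the procedure used by Bennett (1976); its results depend on whether spaces are counted and on how the text was transcribed.

```python
import math
from collections import Counter


def entropies(text):
    """Estimate first-order and conditional second-order entropy in bits.

    H1 = sum p(x) log2(1 / p(x))
    H2 = sum p(x, y) log2(1 / p(y | x)), the uncertainty about the next
    character when the current one is known.
    """
    if len(text) < 2:
        raise ValueError("need at least two characters")
    singles = Counter(text)
    pairs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])  # how often each character starts a pair
    n1, n2 = len(text), len(text) - 1
    h1 = sum(c / n1 * math.log2(n1 / c) for c in singles.values())
    h2 = sum(c / n2 * math.log2(firsts[x] / c) for (x, _), c in pairs.items())
    return h1, h2


print(entropies("aaaaaaaaaaaa"))  # (0.0, 0.0): fully predictable
print(entropies("abababababab"))  # (1.0, 0.0): each letter fixes the next
```

The second example shows how a text can have a reasonable single-letter entropy while its second-order entropy is very low, which is the kind of behaviour described above for words built to set patterns.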
Word statistics
Assuming that the "spaces" between words are truly word separators, words tend to be somewhat shorter than in Latin or English. There is a large amount of repetition in the text, and the same word may be repeated up to four times in a row. Additionally, words which differ by only one character are often found near each other. It has been speculated that this sort of repetition could be due to the encoding of numerical values (five, five), to prayers (amen, amen, etc.), or to the presence of magical formulas or recipes. Because of the short word length, there are theories that the spaces separate syllables rather than words. Alternatively, because of the restricted set of word-initial and word-final characters, it is possible that the spaces are of an orthographic nature, as is the case in the Arabic script. Still, both the number of different words and the number of words occurring only once in the text are within reasonable bounds for a text of this length.
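The quantities mentioned in this paragraph (word lengths, immediate repetitions, vocabulary size, words occurring only once, and the sets of word-initial and word-final characters) can be computed as in the Python sketch below. It assumes the text has already been split into words at the spaces of the transcription; the function names, the three-word window standing in for "vicinity", and the simplification of "differing by one character" to a single substitution are all assumptions of the sketch. The example words are made up in an EVA-like style.

```python
from collections import Counter
from itertools import groupby


def word_stats(words):
    """Summarise a list of EVA words (lengths are counted in EVA letters)."""
    counts = Counter(words)
    return {
        "tokens": len(words),
        "types": len(counts),                                # different words
        "hapax": sum(1 for c in counts.values() if c == 1),  # words seen once
        "mean_length": sum(map(len, words)) / len(words),
        "longest_repeat": max(len(list(run)) for _, run in groupby(words)),
        "initials": Counter(w[0] for w in words),
        "finals": Counter(w[-1] for w in words),
    }


def nearby_variants(words, window=3):
    """Count pairs of words at most `window` positions apart that have the
    same length and differ in exactly one letter (substitutions only)."""
    def one_apart(a, b):
        return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

    return sum(
        one_apart(words[i], words[j])
        for i in range(len(words))
        for j in range(i + 1, min(i + 1 + window, len(words)))
    )


words = "qokedy qokedy qokeey daiin chol daiin".split()
stats = word_stats(words)
print(stats["types"], stats["hapax"], stats["longest_repeat"])
print(nearby_variants(words))
```

Because EVA breaks the script into relatively small components, word lengths measured in EVA letters are longer than they would be in an alphabet that treats, say, iin as a single character, so comparisons with Latin or English depend on the alphabet chosen.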
Word distribution
In 1935, Zipf described a number of relationships in texts that he suggested were due to a "principle of least effort" in the use of language. He showed that in normal writing, first, the frequency of a particular word is inversely proportional to its rank, once all the words have been sorted by decreasing frequency; secondly, the number of infrequent words is inversely proportional to their frequency; and finally, the number of syllables of a word is inversely proportional to its frequency of use. The first (and most important) characteristic is present in the Voynich manuscript text, as shown in Figure 3. Since a frequency f proportional to 1/r implies log f = log C - log r, the first Zipf law appears as a straight line with a slope of -1 on a log-log plot of frequency against rank. The characteristic small deviation from this line found in natural languages is also observed for the Voynich manuscript language.
Figure 3.
A rank-frequency plot (Zipf plot) of several texts: voynich (Friedman's First Study Group transcription), roget (Roget's Thesaurus), republic (Plato's Republic), Emma (Jane Austen), Alice (Alice in Wonderland, Lewis Carroll). All of them except Roget's Thesaurus, which is a collection of very short definitions and cross-referenced words, follow the same trend, a slope of -1 in the log-log plot. The two other Zipf "laws" were also found to hold for the Voynich manuscript (see http://web.bham.ac.uk/G.Landini/evmt/zipf.htm).
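A rank-frequency plot like Figure 3 is easy to produce: count the words, sort the counts in decreasing order, and plot log(frequency) against log(rank). The Python sketch below goes one step further and fits a straight line by least squares, so that the slope can be compared with -1. The function names are hypothetical, and fitting over all ranks (or up to an optional `max_rank`) is an assumption of the sketch, not necessarily how Figure 3 was made; in real texts the many words occurring only once form flat steps at the end of the curve, which can affect the fit.

```python
import math
from collections import Counter


def rank_frequency(words):
    """Return (rank, frequency) pairs, most frequent word first."""
    freqs = sorted(Counter(words).values(), reverse=True)
    return list(enumerate(freqs, start=1))


def zipf_slope(words, max_rank=None):
    """Least-squares slope of log10(frequency) against log10(rank)."""
    points = rank_frequency(words)[:max_rank]
    if len(points) < 2:
        raise ValueError("need at least two different words")
    xs = [math.log10(r) for r, _ in points]
    ys = [math.log10(f) for _, f in points]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


# Synthetic text in which the k-th most frequent word occurs 60 / k times.
text = [f"w{k}" for k in range(1, 7) for _ in range(60 // k)]
print(round(zipf_slope(text), 6))  # -1.0: an exact Zipf distribution
```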
Conclusion
It is possible that, if deciphered, the manuscript would reveal no more than the knowledge found in other mediaeval herbals and alchemy books. However, if the document is genuine, at least the people involved in writing it must have been able to read it. The fact that it still remains unread makes it quite unique, and a challenge in cryptological terms. What was the method of encoding? What is the original underlying language? Who wrote it? Why? It would also be interesting to know what kind of knowledge required such secrecy at the time of writing. It could even contain a subject completely unrelated to the drawings...
In any case, if a "readable" text is produced after some processing of the manuscript text, how can one be sure that the solution is correct? Could any of the solutions which have been announced be correct? Deciding on what basis to accept or reject a proposed solution is quite a problem, because there is no date, author, country or language associated with the manuscript. We will assume that:
Without all of these being true, there
seems to be no possibility of finding a solution.
No solution which fails to present a
detailed description of how the "encoding" was done
by the writer(s) of the manuscript can be accepted, and this is
where most of the proposed solutions fail. It is also clear that
neither the method of encoding nor the contents of the decoded
text may conflict with the context of a manuscript written in
medieval Europe (or elsewhere by a medieval European). Finally,
the solution should clearly explain the many odd statistical properties
found in the manuscript text, which could not be fully described
here, but which may be found in the literature on the subject
(Tiltman, 1967; Currier, 1976; D'Imperio, 1978).
It seems paradoxical that, at a time of rising concern about the public use of "uncrackable" security codes, a medieval manuscript probably cannot be read. Let us hope that this will not remain so for much longer.
References
Following are some of the more relevant works about the Voynich manuscript, which have been mentioned in the text. D'Imperio (1978) has a long bibliography of subjects which are possibly related to the manuscript. Furthermore, the most recent developments in the study of the Voynich manuscript may be found on the World-Wide Web, via the following sites:
http://web.bham.ac.uk/G.Landini/evmt/evmt.htm
Note: this site contains instructions on how to download the TrueType font 'EVA Hand 1' used in this article.
http://www.geocities.com/Athens/Delphi/8956/
The most up to date bibliography about the Voynich manuscript is maintained on the WWW site of Reeds:
http://www.research.att.com/~reeds/voynich.html
Acknowledgements
The illustrations of the Voynich manuscript
contained in this article show pages from W. Friedman's photostatic
copy owned by the William and Elizebeth Friedman collection in
the George Marshall Foundation, Lexington, Virginia, and were
reproduced with the kind permission of the owner.
Most of the numerical analyses summarised
in this article were performed by members of the Internet Voynich
mailing list.