Voynich MS - Analysis Section ( 2/5 )
2. Character Statistics
Introduction
This page first lists a number of observations about the Voynich MS character
statistics that may be found in the printed literature, and then concentrates on analyses
that have been performed by mailing list members:
- Character frequencies
- Vowel/consonant identification (Sukhotin)
- Entropy of characters / digraphs
- Cluster analysis of the Currier languages, by character (D'Imperio) or digraph (RZ)
- Was any other algorithm of Sukhotin used?
Observations in printed literature about character statistics
From D'Imperio
- The split gallows seem only to occur on first lines of paragraphs, and in labels.
- The same "word" may be repeated two, three or more times
- Many "words" differ by only one character and are found in each other's vicinity
- Certain symbols occur characteristically at the beginnings, middles or ends of words, and in certain preferred sequences
- Certain symbols appear very rarely, and only on certain pages
- There are very few doublets. Primarily: e or i and occasionally also y, d and o.
- There are very few single-letter words in the running text, primarily s and y.
- Prefix-like elements are found in front of certain "words" that also occur commonly without them. Such prefix-like elements are: qo, o and y
- The symbol q almost always precedes o, connected to it by an extension of the crossbar of the 4. This combination is found almost always at word start.
- On most herbal folios, the first paragraph usually starts with t, k, p or f, usually immediately followed by ch, Sh, o, y, aiin or dy.
- Labels very rarely start with t, k, p or f. Instead, they often start with o, d, y or sometimes s or ch.
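Observations of this kind (preferred positions within words, rarity of doublets) can be checked by simple counting. The following is a minimal sketch, not a method taken from the literature cited here. It assumes the text is available as a list of words in an EVA-style transliteration and treats every letter as one character, so groups such as ch, Sh or aiin are not counted as units. The function names and the toy word list are hypothetical.

```python
from collections import Counter

def positional_counts(words):
    """Count characters in word-initial, medial and final position.

    A single-letter word is counted as initial only (an arbitrary choice).
    """
    initial, medial, final = Counter(), Counter(), Counter()
    for word in words:
        if not word:
            continue
        initial[word[0]] += 1
        if len(word) > 1:
            final[word[-1]] += 1
        for ch in word[1:-1]:
            medial[ch] += 1
    return initial, medial, final

def doublets(words):
    """Count immediately repeated characters (such as 'ii') inside words."""
    counts = Counter()
    for word in words:
        for a, b in zip(word, word[1:]):
            if a == b:
                counts[a + b] += 1
    return counts

# Toy word list built from words quoted on this page; not a real text line.
sample = ["qokaiin", "daiin", "okaiin", "ar", "dy", "daiin"]
initial, medial, final = positional_counts(sample)
print("initial:", initial.most_common())
print("final:  ", final.most_common())
print("doublets:", doublets(sample).most_common())
```

On a full transliteration the same counts could be split by page or by first line of paragraph to test the statements about gallows characters and labels; that split is left out here because it depends on the file format used.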
From Currier's symposium presentation:
List, to be included.
From Tiltman
(Note: Tiltman treats f as a variant form of k and p as a variant form of t. In the following, characters or sequences in parentheses represent such variant forms).
- cKh (cFh) and cTh (cPh) appear to be infixes of k (f) and t (p) within ch. The variant symbol represented by j appears most commonly at the end of a line, rarely anywhere else.
- Paragraphs nearly always begin with k (f) or t (p), most commonly in the second variant forms, which also occur frequently in words in the top lines of paragraphs where there is some extra space.
- y occurs quite frequently as the initial symbol of a line followed immediately by a combination of symbols which seem to be happy without it in any part of a line away from the beginning. Otherwise it occurs chiefly before spaces, very frequently preceded immediately by d. Hence my belief that these two have some separative or conjunctive function. (I have to admit, however, that y also seems sometimes to take the place of o before k or t (though rarely, if ever, after q); this is particularly noticeable in some of the captions to illustrations in the astronomical section of the manuscript - these most commonly begin ok (of) or ot (op) and it is here that we occasionally see yk (yf) or yt (yp).)
- Regarding the second type of suffix, some of the combinations are so rare that I have been uncertain whether to take any account of them at all. Some are very common indeed. It seems to me that each of these combinations beginning with a has its own characteristic frequency which it maintains throughout the MS and independent of context (except in cases where two or more a groups are together in series, as referred to later). These a groups, e.g. ar or aiin, frequently occur attached directly to "roots", particularly ok (of), ot (op), d and s. okaiin (ofaiin), qokaiin (qofaiin) and daiin rank high among the commonest words in the MS.
Character frequencies
This must still be included.
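Until the frequency tables are added, the basic computation can be sketched as follows. This is an illustration only: it assumes a transliteration in which every letter stands for one character, it ignores white space, and the sample string is a toy example made from words quoted earlier on this page, not a real text line. The function name is hypothetical.

```python
from collections import Counter

def char_frequencies(text):
    """Return (character, count, relative frequency) tuples, most frequent first.

    White space is ignored; all other characters are counted one by one.
    """
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    return [(ch, n, n / total) for ch, n in counts.most_common()]

# Toy input; a real run would read a complete transliteration file.
sample = "qokaiin daiin okaiin"
for ch, n, rel in char_frequencies(sample):
    print(f"{ch}  {n:3d}  {rel:.3f}")
```

The resulting figures depend strongly on the transliteration alphabet chosen, since an alphabet that writes ch or iin as single symbols gives a different table from one that splits them.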
Vowel/consonant detection
Included here will be at least:
- Application of Sukhotin algorithm by Jacques Guy
- One or two pronounceable examples by Jacques
- On-line example of Mike Roe
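For orientation, Sukhotin's algorithm as it is commonly described can be sketched in a few lines. It counts how often each pair of different characters stands side by side within words, starts with every character classed as a consonant, and repeatedly promotes the consonant with the largest positive adjacency sum to vowel, reducing the sums of the remaining consonants by twice their adjacency count with the new vowel. The sketch below is not the implementation used by Jacques Guy or Mike Roe; the function name, the tie-breaking rule and the Latin-alphabet test words are assumptions made for illustration.

```python
from collections import defaultdict

def sukhotin(words):
    """Split the characters of `words` into (vowels, consonants) sets."""
    adj = defaultdict(lambda: defaultdict(int))
    letters = set()
    for word in words:
        letters.update(word)
        for a, b in zip(word, word[1:]):
            if a != b:  # doubled characters are skipped (zero diagonal)
                adj[a][b] += 1
                adj[b][a] += 1

    # Row sums of the symmetric adjacency matrix.
    sums = {c: sum(adj[c].values()) for c in letters}
    vowels = set()
    while True:
        consonants = [c for c in letters if c not in vowels]
        if not consonants:
            break
        # Ties are broken alphabetically (an arbitrary, assumed rule).
        best = max(consonants, key=lambda c: (sums[c], c))
        if sums[best] <= 0:
            break
        vowels.add(best)
        for c in consonants:
            if c != best:
                sums[c] -= 2 * adj[c][best]
    return vowels, letters - vowels

# Plain Latin-alphabet words, used only to check the sketch.
vowels, consonants = sukhotin(["banana", "tomato", "potato"])
print(sorted(vowels), sorted(consonants))
```

The algorithm relies only on the tendency of vowels and consonants to alternate, so on very short samples it can misclassify characters, and its output for the Voynich MS again depends on the transliteration alphabet used.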
Entropy
First treat Bennett,
then Dennis Stallings: understanding the second-order entropies of Voynich text (renew link),
then the commas by Gabriel / Jim. (Link lost here)
Then see the following contributions:
More to be included.
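As background to these contributions, the quantities involved can be computed as sketched below. The sketch assumes the usual definitions: the first-order entropy h1 is the Shannon entropy of the single-character distribution, and the second-order entropy h2 is the conditional entropy of a character given the preceding one, obtained as the entropy of the digraph distribution minus the entropy of the distribution of first characters. Whether word spaces are counted as characters is left to the caller. The function names are hypothetical and the test string is artificial.

```python
import math
from collections import Counter

def shannon_entropy(counts):
    """Shannon entropy in bits of the distribution given by a Counter."""
    total = sum(counts.values())
    return -sum(n / total * math.log2(n / total) for n in counts.values())

def character_entropies(text):
    """Return (h1, h2) for `text`.

    h1: entropy of single characters.
    h2: conditional entropy of a character given its predecessor,
        i.e. digraph entropy minus entropy of the digraphs' first characters.
    """
    h1 = shannon_entropy(Counter(text))
    digraphs = Counter(zip(text, text[1:]))
    firsts = Counter(text[:-1])
    h2 = shannon_entropy(digraphs) - shannon_entropy(firsts)
    return h1, h2

# Artificial, perfectly alternating text: two equally frequent characters,
# but each character is fully determined by the one before it.
h1, h2 = character_entropies("abababab")
print(f"h1 = {h1:.3f} bits, h2 = {h2:.3f} bits")
```

For short texts these estimates are biased downwards, because many possible digraphs never occur in the sample, so values for texts of different lengths should be compared with care.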
Any other business
- Line-initial/final and word-initial/final properties, to be included
- Antoine Casanova's work, not sure if it belongs here or elsewhere
Rest being written
Copyright René Zandbergen, 2002
Latest update: 2002/10/05