It is always good practice to consider the transitions from letter to letter, using dummy letters to mark word boundaries.
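
Here is a minimal sketch of the idea in Python. It assumes each word is stored as a plain string in the transliteration used in this post. The boundary symbols ^ and $ and the function name are my own illustrative choices, not part of any existing tool. Pairs whose first element is the start symbol give the word-initial letters, and pairs ending in the end symbol give the final ones.

```python
from collections import Counter

# Hypothetical boundary symbols: any characters that never occur in the transliteration will do.
START, END = "^", "$"


def transitions(words):
    """Count letter-to-letter transitions, padding each word with boundary symbols."""
    counts = Counter()
    for word in words:
        padded = START + word + END
        # zip pairs each symbol with its successor, e.g. "^b$" -> ("^", "b"), ("b", "$")
        counts.update(zip(padded, padded[1:]))
    return counts


# Three words quoted in this post (not the full 78-word corpus).
# Diacritics such as ")" are treated as ordinary symbols here.
pairs = transitions(["cMN)", "42bJ", "b"])
print(pairs[(START, "c")], pairs[("J", END)], pairs[(")", END)])  # 1 1 1
```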

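Starting with the initial letter (the transition from the word-start symbol to the first letter), “Frequency” is the share of the 78 word-initial positions taken by each letter, and “Tendency” is the percentage of all occurrences of that letter that fall in first position: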

Index  Letter  Count  Frequency  Tendency
    1  3          14    17.94 %  14/21 =  66.66 %
    2  A          13    16.66 %  13/19 =  68.42 %
    3  4           9    11.53 %  9/22  =  40.90 %
    4  N           5     6.41 %  5/15  =  33.33 %
       X           5     6.41 %  5/10  =  50.00 %
       Z           5     6.41 %  5/6   =  83.33 %
       d           5     6.41 %  5/5   = 100.00 %
    8  U           4     5.12 %  4/7   =  57.14 %
       L           4     5.12 %  4/16  =  25.00 %
   10  W           3     3.84 %  3/10  =  30.00 %
       2           3     3.84 %  3/25  =  12.00 %
   12  7           2     2.56 %  2/16  =  12.50 %
       G           2     2.56 %  2/5   =  40.00 %
   14  q           1     1.28 %  1/4   =  25.00 %
       M           1     1.28 %  1/7   =  14.28 %
       c           1     1.28 %  1/7   =  14.28 %
       b           1     1.28 %  1/15  =   6.66 %
Total             78   100.00 %
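
The Frequency and Tendency columns are plain ratios: the letter's word-initial count divided by the 78 words, and by the letter's total number of occurrences. The figures appear to be truncated rather than rounded (14/78 = 17.948… is listed as 17.94 %), so the sketch below truncates too; for the letter 4 this gives 9/78 = 11.53 %. It assumes ( and ) are the only diacritics and that c is a letter; the helper names are hypothetical.

```python
from collections import Counter

DIACRITICS = "()"  # assumption: ( and ) are the only diacritics; c is treated as a letter
N_WORDS = 78       # number of words in the corpus, as stated in the post


def pct(num, den):
    """Percentage truncated (not rounded) to two decimals, using integer arithmetic."""
    return (10000 * num // den) / 100


def initial_and_total_counts(words):
    """Word-initial counts and total occurrences of each letter, ignoring diacritics."""
    initial = Counter(w[0] for w in words if w)
    total = Counter(ch for w in words for ch in w if ch not in DIACRITICS)
    return initial, total


def table_row(count, occurrences, n_words=N_WORDS):
    """Frequency (share of the word initials) and Tendency (share of the letter's occurrences)."""
    return pct(count, n_words), pct(count, occurrences)


# (initial count, total occurrences) copied from the first three rows of the table
for letter, (count, occ) in {"3": (14, 21), "A": (13, 19), "4": (9, 22)}.items():
    print(letter, *table_row(count, occ))
# 3 17.94 66.66
# A 16.66 68.42
# 4 11.53 40.9
```

Integer floor division avoids floating-point surprises when truncating (multiplying a float by 100 can land just below an exact value).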

Things to note:

  • There is a word starting with c, cMN) in frame 2728, which suggests that c is not a diacritic, or at least not a diacritic that works the same way as ( and );
  • There is a very strong tendency for 3 and A, two very common letters, to be in the onset of syllables;
  • The same is true for Z which, as we have seen (and will study further), is always found before L: the only case where Z is not the first letter is dZL, which suggests a syllable structure d?Z? (I’m approximating regex notation, as I believe more programmers than linguists will be reading this);
  • Any further assumption has to contend with the small sample, but we should at least note that 4 shows no particular tendency to appear in the onset, and that 2 and 7 (the most common letter and a medium-frequency one) clearly tend not to appear there;
  • Given that we have 78 words in the corpus, an equal distribution would give 3.25 word-initial occurrences to each of the 24 letters (I’m counting c as a letter). Once more, while the population is small, we can entertain the hypothesis that the letters in the group { g J 9 S 6 Q j }, none of which starts any word in the corpus, are not found in the onset of Beanish syllables (the same might be true for b, whose only word-initial occurrence is the single-letter word “b” in frame 2728). This group is similar to the { g 6 Q j M Z } group from the previous post, made of letters that do not seem to take diacritics. This suggests that the first letter of a syllable must be able to take a diacritic, which makes the hypothesis that diacritics are phonological marks more likely. These two groups, and in particular their intersection { g 6 Q j }, will be useful in discovering the syllable structure, and their members are probably consonants (assuming that Beanish phonology is similar to that of most European languages). If b represents a single phoneme (we cannot rule out that the script is alphabetic), it might be a syllabic consonant, such as the final ‘m’ in English “bottom”.
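
The set arithmetic behind the last point is easy to check in Python. This sketch assumes the 24-symbol inventory implied by the 3.25 figure: the 17 letters in the table plus the 7 that never appear word-initially, with c counted as a letter and ( and ) left out.

```python
# The 17 letters that occur word-initially (first table) and the 7 that never do
initial_letters = set("3A4NXZdULW27GqMcb")
never_initial = set("gJ9S6Qj")
# Letters that did not seem to take diacritics, from the previous post
no_diacritics = set("g6QjMZ")

alphabet = initial_letters | never_initial  # assumption: this is the full inventory, c included
print(len(alphabet), 78 / len(alphabet))      # 24 3.25
print(sorted(never_initial & no_diacritics))  # ['6', 'Q', 'g', 'j']
```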

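We can perform the same analysis with the transition to the end symbol. Here “Count” ignores diacritics, so a letter followed only by a diacritic still counts as final, while “Pure Count” only counts a letter when it is the very last character of the word (see the case of d, discussed below):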

Index  Letter  Count  Pure Count  Frequency  Tendency          Pure Tendency
    1  2          11          10    14.10 %  11/25 =  44.00 %  10/25 =  40.00 %
    2  L           7           2     8.97 %  7/16  =  43.75 %  2/16  =  12.50 %
       J           7           6     8.97 %  7/8   =  87.50 %  6/8   =  75.00 %
    4  N           6           5     7.69 %  6/15  =  40.00 %  5/15  =  33.33 %
       b           6           4     7.69 %  6/15  =  40.00 %  4/15  =  26.66 %
    6  X           5           2     6.41 %  5/10  =  50.00 %  2/10  =  20.00 %
    7  g           4           4     5.12 %  4/9   =  44.44 %  4/9   =  44.44 %
       9           4           4     5.12 %  4/8   =  50.00 %  4/8   =  50.00 %
    9  S           3           3     3.84 %  3/6   =  50.00 %  3/6   =  50.00 %
       q           3           0     3.84 %  3/4   =  75.00 %  0/4   =   0.00 %
       U           3           2     3.84 %  3/7   =  42.85 %  2/7   =  28.57 %
       6           3           3     3.84 %  3/3   = 100.00 %  3/3   = 100.00 %
       7           3           3     3.84 %  3/16  =  18.75 %  3/16  =  18.75 %
       A           3           3     3.84 %  3/19  =  15.78 %  3/19  =  15.78 %
   15  4           2           1     2.56 %  2/22  =   9.09 %  1/22  =   4.54 %
       M           2           2     2.56 %  2/7   =  28.57 %  2/7   =  28.57 %
       c           2           0     2.56 %  2/7   =  28.57 %  0/7   =   0.00 %
   18  d           1           0     1.28 %  1/5   =  20.00 %  0/5   =   0.00 %
       G           1           1     1.28 %  1/5   =  20.00 %  1/5   =  20.00 %
       3           1           1     1.28 %  1/21  =   4.76 %  1/21  =   4.76 %
       j           1           1     1.28 %  1/1   = 100.00 %  1/1   = 100.00 %
Total             78               100.00 %
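
The difference between the two count columns comes down to whether trailing diacritics are stripped before taking the last character. Here is a sketch, again assuming ( and ) are the only diacritics and that they always follow the letter they modify (as in d) or cMN)); the function name is hypothetical. The Tendency columns are then each count divided by the letter's total occurrences, as in the first table.

```python
from collections import Counter

DIACRITICS = "()"  # assumption: ( and ) are the only diacritics


def final_counts(words):
    """Count: last letter after stripping trailing diacritics.
    Pure Count: the same letter, but only when nothing had to be stripped."""
    count, pure = Counter(), Counter()
    for word in words:
        stripped = word.rstrip(DIACRITICS)
        if not stripped:  # a "word" made only of diacritics has no final letter
            continue
        count[stripped[-1]] += 1
        if stripped == word:
            pure[stripped[-1]] += 1
    return count, pure


# Three words quoted in this post (not the full corpus)
count, pure = final_counts(["cMN)", "42bJ", "b"])
print(count["N"], pure["N"], count["J"], pure["J"])  # 1 0 1 1
```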

Comments:

  • There is a single occurrence of d in final position (frame 2664), but there it carries the diacritic ). This would seem to confirm that d is a consonant and that the ) diacritic is a vowel.
  • The high frequency of J in the final position is due to the word 42bJ (“water”), which is repeated many times.
  • We can make some new groups: first, the letters that can take a diacritic in the coda but usually do not: { 2 J N G 3 j }; second, the letters that may or may not take a diacritic in the coda: { L b X U 4 }; third, the letters that don’t seem to take diacritics in the coda: { g 9 S 6 7 A M }; fourth, the letters that apparently must have a diacritic to appear in the coda (or that perhaps are the nucleus of the syllable, with the diacritic serving as the coda): { q c d }.
  • The letter q, with a diacritic, seems firmly fixed in final position: the only word where it is not at the very end is q9, in frame 2728.
  • We are by now pretty certain that 6 is only found in final position.
  • Among the most common letters, 2 is very common in final position, A is somewhat common, and 4 and 3 are not very common. This might confirm that 2 is a vowel, the most common vowel in the language, and that 4 and 3 are consonants, in a language that might favor a standard CV syllable structure. It is impossible not to be tempted to apply English letter frequencies (etaoin shrdlu, anyone?) and guess that 2 is /e/, 4 is /t/ and 3 is /s/, but that is just a wild guess, not to mention that I am working under the assumption that the Beanish script is phonological, or at least closer to Spanish and Italian than to English or French. Does anyone have a frequency list of phonemes (not letters) for these languages? Might be time to scrape Wiktionary…
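
The coda groups can be approximated mechanically from the Count and Pure Count columns. This sketch is only a rough stand-in for that reasoning: the 0.75 cut-off between “usually bare” and “either” is a hypothetical threshold chosen here to separate the first two groups, not something derived from the data. It reproduces the second and fourth groups exactly. The difference is that G, 3 and j, each with a single final occurrence and no diacritic, land among the letters with no diacritic observed rather than in the first group.

```python
from collections import defaultdict

# (Count, Pure Count) for each letter in final position, copied from the table above
FINAL = {
    "2": (11, 10), "L": (7, 2), "J": (7, 6), "N": (6, 5), "b": (6, 4),
    "X": (5, 2), "g": (4, 4), "9": (4, 4), "S": (3, 3), "q": (3, 0),
    "U": (3, 2), "6": (3, 3), "7": (3, 3), "A": (3, 3), "4": (2, 1),
    "M": (2, 2), "c": (2, 0), "d": (1, 0), "G": (1, 1), "3": (1, 1), "j": (1, 1),
}

USUALLY_BARE = 0.75  # hypothetical threshold on the share of bare (diacritic-free) endings


def coda_group(count, pure):
    """Classify a letter by how often it ends a word without a trailing diacritic."""
    if pure == 0:
        return "must take a diacritic"
    if pure == count:
        return "no diacritic seen"
    if pure / count >= USUALLY_BARE:
        return "usually bare"
    return "either"


groups = defaultdict(list)
for letter, (count, pure) in FINAL.items():
    groups[coda_group(count, pure)].append(letter)

for name, letters in groups.items():
    print(f"{name}: {' '.join(letters)}")
# usually bare: 2 J N
# either: L b X U 4
# no diacritic seen: g 9 S 6 7 A M G 3 j
# must take a diacritic: q c d
```

With single-occurrence letters the rule cannot tell “never” from “rarely”, which is exactly where the sample size bites.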