Statistics 3 – Letters | Deciphering Beanish

It is always good practice to look at the transitions from letter to letter, using fake letters to mark word boundaries.
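For the programmers among the readers, this bookkeeping can be sketched in a few lines. The sketch assumes that the corpus is transcribed as one string per word, with each Beanish character as one Python character; the boundary symbols `^` and `$` are arbitrary stand-ins for the fake start and end letters. The sample words are character sequences quoted later in this post, used purely for illustration; they are not the 78-word corpus.

```python
from collections import Counter

START, END = "^", "$"  # fake letters marking word boundaries (arbitrary symbols)

def transitions(words):
    """Count letter-to-letter transitions, including word-start and word-end."""
    counts = Counter()
    for word in words:
        padded = [START, *word, END]
        # zip pairs each symbol with the one that follows it
        counts.update(zip(padded, padded[1:]))
    return counts

# Illustrative sequences quoted in the post, not the full corpus
words = ["42bJ", "q9", "dZL"]
t = transitions(words)
print(t[(START, "q")], t[("J", END)], t[("Z", "L")])  # 1 1 1
```

The first and last rows of this transition matrix, i.e. the pairs starting with `^` or ending with `$`, are what the two tables in this post summarize.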

Let us start with the initial letter, i.e. the transition from the word-start symbol to the first letter (“Tendency” denotes the percentage of all occurrences of a given letter that are in the first position):

Index	Letter	Count	Frequency	Tendency
1	3	14	17.94 %	14/21 = 66.66 %
2	A	13	16.66 %	13/19 = 68.42 %
3	4	9	11.53 %	9/22 = 40.90 %
4	N	5	6.41 %	5/15 = 33.33 %
	X	5	6.41 %	5/10 = 50.00 %
	Z	5	6.41 %	5/6 = 83.33 %
	d	5	6.41 %	5/5 = 100 %
8	U	4	5.12 %	4/7 = 57.14 %
	L	4	5.12 %	4/16 = 25.00 %
10	W	3	3.84 %	3/10 = 30.00 %
	2	3	3.84 %	3/25 = 12.00 %
12	7	2	2.56 %	2/16 = 12.50 %
	G	2	2.56 %	2/5 = 40.00 %
14	q	1	1.28 %	1/4 = 25.00 %
	M	1	1.28 %	1/7 = 14.28 %
	c	1	1.28 %	1/7 = 14.28 %
	b	1	1.28 %	1/15 = 6.66 %
	Total	78	100 %
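The Frequency and Tendency columns can be recomputed from raw counts. This is a minimal sketch, again assuming one string per word; `pct` and `initial_table` are hypothetical helper names. `pct` truncates to two decimals instead of rounding, which is what the table values appear to do (14/78 is 17.948…, listed as 17.94 %). The Tendency denominator is the total number of occurrences of the letter anywhere in the corpus.

```python
from collections import Counter

def pct(n, d):
    """n/d as a percentage, truncated (not rounded) to two decimals."""
    hundredths = n * 10000 // d  # integer arithmetic avoids float surprises
    return f"{hundredths // 100}.{hundredths % 100:02d}"

def initial_table(words):
    """Rows of (letter, count, frequency %, tendency %) for word-initial letters."""
    initial = Counter(word[0] for word in words if word)
    overall = Counter(ch for word in words for ch in word)  # all occurrences
    total_words = sum(initial.values())
    return [
        (letter, n, pct(n, total_words), pct(n, overall[letter]))
        for letter, n in initial.most_common()
    ]

# Spot-checks against the table: 14/78, 14/21 and 1/7
print(pct(14, 78), pct(14, 21), pct(1, 7))  # 17.94 66.66 14.28
```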

Things to note:

There is a word that starts with c, namely cMN) in frame 2728, which suggests that c is not a diacritic, or at least that it is a diacritic that works differently from ( and );
There is a very strong tendency for 3 and A, two very common letters, to be in the onset of syllables;
The same is true of Z, which, as we have seen (and will study in more detail), is always found before L; the only case in which Z is not the first letter is dZL, which suggests a syllable structure d?Z? (I’m approximating regex notation, as I believe there will be more programmers than linguists reading this);
Any other conclusion has to contend with the small sample, but we should at least note that 4 shows no tendency to be in the onset, and that 2 and 7 (the most common letter and a medium-frequency one) have a clear tendency not to be in the onset;
Given that we have 78 words in the corpus, an equal distribution would give 3.25 initial occurrences to each letter (I’m counting c as a letter). Once more, while the sample is small, the fact that the letters in the group { g J 9 S 6 Q j } never start a word allows the hypothesis that they are not found in the onset of Beanish syllables (the same might be true for b, whose only initial occurrence is the one-letter word “b” in frame 2728). This group is similar to the group { g 6 Q j M Z } which, as discussed in the previous post, collects the letters that do not seem to take diacritics. This suggests that the first letter of a syllable must be able to take a diacritic, which makes the hypothesis that diacritics are phonological marks more likely. These two groups, and in particular their intersection { g 6 Q j }, will be useful in discovering the syllable structure, and are probably consonants (assuming that Beanish phonology is similar to that of most European languages). If b represents a single phoneme (we cannot rule out that the script is alphabetic), it might be a syllabic consonant, such as the final ‘m’ in English “bottom”.
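The expected count and the group intersection above reduce to simple arithmetic and set operations. This sketch hard-codes the 24-letter inventory implied by the post (the 17 letters of the initial-letter table plus { g J 9 S 6 Q j }) and copies the initial counts from the table; the variable names are illustrative.

```python
# Word-initial counts from the first table; letters absent there never start a word
initial_counts = {
    "3": 14, "A": 13, "4": 9, "N": 5, "X": 5, "Z": 5, "d": 5, "U": 4, "L": 4,
    "W": 3, "2": 3, "7": 2, "G": 2, "q": 1, "M": 1, "c": 1, "b": 1,
}
never_initial = {"g", "J", "9", "S", "6", "Q", "j"}
alphabet = set(initial_counts) | never_initial  # c treated as a letter

words = sum(initial_counts.values())  # 78 words in the corpus
expected = words / len(alphabet)      # uniform expectation per letter
no_diacritics = {"g", "6", "Q", "j", "M", "Z"}  # group from the previous post

print(words, len(alphabet), expected)          # 78 24 3.25
print(sorted(never_initial & no_diacritics))   # ['6', 'Q', 'g', 'j']
```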

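We can perform the same analysis for the transition to the end symbol. “Count” ignores diacritics, so a letter followed by a diacritic at the end of a word still counts as final; “Pure Count” does not, so the same letter is not counted (see the comment on d below):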

Index	Letter	Count	Pure Count	Frequency	Tendency	Pure Tendency
1	2	11	10	14.10 %	11/25 = 44.00%	10/25 = 40.00%
2	L	7	2	8.97 %	7/16 = 43.75%	2/16 = 12.50%
	J	7	6	8.97 %	7/8 = 87.50%	6/8 = 75.00%
4	N	6	5	7.69 %	6/15 = 40.00%	5/15 = 33.33%
	b	6	4	7.69 %	6/15 = 40.00%	4/15 = 26.66%
6	X	5	2	6.41 %	5/10 = 50.00%	2/10 = 20.00%
7	g	4	4	5.12 %	4/9 = 44.44%	4/9 = 44.44%
	9	4	4	5.12 %	4/8 = 50.00%	4/8 = 50.00%
9	S	3	3	3.84 %	3/6 = 50.00%	3/6 = 50.00%
	q	3	0	3.84 %	3/4 = 75.00%	0/4 = 0.00%
	U	3	2	3.84 %	3/7 = 42.85%	2/7 = 28.57%
	6	3	3	3.84 %	3/3 = 100%	3/3 = 100%
	7	3	3	3.84 %	3/16 = 18.75%	3/16 = 18.75%
	A	3	3	3.84 %	3/19 = 15.78%	3/19 = 15.78%
15	4	2	1	2.56 %	2/22 = 9.09%	1/22 = 4.54%
	M	2	2	2.56 %	2/7 = 28.57%	2/7 = 28.57%
	c	2	0	2.56 %	2/7 = 28.57%	0/7 = 0.00%
18	d	1	0	1.28 %	1/5 = 20.00%	0/5 = 0.00%
	G	1	1	1.28 %	1/5 = 20.00%	1/5 = 20.00%
	3	1	1	1.28 %	1/21 = 4.76%	1/21 = 4.76%
	j	1	1	1.28 %	1/1 = 100%	1/1 = 100%
	Total	78	57	100 %
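The difference between the two columns can be sketched as follows. This assumes that ( and ) are the only diacritics (c is treated as a letter, as above) and that a diacritic is written after the letter it modifies, as in cMN) and d); `final_counts` is a hypothetical helper, and the sample words are quoted from the post for illustration only.

```python
from collections import Counter

DIACRITICS = "()"  # assumption: the only diacritics; c counts as a letter

def final_counts(words):
    """Return (count, pure_count) Counters of word-final letters.

    count: last letter once trailing diacritics are removed
    pure_count: last character as written, so a letter followed by a
    diacritic is not counted as final
    """
    count, pure = Counter(), Counter()
    for word in words:
        bare = word.rstrip(DIACRITICS)  # drop trailing diacritics only
        if bare:
            count[bare[-1]] += 1
        if word and word[-1] not in DIACRITICS:
            pure[word[-1]] += 1
    return count, pure

count, pure = final_counts(["cMN)", "42bJ", "q9"])
print(count["N"], pure["N"])  # 1 0: N ends cMN) but carries a diacritic
```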

Comments:

There is a single occurrence of d in final position (frame 2664), and in that case it carries the diacritic ). This would seem to confirm that d is a consonant and that the ) diacritic is a vowel.
The high frequency of J in the final position is due to the word 42bJ (“water”), which is repeated many times.
We can make some new groups: first, the letters that can take a diacritic in the coda but usually do not: { 2 J N G 3 j }; second, the letters that may or may not take a diacritic in the coda: { L b X U 4 }; third, the letters that don’t seem to take diacritics in the coda: { g 9 S 6 7 A M }; fourth, the letters that apparently must have a diacritic to appear in the coda (or that, perhaps, are the nucleus of the syllable, with the diacritic serving as the coda): { q c d }.
The letter q, with a diacritic, seems strongly fixed in the final position: the only word in which it is not at the very end is q9, in frame 2728.
By now we are fairly certain that 6 is found only in final position.
Among the most common letters, 2 is very common in final position, A is somewhat common, and 4 and 3 are uncommon. This might confirm that 2 is a vowel (the most common vowel in the language) and that 4 and 3 are consonants, in a language that might favor a standard CV syllable structure. It is impossible not to be tempted to apply the letter frequencies of English (etaoin shrdlu, anyone?) and guess that 2 is /e/, 4 is /t/ and 3 is /s/, but that is just a wild guess (not to mention that I am working under the assumption that the Beanish script is phonological, or at least more like Spanish and Italian than like English or French; does anyone have a frequency list of phonemes, not letters, for these languages? Might be time to scrape Wiktionary…)
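Some of these comments can be checked mechanically against the end-of-word table. This sketch copies its Count and Pure Count columns together with each letter’s total occurrences (the Tendency denominators), recovers the fourth group as the letters whose Pure Count is zero, and orders the four most frequent letters by final-position tendency; the names are illustrative.

```python
# (Count, Pure Count, total occurrences) from the end-of-word table
final = {
    "2": (11, 10, 25), "L": (7, 2, 16), "J": (7, 6, 8), "N": (6, 5, 15),
    "b": (6, 4, 15), "X": (5, 2, 10), "g": (4, 4, 9), "9": (4, 4, 8),
    "S": (3, 3, 6), "q": (3, 0, 4), "U": (3, 2, 7), "6": (3, 3, 3),
    "7": (3, 3, 16), "A": (3, 3, 19), "4": (2, 1, 22), "M": (2, 2, 7),
    "c": (2, 0, 7), "d": (1, 0, 5), "G": (1, 1, 5), "3": (1, 1, 21),
    "j": (1, 1, 1),
}

# Letters that end a word only when followed by a diacritic
needs_diacritic = sorted(
    letter for letter, (n, pure, _) in final.items() if n and pure == 0
)
print(needs_diacritic)  # ['c', 'd', 'q']

# The four most frequent letters overall (25, 22, 21, 19 occurrences),
# ordered by the share of their occurrences that are word-final
top = ["2", "4", "3", "A"]
by_tendency = sorted(top, key=lambda letter: final[letter][0] / final[letter][2],
                     reverse=True)
print(by_tendency)  # ['2', 'A', '4', '3']
```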

