(note: I am posting part 2 before part 1… Parts 1 and 3, the alphabetic and syllabic guesses, are far harder, and I don’t know if I’ll be able to finish them soon; real life is knocking at the door)

In the previous post, I tried, without much success or confidence, to map Beanish glyphs to phonemes, assuming the script is an alphabet. I used frequency tables, some linguistic knowledge, my ear (“it sounds good enough”) and, mostly, wild guesses. As I stated, the biggest problem is the diacritics: we can be more or less flexible regarding potential Beanish phonotactic restrictions, but the diacritics (with the possible exception of the “comma” one) do not work like the other glyphs (i.e., they are not letters), yet they don’t seem to work well as phonetic features either. I tried to map them to some phonetic features nonetheless, but nobody should be pleased with my suggestions (I certainly am not).

One idea that has been debated in the XKCD fora since the time Time was playing is to treat the script as an abugida. The diacritics are probably, once more, to blame, but in a lot of ways it does make sense: they could very well be vowel marks (we can even try to think of them as a graphical representation based on the place of articulation in the mouth, very loosely like Korean), and the biggest objection is that the mean word length is a bit too long. Not that the abugida solution solves every single difficulty regarding Beanish: the transition probabilities among glyphs do suggest an alphabet more than an abugida (assuming the grammar isn’t terribly strict) and the number of glyphs is a bit too large for a “plausible” language. A third possibility is that the script is indeed a syllabary (remember that Randall used Linear A as an example), which does not exclude the possibility of the diacritics being vowel marks; we shall investigate this later.

Anyway, we have four diacritics in the Beanish script: the “middle dot” ᐧ , the “c” ᑦ , the “inverted c” ᐣ and the “comma” ,. Our major difficulty is that they can be combined, the comma in particular, in words such as ᖉᑦ, (but we also have the complex word ᓭᑦᐧ). If the diacritics are vowels, this could mean that vowels can sometimes be combined: in particular, the “comma” could be a glide (the most obvious being the palatal approximant /j/). We are left with ᓭᑦᐧ which, among other hypotheses, could be a diphthong (the only one we have so far) or the mark for a rare vowel. This is what I will assume.

Considering the three diacritics we have left, the fact that one of them looks graphically “neutral” (so it probably marks the most common vowel, such as /a/ or /ə/) and the fact that the other two seem to mirror each other, it is a reasonable guess to take the middle dot as /a/, the “inverted c” as /e/ (possibly with allophones such as /ɛ/), the “c” as /o/ (possibly with allophones such as /ɔ/), the “comma” as the /j/ glide and the combined diacritic ᑦᐧ as just /oa/ or, even better, /oə/.

And now, let’s tabulate everything to find both the default vowel for each consonant and a guess of which consonant it is (based on the consonant frequencies of both Beanish and English, plus two dorsals not found in English but common in other languages). Everything assumes that the syllable structure is C+V (a consonant followed by a vowel), and that the isolated diacritic in ᑦᘈᖽᐣ is read as a bare vowel (it would just be a word starting with /a/, the only one in our corpus: /asaʤe/).

Glyph  Count  /a/             /e/  /o/             Probable base-vowel  Guess consonant
       29     0               3    0               /a/ ?                /p/
       27     0               1    2               /a/                  /b/
       24     8 + 0.5 (ᓭᑦᐧ)   0    2 + 0.5 (ᓭᑦᐧ)   /e/                  /t/
       21     0               7    0               /a/                  /d/
       17     5               2    0               /o/                  /k/
       17     0               0    1               /a/ ?                /g/
       16     0               1    3               /a/                  /ʧ/
       15     0               1    0               /a/ ?                /ʤ/
       13     0               2    2               /a/                  /f/
       11     0               1    0               /a/ ?                /v/
       10     0               0    0               /a/ ?                /θ/
       10     0               3    2               /a/                  /ð/
       10     0               0    0               /a/ ?                /s/
       7      0               1    0               /a/ ?                /z/
       7      0               0    0               /a/ ?                /ʃ/
       7      0               0    1               /a/ ?                /ʒ/
       6      0               3    0               /a/ ?                /m/
       5      0               0    0               /a/ ?                /n/
       5      0               0    0               /a/ ?                /l/
       4      0               0    2               /a/ ?                /r/
       3      0               0    0               /a/ ?                /ŋ/
       1      0               0    0               /a/ ?                /ʎ/
       3      0               0    0               /a/ ?                /ɲ/

Which is great, because 1. no glyph occurs with all three vowel diacritics, so every glyph can keep one vowel as its unmarked default, and 2. while a bit extensive, the size of the phoneme inventory is very reasonable (no need to use ejectives or the like, as in the guessed alphabet of part 1 of this post).
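For the programmers among you, here is a rough sketch of how I read the “Probable base-vowel” column: the default vowel of a glyph should be the one whose diacritic never appears on it, since a default vowel would not need to be written. The function name, the tie-break towards /a/ when several vowels are never marked, and the choice of keying rows by the guessed consonant are assumptions of this sketch; only the counts come from the table.

```python
VOWELS = ("a", "e", "o")  # the order doubles as the tie-break: /a/ first (assumption)


def base_vowel(marked):
    """Guess the default vowel of a glyph.

    `marked` maps each vowel to how many times its diacritic follows the glyph.
    Returns (vowel, certain). `vowel` is None when every vowel is marked at
    least once, which would contradict the default-vowel idea.
    """
    never_marked = [v for v in VOWELS if marked.get(v, 0) == 0]
    if not never_marked:
        return None, False
    # Exactly one unmarked vowel is a firm guess; more than one is a "?".
    return never_marked[0], len(never_marked) == 1


# A few rows copied from the table, keyed by the guessed consonant.
rows = {
    "t": {"a": 8.5, "e": 0, "o": 2.5},
    "k": {"a": 5, "e": 2, "o": 0},
    "b": {"a": 0, "e": 1, "o": 2},
    "p": {"a": 0, "e": 3, "o": 0},
}

for consonant, marked in rows.items():
    vowel, certain = base_vowel(marked)
    print(f"/{consonant}/: /{vowel}/" + ("" if certain else " ?"))
```

A glyph with nonzero counts in all three columns would come back as None, and that would count against the abugida reading.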

If you are still puzzled, this means that (with completely made-up words) ᘊᓭ should be read with the default vowel for each glyph, here /a/ and /e/, and thus as /pate/; if the vowel is not the default, you add the corresponding diacritic, and thus /pote/ would be written as ᘊᑦᓭ and /pato/ as ᘊᓭᑦ. The “comma” is a semivowel /j/ added after the vowel, and thus ᘊᓭ, would be /patej/ and ᘊᑦ,ᓭ would give us /pojte/.
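The reading rules above can be sketched as a small transliterator. This is only an illustration under my own assumptions: the table of glyphs covers just the ones used in this post's examples (the values for ᘖ, ᔭ and ᓄ are inferred from the /tebagava/ reading), stacked vowel marks are simply concatenated (so ᑦᐧ comes out as /oa/), and a diacritic with no consonant before it is read as a bare vowel.

```python
# Consonant glyph -> (consonant, default vowel). Only glyphs from the examples.
BASE = {
    "ᘊ": ("p", "a"),
    "ᓭ": ("t", "e"),
    "ᘖ": ("b", "a"),
    "ᔭ": ("g", "a"),
    "ᓄ": ("v", "a"),
}
VOWEL_MARKS = {"ᐧ": "a", "ᐣ": "e", "ᑦ": "o"}
GLIDE_MARK = ","  # the "comma" diacritic, read as /j/ after the vowel


def transliterate(word):
    """Read a Beanish word as an abugida and return a phonemic string."""
    syllables = []  # each entry: [consonant, vowel, glide, vowel_was_marked]
    for ch in word:
        if ch in BASE:
            consonant, default = BASE[ch]
            syllables.append([consonant, default, "", False])
        elif ch in VOWEL_MARKS:
            vowel = VOWEL_MARKS[ch]
            if not syllables:
                # Isolated diacritic at the start of a word: bare vowel.
                syllables.append(["", vowel, "", True])
            elif syllables[-1][3]:
                # Second vowel mark on the same glyph: stack them.
                syllables[-1][1] += vowel
            else:
                # First vowel mark replaces the default vowel.
                syllables[-1][1] = vowel
                syllables[-1][3] = True
        elif ch == GLIDE_MARK:
            if not syllables:
                raise ValueError("glide mark with nothing to attach to")
            syllables[-1][2] = "j"
        else:
            raise ValueError(f"glyph not in the sketch's table: {ch!r}")
    return "".join(c + v + g for c, v, g, _ in syllables)


for word in ["ᘊᓭ", "ᘊᑦᓭ", "ᘊᓭᑦ", "ᘊᓭ,", "ᘊᑦ,ᓭ"]:
    print(word, "->", "/" + transliterate(word) + "/")
```

Extending it to the whole corpus would only need the remaining glyphs added to `BASE`, once their consonant guesses are settled.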

The abugida hypothesis is at least plausible, even though, as I said, the words are a bit longer than I’d like and my score at guessing the consonants probably isn’t much better than a random choice. We can later try better guesses using the vocabulary we have decoded so far, such as “water” and “sea”, hoping they are related to some known language (phonosymbolism, anyone?)

But at least ᓭᘖᔭᓄ as /tebagava/ for “water”, while very unlikely, sounds better than the pronunciation I derived in the previous post, the “alphabetic guess”.