hou tu pranownse binish – part 2

(note: I am posting part 2 before part 1… Parts 1 and 3, the alphabetic and syllabic guesses, are far harder, and I don’t know if I’ll be able to finish them soon: real life is knocking at the door)

In the previous post, I tried, without much success or confidence, to map Beanish glyphs to phonemes, assuming the script is an alphabet. I used frequency tables, some linguistic knowledge, my ear (“it sounds good enough”) and, mostly, wild guesses. As I stated, the biggest problem is the diacritics: we can be more or less flexible regarding potential Beanish phonotactic restrictions, but the diacritics (with the possible exception of the “comma”) do not behave like the other glyphs (i.e., they are not letters), yet they don’t work well as phonetic features either. I tried to map them to some phonetic features nonetheless, but nobody should be pleased with my suggestions (I certainly am not).

One idea that has been debated in the XKCD fora since the time Time was playing is to treat the script as an abugida. The diacritics are probably, once more, to blame, but in many ways it does make sense: they could very well be vowel marks (we can even try to think of them as a graphical representation based on the point of articulation in the mouth, very loosely like Korean), and the biggest objection is that the mean word length is a bit too long. Not that the abugida solution solves every single difficulty regarding Beanish: the transition probabilities among glyphs suggest an alphabet more than an abugida (assuming the grammar isn’t terribly strict), and the number of glyphs is a bit too large for a “plausible” language. A third possibility is that the script is indeed a syllabary (remember that Randall used Linear A as an example), which does not exclude the possibility of the diacritics being vowel marks; we shall investigate this later.

Anyway, we have four diacritics in the Beanish script: the “middle dot” ᐧ , the “c” ᑦ , the “inverted c” ᐣ and the “comma” ,. Our major difficulty is that they can be combined, particularly the comma, in words such as ᖉᑦ, (but we also have the complex word ᓭᑦᐧ). If the diacritics are vowels, this could mean that vowels can sometimes be combined: in particular, the “comma” could be a glide (the most obvious candidate being the palatal approximant /j/). We are left with ᓭᑦᐧ, which, among other hypotheses, could contain a diphthong (the only one we have so far) or the mark for a rare vowel. This is what I will assume.

Considering the three diacritics we have left, the fact that one of them looks graphically “neutral” (probably the most common vowel, such as /a/ or /ə/) and the fact that the other two seem to mirror each other, a reasonable guess is to take the middle dot as /a/, the “inverted c” as /e/ (possibly with allophones such as /ɛ/), the “c” as /o/ (possibly with allophones such as /ɔ/), the “comma” as the /j/ glide and the combined diacritic ᑦᐧ as just /oa/ or, even better, /oə/.

And now, let’s tabulate everything to find both the default vowel for each consonant and a guess of which consonant it is (based on the consonant frequencies of both Beanish and English, plus two dorsals not found in English but common in other languages). Everything assumes that the syllable structure is consonant+vowel (CV), and we are resolving the isolated diacritic in ᑦᘈᖽᐣ as a bare vowel (it would just be a word starting with /a/, the only one in our corpus: /asaʤe/).

Glyph Count /a/ /e/ /o/ Probable base-vowel Guess consonant
29 0 3 0 /a/ ? /p/
27 0 1 2 /a/ /b/
24 8 + 0.5 (ᓭᑦᐧ) 0 2 + 0.5 (ᓭᑦᐧ) /e/ /t/
21 0 7 0 /a/ /d/
17 5 2 0 /o/ /k/
17 0 0 1 /a/ ? /g/
16 0 1 3 /a/ /ʧ/
15 0 1 0 /a/ ? /ʤ/
13 0 2 2 /a/ /f/
11 0 1 0 /a/ ? /v/
10 0 0 0 /a/ ? /θ/
10 0 3 2 /a/ /ð/
10 0 0 0 /a/ ? /s/
7 0 1 0 /a/ ? /z/
7 0 0 0 /a/ ? /ʃ/
7 0 0 1 /a/ ? /ʒ/
6 0 3 0 /a/ ? /m/
5 0 0 0 /a/ ? /n/
5 0 0 0 /a/ ? /l/
4 0 0 2 /a/ ? /r/
3 0 0 0 /a/ ? /ŋ/
1 0 0 0 /a/ ? /ʎ/
3 0 0 0 /a/ ? /ɲ/

This is encouraging, because 1. no glyph occurs with all three vowel diacritics, so every glyph can keep one vowel unmarked as its default, and 2. while a bit extensive, the size of the phonetic inventory is very reasonable (no need to use ejectives or the like, as in the guessed alphabet of part 1 of this post).
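The logic behind the “Probable base-vowel” column can be written down as a small Python sketch. It assumes the diacritic-to-vowel mapping proposed above (ᐧ /a/, ᐣ /e/, ᑦ /o/), ignores the comma and the final punctuation, splits the combined ᑦᐧ half-and-half as in the table, and takes as base-vowel candidates the vowels whose mark never follows a glyph, with ties going to /a/ plus a “?”. The function names are hypothetical, not taken from any script of this blog.

```python
from collections import Counter, defaultdict

# Hypothesis from the text: middle dot = /a/, inverted c = /e/, c = /o/.
MARKS = {"ᐧ": "a", "ᐣ": "e", "ᑦ": "o"}
IGNORED = set("ᐨᐦᐤ,")  # assumed final punctuation, plus the /j/ "comma"


def count_marks(words):
    """For each base glyph, count which vowel marks follow it.

    The combined mark ᑦᐧ counts as 0.5 /o/ plus 0.5 /a/, as in the table.
    """
    counts = defaultdict(Counter)
    for word in words:
        glyphs = [g for g in word if g not in IGNORED]
        for i, glyph in enumerate(glyphs):
            if glyph in MARKS:
                continue  # marks are not base glyphs
            counter = counts[glyph]  # also registers glyphs that are never marked
            following = glyphs[i + 1:i + 3]
            if following == ["ᑦ", "ᐧ"]:
                counter["o"] += 0.5
                counter["a"] += 0.5
            elif following and following[0] in MARKS:
                counter[MARKS[following[0]]] += 1
    return counts


def guess_base_vowel(counter):
    """The inherent vowel is never written, so it must be an unmarked one."""
    unmarked = [v for v in ("a", "e", "o") if counter.get(v, 0) == 0]
    if len(unmarked) == 1:
        return f"/{unmarked[0]}/"
    if "a" in unmarked:
        return "/a/ ?"  # ambiguous: default to the "neutral" vowel
    return "?"  # every vowel is marked: the hypothesis fails for this glyph
```

With the counts of the /t/ row, `guess_base_vowel(Counter(a=8.5, e=0, o=2.5))` gives `/e/`; with those of the /k/ row (5, 2, 0) it gives `/o/`.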

If you are still puzzled, this means that (in these completely made-up words) ᘊᓭ should be read with the default vowel of each glyph, here /a/ and /e/, and thus as /pate/; if the vowel is not the default one, you add the corresponding diacritic, and thus /pote/ would be written as ᘊᑦᓭ and /pato/ as ᘊᓭᑦ. The “comma” is a semivowel /j/ added after the vowel, and thus ᘊᓭ, would be /patej/ and ᘊᑦ,ᓭ would give us /pojte/.
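The reading rule just described can be sketched in Python. Only the two glyphs used in the made-up examples are mapped (ᘊ as /p/ with inherent /a/, ᓭ as /t/ with inherent /e/); the other rows of the table would have to be added, and the whole mapping is, of course, just the hypothesis of this post. `read_abugida` is a hypothetical helper name.

```python
# Hypothetical mapping: base glyph -> (consonant, inherent vowel).
# Only the glyphs from the made-up examples are included.
CONSONANTS = {"ᘊ": ("p", "a"), "ᓭ": ("t", "e")}
# Vowel marks proposed in this post; the two-glyph ᑦᐧ comes first so it wins.
VOWEL_MARKS = {"ᑦᐧ": "oə", "ᐧ": "a", "ᐣ": "e", "ᑦ": "o"}
GLIDE = ","  # the "comma", read as /j/ after the vowel


def read_abugida(word):
    """Transliterate a word, assuming CV syllables with an inherent vowel."""
    syllables = []
    i = 0
    while i < len(word):
        glyph = word[i]
        if glyph not in CONSONANTS:
            raise ValueError(f"no consonant hypothesis for {glyph!r}")
        consonant, vowel = CONSONANTS[glyph]
        i += 1
        # An explicit vowel mark overrides the inherent vowel.
        for mark, value in VOWEL_MARKS.items():
            if word.startswith(mark, i):
                vowel = value
                i += len(mark)
                break
        glide = ""
        if word.startswith(GLIDE, i):
            glide = "j"
            i += 1
        syllables.append(consonant + vowel + glide)
    return "/" + "".join(syllables) + "/"
```

For instance, `read_abugida("ᘊᑦ,ᓭ")` returns `/pojte/`, matching the example above.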

The abugida hypothesis is at least plausible, even though, as I said, the words are a bit longer than I’d like and my score at guessing the consonants probably isn’t much better than a random choice. We can later try better guesses using the vocabulary we have decoded so far, such as “water” and “sea”, hoping they are related to some known language (phonosymbolism, anyone?)

But at least ᓭᘖᔭᓄ as /tebagava/ for “water”, while very unlikely, sounds better than the pronunciation I derived in the previous post with the “alphabetic guess”.

Regarding ᘝᓄᘈᖉᐣ

Yet another hypothesis: while our corpus is small and most of the words I am using for this hypothesis seem to be related (“water”, “sea”…), there is a strong tendency for the glyph ᓄ to be found only at the end of words (mostly nouns).

The exception is ᘝᓄᘈᖉᐣ, a somewhat unusual word that many suppose is the name of the Beanie city. Maybe its name is actually a compound word, ᘝᓄ and ᘈᖉᐣ? An even wilder guess: ᘝᓄ or, more likely given the syntax, ᘈᖉᐣ could mean “new” (as in “New York”).
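A quick way to check this kind of positional tendency is to count where a glyph occurs. In this Python sketch a glyph counts as word-final when only diacritics or final punctuation follow it (so ᔪᖆᓄᐧ counts as ᓄ-final); that choice, the glyph sets and the name `position_profile` are my assumptions.

```python
from collections import Counter

DIACRITICS = set("ᐧᐣᑦ,")
FINAL_PUNCTUATION = set("ᐨᐦᐤ")


def position_profile(words, glyph):
    """Count initial, medial and final occurrences of `glyph`."""
    profile = Counter()
    for word in words:
        for i, g in enumerate(word):
            if g != glyph:
                continue
            rest = word[i + 1:]
            # Final if nothing but diacritics/punctuation comes after it.
            if all(c in DIACRITICS or c in FINAL_PUNCTUATION for c in rest):
                profile["final"] += 1
            elif i == 0:
                profile["initial"] += 1
            else:
                profile["medial"] += 1
    return profile
```

On ᘊᓭᘖᔭᓄᐤ, ᘊᖊᑦᓄ, ᔪᖆᓄᐧ and ᘝᓄᘈᖉᐣ it finds ᓄ three times in final position and once in medial position, the medial one being in ᘝᓄᘈᖉᐣ.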

Do you carry these people?

I have decided to study Big Hair’s speech in English, as people have pointed out that it might be a “key”. I have just had my first insight:

In frame 2897, she (supposedly) says “Do you carry these people with you?”. She probably means “Did you bring any of those people with you?”, referring to the Forty.

We could make hypotheses about the reason for dropping the past-tense marker, but I want to focus on the verb “to carry”. While it may sound very weird to native English speakers (for some people in the forum, it was undecipherable at first), it could be an expected error from a speaker of a language that draws a different distinction between to carry/to bring, such as Italian or French. We know we are in present-day France, and Randall said that Beanish was “plausible”, not to mention the fact that all of Big Hair’s numbers “are too small”… maybe Beanish has French features?

Thirteen regular expressions to rule them (almost) all

Maybe it is time to get back to work. In this post I present 13 regular expressions (Python syntax) that cover most of the words in the Beanish corpus.

The goal was to have a way to test and group the words, not to actually perform regular expression pattern matching or substitutions. If you are familiar with regular expressions, you can probably tell this from the fact that the syntax and the grouping do not make much sense. I wanted to make it easier to spot groups, raise hypotheses and find the most unusual words. In a way, this is a form of data compression, of entropy reduction. Words could be grouped in different ways and, if one wanted full coverage, longer patterns could match all words.

All patterns exclude what we can safely assume to be final punctuation (which can be matched by appending r'[ᐨᐦᐤ]?$').
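Even though the patterns were meant for grouping rather than matching, they can be used to check coverage. This Python sketch is an assumed setup, not a script from this blog: it strips the final punctuation, uses `re.fullmatch` so that a pattern must cover the whole word, and reports every pattern that matches (some words fit more than one). The patterns are copied from this post as Python 3 raw strings.

```python
import re

# The thirteen grouping patterns of this post, as Python 3 raw strings.
PATTERNS = [
    r"(ᖆᐣ?|ᘛᖆ|ᘖᐣ)(ᑕᑦ)?[ᖽᖚᖗ](ᔭ,?|[ᒣᘈᑫ])?",
    r"ᖆ(ᓄ?ᘈ|ᕬ)(ᖉᐣ?|ᘖ)ᔭ?",
    r"ᖆ(ᘊᓭᒣ|ᔭ)ᖊᐣ?[ᖗᖽ]",
    r"ᘊ?[ᒣᓭᑫᖊ]+[ᐧ,ᑦ]?[ᖚᘊᘖᘈᑲᓄᑕᖊᔭᖽ]*[ᑦ,ᐧ]?[ᔭᖉᔕᘖᖆ]?ᓄ?",
    r"ᘛᐣ?[ᔭ]?",
    r"ᒣ?[ᖉᔪ],?(ᑦ,)?",
    r"ᔪ?ᕒᖚᐧ?(ᑫᕋ,)?",
    r"ᔪ[ᖆᑕ](ᓄᐧ)?",
    r"ᖽ(ᘛ|ᔕᐣ)(ᕋᑦ|ᘖ)",
    r"ᘊ(ᘖ[ᑫᒣ])*ᐣᖚ",
    r"ᖚ(ᒣᑕ)?ᑫ[ᓭᘖ]",
    r"[ᘈᑕ]ᘊᐣ?[ᘖᒣ]",
    r"(ᖚᐣ|ᖉᔭᒣᘊᐣ|ᔪᖉᔭ)[ᘖᖗᑫ]+",
]
COMPILED = [re.compile(pattern) for pattern in PATTERNS]
# Assumed final punctuation glyphs, removed before matching.
FINAL_PUNCTUATION = re.compile(r"[ᐨᐦᐤ]$")


def classify(word):
    """Return the (1-based) numbers of all patterns covering the whole word."""
    bare = FINAL_PUNCTUATION.sub("", word)
    return [
        number
        for number, regex in enumerate(COMPILED, start=1)
        if regex.fullmatch(bare)
    ]
```

`classify("ᒣᖉ")` returns `[4, 6]`, and each of the six words listed as not covered further down returns an empty list.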

Pattern 1

r'(ᖆᐣ?|ᘛᖆ|ᘖᐣ)(ᑕᑦ)?[ᖽᖚᖗ](ᔭ,?|[ᒣᘈᑫ])?'

Words covered: ᖆᖚᔭ,ᐨ / ᖆᐣᖚᔭᐦ / ᖆᐣᖽ / ᘛᖆᖚᘈᐤ / ᖆᐣᑕᑦᖚᑫ / ᘖᐣᖗᔭ, / ᖆᖽᒣ

  • ᖆ is usually followed by the diacritic ᐣ when it is an initial, a feature it seems to share with ᘖ
  • While, for this group, ᑕ and ᑦ are always grouped, there is no indication that they are dependent
  • [ᖚᖗ] and [ᘈᑫ] seem to be two different groups; it is also possible that ᖽ belongs to the first group and ᒣ to the second one (as suggested by the following patterns)

Pattern 2

r'ᖆ(ᓄ?ᘈ|ᕬ)(ᖉᐣ?|ᘖ)ᔭ?'

Words covered: ᖆᘈᘖᐦ / ᖆᕬᖉᔭ / ᖆᘈᘖ / ᖆᓄᘈᖉᐣ

  • The second of the three patterns for words starting with ᖆ, a glyph that looks extremely frequent and prolific (if this were English written with an alphabet, it would likely be a vowel)
  • Not much in common among these words, syntactically
  • ᓄ and ᘈ probably belong to different categories
  • ᖉ and ᘖ probably belong to the same category

Pattern 3

r'ᖆ(ᘊᓭᒣ|ᔭ)ᖊᐣ?[ᖗᖽ]'

Words covered: ᖆᔭᖊᖽ / ᖆᘊᓭᒣᖊᐣᖗᐨ

  • Last pattern for words starting with ᖆ
  • Not much can be said, but the words could be related if Beanish uses infixes
  • ᖗ and ᖽ are probably in the same category
  • ᘊ, ᓭ and ᒣ are once more seen together; if the script is an alphabet, one of them is likely a vowel and the other two are consonants, possibly a fricative/plosive and a liquid

Pattern 4

r'ᘊ?[ᒣᓭᑫᖊ]+[ᐧ,ᑦ]?[ᖚᘊᘖᘈᑲᓄᑕᖊᔭᖽ]*[ᑦ,ᐧ]?[ᔭᖉᔕᘖᖆ]?ᓄ?'

Words covered: ᘊᒣᓭᐧᖊᔕ / ᘊᓭᘖᑦᓄᐨ / ᖊ,ᘖ / ᓭᘈ / ᓭᘖᑦ / ᘊᓭᐧᑲ / ᖊᘊᐤ / ᒣᖉ / ᘊᓭᘖᔭᓄᐤ / ᓭᘖᔭᓄᐨ / ᓭᘖᔭᓄᐦ / ᘊᓭᑦᑕᖉ / ᓭᐧᘖ / ᓭᑦᐧ / ᘊᓭᘖᔭᓄᐨ / ᘊᖊᑦᓄ / ᓭᔭᑦᘖ / ᘊᓭᐧᑲᐤ / ᑫᘊᘊ / ᘊᒣᑦᖽᖆᐨ / ᘊᓭᐧᑲᐨ / ᓭᘊᘊ / ᒣᓭᐧᖊᔕᐨ / ᓭᐧᖚ / ᘊᓭᑦᑕᖉᐨ

  • The most complex and most productive pattern; it covers most of the words assumed to be nouns
  • ᘊ- does indeed look like a prefix
  • ᒣ, ᓭ, ᑫ and ᖊ would likely be vowels, allowing diphthongs, and the diacritics would thus be applied to vowels
  • [ᖚᘊᘖᘈᑲᓄᑕᖊᔭᖽ] looks like a big bag of consonants, confirming some of my previous assumptions; however, the diacritics can be applied to some of them too
  • The final -ᓄ could be a suffix, or the indication of a strict word phonology

Pattern 5

r'ᘛᐣ?[ᔭ]?'

Words covered: ᘛᔭᐤ / ᘛ / ᘛᐣ

  • Covers most one/two-symbol words
  • ᘛ is probably a vowel, or at the very least a sonorant, and ᔭ a consonant (guess not supported by evidence: a fricative)

Pattern 6

r'ᒣ?[ᖉᔪ],?(ᑦ,)?'

Words covered: ᖉᑦ,ᐨ / ᖉᑦ,ᐦ / ᖉ, / ᔪ, / ᒣᖉ

  • Covers most of the words that seem associated with the ideas of “yes, positive, affirmative, good”
  • Just like ᘛ, ᖉ and ᔪ are probably vowels/sonorants

Pattern 7

r'ᔪ?ᕒᖚᐧ?(ᑫᕋ,)?'

Words covered: ᕒᖚᑫᕋ,ᐨ / ᔪᕒᖚᐧ / ᕒᖚᐧ

  • Covers the (ᕒ)ᖚ group, where ᕒ- is likely an interrogative marker (is it just the CU/QU /k/ of Romance languages? or perhaps the WH- of English?)
  • If it is an alphabet, ᕒᖚ looks like a Consonant+Vowel; given that ᑫ is likely a vowel, the rare ᕋ would likely be a rare consonant, and a word like ᕒᖚᑫᕋ would sound something like /kwəX/, where /X/ is the rare consonant

Pattern 8

r'ᔪ[ᖆᑕ](ᓄᐧ)?'

Words covered: ᔪᖆᓄᐧ / ᔪᑕᐨ

  • There is no clear indication that the two words covered by this regex are related
  • ᓄ is confirmed in its common final position

Pattern 9

r'ᖽ(ᘛ|ᔕᐣ)(ᕋᑦ|ᘖ)'

Words covered: ᖽᘛᕋᑦ / ᖽᔕᐣᘖ

  • Once more, there is no indication that these words are related
  • Given that ᘛ is likely a vowel, ᔕ would be a vowel too and ᖽ a consonant

Pattern 10

r'ᘊ(ᘖ[ᑫᒣ])*ᐣᖚ'

Words covered: ᘊᘖᑫᘖᒣᐣᖚ

  • An interesting word, both because there is apparently no consensus on its probable translation and because it contradicts, or makes less plausible, some of my hypotheses
  • However, it confirms that ᑫ and ᒣ could be vowels and ᘊ, ᘖ and ᖚ consonants, giving something like TRIROS, FLALEP, PNENUV, etc. (just to make it clear: this is only to show the pattern; I am not suggesting that the symbols correspond to these letters)
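The TRIROS comparison amounts to reducing the word to a consonant/vowel skeleton. A minimal Python sketch, assuming (as in the bullets above) that ᑫ and ᒣ are vowels and simply dropping the diacritics; `skeleton` is a hypothetical helper.

```python
DIACRITICS = set("ᐧᐣᑦ,")


def skeleton(word, vowels):
    """Return the C/V skeleton of a word for a hypothetical vowel set."""
    return "".join(
        "V" if glyph in vowels else "C"
        for glyph in word
        if glyph not in DIACRITICS  # diacritics are ignored here
    )


print(skeleton("ᘊᘖᑫᘖᒣᐣᖚ", {"ᑫ", "ᒣ"}))  # CCVCVC
print(skeleton("TRIROS", set("AEIOU")))  # CCVCVC
```

Both skeletons are CCVCVC, which is all the comparison is meant to show.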

Pattern 11

r'ᖚ(ᒣᑕ)?ᑫ[ᓭᘖ]'

Words covered: ᖚᒣᑕᑫᓭ / ᖚᑫᘖ

  • Another pattern with no clear indication of relation between the words it covers (unless, as stated before, Beanish uses infix morphology and zero-morphemes…)
  • ᒣᑕ could be a Vowel+Consonant

Pattern 12

r'[ᘈᑕ]ᘊᐣ?[ᘖᒣ]'

Words covered: ᘈᘊᘖᐨ / ᑕᘊᐣᒣ

  • ᘊ can take a diacritic and is probably a common consonant (a liquid?)
  • ᘈ and ᑕ are probably consonants too

Pattern 13

r'(ᖚᐣ|ᖉᔭᒣᘊᐣ|ᔪᖉᔭ)[ᘖᖗᑫ]+'

Words covered: ᖚᐣᘖᖗᑫ / ᖉᔭᒣᘊᐣᘖᑫᖗ / ᔪᖉᔭᑫ

  • ᘖ, ᖗ and ᑫ seem to constitute a group like ᘊ, ᓭ and ᒣ: probably one vowel and two consonants

Words not covered

Words: ᔭ / ᘖᖆᒣᘛᐨ / ᘖᓄᘈᖉᐣ / ᕋᖗ / ᖊᐣᖽ / ᑦᘈᖽᐣ

  • Equally important are these six uncovered words
  • I suspected that ᔭ would be a consonant, but it can form a word of its own; while this is possible, it could indicate that we are not dealing with an alphabet
  • ᘖᖆᒣᘛ is one of the words with no agreement on the translation. ᘖᓄᘈᖉᐣ has an uncommon ᓄ in middle position, but it is likely a toponym. ᕋᖗ is followed (frame 2728) by the other very strange word, ᖆᕬᖉᔭ. ᖊᐣᖽ could be a transcription error or might be related to ᖊ,ᘖ, and the strange ᑦᘈᖽᐣ comes from the same long speech in frame 2728.

Maybe it is now time to go back to the comic and to the botched English of Big Hair, paying attention to the strange words that, being toponyms, could be the key.

The glyph transitions (at last) — Part II

Here is the table that was missing from the previous post: it shows the transitions between glyphs, left to right. It is far more important than the previous table, if the script is actually written left to right, as most agree.

I plan to generate a graph showing the main transitions, stay tuned.

Symbol (from) Transition (to) Occurrences/Total (percentage)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
2/14 (14.29%)
4/14 (28.57%)
1/14 (7.14%)
} 1/14 (7.14%)
3/9 (33.33%)
, 1/9 (11.11%)
2/9 (22.22%)
1/9 (11.11%)
} 2/9 (22.22%)
2/7 (28.57%)
1/7 (14.29%)
1/7 (14.29%)
1/7 (14.29%)
} 2/7 (28.57%)
2/8 (25.00%)
, 1/8 (12.50%)
1/8 (12.50%)
1/8 (12.50%)
2/8 (25.00%)
1/8 (12.50%)
1/3 (33.33%)
} 2/3 (66.67%)
1/5 (20.00%)
1/5 (20.00%)
} 3/5 (60.00%)
1/18 (5.56%)
1/18 (5.56%)
1/18 (5.56%)
2/18 (11.11%)
1/18 (5.56%)
1/18 (5.56%)
2/18 (11.11%)
2/18 (11.11%)
} 7/18 (38.89%)
1/6 (16.67%)
1/6 (16.67%)
1/6 (16.67%)
1/6 (16.67%)
} 2/6 (33.33%)
1/12 (8.33%)
1/12 (8.33%)
2/12 (16.67%)
2/12 (16.67%)
1/12 (8.33%)
3/12 (25.00%)
} 2/12 (16.67%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
2/11 (18.18%)
} 2/11 (18.18%)
2/9 (22.22%)
1/9 (11.11%)
1/9 (11.11%)
1/9 (11.11%)
} 4/9 (44.44%)
1/5 (20.00%)
1/5 (20.00%)
, 1/5 (20.00%)
1/5 (20.00%)
1/5 (20.00%)
1/13 (7.69%)
1/13 (7.69%)
, 2/13 (15.38%)
2/13 (15.38%)
1/13 (7.69%)
1/13 (7.69%)
} 5/13 (38.46%)
, 1/7 (14.29%)
} 6/7 (85.71%)
1/8 (12.50%)
1/8 (12.50%)
1/8 (12.50%)
1/8 (12.50%)
1/8 (12.50%)
} 3/8 (37.50%)
? 1/1 (100.00%)
1/17 (5.88%)
2/17 (11.76%)
2/17 (11.76%)
2/17 (11.76%)
2/17 (11.76%)
5/17 (29.41%)
} 3/17 (17.65%)
2/8 (25.00%)
1/8 (12.50%)
1/8 (12.50%)
} 4/8 (50.00%)
1/3 (33.33%)
, 1/3 (33.33%)
1/3 (33.33%)
2/17 (11.76%)
3/17 (17.65%)
3/17 (17.65%)
1/17 (5.88%)
1/17 (5.88%)
2/17 (11.76%)
1/17 (5.88%)
} 4/17 (23.53%)
3/3 (100.00%)
1/5 (20.00%)
1/5 (20.00%)
1/5 (20.00%)
1/5 (20.00%)
} 1/5 (20.00%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
, 1/11 (9.09%)
1/11 (9.09%)
2/11 (18.18%)
1/11 (9.09%)
} 2/11 (18.18%)
1/9 (11.11%)
2/9 (22.22%)
1/9 (11.11%)
1/9 (11.11%)
1/9 (11.11%)
} 3/9 (33.33%)
1/16 (6.25%)
4/16 (25.00%)
1/16 (6.25%)
5/16 (31.25%)
1/16 (6.25%)
1/16 (6.25%)
2/16 (12.50%)
} 1/16 (6.25%)
1/1 (100.00%)
} 1/1 (100.00%)
{ 10/60 (16.67%)
3/60 (5.00%)
1/60 (1.67%)
3/60 (5.00%)
3/60 (5.00%)
4/60 (6.67%)
3/60 (5.00%)
2/60 (3.33%)
5/60 (8.33%)
1/60 (1.67%)
2/60 (3.33%)
? 1/60 (1.67%)
8/60 (13.33%)
1/60 (1.67%)
2/60 (3.33%)
1/60 (1.67%)
1/60 (1.67%)
1/60 (1.67%)
8/60 (13.33%)

Aside

While I still think that the new template for the blog looks good, it might have hidden from general view some good work that has been showing up in the comments.

I still want to check some things about Lojban before writing an answer to greb, but first I want to share some thoughts by J.:

Anyway, I have been working on Beanish for a bit before reading this blog, and you have helped me fill in a lot of blanks in my notes. It’s actually starting to make sense! Thank you. Coming from a different direction I’ve noticed a couple of differences between your corpus and mine, but the majority is almost exactly the same.

One important feature I think you are missing is word-structure. I’m going from the assumption that ‘Beanish’ is a synthetic language (as opposed to an isolating one), and one with a rigid templatic structure and semi-fluid morpheme boundaries. With that in mind, I want to propose a few morphemes:

ᖆᐣᖽ – To (Preposition)

ᖆᐣᖽ (To, x3)
ᔪᖆᓄᐧ (What to?)
ᖆᖽᒣ (Up?)
ᖆᐣᖚᔭ (Today? [Fudging a bit here])
ᖆᐣᑕᑦᖚᑫ (UKN)
ᖆᔭᖊᖽ (UKN)
*ᖆᕬᖉᔭ (Not an occurrence of the morpheme?)
*ᖆᓄᘈᖉᐣ (Not an occurrence of the morpheme?)
*ᘊᒣᑦᖽᖆ (Not an occurrence of the morpheme)
*ᖆᘈᘖ (Not an occurrence of the morpheme)

(ᖽ)ᘛ – You (2nd Person)

ᘛᔭ (You are)
ᖽᘛᕋᑦ (You Possessive x2) *This changes corpus line 26*
ᘛ (You)
ᘛᖆᖚᘈ (You Journeyed?) *This is a minor change to corpus line 24*
ᘛᐣ (You, [3 plural?])
*ᘖᖆᒣᘛ (Not an occurrence of morpheme)

ᕒᖚᐧ – Where

ᔪᕒᖚᐧ (Where from)
ᕒᖚᐧ (Where x4) *Line 25?*
ᕒᖚᑫᕋ, (UKN)

Also, if ᖉᑦ, means yes or good, then ᖉ, ᖆᐣᖚᔭ, would mean good day, with the ᖆᐣ prefix meaning something… but that bit is eluding me.

Notice, for all these morphemes, that when the morpheme is placed before (or sometimes next to) a word, the phonemes ᐣ, ᖽ, ᘖ, & ᐧ will occasionally drop. This gives us a ‘core’ phoneme or two for each morpheme (ᖆ, ᘛ, ᕒᖚ, ᖉ, and ᔕ for we). I also propose that ᑫ is a core for a possessive morphemic suffix, and ᘊ is a core for an importance-marker type morphemic prefix.

Finally, this is the word structure template that I can work out with these morphemes:

(Q-Word)/PERSON (opt?) – (PREPOSITION) – (Importance marker) – word – (POSSESSION)

If you have any insights on this approach, or know of a better place to put this, let me know! Hopefully we can get this cracked.

I don’t think that ᖆ- is a morpheme in the way J. suggests, as the hypothesis of having it as a determiner or a semantic morpheme for “bigger, larger” still sounds more plausible to me. I also find it unlikely that morphemes and phonemes are joined in the way described (but I am not sure I completely understood it), especially considering the glyph-to-glyph transition frequencies, which seem to disprove it (not to mention that, if what we have been calling “diacritics” are indeed diacritics, we might need to remove them from our analysis).

While I am not really sure about the differences in his/her reading of the script, the suggestions for ᘛ and ᕒᖚᐧ make a lot of sense, and ᖉ, ᖆᐣᖚᔭ, has probably been nailed down (I had translated it as “good morning” by observing the conversational clues, but I am now surprised that I missed ᖉ as “good”, which now seems so obvious! Great work, J.!). What is really important, however, is the freshness of the rigid synthetic paradigm he/she suggests. While some comments on the OTT during the comic run had similar hypotheses, and I think I posted something along these lines in one of my first posts, this is the first time I have seen a true hypothesis for the word-structure template of Beanish.

We are finally getting to the point where we can formulate some hypotheses and test them.

The glyph transitions (at last)

I wrote a simple Python script to finally present the transitions from glyph to glyph (including word boundaries, denoted with { and }). You can find it in the same GitHub repository; please fork, modify, correct or extend it if you want.
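The script itself is in the repository; what follows is not that script but an independent, minimal Python sketch of the same idea. It assumes words are given as plain strings and that every character (diacritics and the comma included) counts as a glyph. It pads each word with { and }, counts adjacent pairs and prints them in the “n/total (percentage)” format used in the tables; the hypothetical `direction` parameter chooses whether each glyph is described by what follows it or by what precedes it.

```python
from collections import Counter, defaultdict


def transition_counts(words, direction="to"):
    """Count glyph-to-glyph transitions, with { and } as word boundaries.

    direction="to":   counts[a][b] = times b follows a
    direction="from": counts[b][a] = times a precedes b
    """
    if direction not in ("to", "from"):
        raise ValueError("direction must be 'to' or 'from'")
    counts = defaultdict(Counter)
    for word in words:
        glyphs = ["{", *word, "}"]
        for left, right in zip(glyphs, glyphs[1:]):
            if direction == "to":
                counts[left][right] += 1
            else:
                counts[right][left] += 1
    return counts


def print_table(counts):
    """Print one line per transition as 'symbol other n/total (pct%)'."""
    for symbol in sorted(counts):
        total = sum(counts[symbol].values())
        for other, n in counts[symbol].most_common():
            print(f"{symbol} {other} {n}/{total} ({100 * n / total:.2f}%)")
```

For example, `print_table(transition_counts(words, direction="from"))` lists, for each glyph, which glyphs (or the word start `{`) precede it.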

Without further ado, here are the tables of transitions from glyph to glyph. I will discuss them in a future post, possibly with some graphical representation.

The first table indicates, for each glyph in the first column, how often it is preceded by the glyphs in the second one:

Symbol Transition from Occurrences/Total (percentage)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
1/14 (7.14%)
{ 10/14 (71.43%)
2/9 (22.22%)
1/9 (11.11%)
1/9 (11.11%)
1/9 (11.11%)
1/9 (11.11%)
{ 3/9 (33.33%)
1/7 (14.29%)
1/7 (14.29%)
2/7 (28.57%)
1/7 (14.29%)
1/7 (14.29%)
{ 1/7 (14.29%)
1/8 (12.50%)
2/8 (25.00%)
1/8 (12.50%)
1/8 (12.50%)
{ 3/8 (37.50%)
2/3 (66.67%)
1/3 (33.33%)
1/5 (20.00%)
1/5 (20.00%)
2/5 (40.00%)
1/5 (20.00%)
1/18 (5.56%)
1/18 (5.56%)
, 1/18 (5.56%)
2/18 (11.11%)
3/18 (16.67%)
1/18 (5.56%)
2/18 (11.11%)
4/18 (22.22%)
{ 3/18 (16.67%)
1/6 (16.67%)
1/6 (16.67%)
{ 4/6 (66.67%)
1/12 (8.33%)
1/12 (8.33%)
3/12 (25.00%)
3/12 (25.00%)
1/12 (8.33%)
{ 3/12 (25.00%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
2/11 (18.18%)
1/11 (9.09%)
1/11 (9.09%)
{ 2/11 (18.18%)
2/9 (22.22%)
1/9 (11.11%)
1/9 (11.11%)
5/9 (55.56%)
{ 5/5 (100.00%)
1/13 (7.69%)
3/13 (23.08%)
1/13 (7.69%)
2/13 (15.38%)
1/13 (7.69%)
2/13 (15.38%)
1/13 (7.69%)
1/13 (7.69%)
{ 1/13 (7.69%)
, 1/7 (14.29%)
1/7 (14.29%)
1/7 (14.29%)
2/7 (28.57%)
1/7 (14.29%)
1/7 (14.29%)
1/8 (12.50%)
1/8 (12.50%)
1/8 (12.50%)
2/8 (25.00%)
1/8 (12.50%)
{ 2/8 (25.00%)
? { 1/1 (100.00%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
2/17 (11.76%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
{ 8/17 (47.06%)
2/8 (25.00%)
1/8 (12.50%)
2/8 (25.00%)
? 1/8 (12.50%)
2/8 (25.00%)
1/3 (33.33%)
1/3 (33.33%)
{ 1/3 (33.33%)
4/17 (23.53%)
2/17 (11.76%)
2/17 (11.76%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
1/17 (5.88%)
2/17 (11.76%)
1/17 (5.88%)
1/3 (33.33%)
{ 2/3 (66.67%)
1/5 (20.00%)
1/5 (20.00%)
1/5 (20.00%)
1/5 (20.00%)
{ 1/5 (20.00%)
1/11 (9.09%)
1/11 (9.09%)
2/11 (18.18%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
1/11 (9.09%)
2/11 (18.18%)
{ 1/11 (9.09%)
1/9 (11.11%)
2/9 (22.22%)
3/9 (33.33%)
1/9 (11.11%)
1/9 (11.11%)
{ 1/9 (11.11%)
2/16 (12.50%)
5/16 (31.25%)
1/16 (6.25%)
{ 8/16 (50.00%)
1/1 (100.00%)
1/1 (100.00%)
} 1/60 (1.67%)
2/60 (3.33%)
2/60 (3.33%)
2/60 (3.33%)
3/60 (5.00%)
7/60 (11.67%)
2/60 (3.33%)
2/60 (3.33%)
2/60 (3.33%)
4/60 (6.67%)
5/60 (8.33%)
, 6/60 (10.00%)
3/60 (5.00%)
3/60 (5.00%)
4/60 (6.67%)
4/60 (6.67%)
1/60 (1.67%)
2/60 (3.33%)
3/60 (5.00%)
1/60 (1.67%)
1/60 (1.67%)

It is clear that some glyphs are far more promiscuous than others, but it is even clearer that our corpus is too limited to support any general conclusion.

The following table is as important as the previous one: while the first gives us the transitions of glyphs towards others, this one gives us the transitions of glyphs from others (in other words, it indicates the occurrences of glyphs in the second column when following those in the first one):

(Sorry, the table was wrong — I’ll fix and post it later)

On word lengths

It is too soon to discuss word length in Beanish, as we have just started studying the glyphs (btw, the corpus on GitHub has already been corrected and improved: the joy of the crowd!). But I can’t help discussing the peculiar word-length distribution of Beanish.

Words in Beanish have a strong tendency to have between 3 and 5 glyphs; words of 2 to 5 glyphs represent 70% of Beanish words. Compare this with the word-length distribution in English from Peter Norvig’s webpage (http://norvig.com/mayzner.html).

It is not only a matter of a different mean length (3.784 glyphs in Beanish), but of a very small standard deviation (1.427 glyphs). We will end up discussing this in the future; for the time being, I take it as a further suggestion that the script is a consonantal abugida, even though the glyph distribution doesn’t strongly suggest it. (I know some people disagree; please discuss it in the comments, that’s the purpose of this post 😉)
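For anyone who wants to recompute these figures from the corpus, here is a small Python sketch. It assumes words have already been extracted, drops the final punctuation glyphs ᐨ, ᐦ and ᐤ, counts diacritics as glyphs and uses the population standard deviation; I am not sure these are the exact choices behind the numbers above, so the results may differ slightly.

```python
import statistics
from collections import Counter

FINAL_PUNCTUATION = "ᐨᐦᐤ"


def length_stats(words):
    """Return mean, standard deviation, length histogram and share of 2-5 glyph words."""
    lengths = [len(word.rstrip(FINAL_PUNCTUATION)) for word in words]
    histogram = Counter(lengths)
    share_2_to_5 = sum(
        n for length, n in histogram.items() if 2 <= length <= 5
    ) / len(lengths)
    return (
        statistics.mean(lengths),
        statistics.pstdev(lengths),  # use statistics.stdev for the sample SD
        histogram,
        share_2_to_5,
    )
```

Switching to `statistics.stdev` gives the sample standard deviation, which is slightly larger on a corpus this small.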

Update II

I have put the Beanish corpus in GitHub, so you can all clone it and play with it: https://github.com/tresoldi/beanish/blob/master/corpus.txt

It is just a copy of the previous transliteration; if I don’t get more comments on the suggestions proposed by edo, I will change the glyphs soon. Each line is a single Beanish sentence (some repeated ones are missing and I will add them later: it will be up to each researcher to remove duplicates), followed by a # character and a description of the sentence (currently, the frame where it is found and the probable English translation). Everything is in UTF-8.
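Given that format, loading the corpus takes only a few lines. A minimal Python sketch, assuming words within a sentence are separated by whitespace and that the first # on a line starts the description; `parse_corpus` is a hypothetical name, and deduplication is optional, as suggested above.

```python
def parse_corpus(text, dedupe=False):
    """Parse 'sentence # description' lines into (words, description) pairs."""
    entries = []
    seen = set()
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        sentence, _, description = line.partition("#")
        sentence = sentence.strip()
        if dedupe and sentence in seen:
            continue  # drop repeated sentences
        seen.add(sentence)
        entries.append((sentence.split(), description.strip()))
    return entries


# Usage with a local copy of corpus.txt (UTF-8):
# with open("corpus.txt", encoding="utf-8") as handle:
#     corpus = parse_corpus(handle.read(), dedupe=True)
```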

So many people have been commenting on the ᘊ- prefix, suggesting it as a determinative, an augmentative, a superlative, a comparative of superiority, etc., that I wasn’t able to discover who first suggested it on the OTT. User Newfur suggested in a comment on this blog that it could be an honorific, like the Japanese o-. While I do not strongly support it, I find the translation as “big/large” with an implied comparative (meaning “bigger/larger with regard to X than the standard”) the most probable: ᘊᒣᓭᐧᖊᔑ could mean “leopard”, literally “big cat/animal” (derived from an unseen *ᒣᓭᐧᖊᔑ, “cat, animal”); ᘊᒣᓭᐧᖊᔑ could be the modifier in the “cream for-healing” (probably related to “health”); ᘊᓭᘖᔭᓄ could indeed be “big water”, in the sense of either “flood” or “sea”; and ᘊᓭᘖᑦᓄ could be “leader” as “great person” or even, literally, “Big Hair” (and thus we would have a new word, *ᓭᘖᑦᓄ “hair(s)”).
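The reasoning behind the reconstructed forms marked with * (strip ᘊ- and check whether the remainder is attested) is easy to automate. A minimal Python sketch, assuming final punctuation has already been removed from the words; `prefix_stems` is a hypothetical helper that only flags candidates and says nothing about meaning.

```python
def prefix_stems(words, prefix="ᘊ"):
    """Return (word, stem, attested) for each word that starts with `prefix`."""
    vocabulary = set(words)
    results = []
    for word in sorted(vocabulary):
        if word.startswith(prefix) and len(word) > len(prefix):
            stem = word[len(prefix):]
            results.append((word, stem, stem in vocabulary))
    return results


for word, stem, attested in prefix_stems(["ᘊᓭᘖᔭᓄ", "ᓭᘖᔭᓄ", "ᘊᓭᘖᑦᓄ"]):
    # Unattested stems get an asterisk, as in the text.
    print(word, stem if attested else "*" + stem)
```

On these three words, ᓭᘖᔭᓄ is found as an attested stem, while ᓭᘖᑦᓄ is printed with an asterisk.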

Finally, as I posted on the OTT, I am thinking about starting a new thread in the XKCD fora, one mostly about Beanish. What do you think? (btw, I have probably missed some questions on the OTT; please send me a private message or post a comment here if you want to say or ask something).