Maybe it is time to get back to work. In this post I present 13 regular expressions (Python syntax) that cover most of the words in the Beanish corpus.



The goal was to have a way to test and group the words, not to actually perform regular expression pattern matching or substitutions. If you are familiar with regular expressions, you can probably tell this from the fact that the syntax and the grouping do not make much sense. I wanted to make it easier to spot groups, raise hypotheses and find the most unusual words. In a way, this is a form of data compression, of entropy reduction. Words could be grouped in different ways and, if one wanted full coverage, longer patterns could match all words.

All patterns exclude what we safely assume to be final punctuation (which can be added with a ur'[ᐨᐦᐤ]?$').
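To make the "test and group" idea concrete, here is a minimal sketch of how a pattern body can be combined with the optional final punctuation and then used to sort words into groups. The pattern body below is a hypothetical stand-in written only for this illustration, not one of the 13 patterns of this post, and the names `PATTERNS` and `group_words` are likewise invented. The sketch assumes Python 3, where the `ur''` prefix no longer exists and a plain raw string is already Unicode.

```python
import re

# Optional final punctuation, as given in the post (Python 3 raw string).
FINAL_PUNCT = r'[ᐨᐦᐤ]?$'

# HYPOTHETICAL pattern body, for illustration only: it is NOT one of the
# 13 patterns of the post.
PATTERNS = {
    'demo': r'^ᘛ[ᔭᐣ]?',
}

def group_words(words, patterns):
    """Assign each word to the first pattern that matches it.

    Returns (groups, uncovered): a dict from pattern name to matched
    words, and the list of words that no pattern matched.
    """
    compiled = {name: re.compile(body + FINAL_PUNCT)
                for name, body in patterns.items()}
    groups = {name: [] for name in patterns}
    uncovered = []
    for word in words:
        for name, regex in compiled.items():
            if regex.search(word):
                groups[name].append(word)
                break
        else:
            # No pattern matched this word.
            uncovered.append(word)
    return groups, uncovered
```

Called with a handful of words and the demo pattern, `group_words` returns the matched words per pattern plus the leftovers, which is all that is needed to spot the unusual words.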

Pattern 1


Words covered: ᖆᖚᔭ,ᐨ / ᖆᐣᖚᔭᐦ / ᖆᐣᖽ / ᘛᖆᖚᘈᐤ / ᖆᐣᑕᑦᖚᑫ / ᘖᐣᖗᔭ, / ᖆᖽᒣ

  • ᖆ is usually followed by the diacritic ᐣ when it is an initial, a feature it seems to share with ᘖ
  • While, for this group, ᑕ and ᑦ are always grouped, there is no indication that they are dependent
  • [ᖚᖗ] and [ᘈᑫ] seem to be two different groups; it is also possible that ᖽ belongs to the first group and ᒣ to the second one (as suggested by the following patterns)

Pattern 2


Words covered: ᖆᘈᘖᐦ / ᖆᕬᖉᔭ / ᖆᘈᘖ / ᖆᓄᘈᖉᐣ

  • The second of the three patterns for words starting with ᖆ, which looks extremely frequent and prolific (if this were English and the script an alphabet, it would likely be a vowel)
  • Not much in common among these words, syntactically
  • ᓄ and ᘈ probably belong to different categories
  • ᖉ and ᘖ probably belong to the same category

Pattern 3


Words covered: ᖆᔭᖊᖽ / ᖆᘊᓭᒣᖊᐣᖗᐨ

  • Last pattern for words starting with ᖆ
  • Not much can be said, but the words could be related if Beanish uses infixes
  • ᖗ and ᖽ are probably in the same category
  • ᘊ, ᓭ and ᒣ are once more seen together; if it is an alphabet, one of them is likely a vowel and the other two are consonants, possibly a fricative/plosive and a liquid

Pattern 4


Words covered: ᘊᒣᓭᐧᖊᔕ / ᘊᓭᘖᑦᓄᐨ / ᖊ,ᘖ / ᓭᘈ / ᓭᘖᑦ / ᘊᓭᐧᑲ / ᖊᘊᐤ / ᒣᖉ / ᘊᓭᘖᔭᓄᐤ / ᓭᘖᔭᓄᐨ / ᓭᘖᔭᓄᐦ / ᘊᓭᑦᑕᖉ / ᓭᐧᘖ / ᓭᑦᐧ / ᘊᓭᘖᔭᓄᐨ / ᘊᖊᑦᓄ / ᓭᔭᑦᘖ / ᘊᓭᐧᑲᐤ / ᑫᘊᘊ / ᘊᒣᑦᖽᖆᐨ / ᘊᓭᐧᑲᐨ / ᓭᘊᘊ / ᒣᓭᐧᖊᔕᐨ / ᓭᐧᖚ / ᘊᓭᑦᑕᖉᐨ

  • The most complex and most productive pattern; it covers most of what are supposed to be nouns
  • ᘊ- does indeed look like a prefix
  • ᒣ, ᓭ, ᑫ and ᖊ would likely be vowels, allowing diphthongs, and the diacritics would thus be applied to vowels
  • [ᖚᘊᘖᘈᑲᓄᑕᖊᔭᖽ] looks like a big bag of consonants, confirming some of my previous assumptions; however, the diacritics can be applied to some of them too
  • The final -ᓄ could be a suffix, or the indication of a strict word phonology

Pattern 5


Words covered: ᘛᔭᐤ / ᘛ / ᘛᐣ

  • Covers most one/two-symbol words
  • ᘛ is probably a vowel, or at the very least a sonorant, and ᔭ a consonant (guess not supported by evidence: a fricative)

Pattern 6


Words covered: ᖉᑦ,ᐨ / ᖉᑦ,ᐦ / ᖉ, / ᔪ, / ᒣᖉ

  • Covers most of the words that seem associated with the ideas of “yes, positive, affirmative, good”
  • Just like ᘛ, ᖉ and ᔪ are probably vowels/sonorants

Pattern 7


Words covered: ᕒᖚᑫᕋ,ᐨ / ᔪᕒᖚᐧ / ᕒᖚᐧ

  • Covers the (ᕒ)ᖚ group, where ᕒ- is likely a question marker (is it just like the CU/QU /k/ of Romance languages? or perhaps the WH- of English?)
  • If it is an alphabet, ᕒᖚ looks like a Consonant+Vowel; given that ᑫ is likely a vowel, the rare ᕋ would likely be a rare consonant, and a word like ᕒᖚᑫᕋ would sound something like /kwəX/, where /X/ is the rare consonant

Pattern 8


Words covered: ᔪᖆᓄᐧ / ᔪᑕᐨ

  • There is no clear indication that the two words covered by this regex are related
  • ᓄ is confirmed in its common final position

Pattern 9


Words covered: ᖽᘛᕋᑦ / ᖽᔕᐣᘖ

  • Once more, there is no indication that these words are related
  • Given that ᘛ is likely a vowel, ᔕ would be a vowel too and ᖽ a consonant

Pattern 10


Words covered: ᘊᘖᑫᘖᒣᐣᖚ

  • An interesting word, both because there is apparently no consensus on its probable translation and because it contradicts, or makes less plausible, some of my hypotheses
  • However, it confirms that ᑫ and ᒣ could be vowels and ᘊ, ᘖ and ᖚ consonants, giving something like TRIROS, FLALEP, PNENUV, etc. (just to make it clear: this is only to illustrate the pattern, I am not suggesting that the symbols correspond to these letters)
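The TRIROS-style rendering above can be reproduced mechanically. The sketch below substitutes each symbol with an arbitrary Latin letter and drops the diacritic ᐣ; the mapping is purely illustrative (the same caveat applies: no claim that any symbol corresponds to any letter), and the names `transliterate` and `DEMO_MAPPING` are made up for this example.

```python
def transliterate(word, mapping, ignore=''):
    """Replace each symbol with its placeholder letter.

    Symbols listed in `ignore` are dropped; symbols missing from
    `mapping` are rendered as '?'.
    """
    out = []
    for ch in word:
        if ch in ignore:
            continue
        out.append(mapping.get(ch, '?'))
    return ''.join(out)

WORD = 'ᘊᘖᑫᘖᒣᐣᖚ'

# ARBITRARY placeholder letters, chosen only to show the
# consonant/vowel skeleton of the word.
DEMO_MAPPING = {'ᘊ': 'T', 'ᘖ': 'R', 'ᑫ': 'I', 'ᒣ': 'O', 'ᖚ': 'S'}

print(transliterate(WORD, DEMO_MAPPING, ignore='ᐣ'))  # TRIROS
```

Swapping in other placeholder letters for the same five symbols yields FLALEP or PNENUV; what stays constant is the skeleton, with the second consonant repeated after the first vowel.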

Pattern 11


Words covered: ᖚᒣᑕᑫᓭ / ᖚᑫᘖ

  • Another pattern with no clear indication of relation between the words it covers (unless, as stated before, Beanish uses infix morphology and zero-morphemes…)
  • ᒣᑕ could be a Vowel+Consonant

Pattern 12


Words covered: ᘈᘊᘖᐨ / ᑕᘊᐣᒣ

  • ᘊ can take a diacritic and is probably a common consonant (a liquid?)
  • ᘈ and ᑕ are probably consonants too

Pattern 13


Words covered: ᖚᐣᘖᖗᑫ / ᖉᔭᒣᘊᐣᘖᑫᖗ / ᔪᖉᔭᑫ

  • ᘖ, ᖗ and ᑫ seem to constitute a group like ᘊ, ᓭ and ᒣ: probably one vowel and two consonants

Words not covered

Words: ᔭ / ᘖᖆᒣᘛᐨ / ᘖᓄᘈᖉᐣ / ᕋᖗ / ᖊᐣᖽ / ᑦᘈᖽᐣ

  • Equally important are these six uncovered words
  • I suspected that ᔭ would be a consonant, yet it can form a word of its own; while that is possible for a consonant, it could indicate that we are not dealing with an alphabet
  • ᘖᖆᒣᘛ is one of the words with no agreement on the translation; ᘖᓄᘈᖉᐣ has an uncommon ᓄ in middle position but it is likely a toponym; ᕋᖗ is followed (frame 2728) by the other very strange word, ᖆᕬᖉᔭ; ᖊᐣᖽ could be a transcription error or might be related to ᖊ,ᘖ; and the strange ᑦᘈᖽᐣ is from the same long speech in frame 2728.
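Claims such as "ᓄ is common in final position but uncommon in the middle" can be checked by counting where a symbol occurs once final punctuation is stripped. The following is a rough sketch under two assumptions of mine: diacritics are counted as ordinary symbols, and the helper names (`strip_punct`, `position_counts`) are invented for this example. The sample words are taken from the lists in this post.

```python
import re
from collections import Counter

# Final punctuation as assumed throughout the post.
FINAL_PUNCT = re.compile(r'[ᐨᐦᐤ]$')

def strip_punct(word):
    """Remove a single trailing punctuation symbol, if present."""
    return FINAL_PUNCT.sub('', word)

def position_counts(words, symbol):
    """Count occurrences of `symbol` as initial, medial or final.

    A symbol that is both first and last (a one-symbol word) is
    counted as final.
    """
    counts = Counter()
    for word in map(strip_punct, words):
        for i, ch in enumerate(word):
            if ch != symbol:
                continue
            if i == len(word) - 1:
                counts['final'] += 1
            elif i == 0:
                counts['initial'] += 1
            else:
                counts['medial'] += 1
    return counts

sample = ['ᘊᓭᘖᑦᓄᐨ', 'ᘊᓭᘖᔭᓄᐤ', 'ᘊᖊᑦᓄ', 'ᘖᓄᘈᖉᐣ']
counts = position_counts(sample, 'ᓄ')
```

On this small sample, ᓄ is final in the three covered words and medial only in ᘖᓄᘈᖉᐣ. Because diacritics count as symbols here, a word like ᔪᖆᓄᐧ would register ᓄ as medial, so a fuller version would need an explicit list of diacritics to skip.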

Maybe it is now time to go back to the comic and to the botched English of Big Hair, paying attention to the strange words that, being toponyms, could be the key.