Update

Not much free time to work on Beanish today, and unfortunately the main site on Lojban is offline (I only hope that it wasn’t my fault, as hundreds of people clicked the link I posted… Wish I were able to say “I’m sorry” in Lojban, guys!).

I have been considering some suggestions on the glyphs used in the transliteration, given by edo (who is responsible for many, many initial insights on Beanish, as I found out today when I finally blitzed the OTT), and will upgrade the corpus accordingly.

I have also considered, from a Lojban point of view, part of what has already been discussed. User marchlight pointed out in the forum that the ᘊ- affix could be related to the bra- prefix in Lojban, used when referring to an object that is bigger/larger than its standard. Beanish might end up not being related to Lojban, after all, but semantically his/her suggestion is very plausible considering the examples we have.

But what is more important in terms of the Lojban/Beanish relationship is the usage of numbers by Rosetta, which at first made me think of French. I’ll quote marchlight here:

Also, Lojban only has numbers 0 through 9. 40 would be said as four zero, which is why Rosetta didn’t know the word. She could probably work out “five times eight” though, based on the reference that shows this example (emphasis mine):

the-number three plus four times five
equals the-number two-three
3 + 4 x 5 = 23
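
To make marchlight's point concrete, here is a tiny Python sketch that spells a number digit by digit using the ten Lojban digit words. The function name is mine, and nothing here is a claim about how Beanish itself handles numbers; it only illustrates why "40" comes out as "four zero".

```python
# The ten Lojban digit words, indexed by the digit they name (0-9).
LOJBAN_DIGITS = ["no", "pa", "re", "ci", "vo", "mu", "xa", "ze", "bi", "so"]


def spell_digits(n: int) -> str:
    """Spell a non-negative integer one digit at a time."""
    if n < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    return " ".join(LOJBAN_DIGITS[int(d)] for d in str(n))


print(spell_digits(40))  # the "four zero" of the quote
print(spell_digits(23))  # the "two-three" of the example
```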

When all you have is a hammer, everything looks like a nail, but one particular Lojban feature could have been adopted or adapted in Beanish. The ᔪ- affix, which only seems to be used in questions, could work as a mark of something unknown (unknown-to-me/us), making Beanish questions very different from English. For example, instead of actually asking “Where are you from?” or “Who attacked you?”, with interrogative pronouns (wh- words), the affix could mark what is unknown in a sentence: “You are from (unknown-to-me-)land?” and “You were attacked by (unknown-to-me-)being”. It could explain the mysterious ᙐᖚᐧ in Rosetta's sentence (though I was unable to find a suitable English translation) and would favor an SOV word order, at least in questions.

Finally, some people are too kind: http://gigaom.com/2013/08/06/how-one-iconic-comics-superfans-are-working-together-to-decode-a-mysterious-language/ , thank you in the name of all superfans. (By the way, this made me notice that my linguistic vocabulary is a bit abstruse; I will try to make it simpler from now on, but if you don’t understand something, feel free to ask.)

And, finally, if you are a Lojban speaker and reading this, please find the time to help us! 🙂


Lojban

I have been thinking of Lojban since I started this blog. Randall knows about Lojban, and might speak it (http://xkcd.com/191/, the title-text is “zo’o ta jitfa .i .e’o xu do pendo mi”, loosely translated as “that’s not true (wink); will you be my friend?”).

He described Beanish as both “very different from English” and a “plausible future language”. Now, Lojban is by all accounts different from English, but I want to focus on the fact that its vocabulary, computationally built to be close to Mandarin, English, Spanish, Hindi, Russian and Arabic, could be what he means by “plausible”: a future language derived from the most common languages today, a world-wide creole.

Randall could have developed Beanish from Lojban if he wanted people to focus on the latter: Beanish would eventually be cracked, and Lojban would be in the news. Not only that, but in his blag post he thanks a certain “Dan”, who I believe is the linguist that helped him develop Beanish. The Wikipedia article on Lojban has a quote from a certain Dan Parmenter, apparently a linguist interested in Lojban. This is, of course, pure speculation.

Now, the hard part. I don’t know Lojban (I studied it for a whole afternoon about ten years ago), but many features of Beanish don’t strike me as clearly Lojban-based (the only real exception is the dot, which could come from the Lojban stop, but it is a pet theory of mine that, if Lojban were used on a daily basis, its stop would be the first feature to change). Even more important, the vocabulary does not match Lojban words: using “The Logical Language Group Online Dictionary Query” (a multilingual Lojban dictionary at http://jbovlaste.lojban.org), I get that “water” is “djacu”, “drink” is “pinxe”, “sea” is “xamsi”, “hello” is “coi” (but greetings work in a very different way in Lojban). In short, there isn’t a clear link between the script and current Lojban words. I said “current” only to stress that the Lojban of the future spoken by the Beanies would have been subject to changes.

Still, another theory to investigate.

Beanish glyphs: a force-directed graph based on the result of Maximum-Likelihood classifiers

Ok, the title is a joke on the title of most academic papers (with the obligatory colon), but, as with most of them, you might actually find the results useful, or at least interesting.

[Image: force-directed graph of the Beanish glyphs]

This force-directed graph visually translates most of the relationships among Beanish glyphs. It was made with Graphviz, with a source .dot file generated by a Python script. I used the results of the Maximum-Likelihood classifiers I ran yesterday, from 2 to 8 groups, adding or subtracting a score (the value of the uniform distribution for that classifier, for example 0.25 for the one with four classes) from each glyph to glyph edge. Only positive edges are shown.
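
For those who want to play with the idea, here is a minimal Python sketch of the scoring and of the .dot generation. It is not the original script: I am assuming that a pair of glyphs gains the uniform value (1/k for a k-class classifier) when both fall in the same class and loses it otherwise, and the function names and the use of the Graphviz `weight` edge attribute are my own choices.

```python
from itertools import combinations


def edge_scores(classifications):
    """Sum a signed score for every glyph pair over all classifiers.

    `classifications` is a list of partitions; each partition is a list of
    classes, and each class is a list of glyphs.
    """
    scores = {}
    for classes in classifications:
        weight = 1.0 / len(classes)  # uniform value, e.g. 0.25 for 4 classes
        class_of = {g: i for i, cls in enumerate(classes) for g in cls}
        for a, b in combinations(sorted(class_of), 2):
            # Assumption: same class adds the weight, different class subtracts it.
            sign = 1 if class_of[a] == class_of[b] else -1
            scores[(a, b)] = scores.get((a, b), 0.0) + sign * weight
    return scores


def to_dot(scores):
    """Build an undirected Graphviz graph keeping only the positive edges."""
    lines = ["graph beanish {"]
    for (a, b), score in sorted(scores.items()):
        if score > 0:
            lines.append(f'  "{a}" -- "{b}" [weight={score:.3f}];')
    lines.append("}")
    return "\n".join(lines)
```

The resulting text can be saved to a file and laid out with one of the Graphviz force-directed engines (neato or fdp); node sizes and colors are left out of the sketch.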

The size of each glyph indicates its relative frequency. The proximity between glyphs indicates their statistical proximity in terms of grouping, which may or may not mirror a linguistic proximity: the closer the glyphs, the more alike they are. Glyphs not connected by edges do not present much similarity: for example, ᖽ and ᘝ are part of the same group on the left, but, as there is no edge linking them (i.e., the final score was not positive), they don’t seem to be related. The colors for each node (i.e., glyph) are meaningless, but glyphs that were grouped together in the 7-class classifier (which I guess to be the one that most closely mirrors linguistic features) share the same color.

Some initial comments:

  • We immediately note three separate groups. It would be perfect if they mirrored three linguistic groups (for example, vowels, semivowels, consonants), but a quick check of the corpus confirms that, unfortunately, this is not the case. Still, there are strong statistical indications that there are, in fact, three different groups of glyphs.
  • The diacritics are probably, indeed, a group or sub-group of glyphs, but the comma (the small, lunar lower diacritic) probably is not part of it. Maybe it is something like the iota subscript in Greek (i.e., an alternative representation for a glyph)?
  • The group on the left is far more common than the other two.

Glyph classes by a Maximum-Likelihood-Criterion

I have finished my review of the corpus transliterated with the Canadian Aboriginal Syllabics; there were some inconsistencies, but I believe that it is now correct. I included a new glyph (the question mark, [?]) for the first glyph of what we suppose is the Ionian Sea, as there is likely a single glyph missing.

I used the reviewed corpus to divide the glyphs into classes with a Maximum-Likelihood-Criterion; I adapted the corpus in order to use “mkcls”, which is part of the Giza++ package frequently used in statistical machine translation (http://code.google.com/p/giza-pp/). The glyphs are divided into groups of 2 to 10 classes; my comments are below.

Please remember that these are not linguistic categories, phonological, morphological, syllabic, whatever. They are classes based, essentially, on the frequency and context where each glyph is found in the corpus; while they might mirror real categories, they are to be understood as statistical properties, not linguistic ones (not only because they were built by a statistical classifier, but also because, as I’ve been whining since the first post, the corpus we have is very limited).
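
As a rough idea of what "adapting the corpus" can look like, here is a Python sketch that turns each Beanish word into a line of space-separated glyphs, so that a word-class tool sees glyphs as "words" and Beanish words as "sentences". This layout is an assumption of the sketch, not a description of the exact file I fed to mkcls, and the function and file names are hypothetical.

```python
def to_mkcls_lines(words):
    """Turn each word into one line of space-separated glyphs.

    Empty strings are skipped. Every glyph is assumed to be a single
    Unicode code point, which holds for the syllabics used here.
    """
    return [" ".join(word) for word in words if word]


def write_mkcls_input(words, path):
    """Write the glyph-per-token lines to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(to_mkcls_lines(words)) + "\n")


# An invocation along the lines of common GIZA++ recipes (assumed here, not
# taken from the original run) would then be, for 7 classes:
#   mkcls -c7 -n10 -pglyphs.txt -Vglyphs.classes opt
```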

Division into 2 classes of glyphs

  • Class 1: , ᐣ ᐧ ᑦ ᑫ ᒣ ᓄ ᔑ ᔭ ᖗ ᖚ ᘈ ᘖ ᙉ
  • Class 2: ? ᑕ ᓭ ᔪ ᕋ ᖉ ᖊ ᖽ ᘊ ᘛ ᘝ ᙐ

Division into 3 classes of glyphs

  • Class 1: , ᓄ ᔑ ᖗ ᖚ ᘖ
  • Class 2: ? ᑕ ᒣ ᓭ ᔪ ᕋ ᖉ ᖊ ᖽ ᘊ ᘛ ᘝ ᙐ
  • Class 3: ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ ᙉ

Division into 4 classes of glyphs

  • Class 1: ᑕ ᒣ ᔪ ᖽ ᘊ ᘛ ᙉ
  • Class 2: , ᓄ ᔑ ᖗ ᖚ ᘖ
  • Class 3: ? ᓭ ᕋ ᖉ ᖊ ᘝ ᙐ
  • Class 4: ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ

Division into 5 classes of glyphs

  • Class 1: , ᓄ ᖗ ᖚ ᘖ
  • Class 2: ᐣ ᑦ
  • Class 3: ? ᓭ ᔑ ᕋ ᖉ ᘝ ᙐ
  • Class 4: ᐧ ᑫ ᔭ ᘈ
  • Class 5: ᑕ ᒣ ᔪ ᖊ ᖽ ᘊ ᘛ ᙉ

Division into 6 classes of glyphs

  • Class 1: ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ ᙉ
  • Class 2: ᒣ ᘊ
  • Class 3: , ᓄ ᔑ ᖗ
  • Class 4: ? ᔪ ᘛ ᘝ ᙐ
  • Class 5: ᕋ ᖉ ᖚ ᖽ ᘖ
  • Class 6: ᑕ ᓭ ᖊ

Division into 7 classes of glyphs

  • Class 1: ᒣ ᘊ
  • Class 2: ᑕ ᖽ ᘖ
  • Class 3: , ᓄ ᖗ ᖚ
  • Class 4: ᓭ ᖊ
  • Class 5: ? ᔪ ᕋ ᖉ ᘛ ᘝ ᙉ ᙐ
  • Class 6: ᐣ ᑦ
  • Class 7: ᐧ ᑫ ᔑ ᔭ ᘈ

Division into 8 classes of glyphs

  • Class 1: ᑕ ᖚ ᖽ ᘖ
  • Class 2: ᓭ ᖊ ᙐ
  • Class 3: ᒣ ᘊ
  • Class 4: , ᓄ ᖗ
  • Class 5: ᐣ ᑦ
  • Class 6: ᐧ ᔑ
  • Class 7: ? ᔪ ᕋ ᖉ ᘛ ᘝ
  • Class 8: ᑫ ᔭ ᘈ ᙉ

My comments:

  • Assuming there are no unseen glyphs, the missing glyph at the beginning of “Ionian Sea” is probably one in the group ᔪ ᕋ ᖉ ᘛ ᘝ.
  • The same group ᔪ ᕋ ᖉ ᘛ ᘝ probably mirrors a true, linguistic group; if the script is alphabetic, they very likely are a group of related consonants.
  • The ML classifier suggests that the diacritics are indeed a separate category of glyphs, and in particular that ᐣ and ᑦ are very similar (one is probably, as per the graphical representation, the inverse of the other). There are, however, doubts regarding the lower diacritic (the comma) and, to a lesser extent, the vertically centered dot.
  • As the affix study had suggested, ᒣ and ᘊ are probably very alike, which is probably also true of ᓭ and ᖊ.
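
The comments above can be checked against the class lists with a few lines of Python. The sketch below copies the seven divisions exactly as listed in this post and counts in how many of them two glyphs share a class; the names `CLASSIFIERS` and `times_together` are mine.

```python
# The divisions into 2..8 classes, copied from the lists in this post.
CLASSIFIERS = {
    2: [", ᐣ ᐧ ᑦ ᑫ ᒣ ᓄ ᔑ ᔭ ᖗ ᖚ ᘈ ᘖ ᙉ", "? ᑕ ᓭ ᔪ ᕋ ᖉ ᖊ ᖽ ᘊ ᘛ ᘝ ᙐ"],
    3: [", ᓄ ᔑ ᖗ ᖚ ᘖ", "? ᑕ ᒣ ᓭ ᔪ ᕋ ᖉ ᖊ ᖽ ᘊ ᘛ ᘝ ᙐ", "ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ ᙉ"],
    4: ["ᑕ ᒣ ᔪ ᖽ ᘊ ᘛ ᙉ", ", ᓄ ᔑ ᖗ ᖚ ᘖ", "? ᓭ ᕋ ᖉ ᖊ ᘝ ᙐ", "ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ"],
    5: [", ᓄ ᖗ ᖚ ᘖ", "ᐣ ᑦ", "? ᓭ ᔑ ᕋ ᖉ ᘝ ᙐ", "ᐧ ᑫ ᔭ ᘈ", "ᑕ ᒣ ᔪ ᖊ ᖽ ᘊ ᘛ ᙉ"],
    6: ["ᐣ ᐧ ᑦ ᑫ ᔭ ᘈ ᙉ", "ᒣ ᘊ", ", ᓄ ᔑ ᖗ", "? ᔪ ᘛ ᘝ ᙐ", "ᕋ ᖉ ᖚ ᖽ ᘖ", "ᑕ ᓭ ᖊ"],
    7: ["ᒣ ᘊ", "ᑕ ᖽ ᘖ", ", ᓄ ᖗ ᖚ", "ᓭ ᖊ", "? ᔪ ᕋ ᖉ ᘛ ᘝ ᙉ ᙐ", "ᐣ ᑦ", "ᐧ ᑫ ᔑ ᔭ ᘈ"],
    8: ["ᑕ ᖚ ᖽ ᘖ", "ᓭ ᖊ ᙐ", "ᒣ ᘊ", ", ᓄ ᖗ", "ᐣ ᑦ", "ᐧ ᔑ", "? ᔪ ᕋ ᖉ ᘛ ᘝ", "ᑫ ᔭ ᘈ ᙉ"],
}


def times_together(a, b):
    """Count the classifiers in which glyphs `a` and `b` share a class."""
    return sum(
        any(a in cls.split() and b in cls.split() for cls in classes)
        for classes in CLASSIFIERS.values()
    )


for pair in [("ᐣ", "ᑦ"), ("ᒣ", "ᘊ"), ("ᓭ", "ᖊ")]:
    print(pair, times_together(*pair), "of", len(CLASSIFIERS))
```

With the lists as posted, ᐣ and ᑦ share a class in all seven divisions, while ᒣ/ᘊ and ᓭ/ᖊ share one in six of the seven.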

The corpus, upgraded

(I am currently reviewing and correcting the transliterations, expect changes)

Here is my raw upgraded data. It builds upon the Canadian Aboriginal Syllabics by Renil, the enhanced contrast map by Zorin_75 and waveney's transcriptions, all from the OTT, besides everything I had previously taken from the OTT and the Wiki. I have extended, corrected and modified the transliteration; in particular, I’ve chosen ᑕ for what was [G].

There certainly are errors, please help me proofread if you can.

You should also note that I removed the many repeated occurrences of “water”, “cream”, “good morning” and “hello”, as they were skewing the data (the very frequent ᓄ in final word position was due only to the “water” word). I have also revised the translations and removed those that were pure speculation.

Frame Beanish Transcription Translation
2663 ᔪᙐᖚᐧ ᘛᔭᐤ “Where are you from?”
2664 ᔪᘝᓄᐧ ᔪ, ᒣᖉ ᖊᐣᖽ ᖽᘛᕋᑦᐤ “What happened to your leg?” or “Are you injured?”
2668 ᓭᘈ ᘊᒣᓭᐧᖊᔑ ᘝᘊᓭᒣᖊᐣᖗᐨ “Get/Fetch/Bring  (me/us) cream for-healing.” or “We need cream for-healing.” (or “Cream for-healing is needed.”)
2671 ᙐᖚᐧ ᘊᘖᑫᘖᒣᐣᖚ ᘛ ᓭᑦᐧ ᙐᖚᑫᕋ,ᐨ “Where… .” (affirmative sentence)
2676 ᘊᒣᓭᐧᖊᔑ ᘝᘊᓭᒣᖊᐣᖗᐨ “Cream for-healing.”
2697 ᖚᑫᘖ ᓭᐧᖚ “(something) you-wildling”
2706 ᓭᘖᔭᓄᐨ “Water.”
2708 ᖉᑦ,ᐦ ᓭᘖᔭᓄᐦ “Yes! Water.”
2728 ᔪᖉᔭᑫ ᙐᖚᐧ ᘊᓭᐧᖚᐤ ᑦᘈᖽᐣ ᔭ ᘝᖽᒣ ᓭᘖᑦ ᖊᘊᐤ ᕋᖗ ᘝᙉᖉᔭ ᘖᐣᖗᔭ, ᘊᓭᘖᔭᓄᐤ “(…) your land? (…)? (…) waters?”
2734 ᖽᔑᐣᘖ ᖚᐣᘖᖗᑫ ᘝᐣᖽ ᘊᒣᑦᖽᘝᐨ “We will go to the castle.”
2797 ᒣᓭᐧᖊᔑᐨ “Cream.”
2802 ᑕᘊᐣᒣ ᘊᓭᑦᑕᖉᐨ ᑕᘊᐣᒣ ᘊᓭᘖᔭᓄᐨ ᓭᘖᔭᓄᐦ “This lands. This seas. Water!”
2806 / ᘝᘈᘖ ᖽᔑᐣᘖ ᖚᒣᑕᑫᓭ ᘝᐣᖽ ᘊᒣᑦᖽᘝᐨ / ᖉᑦ,ᐨ “JohnDoe/Comrade We shall go to the castle (now).” / “Ok.”
2821 ᘝᓄᘈᖉᐣᐨ “(name of the city)”
2823 ᖉ, ᘝᐣᖚᔭ,ᐨ “Good morning.”
2827 ᘊᒣᑦᖽᘝᐨ “The castle.”
2836 ᖽᔑᐣᘖ ᖚᐧᘖᖗᑫ ᘝᐣᖽ ᘊᓭᘖᑦᓄᐨ “We should go to the leader.”
2841 ᘈᘊᘖᐨ ᖉ, ᘝᐣᖚᔭ,ᐨ “Hello.” “Good morning.”
2842 ᘝᘈᘖᐦ ᖉ, ᘝᐣᖚᔭᐦ “JohnDoe/Comrade! Good morning!”
2865 ᙐᖚᐧ ᓭᔭᑦᘖ ᘛᘝᖚᘈᐤ “Where they come from?”
2866 ᘝᐣᑕᑦᖚᑫ ᓭᐧᘖ ᙐᖚᐧ ᘊᓭᐧᖚᐨ / ᖉᔭᒣᘊᐣᘖᑫᖗ ᖽᘛᕋᑦᐨ “They came/are from the wildlands.” / “One of them is injured.”
2880 / ᘛᐣ ᘝᔭᖊᖽ ᖊᑦᘖ ᘖᘝᒣᘛᐨ / ᔪᑫᐨ “You can leave now.” “Ok.”
2906 ᘊᖊᑦᓄ ᘊᓭᐧᖚ “Wildlands/Balearic Sea”
2906 ᘊᖊᑦᓄ ?ᓄᐣᔭ “Ionian Sea”
2906 ᑫᘊᘊ Gibraltar
2906 ᘊᓭᑦᑕᖉ ?
2906 ᘖᓄᘈᖉᐣ ?
2906 ᓭᘊᘊ ?

Commented changelog:

  • In frame 2663, the question is probably “Where are you from?”, because ᔪᘊᖚᐧ seems to mean “where” (the initial ᔪ might be mandatory when using it in questions, but we have at least one exception). This would make ᘛᔭ a verb, “(you pl.) are”, and in fact ᘛ seems to be frequent in sentences/words where we expect the verb “to be”.
  • The meaning of the question in frame 2664 is far from clear, but the word ᖽᘛᕋ is probably a noun that refers to Megan’s leg or to the scratches. If our assumption about ᔪᘊᖚᐧ is right, the first word in this frame might be derived from *ᘝᓄᐧ (*”what”).
  • The sentence in frame 2671 is one of the most obscure, but if our assumptions are right it is an affirmative sentence starting with a “where”, ᘊᖚᐧ. The ᘛ in the middle of the sentence could be from the verb “to be”, and the final word ᘊᖚ,ᕋ might be morphologically related to the first. If the language does use an infix morphology for verbs, as per one of my hypotheses, the complex word ᘊᘖᑫᘖᒣᐣᖚ would likely be a verb or a nominal form of a verb [A past participle, perhaps? The following ᘛ could then be, indeed, a verb, an auxiliary verb for the past].
  • Regarding frame 2697, my previous guesses are probably wrong. I’ve noticed that the last word is ᓭᐧᖚ, which not only is similar to ᘊᓭᐧᖚ, the name of the sea near Cuegan home (and presumably the name of their people in Beanish), but seems to confirm the hypothesis of the ᘊ- prefix being a morphological mark, maybe a plural mark. This could make ᓭᐧᖚ something like “wildling” and ᘊᓭᐧᖚ its plural.
  • In frame 2728, once more we have a word starting with ᔪ-, ᔪᖉᔭᑫ, which might indicate a non-interrogative version ᖉᔭᑫ.
  • Frame 2806, read along with frame 2842, strongly suggests that ᘝᘈᘖ is a vocative, either a name, a title or a form of address.
  • In frame 2865, ᙐᖚᐧ strangely lacks a ᔪ- prefix, even though it is certainly a question.
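
The prefix observations in this changelog are easy to turn into a small search. The Python sketch below takes a handful of words from the table (punctuation stripped by hand) and lists every word that is attested both with and without a given prefix; the constant and function names are mine, and the word list is only a sample, not the full corpus.

```python
PREFIX_Q = "ᔪ"   # the prefix seen in questions
PREFIX_PL = "ᘊ"  # the possible morphological (plural?) mark

# A sample of words from the table, with final punctuation removed.
WORDS = [
    "ᔪᙐᖚᐧ", "ᙐᖚᐧ", "ᔪᖉᔭᑫ",
    "ᘊᓭᐧᖚ", "ᓭᐧᖚ",
    "ᘊᒣᓭᐧᖊᔑ", "ᒣᓭᐧᖊᔑ",
    "ᘊᓭᘖᔭᓄ", "ᓭᘖᔭᓄ",
    "ᘊᒣᑦᖽᘝ",
]


def prefixed_pairs(words, prefix):
    """Return (prefixed, bare) pairs where both forms occur in `words`."""
    vocab = set(words)
    return sorted(
        (w, w[len(prefix):])
        for w in vocab
        if w.startswith(prefix) and w[len(prefix):] in vocab
    )


for prefix in (PREFIX_Q, PREFIX_PL):
    for prefixed, bare in prefixed_pairs(WORDS, prefix):
        print(prefixed, "<->", bare)
```

A pair found this way is only a candidate: with a corpus this small, a shared ending can easily be a coincidence.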

Time to organize

I would love to finish the transition tables and move into full-time Beanish deciphering, but there is no point in rushing. I will organize the data we (people in the OTT) have collected (especially the transliteration with Canadian Aboriginal Syllabics), write a decent Python script and place it on GitHub, and just accept that I don’t have all the free time I’d need. But please let me know if you know of any good Beanish research position. 😉