On word lengths

Friday, 9 August 2013

It is too soon to discuss word length in Beanish, as we have only just started studying the glyphs (by the way, the corpus on GitHub has already been corrected and improved: the joy of the crowd!). Still, I can’t help discussing the peculiar word-length distribution in Beanish.

Words in Beanish show a strong tendency to have between 3 and 5 glyphs; words of 2 to 5 glyphs make up 70% of all Beanish words. Compare this with the word-length distribution for English on Peter Norvig’s webpage (http://norvig.com/mayzner.html).

It is not only a matter of a different mean length (3.784 glyphs), but also of a very small standard deviation (1.427 glyphs). We will end up discussing this in the future; for the time being, I take it as a further suggestion that the script is a consonantal abugida, even though the glyph distribution doesn’t strongly suggest it. (I know some people disagree; please discuss it in the comments, that’s the purpose of this post 😉)
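For anyone who wants to reproduce figures like these from the corpus, here is a minimal sketch. It assumes that words are separated by whitespace, that each Unicode code point is one glyph (so diacritic-like glyphs such as ᐣ or ᑦ count), and that ASCII punctuation such as the commas in the transcription should be ignored. The function names are hypothetical, and the post does not say whether the population or the sample standard deviation was used; the sketch uses the population one.

```python
import string
from collections import Counter
from statistics import mean, pstdev


def word_lengths(text: str) -> list[int]:
    """Glyph count for each whitespace-separated word.

    Assumption: one Unicode code point = one glyph, and ASCII
    punctuation (e.g. the commas in the corpus) is stripped first.
    """
    words = (w.strip(string.punctuation) for w in text.split())
    return [len(w) for w in words if w]


def length_summary(text: str, lo: int = 2, hi: int = 5):
    """Return (histogram, mean, standard deviation, share of words in [lo, hi])."""
    lengths = word_lengths(text)
    histogram = Counter(lengths)
    share = sum(1 for n in lengths if lo <= n <= hi) / len(lengths)
    # pstdev is the population standard deviation; swap in
    # statistics.stdev if the sample estimator is preferred.
    return histogram, mean(lengths), pstdev(lengths), share
```

Calling `length_summary(corpus_text)`, where `corpus_text` is a placeholder for the GitHub corpus loaded as a single string, yields the histogram to plot against the English one, plus the mean, spread and the share of 2 to 5 glyph words.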

7 thoughts on “On word lengths”

J____ said:

9 August 2013 at 06:40

I’m not quite sure where to put this, so I hope somebody sees this!

Anyway, I had been working on Beanish for a while before reading this blog, and you have helped me fill in a lot of blanks in my notes. It’s actually starting to make sense! Thank you. Coming from a different direction, I’ve noticed a couple of differences between your corpus and mine, but the vast majority is almost exactly the same.

One important feature I think you are missing is word structure. I’m working from the assumption that ‘Beanish’ is a synthetic language (as opposed to an isolating one), with a rigid templatic structure and semi-fluid morpheme boundaries. With that in mind, I want to propose a few morphemes:

ᖆᐣᖽ – To (Preposition)

ᖆᐣᖽ (To, x3)
ᔪᖆᓄᐧ (What to?)
ᖆᖽᒣ (Up?)
ᖆᐣᖚᔭ (Today? [Fudging a bit here])
ᖆᐣᑕᑦᖚᑫ (UKN)
ᖆᔭᖊᖽ (UKN)
*ᖆᕬᖉᔭ (Not an occurrence of the morpheme?)
*ᖆᓄᘈᖉᐣ (Not an occurrence of the morpheme?)
*ᘊᒣᑦᖽᖆ (Not an occurrence of the morpheme)
*ᖆᘈᘖ (Not an occurrence of the morpheme)

(ᖽ)ᘛ – You (2nd Person)

ᘛᔭ (You are)
ᖽᘛᕋᑦ (You Possessive x2) *This changes corpus line 26*
ᘛ (You)
ᘛᖆᖚᘈ (You Journeyed?) *This is a minor change to corpus line 24*
ᘛᐣ (You, [3 plural?])
*ᘖᖆᒣᘛ (Not an occurrence of the morpheme)

ᕒᖚᐧ – Where

ᔪᕒᖚᐧ (Where from)
ᕒᖚᐧ (Where x4) *Line 25?*
ᕒᖚᑫᕋ, (UKN)

Also, if ᖉᑦ, means “yes” or “good”, then ᖉ, ᖆᐣᖚᔭ, would mean “good day”, with the ᖆᐣ prefix meaning something... but that bit is eluding me.

Notice, for all these morphemes, that when the morpheme is placed before (or sometimes next to) a word, the phonemes ᐣ, ᖽ, ᘖ, & ᐧ will occasionally drop. This gives us a ‘core’ phoneme or two for each morpheme (ᖆ, ᘛ, ᕒᖚ, ᖉ, and ᔕ for “we”). I also propose that ᑫ is the core of a possessive morphemic suffix, and ᘊ is the core of an importance-marker type morphemic prefix.
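The idea that ᐣ, ᖽ, ᘖ and ᐧ may drop can be turned into a crude search aid. The sketch below is a hypothetical helper, not part of the original comment: `core` removes those four glyphs wherever they occur (not only at morpheme boundaries, which is a deliberate simplification), and `words_with_core_prefix` lists the words whose reduced form begins with a proposed core. It cannot tell real occurrences from look-alikes such as *ᖆᕬᖉᔭ, so its output still needs checking by hand.

```python
# Glyphs that, under the hypothesis above, may drop at morpheme boundaries.
DROPPABLE = frozenset("ᐣᖽᘖᐧ")


def core(word: str) -> str:
    """Remove every hypothetically droppable glyph from the word.

    Simplification: glyphs are removed everywhere in the word,
    not only at the (unknown) morpheme boundary.
    """
    return "".join(glyph for glyph in word if glyph not in DROPPABLE)


def words_with_core_prefix(words, morpheme_core: str) -> list[str]:
    """Words whose reduced form starts with the proposed morpheme core."""
    return [w for w in words if core(w).startswith(morpheme_core)]
```

For instance, `core("ᖽᘛᕋᑦ")` reduces to a form starting with ᘛ, in line with the proposed (ᖽ)ᘛ morpheme, and filtering ᖆᐣᖽ, ᖆᖽᒣ, ᔪᖆᓄᐧ and ᘛᐣ for the core ᖆ keeps only the first two.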

Finally, this is the word structure template that I can work out with these morphemes:

(Q-Word)/PERSON (opt?) – (PREPOSITION) – (Importance marker) – word – (POSSESSION)

If you have any insights on this approach, or know of a better place to put this, let me know! Hopefully we can get this cracked.

- tresoldi said:
  
  9 August 2013 at 11:16
  
  The best places to discuss this are probably here and the XKCD forum thread devoted to the comic. We will probably end up creating a new thread devoted mostly to Beanish.
  
  Could you point out the differences between your corpus and the one on GitHub? If you know how to use git, please fork it and correct it; in fact, it is already the product of many hands, not just mine.
  
  Now, I haven’t gotten as far as word structure yet (I had only thought about it), but what you point out does make a lot of sense, especially regarding the ᖆ- prefix. I will read it in more detail later today.
  
  By the way, as you mostly refer to morphemes when discussing glyphs, I assume you agree that the script is an abugida, right? I mean, as opposed to a syllabary.
  
  Thank you for the good work! 🙂
  
  - J_____ said:
    
    9 August 2013 at 19:46
    
    Judging by my morpheme lengths, I would lean in favor of an abugida. But I am approaching this from a different direction, hoping that the orthography of the language becomes obvious once I understand its underlying structure. Thanks for the forum, I’ll post something a bit more understandable there. Keep up the good work!
  - tresoldi said:
    
    9 August 2013 at 20:51
    
    Great, another abugida supporter! But see the comment from Vokietis: he or she is right about the number of vowels being lower than expected (or, looking at it from the other side, the frequency of the standard vowel being higher than expected).
    
    The transition tables and the graphics I intend to generate later (showing the most common glyph transitions) will probably help us.
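A glyph transition table of the kind mentioned in the reply above can be sketched as bigram counts over the words of the corpus. This is an illustrative sketch, not the blog’s actual tooling: the names are hypothetical, transitions are counted only inside words, and the optional `#` boundary marker (an assumption) lets word-initial and word-final glyphs show up as transitions too.

```python
from collections import Counter

BOUNDARY = "#"  # hypothetical marker for word start/end


def transition_counts(words, mark_boundaries: bool = False) -> Counter:
    """Count how often glyph a is immediately followed by glyph b."""
    counts = Counter()
    for word in words:
        if mark_boundaries:
            word = BOUNDARY + word + BOUNDARY
        # zip pairs each glyph with the one after it, within the word only
        counts.update(zip(word, word[1:]))
    return counts


def transition_probabilities(counts: Counter) -> dict:
    """Row-normalise the counts: estimate P(next glyph | current glyph)."""
    totals = Counter()
    for (first, _), n in counts.items():
        totals[first] += n
    return {(a, b): n / totals[a] for (a, b), n in counts.items()}
```

`transition_counts(corpus_words).most_common(10)`, with `corpus_words` standing in for the tokenised corpus, gives the most frequent transitions; the probability table is what a heat-map graphic would display.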
Vokietis said:

9 August 2013 at 15:53

I wonder if you have ever considered the script to be an abjad. I mean, if it’s an abugida, there are rather few vowels and many syllables with standard vowels. Furthermore, your analysis indicated that the diacritics do not necessarily share properties such as +Vowel.

If it’s a syllabary, there are way too few possible syllables. We have 22 main symbols, which suggests something like 5 contoids combined with 4 vocoids (5 × 4 = 20) plus two extra symbols (semivowels or syllabic consonants). Some of these could be modified by diacritics, maybe to voice the consonants or raise the vowels, but it would remain a rather restricted phoneme inventory, so I think we can safely disregard syllabaries.

If it’s an alphabet, then the diacritics could really indicate voicing or devoicing (like the Japanese dakuten), or the presence or absence of glottal sounds in the onset (as the Greek spiritus lenis and asper did). They could even modify the articulation of sounds in less predictable ways (like the Old Irish lenition mark, the German umlaut dots, or the dots in Arabic letters that distinguish such unrelated phonemes as [b], [n], [j], [t] and [θ]). That would give us a rather complex inventory of sounds, reminding me of some Khoisan languages.

If it’s an abjad, omitting most of the vowels and marking only the long ones, we’d have a rather average inventory of 22 consonants. The diacritics could mark the long vowels (given that the long-vowel symbols are not part of the basic 22-glyph inventory) or be used to resolve ambiguities. Try to imagine English written without short vowels (I’ll use the apostrophe for vowel onsets): ’t’s rathr ’eas ’s lng ’s “cts ’nd dgs” wll mean “cats and dogs” instead of “cuts and digs”. For the latter case, you could include the vowels.
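The ambiguity in that English example is easy to reproduce mechanically. The sketch below is a hypothetical and cruder version of the scheme in the comment: it drops every vowel letter (a, e, i, o, u; y is treated as a consonant) instead of only the short ones, and writes a plain ASCII apostrophe where a word begins with a vowel.

```python
import re

VOWELS = set("aeiou")  # simplification: y counts as a consonant


def strip_vowels(text: str) -> str:
    """Write English 'abjad-style': drop all vowel letters, but mark a
    word-initial vowel with an apostrophe so the onset is still visible."""
    def convert(match: re.Match) -> str:
        word = match.group(0)
        onset = "'" if word[0].lower() in VOWELS else ""
        return onset + "".join(c for c in word if c.lower() not in VOWELS)

    return re.sub(r"[A-Za-z]+", convert, text)
```

Both “cats and dogs” and “cuts and digs” come out as `cts 'nd dgs`, which is exactly the kind of collision that diacritics (or occasionally written vowels) would have to resolve in an abjad.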

- tresoldi said:
  
  9 August 2013 at 16:48
  
  Thank you for your comment, Vokietis. Your reasoning is very detailed, and you explained some points much better and more briefly than I had; I’ll probably quote you later.
  
  Regarding the diacritics, the statistical analysis indeed suggests a pure vowelness. Even if we consider Randall’s statement that the language was designed to be very different from English (which could mean just about anything, for example that there are lots of syllabic consonants), it is difficult to “close” the system (not to mention that the diacritics seem to work in very distinct ways, and only ᐣ and ᑦ are clearly related, possibly along with ᐧ).
  
  Regarding syllabaries, I investigated the possibility when Randall’s interview on Linear A and B was published. I agree with you that the number of syllables is smaller than expected, and the glyph transitions don’t suggest they are syllables either (of course, he could have created a very rigid conlang in terms of phonology, perhaps even one following some phonosymbolic theory, but I find that extremely unlikely).
  
  Regarding alphabets, you explained something I had been meaning to write about on the blog; as I said, I’ll probably quote you. I actually think that the diacritics work the way you postulate, and that similar glyphs like ᔪ and ᔭ might be related (even though, in this particular case, there is no statistical suggestion that they are). Maybe the characters and the diacritics, as many people have already suggested in the XKCD forum, are related to similar IPA glyphs: besides the more common features you mention (voicing/devoicing, place of articulation), they could, for example, indicate that the previous consonant is syllabic or that the following sound is pharyngealized. In fact, speaking of Proto-Indo-European, I was wondering whether one of the diacritics, or even some of the glyphs, might indicate some kind of “coloring”.
  
  I really hadn’t developed the idea you propose in your last paragraph; I’ll think about it.
  
  Once more, thank you very much for your comment, it was really helpful. I hope you’ll stick around! 🙂
  
André Rhine-Davis said:

14 August 2014 at 23:28

Haha, ninja’d! Just as I finished reading the original post, I was going to post a comment suggesting the possibility of an abjad 🙂


Deciphering Beanish

~ ᖉ, ᖆᐣᖚᔭ,ᐦ
