Sunday, November 30, 2014

What Makes a Language "Weird"

To a native English speaker, a “weird” language might be one with features that English lacks: agglutination (combining morphemes to form complex words), tone or pitch accent used to distinguish lexical or grammatical meaning, or unfamiliar sounds such as click consonants. But the fact that a language sounds unusual to one group of people doesn’t necessarily make that language “weird.”

Idibon, an organization that uses natural language processing to help companies understand language data, has developed a method for determining what makes languages weird from a non-English-centric point of view. They used the World Atlas of Language Structures (WALS), which describes thousands of languages in terms of 192 linguistic features. Instead of comparing each language to English, they evaluated how unusual each language is on each of those features. The first step was to figure out which features were actually recorded for a significant number of languages and to drop languages that did not have enough features coded to be analyzed, which cut the number of languages down to 1693. Also, since “weirdness” should be based on distinct characteristics and there is some redundancy among the features listed in WALS (e.g. one for subject–object–verb order and then others for subject–verb and object–verb order), the set of features had to be narrowed down even further. In the end, 21 features were left, including “Order of Negative Morpheme and Verb,” “Position of Tense-Aspect Affixes,” and “Fixed Stress Locations.”

Then, the relative frequency of each value of each feature was computed, and the Weirdness Index of each language was calculated by subtracting the harmonic mean of the frequencies of its values from one (so that a higher index means more weird). This method has some obvious flaws: every feature gets equal weight regardless of its importance, and languages are far too complex to be described by just 21 characteristics. Even so, it compares languages in a meaningful way and leads to some very interesting results. If we consider only the languages that have a value filled in for at least 14 of the 21 features, we find that English falls within the weirdest 14% of languages, with a Weirdness Index of 0.756. Chalcatongo Mixtec (spoken in parts of Mexico) comes in first as the weirdest language with an index of 0.972, and Hindi comes in last as the least weird language with an index of 0.087. We can also look at how a particular feature of a language compares to the rest. For instance, the word order switching that English uses in yes-or-no questions is found in only about 1.4% of the languages, many of them European: German, Czech, Dutch, Swedish, Norwegian, Frisian, English, Danish, and Spanish.
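To make the arithmetic concrete, here is a minimal sketch of the calculation in Python. It is not Idibon's code. It takes the index to be one minus the harmonic mean of the relative frequencies, which is the reading under which a higher index means a weirder language. The four languages, the two features, and the function names are all made up for illustration and are not taken from WALS.

```python
from collections import Counter
from statistics import harmonic_mean

# Hypothetical toy data (NOT from WALS): language -> {feature: value}.
# A language that lacks a value for a feature simply omits that key.
LANGUAGES = {
    "A": {"word_order": "SOV", "negation": "before_verb"},
    "B": {"word_order": "SOV", "negation": "before_verb"},
    "C": {"word_order": "SOV", "negation": "after_verb"},
    "D": {"word_order": "VSO", "negation": "before_verb"},
}


def value_frequencies(languages):
    """For each feature, return the share of coded languages with each value."""
    counts = {}
    for features in languages.values():
        for feature, value in features.items():
            counts.setdefault(feature, Counter())[value] += 1
    return {
        feature: {value: n / sum(counter.values()) for value, n in counter.items()}
        for feature, counter in counts.items()
    }


def weirdness_index(features, frequencies):
    """One minus the harmonic mean of the frequencies of a language's values."""
    relative = [frequencies[feature][value] for feature, value in features.items()]
    return 1 - harmonic_mean(relative)


freqs = value_frequencies(LANGUAGES)
for name, features in LANGUAGES.items():
    print(name, round(weirdness_index(features, freqs), 3))
```

In this toy data, languages C and D each have one rare value and end up with a higher index than A and B, which share the common value for both features. The harmonic mean is pulled strongly toward small numbers, so a single rare value raises a language's index more than it would under an ordinary average.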


Ultimately, one useful way to determine the weirdness of a language is to compare its features with those of a large set of other languages, as Idibon’s technique does. Though not a flawless method, it can provide us with insightful comparisons between languages worldwide.
