By Andries W. Coetzee
Nov 03, 2012
Many Chinese words have what has come to be known as "elastic length" - they have both a short and a long form. This is similar to the English custom of using "stats" for "statistics". However, this phenomenon is much more widespread and regular in Chinese. Elastic word length has recently received quite a bit of attention in the Chinese linguistics literature, in no small part due to the work done by our very own San Duanmu on this topic.
At Phondi this Friday, graduate student Yan Dong will present some of her recent research on this topic. Yan's main focus is on how the shorter form is created - is the short form taken from the left or right side of the long form? She explores this question by looking at "family size" (related to the concept of "lexical neighborhood"), using data from Google's Ngram. The title and abstract of Yan's presentation are given below. Phondi meets on Friday (11/9) at 1 pm, in Lorch Hall 403.
Family size and elastic word length in Chinese
Many Chinese words have elastic length (long and short forms), such as mei (coal) and meitan (coal-coal), ya (duck) and yazi (duck-affix), dian (electricity) and dianshi (electricity-vision). The short form can be either the right-side or the left-side member of the compound. For example, rong (melt) is the short form for both ronghua (melt-melt) and jinrong (gold-melt).
This raises a question: what determines which member is deleted in the short form? I pursue the hypothesis that “informativity” influences the choice of the short form, and I investigate one measure of informativity, family size, in large corpora. The family of a member consists of the compounds that share that member in the same position: ronghua and ronghe share the left-side member, while fanrong and guangrong share the right-side member. Family size is the number of such compounds. Since words with smaller family size carry less information about the compound, I expect words with smaller family size to be deleted in the short form. To investigate this hypothesis, I extracted all senses that have equivalent disyllabic forms from 3000 senses of monosyllabic words in the Modern Chinese Dictionary (2005). Family size was first calculated based on the character, then recalculated based on the pronunciation for comparison.
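As a rough illustration of the counting described above (not the procedure used in the study), the sketch below tallies left-side and right-side family sizes and applies the stated expectation that the member with the smaller family is the one deleted. The function names `family_sizes` and `predicted_short_form` are hypothetical, the compound list contains only the five examples named in this abstract rather than a real corpus, and members are keyed on their romanized pronunciation, which corresponds to the pronunciation-based count.

```python
from collections import Counter

def family_sizes(compounds):
    """Count how many distinct two-member compounds contain each member
    on the left side and on the right side."""
    left, right = Counter(), Counter()
    for first, second in set(compounds):
        left[first] += 1
        right[second] += 1
    return left, right

def predicted_short_form(compound, left, right):
    """Return the member expected to survive under the abstract's hypothesis:
    the member with the smaller family is deleted. Returns None on a tie."""
    first, second = compound
    if left[first] == right[second]:
        return None
    return first if left[first] > right[second] else second

# Toy data: only the compounds mentioned in the abstract, as (left, right) pairs.
compounds = [
    ("rong", "hua"),    # ronghua
    ("rong", "he"),     # ronghe
    ("fan", "rong"),    # fanrong
    ("guang", "rong"),  # guangrong
    ("jin", "rong"),    # jinrong
]

left, right = family_sizes(compounds)
print(left["rong"], right["rong"])
print(predicted_short_form(("rong", "hua"), left, right))
print(predicted_short_form(("jin", "rong"), left, right))
```

With a list this small the counts mean little; the point is only the shape of the computation. A character-based count would use the same code with characters in place of romanized syllables.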
Another type of family size that may be crucial for member selection is the family size of the members at the time the compound was created. This can be checked with Google Ngram, which provides the first occurrence of the members and the compound, as well as their frequency by year. This calculation is expected to predict the choice of the short form more accurately.
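One way to picture this time-sensitive count is sketched below, assuming the per-year frequencies have already been exported into plain dictionaries. The helper names `first_attested_year` and `family_size_before` are hypothetical, and the members (`m1` to `m4`) and counts are placeholders invented for illustration, not values from Google Ngram.

```python
def first_attested_year(counts_by_year):
    """Return the earliest year with a nonzero count, or None if never attested."""
    years = [year for year, count in counts_by_year.items() if count > 0]
    return min(years) if years else None

def family_size_before(member, position, year, first_years):
    """Count compounds holding `member` at `position` (0 = left, 1 = right)
    whose first attestation is earlier than `year`."""
    return sum(
        1
        for compound, first in first_years.items()
        if compound[position] == member and first is not None and first < year
    )

# Placeholder compounds and per-year counts (invented, for illustration only).
yearly_counts = {
    ("m1", "m2"): {1900: 0, 1910: 4, 1920: 9},
    ("m1", "m3"): {1900: 2, 1910: 5, 1920: 7},
    ("m1", "m4"): {1900: 0, 1910: 0, 1920: 3},
}

first_years = {c: first_attested_year(v) for c, v in yearly_counts.items()}

# Left-side family size of "m1" at the moment ("m1", "m2") first appears.
creation_year = first_years[("m1", "m2")]
size = family_size_before("m1", 0, creation_year, first_years)
print(creation_year, size)
```

Only compounds attested strictly before the target compound's first year are counted, so the compound itself and any later coinages are excluded.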