Congrats to Drs. Tyler, Zhang, & Szymanski
May 01, 2012
Title: Discourse Prosody in Production and Perception
A well-formed discourse is more than just a series of well-formed sentences. While often left implicit, the structure of a discourse is sometimes overtly cued. And though most attention in this area has focused on lexicalized cues like discourse markers, prosody can also convey information about the structure of discourse. This dissertation explores the relationship between prosody and discourse in production and perception, helping to identify what information about the structure of discourse is in speakers’ prosody and what prosodic variation listeners use in discourse interpretation.
First, a production study examines prosodic correlates of discourse structure in readings of a newspaper article. Prosodic measures of pause duration, pitch, intensity and speech rate were correlated with discourse structural measures of boundary size, discourse coordination/subordination, and their interaction. Results showed significant correlations between the prosodic measures and both structural measures and their interaction. This interaction shows that the effect of boundary size on an utterance’s prosody often depends on whether that utterance is coordinated or subordinated, and vice versa.
Then, a series of perception studies examines the ability of synthesized manipulations of prosody to bias the interpretation of ambiguous discourse. For example, the discourse “I sat in on a history class. I read about housing prices. And I watched a cool documentary” could be interpreted as describing three separate, independent events (coordinated interpretation) or as indicating that the events of the second and third sentences took place during the event of the first (subordinated interpretation). Results show that rising pitch at the end of the first sentence led to more coordinated interpretations than falling pitch did.
These results are taken to suggest that one meaning for rising pitch is as a marker of discourse coordination. This proposal is motivated by research on listing intonation. The potentially contradictory claim by Pierrehumbert & Hirschberg (1990) that high terminal pitch indicates elaboration, a subordinating relation, is discussed and re-analyzed to bring their data in line with these results. Then, these results are discussed with respect to prosodic disambiguation of syntax, and comparisons are made between prosodic disambiguation of syntactic and discourse structures.
Title: A Comparison of Cue-Weighting in the Perception of Prosodic Phrase Boundaries in English and Chinese
Title: Morphological Inference from Bitext for Resource-Poor Languages
In my dissertation I describe an automated method for bitext discovery and a novel algorithm for morphological inference from grammatically tagged word tokens. The goal of this work is to enable computational research on resource-poor languages: languages for which electronic data is scarce and for which natural language processing (NLP) software is non-existent. To overcome data scarcity, I present an automated technique to detect and extract bitext from electronic documents using statistical methods for language identification and word alignment. These word alignments can then be combined with existing NLP tools to assign rich grammatical tags to the word tokens in a foreign language. My approach to morphology induction is based on minimizing the size of a paradigm-based morphological grammar of the language of study. The algorithm simultaneously segments wordforms into their component morphemes and organizes stems and affixes into a paradigmatic structure. Because it uses tagged tokens as its input, the morphemes that are produced by this induction method consist of strings paired with meanings, as represented by a set of features. This overcomes one limitation of previous algorithms for unsupervised morphology based on monolingual text, which treat morphemes as strings of letters. Combined, these methods for collecting and analyzing bitext data offer a pathway for the automatic creation of richly annotated corpora for resource-poor languages, requiring minimal amounts of data and minimal manual analysis.