Computational linguistics, also known as natural language processing, encompasses three distinct areas of study: (1) human language technologies, such as machine translation, information extraction, or spoken language dialogue systems; (2) computational models of language users, either human (computational psycholinguistics) or artifactual (artificial intelligence); and (3) digital linguistics, which is the use of computation in support of language documentation and linguistic research.
All three areas share a common body of fundamentals, and this class introduces them. One focus is the processing pipeline involved in natural language understanding, particularly part-of-speech tagging, parsing, and semantic interpretation. A second focus is the extraction of linguistic information from text corpora; we will touch on collocations, language models, regular expressions, and text classification.
The approach will be very hands-on. We will use the Natural Language Toolkit (NLTK), a library for the Python programming language.
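To give a flavor of the hands-on work, here is a minimal sketch of two of the topics named above: tokenizing text with a regular expression and counting adjacent word pairs as a crude first step toward finding collocations. It uses only the Python standard library rather than the Natural Language Toolkit, and the function names and the sample sentence are made up for illustration; it is not taken from the course materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out word tokens with a regular expression.

    The pattern keeps simple contractions such as "don't" together.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def bigram_counts(tokens):
    """Count each pair of adjacent tokens (bigrams)."""
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical sample text, invented for this illustration.
sample = (
    "The strong tea was served with strong tea biscuits. "
    "Powerful computers need strong tea."
)

tokens = tokenize(sample)
counts = bigram_counts(tokens)

# Show the most frequent adjacent word pair and how often it occurs.
print(counts.most_common(1))
```

Raw frequency is only a starting point: a pair of very common words can be frequent without being a collocation, which is why association measures are normally used on top of counts like these.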
Intended Audience:
The course does not assume prior experience with Python, nor does it assume a computer science background. It is particularly intended for students of language, linguistics, psychology, and cognitive science. However, some prior programming experience in any language is essential: this course is not an introduction to programming.