In this course we will study the use of corpora in computational linguistics. We will start with a general introduction to corpus linguistics and corpus-based linguistics, including linguistic annotation and annotation schemas. We will then analyze different ways to extract information from corpora, such as collocation and keyword extraction, using both statistical and linguistically based approaches. At the end of the course we will study XML as a language for corpus annotation. Throughout the course, students will work with corpora in several languages.
Syllabus
Introduction to Corpus Linguistics
Corpus characteristics and types
Corpus examples
Corpus annotation
1. Common annotation marks and levels of analysis
2. Standards for linguistic representation (TEI, NAF, AWA)