Erasmus Mundus Master in Language
and Communication Technologies (LCT)

University of the Basque Country

Corpus linguistics

In this course we will study the use of corpora in computational linguistics. We will start with a general introduction to the field of corpus linguistics and corpus-based linguistics, including linguistic annotations and annotation schemas. We will then analyze different ways to extract information from corpora, such as collocation or keyword extraction, using both statistical and linguistically based approaches. At the end of the course we will study the XML language for corpus annotation. During the course students will work with corpora in several languages.

Syllabus

Introduction to Corpus Linguistics
Corpus characteristics and types
Corpus examples
Corpus annotation
1. Usual marks and analysis levels
2. Standards for linguistic representation (TEI, NAF, AWA)
XML

Hizkuntzaren Azterketa eta Prozesamendua
