Machine Translation

Identification and translation of verb+noun multiword expressions: a Spanish-Basque study

This is a summary of the PhD thesis written by Uxoa Iñurrieta under the supervision of Dr. Gorka Labaka and Dr. Itziar Aduriz. The full title of the thesis in Basque is "Izena+aditza Unitate Fraseologikoak gaztelaniatik euskarara: azterketa eta tratamendu konputazionala". The defense was held in San Sebastian on November 29, 2019. The doctoral committee consisted of Ricardo Etxepare (Centre National de la Recherche Scientifique), Margarita Alonso (Universidad de Coruña) and Miren Azkarate (University of the Basque Country).

Verb+Noun Multiword Expressions: A linguistic analysis for identification and translation

Multiword Expressions (MWEs) are combinations of words that exhibit some kind of idiosyncrasy. Because of this idiosyncratic nature, they pose several problems for Natural Language Processing (NLP). This PhD thesis addresses two of the most challenging tasks in MWE processing: the automatic identification of MWE occurrences in corpora and their translation in Machine Translation (MT).

Verb+noun Multiword Expressions from Spanish to Basque: analysis and computational treatment

Multiword Expressions (MWEs) are idiomatic word combinations that are particular to each language. For Natural Language Processing (NLP) tools to produce good-quality results, MWEs must be handled properly, but this task involves several difficulties, among them the lack of word-for-word translatability. In this thesis, we carried out a linguistic analysis of verb+noun MWEs to help address two important problems they raise in the field of NLP: on the one hand, the automatic identification of MWEs in corpora, and on the other, those MWEs between Spanish and Basque ...

Leveraging SNOMED CT terms and relations for machine translation of clinical texts from Basque to Spanish

We present a method for machine translation of clinical texts that requires no bilingual clinical texts. Instead, it leverages the rich terminology and structure of the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT), which is considered the most comprehensive multilingual clinical health care terminology collection in the world. We evaluate the method on Basque-to-Spanish translation, comparing performance with and without clinical-domain resources.
