Measuring Language Distance of Isolated European Languages

Phylogenetics is a sub-field of historical linguistics whose aim is to classify a group of
languages by considering their distances within a rooted tree that stands for their historical evolution.
A few European languages do not belong to the Indo-European family or are otherwise isolated
in the European rooted tree. Although it is not possible to establish phylogenetic links using basic
strategies, it is possible to calculate the distances between these isolated languages and the rest using
simple corpus-based techniques and natural language processing methods. The objective of this
article is to select some isolated languages and measure the distances among them and to the
other European languages, so as to shed light on the linguistic distances and proximities of these
controversial languages without considering phylogenetic issues. The experiments were carried out
with 40 European languages including six languages that are isolated in their corresponding families:
Albanian, Armenian, Basque, Georgian, Greek, and Hungarian.
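
The abstract does not spell out which corpus-based technique is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes character trigram profiles compared by cosine distance as a stand-in measure; the function names and the three toy sentences are hypothetical and serve only as illustration.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in lowercased, whitespace-normalized text."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(p, q):
    """Return 1 - cosine similarity between two n-gram count profiles."""
    dot = sum(count * q[gram] for gram, count in p.items())
    norm_p = sqrt(sum(c * c for c in p.values()))
    norm_q = sqrt(sum(c * c for c in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 1.0  # an empty profile shares nothing with any other profile
    return 1.0 - dot / (norm_p * norm_q)


# Hypothetical toy samples; a real experiment would need a sizeable corpus per language.
samples = {
    "spanish": "el perro come pan en la casa",
    "portuguese": "o cachorro come pão na casa",
    "basque": "txakurrak ogia jaten du etxean",
}
profiles = {lang: char_ngrams(text) for lang, text in samples.items()}


def distance(lang_a, lang_b):
    """Distance between two of the sampled languages (assumed profile-based measure)."""
    return cosine_distance(profiles[lang_a], profiles[lang_b])


if __name__ == "__main__":
    for a, b in [("spanish", "portuguese"), ("spanish", "basque")]:
        print(f"{a} vs {b}: {distance(a, b):.3f}")
```

A measure of this kind needs no phylogenetic tree: it compares surface text only, which is why it can be applied to languages that are isolated within their families.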

Authors: 
Pablo Gamallo, José Ramom Pichel, and Iñaki Alegria

Year: 
2020
Publication place: 

MDPI Information 2020, 11(4), 181 doi:10.3390/info11040181

DOI: 
10.3390/info11040181
