Towards the Extraction of Language Features for Assessing the Quality of Students' Academic Texts

Writing academic texts poses a significant challenge, particularly for students in their first year of university education. It is equally challenging for educators to identify the strengths and weaknesses of these texts and to obtain data that support the evaluation process. This research explores language complexity measures that can be used to grade scientific texts written in Spanish by undergraduate students, and examines whether these measures are reliable enough to grade such texts automatically. To this end, we analysed conference proceedings produced by 223 undergraduate students enrolled at the University of the Basque Country (UPV/EHU) in the subject "Development of Communicative Competence" between the academic years 2020 and 2023. After creating and annotating the corpus, we automatically extracted 330 language features using the Spanish version of CTAP and loaded them into WEKA to investigate various predictive models, both to identify relevant language measures and to grade student texts automatically. This also allowed us to work with the 10 best language features, selected through attribute evaluators. Our evaluation focused on two annotated key variables: lexicon grade and overall grade. Results based on nominal data ranged from 56.5% to 76.79%, while results based on numeric data ranged from 0.235 to 0.533. Notably, using only the 10 most relevant features improved the results across various machine learning algorithms, outperforming those obtained with all 330 features. This research contributes to a deeper understanding of the language complexity features most pertinent to improving writing instruction.
Authors (Ixa members): 
Authors: 
Mari Mar Boillos, Unai Atutxa, Mikel Iruskieta
Year: 
2024
Article reference: 
Wärnsby, A., & Nolan, J. S. (2024). Research Literacy Development in Teacher Education Programmes in the Nordic-Baltic Region. In SIG Writing Conference, 26-28 June 2024, Paris, France.

Fine-grained publication type (argitalpen_sailkapen_ohia):

Conference rating: