Are pre-trained text representations useful for multilingual and multi-dimensional language proficiency modeling?

02/25/2021
by Taraka Rama, et al.

The development of language proficiency models for non-native learners has been an active area of NLP research in recent years. Although language proficiency is multidimensional in nature, existing research typically considers only a single "overall proficiency" when building models. Further, existing approaches consider only one language at a time. This paper describes our experiments and observations on the role of pre-trained and fine-tuned multilingual embeddings in multi-dimensional, multilingual language proficiency classification. We report experiments with three languages (German, Italian, and Czech) and model seven dimensions of proficiency, ranging from vocabulary control to sociolinguistic appropriateness. Our results indicate that while fine-tuned embeddings are useful for multilingual proficiency modeling, no single feature set achieves the best performance consistently across all dimensions of language proficiency. All code, data, and related supplementary material can be found at: https://github.com/nishkalavallabhi/MultidimCEFRScoring.
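To make the setup concrete, the sketch below shows one way to use frozen multilingual BERT embeddings as features for per-dimension proficiency classifiers. It is a minimal illustration assuming HuggingFace transformers and scikit-learn; the model name (bert-base-multilingual-cased), mean pooling, and the logistic-regression classifiers are our assumptions for exposition, not necessarily the paper's exact pipeline.

```python
# Minimal sketch: multilingual embeddings as features for per-dimension
# proficiency classification. Assumes HuggingFace transformers and
# scikit-learn; the encoder, pooling, and classifier are illustrative.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")

def embed(texts):
    """Mean-pooled mBERT embeddings for a list of learner texts."""
    vecs = []
    for text in texts:
        inputs = tokenizer(text, truncation=True, max_length=512,
                           return_tensors="pt")
        with torch.no_grad():
            hidden = encoder(**inputs).last_hidden_state  # (1, seq_len, 768)
        vecs.append(hidden.mean(dim=1).squeeze(0).numpy())
    return vecs

# One classifier per proficiency dimension; the dimension names here are
# hypothetical examples drawn from the CEFR-style scales in the abstract.
DIMENSIONS = ["vocabulary_control", "sociolinguistic_appropriateness"]

def train_per_dimension(train_texts, labels_by_dim):
    """labels_by_dim maps a dimension name to a list of CEFR labels
    (e.g. "A2", "B1") aligned with train_texts."""
    X = embed(train_texts)
    return {dim: LogisticRegression(max_iter=1000).fit(X, labels_by_dim[dim])
            for dim in DIMENSIONS}
```

Training a separate classifier per dimension keeps the dimensions independent, which matches the abstract's observation that no single feature set wins on every dimension; a shared fine-tuned encoder with multiple output heads would be the natural alternative.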
