MuLVE, A Multi-Language Vocabulary Evaluation Data Set

01/17/2022
by Anik Jacobsen, et al.

Vocabulary learning is vital to foreign language learning. Correct and adequate feedback is essential to successful and satisfying vocabulary training. However, many vocabulary and language evaluation systems operate on simple rules and do not account for real-life user learning data. This work introduces the Multi-Language Vocabulary Evaluation Data Set (MuLVE), a data set consisting of vocabulary cards and real-life user answers, each labeled to indicate whether the user answer is correct or incorrect. The data source is user learning data from the Phase6 vocabulary trainer. The data set contains vocabulary questions in German, with English, Spanish, and French as target languages, and is available in four variations that differ in pre-processing and deduplication. We fine-tune pre-trained BERT language models on the downstream task of vocabulary evaluation using the proposed MuLVE data set. The experiments yield accuracy and F2-score above 95.5. The data set is available on the European Language Grid.
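For readers unfamiliar with the reported metrics, the following is a minimal sketch of how accuracy and the F2-score can be computed for a binary correct/incorrect task. It is not the authors' evaluation code. The label encoding (1 for a correct user answer, 0 for an incorrect one), the function names, and the toy labels are all assumptions made for illustration.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def fbeta_score(y_true, y_pred, beta=2.0):
    """F-beta score for the positive class (label 1).

    With beta=2 this is the F2-score, which weights recall
    more heavily than precision.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy labels (hypothetical, not taken from MuLVE):
# 1 = user answer is correct, 0 = user answer is incorrect.
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 1]

print(f"accuracy = {accuracy(gold, pred):.4f}")
print(f"F2-score = {fbeta_score(gold, pred, beta=2.0):.4f}")
```

Because beta is greater than 1, the F2-score penalizes missed positives (false negatives) more than false positives. Under the assumed encoding, that corresponds to penalizing a correct user answer being marked as wrong.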


