The futility of STILTs for the classification of lexical borrowings in Spanish

09/17/2021
by Javier de la Rosa, et al.

The first edition of the IberLEF 2021 shared task on automatic detection of borrowings (ADoBo) focused on detecting lexical borrowings that appeared in the Spanish press and that have recently been imported into the Spanish language. In this work, we tested supplementary training on intermediate labeled-data tasks (STILTs), drawing on part-of-speech (POS) tagging, named entity recognition (NER), code-switching, and language identification, for the classification of borrowings at the token level using existing pre-trained transformer-based language models. Our extensive experimental results suggest that STILTs do not provide any improvement over direct fine-tuning of multilingual models. However, multilingual models trained on small subsets of languages perform reasonably better than multilingual BERT, but not as well as multilingual RoBERTa, on the given dataset.
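To make "classification of borrowings at the token level" concrete, the following is a minimal sketch of how span annotations might be turned into per-token labels. The BIO scheme, the `ENG` tag, the helper name `spans_to_bio`, and the example sentence are all illustrative assumptions; the abstract does not specify the label format the authors used.

```python
from typing import List, Tuple

# Hypothetical span format: (start_token, end_token_exclusive, tag).
Span = Tuple[int, int, str]


def spans_to_bio(tokens: List[str], spans: List[Span]) -> List[str]:
    """Convert token-index spans into one BIO label per token.

    Tokens outside every span get "O"; the first token of a span gets
    "B-<tag>" and the remaining tokens of that span get "I-<tag>".
    """
    labels = ["O"] * len(tokens)
    for start, end, tag in spans:
        labels[start] = f"B-{tag}"
        for i in range(start + 1, end):
            labels[i] = f"I-{tag}"
    return labels


# Invented example sentence with two assumed English borrowings.
tokens = ["El", "nuevo", "look", "de", "la", "prime", "time"]
spans = [(2, 3, "ENG"), (5, 7, "ENG")]
labels = spans_to_bio(tokens, spans)

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```

A token classifier fine-tuned either directly or after an intermediate task would be trained to predict one such label per token, which is the setting in which the abstract compares the two approaches.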


research
06/21/2019

Multilingual Named Entity Recognition Using Pretrained Embeddings, Attention Mechanism and NCRF

In this paper we tackle multilingual named entity recognition task. We u...
research
04/27/2021

Named Entity Recognition and Linking Augmented with Large-Scale Structured Data

In this paper we describe our submissions to the 2nd and 3rd SlavNER Sha...
research
03/24/2021

Are Multilingual Models Effective in Code-Switching?

Multilingual language models have shown decent performance in multilingu...
research
04/29/2021

Let's Play Mono-Poly: BERT Can Reveal Words' Polysemy Level and Partitionability into Senses

Pre-trained language models (LMs) encode rich information about linguist...
research
02/22/2021

Evaluating Contextualized Language Models for Hungarian

We present an extended comparison of contextualized language models for ...
research
01/22/2021

Multilingual Pre-Trained Transformers and Convolutional NN Classification Models for Technical Domain Identification

In this paper, we present a transfer learning system to perform technica...
research
10/13/2020

X-FACTR: Multilingual Factual Knowledge Retrieval from Pretrained Language Models

Language models (LMs) have proven surprisingly successful at capturing f...
