GLUECoS: An Evaluation Benchmark for Code-Switched NLP

04/26/2020
by Simran Khanuja, et al.

Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present GLUECoS, an evaluation benchmark for code-switched languages that spans several NLP tasks in English-Hindi and English-Spanish. Specifically, the benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering, and a new task for code-switching, Natural Language Inference. We present results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that in most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, showing that multilingual models can be further optimized for code-switching tasks.
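As a rough illustration of what "artificially generated code-switched data" can look like, the sketch below performs naive word-level lexical substitution from a toy English-to-romanized-Hindi lexicon. This is a simplification for intuition only: the lexicon, function name, and substitution scheme are hypothetical, not the generation method used in the paper.

```python
import random

# Toy English -> romanized-Hindi lexicon; entries are illustrative only.
LEXICON = {
    "good": "accha",
    "very": "bahut",
    "food": "khana",
    "friend": "dost",
}

def synth_code_switch(sentence, lexicon, p=0.5, seed=0):
    """Replace each word found in the lexicon with probability p,
    producing a crude code-switched variant of the input sentence."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        key = word.lower()
        if key in lexicon and rng.random() < p:
            out.append(lexicon[key])
        else:
            out.append(word)
    return " ".join(out)

print(synth_code_switch("the food was very good", LEXICON, p=1.0))
# with p=1.0 every lexicon word is swapped: "the khana was bahut accha"
```

Real systems generate switch points with linguistic constraints rather than independent word flips, but even this kind of cheap synthetic mixing gives a fine-tuning signal that monolingual corpora lack.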

