GLUECoS: An Evaluation Benchmark for Code-Switched NLP

by Simran Khanuja et al.

Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present GLUECoS, an evaluation benchmark for code-switched NLP that spans several tasks in English-Hindi and English-Spanish. Specifically, the benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering, and a new task for code-switching, Natural Language Inference. We report results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that on most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, indicating that multilingual models can be further optimized for code-switching tasks.
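Artificially generated code-switched data, as mentioned in the abstract, can be produced in several ways. A minimal, hypothetical sketch is word-level lexical substitution with a toy English-Hindi dictionary; this is an illustration only, not the paper's actual generation method, which draws on more principled techniques:

```python
# Toy illustration: generate artificial English-Hindi code-switched text
# by lexical substitution. The dictionary and function here are invented
# for demonstration; real pipelines typically use parallel corpora and
# linguistically motivated switch points.
import random

# Hypothetical toy dictionary: English -> romanized Hindi
EN_HI = {
    "is": "hai",
    "very": "bahut",
    "good": "accha",
    "movie": "film",
}

def code_switch(sentence: str, switch_prob: float = 0.5, seed: int = 0) -> str:
    """Replace each dictionary word with its Hindi counterpart with
    probability `switch_prob`, yielding a mixed-language sentence."""
    rng = random.Random(seed)
    out = []
    for word in sentence.split():
        translation = EN_HI.get(word.lower())
        if translation is not None and rng.random() < switch_prob:
            out.append(translation)
        else:
            out.append(word)
    return " ".join(out)

print(code_switch("this movie is very good", switch_prob=1.0))
# -> this film hai bahut accha
```

Synthetic sentences like these can then be mixed into fine-tuning data for a multilingual model, which is the setup the abstract reports as performing best on most tasks.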






