GLUECoS: An Evaluation Benchmark for Code-Switched NLP

04/26/2020
by   Simran Khanuja, et al.

Code-switching is the use of more than one language in the same conversation or utterance. Recently, multilingual contextual embedding models, trained on multiple monolingual corpora, have shown promising results on cross-lingual and multilingual tasks. We present GLUECoS, an evaluation benchmark for code-switched languages that spans several NLP tasks in English-Hindi and English-Spanish. Specifically, the benchmark includes Language Identification from text, POS tagging, Named Entity Recognition, Sentiment Analysis, Question Answering, and Natural Language Inference, a task that is new for code-switching. We present results on all these tasks using cross-lingual word embedding models and multilingual models. In addition, we fine-tune multilingual models on artificially generated code-switched data. Although multilingual models perform significantly better than cross-lingual models, our results show that in most tasks, across both language pairs, multilingual models fine-tuned on code-switched data perform best, indicating that multilingual models can be further optimized for code-switching tasks.

