Switch Point biased Self-Training: Re-purposing Pretrained Models for Code-Switching

11/01/2021
by Parul Chopra, et al.

Code-switching (CS), a ubiquitous phenomenon in multilingual communities because of the ease of communication it offers, remains an understudied problem in language processing. There are two primary reasons for this: (1) minimal effort in leveraging large pretrained multilingual models, and (2) the lack of annotated data. What distinguishes CS, and drives the low performance of multilingual models on it, is the intra-sentence mixing of languages, which produces switch points. We first benchmark two sequence labeling tasks, POS tagging and NER, on 4 different language pairs with a suite of pretrained models to identify the problems and select the best-performing model among them, char-BERT (addressing (1)). We then propose a self-training method that repurposes existing pretrained models with a switch-point bias by leveraging unannotated data (addressing (2)). Finally, we show that our approach performs well on both tasks on two distinct language pairs, reducing the performance gap at switch points while retaining overall performance. Our code is available here: https://github.com/PC09/EMNLP2021-Switch-Point-biased-Self-Training.
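A switch point is a position in a sentence where the language changes from one token to the next. The abstract does not describe how the switch-point bias enters the self-training loop, so the sketch below is a hypothetical illustration only, not the authors' method (their code is in the repository linked above). It assumes each unannotated sentence has already been tagged by the model with pseudo-labels, per-token confidences, and per-token language IDs. The names `Prediction`, `switch_points`, and `select_for_self_training`, the `"other"` language tag, and the 0.9 threshold are all our own assumptions. The filter keeps only sentences that contain switch points and whose switch-point tokens are confidently labeled.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Prediction(NamedTuple):
    """Hypothetical container for one model-tagged, unannotated sentence."""
    tokens: List[str]
    labels: List[str]         # pseudo-labels (e.g. POS tags) predicted by the model
    confidences: List[float]  # per-token confidence of the predicted label
    lang_tags: List[str]      # per-token language IDs (assumed available)


def switch_points(lang_tags: Sequence[str],
                  ignore: Tuple[str, ...] = ("other",)) -> List[int]:
    """Return indices where a token's language differs from the previous
    language-bearing token. Tags in `ignore` (e.g. punctuation, named
    entities) neither start nor break a run."""
    points: List[int] = []
    prev = None
    for i, tag in enumerate(lang_tags):
        if tag in ignore:
            continue
        if prev is not None and tag != prev:
            points.append(i)
        prev = tag
    return points


def select_for_self_training(preds: Sequence[Prediction],
                             threshold: float = 0.9) -> List[Prediction]:
    """Hypothetical switch-point-biased filter: keep a pseudo-labeled sentence
    only if it has at least one switch point and every switch-point token
    was predicted with confidence >= threshold."""
    selected: List[Prediction] = []
    for p in preds:
        sps = switch_points(p.lang_tags)
        if not sps:
            continue  # monolingual sentence: skipped to bias toward switching
        if min(p.confidences[i] for i in sps) >= threshold:
            selected.append(p)
    return selected


preds = [
    Prediction(["I", "am", "bahut", "happy"], ["PRON", "AUX", "ADV", "ADJ"],
               [0.99, 0.98, 0.95, 0.97], ["en", "en", "hi", "en"]),
    Prediction(["main", "ghar", "ja", "raha"], ["PRON", "NOUN", "VERB", "AUX"],
               [0.99, 0.99, 0.99, 0.99], ["hi", "hi", "hi", "hi"]),
    Prediction(["movie", "bahut", "achhi", "thi"], ["NOUN", "ADV", "ADJ", "AUX"],
               [0.99, 0.60, 0.90, 0.90], ["en", "hi", "hi", "hi"]),
]
kept = select_for_self_training(preds)
print([p.tokens for p in kept])  # prints [['I', 'am', 'bahut', 'happy']]
```

Only the first sentence survives. The second has no switch point, and the third has a low-confidence prediction exactly at its switch. In a full self-training loop, the kept sentences would be added to the training data for the next fine-tuning round.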
