Improving Zero-shot Cross-lingual Transfer between Closely Related Languages by injecting Character-level Noise

09/14/2021
by Noëmi Aepli, et al.
Cross-lingual transfer between a high-resource language and its dialects or closely related language varieties should be facilitated by their similarity. However, current approaches that operate in the embedding space do not take surface similarity into account. In this work, we present a simple yet effective strategy to improve cross-lingual transfer between closely related varieties: we augment the data of the high-resource parent language with character-level noise to make the model more robust to spelling variations. Our strategy shows consistent improvements across several languages and tasks, namely zero-shot transfer of POS tagging and topic identification between language varieties from the Germanic, Uralic, and Romance language genera. Our work provides evidence for the usefulness of simple surface-level noise in improving transfer between language varieties.
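The abstract does not spell out the noise procedure, so the following is only a minimal sketch of what character-level noise injection could look like, not the authors' implementation. It assumes three edit operations (insert, delete, replace), a Latin lowercase alphabet, and a per-token noise probability; the names `add_char_noise`, `noise_sentence`, and `noise_ratio` are hypothetical and chosen for illustration.

```python
import random
import string


def add_char_noise(word, rng, alphabet=string.ascii_lowercase):
    """Apply one random character edit to a word (assumed operations:
    insert, delete, replace). Empty words are returned unchanged."""
    if not word:
        return word
    op = rng.choice(["insert", "delete", "replace"])
    # Never delete the only character, so the token does not vanish.
    if op == "delete" and len(word) < 2:
        op = "replace"
    if op == "insert":
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(alphabet) + word[pos:]
    pos = rng.randrange(len(word))
    if op == "delete":
        return word[:pos] + word[pos + 1:]
    # Replace; the new character may happen to equal the old one.
    return word[:pos] + rng.choice(alphabet) + word[pos + 1:]


def noise_sentence(tokens, noise_ratio=0.1, seed=0):
    """Noise each token independently with probability noise_ratio.
    The token count is preserved, so token-level labels stay aligned."""
    rng = random.Random(seed)
    return [
        add_char_noise(tok, rng) if rng.random() < noise_ratio else tok
        for tok in tokens
    ]


if __name__ == "__main__":
    sentence = ["das", "ist", "ein", "kleines", "haus"]
    print(noise_sentence(sentence, noise_ratio=0.5, seed=1))
```

Because the sketch edits characters inside tokens and never adds or removes tokens, labels for a token-level task such as POS tagging can be reused unchanged for the noised copy of the training data. The alphabet and the noise ratio would need tuning for the languages involved.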

research
05/09/2023

Utilizing Lexical Similarity to Enable Zero-Shot Machine Translation for Extremely Low-resource Languages

We address the task of machine translation from an extremely low-resourc...
research
03/30/2023

Fine-Tuning BERT with Character-Level Noise for Zero-Shot Transfer to Dialects and Closely-Related Languages

In this work, we induce character-level noise in various forms when fine...
research
04/20/2023

Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages

One of the challenges with finetuning pretrained language models (PLMs) ...
research
12/11/2020

Orthogonal Language and Task Adapters in Zero-Shot Cross-Lingual Transfer

Adapter modules, additional trainable parameters that enable efficient f...
research
06/14/2021

Modeling Profanity and Hate Speech in Social Media with Semantic Subspaces

Hate speech and profanity detection suffer from data sparsity, especiall...
research
04/16/2020

Cross-lingual Contextualized Topic Models with Zero-shot Learning

Many data sets in a domain (reviews, forums, news, etc.) exist in parall...
research
06/27/2022

Few-Shot Cross-Lingual TTS Using Transferable Phoneme Embedding

This paper studies a transferable phoneme embedding framework that aims ...
