Cross-Lingual Text Classification with Minimal Resources by Transferring a Sparse Teacher

10/06/2020
by Giannis Karamanolakis, et al.

Cross-lingual text classification alleviates the need for manually labeled documents in a target language by leveraging labeled documents from other languages. Existing approaches for transferring supervision across languages require expensive cross-lingual resources, such as parallel corpora, while less expensive cross-lingual representation learning approaches train classifiers without target labeled documents. In this work, we propose a cross-lingual teacher-student method, CLTS, that generates "weak" supervision in the target language using minimal cross-lingual resources, in the form of a small number of word translations. Given a limited translation budget, CLTS extracts and transfers only the most important task-specific seed words across languages and initializes a teacher classifier based on the translated seed words. Then, CLTS iteratively trains a more powerful student that also exploits the context of the seed words in unlabeled target documents and outperforms the teacher. CLTS is simple and surprisingly effective in 18 diverse languages: by transferring just 20 seed words, even a bag-of-words logistic regression student outperforms state-of-the-art cross-lingual methods (e.g., based on multilingual BERT). Moreover, CLTS can accommodate any type of student classifier: leveraging a monolingual BERT student leads to further improvements and outperforms even more expensive approaches by up to 12% in accuracy. Finally, CLTS addresses emerging tasks in low-resource languages using just a small number of word translations.
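The abstract describes the pipeline concretely enough to sketch: a teacher built from a handful of translated seed words weakly labels unlabeled target-language documents, and a bag-of-words logistic regression student is then trained on those weak labels. The sketch below is a minimal illustration of that idea, not the authors' implementation: the seed-word dictionary, the voting/abstention rule, and the single training round are all assumptions (the paper's CLTS selects seed words under a translation budget, trains the student iteratively, and also supports stronger students such as monolingual BERT).

```python
from collections import Counter

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical translated seed words per class (e.g., Spanish sentiment);
# the actual method extracts and translates the most task-indicative words.
SEED_WORDS = {
    0: {"malo", "terrible", "horrible"},   # negative
    1: {"bueno", "excelente", "genial"},   # positive
}

def teacher_predict(doc):
    """Teacher: vote with seed-word counts; abstain when no seed word
    appears or when classes tie (an illustrative abstention rule)."""
    votes = Counter()
    for token in doc.lower().split():
        for label, seeds in SEED_WORDS.items():
            if token in seeds:
                votes[label] += 1
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return None  # abstain
    return ranked[0][0]

def train_student(unlabeled_target_docs):
    """Student: bag-of-words logistic regression fit on the teacher's weak
    labels. CLTS proper iterates this step; one round is shown for brevity.
    Assumes at least some documents contain a seed word."""
    weakly_labeled = [(d, y) for d in unlabeled_target_docs
                      if (y := teacher_predict(d)) is not None]
    docs, labels = zip(*weakly_labeled)
    vectorizer = CountVectorizer()
    X = vectorizer.fit_transform(docs)
    student = LogisticRegression(max_iter=1000).fit(X, labels)
    return vectorizer, student

# Usage: vec, clf = train_student(target_docs)
#        clf.predict(vec.transform(["una película genial"]))
```

Note how the student can outperform the teacher: the logistic regression assigns weight to non-seed context words that co-occur with seed words in the weakly labeled documents, so it can classify documents that contain no seed word at all, where the teacher must abstain.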


