Bridging the domain gap in cross-lingual document classification

09/16/2019
by Guokun Lai, et al.

The scarcity of labeled training data often prohibits the internationalization of NLP models to multiple languages. Recent developments in cross-lingual understanding (XLU) have made progress in this area, attempting to bridge the language barrier using language-universal representations. However, even if the language problem were resolved, models trained in one language would not transfer perfectly to another because of the natural domain drift across languages and cultures. We consider the setting of semi-supervised cross-lingual understanding, where labeled data is available in a source language (English) but only unlabeled data is available in the target language. We combine state-of-the-art cross-lingual methods with recently proposed methods for weakly supervised learning, such as unsupervised pre-training and unsupervised data augmentation, to simultaneously close both the language gap and the domain gap in XLU. We show that addressing the domain gap is crucial. We improve over strong baselines and achieve a new state of the art for cross-lingual document classification.
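As a rough illustration only (not code from the paper), the sketch below shows the kind of semi-supervised objective the abstract describes, written in PyTorch under stated assumptions: a hypothetical multilingual encoder, a hypothetical augment function (for example back-translation), a labeled English batch, and an unlabeled target-language batch. It adds a supervised cross-entropy term on the source language to an unsupervised data augmentation (UDA) style consistency term on the target language.

import torch
import torch.nn.functional as F

def semi_supervised_loss(encoder, labeled_src, labels, unlabeled_tgt, augment,
                         consistency_weight=1.0):
    # encoder, augment, and the batch variables are hypothetical placeholders,
    # not names taken from the paper.

    # Supervised cross-entropy on the labeled source-language (English) batch.
    sup_loss = F.cross_entropy(encoder(labeled_src), labels)

    # UDA-style consistency on unlabeled target-language text: the prediction
    # on the original example (held fixed, no gradient) should match the
    # prediction on an augmented version, e.g. one produced by back-translation.
    with torch.no_grad():
        p_orig = F.softmax(encoder(unlabeled_tgt), dim=-1)
    log_p_aug = F.log_softmax(encoder(augment(unlabeled_tgt)), dim=-1)
    consistency = F.kl_div(log_p_aug, p_orig, reduction="batchmean")

    return sup_loss + consistency_weight * consistency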

Related research

05/09/2022  Enhancing Cross-lingual Transfer by Manifold Mixup
Based on large-scale pre-trained multilingual representations, recent cr...

03/07/2017  Leveraging Large Amounts of Weakly Supervised Data for Multi-Language Sentiment Classification
This paper presents a novel approach for multi-lingual sentiment classif...

10/04/2019  Contrastive Language Adaptation for Cross-Lingual Stance Detection
We study cross-lingual stance detection, which aims to leverage labeled ...

04/13/2017  Cross-lingual and cross-domain discourse segmentation of entire documents
Discourse segmentation is a crucial step in building end-to-end discours...

08/03/2022  Cross-Lingual Knowledge Transfer for Clinical Phenotyping
Clinical phenotyping enables the automatic extraction of clinical condit...

03/24/2020  Cross-Lingual Adaptation Using Universal Dependencies
We describe a cross-lingual adaptation method based on syntactic parse t...

11/12/2021  Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
We present a method for cross-lingual training an ASR system using absol...
