A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

10/15/2021
by Sosuke Nishikawa, et al.

We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity features in a resource-rich language can thus be directly applied to other languages. Our experimental results on cross-lingual topic classification (using the MLDoc and TED-CLDC datasets) and entity typing (using the SHINRA2020-ML dataset) show that the proposed model consistently outperforms state-of-the-art models.
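The shared-identifier idea can be illustrated with a minimal sketch. This is not the authors' implementation: the mention dictionary, the substring-based entity detection, the random embeddings, and the helper names `detect_entities` and `bag_of_entities` are all assumptions made for illustration, and the M-BERT component described in the abstract is omitted. The sketch only shows why features built from Wikidata identifiers are the same across languages.

```python
import random

# Toy mention dictionary (an assumption for illustration):
# (language, surface form) -> Wikidata identifier.
# Q17 is Japan and Q1490 is Tokyo in Wikidata.
MENTION_TO_QID = {
    ("en", "Japan"): "Q17",
    ("ja", "日本"): "Q17",
    ("en", "Tokyo"): "Q1490",
    ("ja", "東京"): "Q1490",
}

DIM = 4
_rng = random.Random(0)

# One embedding per Wikidata identifier, shared by every language.
# Random values stand in for learned embeddings.
ENTITY_EMBEDDINGS = {
    qid: [_rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    for qid in sorted(set(MENTION_TO_QID.values()))
}


def detect_entities(text, lang):
    """Return the sorted Wikidata IDs whose surface forms occur in text.

    Naive substring matching; a real system would use a proper
    entity detector.
    """
    return sorted(
        {
            qid
            for (mention_lang, surface), qid in MENTION_TO_QID.items()
            if mention_lang == lang and surface in text
        }
    )


def bag_of_entities(text, lang):
    """Average the shared embeddings of the entities detected in text."""
    qids = detect_entities(text, lang)
    if not qids:
        return [0.0] * DIM
    return [
        sum(ENTITY_EMBEDDINGS[qid][i] for qid in qids) / len(qids)
        for i in range(DIM)
    ]


en_features = bag_of_entities("Tokyo is the capital of Japan.", "en")
ja_features = bag_of_entities("東京は日本の首都です。", "ja")
```

Because both sentences resolve to the same set of identifiers, `en_features` and `ja_features` are identical, so a classifier fitted on the English features could be applied to the Japanese ones without retraining.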


