Bayesian multilingual topic model for zero-shot cross-lingual topic identification

07/02/2020
by Santosh Kesiraju, et al.

This paper presents a Bayesian multilingual topic model for learning language-independent document embeddings. The model represents each document as a Gaussian distribution, encoding the uncertainty of the embedding in its covariance. We propagate the learned uncertainties through linear classifiers for zero-shot cross-lingual topic identification. Our experiments on the five-language Europarl and Reuters (MLDoc) corpora show that the proposed model outperforms systems based on multilingual word embeddings and BiLSTM sentence encoders by significant margins in the majority of the transfer directions. Moreover, our system, trained in under a day on a single GPU with much less data, performs competitively with the state-of-the-art universal BiLSTM sentence encoder trained on 93 languages. Our experimental analysis shows that increasing the amount of parallel data improves the overall performance of the embeddings. Nonetheless, exploiting the uncertainties is always beneficial.
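
The abstract does not say how the uncertainties are propagated through the linear classifier. As one illustrative possibility, and not necessarily the authors' method, the Python sketch below assumes a diagonal covariance and averages the softmax output of a linear classifier over Monte Carlo samples drawn from the Gaussian document embedding. The function name `expected_class_probs`, the toy weights, and the sample count are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def expected_class_probs(mean, var, W, b, n_samples=256, seed=0):
    """Average class probabilities over samples of a Gaussian embedding.

    mean, var : (D,) mean and diagonal variance of one document embedding
                (diagonal covariance is an assumption of this sketch).
    W, b      : (C, D) weights and (C,) bias of a linear classifier.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_samples, mean.shape[0]))
    samples = mean + np.sqrt(var) * eps      # draws from N(mean, diag(var))
    logits = samples @ W.T + b               # shape (n_samples, C)
    return softmax(logits).mean(axis=0)      # Monte Carlo expectation

# Toy example: a 2-dimensional embedding and 3 topic classes.
mean = np.array([1.0, -0.5])
var = np.array([0.1, 2.0])                   # second dimension is uncertain
W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
b = np.zeros(3)

probs = expected_class_probs(mean, var, W, b)
point_probs = softmax(W @ mean + b)          # ignores the uncertainty
```

Comparing `probs` with `point_probs` shows how much the embedding's variance changes the prediction. With zero variance the two coincide.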


research
12/26/2018

Massively Multilingual Sentence Embeddings for Zero-Shot Cross-Lingual Transfer and Beyond

We introduce an architecture to learn joint multilingual sentence repres...
research
08/20/2019

Learning document embeddings along with their uncertainties

Majority of the text modelling techniques yield only point estimates of ...
research
11/15/2022

Multilingual and Multimodal Topic Modelling with Pretrained Embeddings

This paper presents M3L-Contrast – a novel multimodal multilingual (M3L)...
research
04/16/2020

Cross-lingual Contextualized Topic Models with Zero-shot Learning

Many data sets in a domain (reviews, forums, news, etc.) exist in parall...
research
10/15/2021

A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

We present a multilingual bag-of-entities model that effectively boosts ...
research
03/03/2021

Zero-Shot Cross-Lingual Dependency Parsing through Contextual Embedding Transformation

Linear embedding transformation has been shown to be effective for zero-...
research
12/01/2021

Zero-Shot Cross-Lingual Machine Reading Comprehension via Inter-Sentence Dependency Graph

We target the task of cross-lingual Machine Reading Comprehension (MRC) ...
