Multilingual and Multimodal Topic Modelling with Pretrained Embeddings

11/15/2022
by Elaine Zosa, et al.

This paper presents M3L-Contrast, a novel multimodal multilingual (M3L) neural topic model for comparable data that maps texts from multiple languages and images into a shared topic space. Our model is trained jointly on texts and images and uses pretrained document and image embeddings to abstract away the differences between languages and modalities. As a multilingual topic model, it produces aligned language-specific topics, and as a multimodal model, it infers textual representations of semantic concepts in images. We demonstrate that our model is competitive with a zero-shot topic model in predicting topic distributions for comparable multilingual data and significantly outperforms a zero-shot model in predicting topic distributions for comparable texts and images. We also show that our model performs almost as well on unaligned embeddings as it does on aligned embeddings.
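To make the idea of a shared topic space concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each modality has its own linear projection from a pretrained embedding to a topic distribution, and that paired text and image topic distributions are pulled together with an InfoNCE-style contrastive loss. The abstract does not specify the loss, so the loss choice, the dimensions, the temperature, and every function name here are hypothetical, and random vectors stand in for real pretrained embeddings.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project_to_topics(embedding, weights):
    """Map an embedding to a topic distribution: linear layer, then softmax.

    Hypothetical encoder; the real model's inference network is not
    described in the abstract.
    """
    logits = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
    return softmax(logits)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE loss: anchor i should be most similar to positive i.

    The other positives in the batch act as negatives.
    """
    loss = 0.0
    for i, anchor in enumerate(anchors):
        sims = [dot(anchor, p) / temperature for p in positives]
        m = max(sims)
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims))
        loss += log_denominator - sims[i]
    return loss / len(anchors)


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


# Assumed toy sizes; stand-ins for pretrained text and image embeddings.
rng = random.Random(0)
num_topics, text_dim, image_dim, batch = 4, 8, 6, 3
text_weights = random_matrix(num_topics, text_dim, rng)
image_weights = random_matrix(num_topics, image_dim, rng)
text_embeddings = random_matrix(batch, text_dim, rng)
image_embeddings = random_matrix(batch, image_dim, rng)

# Both modalities land in the same num_topics-dimensional topic space.
text_topics = [project_to_topics(e, text_weights) for e in text_embeddings]
image_topics = [project_to_topics(e, image_weights) for e in image_embeddings]

loss = info_nce(text_topics, image_topics)
print(f"contrastive loss over {batch} text-image pairs: {loss:.4f}")
```

Because the text and image embeddings pass through separate projections, their input dimensions need not match; only the output topic dimension is shared. A further language would, under the same assumption, simply get its own projection into that space.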

