InfoCTM: A Mutual Information Maximization Perspective of Cross-Lingual Topic Modeling

04/07/2023
by Xiaobao Wu, et al.

Cross-lingual topic models are widely used for cross-lingual text analysis because they reveal aligned latent topics. However, most existing methods suffer from two problems: they produce repetitive topics that hinder further analysis, and their performance declines when the dictionary has low coverage. In this paper, we propose Cross-lingual Topic Modeling with Mutual Information (InfoCTM). Instead of the direct alignment used in previous work, we propose a topic alignment method based on mutual information. It acts as a regularizer that properly aligns topics and prevents degenerate topic representations of words, which mitigates the repetitive topic issue. To address the low-coverage dictionary issue, we further propose a cross-lingual vocabulary linking method that finds more linked cross-lingual words for topic alignment, beyond the translations in a given dictionary. Extensive experiments on English, Chinese, and Japanese datasets demonstrate that our method outperforms state-of-the-art baselines, producing more coherent, diverse, and well-aligned topics and showing better transferability for cross-lingual classification tasks.
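The abstract does not spell out the objective, so the following is only a minimal sketch of how the two ideas could fit together, not the authors' implementation. It assumes that mutual information between linked words is maximized through an InfoNCE-style contrastive loss (a standard lower bound on mutual information), and that vocabulary linking extends the dictionary by linking a word to the translations of its monolingual neighbors. The function names `info_nce_alignment_loss` and `expand_links`, the cosine similarity, the temperature value, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def info_nce_alignment_loss(reps_a, reps_b, links, temperature=0.2):
    """Average InfoNCE loss over linked word pairs (assumed objective).

    reps_a, reps_b: topic representations of words in languages A and B.
    links: (i, j) pairs saying word i in A is linked to word j in B.
    For each pair, word j is the positive and every other word in B is a
    negative. Minimizing this loss maximizes a lower bound on the mutual
    information between the representations of linked words.
    """
    total = 0.0
    for i, j in links:
        logits = [cosine(reps_a[i], rep_b) / temperature for rep_b in reps_b]
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[j]
    return total / len(links)


def expand_links(dictionary, neighbors_a):
    """Hypothetical vocabulary linking beyond the dictionary.

    dictionary: (i, j) translation pairs from the given dictionary.
    neighbors_a: maps a word index in A to indices of its nearest
        monolingual neighbors (for example, from word embeddings).
    A word is additionally linked to the translations of its neighbors.
    """
    translations = {}
    for i, j in dictionary:
        translations.setdefault(i, set()).add(j)
    links = set(dictionary)
    for i, neighbor_ids in neighbors_a.items():
        for n in neighbor_ids:
            for j in translations.get(n, ()):
                links.add((i, j))
    return sorted(links)


# Toy data: three words in language A, two in language B, two topics.
reps_a = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
reps_b = [[1.0, 0.0], [0.0, 1.0]]
dictionary = [(0, 0), (1, 1)]   # word 2 in A has no dictionary entry
neighbors_a = {2: [0]}          # but word 2 is a neighbor of word 0

links = expand_links(dictionary, neighbors_a)
aligned_loss = info_nce_alignment_loss(reps_a, reps_b, links)
misaligned_loss = info_nce_alignment_loss(reps_a, reps_b, [(0, 1), (1, 0)])
```

In this toy setup, word 2 has no dictionary translation but still receives a link through its neighbor, and the loss is lower for the correctly linked pairs than for the swapped ones. Because every non-linked word acts as a negative, representations that collapse onto the same topic are penalized, which is one plausible reading of how such a regularizer could discourage repetitive topics.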


research
11/28/2014

Coarse-grained Cross-lingual Alignment of Comparable Texts with Topic Models and Encyclopedic Knowledge

We present a method for coarse-grained cross-lingual alignment of compar...
research
03/11/2021

Towards Multi-Sense Cross-Lingual Alignment of Contextual Embeddings

Cross-lingual word embeddings (CLWE) have been proven useful in many cro...
research
09/19/2018

Unsupervised cross-lingual matching of product classifications

Unsupervised cross-lingual embeddings mapping has provided a unique tool...
research
10/10/2019

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

Learning multilingual representations of text has proven a successful me...
research
04/03/2023

Efficiently Aligned Cross-Lingual Transfer Learning for Conversational Tasks using Prompt-Tuning

Cross-lingual transfer of language models trained on high-resource langu...
research
05/20/2023

Cross2StrA: Unpaired Cross-lingual Image Captioning with Cross-lingual Cross-modal Structure-pivoted Alignment

Unpaired cross-lingual image captioning has long suffered from irrelevan...
research
12/07/2020

Diverse Melody Generation from Chinese Lyrics via Mutual Information Maximization

In this paper, we propose to adapt the method of mutual information maxi...
