Multilingual Topic Models

12/18/2017
by Kriste Krstovski, et al.

Scientific publications have evolved several features for mitigating vocabulary mismatch when indexing, retrieving, and computing similarity between articles. These mitigation strategies range from simply focusing on high-value article sections, such as titles and abstracts, to assigning keywords, often from controlled vocabularies, either manually or through automatic annotation. Different document representation schemes involve different cost-benefit tradeoffs. In this paper, we propose to model different representations of the same article as translations of each other, all generated from a common latent representation in a multilingual topic model. We start with a methodological overview of latent variable models for parallel document representations that could be used across many information science tasks. We then show how solving the inference problem of mapping diverse representations into a shared topic space allows us to evaluate representations based on how topically similar they are to the original article. In addition, our proposed approach provides a means to discover where different concept vocabularies require improvement.
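The abstract does not give the model's equations or its inference procedure, so the following is only a minimal sketch under stated assumptions. It assumes a polylingual-LDA-style model in which each representation ("language") has its own topic-word distributions while all representations of one article share a single topic-proportion vector theta. The toy `PHI` values, the EM fold-in `infer_theta`, and the Jensen-Shannon scoring are hypothetical stand-ins chosen for illustration, not the authors' method. The sketch maps a full-text representation and two candidate keyword sets into the shared topic space and scores each keyword set by its topical distance from the article.

```python
import math
from collections import Counter

# Hypothetical topic-word distributions PHI[representation][k][word], assumed to
# have been learned beforehand by training a multilingual (polylingual) topic
# model on parallel representations of the same articles. Toy values: two
# topics, two "languages" (full text and controlled-vocabulary keywords).
PHI = {
    "fulltext": [
        {"gene": 0.5, "protein": 0.4, "query": 0.05, "index": 0.05},
        {"gene": 0.05, "protein": 0.05, "query": 0.5, "index": 0.4},
    ],
    "keywords": [
        {"Genetics": 0.9, "Information Retrieval": 0.1},
        {"Genetics": 0.1, "Information Retrieval": 0.9},
    ],
}


def infer_theta(tokens, phi, alpha=0.1, iters=50):
    """Fold-in: estimate topic proportions theta for one representation,
    holding that representation's topic-word distributions phi fixed.

    EM for a mixture of multinomials with pseudo-count alpha > 0 per topic;
    a simple stand-in for the Gibbs or variational inference a real
    system would use. Out-of-vocabulary tokens are ignored (all topics of
    one representation share that representation's vocabulary)."""
    num_topics = len(phi)
    counts = Counter(t for t in tokens if t in phi[0])
    theta = [1.0 / num_topics] * num_topics
    for _ in range(iters):
        # E-step: accumulate expected topic counts, starting from the pseudo-count.
        expected = [alpha] * num_topics
        for word, n in counts.items():
            weights = [theta[k] * phi[k][word] for k in range(num_topics)]
            total = sum(weights)
            for k in range(num_topics):
                expected[k] += n * weights[k] / total
        # M-step: renormalise expected counts into a distribution.
        norm = sum(expected)
        theta = [e / norm for e in expected]
    return theta


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical topic mixtures, max 1)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


if __name__ == "__main__":
    article = ["gene", "protein", "gene", "index", "protein", "gene"]
    theta_article = infer_theta(article, PHI["fulltext"])
    # Score two candidate keyword assignments by topical distance to the article.
    for label, keywords in [("on-topic", ["Genetics"]),
                            ("off-topic", ["Information Retrieval"])]:
        theta_kw = infer_theta(keywords, PHI["keywords"])
        print(label, round(js_divergence(theta_article, theta_kw), 3))
```

In this toy setup the on-topic keyword set lands closer to the article in topic space than the off-topic one. Jensen-Shannon divergence is used here because it is symmetric and bounded, unlike KL divergence; the abstract does not say which similarity measure the authors actually use.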


Related research

Multilingual Factor Analysis (05/14/2019)
In this work we approach the task of learning multilingual word represen...

Topic Discovery via Latent Space Clustering of Pretrained Language Model Representations (02/09/2022)
Topic models have been the prominent tools for automatic topic discovery...

Understanding Crosslingual Transfer Mechanisms in Probabilistic Topic Modeling (10/13/2018)
Probabilistic topic modeling is a popular choice as the first step of cr...

Latent Topic Conversational Models (09/19/2018)
Latent variable models have been a preferred choice in conversational mo...

Concept Modeling with Superwords (04/11/2012)
In information retrieval, a fundamental goal is to transform a document ...

ProSiT! Latent Variable Discovery with PROgressive SImilarity Thresholds (10/26/2022)
The most common ways to explore latent document dimensions are topic mod...

Learning an Artificial Language for Knowledge-Sharing in Multilingual Translation (11/02/2022)
The cornerstone of multilingual neural translation is shared representat...
