Inverted Bilingual Topic Models for Lexicon Extraction from Non-parallel Data

12/21/2016
by Tengfei Ma, et al.

Topic models have been successfully applied to lexicon extraction. However, most previous methods are limited to document-aligned data. In this paper, we address two challenges in applying topic models to lexicon extraction from non-parallel data: 1) word relationships are hard to model, and 2) the seed dictionary is noisy. To address them, we propose two new bilingual topic models that better capture the semantic information of each word while discriminating among the multiple translations in a noisy seed dictionary. We extend the scope of topic models by inverting the roles of "word" and "document". To handle noise in the seed dictionary, we incorporate the probability of translation selection into our models. We also propose an effective measure for evaluating the similarity of words in different languages and selecting the optimal translation pairs. Experimental results on real-world data demonstrate the utility and efficacy of the proposed models.
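
The abstract does not say how a word is turned into a "document", so the following is only an illustrative sketch under one assumption: each word's pseudo-document is the bag of words that co-occur with it inside a fixed context window. The function name `build_inverted_documents` and the `window` parameter are hypothetical and do not come from the paper.

```python
from collections import Counter, defaultdict


def build_inverted_documents(sentences, window=2):
    """Map each word to a bag (Counter) of the words seen near it.

    Assumption (not from the paper): a word's pseudo-document is the
    multiset of tokens within `window` positions of each occurrence.
    """
    pseudo_docs = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # skip the word's own position
                    pseudo_docs[word][tokens[j]] += 1
    return dict(pseudo_docs)


# Toy monolingual corpus; a bilingual setup would build one such
# mapping per language before any topic model is fitted.
corpus = [
    ["topic", "models", "extract", "lexicon"],
    ["topic", "models", "need", "data"],
]
docs = build_inverted_documents(corpus, window=1)
```

Under this reading, a topic model fitted on such pseudo-documents would assign a topic distribution to each word rather than to each text, which is what would make words in different languages comparable. How the proposed models actually define and link these distributions is specified in the full paper, not here.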


research
09/24/2015

Bilingual Distributed Word Representations from Document-Aligned Comparable Data

We propose a new model for learning bilingual word representations from ...
research
11/20/2021

Weakly Supervised Prototype Topic Model with Discriminative Seed Words: Modifying the Category Prior by Self-exploring Supervised Signals

Dataless text classification, i.e., a new paradigm of weakly supervised ...
research
12/19/2017

Unsupervised Word Mapping Using Structural Similarities in Monolingual Embeddings

Most existing methods of automatic bilingual dictionary induction rely o...
research
12/12/2022

Effective Seed-Guided Topic Discovery by Integrating Multiple Types of Contexts

Instead of mining coherent topics from a given text corpus in a complete...
research
10/25/2021

Contrastive Learning for Neural Topic Model

Recent empirical studies show that adversarial topic models (ATM) can su...
research
08/05/2015

Topic Stability over Noisy Sources

Topic modelling techniques such as LDA have recently been applied to spe...
research
01/06/2019

Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

We address for the first time unsupervised training for a translation ta...