
Feature-based Decipherment for Large Vocabulary Machine Translation

by Iftekhar Naim, et al.

Orthographic similarities across languages provide a strong signal for probabilistic decipherment, especially for closely related language pairs. Existing decipherment models, however, are not well suited to exploiting these orthographic similarities. We propose a log-linear model with latent variables that incorporates orthographic similarity features. Because maximum likelihood training is computationally expensive for this model, we perform approximate inference via MCMC sampling and contrastive divergence. Our results show that the proposed log-linear model with contrastive divergence scales to large vocabularies and outperforms existing generative decipherment models by exploiting orthographic features.
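The training idea described above can be sketched in miniature. The toy code below is an illustration, not the paper's implementation: it models p(target | source) ∝ exp(w · φ) with a single orthographic feature (negative edit distance), and applies a contrastive-divergence-style update that replaces the expensive sum over the full vocabulary with one Metropolis-Hastings sampling step. The vocabulary, feature, and hyperparameters are all invented for the example.

```python
import math
import random

# Illustrative target vocabulary (not from the paper).
TARGET_VOCAB = ["haus", "hund", "katze", "maus"]

def edit_distance(a, b):
    # Standard one-row dynamic-programming Levenshtein distance.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def feature(src, tgt):
    # Orthographic similarity feature: larger when spellings are closer.
    return -float(edit_distance(src, tgt))

def cd1_update(w, src, observed_tgt, lr=0.1):
    """One contrastive-divergence step for p(tgt|src) ∝ exp(w * feature).

    Instead of normalizing over the whole vocabulary (expensive for
    large vocabularies), draw one "negative" sample with a single
    Metropolis-Hastings step started at the observed pair.
    """
    current = observed_tgt
    proposal = random.choice(TARGET_VOCAB)  # uniform proposal
    accept = math.exp(w * (feature(src, proposal) -
                           feature(src, current)))
    if random.random() < min(1.0, accept):
        current = proposal
    # CD gradient: observed feature minus sampled (model) feature.
    grad = feature(src, observed_tgt) - feature(src, current)
    return w + lr * grad

random.seed(0)
w = 0.0
for _ in range(200):
    w = cd1_update(w, "house", "haus")  # cognate pair as training signal
print(w > 0)
```

Since "haus" is orthographically closest to "house" in this toy vocabulary, the sampled negatives have worse feature values and the update drives the feature weight positive, which is the qualitative behavior the abstract relies on.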



