Agreement-based Learning of Parallel Lexicons and Phrases from Non-Parallel Corpora

06/15/2016
by Chunyang Liu, et al.

We introduce an agreement-based approach to learning parallel lexicons and phrases from non-parallel corpora. The basic idea is to encourage two asymmetric latent-variable translation models (i.e., source-to-target and target-to-source) to agree on identifying latent phrase and word alignments. The agreement is defined at both word and phrase levels. We develop a Viterbi EM algorithm for jointly training the two unidirectional models efficiently. Experiments on the Chinese-English dataset show that agreement-based learning significantly improves both alignment and translation performance.
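To make the word-level agreement idea concrete, here is a minimal toy sketch. It is not the authors' model and makes several loud assumptions. First, sentence pairs are given; the paper works on non-parallel corpora, where that correspondence is itself latent, and this sketch does not model it. Second, each direction is a simple IBM-Model-1-style lexical table t(x | y). Third, "agreement" is approximated by hard intersection: in each Viterbi E-step, a link is counted only if both the source-to-target and the target-to-source Viterbi alignments propose it. Phrase-level agreement is not shown. All function names (`init_table`, `viterbi_links`, `agreement_viterbi_em`, `lexicon`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[List[str], List[str]]       # (source words, target words)
Table = Dict[Tuple[str, str], float]     # (word, conditioning word) -> t(word | conditioning word)


def normalise(counts: Dict[Tuple[str, str], float]) -> Table:
    """Turn counts c(x, y) into conditional probabilities t(x | y)."""
    totals: Dict[str, float] = defaultdict(float)
    for (x, y), c in counts.items():
        totals[y] += c
    return {(x, y): c / totals[y] for (x, y), c in counts.items()}


def init_table(pairs: List[Pair]) -> Table:
    """Initialise t(x | y) from sentence-level co-occurrence counts (assumed heuristic)."""
    counts: Dict[Tuple[str, str], float] = defaultdict(float)
    for xs, ys in pairs:
        for x in xs:
            for y in ys:
                counts[(x, y)] += 1.0
    return normalise(counts)


def viterbi_links(xs: List[str], ys: List[str], t: Table) -> Set[Tuple[int, int]]:
    """Link each x_i to the single y_j maximising t(x_i | y_j); ties go to the lowest j."""
    return {(i, max(range(len(ys)), key=lambda j: t.get((x, ys[j]), 0.0)))
            for i, x in enumerate(xs)}


def agreement_viterbi_em(pairs: List[Pair], iterations: int = 5) -> Tuple[Table, Table]:
    """Hard-EM training of two directional tables, counting only links both directions agree on."""
    fwd = init_table(pairs)                                   # t(f | e): target-to-source table
    bwd = init_table([(es, fs) for fs, es in pairs])          # t(e | f): source-to-target table
    for _ in range(iterations):
        fwd_counts: Dict[Tuple[str, str], float] = defaultdict(float)
        bwd_counts: Dict[Tuple[str, str], float] = defaultdict(float)
        for fs, es in pairs:
            a_fe = viterbi_links(fs, es, fwd)                         # links as (i, j)
            a_ef = {(i, j) for j, i in viterbi_links(es, fs, bwd)}    # flip (j, i) -> (i, j)
            for i, j in a_fe & a_ef:                                  # agreement = intersection
                fwd_counts[(fs[i], es[j])] += 1.0
                bwd_counts[(es[j], fs[i])] += 1.0
        fwd, bwd = normalise(fwd_counts), normalise(bwd_counts)      # M-step
    return fwd, bwd


def lexicon(fwd: Table, bwd: Table, threshold: float = 0.5) -> List[Tuple[str, str]]:
    """Keep (f, e) entries that both directions rate at or above `threshold`."""
    return sorted((f, e) for (f, e), p in fwd.items()
                  if p >= threshold and bwd.get((e, f), 0.0) >= threshold)


if __name__ == "__main__":
    toy_pairs = [
        ("das haus".split(), "the house".split()),
        ("das buch".split(), "the book".split()),
        ("ein buch".split(), "a book".split()),
    ]
    fwd, bwd = agreement_viterbi_em(toy_pairs)
    print(lexicon(fwd, bwd))  # [('buch', 'book'), ('das', 'the'), ('ein', 'a'), ('haus', 'house')]
```

On this toy data, "ein buch" / "a book" is ambiguous in each direction on the first pass. The intersection keeps only the link both directions agree on (ein-a). The now-sharper tables then resolve buch-book in the next iteration. Hard intersection is lossy. It is used here only because it is the simplest way to illustrate "the two directions must agree." The paper defines its own agreement objective at both word and phrase levels.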

research
09/12/2016

Morphological Constraints for Phrase Pivot Statistical Machine Translation

The lack of parallel data for many language pairs is an important challe...
research
10/08/2022

ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation

Recently, a new training oaxe loss has proven effective to ameliorate th...
research
12/01/2015

Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT

Pivot language is employed as a way to solve the data sparseness problem...
research
06/04/2018

Agreement-based Learning

Model selection is a problem that has occupied machine learning research...
research
08/30/2022

Combining keyphrase extraction and lexical diversity to characterize ideas in publication titles

Beyond bibliometrics, there is interest in characterizing the evolution ...
research
01/06/2022

Phrase-level Adversarial Example Generation for Neural Machine Translation

While end-to-end neural machine translation (NMT) has achieved impressiv...
research
10/03/2016

Orthographic Syllable as basic unit for SMT between Related Languages

We explore the use of the orthographic syllable, a variable-length conso...