Unsupervised Training for Large Vocabulary Translation Using Sparse Lexicon and Word Classes

01/06/2019
by Yunsu Kim, et al.

We address, for the first time, unsupervised training for a translation task with a vocabulary of hundreds of thousands of words. We scale up the expectation-maximization (EM) algorithm to learn a large translation table without any parallel text or seed lexicon. First, we solve the memory bottleneck and enforce sparsity with a simple thresholding scheme for the lexicon. Second, we initialize lexicon training with word classes, which efficiently boosts performance. Our methods produce promising results on two large-scale unsupervised translation tasks.
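To illustrate how a thresholding scheme of this kind can keep the lexicon sparse between EM iterations, here is a minimal Python sketch. It assumes a generic E-step that has already accumulated expected counts per source word; the function name, the nested-dict data layout, and the threshold value are illustrative assumptions, not the authors' implementation.

```python
def prune_lexicon(expected_counts, threshold=1e-3):
    """Normalize expected counts into translation probabilities and drop
    entries whose probability falls below `threshold`.

    expected_counts: dict mapping source word -> {target word: expected count},
    as accumulated by some E-step (hypothetical data layout).
    Returns a sparse lexicon: dict mapping source word -> {target word: prob},
    with the surviving entries renormalized to sum to 1 per source word.
    """
    lexicon = {}
    for src, counts in expected_counts.items():
        total = sum(counts.values())
        if total <= 0.0:
            continue
        # keep only translation candidates above the probability threshold
        kept = {tgt: c / total for tgt, c in counts.items() if c / total >= threshold}
        mass = sum(kept.values())
        if mass > 0.0:
            # renormalize so each row is a proper distribution again
            lexicon[src] = {tgt: p / mass for tgt, p in kept.items()}
    return lexicon


# toy example: the low-probability candidate "the" is pruned away
counts = {"haus": {"house": 9.2, "home": 0.7, "the": 0.001}}
print(prune_lexicon(counts))
```

Because only the surviving entries are stored, memory grows with the number of retained word pairs rather than with the full source-vocabulary times target-vocabulary table.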

