Beyond Word-based Language Model in Statistical Machine Translation

02/05/2015
by Jiajun Zhang, et al.

The language model is one of the most important modules in statistical machine translation, and word-based language models currently dominate the field. However, many translation models (e.g., phrase-based models) generate target-language sentences by rendering and composing phrases rather than words. It is therefore more reasonable to model dependencies between phrases, but little prior work has succeeded in solving this problem. In this paper, we tackle it by designing a novel phrase-based language model that addresses three key sub-problems: (1) how to define a phrase in a language model; (2) how to determine phrase boundaries in large-scale monolingual data in order to enlarge the training set; and (3) how to alleviate the data sparsity caused by the huge phrase vocabulary. With these issues carefully handled, extensive experiments on Chinese-to-English translation show that our phrase-based language model significantly improves translation quality, by up to +1.47 absolute BLEU points.
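To make the idea of modeling dependencies between phrases concrete, the following is a minimal sketch of a bigram language model whose tokens are phrases rather than words. It is not the method of the paper: the function names (`train_phrase_bigram`, `log_prob`), the toy corpus, and the add-k smoothing are all assumptions chosen for illustration. The sketch also assumes the sentences arrive already segmented into phrases, so it does not address how phrases are defined or how their boundaries are found, and add-k smoothing is only a simple stand-in for a real answer to the sparsity problem.

```python
import math
from collections import Counter

def train_phrase_bigram(segmented_corpus):
    """Collect bigram statistics over phrase tokens.

    Each sentence is a list of phrases; each phrase is a tuple of words.
    The segmentation is assumed to be given (hypothetical input format).
    """
    context_counts = Counter()   # how often each phrase occurs as a context
    bigram_counts = Counter()    # how often (previous phrase, phrase) occurs
    vocab = set()
    for phrases in segmented_corpus:
        seq = ["<s>"] + list(phrases) + ["</s>"]
        vocab.update(seq)
        for prev, cur in zip(seq, seq[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab

def log_prob(phrases, model, k=0.1):
    """Add-k smoothed log-probability of a phrase sequence.

    Add-k smoothing is an illustrative choice, not the paper's technique.
    """
    context_counts, bigram_counts, vocab = model
    seq = ["<s>"] + list(phrases) + ["</s>"]
    v = len(vocab)
    total = 0.0
    for prev, cur in zip(seq, seq[1:]):
        numerator = bigram_counts[(prev, cur)] + k
        denominator = context_counts[prev] + k * v
        total += math.log(numerator / denominator)
    return total

# Toy, hand-segmented corpus (made up for this example).
corpus = [
    [("the", "cat"), ("sat",), ("on", "the", "mat")],
    [("the", "cat"), ("slept",), ("on", "the", "mat")],
]
model = train_phrase_bigram(corpus)

seen = log_prob([("the", "cat"), ("sat",), ("on", "the", "mat")], model)
unseen = log_prob([("on", "the", "mat"), ("sat",), ("the", "cat")], model)
print(seen > unseen)  # the phrase order observed in training scores higher
```

Because every distinct word sequence becomes its own token, the phrase vocabulary grows much faster than a word vocabulary, which is why sparsity is listed as a key sub-problem in the abstract.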


research
01/21/2015

Phrase Based Language Model for Statistical Machine Translation: Empirical Study

Reordering is a challenge to machine translation (MT) systems. In MT, th...
research
05/22/2011

Correction of Noisy Sentences using a Monolingual Corpus

Correction of Noisy Natural Language Text is an important and well studi...
research
04/03/2020

Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

Translation models based on hierarchical phrase-based statistical machin...
research
08/05/2016

Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation

Out-of-vocabulary words account for a large proportion of errors in mach...
research
12/01/2015

Augmenting Phrase Table by Employing Lexicons for Pivot-based SMT

Pivot language is employed as a way to solve the data sparseness problem...
research
05/09/2017

Word and Phrase Translation with word2vec

Word and phrase tables are key inputs to machine translations, but costl...
research
09/10/2021

Euphemistic Phrase Detection by Masked Language Model

It is a well-known approach for fringe groups and organizations to use e...
