Learning synchronous context-free grammars with multiple specialised non-terminals for hierarchical phrase-based translation

Translation models based on hierarchical phrase-based statistical machine translation (HSMT) have shown better performance than their non-hierarchical phrase-based counterparts for some language pairs. The standard approach to HSMT learns and applies a synchronous context-free grammar with a single non-terminal. The hypothesis behind the grammar refinement algorithm presented in this work is that this single non-terminal is overloaded and insufficiently discriminative, and that an adequate split of it into more specialised symbols could therefore lead to improved models. This paper presents a method to learn synchronous context-free grammars with a huge number of initial non-terminals, which are then grouped via a clustering algorithm. Our experiments show that the resulting smaller set of non-terminals correctly captures the contextual information, which makes it possible to improve on the BLEU score of the standard HSMT approach by a statistically significant margin.
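As a rough illustration of the idea of grouping many initial non-terminals into a smaller set, the following Python sketch clusters fine-grained symbols and relabels a synchronous rule with the resulting cluster labels. It is not the method of the paper: the abstract does not specify the features or the clustering algorithm, so the context vectors, the greedy cosine-similarity clustering, the threshold, and all names used here (`features`, `cluster_nonterminals`, `relabel`, `N1`, `X0`, and so on) are assumptions made purely for illustration.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_nonterminals(features, threshold):
    """Greedy one-pass clustering (an illustrative stand-in, not the paper's algorithm).

    Each symbol joins the first cluster whose seed vector is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    seeds = []    # list of (cluster_label, seed_vector)
    mapping = {}  # fine-grained symbol -> cluster label
    for symbol in sorted(features):
        vec = features[symbol]
        for label, seed in seeds:
            if cosine(vec, seed) >= threshold:
                mapping[symbol] = label
                break
        else:
            label = f"X{len(seeds)}"
            seeds.append((label, vec))
            mapping[symbol] = label
    return mapping


def relabel(rule, mapping):
    """Rewrite a synchronous rule so it uses cluster labels.

    A rule is (lhs, source_side, target_side). Terminals are strings;
    non-terminal occurrences are (symbol, index) pairs, where the index
    links a source-side non-terminal to its target-side counterpart.
    """
    lhs, src, tgt = rule

    def sub(side):
        return tuple(
            (mapping[tok[0]], tok[1]) if isinstance(tok, tuple) else tok
            for tok in side
        )

    return (mapping[lhs], sub(src), sub(tgt))


# Hypothetical context vectors for four initial non-terminals.
features = {
    "N1": [4.0, 0.0, 1.0],
    "N2": [5.0, 0.0, 1.0],
    "N3": [0.0, 3.0, 0.0],
    "N4": [0.0, 4.0, 1.0],
}
mapping = cluster_nonterminals(features, threshold=0.9)

# A toy synchronous rule over the fine-grained symbols.
rule = ("N1", (("N3", 1), "de", ("N2", 2)), ("the", ("N2", 2), "of", ("N3", 1)))
merged_rule = relabel(rule, mapping)
```

In this toy setting the four initial symbols collapse into two labels, and the relabelled rule keeps the co-indexation between its source and target sides, which is the property a synchronous grammar needs to preserve when its non-terminals are merged.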
