The Impact of Preprocessing on Arabic-English Statistical and Neural Machine Translation

06/27/2019
by Mai Oudah, et al.

Neural networks have become the state-of-the-art approach for machine translation (MT) in many languages. While linguistically motivated tokenization techniques have been shown to have significant effects on the performance of statistical MT, it remains unclear whether those techniques are well suited for neural MT. In this paper, we systematically compare neural and statistical MT models for Arabic-English translation on data preprocessed by various prominent tokenization schemes. Furthermore, we consider a range of data and vocabulary sizes and compare their effect on both approaches. Our empirical results show that the best choice of tokenization scheme depends largely on the type of model and the size of the data. We also show that we can gain significant improvements using a system selection that combines the output from neural and statistical MT.
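
The abstract does not say how the system selection decides between the two outputs, so the following is only a minimal sketch of sentence-level selection under stated assumptions. The function names `select_outputs` and `toy_score` are hypothetical, and the scoring heuristic (penalizing immediately repeated tokens) is a placeholder for illustration, not the criterion used in the paper.

```python
from typing import Callable, List


def select_outputs(
    nmt_outputs: List[str],
    smt_outputs: List[str],
    score: Callable[[str], float],
) -> List[str]:
    """Pick, per sentence, whichever hypothesis the scorer rates higher.

    `score` is an assumed, user-supplied quality estimate; ties go to
    the neural hypothesis.
    """
    if len(nmt_outputs) != len(smt_outputs):
        raise ValueError("Both systems must translate the same sentences.")
    selected = []
    for nmt_hyp, smt_hyp in zip(nmt_outputs, smt_outputs):
        selected.append(nmt_hyp if score(nmt_hyp) >= score(smt_hyp) else smt_hyp)
    return selected


def toy_score(hypothesis: str) -> float:
    """Hypothetical placeholder scorer: penalize immediately repeated tokens."""
    tokens = hypothesis.split()
    if not tokens:
        return float("-inf")
    repeats = sum(1 for a, b in zip(tokens, tokens[1:]) if a == b)
    return -repeats / len(tokens)


nmt = ["the the cat sat", "a dog runs"]
smt = ["the cat sat", "a a dog runs"]
combined = select_outputs(nmt, smt, toy_score)
```

In practice the scorer would be replaced by whatever quality signal is available for both systems; the selection loop itself does not depend on that choice.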


research
02/06/2018

Système de traduction automatique statistique Anglais-Arabe

Machine translation (MT) is the process of translating text written in a...
research
10/11/2022

Exploring Segmentation Approaches for Neural Machine Translation of Code-Switched Egyptian Arabic-English Text

Data sparsity is one of the main challenges posed by Code-switching (CS)...
research
02/16/2023

Evaluating and Improving the Coreference Capabilities of Machine Translation Models

Machine translation (MT) requires a wide range of linguistic capabilitie...
research
12/01/2015

LSTM Neural Reordering Feature for Statistical Machine Translation

Artificial neural networks are powerful models, which have been widely a...
research
08/03/2016

To Swap or Not to Swap? Exploiting Dependency Word Pairs for Reordering in Statistical Machine Translation

Reordering poses a major challenge in machine translation (MT) between t...
research
05/04/2018

Extreme Adaptation for Personalized Neural Machine Translation

Every person speaks or writes their own flavor of their native language,...
research
12/13/2019

Document Sub-structure in Neural Machine Translation

Current approaches to machine translation (MT) either translate sentence...
