Orthographic Syllable as basic unit for SMT between Related Languages

10/03/2016
by Anoop Kunchukuttan, et al.

We explore the use of the orthographic syllable, a variable-length consonant-vowel sequence, as a basic unit of translation between related languages which use abugida or alphabetic scripts. We show that orthographic syllable level translation significantly outperforms models trained over other basic units (word, morpheme and character) when training over small parallel corpora.
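
To make the unit concrete, the following is a minimal sketch of splitting a word into consonant-vowel sequences. It is an illustration only, not the authors' segmenter: the vowel inventory, the restriction to lowercase ASCII letters, the treatment of a trailing consonant run as its own unit, and the function name `orthographic_syllables` are all assumptions made for this example. Abugida scripts would need script-specific character classes.

```python
import re

# Assumption: a toy inventory for lowercase ASCII text only.
VOWEL = "[aeiou]"
CONSONANT = "[b-df-hj-np-tv-z]"

# One unit is an optional consonant run followed by a vowel run (C*V+).
# A consonant run with no following vowel (e.g. word-final) is kept
# as a separate unit; this is a simplifying assumption.
_UNIT = re.compile(f"{CONSONANT}*{VOWEL}+|{CONSONANT}+")


def orthographic_syllables(word):
    """Split a lowercase ASCII word into variable-length C*V+ units."""
    return _UNIT.findall(word)


if __name__ == "__main__":
    for w in ["spacious", "translation", "unit"]:
        print(w, "->", " ".join(orthographic_syllables(w)))
```

Because every unit ends at a vowel boundary, the units are variable-length but much shorter than words. That keeps the vocabulary small while still covering more context than single characters.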
