Data Augmentation and Terminology Integration for Domain-Specific Sinhala-English-Tamil Statistical Machine Translation

11/05/2020
by Aloka Fernando, et al.

Out-of-vocabulary (OOV) words are a problem for Machine Translation (MT) in low-resource languages, and the problem worsens when the source and/or target language is morphologically rich. Bilingual list integration addresses the OOV problem by allowing words that do not appear in the training data to be translated. However, because bilingual lists contain words only in their base form, this approach does not cover the inflected forms found in morphologically rich languages such as Sinhala and Tamil. This paper focuses on data augmentation techniques in which bilingual lexicon terms are expanded based on case markers to generate new word forms for use in Statistical Machine Translation (SMT). This data augmentation technique for dictionary terms yields improved BLEU scores for Sinhala-English SMT.
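
The case-marker expansion described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule table `CASE_RULES`, the placeholder suffixes (`-DAT`, `-GEN`, `-ABL`), the English preposition templates, and the function names are all hypothetical. A real system would use actual Sinhala or Tamil case markers together with the spelling changes that occur when a suffix attaches to a stem.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table (an assumption, not taken from the paper):
# case name -> (placeholder source-side suffix, English-side template).
CASE_RULES: Dict[str, Tuple[str, str]] = {
    "dative": ("-DAT", "to {}"),
    "genitive": ("-GEN", "of {}"),
    "ablative": ("-ABL", "from {}"),
}


def expand_entry(
    src_base: str,
    tgt_base: str,
    rules: Dict[str, Tuple[str, str]] = CASE_RULES,
) -> List[Tuple[str, str]]:
    """Return the base pair plus one inflected pair per case rule."""
    pairs = [(src_base, tgt_base)]
    for suffix, template in rules.values():
        # Plain concatenation stands in for real morphological generation.
        pairs.append((src_base + suffix, template.format(tgt_base)))
    return pairs


def augment_lexicon(lexicon: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Expand every base-form entry of a bilingual list."""
    augmented: List[Tuple[str, str]] = []
    for src, tgt in lexicon:
        augmented.extend(expand_entry(src, tgt))
    return augmented


if __name__ == "__main__":
    for src, tgt in augment_lexicon([("STEM", "teacher")]):
        print(f"{src}\t{tgt}")
```

Under these assumptions, each base-form dictionary entry produces one additional pair per case rule, and the augmented list could then be added to the parallel training data in the same way as the original bilingual list.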
