A Survey of Orthographic Information in Machine Translation

Machine translation is an application of natural language processing that has been explored for different languages. Recently, researchers have begun paying attention to machine translation for resource-poor languages and closely related languages. A widespread underlying problem for these machine translation systems is variation in orthographic conventions, which causes many problems for traditional approaches. Two languages written in different orthographies are not easily comparable, yet orthographic information can also be used to improve a machine translation system. This article surveys research on the influence of orthography on machine translation of under-resourced languages. It introduces under-resourced languages in the context of machine translation and explains how orthographic information can be utilised to improve machine translation. We describe previous work in this area, discuss the underlying assumptions that were made, and show how orthographic knowledge improves the performance of machine translation of under-resourced languages. We discuss different types of machine translation and highlight a recent trend that seeks to link orthographic information with well-established machine translation methods. Considerable attention is given to current efforts to use cognate information at different levels of machine translation and to the lessons that can be drawn from them. Additionally, multilingual neural machine translation of closely related languages is given particular focus in this survey. The article ends with a discussion of the way forward in machine translation with orthographic information, focusing on multilingual settings and bilingual lexicon induction.
