Learning How to Translate North Korean through South Korean

01/27/2022
by Hwichan Kim, et al.
South and North Korea both use the Korean language. However, Korean NLP research has focused only on South Korean, and existing Korean NLP systems, such as neural machine translation (NMT) models, cannot properly handle North Korean input. Training a model on North Korean data is the most straightforward solution, but too little data is available to train NMT models. In this study, we create data for North Korean NMT models from a comparable corpus. First, we manually create evaluation data for automatic alignment and machine translation. Then, we investigate which automatic alignment methods are suitable for North Korean. Finally, we verify that a model trained on North Korean bilingual data created without human annotation significantly improves North Korean translation accuracy compared with existing South Korean models in zero-shot settings.
