Translating Hanja historical documents to understandable Korean and English

05/20/2022
by Juhee Son, et al.

The Annals of the Joseon Dynasty (AJD) contain the daily records of the kings of Joseon, the 500-year kingdom preceding the modern nation of Korea. The Annals were originally written in an archaic Korean writing system, 'Hanja', and were translated into Korean from 1968 to 1993. However, this translation was literal and contained many archaic Korean words; thus, a new expert translation effort began in 2012, completing the records of only one king in a decade. Expert translators are also working on an English translation, but only one king's records are available so far because of the high cost and slow progress. We therefore propose H2KE, a neural machine translation model that translates Hanja historical documents into understandable Korean and English. Based on the multilingual neural machine translation approach, it translates historical documents written in Hanja, using both the full dataset of outdated Korean translations and a small dataset of recently translated Korean and English. We compare our method with two baselines: a recent model that simultaneously learns to restore and translate Hanja historical documents, and a Transformer trained only on the newly translated corpora. The results show that our method significantly outperforms the baselines in terms of BLEU score for both modern Korean and English translations. We also conduct a human evaluation, which shows that our translation is preferred over the original expert translation.
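The abstract does not spell out how the large outdated Korean corpus and the small modern Korean and English corpora are combined. One common way to train a single multilingual model on such data is to prepend a target-language tag to each source sentence and mix the corpora. The sketch below illustrates that general idea only; the tag strings (`<2ko-old>`, `<2ko>`, `<2en>`), the `upsample` parameter, and the function names are hypothetical and are not taken from the paper, and the sentence pairs are placeholders rather than real AJD data.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (Hanja source sentence, target translation)


def tag_pairs(pairs: List[Pair], lang_tag: str) -> List[Pair]:
    """Prepend a target-language tag to every source sentence."""
    return [(f"{lang_tag} {src}", tgt) for src, tgt in pairs]


def build_mixed_corpus(
    old_korean: List[Pair],
    modern_korean: List[Pair],
    english: List[Pair],
    upsample: int = 1,
    seed: int = 0,
) -> List[Pair]:
    """Merge one large corpus with two small ones into a single training set.

    The small corpora are repeated `upsample` times so they are not
    drowned out by the large one (an assumption, not the paper's recipe).
    """
    corpus = tag_pairs(old_korean, "<2ko-old>")
    corpus += tag_pairs(modern_korean, "<2ko>") * upsample
    corpus += tag_pairs(english, "<2en>") * upsample
    random.Random(seed).shuffle(corpus)  # deterministic shuffle for repeatability
    return corpus


# Placeholder data standing in for real sentence pairs.
old_korean = [("hanja-1", "old-ko-1"), ("hanja-2", "old-ko-2"), ("hanja-3", "old-ko-3")]
modern_korean = [("hanja-1", "new-ko-1")]
english = [("hanja-1", "en-1")]

mixed = build_mixed_corpus(old_korean, modern_korean, english, upsample=2)
```

At inference time, a model trained this way would be steered toward modern Korean or English by choosing the corresponding tag for the input sentence.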


research
04/13/2021

Restoring and Mining the Records of the Joseon Dynasty via Neural Language Modeling and Machine Translation

Understanding voluminous historical records provides clues on the past i...
research
08/01/2019

JUCBNMT at WMT2018 News Translation Task: Character Based Neural Machine Translation of Finnish to English

In the current work, we present a description of the system submitted to...
research
10/16/2019

Using Whole Document Context in Neural Machine Translation

In Machine Translation, considering the document as a whole can help to ...
research
07/01/2019

Modernizing Historical Documents: a User Study

Accessibility to historical documents is mostly limited to scholars. Thi...
research
10/08/2019

An Interactive Machine Translation Framework for Modernizing Historical Documents

Due to the nature of human language, historical documents are hard to co...
research
02/02/2021

Two Demonstrations of the Machine Translation Applications to Historical Documents

We present our demonstration of two machine translation applications to ...
research
11/09/2015

A Century of Portraits: A Visual Historical Record of American High School Yearbooks

Many details about our world are not captured in written records because...
