Towards Effective Ancient Chinese Translation: Dataset, Model, and Evaluation

08/01/2023
by Geyang Guo, et al.

Interpreting ancient Chinese is key to understanding the vast body of Chinese literature, tradition, and civilization. In this paper, we propose Erya for ancient Chinese translation. From a dataset perspective, we collect, clean, and classify ancient Chinese materials from various sources, forming the most extensive ancient Chinese resource to date. From a model perspective, we devise the Erya training method, which is oriented towards ancient Chinese and built on two jointly working tasks: disyllabic aligned substitution (DAS) and dual masked language model (DMLM). From an evaluation perspective, we build a benchmark to judge ancient Chinese translation quality in different scenarios and use it to evaluate the ancient Chinese translation capabilities of various existing models. Our model exhibits remarkable zero-shot performance across five domains, outperforming GPT-3.5 models by more than 12.0 BLEU and achieving better human evaluation results than ERNIE Bot. Subsequent fine-tuning further demonstrates the superior transfer capability of the Erya model, with a gain of 6.2 BLEU. We release all of the above-mentioned resources at https://github.com/RUCAIBox/Erya.
