PETCI: A Parallel English Translation Dataset of Chinese Idioms

02/19/2022
by Kenan Tang, et al.

Idioms are an important language phenomenon in Chinese, but idiom translation is notoriously hard. Current machine translation models perform poorly on idioms, and idioms are sparse in many translation datasets. We present PETCI, a parallel English translation dataset of Chinese idioms, which aims to improve idiom translation by both humans and machines. The dataset is built by combining human and machine effort. Baseline generation models show unsatisfactory ability to improve translations, but structure-aware classification models perform well at distinguishing good translations. Furthermore, PETCI can be easily expanded without expert knowledge. Overall, PETCI can be helpful to language learners and machine translation systems.
