Towards Semi-Supervised Learning of Automatic Post-Editing: Data-Synthesis by Infilling Mask with Erroneous Tokens

04/08/2022
by WonKee Lee, et al.

Semi-supervised learning that leverages synthetic training data has been widely adopted in automatic post-editing (APE) to compensate for the scarcity of human-annotated training data. Consequently, data-synthesis methods for creating high-quality synthetic data have also received much attention. Considering that APE takes machine-translation outputs containing translation errors as input, we propose a noising-based data-synthesis method that uses a masked language model to create noisy texts by substituting masked tokens with erroneous tokens while following the error-quantity statistics observed in genuine APE data. In addition, we propose corpus interleaving, which combines two separate synthetic datasets by taking only the advantageous samples, to further enhance the quality of the synthetic data created with our noising method. Experimental results show that using the synthetic data created with our approach yields significant improvements in APE performance over using synthetic data created with existing data-synthesis methods.
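The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the model, the masking policy, or the selection criterion, so everything named here is an assumption. `propose` is a hypothetical stand-in for a masked language model's candidate generator, `toy_propose` returns a fixed made-up vocabulary, `is_advantageous` is a hypothetical predicate for corpus interleaving, and the error rates are invented values standing in for statistics that would be measured on genuine APE data.

```python
import random

MASK = "[MASK]"

def noise_sentence(tokens, error_rate, propose, rng):
    """Replace roughly `error_rate` of the tokens with erroneous substitutes.

    `propose(masked_tokens, pos)` is a hypothetical callable that plays the
    role of a masked language model: given the sentence with one position
    masked, it returns candidate tokens for that position.
    """
    n_errors = min(len(tokens), round(error_rate * len(tokens)))
    positions = rng.sample(range(len(tokens)), n_errors)
    noisy = list(tokens)
    for pos in positions:
        masked = noisy[:pos] + [MASK] + noisy[pos + 1:]
        # Keep only candidates that differ from the original token,
        # so the infilled token is actually an error.
        candidates = [c for c in propose(masked, pos) if c != tokens[pos]]
        if candidates:
            noisy[pos] = rng.choice(candidates)
    return noisy

def toy_propose(masked_tokens, pos):
    """Toy stand-in for a masked LM (assumption): ignores context entirely."""
    return ["cat", "dog", "house", "ran", "blue"]

def interleave(corpus_a, corpus_b, is_advantageous):
    """Merge two synthetic corpora, keeping only samples accepted by the
    hypothetical predicate `is_advantageous` (the real criterion is not
    given in the abstract)."""
    kept_a = [s for s in corpus_a if is_advantageous(s)]
    kept_b = [s for s in corpus_b if is_advantageous(s)]
    return kept_a + kept_b

if __name__ == "__main__":
    rng = random.Random(0)
    # Invented error rates; in practice these would be estimated from
    # genuine APE training data.
    observed_error_rates = [0.1, 0.2, 0.3]
    clean = "the cat sat on the mat".split()
    rate = rng.choice(observed_error_rates)
    print(noise_sentence(clean, rate, toy_propose, rng))
```

In this sketch the number of corrupted tokens is fixed by the sampled error rate, which is one simple way to make the noise level track an observed distribution. A real setup would presumably replace `toy_propose` with predictions from a trained masked language model.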


