Improving Grammatical Error Correction with Machine Translation Pairs

11/07/2019
by Wangchunshu Zhou, et al.

We propose a novel data synthesis method that generates diverse error-corrected sentence pairs for improving grammatical error correction (GEC). The method relies on a pair of machine translation models of different quality (i.e., poor and good). The poor translation model resembles an ESL (English as a second language) learner and tends to produce translations that are low in fluency and grammatical correctness, while the good translation model generally produces fluent and grammatically correct translations. We build the poor model as a phrase-based statistical machine translation model with a decreased language model weight, and the good model as a neural machine translation model. By pairing the two models' translations of the same bridge-language sentences as error-corrected sentence pairs, we can construct unlimited pseudo-parallel data. Our approach can generate diverse fluency-improving patterns without being limited by a pre-defined rule set or by seed error-corrected data. Experimental results demonstrate the effectiveness of our approach and show that it can be combined with other synthetic data sources to yield further improvements.
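The pairing step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_pseudo_pairs`, the `Translator` type, the filter that drops empty or identical outputs, and the toy lookup tables standing in for the two translation models are all assumptions made for illustration. In practice the two callables would wrap a phrase-based SMT system with a reduced language model weight and an NMT system.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical type: any function mapping a bridge-language sentence to English.
Translator = Callable[[str], str]


def build_pseudo_pairs(
    bridge_sentences: Iterable[str],
    poor_translate: Translator,
    good_translate: Translator,
) -> List[Tuple[str, str]]:
    """Pair poor and good translations of the same bridge sentence.

    The poor translation plays the role of the erroneous source and the
    good translation plays the role of the corrected target.
    """
    pairs = []
    for sentence in bridge_sentences:
        source = poor_translate(sentence).strip()
        target = good_translate(sentence).strip()
        # Assumption (not from the abstract): skip empty outputs and pairs
        # where both models agree, since they carry no correction signal.
        if source and target and source != target:
            pairs.append((source, target))
    return pairs


# Toy stand-ins for the two models, keyed by placeholder bridge sentences.
poor_table = {
    "bridge sentence 1": "I go to school yesterday .",
    "bridge sentence 2": "He likes cats .",
}
good_table = {
    "bridge sentence 1": "I went to school yesterday .",
    "bridge sentence 2": "He likes cats .",
}

pairs = build_pseudo_pairs(
    poor_table.keys(), poor_table.__getitem__, good_table.__getitem__
)
for source, target in pairs:
    print(f"{source}\t{target}")
```

Because the pairs come from any monolingual bridge-language corpus, the amount of synthetic data is bounded only by the amount of bridge text available, which is the sense in which the abstract calls the pseudo-parallel data unlimited.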


