Controllable Data Synthesis Method for Grammatical Error Correction

09/29/2019
by   Chencheng Wang, et al.

Because parallel data for the Grammatical Error Correction (GEC) task is scarce, models based on the sequence-to-sequence framework cannot be trained well enough to reach higher performance. We propose two data synthesis methods that can control the error rate and the ratio of error types in synthetic data. The first approach corrupts each word in a monolingual corpus with a fixed probability, using replacement, insertion, and deletion. The second approach trains error generation models and then filters their decoding results. Experiments on different synthetic data show that an error rate of 40% gives better model performance. Finally, we synthesize about 100 million synthetic data instances and achieve performance comparable to the state of the art, which uses twice as much data as we do.
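
The first approach (word-level corruption with a fixed probability) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `corrupt`, the parameters `error_rate` and `type_ratio`, and the choice to draw replacement and insertion words uniformly from a vocabulary are all assumptions made for illustration.

```python
import random

def corrupt(tokens, error_rate=0.4, type_ratio=(1, 1, 1), vocab=None, seed=None):
    """Corrupt each token independently with probability `error_rate`.

    `type_ratio` gives the relative weights of (replace, insert, delete).
    All names and defaults here are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    # Assumption: fall back to the sentence's own words if no vocabulary is given.
    vocab = vocab or sorted(set(tokens))
    ops = ("replace", "insert", "delete")
    out = []
    for tok in tokens:
        if rng.random() >= error_rate:
            out.append(tok)  # keep the token unchanged
            continue
        op = rng.choices(ops, weights=type_ratio, k=1)[0]
        if op == "replace":
            # May occasionally pick the same word; a fuller version would exclude it.
            out.append(rng.choice(vocab))
        elif op == "insert":
            out.append(tok)
            out.append(rng.choice(vocab))  # add a spurious word after the token
        # "delete": append nothing, so the token is dropped
    return out

clean = "she goes to school every day".split()
noisy = corrupt(clean, error_rate=0.4, type_ratio=(2, 1, 1), seed=0)
pair = (" ".join(noisy), " ".join(clean))  # (synthetic source, correct target)
```

Here `error_rate` controls how many words are corrupted and `type_ratio` controls the mix of error types, which mirrors the two quantities the abstract says the methods can control.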
