Judge a Sentence by Its Content to Generate Grammatical Errors

08/20/2022
by Chowdhury Rafeed Rahman, et al.

Data sparsity is a well-known problem for grammatical error correction (GEC). Generating synthetic training data is one widely proposed solution to this problem, and has allowed models to achieve state-of-the-art (SOTA) performance in recent years. However, these methods often generate unrealistic errors, or aim to generate sentences with only one error. We propose a learning-based, two-stage method for synthetic data generation for GEC that relaxes the constraint that each sentence contains only one error. Errors are generated in accordance with sentence merit. We show that a GEC model trained on our synthetically generated corpus outperforms models trained on synthetic data from prior work.

research
05/27/2021

Synthetic Data Generation for Grammatical Error Correction with Tagged Corruption Models

Synthetic data generation is widely known to boost the accuracy of neura...
research
07/21/2019

The Unbearable Weight of Generating Artificial Errors for Grammatical Error Correction

In recent years, sequence-to-sequence models have been very effective fo...
research
07/05/2023

Leveraging Denoised Abstract Meaning Representation for Grammatical Error Correction

Grammatical Error Correction (GEC) is the task of correcting errorful se...
research
08/29/2022

Reweighting Strategy based on Synthetic Data Identification for Sentence Similarity

Semantically meaningful sentence embeddings are important for numerous t...
research
09/29/2019

Controllable Data Synthesis Method for Grammatical Error Correction

Due to the lack of parallel data in current Grammatical Error Correction...
research
06/16/2023

Improving Audio Caption Fluency with Automatic Error Correction

Automated audio captioning (AAC) is an important cross-modality translat...
research
06/26/2023

Synthetic Alone: Exploring the Dark Side of Synthetic Data for Grammatical Error Correction

Data-centric AI approach aims to enhance the model performance without m...
