Evaluation of large-scale synthetic data for Grammar Error Correction

10/31/2022
by Vanya Bannihatti Kumar, et al.

Grammar Error Correction (GEC) relies mainly on the availability of large amounts of high-quality synthetic parallel data consisting of grammatically correct and erroneous sentence pairs. The quality of the synthetic data is evaluated by how well a GEC system performs when pre-trained on it. But this provides little insight into the factors that define the quality of such data. This work therefore introduces three metrics (reliability, diversity, and distribution match) that provide more insight into the quality of large-scale synthetic data generated for the GEC task and allow that quality to be evaluated automatically. Evaluating these three metrics automatically can also provide feedback to data generation systems and thereby improve the quality of the synthetic data generated dynamically.
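The abstract does not define the three metrics, so the following is only an illustrative sketch of how two of them might be computed, not the authors' method. It assumes each synthetic sentence pair has already been tagged with an error-type label (the label names below are hypothetical), treats diversity as the Shannon entropy of the error-type distribution, and treats distribution match as the Jensen-Shannon divergence between the synthetic and a reference error-type distribution. Reliability is left out because it would need an external judgment of whether each corrupted sentence is actually ungrammatical.

```python
import math
from collections import Counter


def type_distribution(labels):
    """Turn a list of error-type labels into a probability distribution."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def entropy(dist):
    """Shannon entropy in bits; used here as an ASSUMED proxy for diversity."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint).

    Used here as an ASSUMED proxy for distribution match: lower is better.
    """
    keys = set(p) | set(q)
    # Mixture distribution over the union of error types.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mixture(a):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mixture(p) + 0.5 * kl_to_mixture(q)


# Hypothetical error-type labels for synthetic and human-written (reference) data.
synthetic_labels = ["VERB", "NOUN", "PREP", "DET"]
reference_labels = ["VERB", "VERB", "PREP", "DET"]

synthetic_dist = type_distribution(synthetic_labels)
reference_dist = type_distribution(reference_labels)

diversity = entropy(synthetic_dist)
mismatch = js_divergence(synthetic_dist, reference_dist)
```

Under these assumptions, a data generator could be steered toward higher `diversity` and lower `mismatch`, which is one way to read the feedback loop the abstract mentions.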


