LM-Critic: Language Models for Unsupervised Grammatical Error Correction

09/14/2021
by Michihiro Yasunaga, et al.

Training a model for grammatical error correction (GEC) requires a set of labeled ungrammatical / grammatical sentence pairs, but manually annotating such pairs can be expensive. Recently, the Break-It-Fix-It (BIFI) framework has demonstrated strong results on learning to repair a broken program without any labeled examples, but this relies on a perfect critic (e.g., a compiler) that returns whether an example is valid or not, which does not exist for the GEC task. In this work, we show how to leverage a pretrained language model (LM) in defining an LM-Critic, which judges a sentence to be grammatical if the LM assigns it a higher probability than its local perturbations. We apply this LM-Critic and BIFI along with a large set of unlabeled sentences to bootstrap realistic ungrammatical / grammatical pairs for training a corrector. We evaluate our approach on GEC datasets across multiple domains (CoNLL-2014, BEA-2019, GMEG-wiki and GMEG-yahoo) and show that it outperforms existing methods in both the unsupervised setting (+7.7 F0.5) and the supervised setting (+0.5 F0.5).
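The LM-Critic criterion from the abstract is simple: a sentence is judged grammatical when the LM scores it strictly higher than every one of its local perturbations. The Python below is a minimal sketch of that criterion under stated assumptions. It is not the authors' implementation. A toy word-bigram model with add-one smoothing (`BigramLM`, hypothetical) stands in for the pretrained LM. The neighborhood (`local_perturbations`, hypothetical) only swaps one word at a time for an item from a small, made-up set of confusable function words (`CONFUSABLE_WORDS`). Same-length substitutions are used because an n-gram model like this tends to give shorter sentences higher probability, which would unfairly favor deletion edits.

```python
import math
from collections import Counter
from typing import List

# ASSUMPTION: hypothetical confusion set; a realistic critic would use a much
# richer neighborhood of local edits than single function-word swaps.
CONFUSABLE_WORDS = ["the", "a", "an", "on", "in", "of"]


class BigramLM:
    """Toy add-one-smoothed word-bigram LM (hypothetical stand-in for a pretrained LM)."""

    def __init__(self, corpus: List[str]) -> None:
        self.context_counts: Counter = Counter()
        self.bigram_counts: Counter = Counter()
        vocab = {"</s>"}
        for sentence in corpus:
            tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
            vocab.update(tokens[1:])
            self.context_counts.update(tokens[:-1])
            self.bigram_counts.update(zip(tokens[:-1], tokens[1:]))
        # One extra slot so unseen words still get non-zero probability.
        self.vocab_size = len(vocab) + 1

    def log_prob(self, sentence: str) -> float:
        """Natural-log probability of the whole sentence under the bigram model."""
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        total = 0.0
        for prev, word in zip(tokens[:-1], tokens[1:]):
            numerator = self.bigram_counts[(prev, word)] + 1
            denominator = self.context_counts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def local_perturbations(sentence: str) -> List[str]:
    """Every sentence obtained by replacing one word with a different confusable word."""
    words = sentence.split()
    neighbors = []
    for i, original in enumerate(words):
        for candidate in CONFUSABLE_WORDS:
            if candidate != original.lower():
                neighbors.append(" ".join(words[:i] + [candidate] + words[i + 1:]))
    return neighbors


def lm_critic(sentence: str, lm: BigramLM) -> bool:
    """Grammatical iff the sentence scores strictly higher than all its perturbations."""
    score = lm.log_prob(sentence)
    return all(score > lm.log_prob(n) for n in local_perturbations(sentence))


if __name__ == "__main__":
    corpus = [
        "the cat sat on the mat",
        "the dog sat on the rug",
        "a cat is on the mat",
        "the cat is happy",
    ]
    lm = BigramLM(corpus)
    print(lm_critic("the cat sat on the mat", lm))  # True
    print(lm_critic("the cat sat of the mat", lm))  # False
```

The check is deliberately strict. If any perturbation ties with or beats the input, for instance another equally fluent sentence, the sketch labels the input ungrammatical. In the second call, swapping "of" for "on" yields a higher-scoring neighbor, so the critic rejects it.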
