A BERT-based Unsupervised Grammatical Error Correction Framework

03/30/2023
by Nankai Lin, et al.

Grammatical error correction (GEC) is a challenging task in natural language processing. While a growing body of work addresses GEC for widely used languages such as English and Chinese, relatively little has been done for low-resource languages because large annotated corpora are lacking. For low-resource languages, current unsupervised GEC based on language model scoring performs well. However, pre-trained language models remain largely unexplored in this context. This study proposes a BERT-based unsupervised GEC framework in which GEC is viewed as a multi-class classification task. The framework contains three modules: a data flow construction module, a sentence perplexity scoring module, and an error detecting and correcting module. We propose a novel pseudo-perplexity scoring method to evaluate a sentence's probable correctness, and we construct a Tagalog corpus for Tagalog GEC research. The framework obtains competitive performance on the Tagalog corpus we construct and on an open-source Indonesian corpus, which demonstrates that it is complementary to the baseline method for low-resource GEC tasks.
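To make the idea of scoring a sentence by pseudo-perplexity concrete, the following is a minimal sketch of the commonly used masked-language-model formulation, PPPL(S) = exp(-(1/N) * sum_i log P(w_i | S without w_i)). It is not the novel scoring method the abstract refers to, whose details are not given here. The names `pseudo_perplexity`, `best_candidate`, and `toy_masked_prob` are hypothetical, and the toy probability function is an assumed stand-in for a real BERT model so that the example runs without any external dependency.

```python
import math
from typing import Callable, List, Sequence

MASK = "[MASK]"

# Hypothetical interface: given a token list with one position masked,
# return P(target token | remaining context). A real system would query
# a masked language model such as BERT here.
MaskedProb = Callable[[Sequence[str], int, str], float]


def pseudo_perplexity(tokens: Sequence[str], masked_prob: MaskedProb) -> float:
    """Mask each token in turn and average the log-probabilities."""
    if not tokens:
        raise ValueError("cannot score an empty sentence")
    total_log_prob = 0.0
    for i, target in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK
        total_log_prob += math.log(masked_prob(masked, i, target))
    # Lower values mean the model finds the sentence more plausible.
    return math.exp(-total_log_prob / len(tokens))


def toy_masked_prob(masked: Sequence[str], position: int, target: str) -> float:
    """Assumed toy model: after "she", "is" is likely and "are" is not."""
    prev = masked[position - 1] if position > 0 else None
    if prev == "she":
        return {"is": 0.8, "are": 0.05}.get(target, 0.1)
    return 0.5


def best_candidate(candidates: List[List[str]], masked_prob: MaskedProb) -> List[str]:
    """Pick the candidate sentence with the lowest pseudo-perplexity."""
    return min(candidates, key=lambda toks: pseudo_perplexity(toks, masked_prob))


candidates = [["she", "are", "here"], ["she", "is", "here"]]
print(best_candidate(candidates, toy_masked_prob))
```

In this sketch, correction reduces to choosing, among candidate rewrites of a sentence, the one the scorer considers most plausible. How the framework actually generates candidates and classifies errors is not described in the abstract.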


