A Unified Strategy for Multilingual Grammatical Error Correction with Pre-trained Cross-Lingual Language Model

01/26/2022
by Xin Sun, et al.

Synthetic data construction for Grammatical Error Correction (GEC) in non-English languages relies heavily on human-designed, language-specific rules, which produce only a limited range of error-correction patterns. In this paper, we propose a generic and language-independent strategy for multilingual GEC, which can train a GEC system effectively for a new non-English language with only two easy-to-access resources: 1) a pretrained cross-lingual language model (PXLM) and 2) parallel translation data between English and that language. Our approach creates diverse parallel GEC data without any language-specific operations by treating the non-autoregressive translation generated by the PXLM and the gold translation as an error-corrected sentence pair. We then reuse the PXLM to initialize the GEC model and pretrain it on the synthetic data the PXLM itself generated, which yields further improvement. We evaluate our approach on three public GEC benchmarks in different languages. It achieves state-of-the-art results on the NLPCC 2018 Task 2 dataset (Chinese) and obtains competitive performance on Falko-Merlin (German) and RULEC-GEC (Russian). Further analysis demonstrates that our data construction method is complementary to rule-based approaches.
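
The pairing step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `build_gec_pairs` and `toy_translate` are hypothetical names, the translator is a canned stand-in for the PXLM's non-autoregressive decoding, and dropping pairs where the model output already matches the gold translation is an assumption made here rather than something the abstract states.

```python
from typing import Callable, Iterable, List, Tuple


def build_gec_pairs(
    parallel: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Turn (english, gold_target) data into (erroneous, corrected) pairs.

    The model translation plays the role of the erroneous sentence and the
    gold translation plays the role of its correction.
    """
    pairs = []
    for english, gold in parallel:
        hypothesis = translate(english).strip()
        gold = gold.strip()
        # Assumption of this sketch: skip empty outputs and exact matches,
        # since they contain no correction to learn from.
        if not hypothesis or hypothesis == gold:
            continue
        pairs.append((hypothesis, gold))
    return pairs


def toy_translate(english: str) -> str:
    """Hypothetical stand-in for PXLM decoding; the outputs are made up."""
    canned = {
        "I have a book.": "Ich haben ein Buch.",  # agreement error
        "She sleeps.": "Sie schläft.",            # already matches gold
    }
    return canned[english]


parallel_data = [
    ("I have a book.", "Ich habe ein Buch."),
    ("She sleeps.", "Sie schläft."),
]
synthetic_pairs = build_gec_pairs(parallel_data, toy_translate)
```

In this toy run only the first sentence yields a training pair, because the second model output is identical to its gold translation. In a real setting `translate` would wrap whatever decoding procedure the PXLM uses, and the resulting pairs would serve as pretraining data for the GEC model.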

