Hierarchical Character Tagger for Short Text Spelling Error Correction

09/29/2021
by   Mengyi Gao, et al.

State-of-the-art approaches to the spelling error correction problem fall into two groups: Transformer-based Seq2Seq models, which require large training sets and suffer from slow inference; and sequence labeling models based on Transformer encoders such as BERT, which operate over a token-level label space and therefore require a large pre-defined vocabulary. In this paper we present a Hierarchical Character Tagger model, or HCTagger, for short text spelling error correction. We use a pre-trained character-level language model as the text encoder, and then predict character-level edits that transform the original text into its error-free form, which requires a much smaller label space. For decoding, we propose a hierarchical multi-task approach that alleviates the long-tail label distribution without introducing extra model parameters. Experiments on two public misspelling correction datasets demonstrate that HCTagger is accurate and much faster than many existing models.
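
To make the idea of character-level edits concrete, here is a minimal sketch of how a misspelled string and its correction could be turned into one edit tag per input character, and how those tags reproduce the correction. Everything in it is an illustrative assumption rather than the paper's method: the tag names (KEEP, DELETE, REPLACE_x), the per-position "append" string, the start-of-text sentinel, and the use of Python's difflib for alignment are not taken from the abstract, which does not specify HCTagger's label set.

```python
from difflib import SequenceMatcher

# Hypothetical start-of-text sentinel: gives insertions at position 0 a host
# character. Assumes the sentinel never occurs in real input text.
START = "\x02"


def char_edit_tags(src, tgt):
    """Return one (base, append) tag per character of START + src.

    base   : "KEEP", "DELETE", or "REPLACE_<char>"
    append : text to insert right after this position ("" if none)
    """
    s, t = START + src, START + tgt
    tags = [["KEEP", ""] for _ in s]
    matcher = SequenceMatcher(None, s, t, autojunk=False)
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            continue
        if op == "insert":
            # Attach inserted text to the preceding source character.
            tags[i1 - 1][1] += t[j1:j2]
        elif op == "delete":
            for i in range(i1, i2):
                tags[i][0] = "DELETE"
        else:  # "replace": pair characters up, then delete or append the rest
            n = min(i2 - i1, j2 - j1)
            for k in range(n):
                tags[i1 + k][0] = "REPLACE_" + t[j1 + k]
            for i in range(i1 + n, i2):
                tags[i][0] = "DELETE"
            tags[i1 + n - 1][1] += t[j1 + n:j2]
    return [tuple(tag) for tag in tags]


def apply_tags(src, tags):
    """Apply the tags to src and return the edited text."""
    out = []
    for ch, (base, append) in zip(START + src, tags):
        if base == "KEEP":
            out.append(ch)
        elif base.startswith("REPLACE_"):
            out.append(base[len("REPLACE_"):])
        # "DELETE" emits nothing for this character.
        out.append(append)
    return "".join(out)[1:]  # drop the sentinel


if __name__ == "__main__":
    noisy, clean = "helo wrld", "hello world"
    tags = char_edit_tags(noisy, clean)
    for ch, tag in zip("^" + noisy, tags):  # "^" shown in place of the sentinel
        print(repr(ch), tag)
    print(apply_tags(noisy, tags))
```

In a tagger of this kind, a model would be trained to predict such tags from the encoder's per-character representations. Because each tag refers to at most a few characters instead of a whole token, the label inventory stays far smaller than a word or subword vocabulary.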


