DPCSpell: A Transformer-based Detector-Purificator-Corrector Framework for Spelling Error Correction of Bangla and Resource Scarce Indic Languages

11/07/2022
by Mehedi Hasan Bijoy, et al.

Spelling error correction is the task of identifying and rectifying misspelled words in text. It is an active research topic in Natural Language Processing because of its numerous applications in human language understanding. Phonetically or visually similar yet semantically distinct characters make it an arduous task in any language. Earlier efforts on spelling error correction in Bangla and resource-scarce Indic languages focused on rule-based, statistical, and machine learning-based methods, which we found rather inefficient. In particular, machine learning-based approaches, although they outperform rule-based and statistical methods, are ineffective because they correct every character regardless of whether it needs correction. In this work, we propose a novel detector-purificator-corrector framework based on denoising transformers that addresses these issues. Moreover, we present a method for large-scale corpus creation from scratch, which in turn resolves the resource limitation problem for any left-to-right scripted language. The empirical results demonstrate the effectiveness of our approach, which outperforms previous state-of-the-art methods by a significant margin for Bangla spelling error correction. The models and corpus are publicly available at https://tinyurl.com/DPCSpell.
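
The abstract does not spell out what each of the three stages does, so the following is only a toy sketch of the general idea that correcting flagged characters alone may be preferable to rewriting every character. The division of labor (detector flags positions, purificator masks them, corrector fills the masks), the vocabulary lookup, the `MASK` symbol, and all function names are assumptions made for illustration; they stand in for the learned transformer components and are not the authors' implementation.

```python
from typing import List

MASK = "#"  # hypothetical mask symbol, chosen only for this sketch


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def detect(word: str, vocab: List[str]) -> List[int]:
    """Toy detector: flag (1) each position that differs from the closest
    same-length vocabulary word. A learned detector would replace this."""
    if word in vocab:
        return [0] * len(word)
    same_len = [v for v in vocab if len(v) == len(word)]
    if not same_len:
        return [0] * len(word)
    nearest = min(same_len, key=lambda v: hamming(word, v))
    return [int(a != b) for a, b in zip(word, nearest)]


def purify(word: str, flags: List[int]) -> str:
    """Toy purificator: mask the flagged characters, keep the rest."""
    return "".join(MASK if f else ch for ch, f in zip(word, flags))


def correct(masked: str, vocab: List[str]) -> str:
    """Toy corrector: fill masked positions only, using the first
    vocabulary word consistent with the unmasked characters."""
    if MASK not in masked:
        return masked
    for cand in vocab:
        if len(cand) == len(masked) and all(
            m == MASK or m == c for m, c in zip(masked, cand)
        ):
            return cand
    return masked


def run(word: str, vocab: List[str]) -> str:
    """Chain the three toy stages."""
    flags = detect(word, vocab)
    return correct(purify(word, flags), vocab)


VOCAB = ["spelling", "correct"]
print(run("spelking", VOCAB))
print(run("correct", VOCAB))
```

In this sketch a word that is already in the vocabulary passes through unchanged, because no position is flagged and nothing is masked for the corrector to rewrite.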

research
10/01/2019

Grammatical Error Correction in Low-Resource Scenarios

Grammatical error correction in English is a long studied problem with m...
research
05/12/2021

Spelling Correction with Denoising Transformer

We present a novel method of performing spelling correction on short inp...
research
08/28/2023

Rule-Based Error Detection and Correction to Operationalize Movement Trajectory Classification

Classification of movement trajectories has many applications in transpo...
research
03/30/2023

A BERT-based Unsupervised Grammatical Error Correction Framework

Grammatical error correction (GEC) is a challenging task of natural lang...
research
08/04/2020

An improved Bayesian TRIE based model for SMS text normalization

Normalization of SMS text, commonly known as texting language, is being ...
research
11/10/2020

OCR Post Correction for Endangered Language Texts

There is little to no data available to build natural language processin...
research
05/29/2023

Exploring Effectiveness of GPT-3 in Grammatical Error Correction: A Study on Performance and Controllability in Prompt-Based Methods

Large-scale pre-trained language models such as GPT-3 have shown remarka...
