BSpell: A CNN-blended BERT Based Bengali Spell Checker

08/20/2022
by Chowdhury Rafeed Rahman, et al.

Bengali typing is mostly performed using an English keyboard and can be highly erroneous due to the presence of compound and similarly pronounced letters. Correcting a misspelled word requires an understanding of both the word's typing pattern and the context in which it is used. We propose a specialized BERT model, BSpell, targeted towards word-for-word correction at the sentence level. BSpell contains an end-to-end trainable CNN sub-model named SemanticNet, along with a specialized auxiliary loss. This allows BSpell to specialize in the highly inflected Bengali vocabulary in the presence of spelling errors. We further propose a hybrid pretraining scheme for BSpell that combines word-level and character-level masking. Using this pretraining scheme, BSpell achieves 91.5% accuracy on a real-life Bengali spelling correction validation set. A detailed comparison on two Bengali and one Hindi spelling correction datasets shows the superiority of the proposed BSpell over existing spell checkers.
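The hybrid pretraining idea, combining word-level masking (replace an entire word with a mask token) and character-level masking (corrupt individual characters inside a word, mimicking typing errors), can be illustrated with a minimal sketch. The function name `hybrid_mask` and the masking probabilities below are hypothetical choices for illustration, not values taken from the paper:

```python
import random

MASK = "[MASK]"

def hybrid_mask(words, word_mask_prob=0.10, char_mask_prob=0.05, seed=None):
    """Illustrative hybrid masking (probabilities are hypothetical).

    Each word is either masked whole (word-level masking) or has
    individual characters masked (character-level masking).
    """
    rng = random.Random(seed)
    masked = []
    for word in words:
        if rng.random() < word_mask_prob:
            # Word-level masking: hide the entire word.
            masked.append(MASK)
        else:
            # Character-level masking: corrupt individual characters,
            # mimicking keyboard typing errors.
            chars = [MASK if rng.random() < char_mask_prob else ch
                     for ch in word]
            masked.append("".join(chars))
    return masked
```

A model pretrained on such mixed corruptions sees both whole-word context gaps and within-word noise, which is what lets it recover misspelled words from sentence context.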


Related research

03/01/2022 · "Is Whole Word Masking Always Better for Chinese BERT?": Probing on Chinese Grammatical Error Correction
Whole word masking (WWM), which masks all subwords corresponding to a wo...

10/25/2021 · Paradigm Shift in Language Modeling: Revisiting CNN for Modeling Sanskrit Originated Bengali and Hindi Language
Though there has been a large body of recent works in language modeling ...

03/12/2022 · MarkBERT: Marking Word Boundaries Improves Chinese BERT
We present a Chinese BERT model dubbed MarkBERT that uses word informati...

02/09/2023 · Correcting Real-Word Spelling Errors: A New Hybrid Approach
Spelling correction is one of the main tasks in the field of Natural Lan...

05/28/2021 · Hierarchical Transformer Encoders for Vietnamese Spelling Correction
In this paper, we propose a Hierarchical Transformer model for Vietnames...

10/20/2020 · CharacterBERT: Reconciling ELMo and BERT for Word-Level Open-Vocabulary Representations From Characters
Due to the compelling improvements brought by BERT, many recent represen...

04/26/2022 · Pretraining Chinese BERT for Detecting Word Insertion and Deletion Errors
Chinese BERT models achieve remarkable progress in dealing with grammati...
