Spelling Correction with Denoising Transformer

by Alex Kuznetsov et al.

We present a novel method for performing spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos that closely follow the error patterns exhibited by humans. This procedure is used to train a production spelling correction model based on a transformer architecture. The model is currently served in HubSpot product search. We show that our approach to typo generation is superior to the widespread practice of adding random noise, which ignores human error patterns. We also demonstrate how our approach can be extended to resource-scarce settings, training spelling correction models for Arabic, Greek, Russian, and Setswana without using any labeled data.
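The abstract does not spell out the typo-generation procedure itself, but one common ingredient of human-like artificial noise is biasing character substitutions toward keys that are physically adjacent on a QWERTY keyboard, alongside deletions, transpositions, and duplications. The sketch below is purely illustrative of that general idea and is not taken from the paper; the adjacency map and the uniform choice of error type are assumptions for the example.

```python
import random

# Illustrative only: a minimal human-plausible typo generator. The paper's
# actual procedure is not detailed in the abstract; this sketch assumes a
# QWERTY-adjacency bias for substitutions, which is one typical heuristic.
QWERTY_NEIGHBORS = {
    "q": "wa", "w": "qes", "e": "wrd", "r": "etf", "t": "ryg",
    "y": "tuh", "u": "yij", "i": "uok", "o": "ipl", "p": "ol",
    "a": "qsz", "s": "awdx", "d": "sefc", "f": "drgv", "g": "fthb",
    "h": "gyjn", "j": "hukm", "k": "jil", "l": "kop",
    "z": "asx", "x": "zsdc", "c": "xdfv", "v": "cfgb",
    "b": "vghn", "n": "bhjm", "m": "njk",
}

def add_typo(word: str, rng: random.Random) -> str:
    """Insert at most one human-plausible typo into `word`."""
    if not word:
        return word
    pos = rng.randrange(len(word))
    ch = word[pos].lower()
    op = rng.choice(["substitute", "delete", "transpose", "duplicate"])
    if op == "substitute" and ch in QWERTY_NEIGHBORS:
        # Replace the character with a neighboring key, mimicking a slip.
        return word[:pos] + rng.choice(QWERTY_NEIGHBORS[ch]) + word[pos + 1:]
    if op == "delete" and len(word) > 1:
        return word[:pos] + word[pos + 1:]
    if op == "transpose" and pos < len(word) - 1:
        return word[:pos] + word[pos + 1] + word[pos] + word[pos + 2:]
    if op == "duplicate":
        return word[:pos + 1] + word[pos] + word[pos + 1:]
    return word

rng = random.Random(0)
noisy = [add_typo("transformer", rng) for _ in range(3)]
```

Pairs of `(noisy, clean)` strings produced this way can serve as training data for a denoising sequence-to-sequence model, with the noisy string as input and the clean string as the target.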



