Spelling Correction with Denoising Transformer

05/12/2021
by Alex Kuznetsov, et al.

We present a novel method for spelling correction on short input strings, such as search queries or individual words. At its core lies a procedure for generating artificial typos that closely follow the error patterns of human typists. This procedure is used to train a production spelling correction model based on a transformer architecture, which is currently served in HubSpot product search. We show that our approach to typo generation is superior to the widespread practice of adding noise that ignores human patterns. We also demonstrate how our approach can be extended to resource-scarce settings by training spelling correction models for Arabic, Greek, Russian, and Setswana without using any labeled data.
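
The abstract does not spell out the typo-generation procedure, so the following is only a minimal illustrative sketch of the general idea: corrupting clean words with edits biased toward plausible human mistakes, then pairing each corrupted word with its original as training data for a denoising model. The keyboard-neighbor map, the edit-type weights, and the function names `make_typo` and `make_pairs` are all assumptions made for this example, not the authors' method.

```python
import random

# Hypothetical, partial QWERTY neighbor map (illustration only).
QWERTY_NEIGHBORS = {
    "a": "qwsz", "e": "wsdr", "i": "ujko", "o": "iklp",
    "r": "edft", "s": "awedxz", "t": "rfgy", "n": "bhjm",
}

# Assumed edit-type weights; a real system would estimate these
# from observed human typos rather than hard-code them.
EDIT_WEIGHTS = {"substitute": 0.4, "delete": 0.25, "insert": 0.15, "transpose": 0.2}


def make_typo(word, rng):
    """Apply one weighted, keyboard-aware edit to `word`."""
    if len(word) < 2:
        return word  # too short to corrupt meaningfully
    edit = rng.choices(list(EDIT_WEIGHTS), weights=list(EDIT_WEIGHTS.values()))[0]
    i = rng.randrange(len(word))
    neighbors = QWERTY_NEIGHBORS.get(word[i], "")
    if edit == "substitute" and neighbors:
        # Replace the character with an adjacent key.
        return word[:i] + rng.choice(neighbors) + word[i + 1:]
    if edit == "insert" and neighbors:
        # Insert an adjacent key before the character.
        return word[:i] + rng.choice(neighbors) + word[i:]
    if edit == "transpose" and i < len(word) - 1:
        # Swap the character with its right neighbor
        # (a no-op when both characters are identical).
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    # Deletion, also used as the fallback when the chosen edit does not apply.
    return word[:i] + word[i + 1:]


def make_pairs(words, seed=0):
    """Build (noisy, clean) training pairs from clean words."""
    rng = random.Random(seed)
    return [(make_typo(w, rng), w) for w in words]


if __name__ == "__main__":
    for noisy, clean in make_pairs(["transformer", "spelling", "search"]):
        print(f"{noisy} -> {clean}")
```

Under these assumptions, the uniform-noise baseline the abstract argues against would correspond to replacing the neighbor map with the whole alphabet and making all edit types equally likely.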


