A context sensitive real-time Spell Checker with language adaptability

10/23/2019
by Prabhakar Gupta, et al.

We present a novel language-adaptable spell checking system that detects spelling errors and suggests context-sensitive corrections in real time. We show that our system can be extended to new languages with minimal language-specific processing. The available literature mostly discusses spell checkers for English, and no publicly available system can be extended to other languages out of the box. Most existing systems also do not work in real time. We explain the process of generating a language's word dictionary and n-gram probability dictionaries from Wikipedia article data and manually curated video subtitles. We present the results of generating a list of suggestions for a misspelled word. We also propose three approaches to create noisy channel datasets of real-world typographic errors. We compare our system with industry-accepted spell checker tools for 11 languages. Finally, we show the performance of our system on synthetic datasets for 24 languages.
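
To make the pipeline described above concrete, the following is a minimal sketch and not the system from the paper. It assumes whitespace tokenization, bigrams as the only n-gram order, candidates limited to one edit away from the misspelled word, and add-one smoothing; the function names (`build_dictionaries`, `edits1`, `suggest`) are hypothetical. The alphabet is derived from the dictionary itself, so nothing in the sketch is tied to a particular language.

```python
from collections import Counter


def build_dictionaries(sentences):
    """Count unigrams and bigrams from lowercased, whitespace-split text."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def edits1(word, alphabet):
    """All strings one delete, transpose, replace, or insert away from word."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)


def suggest(prev_word, word, unigrams, bigrams, top_k=3):
    """Rank in-dictionary candidates for word by smoothed P(candidate | prev_word)."""
    if word in unigrams:
        return [word]  # known word: treated as correctly spelled in this sketch
    # Assumption: the alphabet is whatever characters occur in the dictionary.
    alphabet = sorted({ch for w in unigrams for ch in w})
    candidates = [c for c in edits1(word, alphabet) if c in unigrams]
    vocab_size = len(unigrams)

    def score(candidate):
        # Add-one smoothed bigram probability (an assumed smoothing choice).
        return (bigrams[(prev_word, candidate)] + 1) / (unigrams[prev_word] + vocab_size)

    # Ties are broken by unigram frequency, then alphabetically.
    ranked = sorted(candidates, key=lambda c: (-score(c), -unigrams[c], c))
    return ranked[:top_k]
```

Given a small corpus, `suggest("the", "cet", unigrams, bigrams)` returns the dictionary words one edit from "cet", ordered by how often each follows "the". A real-time system would likely need precomputed candidate lookups instead of generating edits per query; that optimization is outside this sketch.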

