Automatic Spell Checker and Correction for Under-represented Spoken Languages: Case Study on Wolof

05/22/2023
by Thierno Ibrahima Cissé, et al.

This paper presents a spell checker and correction tool specifically designed for Wolof, an under-represented spoken language in Africa. The proposed spell checker combines a trie data structure, dynamic programming, and the weighted Levenshtein distance to generate suggestions for misspelled words. We created novel linguistic resources for Wolof, such as a lexicon and a corpus of misspelled words, using a semi-automatic approach that combines manual and automatic annotation methods. Despite the limited data available for the Wolof language, the spell checker achieved a predictive accuracy of 98.31% […] remains the revitalization and preservation of Wolof as an Indigenous spoken language in Africa, through our efforts to develop novel linguistic resources. This work represents a valuable contribution to the growth of computational tools and resources for the Wolof language and provides a strong foundation for future studies in automatic spell checking and correction.
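The abstract names three ingredients: a trie, dynamic programming, and a weighted Levenshtein distance. One common way to combine them is to walk the trie while computing one row of the edit-distance table per node, so that words sharing a prefix reuse the same rows. The minimal sketch below assumes that design. The confusion groups, the 0.5 substitution weight, the unit insertion/deletion costs, the function names (`build_trie`, `suggest`, `sub_cost`) and the toy lexicon are all illustrative assumptions, not the authors' weights, data, or code.

```python
# Hypothetical confusion groups: letters a writer might plausibly swap
# (plain vs. accented vowels, n vs. ñ/ŋ). Illustrative only, not from the paper.
CONFUSABLE = [set("aà"), set("eéë"), set("oó"), set("nñŋ")]

INS_COST = 1.0  # assumed cost of inserting a letter
DEL_COST = 1.0  # assumed cost of deleting a letter


def sub_cost(a, b):
    """Weighted substitution: free for equal letters, cheaper for confusable ones."""
    if a == b:
        return 0.0
    if any(a in group and b in group for group in CONFUSABLE):
        return 0.5
    return 1.0


class TrieNode:
    """One trie node; `word` is set when a lexicon entry ends at this node."""
    __slots__ = ("children", "word")

    def __init__(self):
        self.children = {}
        self.word = None


def build_trie(words):
    """Insert every lexicon word into a fresh trie and return its root."""
    root = TrieNode()
    for w in words:
        node = root
        for ch in w:
            if ch not in node.children:
                node.children[ch] = TrieNode()
            node = node.children[ch]
        node.word = w
    return root


def suggest(root, target, max_cost=1.0):
    """Return (word, cost) pairs within max_cost of target, cheapest first."""
    # DP row for the empty candidate prefix: turning target[:j] into "" takes j deletions.
    first_row = [j * DEL_COST for j in range(len(target) + 1)]
    results = []

    def walk(node, ch, prev_row):
        # row[j] = weighted cost of turning target[:j] into the trie path ending in ch.
        row = [prev_row[0] + INS_COST]
        for j in range(1, len(target) + 1):
            row.append(min(
                prev_row[j] + INS_COST,                         # candidate has an extra letter
                row[j - 1] + DEL_COST,                          # target has an extra letter
                prev_row[j - 1] + sub_cost(target[j - 1], ch),  # match or substitution
            ))
        if node.word is not None and row[-1] <= max_cost:
            results.append((node.word, row[-1]))
        # Prune: costs are non-negative, so deeper rows never fall below min(row).
        if min(row) <= max_cost:
            for next_ch, child in node.children.items():
                walk(child, next_ch, row)

    for ch, child in root.children.items():
        walk(child, ch, first_row)
    return sorted(results, key=lambda pair: (pair[1], pair[0]))


root = build_trie(["jërëjëf", "ndox", "xale", "nit", "dëkk"])  # toy lexicon
print(suggest(root, "jerejef", max_cost=1.5))  # three e→ë swaps at 0.5 each
```

Because every row is shared by all lexicon words below that trie node, the work grows with the number of trie nodes actually visited times the length of the misspelled word, rather than with the full lexicon size times word length, and the `min(row)` check discards whole subtrees once they exceed the cost budget.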
