Beqi: Revitalize the Senegalese Wolof Language with a Robust Spelling Corrector

05/15/2023
by Derguene Mbaye, et al.

Although Natural Language Processing (NLP) has progressed rapidly in recent years, this progress has not been equal across languages. African languages in particular still lag behind and lack automatic processing tools. Some of these tools are essential for the development of the languages themselves and also play an important role in many NLP applications. Automatic spell checkers are a prime example. Several approaches to this task have been studied, and the one that models spelling correction as translation from misspelled (noisy) text to correctly spelled (clean) text shows promising results. However, this approach requires a parallel corpus that pairs noisy text with its correct counterpart, and Wolof, a low-resource language, has no such corpus. In this paper, we address this lack of data by generating synthetic data, and we present deep learning sequence-to-sequence models for spelling correction in Wolof. We evaluated these models in three scenarios that differ in the subword tokenization method applied to the data. The results show that this choice has a significant impact on model performance, which opens the way for future research on Wolof spelling correction.
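The abstract does not say how the noisy side of the synthetic parallel corpus is produced. A common approach is to inject random character-level edits into clean sentences. The following minimal sketch uses that approach and is not the authors' method. The character set `WOLOF_CHARS`, the edit operations, the noise probability `p`, and the helpers `add_noise` and `make_pairs` are all illustrative assumptions.

```python
import random

# ASSUMPTION: an approximate Wolof (Latin script) character set used for
# insertions and substitutions; not taken from the paper.
WOLOF_CHARS = "aàbcdeéëfgijklmnñŋoópqrstuwxy"


def add_noise(word: str, rng: random.Random, p: float = 0.1) -> str:
    """With probability p, apply one random character edit to `word` (hypothetical helper)."""
    if len(word) < 2 or rng.random() >= p:
        return word  # leave very short words and unselected words untouched
    i = rng.randrange(len(word))
    op = rng.choice(["delete", "insert", "substitute", "transpose"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "insert":
        return word[:i] + rng.choice(WOLOF_CHARS) + word[i:]
    if op == "substitute":
        return word[:i] + rng.choice(WOLOF_CHARS) + word[i + 1:]
    # transpose: swap character i with its right neighbour (left one at the end)
    j = i + 1 if i + 1 < len(word) else i - 1
    a, b = min(i, j), max(i, j)
    return word[:a] + word[b] + word[a] + word[b + 1:]


def make_pairs(sentences, p: float = 0.1, seed: int = 0):
    """Turn clean sentences into (noisy, clean) pairs for seq2seq training (hypothetical helper)."""
    rng = random.Random(seed)  # seeded so the synthetic corpus is reproducible
    pairs = []
    for clean in sentences:
        noisy = " ".join(add_noise(w, rng, p) for w in clean.split())
        pairs.append((noisy, clean))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_pairs(["Maa ngi fi rekk", "Jërëjëf"], p=0.5, seed=42):
        print(f"{noisy!r} -> {clean!r}")
```

Each pair keeps the clean sentence as the target, so a sequence-to-sequence model learns to map the corrupted input back to the correct spelling. Because edits are applied word by word and never insert spaces, the noisy and clean sentences always contain the same number of tokens.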
