Comparison of Interactive Knowledge Base Spelling Correction Models for Low-Resource Languages

10/20/2020
by   Yiyuan Li, et al.
Spelling normalization for low-resource languages is challenging because the spelling patterns are hard to predict and large corpora are usually required to collect enough examples. This work compares a neural model and character language models trained with varying amounts of target-language data. Our usage scenario is interactive correction that starts with almost no training examples and improves the models as more data is collected, for example within a chat app. Such models are designed to be improved incrementally as users give feedback. We design a system for spelling correction in low-resource languages that embeds a knowledge base and a prediction model. Experimental results on multiple languages show that the model can become effective with a small amount of data. We perform experiments on both natural and synthetic data, as well as on data from two endangered languages (Ainu and Griko). Finally, we built a prototype system and used it in a small case study on Hinglish, which further demonstrated the suitability of our approach in real-world scenarios.
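
The interactive loop described in the abstract can be pictured with a minimal Python sketch. It is not the authors' implementation: the class name `InteractiveCorrector`, its methods, and the example words are hypothetical, and plain string similarity from the standard library's `difflib` stands in for the neural or character-language-model predictor. It only illustrates the general idea of consulting a knowledge base first, falling back to a predictor, and growing the knowledge base from user feedback.

```python
from difflib import get_close_matches


class InteractiveCorrector:
    """Hypothetical sketch: knowledge base first, fallback predictor second."""

    def __init__(self):
        # Knowledge base: observed spelling -> normalized form confirmed by a user.
        self.knowledge_base = {}

    def suggest(self, word):
        # 1. Exact hit in the knowledge base.
        if word in self.knowledge_base:
            return self.knowledge_base[word]
        # 2. Fallback predictor. Assumption: string similarity replaces the
        #    learned model here, so only near-duplicates of known words match.
        known = list(self.knowledge_base)
        close = get_close_matches(word, known, n=1, cutoff=0.7)
        if close:
            return self.knowledge_base[close[0]]
        # 3. No evidence yet: leave the word unchanged.
        return word

    def feedback(self, word, normalized):
        # A user confirmed or corrected a suggestion; store it for later use.
        self.knowledge_base[word] = normalized


corrector = InteractiveCorrector()
before = corrector.suggest("gud")      # nothing known yet
corrector.feedback("gud", "good")      # made-up example pair
after = corrector.suggest("gud")       # exact knowledge-base hit
nearby = corrector.suggest("guud")     # similar unseen spelling
```

With an empty knowledge base the sketch returns words unchanged, which mirrors the near-zero-data starting point; each call to `feedback` makes later suggestions better informed.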
