
Algorithms for certain classes of Tamil Spelling correction

by Muthiah Annamalai, et al.

The Tamil language has an agglutinative, diglossic, alpha-syllabary structure that produces a combinatorial explosion of morphological forms. All of these forms are used in Tamil prose and poetry, in unbroken continuity from antiquity to the modern age. For language understanding and spelling correction, however, many of them present challenges as out-of-dictionary words. In this paper, the authors propose algorithmic techniques to efficiently handle specific problems of out-of-dictionary conjoined words, for example the transliterated [thendRalkattRu] = [thendRal] + [kattRu], where only the individual parts are present in the word list. The morphological structure of Tamil makes it necessary to rely on a synthesis-analysis approach, and dictionary lists alone will never be sufficient to truly capture the language. In this paper, we have attempted to summarize various known algorithms for specific classes of Tamil spelling errors. We believe this collection of suggestions can help improve future spelling checkers. We also note that we do not cover many important techniques, such as affix removal, that are of key importance in rule-based spell checkers.
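To illustrate the conjoined-word case from the abstract, the sketch below checks whether an out-of-dictionary word can be fully covered by entries from a word list, preferring the split with the fewest parts. It is a generic dictionary-driven segmentation using memoized recursion, not the authors' published algorithm. The function name `split_conjoined` and the `min_part` parameter are hypothetical. It assumes transliterated input such as `thendRalkattRu`. A real Tamil implementation would need to split at letter boundaries rather than Unicode code points, because one Tamil letter can be a consonant plus a vowel-sign code point. It also ignores any sandhi changes at the junction.

```python
from functools import lru_cache
from typing import AbstractSet, List, Optional, Tuple


def split_conjoined(word: str, lexicon: AbstractSet[str], min_part: int = 2) -> Optional[List[str]]:
    """Split `word` into the fewest parts that all appear in `lexicon`.

    Hypothetical helper for illustration: returns None when no full
    covering exists. Operates on plain (here, transliterated) strings.
    """
    # Fast path: the word itself is already in the word list.
    if word in lexicon:
        return [word]

    @lru_cache(maxsize=None)
    def best(start: int) -> Optional[Tuple[str, ...]]:
        # Best segmentation of word[start:], or None if it cannot be covered.
        if start == len(word):
            return ()
        candidate: Optional[Tuple[str, ...]] = None
        # Try every prefix of at least `min_part` characters as the next part.
        for end in range(start + min_part, len(word) + 1):
            piece = word[start:end]
            if piece not in lexicon:
                continue
            rest = best(end)
            if rest is None:
                continue
            option = (piece,) + rest
            # Keep the option with the fewest parts (earliest one wins ties).
            if candidate is None or len(option) < len(candidate):
                candidate = option
        return candidate

    result = best(0)
    return list(result) if result is not None else None


if __name__ == "__main__":
    words = frozenset({"thendRal", "kattRu"})
    print(split_conjoined("thendRalkattRu", words))  # ['thendRal', 'kattRu']
    print(split_conjoined("kattRuxyz", words))       # None
```

Preferring the fewest parts discourages over-segmentation into many short dictionary fragments, and memoizing on the start index keeps the number of lexicon lookups roughly quadratic in the word length.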



