Correction of Noisy Sentences using a Monolingual Corpus

05/22/2011
by Diptesh Chatterjee, et al.

Correction of noisy natural language text is an important and well-studied problem in Natural Language Processing, with applications in domains such as Statistical Machine Translation, Second Language Learning and Natural Language Generation. In this work, we consider statistical techniques for text correction. We define the classes of errors commonly found in text and describe two algorithms to correct them. The data is taken from the output of a poorly trained Machine Translation system. Both algorithms use only a language model in the target language to correct the sentences, and both use phrase-based correction: phrases are replaced and combined to give the final corrected sentence. We also present methods to model the different kinds of errors, along with the results of the algorithms on the test set. We show that one of the approaches fails to achieve the desired goal, whereas the other succeeds. Finally, we analyze the possible reasons for this difference in performance.
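To make the idea of correcting a sentence with only a target-language model concrete, here is a minimal sketch. It is not either of the algorithms described in the paper, whose details the abstract does not give. It assumes a bigram model with add-one smoothing trained on a toy monolingual corpus and a greedy search over phrase replacements. The function names and the `candidates` table of replacement phrases are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count unigrams and bigrams over tokenized monolingual sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(tokens, unigrams, bigrams):
    """Add-one smoothed bigram log-probability (an assumed scoring choice)."""
    vocab = len(unigrams)
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def neighbours(tokens, candidates):
    """Yield every sentence obtained by one phrase replacement.

    `candidates` is a hypothetical table mapping a phrase (tuple of tokens)
    to a list of alternative phrases.
    """
    for phrase, alternatives in candidates.items():
        n = len(phrase)
        for i in range(len(tokens) - n + 1):
            if tuple(tokens[i:i + n]) == phrase:
                for alt in alternatives:
                    yield tokens[:i] + list(alt) + tokens[i + n:]


def correct(tokens, candidates, unigrams, bigrams):
    """Greedy hill climbing: apply the best replacement while the score rises."""
    best = list(tokens)
    best_score = log_prob(best, unigrams, bigrams)
    while True:
        scored = [
            (log_prob(trial, unigrams, bigrams), trial)
            for trial in neighbours(best, candidates)
        ]
        if not scored:
            return best
        score, trial = max(scored, key=lambda pair: pair[0])
        if score <= best_score:
            return best
        best, best_score = trial, score
```

With a corpus such as "he goes to school", "she goes to work", "he goes to work" and a candidate table offering "goes" and "going" as replacements for "go", the noisy input "he go to school" is rewritten to "he goes to school", because that replacement is the only one that raises the language-model score. A real system would need a much larger corpus, a better-smoothed model, and a principled way to generate candidate phrases.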
