Evaluation of basic modules for isolated spelling error correction in Polish texts

05/26/2019
by Szymon Rutkowski, et al.

Spelling error correction is an important problem in natural language processing, both as a prerequisite for good performance in downstream tasks and as an important feature of user-facing applications. For texts in the Polish language, there are works on specific error correction solutions, often developed for specialized corpora, but no evaluations of many different approaches on large resources of errors. We begin to address this problem by testing some basic and promising methods on PlEWi, a corpus of annotated spelling errors extracted from Polish Wikipedia. These modules may be further combined with appropriate solutions for error detection and context awareness. Based on our results, combining edit distance with the cosine distance of semantic vectors may be suggested for interpretable systems, while an LSTM, particularly one enhanced by ELMo embeddings, seems to offer the best raw performance.
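To make the idea of combining edit distance with the cosine distance of semantic vectors concrete, here is a minimal sketch of one possible scheme. It assumes a simple linear score (Levenshtein distance plus a hypothetical weight `alpha` times the cosine distance), and it assumes the embedding model can produce a vector for the misspelled word itself (for example, a subword-based model). The function names, the weight, and the 3-dimensional `TOY_VECTORS` are all hypothetical illustrations, not the paper's actual method, data, or embeddings.

```python
import math
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming Levenshtein distance with unit costs."""
    # NFC-normalize so Polish letters such as "ó" count as one character.
    a = unicodedata.normalize("NFC", a)
    b = unicodedata.normalize("NFC", b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]


def cosine_distance(u, v) -> float:
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        return 1.0
    return 1.0 - dot / norm


def rank_candidates(error, vectors, alpha=1.0):
    """Rank vocabulary words as corrections for `error` (lowest score first).

    Hypothetical scoring: edit distance + alpha * cosine distance.
    """
    err_vec = vectors[error]  # assumes a vector exists for the misspelling
    scored = []
    for cand, vec in vectors.items():
        if cand == error:
            continue
        score = levenshtein(error, cand) + alpha * cosine_distance(err_vec, vec)
        scored.append((score, cand))
    scored.sort()
    return [cand for _, cand in scored]


# Hypothetical toy embeddings; real vectors would come from a trained model.
TOY_VECTORS = {
    "żułw": [0.9, 0.1, 0.0],  # misspelling of "żółw" (turtle)
    "żółw": [1.0, 0.0, 0.0],
    "żuł": [0.0, 1.0, 0.0],   # "chewed"
    "żółć": [0.0, 0.0, 1.0],  # "bile"
}

print(rank_candidates("żułw", TOY_VECTORS))  # → ['żółw', 'żuł', 'żółć']
```

In this toy example, "żółw" and "żuł" are both one edit away from "żułw", so edit distance alone cannot separate them; the cosine term breaks the tie. Because each score is a sum of two readable components, the ranking stays easy to inspect, which fits the suggestion to use this combination for interpretable systems.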
