Manually Annotated Spelling Error Corpus for Amharic

This paper presents a manually annotated spelling error corpus for Amharic, the lingua franca of Ethiopia. The corpus is designed for evaluating spelling error detection and correction. Misspellings are tagged as either non-word or real-word errors. In addition, the contextual information available in the corpus makes it useful for handling both types of spelling errors.


