UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

03/31/2021
by   Oleksiy Syvokon, et al.
0

We present a corpus professionally annotated for grammatical error correction (GEC) and fluency edits in the Ukrainian language. To the best of our knowledge, this is the first GEC corpus for the Ukrainian language. We collected texts with errors (20,715 sentences) from a diverse pool of contributors, including both native and non-native speakers. The data cover a wide variety of writing domains, from text chats and essays to formal writing. Professional proofreaders corrected and annotated the corpus for errors relating to fluency, grammar, punctuation, and spelling. This corpus can be used for developing and evaluating GEC systems in Ukrainian. More generally, it can be used for researching multilingual and low-resource NLP, morphologically rich languages, document-level GEC, and fluency correction. The corpus is publicly available at https://github.com/grammarly/ua-gec

READ FULL TEXT

page 1

page 2

page 3

page 4

research
07/04/2023

A Language Model for Grammatical Error Correction in L2 Russian

Grammatical error correction is one of the fundamental tasks in Natural ...
research
02/14/2017

JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG...
research
11/28/2019

GitHub Typo Corpus: A Large-Scale Multilingual Dataset of Misspellings and Grammatical Errors

The lack of large-scale datasets has been a major hindrance to the devel...
research
06/25/2021

Manually Annotated Spelling Error Corpus for Amharic

This paper presents a manually annotated spelling error corpus for Amhar...
research
09/20/2023

GECTurk: Grammatical Error Correction and Detection Dataset for Turkish

Grammatical Error Detection and Correction (GEC) tools have proven usefu...
research
06/04/2020

Personalizing Grammatical Error Correction: Adaptation to Proficiency Level and L1

Grammar error correction (GEC) systems have become ubiquitous in a varie...
research
05/23/2022

Towards Automated Document Revision: Grammatical Error Correction, Fluency Edits, and Beyond

Natural language processing technology has rapidly improved automated gr...

Please sign up or login with your details

Forgot password? Click here to reset