Character Transformations for Non-Autoregressive GEC Tagging

11/17/2021
by Milan Straka, et al.
We propose a character-based non-autoregressive approach to grammatical error correction (GEC) that uses automatically generated character transformations. Recently, per-word classification of correction edits has proven to be an efficient, parallelizable alternative to current encoder-decoder GEC systems. We show that word replacement edits may be suboptimal and lead to an explosion of rules for spelling, diacritization and errors in morphologically rich languages, and we propose a method for generating character transformations from a GEC corpus. Finally, we train character transformation models for Czech, German and Russian, reaching solid results and a dramatic speedup compared to autoregressive systems. The source code is released at https://github.com/ufal/wnut2021_character_transformations_gec.
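
To make the contrast between word replacement edits and character transformations concrete, here is a minimal sketch. It is not the released implementation and does not follow the paper's actual transformation format: it assumes Python's standard `difflib` for the alignment, and the names `char_transformation` and `apply_transformation` are hypothetical. Edit positions are counted from the end of the word, so a single rule can apply to different words that share the same kind of error.

```python
from difflib import SequenceMatcher


def char_transformation(source: str, target: str) -> tuple:
    """Derive a hashable list of character edits turning source into target.

    Illustrative only: each edit is (tag, start, end, replacement), with
    start/end given as offsets from the END of the source word (<= 0).
    """
    ops = []
    matcher = SequenceMatcher(a=source, b=target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            ops.append((tag, i1 - len(source), i2 - len(source), target[j1:j2]))
    return tuple(ops)


def apply_transformation(word: str, ops: tuple) -> str:
    """Apply edits produced by char_transformation to a (possibly different) word."""
    n = len(word)
    # Work right to left so that earlier (left-hand) positions stay valid.
    for _tag, start, end, replacement in sorted(ops, key=lambda o: o[1], reverse=True):
        s, e = n + start, n + end
        if s < 0:
            raise ValueError("word is too short for this transformation")
        word = word[:s] + replacement + word[e:]
    return word


# A word-level rule "dilema -> dilemma" only ever fixes that one word.
# The character-level rule learned from the same pair also fixes "coma".
rule = char_transformation("dilema", "dilemma")
print(apply_transformation("coma", rule))
```

In this toy setting, a tagger that predicts one such rule per word needs far fewer output classes than one that predicts a full replacement word, which is the rule explosion the abstract refers to.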

research
03/15/2017

Character-based Neural Embeddings for Tweet Clustering

In this paper we show how the performance of tweet clustering can be imp...
research
08/28/2018

Explaining Character-Aware Neural Networks for Word-Level Prediction: Do They Discover Linguistic Rules?

Character-level features are currently used in different neural network-...
research
09/13/2021

Post-OCR Document Correction with large Ensembles of Character Sequence Models

In this paper, we propose a novel method based on character sequence-to-...
research
09/20/2023

Large Synthetic Data from the arXiv for OCR Post Correction of Historic Scientific Articles

Scientific articles published prior to the "age of digitization" (~1997)...
research
09/08/2021

Highly Parallel Autoregressive Entity Linking with Discriminative Correction

Generative approaches have been recently shown to be effective for both ...
research
05/27/2022

Patching Leaks in the Charformer for Efficient Character-Level Generation

Character-based representations have important advantages over subword-b...
research
05/10/2023

Acceleration of FM-index Queries Through Prefix-free Parsing

FM-indexes are a crucial data structure in DNA alignment, for example, b...