Text Simplification by Tagging

03/08/2021
by Kostiantyn Omelianchuk, et al.

Edit-based approaches have recently shown promising results on multiple monolingual sequence transduction tasks. Unlike conventional sequence-to-sequence (Seq2Seq) models, which learn to generate text from scratch from parallel corpora, these methods learn to make fast and accurate transformations while leveraging powerful pre-trained language models. Inspired by these ideas, we present TST, a simple and efficient Text Simplification system based on sequence Tagging that leverages pre-trained Transformer-based encoders. Our system applies simple data augmentations and training and inference tweaks to a pre-existing system, making it less reliant on large amounts of parallel training data, providing more control over the outputs, and enabling faster inference. Our best model achieves near state-of-the-art performance on benchmark test datasets for the task. Because it is fully non-autoregressive, it achieves inference speeds over 11 times faster than the current state-of-the-art text simplification system.
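To illustrate the general idea behind tag-based editing (a sketch of the paradigm, not the authors' actual code or tag vocabulary): instead of generating the simplified sentence token by token, a tagger predicts one edit tag per source token, and the output is assembled in a single non-autoregressive pass. The tag names below ($KEEP, $DELETE, $REPLACE_<word>) are assumptions for illustration.

```python
def apply_tags(tokens, tags):
    """Apply per-token edit tags ($KEEP, $DELETE, $REPLACE_<word>)
    to a source sentence in one non-autoregressive pass.
    Illustrative only; the tag set is an assumption, not TST's exact vocabulary."""
    out = []
    for token, tag in zip(tokens, tags):
        if tag == "$KEEP":
            out.append(token)          # copy the source token unchanged
        elif tag == "$DELETE":
            continue                   # drop the source token
        elif tag.startswith("$REPLACE_"):
            out.append(tag[len("$REPLACE_"):])  # substitute a simpler word

    return out

# Hypothetical simplification: replace harder words with simpler ones.
tokens = ["The", "physician", "administered", "the", "medication", "."]
tags = ["$KEEP", "$REPLACE_doctor", "$REPLACE_gave", "$KEEP",
        "$REPLACE_medicine", "$KEEP"]
print(" ".join(apply_tags(tokens, tags)))
# -> The doctor gave the medicine .
```

Because every tag is predicted independently of the others, the whole output can be decoded in parallel, which is where the large inference speedup over autoregressive Seq2Seq decoding comes from.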


Related research

- 03/24/2020 · Felix: Flexible Text Editing Through Tagging and Insertion
- 10/21/2021 · Improving Non-autoregressive Generation with Mixup Training
- 06/11/2022 · An Evaluation of OCR on Egocentric Data
- 05/26/2020 · GECToR – Grammatical Error Correction: Tag, Not Rewrite
- 04/15/2022 · Text Revision by On-the-Fly Representation Optimization
- 03/24/2022 · Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
- 03/16/2023 · SemDeDup: Data-efficient learning at web-scale through semantic deduplication
