Neural CRF Model for Sentence Alignment in Text Simplification

05/05/2020
by Chao Jiang, et al.

The success of a text simplification system depends heavily on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all previous work on the monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of higher quality than existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state of the art for text simplification in both automatic and human evaluation.
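The combination described above (a per-pair similarity score plus a preference for alignments that follow document order) can be illustrated with a small linear-chain decoding sketch. This is not the authors' model. It assumes a toy word-overlap similarity in place of the neural sentence pair model, and hand-set values for `null_score` and `jump_penalty` in place of learned CRF parameters. The function names `jaccard` and `viterbi_align` are hypothetical.

```python
from typing import List, Optional


def jaccard(a: str, b: str) -> float:
    """Toy stand-in for a neural sentence-pair model: word-overlap similarity."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if (wa | wb) else 0.0


def viterbi_align(
    simple: List[str],
    complex_sents: List[str],
    null_score: float = 0.2,    # assumed score for leaving a sentence unaligned
    jump_penalty: float = 0.1,  # assumed cost per sentence of positional jump
) -> List[Optional[int]]:
    """Label each simple sentence with a complex-sentence index or None."""
    if not simple:
        return []
    labels: List[Optional[int]] = list(range(len(complex_sents))) + [None]

    def emit(i: int, lab: Optional[int]) -> float:
        # Semantic similarity term (or a flat score for "unaligned").
        return null_score if lab is None else jaccard(simple[i], complex_sents[lab])

    def trans(prev: Optional[int], cur: Optional[int]) -> float:
        # Sequential term: penalize large jumps between consecutive alignments.
        if prev is None or cur is None:
            return 0.0
        return -jump_penalty * abs(cur - prev)

    best = [emit(0, lab) for lab in labels]
    back: List[List[int]] = []
    for i in range(1, len(simple)):
        new_best, pointers = [], []
        for cur in labels:
            scores = [best[k] + trans(prev, cur) for k, prev in enumerate(labels)]
            k_best = max(range(len(labels)), key=scores.__getitem__)
            new_best.append(scores[k_best] + emit(i, cur))
            pointers.append(k_best)
        best = new_best
        back.append(pointers)

    # Trace back the highest-scoring label sequence.
    k = max(range(len(labels)), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return [labels[k] for k in path]


complex_doc = [
    "the cat sat on the mat",
    "dogs bark loudly at night",
    "rain fell all day",
]
simple_doc = ["the cat sat", "dogs bark at night", "totally unrelated words here"]
print(viterbi_align(simple_doc, complex_doc))  # [0, 1, None]
```

In this sketch the third simple sentence shares no words with any complex sentence, so the flat `null_score` wins and it is left unaligned. A trained model would learn both the similarity and the transition scores from annotated alignments rather than fixing them by hand.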


