Vicinity-Driven Paragraph and Sentence Alignment for Comparable Corpora

Parallel corpora have driven great progress in the field of Text Simplification. However, most sentence alignment algorithms either support only a limited range of alignment types or ignore valuable clues present in comparable documents. We address this problem by introducing a new set of flexible vicinity-driven paragraph and sentence alignment algorithms that support 1-N, N-1, N-N and long-distance null alignments without the need for hard-to-replicate supervised models.
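
The abstract does not spell out the algorithms, so the following is only an illustrative sketch of what a vicinity-driven alignment could look like, not the authors' method. It assumes bag-of-words cosine similarity, a fixed similarity threshold, and a greedy walk over the similarity matrix that looks at the three neighbouring cells before jumping ahead. The names `cosine`, `next_anchor`, `vicinity_align` and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def cosine(a: str, b: str) -> float:
    """Cosine similarity between bag-of-words vectors (assumed similarity measure)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def next_anchor(sim, i, j, threshold):
    """Closest cell at or after (i, j) whose similarity passes the threshold."""
    best = None
    for x in range(i, len(sim)):
        for y in range(j, len(sim[x])):
            if sim[x][y] >= threshold:
                dist = (x - i) + (y - j)
                if best is None or dist < best[0]:
                    best = (dist, x, y)
    return None if best is None else (best[1], best[2])


def vicinity_align(src, tgt, threshold=0.3):
    """Greedy walk over the similarity matrix; returns the aligned (src, tgt) index pairs."""
    sim = [[cosine(s, t) for t in tgt] for s in src]
    path = []
    cell = next_anchor(sim, 0, 0, threshold)
    while cell is not None:
        path.append(cell)
        i, j = cell
        # Vicinity of the current cell: diagonal (1-1), down (N-1), right (1-N).
        neighbours = [(i + 1, j + 1), (i + 1, j), (i, j + 1)]
        good = [(x, y) for x, y in neighbours
                if x < len(src) and y < len(tgt) and sim[x][y] >= threshold]
        if good:
            cell = max(good, key=lambda c: sim[c[0]][c[1]])
        else:
            # Nothing similar nearby: skip ahead, leaving the skipped
            # sentences unaligned (null alignments).
            cell = next_anchor(sim, i + 1, j + 1, threshold)
    return path


src = ["cats chase mice", "rain fell all night", "dogs bark loudly and dogs run fast"]
tgt = ["cats chase mice", "dogs bark loudly", "dogs run fast"]
print(vicinity_align(src, tgt))  # [(0, 0), (2, 1), (2, 2)]
```

In this toy input, source sentence 0 is aligned 1-1, source sentence 1 has no counterpart and is left as a null alignment, and source sentence 2 is aligned 1-N to target sentences 1 and 2. No trained model is involved, only the similarity scores and the threshold.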


