Improve Sentence Alignment by Divide-and-conquer

01/18/2022
by Wu Zhang, et al.

In this paper, we introduce a divide-and-conquer algorithm to speed up sentence alignment. We use external bilingual sentence embeddings to find accurate hard delimiters in the parallel texts to be aligned. Using Monte Carlo simulation, we show experimentally that this divide-and-conquer algorithm turns any sentence alignment algorithm with quadratic time complexity into one with an average time complexity of O(N log N). On a standard OCR-generated dataset, our method improves on the Bleualign baseline by 3 F1 points. In addition, when computational resources are restricted, our algorithm is faster than Vecalign in practice.
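To make the idea concrete, here is a minimal Python sketch of divide-and-conquer alignment around embedding-based hard delimiters. It is an illustration under stated assumptions, not the authors' implementation: the function names (`dp_align`, `divide_and_align`), the rule of anchoring on the middle source sentence, and the `threshold` and `min_size` parameters are all hypothetical, and the toy dynamic-programming aligner merely stands in for any quadratic-time baseline. Each split compares one source sentence against the current target segment, so when the delimiters fall roughly in the middle the total work follows the familiar recurrence T(N) = 2T(N/2) + O(N).

```python
import numpy as np


def dp_align(src_vecs, tgt_vecs):
    """Toy quadratic baseline (assumption): monotonic 1-1 alignment
    that maximizes the summed embedding similarity."""
    n, m = len(src_vecs), len(tgt_vecs)
    sims = src_vecs @ tgt_vecs.T
    dp = np.zeros((n + 1, m + 1))
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i, j] = max(dp[i - 1, j], dp[i, j - 1],
                           dp[i - 1, j - 1] + sims[i - 1, j - 1])
    # Backtrace, keeping only pairs with positive similarity.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        s = sims[i - 1, j - 1]
        if s > 0 and dp[i, j] == dp[i - 1, j - 1] + s:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i, j] == dp[i - 1, j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]


def divide_and_align(src_vecs, tgt_vecs, base_aligner=dp_align,
                     min_size=2, threshold=0.9):
    """Split both texts at a confident sentence pair (a hard delimiter),
    recurse on each side, and fall back to base_aligner on small or
    ambiguous segments. min_size and threshold are illustrative values."""

    def solve(s_lo, s_hi, t_lo, t_hi):
        if s_lo >= s_hi or t_lo >= t_hi:
            return []
        if s_hi - s_lo > min_size:
            mid = (s_lo + s_hi) // 2
            # Linear-time search: one source sentence vs. the target segment.
            sims = tgt_vecs[t_lo:t_hi] @ src_vecs[mid]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                j = t_lo + best
                return (solve(s_lo, mid, t_lo, j) + [(mid, j)]
                        + solve(mid + 1, s_hi, j + 1, t_hi))
        # No confident delimiter, or segment already small: use the baseline.
        local = base_aligner(src_vecs[s_lo:s_hi], tgt_vecs[t_lo:t_hi])
        return [(s_lo + a, t_lo + b) for a, b in local]

    return solve(0, len(src_vecs), 0, len(tgt_vecs))


# Toy data: one-hot "embeddings", with an unrelated sentence inserted
# into the target text at position 2.
src = np.eye(6)
tgt = np.insert(np.eye(6), 2, 0.0, axis=0)
print(divide_and_align(src, tgt))
```

In this sketch, a segment without a sufficiently confident delimiter is handed to the baseline aligner as a whole, so the worst case remains quadratic; only the average behavior improves, which is consistent with the abstract's claim about average time complexity.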

research
08/28/2020

Fast and Work-Optimal Parallel Algorithms for Predicate Detection

Recently, the predicate detection problem was shown to be in the paralle...
research
01/29/2019

Divide and Generate: Neural Generation of Complex Sentences

We propose a task to generate a complex sentence from a simple sentence ...
research
12/13/2016

Vicinity-Driven Paragraph and Sentence Alignment for Comparable Corpora

Parallel corpora have driven great progress in the field of Text Simplif...
research
06/05/2019

Learning Bilingual Sentence Embeddings via Autoencoding and Computing Similarities with a Multilayer Perceptron

We propose a novel model architecture and training algorithm to learn bi...
research
09/01/2023

A Massively Parallel Dynamic Programming for Approximate Rectangle Escape Problem

Sublinear time complexity is required by the massively parallel computat...
research
08/04/2020

Exact, Parallelizable Dynamic Time Warping Alignment with Linear Memory

Audio alignment is a fundamental preprocessing step in many MIR pipeline...
research
01/15/2001

Multiple-Size Divide-and-Conquer Recurrences

This short note reports a master theorem on tight asymptotic solutions t...