Scalable String Reconciliation by Recursive Content-Dependent Shingling

10/01/2019
by Bowen Song, et al.

We consider the problem of reconciling similar, but remote, strings with minimum communication complexity. This "string reconciliation" problem is a fundamental building block for a variety of networking applications, including those that maintain large-scale distributed networks and perform remote file synchronization. We present the novel Recursive Content-Dependent Shingling (RCDS) protocol, which is computationally practical for large strings and scales linearly with the edit distance between the remote strings. We compare its performance against Rsync, one of the most popular file synchronization tools in active use. Our experiments show that, with minimal engineering, RCDS outperforms the heavily optimized Rsync in reconciling release revisions for about 51% of the GitHub repositories tested. The improvement is particularly evident for repositories that see frequent, but small, updates.


