On Estimating Edit Distance: Alignment, Dimension Reduction, and Embeddings

04/26/2018
by   Moses Charikar, et al.

Edit distance is a fundamental measure of distance between strings and has been widely studied in computer science. While the problem of estimating edit distance has been studied extensively, the equally important question of actually producing an alignment (i.e., the sequence of edits) has received far less attention. Somewhat surprisingly, we show that any algorithm to estimate edit distance can be used in a black-box fashion to produce an approximate alignment of strings, with modest loss in approximation factor and small loss in run time. Plugging in the result of Andoni, Krauthgamer, and Onak, we obtain an alignment that is a (log n)^{O(1/ε^2)} approximation in time Õ(n^{1+ε}). Closely related to the study of approximation algorithms is the study of metric embeddings for edit distance. We show that min-hash techniques can be useful in designing edit distance embeddings through three results: (1) An embedding from Ulam distance (edit distance over permutations) to Hamming space that matches the best known distortion of O(log n) and also implicitly encodes a sequence of edits between the strings; (2) In the case where the edit distance between the input strings is known to have an upper bound K, we show that embeddings of edit distance into Hamming space with distortion f(n) can be modified in a black-box fashion to give distortion O(f(poly(K))) for a class of periodic-free strings; (3) A randomized dimension-reduction map with contraction c and asymptotically optimal expected distortion O(c), improving on the previous Õ(c^{1 + 2/log log log n}) distortion result of Batu, Ergun, and Sahinalp.
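To make the distinction between estimating edit distance and producing an alignment concrete, here is a minimal sketch of the textbook quadratic-time dynamic program, extended with a traceback that recovers one optimal sequence of edits. This is the exact baseline, not the black-box reduction or any embedding from the paper; the function name `edit_alignment` and the operation labels are illustrative assumptions.

```python
from typing import List, Tuple

Op = Tuple[str, str, str]  # (operation, char from a, char from b)


def edit_alignment(a: str, b: str) -> Tuple[int, List[Op]]:
    """Return (edit distance, one optimal alignment) using the classic O(nm) DP."""
    n, m = len(a), len(b)
    # dist[i][j] = edit distance between the prefixes a[:i] and b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i  # delete all i characters of a[:i]
    for j in range(m + 1):
        dist[0][j] = j  # insert all j characters of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete a[i-1]
                dist[i][j - 1] + 1,         # insert b[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from (n, m) to (0, 0), recording which choice was optimal.
    ops: List[Op] = []
    i, j = n, m
    while i > 0 or j > 0:
        same = i > 0 and j > 0 and a[i - 1] == b[j - 1]
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (0 if same else 1):
            ops.append(("match" if same else "substitute", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("delete", a[i - 1], ""))
            i -= 1
        else:
            ops.append(("insert", "", b[j - 1]))
            j -= 1
    ops.reverse()
    return dist[n][m], ops


if __name__ == "__main__":
    distance, alignment = edit_alignment("kitten", "sitting")
    print(distance)
    for op in alignment:
        print(op)
```

The distance is the single number in the bottom-right cell of the table, while the alignment is the path through the table that achieves it. An estimator returns only an approximation of that number, which is why recovering an approximate alignment from black-box estimates is a separate question.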

research
07/24/2020

Sublinear-Time Algorithms for Computing Embedding Gap Edit Distance

In this paper, we design new sublinear-time algorithms for solving the g...
research
04/10/2019

Reducing approximate Longest Common Subsequence to approximate Edit Distance

Given a pair of strings, the problems of computing their Longest Common ...
research
04/11/2018

Approximating Edit Distance in Truly Subquadratic Time: Quantum and MapReduce

The edit distance between two strings is defined as the smallest number ...
research
07/14/2021

Levenshtein Graphs: Resolvability, Automorphisms Determining Sets

We introduce the notion of Levenshtein graphs, an analog to Hamming grap...
research
03/11/2021

Imagined-Trailing-Whitespace-Agnostic Levenshtein Distance For Plaintext Table Detection

The standard algorithm for Levenshtein distance, treats trailing whitesp...
research
11/12/2017

Longest Alignment with Edits in Data Streams

Analyzing patterns in data streams generated by network traffic, sensor ...
research
10/25/2020

An Improved Sketching Bound for Edit Distance

We provide improved upper bounds for the simultaneous sketching complexi...
