Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance
In this paper, we design new sublinear-time algorithms for solving the gap edit distance problem and for embedding edit distance to Hamming distance. For the gap edit distance problem, we give a greedy algorithm that distinguishes in time Õ(n/k + k^2) between length-n input strings with edit distance at most k and those with edit distance more than 4k^2. This improves upon and simplifies the main result of [Goldenberg, Krauthgamer, Saha, FOCS 2019], where the k vs Θ(k^2) gap edit distance problem is solved in Õ(n/k + k^3) time. We further generalize our result to solve the k vs αk gap edit distance problem in time Õ(n/α + k^2 + (k/α)·√(nk)), strictly improving upon the previously known bound Õ(n/α + k^3). Finally, we show that if the input strings do not have long highly periodic substrings, then the gap edit distance problem can be solved in sublinear time within any factor α > 1. We further give the first sublinear-time algorithm for the probabilistic embedding of edit distance to Hamming distance. Our Õ(n/p)-time procedure yields an embedding with distortion kp, where k is the edit distance of the original strings. Specifically, the Hamming distance of the resultant strings is between k/p and k^2 with good probability. This generalizes the linear-time embedding of [Chakraborty, Goldenberg, Koucký, STOC 2016], where the resultant Hamming distance is between k and k^2. Our algorithm is based on a random walk over samples, which we believe will find other applications in sublinear-time algorithms.
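To make the random-walk idea concrete, the following minimal Python sketch illustrates the style of the linear-time embedding cited above, not the sublinear-time procedure of this paper. It assumes the commonly described formulation in which a pointer walks over the input and, at each step, a shared random bit depending on the step and the current character decides whether the pointer advances. The names `cgk_embed` and `hamming`, the padding symbol, and the explicit `length` parameter are illustrative choices, not taken from the source.

```python
import random


def cgk_embed(s, alphabet, length, seed, pad="$"):
    """Random-walk embedding sketch (assumed formulation, see lead-in).

    At every step the current character is copied to the output, and a
    random bit chosen per (step, character) decides whether the input
    pointer advances. Using the same seed for two strings gives them the
    same random bits, which is what couples their walks.
    """
    rng = random.Random(seed)
    symbols = sorted(alphabet)  # fixed order so the bits are reproducible
    out = []
    i = 0
    for _ in range(length):
        # One fresh bit per alphabet symbol for this step (shared randomness).
        bits = {c: rng.getrandbits(1) for c in symbols}
        if i < len(s):
            out.append(s[i])
            i += bits[s[i]]
        else:
            out.append(pad)  # input exhausted: pad the rest
    return "".join(out)


def hamming(u, v):
    """Hamming distance between two equal-length strings."""
    if len(u) != len(v):
        raise ValueError("strings must have equal length")
    return sum(a != b for a, b in zip(u, v))


if __name__ == "__main__":
    x = "abracadabra"
    y = "abracdabrra"  # a few edits away from x
    alphabet = set(x) | set(y)
    length = 3 * max(len(x), len(y))
    ex = cgk_embed(x, alphabet, length, seed=7)
    ey = cgk_embed(y, alphabet, length, seed=7)
    print(hamming(ex, ey))
```

Both strings must be embedded with the same seed and the same output length; otherwise the walks are independent and the Hamming distance of the outputs carries no information about the edit distance. The sketch reads every character its pointer visits, so it runs in linear time; the procedure described in the abstract instead walks over samples to reach Õ(n/p) time.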