Soft edit distance for differentiable comparison of symbolic sequences

04/29/2019
by Evgenii Ofitserov, et al.

Edit distance, also known as Levenshtein distance, is an essential way to compare two strings that has proved particularly useful in the analysis of genetic sequences and in natural language processing. However, edit distance is a discrete function that is known to be hard to optimize, which hampers its use in machine learning. Even an algorithm as simple as K-means fails to cluster a set of sequences using edit distance if they are of variable length and abundance. In this paper we propose a novel metric, soft edit distance (SED), which is a smooth approximation of edit distance. It is differentiable, and it can therefore be optimized with gradient methods. As with the original edit distance, SED and its derivatives can be computed with recurrent formulas in polynomial time. We demonstrate the usefulness of the proposed metric on synthetic datasets and on the clustering of biological sequences.
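
To make the contrast between a discrete and a smooth recurrence concrete, here is a minimal Python sketch. The first function is the standard Levenshtein dynamic program. The second replaces the hard `min` with a log-sum-exp soft minimum, a common smoothing trick; this is an illustrative assumption and not necessarily the SED definition used in the paper. The names `soft_min`, `soft_edit_distance_sketch`, and the temperature parameter `tau` are hypothetical.

```python
import math


def edit_distance(a, b):
    """Classic Levenshtein distance via the standard DP recurrence."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def soft_min(values, tau):
    """Smooth minimum: -tau * log(sum(exp(-v / tau))).

    Never exceeds min(values) and approaches it as tau -> 0.
    The minimum is subtracted first for numerical stability.
    """
    m = min(values)
    return m - tau * math.log(sum(math.exp(-(v - m) / tau) for v in values))


def soft_edit_distance_sketch(a, b, tau=0.1):
    """Same recurrence as edit_distance, with min replaced by soft_min.

    Illustrative relaxation only (an assumption, not the paper's formula).
    """
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            cost = 0.0 if ca == cb else 1.0
            curr.append(soft_min([prev[j] + 1.0,
                                  curr[j - 1] + 1.0,
                                  prev[j - 1] + cost], tau))
        prev = curr
    return prev[-1]
```

Because the soft minimum never exceeds the hard minimum, the relaxed value is a lower bound on the true edit distance and tightens as `tau` shrinks. The sketch uses plain floats; obtaining gradients in practice would require writing the same recurrence in an automatic-differentiation framework.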

research
12/06/2013

Towards Normalizing the Edit Distance Using a Genetic Algorithms Based Scheme

The normalized edit distance is one of the distances derived from the ed...
research
04/21/2022

Time Window Frechet and Metric-Based Edit Distance for Passively Collected Trajectories

The advances of modern localization techniques and the wide spread of mo...
research
01/25/2018

An Integrated Soft Computing Approach to a Multi-biometric Security Model

The abstract of the thesis consists of three sections, videlicet, Moti...
research
03/20/2023

On the Maximal Independent Sets of k-mers with the Edit Distance

In computational biology, k-mers and edit distance are fundamental conce...
research
12/04/2022

Clustering Permutations: New Techniques with Streaming Applications

We study the classical metric k-median clustering problem over a set of ...
research
07/26/2022

Tree edit distance for hierarchical data compatible with HMIL paradigm

We define edit distance for hierarchically structured data compatible wi...
research
08/16/2020

Discovering Lexical Similarity Through Articulatory Feature-based Phonetic Edit Distance

Lexical Similarity (LS) between two languages uncovers many interesting ...