
Deep Squared Euclidean Approximation to the Levenshtein Distance for DNA Storage

by Alan J. X. Guo, et al.

Storing information in DNA molecules is of great interest because of its advantages in longevity, high storage density, and low maintenance cost. A key step in the DNA storage pipeline is to efficiently cluster the retrieved DNA sequences according to their similarities. The Levenshtein distance is the most suitable metric for the similarity between two DNA sequences, but it is computationally expensive and less compatible with mature clustering algorithms. In this work, we propose a novel deep squared Euclidean embedding for DNA sequences using a Siamese neural network, squared Euclidean embedding, and chi-squared regression. The Levenshtein distance is approximated by the squared Euclidean distance between the embedding vectors, which is fast to compute and works well with standard clustering algorithms. The proposed approach is analyzed theoretically and experimentally. The results show that the proposed embedding is efficient and robust.
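To make the two quantities in the abstract concrete, the following minimal Python sketch contrasts the exact Levenshtein distance, computed by the standard dynamic program, with a squared Euclidean distance between fixed-length vectors. The function `toy_embed` is a hypothetical stand-in used only for illustration: it maps a sequence to its base counts and is not the Siamese network described in the paper, so its distances should not be expected to track the Levenshtein distance closely.

```python
def levenshtein(a: str, b: str) -> int:
    """Exact edit distance via dynamic programming, O(len(a) * len(b)) time."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute or match
            ))
        prev = curr
    return prev[-1]


def squared_euclidean(u, v) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def toy_embed(seq: str):
    """Hypothetical placeholder embedding (NOT the paper's learned model):
    counts of A, C, G, T in the sequence."""
    return [float(seq.count(base)) for base in "ACGT"]


if __name__ == "__main__":
    s, t = "ACGT", "AGT"
    print(levenshtein(s, t))                                # exact edit distance
    print(squared_euclidean(toy_embed(s), toy_embed(t)))    # distance in embedding space
```

The point of the comparison is cost: the dynamic program is quadratic in sequence length for every pair, whereas once sequences are embedded, each pairwise distance is a single pass over two short vectors, a form that common clustering algorithms can consume directly.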


A Novel Method for Comparative Analysis of DNA Sequences by Ramanujan-Fourier Transform

Alignment-free sequence analysis approaches provide important alternativ...

Unaligned Sequence Similarity Search Using Deep Learning

Gene annotation has traditionally required direct comparison of DNA sequ...

Robust Multi-Read Reconstruction from Contaminated Clusters Using Deep Neural Network for DNA Storage

DNA has immense potential as an emerging data storage medium. The princi...

Tackling Early Sparse Gradients in Softmax Activation Using Leaky Squared Euclidean Distance

Softmax activation is commonly used to output the probability distributi...

Deep DNA Storage: Scalable and Robust DNA Storage via Coding Theory and Deep Learning

The concept of DNA storage was first suggested in 1959 by Richard Feynma...

Efficiently Supporting Hierarchy and Data Updates in DNA Storage

We propose a novel and flexible DNA-storage architecture that provides t...

Image processing in DNA

The main obstacles for the practical deployment of DNA-based data storag...