Neural Distance Embeddings for Biological Sequences

09/20/2021
by Gabriele Corso, et al.

The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on continuous Euclidean spaces, have struggled with the discrete combinatorial formulation of the edit distance that models evolution and with the hierarchical relationships that characterise real-world datasets. We present Neural Distance Embeddings (NeuroSEED), a general framework to embed sequences in geometric vector spaces, and illustrate the effectiveness of the hyperbolic space, which captures the hierarchical structure and provides an average 22% reduction in embedding RMSE against the best competing geometry. The capacity of the framework and the significance of these improvements are then demonstrated by devising supervised and unsupervised NeuroSEED approaches to multiple core tasks in bioinformatics. Benchmarked against common baselines, the proposed approaches display significant accuracy and/or runtime improvements on real-world datasets. For example, in hierarchical clustering, the proposed pretrained and from-scratch methods match the quality of competing baselines with 30x and 15x runtime reductions, respectively.
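
To make the quantities in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It computes the Levenshtein edit distance between two sequences, the distance between two points in the Poincaré ball (one common model of hyperbolic space; the abstract does not say which model NeuroSEED uses, so this choice is an assumption), and the RMSE between embedding distances and true edit distances. The function names are hypothetical, and the learned encoder that would produce the embedding vectors is not shown.

```python
import math


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic programme."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def poincare_distance(u, v) -> float:
    """Distance in the Poincare ball (assumed model; points need norm < 1)."""
    sq_diff = sum((x - y) ** 2 for x, y in zip(u, v))
    sq_u = sum(x * x for x in u)
    sq_v = sum(y * y for y in v)
    return math.acosh(1 + 2 * sq_diff / ((1 - sq_u) * (1 - sq_v)))


def rmse(predicted, target) -> float:
    """Root mean squared error between predicted and true distances."""
    n = len(predicted)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / n)
```

In a setup of this kind, an encoder would map each sequence to a point in the ball, and training would reduce the error between `poincare_distance` on pairs of embeddings and `edit_distance` on the corresponding sequences, possibly after rescaling.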

research
10/29/2019

Hyperbolic Node Embedding for Signed Networks

The rapid evolving World Wide Web has produced a large amount of complex...
research
06/09/2018

Learning Continuous Hierarchies in the Lorentz Model of Hyperbolic Geometry

We are concerned with the discovery of hierarchical relationships from l...
research
04/03/2018

Hyperbolic Entailment Cones for Learning Hierarchical Embeddings

Learning graph representations via low-dimensional embeddings that prese...
research
05/30/2023

Hyperbolic Diffusion Embedding and Distance for Hierarchical Representation Learning

Finding meaningful representations and distances of hierarchical data is...
research
05/11/2021

Hermitian Symmetric Spaces for Graph Embeddings

Learning faithful graph representations as sets of vertex embeddings has...
research
02/03/2023

Unsupervised hierarchical clustering using the learning dynamics of RBMs

Datasets in the real world are often complex and to some degree hierarch...
research
05/02/2019

SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint

Domains such as scientific workflows and business processes exhibit data...
