Learned Indexing in Proteins: Extended Work on Substituting Complex Distance Calculations with Embedding and Clustering Techniques

08/18/2022
by Jaroslav Oľha, et al.

Despite the constant evolution of similarity search research, the field continues to face the same challenges stemming from the complexity of the data, such as the curse of dimensionality and computationally expensive distance functions. Various machine learning techniques have proven capable of replacing elaborate mathematical models with combinations of simple linear functions, often gaining speed and simplicity at the cost of formal guarantees of accuracy and correctness of querying. The authors explore the potential of this research trend by presenting a lightweight solution to the complex problem of 3D protein structure search. The solution consists of three steps: (i) transformation of 3D protein structural information into very compact vectors, (ii) use of a probabilistic model to group these vectors and respond to queries by returning a given number of similar objects, and (iii) a final filtering step that applies basic vector distance functions to refine the result.
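
The three-step pipeline summarized above can be illustrated with a small, self-contained Python sketch. It is not the authors' implementation, and every concrete choice in it is an assumption made for illustration: the embedding (a normalized histogram of pairwise C-alpha distances), the probabilistic model (k-means centroids treated as equal-weight isotropic Gaussians with a hand-picked shared variance `sigma`), and all names (`embed`, `SketchIndex`, `n_candidates`) are hypothetical stand-ins.

```python
import math
import random

def embed(coords, bins=8, max_dist=40.0):
    """Step (i), hypothetical embedding: a normalized histogram of
    pairwise C-alpha distances, giving a compact fixed-length vector."""
    hist = [0.0] * bins
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            d = math.dist(coords[i], coords[j])
            hist[min(int(d / max_dist * bins), bins - 1)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

def nearest(v, centroids):
    """Index of the centroid closest to v (lowest index wins ties)."""
    return min(range(len(centroids)), key=lambda c: math.dist(v, centroids[c]))

class SketchIndex:
    """Step (ii), assumed model: k-means centroids treated as isotropic
    Gaussians with equal priors and a shared variance sigma**2."""

    def __init__(self, vectors, k=4, iters=10, sigma=0.1, seed=0):
        self.vectors = vectors
        self.sigma = sigma
        rng = random.Random(seed)
        self.centroids = [list(v) for v in rng.sample(vectors, k)]
        for _ in range(iters):  # plain Lloyd (k-means) iterations
            groups = [[] for _ in range(k)]
            for v in vectors:
                groups[nearest(v, self.centroids)].append(v)
            for c, g in enumerate(groups):
                if g:  # keep the old centroid if a cluster becomes empty
                    self.centroids[c] = [sum(x) / len(g) for x in zip(*g)]
        # Buckets are built with the final centroids.
        self.buckets = [[] for _ in range(k)]
        for idx, v in enumerate(vectors):
            self.buckets[nearest(v, self.centroids)].append(idx)

    def bucket_probs(self, q):
        """Posterior probability of each bucket for query q (softmax of
        Gaussian log-likelihoods, shifted by the max for stability)."""
        logits = [-math.dist(q, c) ** 2 / (2 * self.sigma ** 2) for c in self.centroids]
        m = max(logits)
        w = [math.exp(l - m) for l in logits]
        s = sum(w)
        return [x / s for x in w]

    def query(self, q, n_candidates=10, k=3):
        probs = self.bucket_probs(q)
        order = sorted(range(len(probs)), key=lambda c: -probs[c])
        candidates = []
        for c in order:  # visit the most probable buckets first
            candidates.extend(self.buckets[c])
            if len(candidates) >= n_candidates:
                break
        # Step (iii): refine the candidates with plain Euclidean distance.
        scored = sorted((math.dist(q, self.vectors[i]), i) for i in candidates)
        return [(i, d) for d, i in scored[:k]]

# Usage on synthetic "structures" (random C-alpha coordinates, not real data).
data_rng = random.Random(42)
structures = [
    [tuple(data_rng.uniform(0, 30) for _ in range(3)) for _ in range(data_rng.randint(10, 25))]
    for _ in range(30)
]
vectors = [embed(s) for s in structures]
index = SketchIndex(vectors, k=4)
results = index.query(vectors[0], n_candidates=5, k=3)
print(results)
```

In this sketch, `n_candidates` plays the role of the "given number of similar objects" returned by step (ii): raising it makes the query visit more buckets, trading speed for recall before the cheap Euclidean refinement of step (iii).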

research
05/20/2022

Learning Geometrically Disentangled Representations of Protein Folding Simulations

Massive molecular simulations of drug-target proteins have been used as ...
research
10/26/2011

Structural Similarity and Distance in Learning

We propose a novel method of introducing structure into existing machine...
research
05/28/2020

A Practical Index Structure Supporting Fréchet Proximity Queries Among Trajectories

We present a scalable approach for range and k nearest neighbor queries ...
research
01/31/2019

The SuperM-Tree: Indexing metric spaces with sized objects

A common approach to implementing similarity search applications is the ...
research
01/06/2020

Macromolecule Classification Based on the Amino-acid Sequence

Deep learning is playing a vital role in every field which involves data...
research
08/31/2020

Complex-valued embeddings of generic proximity data

Proximities are at the heart of almost all machine learning methods. If ...
research
11/19/2019

PDBMine: A Reformulation of the Protein Data Bank to Facilitate Structural Data Mining

Large scale initiatives such as the Human Genome Project, Structural Gen...