DIPS-Plus: The Enhanced Database of Interacting Protein Structures for Interface Prediction

10/16/2021
by Alex Morehead, et al.

How and where proteins interface with one another can ultimately affect their functions, along with a range of other biological processes. As such, precise computational methods for protein interface prediction (PIP) are highly sought after, as they could yield significant advances in drug discovery and design as well as in protein function analysis. However, the traditional benchmark dataset for this task, Docking Benchmark 5 (DB5), contains only 230 complexes for training, validating, and testing different machine learning algorithms. In this work, we expand on a dataset recently introduced for this task, the Database of Interacting Protein Structures (DIPS), to present DIPS-Plus, an enhanced, feature-rich dataset of 42,112 complexes for geometric deep learning of protein interfaces. The previous version of DIPS contains only the Cartesian coordinates and types of the atoms comprising a given protein complex, whereas DIPS-Plus adds many new residue-level features, including protrusion indices, half-sphere amino acid compositions, and new profile hidden Markov model (HMM)-based sequence features for each amino acid, giving researchers a large, well-curated feature bank for training protein interface prediction methods. We demonstrate through rigorous benchmarks that training an existing state-of-the-art (SOTA) model for PIP on DIPS-Plus yields SOTA results, surpassing the performance of all other models trained on residue-level and atom-level encodings of protein complexes to date.
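To make one of the new residue-level features more concrete, here is a minimal sketch of a half-sphere amino acid composition in plain Python. It is a hypothetical illustration, not the DIPS-Plus implementation: the function name `half_sphere_composition`, the 13 Å neighbor radius, the use of Cα-to-Cα distances, and the per-residue direction vector (for example Cα to Cβ) that splits each residue's neighborhood into an upper and a lower half-sphere are all assumptions.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]

# The 20 standard amino acids, one-letter codes; defines the composition vector order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _normalize(counts: Counter) -> List[float]:
    """Turn neighbor counts into fractions over AMINO_ACIDS (all zeros if empty)."""
    total = sum(counts.values())
    return [counts[aa] / total if total else 0.0 for aa in AMINO_ACIDS]


def half_sphere_composition(
    residue_types: Sequence[str],
    ca_coords: Sequence[Vec],
    side_vectors: Sequence[Vec],
    radius: float = 13.0,  # assumed neighborhood radius in angstroms
) -> List[Tuple[List[float], List[float]]]:
    """Hypothetical sketch: for each residue i, return (upper, lower) amino acid
    composition vectors of the other residues whose CA lies within `radius` of CA_i.

    A neighbor j is counted in the upper half-sphere when the vector CA_i -> CA_j
    has a positive dot product with side_vectors[i] (e.g. the CA -> CB direction);
    otherwise it is counted in the lower half-sphere.
    """
    results = []
    for i, (ca_i, u_i) in enumerate(zip(ca_coords, side_vectors)):
        upper, lower = Counter(), Counter()
        for j, (aa_j, ca_j) in enumerate(zip(residue_types, ca_coords)):
            if i == j:
                continue
            d = _sub(ca_j, ca_i)
            if math.sqrt(_dot(d, d)) > radius:
                continue  # outside the sphere around residue i
            (upper if _dot(d, u_i) > 0 else lower)[aa_j] += 1
        results.append((_normalize(upper), _normalize(lower)))
    return results


if __name__ == "__main__":
    # Toy chain of three residues on the x axis, all "pointing" along +x.
    feats = half_sphere_composition(
        ["L", "A", "G"],
        [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (-5.0, 0.0, 0.0)],
        [(1.0, 0.0, 0.0)] * 3,
    )
    up, low = feats[0]
    print(up[AMINO_ACIDS.index("A")], low[AMINO_ACIDS.index("G")])
```

Concatenating the upper and lower vectors gives a 40-dimensional descriptor per residue in this sketch; splitting by a direction vector, rather than counting the whole sphere, is what lets the feature distinguish the side-chain-facing neighborhood from the backbone-facing one.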
