
Comparing two deep learning sequence-based models for protein-protein interaction prediction

by Florian Richoux, et al.

Biological data are extremely diverse and complex, but also quite sparse. Recent developments in deep learning offer new possibilities for analyzing complex data. However, it is easy to obtain a deep learning model that seems to perform well but is in fact overfitting either the training data or the validation data. In particular, overfitting the validation data, called "information leak", is almost never addressed in papers proposing deep learning models to predict protein-protein interactions (PPI). In this work, we compare two carefully designed deep learning models and show pitfalls to avoid when predicting PPIs through machine learning methods. Our best model accurately predicts more than 78% of human PPI, in very strict conditions both for training and testing. The methodology we propose here allows us to have strong confidence in the ability of a model to scale up to larger datasets. This would allow sharper models when larger datasets become available, rather than current models prone to information leaks. Our solid methodological foundations should be applicable to more organisms and to whole-proteome network predictions.
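The information leak described above arises when the same validation set is used both to tune a model (hyperparameters, early stopping) and to report its final score. A common safeguard is a third, held-out test split that is consulted only once, after all tuning is done. The following is a minimal sketch of such a split, not the authors' protocol: the function name `three_way_split`, the split fractions, and the toy protein pairs are all assumptions made for illustration.

```python
import random


def three_way_split(pairs, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle labelled pairs and cut them into train / validation / test.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)

    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)

    test = shuffled[:n_test]                # touched once, for the final score
    val = shuffled[n_test:n_test + n_val]   # used for tuning and early stopping
    train = shuffled[n_test + n_val:]       # used to fit the model weights
    return train, val, test


# Toy data (assumed): (protein_a, protein_b, interacts) triples.
pairs = [(f"P{i}", f"P{i + 1}", i % 2) for i in range(100)]
train, val, test = three_way_split(pairs)
```

Any score reported on `val` is optimistically biased because model choices were made against it; only the score on `test` estimates performance on unseen pairs. Note that this sketch splits by pair, so the same protein can still appear in several splits; stricter settings would also need to control for that.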



