Multimodal Pre-Training Model for Sequence-based Prediction of Protein-Protein Interaction

12/09/2021
by Yang Xue, et al.

Protein-protein interactions (PPIs) are essential for many biological processes, in which two or more proteins physically bind together to achieve their functions. Modeling PPIs is useful for many biomedical applications, such as vaccine design, antibody therapeutics, and peptide drug discovery. Pre-training a protein model to learn an effective representation is critical for PPIs. Most pre-training models for PPIs are sequence-based and naively apply language models from natural language processing to amino acid sequences. More advanced works use structure-aware pre-training, taking advantage of the contact maps of known protein structures. However, neither sequences nor contact maps fully characterize the structures and functions of proteins, which are closely related to the PPI problem. Motivated by this insight, we propose a multimodal protein pre-training model with three modalities: sequence, structure, and function (S2F). Notably, instead of using contact maps to learn amino-acid-level rigid structures, we encode the structure feature with the topology complex of point clouds of heavy atoms, which allows our model to learn structural information about not only the backbones but also the side chains. Moreover, our model incorporates knowledge from functional descriptions of proteins extracted from the literature or manual annotations. Our experiments show that S2F learns protein embeddings that achieve good performance on a variety of PPI tasks, including cross-species PPI prediction, antibody-antigen affinity prediction, antibody neutralization prediction for SARS-CoV-2, and mutation-driven binding affinity change prediction.
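To give a concrete flavor of the "topology complex of point clouds" idea: persistent homology over a Vietoris-Rips filtration tracks topological features of a point cloud as a distance threshold grows. The sketch below (illustrative only; it is not the paper's implementation, and the toy coordinates are made up) computes the 0-dimensional part, i.e. the distances at which connected components of a heavy-atom point cloud merge, using a Kruskal-style union-find:

```python
import itertools
import numpy as np

def zeroth_persistence(points):
    """Death times of 0-dim topological features (component merge
    distances) in a Vietoris-Rips filtration of a point cloud.
    Every component is born at filtration value 0; one never dies."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairwise distances in increasing order; each union of
    # two components kills one 0-dim feature at that distance.
    edges = sorted(
        (float(np.linalg.norm(points[i] - points[j])), i, j)
        for i, j in itertools.combinations(range(n), 2)
    )
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(d)
    return deaths  # n-1 merge distances

# Hypothetical "heavy atoms": two clusters ~10 apart along x.
cloud = np.array([[0.0, 0, 0], [1.5, 0, 0], [10.0, 0, 0], [11.5, 0, 0]])
print(zeroth_persistence(cloud))  # [1.5, 1.5, 8.5]
```

The resulting birth/death distances can be binned into a fixed-length vector (a persistence statistic) and fed to a model as a structure feature; unlike a residue-level contact map, the filtration is built over all heavy atoms, so side-chain geometry contributes as well.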

