Prompt-Guided Injection of Conformation to Pre-trained Protein Model

by Qiang Zhang, et al.

Pre-trained protein models (PTPMs) represent a protein with one fixed embedding and are therefore ill-suited to diverse tasks. For example, a protein's structure can shift between several conformations (protein folding) in various biological processes. To enable PTPMs to produce task-aware representations, we propose to learn interpretable, pluggable and extensible protein prompts as a way of injecting task-related knowledge into PTPMs. In this regard, prior PTPM optimization with the masked language modeling task can be interpreted as learning a sequence prompt (Seq prompt) that enables PTPMs to capture the sequential dependency between amino acids. To incorporate conformational knowledge into PTPMs, we propose an interaction-conformation prompt (IC prompt) that is learned through back-propagation on the protein-protein interaction task. As an instantiation, we present a conformation-aware pre-trained protein model that learns both the sequence and interaction-conformation prompts in a multi-task setting. We conduct comprehensive experiments on nine protein datasets. The results confirm our expectation that using the sequence prompt does not hurt PTPMs' performance on sequence-related tasks, while incorporating the interaction-conformation prompt significantly improves PTPMs' performance on tasks where conformational knowledge counts. We also show that the learned prompts can be combined and extended to deal with new, complex tasks.
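
To make the notion of a pluggable prompt concrete, the following is a minimal sketch, not the authors' implementation. It assumes that a prompt is realized as a special token placed in front of the amino-acid tokens, so that the same sequence can be presented to the model under different prompts or a combination of them. The token strings `[SEQ]` and `[IC]` and the helper `build_input` are hypothetical names introduced only for illustration.

```python
from typing import List

# Hypothetical prompt tokens (assumed names, not taken from the paper's code).
SEQ_PROMPT = "[SEQ]"  # stands in for the sequence prompt
IC_PROMPT = "[IC]"    # stands in for the interaction-conformation prompt
KNOWN_PROMPTS = {SEQ_PROMPT, IC_PROMPT}


def build_input(sequence: str, prompts: List[str]) -> List[str]:
    """Return the prompt tokens followed by one token per residue.

    Assumption: prompts are plugged in by prepending them to the residue
    tokens; a real model could inject them differently.
    """
    for prompt in prompts:
        if prompt not in KNOWN_PROMPTS:
            raise ValueError(f"unknown prompt token: {prompt}")
    # One token per amino-acid letter, prompts first.
    return list(prompts) + list(sequence)


# The same toy sequence under one prompt, and under both prompts combined.
ic_only = build_input("MKT", [IC_PROMPT])
combined = build_input("MKT", [SEQ_PROMPT, IC_PROMPT])
```

Under this assumption, switching tasks changes only the prompt tokens while the residue tokens and the underlying model stay the same, which is one way to read "pluggable" and "combined" in the abstract.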


