Structure-aware Protein Self-supervised Learning

04/06/2022
by Can Chen, et al.

Protein representation learning methods have shown great potential to yield useful representations for many downstream tasks, especially protein classification. Moreover, a few recent studies have shown great promise in addressing the scarcity of protein labels with self-supervised learning methods. However, existing protein language models are usually pretrained on protein sequences without considering the important protein structural information. To this end, we propose a novel structure-aware protein self-supervised learning method to effectively capture the structural information of proteins. In particular, a well-designed graph neural network (GNN) model is pretrained to preserve protein structural information with self-supervised tasks from a pairwise residue distance perspective and a dihedral angle perspective, respectively. Furthermore, we propose to leverage an available protein language model pretrained on protein sequences to enhance the self-supervised learning. Specifically, we identify the relation between the sequential information in the protein language model and the structural information in the specially designed GNN model via a novel pseudo bi-level optimization scheme. Experiments on several supervised downstream tasks verify the effectiveness of our proposed method.
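
As a rough illustration of the two structural signals named in the abstract, the sketch below computes a pairwise residue distance matrix and dihedral angles from 3D coordinates with NumPy. It is not the authors' implementation: the function names, the use of a single coordinate per residue (for example the Cα atom), and the pseudo-dihedral over four consecutive residues are all assumptions made for illustration. How such quantities are turned into self-supervised targets is not specified in the abstract and is left out here.

```python
import numpy as np


def pairwise_distances(coords):
    """Return the (N, N) Euclidean distance matrix for (N, 3) coordinates.

    Assumption: one representative point per residue (e.g. the C-alpha atom).
    """
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in radians defined by four points."""
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project b0 and b2 onto the plane perpendicular to the central bond b1.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return np.arctan2(y, x)


def pseudo_dihedrals(coords):
    """Dihedral angle for every run of four consecutive residues.

    Hypothetical helper: returns an array of length N - 3.
    """
    coords = np.asarray(coords, dtype=float)
    return np.array([
        dihedral(coords[i], coords[i + 1], coords[i + 2], coords[i + 3])
        for i in range(len(coords) - 3)
    ])
```

Both outputs are invariant to rotating or translating the whole structure, which is one reason distances and angles are a common way to describe protein geometry.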


research
01/23/2023

Self-Supervised Image Representation Learning: Transcending Masking with Paired Image Overlay

Self-supervised learning has become a popular approach in recent years f...
research
11/18/2022

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Despite being self-supervised, protein language models have shown remark...
research
05/07/2023

Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins

We report a flexible language-model based deep learning strategy, applie...
research
06/19/2019

Evaluating Protein Transfer Learning with TAPE

Protein modeling is an increasingly popular area of machine learning res...
research
10/20/2022

Towards Sustainable Self-supervised Learning

Although increasingly training-expensive, most self-supervised learning ...
research
09/30/2019

Spread-gram: A spreading-activation schema of network structural learning

Network representation learning has exploded recently. However, existing...
research
07/28/2022

HelixFold-Single: MSA-free Protein Structure Prediction by Using Protein Language Model as an Alternative

AI-based protein structure prediction pipelines, such as AlphaFold2, hav...