Protein Representation Learning by Geometric Structure Pretraining

03/11/2022
by   Zuobai Zhang, et al.

Learning effective protein representations is critical for a variety of tasks in biology, such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid sequences and then finetune the models with some labeled data in downstream tasks. Despite the effectiveness of sequence-based approaches, the power of pretraining on the smaller number of known protein structures has not been explored for protein property prediction, even though protein structures are known to be determinants of protein function. We first present a simple yet effective encoder to learn protein geometry features. We then pretrain this protein graph encoder using multiview contrastive learning and several self-prediction tasks. Experimental results on both function prediction and fold classification tasks show that our proposed pretraining methods outperform or are on par with state-of-the-art sequence-based methods while using much less data. All code and models will be published upon acceptance.
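
The abstract names multiview contrastive learning but does not give the objective. As a rough illustration only, the sketch below computes a generic InfoNCE-style loss over paired embeddings, where row i of each list is assumed to be a different view of the same protein. The function names, the cosine similarity, and the temperature value are assumptions made for this example and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(views_a, views_b, temperature=0.1):
    """Mean InfoNCE loss over a batch (hypothetical helper, not the paper's code).

    views_a[i] and views_b[i] are embeddings of two views of protein i;
    every views_b[j] with j != i is treated as a negative for views_a[i].
    """
    n = len(views_a)
    total = 0.0
    for i in range(n):
        # Similarity of anchor i to every candidate in the other view.
        logits = [cosine(views_a[i], views_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
        # Cross-entropy with the matching view as the correct class.
        total += log_norm - logits[i]
    return total / n


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(info_nce(a, aligned))  # small: matching views agree
    print(info_nce(a, swapped))  # large: matching views disagree
```

The loss is low when the two views of the same protein are closer to each other than to views of other proteins, which is the property a contrastive pretraining objective rewards. In practice the embeddings would come from the structure encoder rather than being fixed lists.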


research
02/24/2023

Retrieved Sequence Augmentation for Protein Representation Learning

Protein language models have excelled in a variety of tasks, ranging fro...
research
05/31/2022

Contrastive Representation Learning for 3D Protein Structures

Learning from 3D protein structures has gained wide interest in protein ...
research
07/26/2022

Learning Protein Representations via Complete 3D Graph Networks

We consider representation learning for proteins with 3D structures. We ...
research
06/30/2011

On Prediction Using Variable Order Markov Models

This paper is concerned with algorithms for prediction of discrete seque...
research
01/28/2023

Physics-Inspired Protein Encoder Pre-Training via Siamese Sequence-Structure Diffusion Trajectory Prediction

Pre-training methods on proteins are recently gaining interest, leveragi...
research
05/16/2021

Protein sequence-to-structure learning: Is this the end(-to-end revolution)?

The potential of deep learning has been recognized in the protein struct...
research
11/18/2022

Protein language model rescue mutations highlight variant effects and structure in clinically relevant genes

Despite being self-supervised, protein language models have shown remark...
