Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction

08/20/2022
by Jun Zhang, et al.

Data-driven predictive methods that can efficiently and accurately transform protein sequences into biologically active structures are highly valuable for scientific research and therapeutic development. Determining an accurate folding landscape using co-evolutionary information is fundamental to the success of modern protein structure prediction methods. As the state of the art, AlphaFold2 has dramatically raised accuracy without performing explicit co-evolutionary analysis. Nevertheless, its performance still depends strongly on the available sequence homologs. We investigated the cause of this dependence and present EvoGen, a meta generative model, to remedy the underperformance of AlphaFold2 on targets with poor multiple sequence alignments (MSAs). EvoGen allows us to manipulate the folding landscape either by denoising the searched MSA or by generating a virtual MSA, and helps AlphaFold2 fold accurately in the low-data regime or even achieve encouraging performance with single-sequence predictions. Being able to make accurate predictions with few-shot MSAs not only helps AlphaFold2 generalize better to orphan sequences, but also democratizes its use for high-throughput applications. In addition, EvoGen combined with AlphaFold2 yields a probabilistic structure generation method that could explore alternative conformations of protein sequences, and the task-aware differentiable algorithm for sequence generation will benefit other related tasks, including protein design.


research
06/02/2023

Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation

The field of protein folding research has been greatly advanced by deep ...
research
04/03/2022

Few Shot Protein Generation

We present the MSA-to-protein transformer, a generative model of protein...
research
05/11/2021

EBM-Fold: Fully-Differentiable Protein Folding Powered by Energy-based Models

Accurate protein structure prediction from amino-acid sequences is criti...
research
05/31/2023

AbODE: Ab Initio Antibody Design using Conjoined ODEs

Antibodies are Y-shaped proteins that neutralize pathogens and constitut...
research
06/24/2021

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Designing novel protein sequences for a desired 3D topological fold is a...
research
11/30/2022

xTrimoABFold: Improving Antibody Structure Prediction without Multiple Sequence Alignments

In the field of antibody engineering, an essential task is to design a n...
research
01/28/2023

Physics-Inspired Protein Encoder Pre-Training via Siamese Sequence-Structure Diffusion Trajectory Prediction

Pre-training methods on proteins are recently gaining interest, leveragi...
