AlphaFold Distillation for Improved Inverse Protein Folding

10/05/2022
by   Igor Melnyk, et al.
6

Inverse protein folding, i.e., designing sequences that fold into a given three-dimensional structure, is one of the fundamental design challenges in bio-engineering and drug discovery. Traditionally, inverse folding mainly involves learning from sequences that have an experimentally resolved structure. However, the known structures cover only a tiny space of the protein sequences, imposing limitations on the model learning. Recently proposed forward folding models, e.g., AlphaFold, offer unprecedented opportunity for accurate estimation of the structure given a protein sequence. Naturally, incorporating a forward folding model as a component of an inverse folding approach offers the potential of significantly improving the inverse folding, as the folding model can provide a feedback on any generated sequence in the form of the predicted protein structure or a structural confidence metric. However, at present, these forward folding models are still prohibitively slow to be a part of the model optimization loop during training. In this work, we propose to perform knowledge distillation on the folding model's confidence metrics, e.g., pTM or pLDDT scores, to obtain a smaller, faster and end-to-end differentiable distilled model, which then can be included as part of the structure consistency regularized inverse folding model training. Moreover, our regularization technique is general enough and can be applied in other design tasks, e.g., sequence-based protein infilling. Extensive experiments show a clear benefit of our method over the non-regularized baselines. For example, in inverse folding design problems we observe up to 3 recovery and up to 45 structural consistency of the generated sequences.

READ FULL TEXT

page 8

page 9

research
11/12/2021

Benchmarking deep generative models for diverse antibody sequence design

Computational protein design, i.e. inferring novel and diverse protein s...
research
06/24/2021

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Designing novel protein sequences for a desired 3D topological fold is a...
research
06/24/2022

PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction

Proteins are essential component of human life and their structures are ...
research
05/07/2023

Generative Pretrained Autoregressive Transformer Graph Neural Network applied to the Analysis and Discovery of Novel Proteins

We report a flexible language-model based deep learning strategy, applie...
research
05/31/2023

AbODE: Ab Initio Antibody Design using Conjoined ODEs

Antibodies are Y-shaped proteins that neutralize pathogens and constitut...
research
01/15/2022

StemP: A fast and deterministic Stem-graph approach for RNA and protein folding prediction

We propose a new deterministic methodology to predict RNA sequence and p...
research
05/16/2021

Protein sequence-to-structure learning: Is this the end(-to-end revolution)?

The potential of deep learning has been recognized in the protein struct...

Please sign up or login with your details

Forgot password? Click here to reset