Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

06/24/2021
by   Yue Cao, et al.

Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. The challenges stem from the complex sequence–fold relationship and from the difficulty of capturing the diversity of sequences (and therefore of structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence–structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution, and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design when compared to existing state-of-the-art methods, which include data-driven deep generative models and the physics-based RosettaDesign. The unique advantages of the fold-based Fold2Seq over a structure-based deep model and RosettaDesign become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.
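To make the phrase "density of secondary structural elements in 3D voxels" concrete, the following is a minimal sketch of one way such a fold representation could be computed. It is not taken from the Fold2Seq code base: the function name `sse_voxel_density`, the three-letter label set, the bounding-box normalization, and the grid size are all assumptions chosen for illustration, and the actual preprocessing in the paper may differ.

```python
import numpy as np

# Hypothetical label set (assumption): helix, strand, loop.
SSE_CLASSES = ("H", "E", "L")


def sse_voxel_density(coords, sse_labels, grid=4):
    """Return a (grid, grid, grid, n_classes) array holding, for each voxel,
    the fraction of residues of each secondary-structure class inside it."""
    coords = np.asarray(coords, dtype=float)
    # Rescale coordinates into the unit cube spanned by the bounding box.
    lo = coords.min(axis=0)
    span = np.maximum(coords.max(axis=0) - lo, 1e-9)  # avoid division by zero
    # Map each residue to a voxel index; clamp the upper boundary into the grid.
    idx = np.minimum(((coords - lo) / span * grid).astype(int), grid - 1)
    density = np.zeros((grid, grid, grid, len(SSE_CLASSES)))
    for (i, j, k), label in zip(idx, sse_labels):
        density[i, j, k, SSE_CLASSES.index(label)] += 1.0
    # Normalize so that the whole tensor sums to 1.
    return density / len(coords)


# Toy input: three residues with made-up coordinates and labels.
toy_coords = [[0.0, 0.0, 0.0], [10.0, 10.0, 10.0], [0.0, 0.0, 1.0]]
toy_labels = ["H", "E", "H"]
toy_density = sse_voxel_density(toy_coords, toy_labels, grid=2)
print(toy_density.shape)
```

A tensor of this kind could then be passed to an encoder to produce a fold embedding. The appeal of such a coarse representation, as the abstract suggests, is that it depends on where secondary structure elements sit in space rather than on exact backbone coordinates.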

