Few Shot Protein Generation

04/03/2022
by Soumya Ram, et al.

We present the MSA-to-protein transformer, a generative model of protein sequences conditioned on protein families represented by multiple sequence alignments (MSAs). Unlike existing approaches to learning generative models of protein families, the MSA-to-protein transformer conditions sequence generation directly on a learned encoding of the multiple sequence alignment, circumventing the need to fit a dedicated model for each family. By training on a large set of well-curated multiple sequence alignments in Pfam, our MSA-to-protein transformer generalizes well to protein families not observed during training and outperforms conventional family modeling approaches, especially when MSAs are small. Our generative approach accurately models epistasis and indels and, unlike other approaches, allows for exact inference and efficient sampling. We demonstrate the protein sequence modeling capabilities of our MSA-to-protein transformer and compare it with alternative sequence modeling approaches in comprehensive benchmark experiments.

research
06/09/2023

PoET: A generative model of protein families as sequences-of-sequences

Generative protein language models are a natural way to design new prote...
research
05/26/2022

Protein Structure and Sequence Generation with Equivariant Denoising Diffusion Probabilistic Models

Proteins are macromolecules that mediate a significant fraction of the c...
research
02/01/2022

Regression Transformer: Concurrent Conditional Generation and Regression by Blending Numerical and Textual Tokens

We report the Regression Transformer (RT), a method that abstracts regre...
research
05/31/2023

AbODE: Ab Initio Antibody Design using Conjoined ODEs

Antibodies are Y-shaped proteins that neutralize pathogens and constitut...
research
05/27/2022

Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval

The ability to accurately model the fitness landscape of protein sequenc...
research
05/11/2022

RITA: a Study on Scaling Up Generative Protein Sequence Models

In this work we introduce RITA: a suite of autoregressive generative mod...
research
08/20/2022

Few-Shot Learning of Accurate Folding Landscape for Protein Structure Prediction

Data-driven predictive methods which can efficiently and accurately tran...
