RITA: a Study on Scaling Up Generative Protein Sequence Models

05/11/2022
by Daniel Hesslow et al.

In this work we introduce RITA: a suite of autoregressive generative models for protein sequences, with up to 1.2 billion parameters, trained on over 280 million protein sequences from the UniRef-100 database. Such generative models hold the promise of greatly accelerating protein design. We conduct the first systematic study of how capabilities evolve with model size for autoregressive transformers in the protein domain: we evaluate RITA models on next amino acid prediction, zero-shot fitness prediction, and enzyme function prediction, showing benefits from increased scale. We release the RITA models openly, to the benefit of the research community.
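To make the two core uses of such a model concrete, here is a minimal sketch of sampling a new sequence residue by residue and of scoring variants by log-likelihood, the quantity behind zero-shot fitness prediction. It assumes the standard Hugging Face transformers interface; the checkpoint id "lightonai/RITA_s" and the toy sequences are illustrative assumptions, not details taken from the paper, and any causal protein language model with the same interface would work.

```python
# Hedged sketch: generation and zero-shot fitness scoring with an
# autoregressive protein language model. The Hub id "lightonai/RITA_s"
# is an assumption about where a released checkpoint lives.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "lightonai/RITA_s"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

# Next-amino-acid generation: sample a continuation of a seed sequence,
# one residue at a time, exactly the task the model is trained on.
seed = "MKT"  # arbitrary example prefix, not from the paper
inputs = tokenizer(seed, return_tensors="pt")
with torch.no_grad():
    out = model.generate(
        inputs.input_ids,
        max_length=60,
        do_sample=True,
        top_k=40,
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Zero-shot fitness: rank sequence variants by their total log-likelihood
# under the model, with no task-specific fine-tuning.
def total_log_likelihood(seq: str) -> float:
    ids = tokenizer(seq, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL per predicted token
    return -loss.item() * (ids.shape[1] - 1)

wild_type = "MKTAYIAKQR"  # toy sequences for illustration only
variant = "MKTAYIAKQK"
for name, seq in [("wild type", wild_type), ("variant", variant)]:
    print(name, total_log_likelihood(seq))
```

In the zero-shot setting, a variant that scores a higher log-likelihood than the wild type is predicted to be more fit; ranking mutants this way is the usual protocol for evaluating autoregressive protein models on fitness benchmarks.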

Related research

- Protein sequence design with deep generative models (04/09/2021): Protein engineering seeks to identify protein sequences with optimized p...
- Few Shot Protein Generation (04/03/2022): We present the MSA-to-protein transformer, a generative model of protein...
- Tranception: protein fitness prediction with autoregressive transformers and inference-time retrieval (05/27/2022): The ability to accurately model the fitness landscape of protein sequenc...
- Guided Generative Protein Design using Regularized Transformers (01/24/2022): The development of powerful natural language models has increased the a...
- Using Deep Learning Sequence Models to Identify SARS-CoV-2 Divergence (11/12/2021): SARS-CoV-2 is an upper respiratory system RNA virus that has caused over...
- More Bang For Your Buck: Quorum-Sensing Capabilities Improve the Efficacy of Suicidal Altruism (06/02/2014): Within the context of evolution, an altruistic act that benefits the rec...
- Deep Extrapolation for Attribute-Enhanced Generation (07/07/2021): Attribute extrapolation in sample generation is challenging for deep neu...
