Plug & Play Directed Evolution of Proteins with Gradient-based Discrete MCMC

12/20/2022
by Patrick Emami, et al.

A long-standing goal of machine-learning-based protein engineering is to accelerate the discovery of novel mutations that improve the function of a known protein. We introduce a sampling framework for evolving proteins in silico that supports mixing and matching a variety of unsupervised models, such as protein language models, and supervised models that predict protein function from sequence. By composing these models, we aim to improve our ability to evaluate unseen mutations and constrain search to regions of sequence space likely to contain functional proteins. Our framework achieves this without any model fine-tuning or re-training by constructing a product of experts distribution directly in discrete protein space. Instead of resorting to brute force search or random sampling, which is typical of classic directed evolution, we introduce a fast MCMC sampler that uses gradients to propose promising mutations. We conduct in silico directed evolution experiments on wide fitness landscapes and across a range of different pre-trained unsupervised models, including a 650M parameter protein language model. Our results demonstrate an ability to efficiently discover variants with high evolutionary likelihood as well as estimated activity multiple mutations away from a wild type protein, suggesting our sampler provides a practical and effective new paradigm for machine-learning-based protein engineering.
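
The two ideas in the abstract, a product of experts over discrete sequences and an MCMC sampler whose proposals are guided by gradients, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes toy "experts" that are linear in the one-hot sequence encoding (so the gradient is exact and does not depend on the current sequence), a softmax proposal with temperature 2 in the style of gradient-informed discrete samplers, and single-site substitutions only. All names (`log_prob`, `grad_onehot`, `proposal`, `mh_step`) and the expert weights are hypothetical.

```python
import numpy as np

def log_prob(x, experts, weights):
    """Unnormalized product-of-experts log-probability.

    Each toy expert is an (L, A) score table; its log-score for sequence x is
    the sum of the entries selected by x. The product of experts is the
    weighted sum of these log-scores.
    """
    onehot = np.eye(experts[0].shape[1])[x]  # (L, A) one-hot encoding of x
    return sum(w * np.sum(W * onehot) for W, w in zip(experts, weights))

def grad_onehot(x, experts, weights):
    """Gradient of log_prob w.r.t. the one-hot encoding.

    For these linear toy experts it is just the weighted sum of the tables;
    a neural expert would need automatic differentiation here instead.
    """
    return sum(w * W for W, w in zip(experts, weights))

def proposal(x, experts, weights):
    """Distribution over single-site substitutions (position i, residue a)."""
    g = grad_onehot(x, experts, weights)
    pos = np.arange(len(x))
    # First-order estimate of the change in log_prob for each substitution.
    delta = g - g[pos, x][:, None]
    logits = delta / 2.0
    logits[pos, x] = -np.inf        # exclude "mutations" that change nothing
    logits = logits - logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def mh_step(x, experts, weights, rng):
    """One Metropolis-Hastings step using the gradient-informed proposal."""
    q_fwd = proposal(x, experts, weights)
    flat = rng.choice(q_fwd.size, p=q_fwd.ravel())
    i, a = np.unravel_index(flat, q_fwd.shape)
    x_new = x.copy()
    x_new[i] = a
    q_rev = proposal(x_new, experts, weights)
    log_alpha = (log_prob(x_new, experts, weights) - log_prob(x, experts, weights)
                 + np.log(q_rev[i, x[i]]) - np.log(q_fwd[i, a]))
    return x_new if np.log(rng.random()) < log_alpha else x

# Usage: evolve a random "wild type" of length 8 over a 20-letter alphabet.
rng = np.random.default_rng(0)
L, A = 8, 20
experts = [rng.normal(size=(L, A)), rng.normal(size=(L, A))]  # stand-in models
weights = [1.0, 2.0]  # hypothetical relative weighting of the two experts
x = rng.integers(0, A, size=L)
for _ in range(100):
    x = mh_step(x, experts, weights, rng)
print(x, log_prob(x, experts, weights))
```

Because the experts are combined only through their log-scores and gradients, swapping one expert for another requires no re-training in this sketch, which mirrors the plug-and-play property the abstract describes.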

research
03/29/2023

ProtFIM: Fill-in-Middle Protein Sequence Design via Protein Language Models

Protein language models (pLMs), pre-trained via causal language modeling...
research
06/10/2021

Adaptive machine learning for protein engineering

Machine-learning models that learn from data to predict how protein sequ...
research
05/19/2022

ODBO: Bayesian Optimization with Search Space Prescreening for Directed Protein Evolution

Directed evolution is a versatile technique in protein engineering that ...
research
02/08/2022

Using Genetic Programming to Predict and Optimize Protein Function

Protein engineers conventionally use tools such as Directed Evolution to...
research
06/06/2023

Mathematics-assisted directed evolution and protein engineering

Directed evolution is a molecular biology technique that is transforming...
research
05/09/2022

Multi-segment preserving sampling for deep manifold sampler

Deep generative modeling for biological sequences presents a unique chal...
research
06/05/2022

Bandit Theory and Thompson Sampling-Guided Directed Evolution for Sequence Optimization

Directed Evolution (DE), a landmark wet-lab method originated in 1960s, ...
