Importance Weighted Expectation-Maximization for Protein Sequence Design

04/30/2023
by   Zhenqiao Song, et al.

Designing protein sequences with a desired biological function is crucial in biology and chemistry. Recent machine learning methods use a surrogate sequence-function model to replace expensive wet-lab validation. How can we efficiently generate diverse and novel protein sequences with high fitness? In this paper, we propose IsEM-Pro, an approach to generating protein sequences toward a given fitness criterion. At its core, IsEM-Pro is a latent generative model, augmented by combinatorial structure features from separately learned Markov random fields (MRFs). We develop a Monte Carlo Expectation-Maximization (MCEM) method to learn the model. During inference, sampling from its latent space enhances diversity, while its MRF features guide exploration toward high-fitness regions. Experiments on eight protein sequence design tasks show that IsEM-Pro outperforms the previous best methods by at least 55% on average fitness score and generates more diverse and novel protein sequences.
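
The general idea of an importance-weighted Monte Carlo EM loop can be illustrated with a small sketch. This is not the IsEM-Pro model: it assumes a toy site-independent categorical model with no latent variable and no MRF features, a four-letter alphabet, and a hypothetical fitness function (`toy_fitness`) that counts matches to an arbitrary target string. All names and parameters below are illustrative assumptions. Each iteration samples sequences from the current model, reweights them with self-normalized weights proportional to exp(fitness / T), and refits the model by weighted maximum likelihood.

```python
import math
import random

ALPHABET = "ACDE"   # toy 4-letter alphabet (assumption; proteins use 20 amino acids)
TARGET = "ACDEAC"   # hypothetical optimum, used only to define the toy fitness


def toy_fitness(seq):
    """Hypothetical surrogate: number of positions that match TARGET."""
    return sum(a == b for a, b in zip(seq, TARGET))


def sample(probs, rng):
    """Draw one sequence from a site-independent categorical model."""
    return "".join(rng.choices(ALPHABET, weights=p)[0] for p in probs)


def importance_weighted_em(n_iters=30, n_samples=200, temperature=0.5, seed=0):
    """Toy importance-weighted Monte Carlo EM; returns per-site probabilities."""
    rng = random.Random(seed)
    length = len(TARGET)
    # Start from a uniform distribution at every site.
    probs = [[1.0 / len(ALPHABET)] * len(ALPHABET) for _ in range(length)]
    for _ in range(n_iters):
        # Monte Carlo E-step: sample from the current model (the proposal).
        seqs = [sample(probs, rng) for _ in range(n_samples)]
        # Weight each sample by exp(fitness / T); subtract the max for stability.
        scores = [toy_fitness(s) / temperature for s in seqs]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # M-step: weighted maximum likelihood with a small pseudocount.
        counts = [[1e-3] * len(ALPHABET) for _ in range(length)]
        for seq, w in zip(seqs, weights):
            for i, ch in enumerate(seq):
                counts[i][ALPHABET.index(ch)] += w / total
        probs = [[c / sum(row) for c in row] for row in counts]
    return probs


if __name__ == "__main__":
    final = importance_weighted_em()
    rng = random.Random(1)
    print([sample(final, rng) for _ in range(5)])
```

In this toy setting the temperature controls how sharply high-fitness samples dominate the refit: lower values concentrate the model faster but reduce the diversity of later samples, which is the trade-off the abstract's latent-space sampling is meant to address.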

research
01/24/2022

Guided Generative Protein Design using Regularized Transformers

The development of powerful natural language models has increased the a...
research
07/02/2023

Optimizing protein fitness using Gibbs sampling with Graph-based Smoothing

The ability to design novel proteins with higher fitness on a given task...
research
05/22/2020

Fast differentiable DNA and protein sequence optimization for molecular design

Designing DNA and protein sequences with improved or novel function has ...
research
08/20/2023

SBSM-Pro: Support Bio-sequence Machine for Proteins

Proteins play a pivotal role in biological systems. The use of machine l...
research
03/04/2019

Two-level protein folding optimization on a three-dimensional AB off-lattice model

This paper presents a two-level protein folding optimization on a three-...
research
12/09/2017

Variational auto-encoding of protein sequences

Proteins are responsible for the most diverse set of functions in biolog...
research
10/09/2021

Iterative Refinement Graph Neural Network for Antibody Sequence-Structure Co-design

Antibodies are versatile proteins that bind to pathogens like viruses an...
