Multi-segment preserving sampling for deep manifold sampler

05/09/2022
by Daniel Berenberg, et al.

Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to selected regions only. We demonstrate its effectiveness in the context of antibody design by training two models, a deep manifold sampler and a GPT-2 language model, on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to the complementarity-determining region 3 (CDR3) of the input. We then obtain log probability scores from the GPT-2 model for each sampled CDR3 and show that multi-segment preserving sampling generates reasonable designs while maintaining the desired preserved regions.
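The idea of designating preserved and non-preserved segments can be illustrated with a minimal Python sketch. This is not the authors' implementation: the deep manifold sampler is replaced by a hypothetical random mutator (`toy_segment_sampler`), and the function name `sample_preserving_segments`, the half-open index convention, and the example sequence are all assumptions made for illustration. The sketch only shows the bookkeeping: preserved regions are copied verbatim, while each non-preserved segment is resampled and may change length.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Half-open [start, end) indices into the input sequence (an assumed convention).
Segment = Tuple[int, int]
Sampler = Callable[[str, random.Random], str]


def toy_segment_sampler(segment: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a learned sampler.

    Applies one random substitution, deletion, or insertion, so the
    returned segment may differ in length from the input by one residue.
    """
    residues = list(segment)
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "substitute" and residues:
        residues[rng.randrange(len(residues))] = rng.choice(AMINO_ACIDS)
    elif op == "delete" and len(residues) > 1:
        del residues[rng.randrange(len(residues))]
    else:
        # Insertion, also used as a fallback when the chosen edit is not possible.
        residues.insert(rng.randrange(len(residues) + 1), rng.choice(AMINO_ACIDS))
    return "".join(residues)


def sample_preserving_segments(
    sequence: str,
    variable_segments: List[Segment],
    sampler: Sampler,
    rng: random.Random,
) -> str:
    """Resample only the non-preserved segments; copy everything else unchanged."""
    pieces = []
    cursor = 0
    for start, end in sorted(variable_segments):
        if start < cursor or start > end or end > len(sequence):
            raise ValueError("segments must be in-bounds and non-overlapping")
        pieces.append(sequence[cursor:start])              # preserved region
        pieces.append(sampler(sequence[start:end], rng))   # non-preserved region
        cursor = end
    pieces.append(sequence[cursor:])                       # trailing preserved region
    return "".join(pieces)


# Usage: a made-up sequence whose middle block stands in for a CDR3.
prefix, cdr3, suffix = "EVQLVESGG", "ARDYYGMDV", "WGQGT"
rng = random.Random(0)
design = sample_preserving_segments(
    prefix + cdr3 + suffix,
    [(len(prefix), len(prefix) + len(cdr3))],
    toy_segment_sampler,
    rng,
)
print(design)
```

In a setting like the one described above, the sampler argument would be a trained model rather than a random mutator, and each sampled segment would then be scored separately, for example by a language model's log probability.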


