
Multi-segment preserving sampling for deep manifold sampler

by Daniel Berenberg, et al.

Deep generative modeling for biological sequences presents a unique challenge in reconciling the bias-variance trade-off between explicit biological insight and model flexibility. The deep manifold sampler was recently proposed as a means to iteratively sample variable-length protein sequences by exploiting the gradients from a function predictor. We introduce an alternative approach to this guided sampling procedure, multi-segment preserving sampling, that enables the direct inclusion of domain-specific knowledge by designating preserved and non-preserved segments along the input sequence, thereby restricting variation to selected regions. We evaluate it in the context of antibody design by training two models, a deep manifold sampler and a GPT-2 language model, on nearly six million heavy chain sequences annotated with the IGHV1-18 gene. During sampling, we restrict variation to the complementarity-determining region 3 (CDR3) of the input. We obtain log probability scores from the GPT-2 model for each sampled CDR3 and demonstrate that multi-segment preserving sampling generates reasonable designs while maintaining the desired, preserved regions.
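The idea of designating preserved and non-preserved segments can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the half-open (start, end) span convention, and the random substitute/insert/delete proposal are all assumptions made for illustration, and the toy proposal stands in for the gradient-guided deep manifold sampler, which is not reproduced here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_propose(segment, rng):
    """Hypothetical stand-in for a learned sampler: apply one random
    substitution, insertion, or deletion, so the length may change."""
    residues = list(segment)
    op = rng.choice(["substitute", "insert", "delete"])
    if op == "substitute" and residues:
        residues[rng.randrange(len(residues))] = rng.choice(AMINO_ACIDS)
    elif op == "delete" and len(residues) > 1:
        del residues[rng.randrange(len(residues))]
    else:
        # Insertion, also used as the fallback when the chosen
        # operation is not applicable to a very short segment.
        residues.insert(rng.randrange(len(residues) + 1), rng.choice(AMINO_ACIDS))
    return "".join(residues)


def sample_with_preserved_segments(sequence, variable_segments, propose, rng):
    """Rebuild `sequence`, resampling only the half-open [start, end) spans
    listed in `variable_segments`; everything else is copied verbatim."""
    pieces = []
    cursor = 0
    for start, end in sorted(variable_segments):
        if start < cursor or start > end or end > len(sequence):
            raise ValueError("segments must be in range and non-overlapping")
        pieces.append(sequence[cursor:start])             # preserved segment
        pieces.append(propose(sequence[start:end], rng))  # non-preserved segment
        cursor = end
    pieces.append(sequence[cursor:])                      # trailing preserved segment
    return "".join(pieces)


# Toy input (not a real antibody): only the middle span is allowed to vary.
rng = random.Random(0)
toy_sequence = "AAAA" + "CARDY" + "WWWW"
sampled = sample_with_preserved_segments(toy_sequence, [(4, 9)], toy_propose, rng)
```

Because each variable span is sliced from the original sequence and the preserved pieces are copied unchanged, the flanking regions stay fixed even when the resampled span grows or shrinks. In the setting described above, the single variable span would correspond to the CDR3.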



