Conditioning by adaptive sampling for robust design

01/29/2019
by David H. Brookes, et al.

We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more stochastic "oracle" predictive functions, potentially black boxes, each of which maps from the input design space (e.g., protein sequences) to a distribution over a property of interest (e.g., protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus, we need to modulate the optimization of the oracle inputs with prior knowledge about what makes "realistic" inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.
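The idea in the last sentence of the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation; it is a minimal illustration under several assumptions that do not come from the abstract: a four-letter alphabet, an independent-site categorical model standing in for both the prior and the search model, a hypothetical Gaussian oracle whose mean counts occurrences of the first symbol, and a quantile-based threshold that defines the "desired property" event. Each round samples from the search model, weights each sample by the prior-to-search density ratio times the oracle's probability of meeting the threshold, and refits the search model by weighted maximum likelihood.

```python
import math
import random

ALPHABET = "ACDE"   # hypothetical toy alphabet
LENGTH = 6          # hypothetical sequence length
ORACLE_STD = 1.0    # assumed noise level of the toy oracle


def sample(probs, rng):
    """Draw one sequence (list of symbol indices) from a site-wise categorical model."""
    return [rng.choices(range(len(ALPHABET)), weights=p)[0] for p in probs]


def log_prob(seq, probs):
    """Log-probability of a sequence under a site-wise categorical model."""
    return sum(math.log(probs[i][s]) for i, s in enumerate(seq))


def oracle_mean(seq):
    """Hypothetical oracle mean: number of positions holding symbol 0."""
    return float(sum(1 for s in seq if s == 0))


def prob_exceeds(seq, threshold):
    """P(y >= threshold | x) when the oracle output is N(oracle_mean(x), ORACLE_STD^2)."""
    z = (threshold - oracle_mean(seq)) / ORACLE_STD
    return 0.5 * math.erfc(z / math.sqrt(2))


def adaptive_sampling(prior, n_iters=20, n_samples=500, quantile=0.8, seed=0):
    """Iteratively refit a search model q toward prior(x) * P(y >= threshold | x)."""
    rng = random.Random(seed)
    q = [row[:] for row in prior]          # start the search model at the prior
    threshold = -math.inf
    for _ in range(n_iters):
        seqs = [sample(q, rng) for _ in range(n_samples)]
        means = sorted(oracle_mean(s) for s in seqs)
        # Raise the threshold to the chosen sample quantile, never lowering it.
        threshold = max(threshold, means[int(quantile * (n_samples - 1))])
        # Importance weight: prior/search density ratio times oracle probability.
        weights = [
            math.exp(log_prob(s, prior) - log_prob(s, q)) * prob_exceeds(s, threshold)
            for s in seqs
        ]
        total = sum(weights)
        if total <= 0.0:
            break
        # Weighted maximum-likelihood refit, with a small pseudo-count for stability.
        new_q = [[1e-3] * len(ALPHABET) for _ in range(LENGTH)]
        for s, w in zip(seqs, weights):
            for i, sym in enumerate(s):
                new_q[i][sym] += w / total
        q = [[v / sum(row) for v in row] for row in new_q]
    return q


if __name__ == "__main__":
    toy_prior = [[0.2, 0.4, 0.2, 0.2] for _ in range(LENGTH)]
    fitted = adaptive_sampling(toy_prior)
    for position, row in enumerate(fitted):
        print(position, [round(v, 3) for v in row])
```

Because the weights include the prior density, the fitted model is pulled toward sequences the oracle scores highly while staying anchored to what the prior considers plausible, rather than chasing the oracle alone.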


research
10/08/2018

Design by adaptive sampling

We present a probabilistic modeling framework and adaptive sampling algo...
research
03/18/2023

Protein Sequence Design with Batch Bayesian Optimisation

Protein sequence design is a challenging problem in protein engineering,...
research
05/31/2023

Protein Design with Guided Discrete Diffusion

A popular approach to protein design is to combine a generative model wi...
research
06/24/2021

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Designing novel protein sequences for a desired 3D topological fold is a...
research
12/31/2019

Model Inversion Networks for Model-Based Optimization

In this work, we aim to solve data-driven optimization problems, where t...
research
06/14/2020

Autofocused oracles for model-based design

Data-driven design is making headway into a number of application areas,...
research
08/31/2023

Boosting AND/OR-Based Computational Protein Design: Dynamic Heuristics and Generalizable UFO

Scientific computing has experienced a surge empowered by advancements i...
