Bidirectional Learning for Offline Model-based Biological Sequence Design

01/07/2023
by Can Chen, et al.

Offline model-based optimization aims to maximize a black-box objective function using only a static dataset of designs and their scores. In this paper, we focus on biological sequence design, where the goal is to maximize a sequence score. A recent approach employs bidirectional learning, which combines a forward mapping for exploitation with a backward mapping for constraint, and relies on the neural tangent kernel (NTK) of an infinitely wide network to build a proxy model. Though effective, the NTK cannot learn features because of its parametrization, and its use prevents the incorporation of powerful pre-trained Language Models (LMs) that can capture the rich biophysical information in millions of biological sequences. We adopt an alternative proxy model, adding a linear head to a pre-trained LM, and propose a linearization scheme. This yields a closed-form loss and also takes into account the biophysical information in the pre-trained LM. In addition, the forward and backward mappings play different roles and thus deserve different weights during sequence optimization. To achieve this, we train an auxiliary model and leverage its weak supervision signal via a bi-level optimization framework to learn how to balance the two mappings. Further, by extending the framework, we develop the first learning rate adaptation module, Adaptive-η, which is compatible with all gradient-based algorithms for offline model-based optimization. Experimental results on DNA/protein sequence design tasks verify the effectiveness of our algorithm. Our code is available at https://anonymous.4open.science/r/BIB-ICLR2023-Submission/README.md.
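To make the idea of a closed-form bidirectional loss concrete, here is a minimal sketch. It is not the authors' implementation. It assumes that sequence features have already been extracted (random matrices stand in for pre-trained LM embeddings), that the linear head is fit by ridge regression, and that the two mappings are combined with a fixed weight `gamma`; the abstract instead describes learning this balance through bi-level optimization. The names `ridge_head`, `bidirectional_loss`, `beta`, and `gamma` are hypothetical.

```python
import numpy as np


def ridge_head(features, scores, beta=1e-1):
    """Closed-form ridge solution for a linear head on fixed features."""
    dim = features.shape[1]
    gram = features.T @ features + beta * np.eye(dim)
    return np.linalg.solve(gram, features.T @ scores)


def bidirectional_loss(feat_data, y_data, feat_design, y_target,
                       gamma=0.5, beta=1e-1):
    """Weighted sum of a forward and a backward regression loss.

    gamma is a fixed, hand-picked weight here (an assumption of this sketch).
    """
    # Forward mapping: fit the head on the static dataset, then ask how
    # well it predicts the target scores of the candidate designs.
    w_forward = ridge_head(feat_data, y_data, beta)
    forward = np.mean((feat_design @ w_forward - y_target) ** 2)

    # Backward mapping: fit the head on the candidate designs, then ask
    # how well it predicts the scores in the static dataset.
    w_backward = ridge_head(feat_design, y_target, beta)
    backward = np.mean((feat_data @ w_backward - y_data) ** 2)

    total = gamma * forward + (1.0 - gamma) * backward
    return total, forward, backward


# Toy usage: random features stand in for LM embeddings (assumption).
rng = np.random.default_rng(0)
feat_data = rng.normal(size=(50, 8))      # features of the static dataset
y_data = feat_data @ rng.normal(size=8)   # synthetic scores
feat_design = rng.normal(size=(5, 8))     # features of candidate designs
y_target = np.full(5, y_data.max())       # desired high score for candidates

total, forward, backward = bidirectional_loss(
    feat_data, y_data, feat_design, y_target
)
print(total, forward, backward)
```

Because both heads are obtained by solving a linear system, the loss is an explicit function of the candidate features, which is what makes gradient-based sequence optimization against it tractable in this simplified setting.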


