A Hierarchical Approach to Scaling Batch Active Search Over Structured Data

07/20/2020
by Vivek Myers et al.

Active search is the process of identifying high-value data points in a large, often high-dimensional parameter space where each evaluation can be expensive. Traditional active search techniques such as Bayesian optimization trade off exploration and exploitation over consecutive evaluations, and have historically focused on evaluating one or a few (fewer than five) examples per round. As modern datasets grow, so does the need to scale active search to large datasets and batch sizes. In this paper, we present a general hierarchical framework based on bandit algorithms that scales active search to large batch sizes by maximizing the information derived from the unique structure of each dataset. Our framework, Hierarchical Batch Bandit Search (HBBS), strategically distributes batch selection across a learned embedding space, enabling wide exploration of the different structural elements within a dataset. We focus our application of HBBS on modern biology, where large-batch experimentation is often fundamental to the research process, and demonstrate batch design of biological sequences (protein and DNA). We also present a new Gym environment that makes it easy to simulate diverse biological sequences and enables more comprehensive evaluation of active search methods across heterogeneous datasets. By combining a broad exploration strategy across coarse partitions of structured data with fine-grained exploitation within each partition, HBBS improves on standard batch-search benchmarks for performance, wall-clock time, and scalability.
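The abstract describes HBBS only at a high level: coarse partitions of a learned embedding space, a bandit that spreads the batch across partitions, and exploitation within each partition. The following is a minimal Python sketch of that coarse-to-fine idea under stated assumptions, not the authors' implementation. It assumes the embeddings and surrogate-model predictions are already given, uses k-means with farthest-first initialization as a stand-in partitioning step, and uses UCB1 with "hallucinated" pulls as a stand-in bandit for batch allocation. All function names (`kmeans`, `allocate_batch`, `select_batch`) are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[int]:
    """Coarse partitioning (assumed stand-in): k-means with farthest-first init."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center = the point farthest from all current centers.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def allocate_batch(counts: List[int], means: List[float], capacity: List[int],
                   batch_size: int, c: float = 1.0) -> List[int]:
    """Broad exploration (assumed stand-in): UCB1 over clusters, one slot at a time.

    Each slot counts as a hallucinated pull (count + 1, mean unchanged), so a
    cluster's exploration bonus shrinks as it receives slots and the batch spreads.
    """
    k = len(counts)
    alloc = [0] * k
    for _ in range(batch_size):
        open_arms = [j for j in range(k) if alloc[j] < capacity[j]]
        if not open_arms:
            break  # every cluster is exhausted
        total = sum(counts) + sum(alloc) + 1

        def ucb(j: int) -> float:
            n = counts[j] + alloc[j]
            if n == 0:
                return math.inf  # unexplored clusters get a slot first
            return means[j] + c * math.sqrt(2 * math.log(total) / n)

        alloc[max(open_arms, key=ucb)] += 1
    return alloc


def select_batch(embeddings: List[Vector], predicted: List[float],
                 observed: Dict[int, float], k: int, batch_size: int,
                 c: float = 1.0) -> List[int]:
    """Return indices of unevaluated candidates to evaluate in the next round."""
    labels = kmeans(embeddings, k)
    counts, sums = [0] * k, [0.0] * k
    for i, y in observed.items():  # per-cluster statistics of measured values
        counts[labels[i]] += 1
        sums[labels[i]] += y
    means = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    # Fine-grained exploitation: rank unevaluated members by surrogate score.
    pools = [sorted((i for i, lab in enumerate(labels) if lab == j and i not in observed),
                    key=lambda i: predicted[i], reverse=True) for j in range(k)]
    alloc = allocate_batch(counts, means, [len(p) for p in pools], batch_size, c)
    return [i for j in range(k) for i in pools[j][:alloc[j]]]


emb = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
pred = [0.1, 0.9, 0.3, 0.5, 0.2, 0.8]
print(select_batch(emb, pred, {}, k=2, batch_size=2))  # [1, 5]
```

With no observations, both clusters are unexplored, so the two slots go to the best-predicted point in each cluster. Once measurements arrive, a cluster with higher observed values receives more slots, while the shrinking bonus still routes some of the batch to other clusters. That split between exploring partitions and exploiting within them is the trade-off the abstract attributes to HBBS; the specific bandit and clustering choices here are assumptions.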
