Parallelizing Contextual Linear Bandits

05/21/2021

Standard approaches to decision-making under uncertainty focus on sequential exploration of the space of decisions. However, simultaneously proposing a batch of decisions, which leverages available resources for parallel experimentation, has the potential to rapidly accelerate exploration. We present a family of parallel contextual linear bandit algorithms whose regret is nearly identical to that of their perfectly sequential counterparts (given access to the same total number of oracle queries), up to a lower-order "burn-in" term that depends on the context-set geometry. We provide matching information-theoretic lower bounds on parallel regret, establishing that our algorithms are asymptotically optimal in the time horizon. Finally, we present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design, to demonstrate the utility of parallelized bandits in practical settings.
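
To make the parallel setting concrete, the following is a minimal sketch of a batched LinUCB-style loop in Python. It is not the paper's algorithm: the function names (parallel_linucb, random_contexts), the parameters (alpha, lam, noise), and the simulated linear reward model are all assumptions chosen for illustration. The one feature it is meant to show is that all P decisions in a round are chosen from the same stale estimate, and the model is updated only after the whole batch of rewards has returned.

```python
import numpy as np


def random_contexts(rng, K=5, d=3):
    """Hypothetical context generator: K random unit vectors in R^d."""
    X = rng.normal(size=(K, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def parallel_linucb(theta_star, contexts_fn, T, P, alpha=1.0, lam=1.0,
                    noise=0.1, seed=0):
    """Run T rounds with P parallel queries per round (T * P queries total).

    theta_star is the (simulated) true parameter; rewards are
    x @ theta_star plus Gaussian noise. Returns (cumulative regret,
    final ridge estimate of theta_star).
    """
    rng = np.random.default_rng(seed)
    d = theta_star.shape[0]
    V = lam * np.eye(d)   # regularized design matrix
    b = np.zeros(d)       # running sum of reward-weighted contexts
    regret = 0.0

    for _ in range(T):
        # Estimate and confidence widths are frozen for the whole batch.
        theta_hat = np.linalg.solve(V, b)
        V_inv = np.linalg.inv(V)
        batch_x, batch_r = [], []

        for _ in range(P):
            X = contexts_fn(rng)  # shape (K, d): this worker's context set
            bonus = np.sqrt(np.einsum("kd,de,ke->k", X, V_inv, X))
            idx = int(np.argmax(X @ theta_hat + alpha * bonus))
            means = X @ theta_star
            regret += means.max() - means[idx]
            batch_x.append(X[idx])
            batch_r.append(means[idx] + noise * rng.normal())

        # Feedback arrives only after all P queries have been issued.
        for x, r in zip(batch_x, batch_r):
            V += np.outer(x, x)
            b += r * x

    return regret, np.linalg.solve(V, b)
```

Setting P=1 recovers an ordinary sequential loop under the same assumptions, so running both with an equal total query budget T * P gives a rough way to compare sequential and batched behaviour in simulation.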
