Evolutionary Multi-Armed Bandits with Genetic Thompson Sampling

04/26/2022
by Baihan Lin, et al.

As two popular schools of machine learning, online learning and evolutionary computation have become important driving forces behind real-world decision-making engines for applications in biomedicine, economics, and engineering. Although prior work has used bandits to improve the optimization process of evolutionary algorithms, it remains largely unexplored how evolutionary approaches can improve the sequential decision-making of online learning agents such as multi-armed bandits. In this work, we propose Genetic Thompson Sampling, a bandit algorithm that maintains a population of agents and updates them with genetic principles such as elite selection, crossover, and mutation. Empirical results in multi-armed bandit simulation environments and a practical epidemic control problem suggest that, by incorporating the genetic algorithm into the bandit algorithm, our method significantly outperforms the baselines in nonstationary settings. Lastly, we introduce EvoBandit, a web-based interactive visualization that guides readers through the entire learning process and supports lightweight evaluations on the fly. We hope this investigation draws more researchers into this growing field.
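
To make the idea of a population of Thompson Sampling agents updated by elite selection, crossover, and mutation more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the update rules, so the Bernoulli rewards, the Beta(1, 1) priors, the per-arm uniform crossover, the mutation that resets an arm's posterior to the prior, and every name and hyperparameter (`genetic_thompson_sampling`, `elite_frac`, `mutation_prob`, `evolve_every`) are assumptions chosen for illustration.

```python
import numpy as np


def genetic_thompson_sampling(reward_probs, horizon=500, pop_size=8,
                              elite_frac=0.5, mutation_prob=0.1,
                              evolve_every=25, seed=0):
    """Illustrative sketch; all hyperparameters are assumptions (pop_size >= 2)."""
    rng = np.random.default_rng(seed)
    probs = np.asarray(reward_probs, dtype=float)
    n_arms = probs.size
    rows = np.arange(pop_size)
    # Each agent holds Beta(alpha, beta) posterior parameters for every arm.
    alpha = np.ones((pop_size, n_arms))
    beta = np.ones((pop_size, n_arms))
    fitness = np.zeros(pop_size)  # reward earned in the current generation
    total = 0.0

    for t in range(1, horizon + 1):
        # Thompson step: each agent samples its posterior and pulls the argmax arm.
        arms = rng.beta(alpha, beta).argmax(axis=1)
        rewards = (rng.random(pop_size) < probs[arms]).astype(float)
        alpha[rows, arms] += rewards
        beta[rows, arms] += 1.0 - rewards
        fitness += rewards
        total += rewards.mean()

        if t % evolve_every == 0:
            # Elite selection: the highest-reward agents survive unchanged.
            n_elite = min(pop_size, max(2, int(elite_frac * pop_size)))
            elite = np.argsort(fitness)[::-1][:n_elite]
            new_alpha, new_beta = alpha.copy(), beta.copy()
            new_alpha[:n_elite], new_beta[:n_elite] = alpha[elite], beta[elite]
            for child in range(n_elite, pop_size):
                # Crossover: each arm's posterior is copied from one of two elite parents.
                pa, pb = rng.choice(elite, size=2, replace=False)
                mask = rng.random(n_arms) < 0.5
                new_alpha[child] = np.where(mask, alpha[pa], alpha[pb])
                new_beta[child] = np.where(mask, beta[pa], beta[pb])
                # Mutation (assumed form): reset some arms to the Beta(1, 1) prior.
                reset = rng.random(n_arms) < mutation_prob
                new_alpha[child, reset] = 1.0
                new_beta[child, reset] = 1.0
            alpha, beta = new_alpha, new_beta
            fitness[:] = 0.0  # start a fresh fitness window for the next generation

    return total / horizon, alpha, beta


# Example run on a stationary three-armed Bernoulli bandit.
avg_reward, alpha, beta = genetic_thompson_sampling([0.2, 0.5, 0.8])
print(f"average reward per step: {avg_reward:.3f}")
```

Resetting the fitness window each generation and occasionally resetting posteriors are one plausible way for such a population to forget stale evidence, which is the kind of behavior that could matter in the nonstationary settings the abstract mentions; the sketch itself only simulates a stationary environment.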
