Thompson Sampling with Virtual Helping Agents

09/16/2022
by Kartik Anand Pant, et al.

We address the problem of online sequential decision making, i.e., balancing the trade-off between exploiting current knowledge to maximize immediate performance and exploring to gather new information for long-term benefit, using the multi-armed bandit framework. Thompson sampling is one of the heuristics for choosing actions that addresses this exploration-exploitation dilemma. We first propose a general framework that helps heuristically tune the exploration versus exploitation trade-off in Thompson sampling using multiple samples from the posterior distribution. Utilizing this framework, we propose two algorithms for the multi-armed bandit problem and provide theoretical bounds on the cumulative regret. Next, we demonstrate the empirical improvement in cumulative regret of the proposed algorithm over Thompson sampling. We also show the effectiveness of the proposed algorithm on real-world datasets. Unlike existing methods, our framework provides a mechanism to vary the amount of exploration/exploitation based on the task at hand. To this end, we extend our framework to two additional problems, i.e., best arm identification and time-sensitive learning in bandits, and compare our algorithm with existing methods.
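To make the idea of drawing multiple posterior samples concrete, here is a minimal Python sketch for a Bernoulli bandit with Beta(1, 1) priors. It is an illustration under stated assumptions, not the algorithms proposed in the paper: the function name `multi_sample_thompson` and the parameters `n_samples` and `combine` are hypothetical. With `n_samples=1` it reduces to standard Thompson sampling; combining several draws per arm with a maximum is one plausible way to push toward exploration, while a mean would push toward exploitation.

```python
import numpy as np


def multi_sample_thompson(true_means, horizon, n_samples=1, combine=np.max, seed=0):
    """Bernoulli bandit with Beta(1, 1) priors; returns cumulative pseudo-regret.

    `n_samples` and `combine` are hypothetical tuning knobs (an assumption,
    not the paper's method): each arm is scored by `combine` applied to
    `n_samples` independent draws from that arm's posterior.
    """
    rng = np.random.default_rng(seed)
    true_means = np.asarray(true_means, dtype=float)
    n_arms = true_means.size
    alpha = np.ones(n_arms)  # 1 + number of observed successes per arm
    beta = np.ones(n_arms)   # 1 + number of observed failures per arm
    best_mean = true_means.max()
    regret = np.zeros(horizon)
    total = 0.0
    for t in range(horizon):
        # Draw n_samples values from each arm's Beta posterior: shape (n_samples, n_arms).
        draws = rng.beta(alpha, beta, size=(n_samples, n_arms))
        # Collapse the draws into one score per arm and play the top-scoring arm.
        scores = combine(draws, axis=0)
        arm = int(np.argmax(scores))
        # Simulate a Bernoulli reward and update the chosen arm's posterior.
        reward = float(rng.random() < true_means[arm])
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        # Accumulate the gap between the best arm and the arm played.
        total += best_mean - true_means[arm]
        regret[t] = total
    return regret


if __name__ == "__main__":
    means = [0.3, 0.5, 0.7]
    standard = multi_sample_thompson(means, horizon=2000)
    optimistic = multi_sample_thompson(means, horizon=2000, n_samples=5, combine=np.max)
    averaged = multi_sample_thompson(means, horizon=2000, n_samples=5, combine=np.mean)
    print(standard[-1], optimistic[-1], averaged[-1])
```

Comparing the final regret values for different `n_samples` and `combine` choices gives a rough feel for how the number of posterior draws can shift the exploration-exploitation balance; the actual aggregation rules and regret bounds are given in the full text.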

research
01/21/2021

An empirical evaluation of active inference in multi-armed bandits

A key feature of sequential decision making under uncertainty is a need ...
research
10/29/2021

Variational Bayesian Optimistic Sampling

We consider online sequential decision problems where an agent must bala...
research
07/04/2023

Approximate information for efficient exploration-exploitation strategies

This paper addresses the exploration-exploitation dilemma inherent in de...
research
05/30/2022

Adaptive Learning for Discovery

In this paper, we study a sequential decision-making problem, called Ada...
research
10/08/2018

Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees

Sampling efficiency in a highly constrained environment has long been a ...
research
04/02/2021

Blind Exploration and Exploitation of Stochastic Experts

We present blind exploration and exploitation (BEE) algorithms for ident...
research
06/01/2023

Efficient Failure Pattern Identification of Predictive Algorithms

Given a (machine learning) classifier and a collection of unlabeled data...
