Approximate information for efficient exploration-exploitation strategies

07/04/2023
by   Alex Barbier-Chebbah, et al.
0

This paper addresses the exploration-exploitation dilemma inherent in decision-making, focusing on multi-armed bandit problems. The problems involve an agent deciding whether to exploit current knowledge for immediate gains or explore new avenues for potential long-term rewards. We here introduce a novel algorithm, approximate information maximization (AIM), which employs an analytical approximation of the entropy gradient to choose which arm to pull at each point in time. AIM matches the performance of Infomax and Thompson sampling while also offering enhanced computational speed, determinism, and tractability. Empirical evaluation of AIM indicates its compliance with the Lai-Robbins asymptotic bound and demonstrates its robustness for a range of priors. Its expression is tunable, which allows for specific optimization in various settings.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/16/2022

Thompson Sampling with Virtual Helping Agents

We address the problem of online sequential decision making, i.e., balan...
research
01/21/2021

An empirical evaluation of active inference in multi-armed bandits

A key feature of sequential decision making under uncertainty is a need ...
research
07/22/2012

Meta-Learning of Exploration/Exploitation Strategies: The Multi-Armed Bandit Case

The exploration/exploitation (E/E) dilemma arises naturally in many subf...
research
02/21/2020

Double Explore-then-Commit: Asymptotic Optimality and Beyond

We study the two-armed bandit problem with subGaussian rewards. The expl...
research
08/05/2017

Thompson Sampling Guided Stochastic Searching on the Line for Deceptive Environments with Applications to Root-Finding Problems

The multi-armed bandit problem forms the foundation for solving a wide r...
research
08/16/2017

Racing Thompson: an Efficient Algorithm for Thompson Sampling with Non-conjugate Priors

Thompson sampling has impressive empirical performance for many multi-ar...
research
04/14/2015

Harnessing Natural Fluctuations: Analogue Computer for Efficient Socially Maximal Decision Making

Each individual handles many tasks of finding the most profitable option...

Please sign up or login with your details

Forgot password? Click here to reset