
Blind Exploration and Exploitation of Stochastic Experts

04/02/2021
by Noyan C. Sevuktekin, et al.

We present blind exploration and exploitation (BEE) algorithms for identifying the most reliable stochastic expert based on formulations that employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler divergence, and minimax methods for the stochastic multi-armed bandit problem. Joint sampling and consultation of experts whose opinions depend on the hidden and random state of the world become challenging in the unsupervised, or blind, framework because feedback from the true state is not available. We propose an empirically realizable measure of expert competence that can be inferred instantaneously using only the opinions of other experts. This measure preserves the ordering of the true competences and thus enables joint sampling and consultation of stochastic experts based on their opinions on dynamically changing tasks. Statistics derived from the proposed measure are instantaneously available, allowing both blind exploration-exploitation and unsupervised opinion aggregation. We discuss how the lack of supervision affects the asymptotic regret of BEE architectures that rely on UCB1, KL-UCB, MOSS, IMED, and Thompson sampling. We demonstrate the performance of different BEE algorithms empirically and compare them to their standard, or supervised, counterparts.
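To make the blind-feedback idea concrete, the following is a minimal sketch, not the paper's implementation: a UCB1-style loop in which the reward for the selected expert is replaced by a hypothetical agreement-based pseudo-reward, namely the fraction of the other consulted experts whose binary opinions match the selected expert's. If the remaining experts are correct more often than not on average, the expected agreement is increasing in the selected expert's true competence, so the ordering of competences is preserved in the sense the abstract describes. All names, parameters, and the opinion model below are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

n_experts = 5
horizon = 2000
# Hypothetical true competences (probability each expert's binary opinion is correct).
competence = np.array([0.55, 0.60, 0.70, 0.80, 0.90])

def opinions(state):
    # Each expert independently reports the true binary state with its own accuracy.
    correct = rng.random(n_experts) < competence
    return np.where(correct, state, 1 - state)

counts = np.zeros(n_experts)        # times each expert was selected
agree_sums = np.zeros(n_experts)    # accumulated agreement pseudo-rewards

for t in range(1, horizon + 1):
    state = int(rng.integers(2))    # hidden state; never revealed to the learner
    ops = opinions(state)

    if t <= n_experts:
        arm = t - 1                 # initialization: select each expert once
    else:
        means = agree_sums / counts
        ucb = means + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))   # standard UCB1 index computed on pseudo-rewards

    # Blind pseudo-reward: agreement of the selected expert with its peers,
    # computable instantaneously without ever observing the true state.
    peers = np.delete(ops, arm)
    pseudo_reward = float(np.mean(peers == ops[arm]))

    counts[arm] += 1
    agree_sums[arm] += pseudo_reward

print("selections per expert:", counts.astype(int))

Swapping the UCB1 index for KL-UCB, MOSS, IMED, or a Thompson-sampling posterior over the agreement rate gives blind analogues in the spirit of the other BEE variants the abstract lists; only the index computation changes, while the agreement-based pseudo-reward stays the same.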

