Blind Exploration and Exploitation of Stochastic Experts

04/02/2021
by   Noyan C. Sevuktekin, et al.
0

We present blind exploration and exploitation (BEE) algorithms for identifying the most reliable stochastic expert based on formulations that employ posterior sampling, upper-confidence bounds, empirical Kullback-Leibler divergence, and minmax methods for the stochastic multi-armed bandit problem. Joint sampling and consultation of experts whose opinions depend on the hidden and random state of the world becomes challenging in the unsupervised, or blind, framework as feedback from the true state is not available. We propose an empirically realizable measure of expert competence that can be inferred instantaneously using only the opinions of other experts. This measure preserves the ordering of true competences and thus enables joint sampling and consultation of stochastic experts based on their opinions on dynamically changing tasks. Statistics derived from the proposed measure is instantaneously available allowing both blind exploration-exploitation and unsupervised opinion aggregation. We discuss how the lack of supervision affects the asymptotic regret of BEE architectures that rely on UCB1, KL-UCB, MOSS, IMED, and Thompson sampling. We demonstrate the performance of different BEE algorithms empirically and compare them to their standard, or supervised, counterparts.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/06/2021

Tuning Confidence Bound for Stochastic Bandits with Bandit Distance

We propose a novel modification of the standard upper confidence bound (...
research
09/16/2022

Thompson Sampling with Virtual Helping Agents

We address the problem of online sequential decision making, i.e., balan...
research
05/14/2021

Thompson Sampling for Gaussian Entropic Risk Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...
research
11/16/2020

Risk-Constrained Thompson Sampling for CVaR Bandits

The multi-armed bandit (MAB) problem is a ubiquitous decision-making pro...
research
02/23/2018

Contextual Bandits with Stochastic Experts

We consider the problem of contextual bandits with stochastic experts, w...
research
06/01/2023

Efficient Failure Pattern Identification of Predictive Algorithms

Given a (machine learning) classifier and a collection of unlabeled data...
research
10/08/2018

Balancing Global Exploration and Local-connectivity Exploitation with Rapidly-exploring Random disjointed-Trees

Sampling efficiency in a highly constrained environment has long been a ...

Please sign up or login with your details

Forgot password? Click here to reset