Thompson Sampling for Unsupervised Sequential Selection

09/16/2020
by   Arun Verma, et al.

Thompson Sampling has generated significant interest because it often performs better empirically than algorithms based on upper confidence bounds. In this paper, we study a Thompson Sampling based algorithm for the Unsupervised Sequential Selection (USS) problem. The USS problem is a variant of the stochastic multi-armed bandit problem in which the loss of an arm cannot be inferred from the observed feedback. In the USS setup, arms are associated with fixed costs and are ordered, forming a cascade. In each round, the learner selects an arm and observes the feedback from all arms up to and including the selected arm. The learner's goal is to find the arm that minimizes the expected total loss, which is the sum of the cost incurred for selecting the arm and the stochastic loss associated with that arm. The problem is challenging because, without knowing the mean loss, one cannot compute the total loss of the selected arm. Learning is therefore feasible only if the optimal arm can be inferred from the problem structure. As shown in prior work, learning is possible when the problem instance satisfies the so-called 'Weak Dominance' (WD) property. Under WD, we show that our Thompson Sampling based algorithm for the USS problem achieves near-optimal regret and has better numerical performance than existing algorithms.
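The loss structure described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it only shows how an oracle that knew the mean losses would pick the optimal arm, and which part of the cascade's feedback the learner sees in a round. The function names, the numbers, and the assumption that `costs[i]` is already the full cost incurred for selecting arm `i` are all hypothetical and chosen for illustration.

```python
from typing import List, Sequence


def expected_total_loss(mean_losses: Sequence[float],
                        costs: Sequence[float]) -> List[float]:
    """Expected total loss per arm: selection cost plus mean stochastic loss."""
    if len(mean_losses) != len(costs):
        raise ValueError("mean_losses and costs must have the same length")
    return [loss + cost for loss, cost in zip(mean_losses, costs)]


def oracle_optimal_arm(mean_losses: Sequence[float],
                       costs: Sequence[float]) -> int:
    """Index of the arm with the smallest expected total loss.

    Only an oracle can evaluate this: in USS the learner never observes
    the losses, so the mean losses are unknown to it.
    """
    totals = expected_total_loss(mean_losses, costs)
    return min(range(len(totals)), key=totals.__getitem__)


def observed_feedback(feedback: Sequence[int], selected: int) -> List[int]:
    """Feedback visible after selecting an arm: arms 0..selected inclusive."""
    if not 0 <= selected < len(feedback):
        raise IndexError("selected arm is out of range")
    return list(feedback[: selected + 1])


# Hypothetical three-arm cascade (illustrative values only).
mean_losses = [0.30, 0.20, 0.05]  # unknown to the learner in USS
costs = [0.00, 0.05, 0.30]        # known, fixed costs

best = oracle_optimal_arm(mean_losses, costs)
visible = observed_feedback([1, 0, 1], selected=1)
```

In this made-up instance the last arm has the lowest mean loss, but its cost makes the middle arm the better choice overall. The learner cannot reproduce this computation from feedback alone, which is why structural conditions such as Weak Dominance are needed.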


research
10/23/2020

Online Algorithm for Unsupervised Sequential Selection with Contextual Information

In this paper, we study Contextual Unsupervised Sequential Selection (US...
research
09/04/2019

Censored Semi-Bandits: A Framework for Resource Allocation with Censored Feedback

In this paper, we study Censored Semi-Bandits, a novel variant of the se...
research
05/10/2023

Best Arm Identification in Bandits with Limited Precision Sampling

We study best arm identification in a variant of the multi-armed bandit ...
research
12/22/2022

Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/rewar...
research
12/22/2022

Synopsis: Sequential Decision Problems with Weak Feedback

This thesis considers sequential decision problems, where the loss/rewar...
research
04/12/2021

Censored Semi-Bandits for Resource Allocation

We consider the problem of sequentially allocating resources in a censor...
research
01/15/2019

Online Algorithm for Unsupervised Sensor Selection

In many security and healthcare systems, the detection and diagnosis sys...
