Thompson Sampling for Multinomial Logit Contextual Bandits

06/07/2020
by Min-hwan Oh et al.

We consider a dynamic assortment selection problem where the goal is to offer a sequence of assortments that maximizes the expected cumulative revenue or, alternatively, minimizes the expected regret. The feedback in each round is the item that the user picks from the offered assortment, and the distinguishing feature of this work is that this feedback follows a multinomial logit (MNL) choice model. The utility of each item is a dynamic function of contextual information of both the item and the user. We propose two Thompson sampling algorithms for this multinomial logit contextual bandit. Our first algorithm maintains a posterior distribution over the true parameter and achieves O(d√T) Bayesian regret over T rounds with a d-dimensional context vector. The worst-case computational complexity of this algorithm can be high when the prior distribution is not conjugate. The second algorithm approximates the posterior by a Gaussian distribution and uses a new optimistic sampling procedure to address the issues that arise in the worst-case regret analysis; it achieves an O(d^(3/2)√T) worst-case (frequentist) regret bound. Numerical experiments show that the practical performance of both methods is in line with the theoretical guarantees.
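The second algorithm's core loop can be illustrated with a short sketch: sample a parameter from a Gaussian posterior approximation, score items under the sampled parameter, and offer an assortment. This is a minimal illustration, not the paper's exact method; in particular, the top-K selection below is a greedy surrogate for the exact MNL assortment optimization, and the function names and the fixed posterior are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def mnl_choice_probs(assortment, X, theta):
    """Multinomial logit choice probabilities over an offered assortment.
    Includes a no-purchase option with utility 0, so probabilities of the
    offered items sum to less than 1."""
    utils = X[assortment] @ theta          # utility of each offered item
    expu = np.exp(utils)
    denom = 1.0 + expu.sum()               # "+1" is the no-purchase option
    return expu / denom

def ts_round(X, revenues, post_mean, post_cov, K):
    """One Thompson-sampling round with a Gaussian posterior approximation:
    sample a parameter, then offer the K items with the highest sampled
    revenue-weighted attraction (a greedy stand-in for the exact
    assortment optimization)."""
    theta_tilde = rng.multivariate_normal(post_mean, post_cov)
    scores = revenues * np.exp(X @ theta_tilde)
    return np.argsort(scores)[-K:]

# Hypothetical problem instance for illustration.
d, n, K = 3, 10, 4
X = rng.standard_normal((n, d))            # item/user context vectors
revenues = rng.uniform(0.5, 1.5, size=n)   # per-item revenues
assortment = ts_round(X, revenues, np.zeros(d), np.eye(d), K)
probs = mnl_choice_probs(assortment, X, np.zeros(d))
```

In a full implementation, the observed choice would be used to update `post_mean` and `post_cov` (e.g., via an online Newton or Laplace-style update), and the paper's optimistic sampling would draw multiple samples to inflate the scores.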


Related research

03/25/2021 · Multinomial Logit Contextual Bandits: Provable Optimality and Practicality
We consider a sequential assortment selection problem where the user cho...

05/31/2023 · Combinatorial Neural Bandits
We consider a contextual combinatorial bandit problem where in each roun...

02/09/2021 · Robust Bandit Learning with Imperfect Context
A standard assumption in contextual multi-arm bandit is that the true co...

05/12/2019 · On the Performance of Thompson Sampling on Logistic Bandits
We study the logistic bandit, in which rewards are binary with success p...

11/25/2022 · On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks
In the problem of (binary) contextual bandits with knapsacks (CBwK), the...

05/01/2023 · First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
We consider the adversarial linear contextual bandit setting, which allo...

10/02/2018 · Thompson Sampling for Cascading Bandits
We design and analyze TS-Cascade, a Thompson sampling algorithm for the ...
