Improved Optimistic Algorithm For The Multinomial Logit Contextual Bandit

11/28/2020
by Priyank Agrawal, et al.

We consider a dynamic assortment selection problem where the goal is to offer a sequence of assortments of cardinality at most K, out of N items, to minimize the expected cumulative regret (loss of revenue). The feedback is given by a multinomial logit (MNL) choice model. This sequential decision-making problem is studied under the MNL contextual bandit framework. The existing algorithms for the MNL contextual bandit have frequentist regret guarantees of order Õ(κ√T), where κ is an instance-dependent constant. κ can be arbitrarily large, e.g., exponential in the model parameters, which makes the existing regret guarantees substantially loose. We propose an optimistic algorithm with a carefully designed exploration bonus term and show that it enjoys Õ(√T) regret. In our bounds, the κ factor only affects the poly-log term and not the leading term of the regret bounds.
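The MNL feedback and the revenue-based regret described above can be sketched as follows. This is a minimal illustration assuming the standard MNL contextual-bandit formulation: item i has a feature vector x_i, its utility is x_iᵀθ for an unknown parameter θ, the no-purchase option has utility 0, and r_i is item i's revenue. All function names are hypothetical. The brute-force search over assortments only serves to define the benchmark S* used in the regret; it is not the paper's optimistic algorithm, which works with estimates of θ plus an exploration bonus.

```python
import itertools
import math
from typing import Sequence


def utilities_from_features(features: Sequence[Sequence[float]],
                            theta: Sequence[float]) -> list[float]:
    """Hypothetical helper: utility of each item is the dot product x_i^T theta."""
    return [sum(a * b for a, b in zip(x, theta)) for x in features]


def mnl_choice_probs(utilities: Sequence[float]) -> list[float]:
    """Return [P(no purchase), P(item 1), ..., P(item n)] for the offered items.

    Assumes the outside (no-purchase) option has utility 0, so its weight is 1.
    """
    weights = [math.exp(u) for u in utilities]
    denom = 1.0 + sum(weights)
    return [1.0 / denom] + [w / denom for w in weights]


def expected_revenue(assortment: Sequence[int],
                     utilities: Sequence[float],
                     revenues: Sequence[float]) -> float:
    """Expected revenue R(S) = sum_{i in S} r_i * P(i | S) under the MNL model."""
    probs = mnl_choice_probs([utilities[i] for i in assortment])
    # probs[0] is the no-purchase probability, which earns no revenue.
    return sum(revenues[i] * p for i, p in zip(assortment, probs[1:]))


def best_assortment(utilities: Sequence[float],
                    revenues: Sequence[float],
                    k: int) -> tuple[tuple[int, ...], float]:
    """Exhaustively search all non-empty assortments of size at most k.

    Exponential in k; used here only to compute the benchmark S* for regret.
    """
    best_set: tuple[int, ...] = ()
    best_val = 0.0
    for size in range(1, k + 1):
        for s in itertools.combinations(range(len(utilities)), size):
            val = expected_revenue(s, utilities, revenues)
            if val > best_val:
                best_set, best_val = s, val
    return best_set, best_val


if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    theta = [0.2, -0.1]  # assumed "true" parameter, unknown to the learner
    revenues = [1.0, 2.0, 1.5]
    u = utilities_from_features(feats, theta)
    s_star, r_star = best_assortment(u, revenues, k=2)
    offered = (0,)  # an arbitrary assortment a learner might offer in one round
    # Per-round regret is the revenue gap to the optimal assortment S*.
    print(s_star, r_star - expected_revenue(offered, u, revenues))
```

Summing this per-round gap over T rounds gives the cumulative regret that the Õ(κ√T) and Õ(√T) bounds refer to.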



research
02/18/2020

Improved Optimistic Algorithms for Logistic Bandits

The generalized linear bandit framework has attracted a lot of attention...
research
11/19/2020

Fully Gap-Dependent Bounds for Multinomial Logit Bandit

We study the multinomial logit (MNL) bandit problem, where at each time ...
research
06/02/2021

MNL-Bandit with Knapsacks

We consider a dynamic assortment selection problem where a seller has a ...
research
11/13/2019

Context-aware Dynamic Assets Selection for Online Portfolio Selection based on Contextual Bandit

Online portfolio selection is a sequential decision-making problem in fi...
research
08/06/2021

Joint AP Probing and Scheduling: A Contextual Bandit Approach

We consider a set of APs with unknown data rates that cooperatively serv...
research
10/18/2016

Dynamic Assortment Personalization in High Dimensions

We study the problem of dynamic assortment personalization with large, h...
research
05/08/2018

Multinomial Logit Bandit with Linear Utility Functions

Multinomial logit bandit is a sequential subset selection problem which ...
