Multinomial Logit Contextual Bandits: Provable Optimality and Practicality

03/25/2021
∙
by Min-hwan Oh, et al.

We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes d-dimensional contextual information about the user and the N available items, offers an assortment of size K to the user, and observes bandit feedback in the form of the item chosen from the assortment. We propose upper confidence bound based algorithms for this MNL contextual bandit. The first algorithm is a simple and practical method that achieves Õ(d√T) regret over T rounds. Next, we propose a second algorithm that achieves Õ(√(dT)) regret. This matches the lower bound for the MNL bandit problem, up to logarithmic terms, and improves on the best known result by a √d factor. To establish this sharper regret bound, we present a non-asymptotic confidence bound for the maximum likelihood estimator of the MNL model, which may be of independent interest as a theoretical contribution in its own right. We then revisit the simpler and significantly more practical first algorithm and show that a simple variant of it achieves the optimal regret for a broad class of important applications.
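To make the feedback model concrete, here is a minimal Python sketch of how a user's choice could be simulated under an MNL model. It rests on assumptions the abstract does not state: utilities are taken to be linear in the item features (x·θ), and an outside "no choice" option with utility 0 is included, as is common in MNL formulations. The names `mnl_choice_probs` and `sample_choice` are hypothetical and chosen only for illustration; this is not the authors' algorithm, only the choice model that such an algorithm would receive feedback from.

```python
import math
import random


def mnl_choice_probs(features, theta):
    """Choice probabilities for an offered assortment under an assumed MNL model.

    features: list of K feature vectors (one per offered item), each of length d.
    theta:    parameter vector of length d (unknown to the learner in the bandit setting).
    Returns K + 1 probabilities; the last entry is the assumed "no choice" option.
    """
    # Assumed linear utility: u_i = x_i . theta
    utilities = [sum(x_k * t_k for x_k, t_k in zip(x, theta)) for x in features]
    # Subtract a common shift for numerical stability; 0.0 is the outside option's utility.
    shift = max(0.0, max(utilities))
    weights = [math.exp(u - shift) for u in utilities]
    outside = math.exp(-shift)
    total = outside + sum(weights)
    return [w / total for w in weights] + [outside / total]


def sample_choice(features, theta, rng):
    """Draw the index of the chosen item; index len(features) means nothing was chosen."""
    probs = mnl_choice_probs(features, theta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    assortment = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # K = 3 items, d = 2 (made-up values)
    theta_true = [0.8, -0.3]                           # made-up parameter for illustration
    print(mnl_choice_probs(assortment, theta_true))
    print(sample_choice(assortment, theta_true, rng))
```

In the bandit setting described above, the learner would only see the sampled index for the assortment it offered, never the probabilities themselves, which is why the parameters have to be estimated from repeated rounds.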


