Multinomial Logit Bandit with Linear Utility Functions

05/08/2018
by Mingdong Ou, et al.

Multinomial logit bandit is a sequential subset selection problem that arises in many applications. In each round, the player selects a K-cardinality subset from N candidate items and receives a reward governed by a multinomial logit (MNL) choice model, which accounts for both item utility and the substitution property among items. The player's objective is to dynamically learn the parameters of the MNL model and maximize the cumulative reward over a finite horizon T. The problem faces the exploration-exploitation dilemma, and its combinatorial nature makes it non-trivial. In recent years, several algorithms have been developed that exploit specific characteristics of the MNL model, but all of them estimate the parameters of the MNL model separately and incur a regret no better than Õ(√(NT)), which is undesirable when the candidate set size N is large. In this paper, we consider the linear utility MNL choice model, whose item utilities are represented as linear functions of d-dimensional item features, and propose an algorithm, titled LUMB, to exploit this underlying structure. We prove that the proposed algorithm achieves Õ(dK√(T)) regret, which is independent of the candidate set size. Experiments show the superiority of the proposed algorithm.
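
As a rough illustrative sketch (not the paper's code), the linear-utility MNL choice model described above can be written as follows. The parameter vector theta, the feature matrix X, and the explicit no-purchase option are conventions common in MNL bandit work and may differ in detail from the paper's exact formulation.

    import numpy as np

    def mnl_choice_probs(theta, X, S):
        # MNL choice probabilities over an offered subset S (indices into X).
        # Utilities are linear in the d-dimensional item features: u_i = x_i @ theta.
        # The leading 1 in the denominator models the usual "no purchase" option
        # (an assumption of this sketch; the paper's model may differ).
        utilities = X[S] @ theta              # linear utilities of offered items
        weights = np.exp(utilities)           # MNL preference weights
        denom = 1.0 + weights.sum()           # include the no-purchase alternative
        probs = weights / denom
        return probs, 1.0 - probs.sum()       # per-item probabilities, P(no purchase)

    # Toy usage: N=6 items with d=3 features, offering a K=3 subset.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(6, 3))               # hypothetical item features
    theta = rng.normal(size=3)                # hypothetical utility parameters
    S = [0, 2, 5]                             # offered K-cardinality subset
    probs, p_none = mnl_choice_probs(theta, X, S)
    print(probs, p_none)

A bandit algorithm in this setting would repeatedly pick a subset S, observe which item (if any) the user chooses under these probabilities, and update its estimate of theta.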

Related research

03/25/2021  Multinomial Logit Contextual Bandits: Provable Optimality and Practicality
We consider a sequential assortment selection problem where the user cho...

09/28/2022  Online Subset Selection using α-Core with no Augmented Regret
We consider the problem of sequential sparse subset selections in an onl...

11/28/2020  Improved Optimistic Algorithm For The Multinomial Logit Contextual Bandit
We consider a dynamic assortment selection problem where the goal is to ...

12/12/2018  On Distributed Multi-player Multiarmed Bandit Problems in Abruptly Changing Environment
We study the multi-player stochastic multiarmed bandit (MAB) problem in ...

06/05/2020  Rate-adaptive model selection over a collection of black-box contextual bandit algorithms
We consider the model selection task in the stochastic contextual bandit...

02/22/2022  No-Regret Learning in Partially-Informed Auctions
Auctions with partially-revealed information about items are broadly emp...

10/30/2015  CONQUER: Confusion Queried Online Bandit Learning
We present a new recommendation setting for picking out two items from a...
