Instance-Wise Minimax-Optimal Algorithms for Logistic Bandits

by Marc Abeille, et al.

Logistic Bandits have recently attracted substantial attention, as they provide an uncluttered yet challenging framework for understanding the impact of non-linearity in parametrized bandits. Faury et al. (2020) showed that the learning-theoretic difficulties of Logistic Bandits can be embodied by a large (sometimes prohibitively large) problem-dependent constant κ, which characterizes the magnitude of the reward's non-linearity. In this paper we introduce a novel algorithm together with a refined analysis. This analysis better characterizes the effect of non-linearity and yields improved problem-dependent guarantees. In the most favorable cases, it leads to a regret upper bound scaling as 𝒪̃(d√(T/κ)), which dramatically improves over the state-of-the-art 𝒪̃(d√T + κ) guarantees. We prove that this rate is minimax-optimal by deriving an Ω(d√(T/κ)) problem-dependent lower bound. Our analysis identifies two regimes of the regret (permanent and transitory), which ultimately reconciles Faury et al. (2020) with the Bayesian approach of Dong et al. (2019). In contrast to previous works, we find that in the permanent regime, non-linearity can dramatically ease the exploration-exploitation trade-off. While non-linearity also affects the length of the transitory phase in a problem-dependent fashion, we show that this effect is mild in most reasonable configurations.
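To give a feel for why the two rates above differ so much, the sketch below computes an illustrative non-linearity constant for the logistic link μ(z) = 1/(1 + e^(−z)) and compares the two regret scalings numerically. As a simplifying assumption, κ is taken here as 1/μ̇(z) evaluated at a single score z, where μ̇(z) = μ(z)(1 − μ(z)); the paper's exact definition of κ may differ. Constants and the logarithmic factors hidden by 𝒪̃ are dropped, and the values of d, T and z are arbitrary choices for illustration only.

```python
import math


def mu(z: float) -> float:
    """Logistic (sigmoid) link function."""
    return 1.0 / (1.0 + math.exp(-z))


def mu_dot(z: float) -> float:
    """Derivative of the logistic link: mu(z) * (1 - mu(z))."""
    m = mu(z)
    return m * (1.0 - m)


def kappa(z: float) -> float:
    """Illustrative non-linearity constant 1 / mu_dot(z) at a single score z.

    Assumption for illustration: the paper's kappa is problem-dependent and
    may be defined differently (e.g. as a supremum over actions/parameters).
    """
    return 1.0 / mu_dot(z)


def rate_new(d: int, T: int, k: float) -> float:
    """Scaling d * sqrt(T / kappa), constants and log factors dropped."""
    return d * math.sqrt(T / k)


def rate_old(d: int, T: int, k: float) -> float:
    """Scaling d * sqrt(T) + kappa, constants and log factors dropped."""
    return d * math.sqrt(T) + k


if __name__ == "__main__":
    d, T = 10, 1_000_000  # hypothetical dimension and horizon
    for z in (0.0, 2.0, 5.0, 10.0):
        k = kappa(z)
        print(
            f"z={z:5.1f}  kappa={k:10.1f}  "
            f"d*sqrt(T/kappa)={rate_new(d, T, k):10.1f}  "
            f"d*sqrt(T)+kappa={rate_old(d, T, k):10.1f}"
        )
```

Since μ̇(z) decays roughly like e^(−|z|), this illustrative κ grows exponentially as rewards saturate: at z = 0 it equals 4, while at z = 10 it exceeds 20,000. In that saturated regime the additive κ term inflates the older scaling, whereas dividing by κ inside the square root shrinks the newer one.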




An Experimental Design Approach for Regret Minimization in Logistic Bandits

In this work we consider the problem of regret minimization for logistic...

Solving Bernoulli Rank-One Bandits with Unimodal Thompson Sampling

Stochastic Rank-One Bandits (Katarya et al, (2017a,b)) are a simple fram...

Jointly Efficient and Optimal Algorithms for Logistic Bandits

Logistic Bandits have recently undergone careful scrutiny by virtue of t...

Improved Confidence Bounds for the Linear Logistic Model and Applications to Linear Bandits

We propose improved fixed-design confidence bounds for the linear logist...

Asymptotic Instance-Optimal Algorithms for Interactive Decision Making

Past research on interactive decision making problems (bandits, reinforc...

Last Switch Dependent Bandits with Monotone Payoff Functions

In a recent work, Laforgue et al. introduce the model of last switch dep...

On the Minimax Regret for Linear Bandits in a wide variety of Action Spaces

As noted in the works of <cit.>, it has been mentioned that it is an ope...
