Contextual Recommendations and Low-Regret Cutting-Plane Algorithms

06/09/2021
by Sreenivas Gollapudi, et al.

We consider the following variant of contextual linear bandits, motivated by routing applications in navigational engines and recommendation systems. We wish to learn a hidden d-dimensional value w^*. Every round, we are presented with a subset 𝒳_t ⊆ ℝ^d of possible actions. If we choose (i.e., recommend to the user) action x_t, we obtain utility ⟨x_t, w^*⟩ but only learn the identity of the best action, arg max_{x ∈ 𝒳_t} ⟨x, w^*⟩. We design algorithms for this problem which achieve regret O(d log T), as well as regret exp(O(d log d)) that does not depend on T. To accomplish this, we design novel cutting-plane algorithms with low "regret": the total distance between the true point w^* and the hyperplanes the separation oracle returns. We also consider the variant where we are allowed to provide a list of several recommendations. In this variant, we give an algorithm with O(d^2 log d) regret and list size poly(d). Finally, we construct nearly tight algorithms for a weaker variant of this problem where the learner only learns the identity of an action that is better than the recommendation. Our results rely on new algorithmic techniques in convex geometry (including a variant of Steiner's formula for the centroid of a convex set) which may be of independent interest.
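The following Python sketch is a rough illustration of the interaction protocol described above. It is not the paper's algorithm. The learner is a hypothetical stand-in that keeps a finite sample of candidate vectors consistent with all feedback so far and recommends the action that is best for their centroid. When the learner observes the best action x*, every other x ∈ 𝒳_t yields the halfspace ⟨x* − x, w⟩ ≥ 0, and candidates outside it are cut away. The names `run`, `random_unit`, `centroid`, the parameters `n_candidates` and `actions_per_round`, and the choice to draw actions uniformly from the unit sphere are all assumptions made for illustration.

```python
import random

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def random_unit(d, rng):
    """Sample a uniformly random direction in R^d (Gaussian, then normalize)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def centroid(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def run(w_star, d, rounds, actions_per_round, n_candidates, seed=0):
    """Simulate argmax feedback; return (total regret, surviving candidates).

    Hypothetical learner: a finite sample of candidate hidden vectors stands in
    for the consistent convex region, and we recommend the action that is best
    for the sample's centroid.
    """
    rng = random.Random(seed)
    # Include w_star itself so the consistent set can never become empty.
    candidates = [random_unit(d, rng) for _ in range(n_candidates)] + [list(w_star)]
    total_regret = 0.0
    for _ in range(rounds):
        actions = [random_unit(d, rng) for _ in range(actions_per_round)]
        guess = centroid(candidates)
        chosen = max(actions, key=lambda x: dot(x, guess))
        best = max(actions, key=lambda x: dot(x, w_star))
        # Regret of this round: utility of the best action minus ours (>= 0).
        total_regret += dot(best, w_star) - dot(chosen, w_star)
        # Only the identity of `best` is revealed; keep candidates w satisfying
        # <best, w> >= <x, w> for every offered action x (the halfspace cuts).
        candidates = [
            w for w in candidates
            if all(dot(best, w) >= dot(x, w) for x in actions)
        ]
    return total_regret, candidates
```

For example, `run(random_unit(3, random.Random(42)), d=3, rounds=200, actions_per_round=5, n_candidates=2000)` returns the accumulated regret together with the candidates that survive every cut. Recommending against the centroid loosely echoes the abstract's use of centroids of convex sets, but a finite sample is only a crude approximation of the consistent region, so this sketch carries none of the stated regret guarantees.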

09/25/2018

Contextual Bandits with Cross-learning

In the classical contextual bandits problem, in each round t, a learner ...
07/04/2020

Linear Bandits with Limited Adaptivity and Learning Distributional Optimal Design

Motivated by practical needs such as large-scale learning, we study the ...
06/13/2011

Efficient Optimal Learning for Contextual Bandits

We address the problem of learning in an online setting where the learne...
02/23/2020

Survey Bandits with Regret Guarantees

We consider a variant of the contextual bandit problem. In standard cont...
02/01/2020

Efficient and Robust Algorithms for Adversarial Linear Contextual Bandits

We consider an adversarial variant of the classic K-armed linear context...
07/11/2016

Kernel-based methods for bandit convex optimization

We consider the adversarial convex bandit problem and we build the first...
