Adversarial Linear Contextual Bandits with Graph-Structured Side Observations

12/10/2020
by Lingda Wang, et al.

This paper studies adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverages two of the most common categories of side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of that action's neighbors in the observation structures, which are encoded as a sequence of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, EXP3-LGC-U, achieves regret of order 𝒪(√((K+α(G)d) T log K)) over the time horizon T, where α(G) is the average independence number of the feedback graphs. A slightly weaker result is also presented for the directed graph setting. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems, for which the regret is reduced to 𝒪(√(α(G) d T log K log(KT))) for both directed and undirected feedback graphs. Numerical tests corroborate the efficiency of the proposed algorithms.
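
To make the feedback model concrete, the following is a minimal sketch of one round of a generic Exp3-style update with graph side observations. It is not the paper's EXP3-LGC-U or EXP3-LGC-IX: it ignores the context vector and the linear loss structure, and the function name `exp3_graph_round`, the `adjacency` matrix convention, and the learning rate `eta` are assumptions made only for illustration.

```python
import numpy as np

def exp3_graph_round(weights, adjacency, losses, eta, rng):
    """One round of an Exp3-style update with graph side observations.

    Hypothetical helper for illustration only.
    adjacency[j, i] != 0 means that playing action j also reveals the loss of i.
    """
    num_actions = len(weights)
    probs = weights / weights.sum()
    action = rng.choice(num_actions, p=probs)

    # Playing j always reveals j itself, plus its neighbors in the graph.
    reveal = (adjacency != 0) | np.eye(num_actions, dtype=bool)
    observed = reveal[action]

    # Probability that the loss of each action i is observed this round:
    # the total probability of all actions j whose play reveals i.
    obs_prob = probs @ reveal.astype(float)

    # Importance-weighted loss estimates: unbiased for every action,
    # since obs_prob[i] >= probs[i] > 0.
    loss_est = np.where(observed, losses / obs_prob, 0.0)

    new_weights = weights * np.exp(-eta * loss_est)
    return action, new_weights

rng = np.random.default_rng(0)
K = 3
complete_graph = np.ones((K, K)) - np.eye(K)
losses = np.array([0.0, 1.0, 0.5])
action, weights = exp3_graph_round(np.ones(K), complete_graph, losses, 0.5, rng)
```

With a complete feedback graph every loss is observed with probability one, so the update reduces to full-information exponential weights; with an empty graph it reduces to standard bandit Exp3. Denser graphs lower the variance of the loss estimates, which is the intuition behind regret bounds that scale with the independence number rather than with K.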

research
05/23/2018

Analysis of Thompson Sampling for Graphical Bandits Without the Graphs

We study multi-armed bandit problems with graph feedback, in which the d...
research
07/17/2013

From Bandits to Experts: A Tale of Domination and Independence

We consider the partial observability model for multi-armed bandits, int...
research
09/30/2014

Nonstochastic Multi-Armed Bandits with Graph-Structured Feedback

We present and study a partial-information model of online learning, whe...
research
02/02/2022

Efficient Algorithms for Learning to Control Bandits with Unobserved Contexts

Contextual bandits are widely-used in the study of learning-based contro...
research
06/13/2011

From Bandits to Experts: On the Value of Side-Observations

We consider an adversarial online learning setting where a decision make...
research
06/16/2022

Simultaneously Learning Stochastic and Adversarial Bandits with General Graph Feedback

The problem of online learning with graph feedback has been extensively ...
research
11/25/2022

On the Re-Solving Heuristic for (Binary) Contextual Bandits with Knapsacks

In the problem of (binary) contextual bandits with knapsacks (CBwK), the...
