Graphical Models for Bandit Problems

02/14/2012
by Kareem Amin et al.

We introduce a rich class of graphical models for multi-armed bandit problems that permit both the state or context space and the action space to be very large, yet succinctly specify the payoffs for any context-action pair. Our main result is an algorithm for such models whose regret is bounded by the number of parameters and whose running time depends only on the treewidth of the graph substructure induced by the action space.
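To make the idea of a succinctly specified payoff concrete, here is a toy sketch (not the paper's algorithm, and with hypothetical structure and names): the expected payoff of a context-action pair factors as a sum of local potentials, one per edge of a small action graph, so the full context-action payoff table never has to be written out. The chain graph, the binary variables, and the random potentials are all assumptions for illustration.

```python
import itertools
import random

random.seed(0)

# Hypothetical structure: 3 binary action variables chained a0 - a1 - a2
# (a tree, so treewidth 1). Contexts are a single binary variable x.
EDGES = [(0, 1), (1, 2)]

# Local potentials theta[edge][(x, ai, aj)] are the only model parameters:
# 2 edges * 8 entries each, rather than one value per context-action pair.
theta = {e: {(x, ai, aj): random.random()
             for x in (0, 1) for ai in (0, 1) for aj in (0, 1)}
         for e in EDGES}

def payoff(x, a):
    """Expected payoff of joint action a in context x: a sum of edge potentials."""
    return sum(theta[(i, j)][(x, a[i], a[j])] for (i, j) in EDGES)

def best_action(x):
    """Brute-force argmax over the 2^3 joint actions. Fine at this toy size;
    the point of the paper's treewidth dependence is that dynamic programming
    over the graph avoids this exhaustive enumeration for large action spaces."""
    return max(itertools.product((0, 1), repeat=3), key=lambda a: payoff(x, a))
```

Even in this toy form, the factorization is what keeps the parameter count small while the joint action space grows exponentially in the number of action variables.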
