Markov Game with Switching Costs

07/13/2021
by Jian Li, et al.

We study a general Markov game with metric switching costs: in each round, the player adaptively chooses one of several Markov chains to advance, with the objective of minimizing the expected cost for at least k chains to reach their target states. If the player decides to play a different chain, an additional switching cost is incurred. The special case with no switching cost was solved optimally by Dumitriu, Tetali, and Winkler [DTW03] via a variant of the celebrated Gittins index for the classical multi-armed bandit (MAB) problem with Markovian rewards [Gittins74, Gittins79]. However, for MAB with a nontrivial switching cost, even a constant one, the classic paper by Banks and Sundaram [BS94] showed that no index strategy can be optimal. In this paper, we complement their result and show that a simple index strategy achieves a constant approximation factor when the switching cost is constant and k=1. To the best of our knowledge, this is the first index strategy that achieves a constant approximation factor for a general MAB variant with switching costs. For general metric switching costs, we propose a more involved constant-factor approximation algorithm via a nontrivial reduction to the stochastic k-TSP problem, in which each Markov chain is approximated by a random variable. Our analysis makes extensive use of various interesting properties of the Gittins index.
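To illustrate the kind of policy the abstract describes, here is a minimal toy sketch (not the paper's algorithm): several chains must be advanced one play per round, the goal is for one chain to reach its target (k=1), and an index-style rule only switches away from the current chain when another chain's index beats it by more than the constant switching cost. The `ToyChain` class, its expected-remaining-plays "index", and all parameters are hypothetical stand-ins for the paper's Markov chains and Gittins index.

```python
import random

class ToyChain:
    """Hypothetical chain: a biased walk from `start` down to 0 (the target)."""

    def __init__(self, start, p_advance):
        self.state = start   # distance to the target state; 0 means done
        self.p = p_advance   # probability a single play moves one step closer

    def index(self):
        # Expected remaining plays if we committed to this chain forever;
        # a crude proxy for a Gittins-style index (lower is better).
        return self.state / self.p

    def play(self):
        if random.random() < self.p:
            self.state -= 1
        return self.state == 0

def run(chains, switch_cost, seed=0):
    """Play until some chain reaches its target (k=1); return total cost."""
    random.seed(seed)
    cost, current = 0, None
    while True:
        # Best chain by index, but switch only if the improvement over the
        # currently played chain exceeds the switching cost.
        best = min(range(len(chains)), key=lambda i: chains[i].index())
        if current is None:
            current = best
        elif best != current and \
                chains[current].index() - chains[best].index() > switch_cost:
            current = best
            cost += switch_cost
        cost += 1  # one unit of cost per play
        if chains[current].play():
            return cost

chains = [ToyChain(5, 0.9), ToyChain(3, 0.4)]
total = run(chains, switch_cost=2)
```

This hysteresis rule (demand a gap larger than the switching cost before moving) is only one natural heuristic; the paper's actual index strategy and its constant-factor guarantee rest on properties of the true Gittins index, not this proxy.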

