
Learning to Detect an Odd Markov Arm
A multi-armed bandit with finitely many arms is studied when each arm is...

Detecting an Odd Restless Markov Arm with a Trembling Hand
In this paper, we consider a multi-armed bandit in which each arm is a M...

Regional Multi-Armed Bandits
We consider a variant of the classic multi-armed bandit problem where th...

Optimal Odd Arm Identification with Fixed Confidence
The problem of detecting an odd arm from a set of K arms of a multi-arme...

A Bad Arm Existence Checking Problem
We study a bad arm existence checking problem in which a player's task is...

Sequential Multi-hypothesis Testing in Multi-armed Bandit Problems: An Approach for Asymptotic Optimality
We consider a multi-hypothesis testing problem involving a K-armed bandi...

A Note on the Equivalence of Upper Confidence Bounds and Gittins Indices for Patient Agents
This note gives a short, self-contained proof of a sharp connection bet...
Learning to Detect an Odd Restless Markov Arm with a Trembling Hand
This paper studies the problem of finding an anomalous arm in a multi-armed bandit when (a) each arm is a finite-state Markov process, and (b) the arms are restless. Here, anomaly means that the transition probability matrix (TPM) of one of the arms (the odd arm) is different from the common TPM of each of the non-odd arms. The TPMs are unknown to a decision entity that wishes to find the index of the odd arm as quickly as possible, subject to an upper bound on the error probability. We derive a problem instance-specific asymptotic lower bound on the expected time required to find the odd arm index, where the asymptotics are taken as the error probability vanishes. Further, we devise a policy based on the principle of certainty equivalence, and demonstrate that under a continuous selection assumption and a certain regularity assumption on the TPMs, the policy achieves the lower bound arbitrarily closely. Thus, while the lower bound holds for all problem instances, the upper bound is shown only for those problem instances satisfying the continuous selection and regularity assumptions. Our achievability analysis is based on resolving the identifiability problem in the context of a certain lifted countable-state controlled Markov process.
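To make the setting concrete, the following is a minimal simulation sketch of the observation model only, not the paper's policy. It relies on assumptions that are not stated in the abstract: the "trembling hand" is modeled as pulling a uniformly random arm with probability `eta` instead of the intended arm, the intended arm is chosen by a placeholder round-robin rule, and all names (`step`, `simulate`, `eta`, the example TPMs) are hypothetical.

```python
import random


def step(state, tpm, rng):
    """Sample the next state of a finite-state Markov chain from row `state` of `tpm`."""
    return rng.choices(range(len(tpm)), weights=tpm[state])[0]


def simulate(common_tpm, odd_tpm, odd_arm, num_arms, horizon, eta=0.1, seed=0):
    """Simulate restless Markov arms observed through a trembling hand.

    Returns a list of (intended_arm, pulled_arm, observed_state) tuples.
    """
    rng = random.Random(seed)
    # One arm follows the odd TPM; every other arm follows the common TPM.
    tpms = [odd_tpm if a == odd_arm else common_tpm for a in range(num_arms)]
    states = [0] * num_arms  # assumed initial state for every arm
    history = []
    for t in range(horizon):
        # Placeholder sampling rule (round-robin), NOT the certainty-equivalence policy.
        intended = t % num_arms
        # Assumed trembling-hand model: with probability eta a uniformly random arm is pulled.
        pulled = rng.randrange(num_arms) if rng.random() < eta else intended
        # Restless arms: every arm makes a transition, whether or not it is pulled.
        states = [step(s, tpms[a], rng) for a, s in enumerate(states)]
        # Only the state of the arm actually pulled is observed.
        history.append((intended, pulled, states[pulled]))
    return history


# Example with two-state arms; arm 2 is the odd arm.
common = [[0.9, 0.1], [0.2, 0.8]]
odd = [[0.5, 0.5], [0.5, 0.5]]
history = simulate(common, odd, odd_arm=2, num_arms=4, horizon=10)
```

Because the arms are restless, the states of unobserved arms keep evolving between pulls, which is what distinguishes this setting from the rested case where only the pulled arm transitions.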