On State Variables, Bandit Problems and POMDPs

02/14/2020
by Warren B. Powell, et al.

State variables are easily the most subtle dimension of sequential decision problems. This is especially true in the context of active learning problems ("bandit problems"), where decisions affect what we observe and learn. We describe our canonical framework that models any sequential decision problem, and present our definition of state variables, which allows us to claim: any properly modeled sequential decision problem is Markovian. We then present a novel two-agent perspective of partially observable Markov decision problems (POMDPs) that allows us to claim: any model of a real decision problem is (possibly) non-Markovian. We illustrate these perspectives using the context of observing and treating flu in a population, and provide examples of all four classes of policies in this setting. We close with an indication of how to extend this thinking to multiagent problems.
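To make the canonical framework concrete, here is a minimal sketch of its elements: a state \(S_t\), a decision \(x_t\) chosen by a policy \(X^\pi(S_t)\), exogenous information \(W_{t+1}\), and a transition function \(S_{t+1} = S^M(S_t, x_t, W_{t+1})\). The flu-treatment numbers and the specific rule-based policy below are hypothetical illustrations, not the model from the paper.

```python
import random

random.seed(0)

def transition(state, decision, exog):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}): infections fall with treatment
    (assumed effect of 5 per unit) and rise with random new cases."""
    infected = max(0, state["infected"] - 5 * decision + exog)
    return {"infected": infected, "budget": state["budget"] - decision}

def policy(state):
    """A simple rule-based policy X^pi(S_t): treat while infections are
    high and budget remains (one of the four policy classes: a policy
    function approximation)."""
    return 1 if state["infected"] > 10 and state["budget"] > 0 else 0

# Canonical simulation loop: the state captures everything needed to
# compute the decision, the cost, and the transition, so the modeled
# problem is Markovian by construction.
state = {"infected": 20, "budget": 5}
total_cost = 0
for t in range(10):
    x = policy(state)                  # decision x_t = X^pi(S_t)
    w = random.randint(0, 3)           # exogenous new infections W_{t+1}
    total_cost += state["infected"] + x
    state = transition(state, x, w)

print(total_cost)
```

The point of the sketch is that once the state variable is defined to include all information needed by the policy, cost, and transition, no history beyond \(S_t\) is required.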


