Optimal Policies for Observing Time Series and Related Restless Bandit Problems

by Christopher R. Dance, et al.

The trade-off between the cost of acquiring and processing data and the uncertainty due to a lack of data is fundamental in machine learning. A basic instance of this trade-off is the problem of deciding when to make noisy and costly observations of a discrete-time Gaussian random walk, so as to minimise the posterior variance plus observation costs. We present the first proof that a simple policy, which observes when the posterior variance exceeds a threshold, is optimal for this problem. The proof generalises to a wide range of cost functions other than the posterior variance. This result implies that optimal policies for linear-quadratic-Gaussian control with costly observations have a threshold structure. It also implies that the restless bandit problem of observing multiple such time series has a well-defined Whittle index. We discuss computation of that index, give closed-form formulae for it, and compare the performance of the associated index policy with heuristic policies. The proof is based on a new verification theorem that demonstrates threshold structure for Markov decision processes, and on the relation between binary sequences known as mechanical words and the dynamics of discontinuous nonlinear maps, which frequently arise in physics, control and biology.
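As a rough illustration of the threshold policy described above, the sketch below evaluates the per-step cost (posterior variance plus observation cost) of the rule "observe when the predicted variance exceeds a threshold" for a single scalar random walk. The parameter names `q` (process-noise variance), `r` (observation-noise variance), `c` (cost per observation) and `threshold`, as well as the timing convention of predicting before deciding whether to observe, are assumptions made for this sketch and are not taken from the paper. Because the scalar Kalman-filter variance recursion does not depend on the observed values, the evaluation needs no random numbers.

```python
from typing import List, Tuple


def simulate_threshold_policy(
    q: float,          # assumed: process-noise variance of the random walk
    r: float,          # assumed: observation-noise variance
    c: float,          # assumed: cost paid for each observation
    threshold: float,  # observe when the predicted variance exceeds this value
    horizon: int = 1000,
    p0: float = 0.0,   # initial posterior variance
) -> Tuple[float, List[int]]:
    """Return (average cost per step, 0/1 observe sequence) of a threshold rule.

    Illustrative model (an assumption for this sketch):
        x[t+1] = x[t] + w[t],  w[t] ~ N(0, q)
        y[t]   = x[t] + v[t],  v[t] ~ N(0, r), available only when observing
    """
    p = p0
    total = 0.0
    actions: List[int] = []
    for _ in range(horizon):
        prior = p + q                    # predict step: variance grows by q
        observe = prior > threshold      # threshold rule on predicted variance
        if observe:
            p = prior * r / (prior + r)  # scalar Kalman update of the variance
        else:
            p = prior                    # no observation, no variance reduction
        total += p + (c if observe else 0.0)
        actions.append(int(observe))
    return total / horizon, actions


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the cost trade-off.
    for th in (0.5, 1.0, 2.0, 4.0):
        avg, acts = simulate_threshold_policy(q=1.0, r=1.0, c=2.0, threshold=th)
        print(f"threshold={th}: average cost={avg:.3f}, last actions={acts[-8:]}")
```

The returned 0/1 list is the observe/skip sequence generated by the policy; binary sequences of this kind are what the abstract relates to mechanical words. Sweeping `threshold` as in the usage block is only a brute-force way to choose a good threshold for one series; it is not the paper's optimality proof or its Whittle-index computation.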




DeepTOP: Deep Threshold-Optimal Policy for MDPs and RMABs

We consider the problem of learning the optimal threshold policy for con...

Thompson Sampling for Linear-Quadratic Control Problems

We consider the exploration-exploitation tradeoff in linear quadratic (L...

Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach

This work develops novel strategies for optimal planning with semantic o...

Verification of Markov Decision Processes with Risk-Sensitive Measures

We develop a method for computing policies in Markov decision processes ...

Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

We develop a Bayesian model for decision-making under time pressure with...

Risk-Sensitive Markov Decision Processes with Combined Metrics of Mean and Variance

This paper investigates the optimization problem of an infinite stage di...

Markov decision processes with observation costs

We present a framework for a controlled Markov chain where the state of ...
