Offline Estimation of Controlled Markov Chains: Minimax Nonparametric Estimators and Sample Efficiency

11/14/2022
by   Imon Banerjee, et al.

Controlled Markov chains (CMCs) form the bedrock for model-based reinforcement learning. In this work, we consider the estimation of the transition probability matrices of a finite-state finite-control CMC using a fixed dataset, collected using a so-called logging policy, and develop minimax sample complexity bounds for nonparametric estimation of these transition probability matrices. Our results are general, and the statistical bounds depend on the logging policy through a natural mixing coefficient. We demonstrate an interesting trade-off between stronger assumptions on mixing versus requiring more samples to achieve a particular PAC-bound. We demonstrate the validity of our results under various examples, such as ergodic Markov chains, weakly ergodic inhomogeneous Markov chains, and controlled Markov chains with non-stationary Markov, episodic, and greedy controls. Lastly, we use these sample complexity bounds to establish concomitant ones for offline evaluation of stationary, Markov policies.
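
To make the estimation problem concrete, here is a minimal sketch of a count-based (empirical frequency) estimator for the transition probabilities of a finite-state, finite-control CMC from one logged trajectory. The abstract does not spell out the estimator, so treat this as an illustration under stated assumptions rather than the authors' method: the function name `estimate_transitions`, the `(state, control, next_state)` data layout, and the uniform fallback for unvisited state-control pairs are all hypothetical choices.

```python
def estimate_transitions(trajectory, num_states, num_controls):
    """Empirical transition estimate from logged (state, control, next_state) triples.

    Returns a nested list p_hat where p_hat[s][a][t] approximates
    P(next state = t | state = s, control = a).
    """
    # counts[s][a][t] = number of observed transitions s -> t under control a
    counts = [[[0] * num_states for _ in range(num_controls)]
              for _ in range(num_states)]
    for s, a, s_next in trajectory:
        counts[s][a][s_next] += 1

    p_hat = []
    for s in range(num_states):
        rows = []
        for a in range(num_controls):
            visits = sum(counts[s][a])
            if visits == 0:
                # Assumption: fall back to a uniform row when the logging
                # policy never produced the pair (s, a).
                rows.append([1.0 / num_states] * num_states)
            else:
                rows.append([c / visits for c in counts[s][a]])
        p_hat.append(rows)
    return p_hat


# Toy logged dataset with 2 states and 2 controls (made-up values).
data = [(0, 0, 1), (1, 1, 0), (0, 0, 0), (0, 1, 1), (1, 1, 1), (0, 0, 1)]
p_hat = estimate_transitions(data, num_states=2, num_controls=2)
```

How accurate each row is depends on how often the logging policy visits the corresponding state-control pair, which is the kind of dependence the sample complexity bounds in the abstract are meant to quantify.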


research
09/13/2018

Minimax Learning of Ergodic Markov Chains

We compute the finite-sample minimax (modulo logarithmic factors) sample...
research
10/07/2020

Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited

In this paper, we propose new problem-independent lower bounds on the sa...
research
03/07/2023

On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

Offline reinforcement learning (offline RL) considers problems where lea...
research
04/15/2019

Subgeometric ergodicity and β-mixing

It is well known that stationary geometrically ergodic Markov chains are...
research
05/20/2021

On the α-lazy version of Markov chains in estimation and testing problems

We formulate extendibility of the minimax one-trajectory length of sever...
research
12/10/2022

Estimation and Application of the Convergence Bounds for Nonlinear Markov Chains

Nonlinear Markov Chains (nMC) are regarded as the original (linear) Mark...
research
11/30/2019

Mix and Match: Markov Chains Mixing Times for Matching in Rideshare

Rideshare platforms such as Uber and Lyft dynamically dispatch drivers t...
