Metrics for Finite Markov Decision Processes

07/11/2012
by Norman Ferns et al.

We present metrics for measuring the similarity of states in a finite Markov decision process (MDP). The formulation of our metrics is based on the notion of bisimulation for MDPs, with the aim of solving discounted infinite-horizon reinforcement learning tasks. Such metrics can be used to aggregate states, as well as to better structure other value function approximators (e.g., memory-based or nearest-neighbor approximators). We provide bounds that relate our metric distances to the optimal values of states in the given MDP.
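
The metric described here is defined through a fixed-point construction that combines reward differences with a Kantorovich distance between transition distributions. As a minimal sketch of that idea (not the paper's exact constants or algorithm), the Python code below iterates d(s,t) = max_a [ c_R*|R(s,a) - R(t,a)| + c_T*K(d)(P(s,a), P(t,a)) ], where K(d) is the Kantorovich (Wasserstein-1) distance under ground metric d. The function names, array shapes, and the weights c_R and c_T are illustrative assumptions. Because K(d) is non-expansive in its ground metric, any c_T < 1 makes the update a contraction, so the iteration converges.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """Kantorovich (Wasserstein-1) distance between distributions p and q
    over n states, under ground metric d (n x n), solved as a transport LP."""
    n = len(p)
    cost = d.reshape(-1)                      # cost[i*n + j] = d[i, j]
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0      # row marginals: sum_j lam[i, j] = p[i]
        A_eq[n + i, i::n] = 1.0               # column marginals: sum_i lam[i, j] = q[j]
    b_eq = np.concatenate([p, q])
    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

def bisimulation_metric(R, P, c_R=0.1, c_T=0.9, iters=50):
    """Iterate d(s,t) = max_a [c_R*|R[s,a]-R[t,a]| + c_T*K(d)(P[s,a], P[t,a])]
    toward its fixed point. R: (n, A) rewards; P: (n, A, n) transitions.
    The weights c_R, c_T are illustrative; c_T < 1 gives a contraction."""
    n, A = R.shape
    d = np.zeros((n, n))
    for _ in range(iters):
        d_next = np.zeros((n, n))
        for s in range(n):
            for t in range(s + 1, n):
                gaps = [c_R * abs(R[s, a] - R[t, a])
                        + c_T * kantorovich(P[s, a], P[t, a], d)
                        for a in range(A)]
                d_next[s, t] = d_next[t, s] = max(gaps)
        d = d_next
    return d

# Toy 3-state, 2-action MDP: states 0 and 1 behave identically, state 2 differs.
R = np.array([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
P = np.zeros((3, 2, 3))
P[0, :, 0] = P[1, :, 1] = P[2, :, 2] = 1.0    # every action is a self-loop
d = bisimulation_metric(R, P)
print(np.round(d, 3))                          # d[0, 1] = 0: candidates for aggregation
```

On this toy MDP, states 0 and 1 end up at distance zero and could be aggregated, while state 2 stays separated. The bounds mentioned in the abstract relate such distances to differences in the optimal values of states, which is what justifies aggregating nearby states.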

Related research

- Metrics for Markov Decision Processes with Infinite State Spaces (07/04/2012): We present metrics for measuring state similarity in Markov decision pro...
- Provably Efficient Reinforcement Learning with Aggregated States (12/13/2019): We establish that an optimistic variant of Q-learning applied to a finit...
- Finite Horizon Q-learning: Stability, Convergence and Simulations (10/27/2021): Q-learning is a popular reinforcement learning algorithm. This algorithm...
- On Solving a Stochastic Shortest-Path Markov Decision Process as Probabilistic Inference (09/13/2021): Previous work on planning as active inference addresses finite horizon p...
- Comparing discounted and average-cost Markov Decision Processes: a statistical significance perspective (12/01/2021): Optimal Markov Decision Process policies for problems with finite state ...
- Markov Decision Process for Video Generation (09/26/2019): We identify two pathological cases of temporal inconsistencies in video ...
- Metrics and continuity in reinforcement learning (02/02/2021): In most practical applications of reinforcement learning, it is untenabl...
