Scalable methods for computing state similarity in deterministic Markov Decision Processes
We present new algorithms for computing and approximating bisimulation metrics in Markov Decision Processes (MDPs). Bisimulation metrics are an elegant formalism that captures behavioral equivalence between states and provides strong theoretical guarantees on differences in optimal behavior. Unfortunately, their computation is expensive and requires a tabular representation of the states, which has thus far rendered them impractical for large problems. In this paper we present a new version of the metric that is tied to a behavior policy in an MDP, along with an analysis of its theoretical properties. We then present two new algorithms for approximating bisimulation metrics in large, deterministic MDPs. The first does so via sampling and is guaranteed to converge to the true metric. The second is a differentiable loss which allows us to learn an approximation even for continuous-state MDPs, which prior to this work had not been possible.
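To make the central object concrete, here is a minimal sketch (not the paper's exact algorithm) of the bisimulation metric for a tabular deterministic MDP, computed by fixed-point iteration. For deterministic dynamics the Kantorovich term in the general definition collapses to the distance between the two successor states, so the backup is d(s,t) = max_a [ |R(s,a) - R(t,a)| + γ · d(N(s,a), N(t,a)) ]. The arrays `R` (rewards) and `N` (deterministic next states) are hypothetical stand-ins for a concrete MDP.

```python
import numpy as np

def bisimulation_metric(R, N, gamma=0.99, tol=1e-6, max_iters=1000):
    """Iterate d(s,t) = max_a |R[s,a]-R[t,a]| + gamma * d(N[s,a], N[t,a])
    to convergence. R: (S, A) float rewards; N: (S, A) int next states.
    The backup is a gamma-contraction, so the iteration converges."""
    d = np.zeros((R.shape[0], R.shape[0]))
    for _ in range(max_iters):
        # Reward differences for every (s, t, a) triple: shape (S, S, A).
        reward_gap = np.abs(R[:, None, :] - R[None, :, :])
        # Current distance between the deterministic successors N[s,a], N[t,a].
        next_gap = d[N[:, None, :], N[None, :, :]]
        d_new = np.max(reward_gap + gamma * next_gap, axis=2)
        if np.max(np.abs(d_new - d)) < tol:
            return d_new
        d = d_new
    return d

# Tiny 3-state, 2-action example with hand-picked dynamics. States 0 and 1
# have identical rewards and successors, so the metric assigns d(0, 1) == 0.
R = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
N = np.array([[0, 1], [0, 1], [2, 2]])
print(bisimulation_metric(R, N, gamma=0.9))
```

The abstract's first algorithm approximates this metric via sampling rather than a full sweep over all state pairs. One plausible single-pair update in that spirit (the exact procedure and its convergence proof are in the paper) takes the max of the current estimate and the one-step backup:

```python
def sampled_update(d, R, N, s, t, gamma=0.99):
    """One stochastic update for a sampled state pair (s, t). Starting from
    d = 0, each backup of an underestimate is still an underestimate, so the
    estimates increase monotonically toward the true metric without overshoot."""
    backup = np.max(np.abs(R[s] - R[t]) + gamma * d[N[s], N[t]])
    d[s, t] = d[t, s] = max(d[s, t], backup)
    return d
```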