
Partial Policy Iteration for L1-Robust Markov Decision Processes

by Chin Pang Ho et al.

Robust Markov decision processes (MDPs) make it possible to compute reliable solutions for dynamic decision problems whose evolution is modeled by rewards and partially known transition probabilities. Unfortunately, accounting for uncertainty in the transition probabilities significantly increases the computational complexity of solving robust MDPs, which severely limits their scalability. This paper describes new efficient algorithms for solving the common class of robust MDPs with s- and sa-rectangular ambiguity sets defined by weighted L_1 norms. We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs. We also propose fast methods for computing the robust Bellman operator in quasi-linear time, nearly matching the linear complexity of the non-robust Bellman operator. Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach, which combines linear programming solvers with robust value iteration.
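The reason the sa-rectangular robust Bellman operator can be evaluated so quickly is that its inner problem, minimizing the expected value over transition distributions within an L_1 ball around the nominal distribution, admits a greedy sorting-based solution instead of a general linear program. A minimal sketch of that inner problem in NumPy (the function name is illustrative, and an unweighted L_1 ball is used here for simplicity, whereas the paper treats weighted L_1 norms):

```python
import numpy as np

def worst_case_l1(p_hat, v, kappa):
    """Approximate inner problem of the L1-robust Bellman operator:
        min_p  p . v   s.t.  ||p - p_hat||_1 <= kappa,  p in the simplex.
    Greedy solution: add mass to the lowest-value state and remove the
    same amount from the highest-value states. Cost is O(S log S) from
    the sort, versus solving a linear program.
    """
    p = p_hat.astype(float).copy()
    i_min = int(np.argmin(v))
    # At most kappa/2 mass can be added here; the matching kappa/2
    # must be removed from other states to stay inside the L1 ball.
    eps = min(kappa / 2.0, 1.0 - p[i_min])
    p[i_min] += eps
    to_remove = eps
    for i in np.argsort(v)[::-1]:  # highest-value states first
        if i == i_min:
            continue
        take = min(p[i], to_remove)
        p[i] -= take
        to_remove -= take
        if to_remove <= 1e-12:
            break
    return float(p @ v), p
```

For example, with nominal distribution `[0.5, 0.5]`, values `[0, 1]`, and budget `kappa = 0.4`, the greedy step shifts 0.2 of probability mass toward the zero-value state, giving the worst-case distribution `[0.7, 0.3]` and value `0.3`. The paper's quasi-linear methods generalize this sorting structure to weighted norms and to the harder s-rectangular case.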



