
Flexible Decomposition Algorithms for Weakly Coupled Markov Decision Problems

by Ron Parr, et al.

This paper presents two new approaches to decomposing and solving large Markov decision problems (MDPs): a partial decoupling method and a complete decoupling method. Both approaches divide a large, stochastic decision problem into smaller pieces. The first approach builds a cache of policies for each piece of the problem independently and then combines the pieces in a separate, lightweight step. The second approach also divides the problem into smaller pieces, but information is communicated between the pieces, allowing intelligent decisions to be made about which piece requires the most attention. Both approaches can be used to find optimal policies or approximately optimal policies with provable bounds. These algorithms also provide a framework for the efficient transfer of knowledge across problems that share similar structure.
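To make the idea of solving an MDP piece by piece concrete, here is a minimal sketch. It is not the paper's algorithm: it is plain region-wise (block) value iteration on a hypothetical 10-state chain, where each region is solved with the values of states outside it held fixed, and the regions are revisited until their boundary values stop changing. The chain, the rewards, the discount factor, and every name below (`backup`, `solve_region`, `solve_decomposed`) are assumptions made for illustration.

```python
GAMMA = 0.95      # discount factor (assumed)
N = 10            # hypothetical chain: states 0..9, state 9 is an absorbing goal
P_SUCCESS = 0.9   # a move succeeds with this probability; otherwise the agent stays put


def backup(V, s):
    """One Bellman backup for state s using the current value estimates V."""
    if s == N - 1:
        return 0.0  # absorbing goal: no further cost
    best = float("-inf")
    for step in (-1, +1):  # actions: move left, move right
        nxt = min(max(s + step, 0), N - 1)
        # Each step costs 1; the move may fail and leave the agent in s.
        q = -1.0 + GAMMA * (P_SUCCESS * V[nxt] + (1 - P_SUCCESS) * V[s])
        best = max(best, q)
    return best


def solve_region(V, region, tol=1e-10):
    """Value iteration restricted to `region`; values outside it stay fixed."""
    while True:
        delta = 0.0
        for s in region:
            new = backup(V, s)
            delta = max(delta, abs(new - V[s]))
            V[s] = new  # in-place update
        if delta < tol:
            return


def solve_decomposed(regions, tol=1e-8):
    """Solve each region in turn until no value changes between sweeps."""
    V = [0.0] * N
    while True:
        old = list(V)
        for region in regions:
            solve_region(V, region)
        if max(abs(a - b) for a, b in zip(V, old)) < tol:
            return V


# Two regions that interact only through the boundary between states 4 and 5.
V_decomposed = solve_decomposed([range(0, 5), range(5, N)])
# Treating the whole chain as a single region gives ordinary value iteration.
V_global = solve_decomposed([range(N)])
```

In this toy problem the two regions are weakly coupled in the sense that they interact only through one boundary, so the region-wise solution agrees with the global one up to the stopping tolerance. The sketch does not include the policy cache, the choice of which piece to work on next, or the error bounds that the abstract describes.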



