Planning and Learning with Stochastic Action Sets

05/07/2018
by Craig Boutilier, et al.

In many practical uses of reinforcement learning (RL), the set of actions available at a given state is a random variable whose realizations are governed by an exogenous stochastic process. Somewhat surprisingly, the foundations of such sequential decision processes have gone unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: one in which individual actions are available with independent probabilities, and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; for the first, we also offer a poly-time linear programming solution.
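The following is a minimal Python sketch, not the authors' implementation, of two ideas named in the abstract. First, a tabular Q-learning step whose bootstrap maximum ranges only over the action set actually sampled at the next state. Second, value iteration for the independent-availability case, where the expected maximum Q-value over a random action set is computed by ranking actions by Q-value and weighting each action by the probability that it is the highest-ranked available one. The function names (`expected_max_independent`, `sas_value_iteration`, `sampled_set_q_update`), the dense list-based MDP encoding, and the convention that an empty action set contributes zero value are all assumptions made for illustration.

```python
from typing import List, Sequence


def expected_max_independent(q: Sequence[float], p: Sequence[float]) -> float:
    """Expected value, over random action sets, of max_{a in A} q[a], when
    action a is available independently with probability p[a].

    Actions are ranked by q (best first). The best available action is the
    first available one in that ranking, so action a contributes
    q[a] * P(a available) * P(all higher-ranked actions unavailable).
    Assumption (illustrative): an empty action set contributes 0.
    """
    order = sorted(range(len(q)), key=lambda a: q[a], reverse=True)
    total = 0.0
    p_all_better_missing = 1.0  # P(no higher-ranked action is available)
    for a in order:
        total += p_all_better_missing * p[a] * q[a]
        p_all_better_missing *= 1.0 - p[a]
    return total


def sas_value_iteration(P: List[List[List[float]]], R: List[List[float]],
                        avail: List[List[float]], gamma: float = 0.9,
                        iters: int = 500) -> List[float]:
    """Value iteration for an SAS-MDP with independent action availability.

    Hypothetical encoding: P[s][a][t] is the transition probability,
    R[s][a] the expected reward, avail[s][a] the probability that action a
    is available in state s.
    """
    n = len(P)
    V = [0.0] * n
    for _ in range(iters):
        new_V = []
        for s in range(n):
            # Q-values of every action in s under the current value estimate.
            q = [R[s][a] + gamma * sum(P[s][a][t] * V[t] for t in range(n))
                 for a in range(len(R[s]))]
            # Average the greedy choice over the random set of available actions.
            new_V.append(expected_max_independent(q, avail[s]))
        V = new_V
    return V


def sampled_set_q_update(Q: List[List[float]], s: int, a: int, r: float,
                         s_next: int, next_actions: Sequence[int],
                         alpha: float = 0.1, gamma: float = 0.9) -> None:
    """One tabular Q-learning step in which the bootstrap max ranges only
    over the action set observed (sampled) at s_next."""
    if next_actions:
        target = r + gamma * max(Q[s_next][b] for b in next_actions)
    else:
        target = r  # assumption: no available actions means no future value
    Q[s][a] += alpha * (target - Q[s][a])
```

Enumerating all 2^|A| subsets of actions to compute the expectation would be exponential; the ranking turns each state backup into a sort plus a linear pass. The same ranking also means the greedy choice in a state reduces to "take the highest-ranked action that is available", which illustrates why such policies can be stored compactly.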
