Reinforcement Learning When All Actions are Not Always Available

06/05/2019
by   Yash Chandak, et al.
0

The Markov decision process (MDP) formulation used to model many real-world sequential decision making problems does not capture the setting where the set of available decisions (actions) at each time step is stochastic. Recently, the stochastic action set Markov decision process (SAS-MDP) formulation has been proposed, which captures the concept of a stochastic action set. In this paper we argue that existing RL algorithms for SAS-MDPs suffer from divergence issues, and present new algorithms for SAS-MDPs that incorporate variance reduction techniques unique to this setting, and provide conditions for their convergence. We conclude with experiments that demonstrate the practicality of our approaches using several tasks inspired by real-life use cases wherein the action set is stochastic.

READ FULL TEXT

page 8

page 19

page 20

research
07/31/2014

MONEYBaRL: Exploiting pitcher decision-making using Reinforcement Learning

This manuscript uses machine learning techniques to exploit baseball pit...
research
05/07/2018

Planning and Learning with Stochastic Action Sets

In many practical uses of reinforcement learning (RL) the set of actions...
research
02/29/2012

Fast Reinforcement Learning with Large Action Sets using Error-Correcting Output Codes for MDP Factorization

The use of Reinforcement Learning in real-world scenarios is strongly li...
research
01/10/2023

Sequential Fair Resource Allocation under a Markov Decision Process Framework

We study the sequential decision-making problem of allocating a limited ...
research
12/03/2015

Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions

Many real-world problems come with action spaces represented as feature ...
research
01/02/2023

On the Challenges of using Reinforcement Learning in Precision Drug Dosing: Delay and Prolongedness of Action Effects

Drug dosing is an important application of AI, which can be formulated a...
research
09/26/2019

Markov Decision Process for Video Generation

We identify two pathological cases of temporal inconsistencies in video ...

Please sign up or login with your details

Forgot password? Click here to reset