1 Introduction
Reinforcement learning agents typically have either a discrete or a continuous action space [Sutton and Barto 1998]. With a discrete action space, the agent decides which distinct action to perform from a finite action set. With a continuous action space, actions are expressed as a single real-valued vector. If we use a continuous action space, we lose the ability to consider differences in kind: all actions must be expressible as a single vector. If we use only discrete actions, we lose the ability to finely tune action selection based on the current state.
A parameterized action is a discrete action parameterized by a real-valued vector. Modeling actions this way introduces structure into the action space by treating different kinds of continuous actions as distinct. At each step an agent must choose both which action to use and what parameters to execute it with. For example, consider a soccer-playing robot which can kick, pass, or run. We can associate a continuous parameter vector with each of these actions: we can kick the ball to a given target position with a given force, pass to a specific position, and run with a given velocity. Each of these actions is parameterized in its own way. Parameterized action Markov decision processes (PAMDPs) model situations where we have distinct actions that require parameters to adjust the action to different situations, or where there are multiple mutually incompatible continuous actions.
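As a concrete illustration, a parameterized action can be represented as a pair of a discrete action label and a real-valued parameter vector. The sketch below is a minimal illustration of this structure for the soccer example; the action names and parameter dimensions are illustrative assumptions, not taken from any particular implementation.

```python
import numpy as np

# Illustrative parameter dimensions for each discrete action:
# kick a target (x, y) with a force, pass to a target (x, y), run with a velocity.
PARAM_DIMS = {"kick": 3, "pass": 2, "run": 2}

def make_action(name, params):
    """A parameterized action is a pair: a discrete action and its parameter vector."""
    params = np.asarray(params, dtype=float)
    assert name in PARAM_DIMS, "unknown discrete action"
    assert params.shape == (PARAM_DIMS[name],), "wrong parameter dimension"
    return (name, params)

# Kick the ball towards (10, 5) with force 0.8 (arbitrary example values).
action = make_action("kick", [10.0, 5.0, 0.8])
```

The key point is that each discrete action carries its own parameter space, so actions of different kinds need not share one parameter vector.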
We focus on how to learn an action-selection policy given predefined parameterized actions. We introduce the Q-PAMDP algorithm, which alternates learning action-selection and parameter-selection policies, and compare it to a direct policy search method. We show that with appropriate update rules Q-PAMDP converges to a local optimum. These methods are compared empirically in the goal and Platform domains. We found that Q-PAMDP outperformed direct policy search and fixed-parameter SARSA.
2 Background
A Markov decision process (MDP) is a tuple ⟨S, A, P, R, γ⟩, where S is a set of states, A is a set of actions, P(s′ | s, a) is the probability of transitioning to state s′ from state s after taking action a, R(s, a, r) is the probability of receiving reward r for taking action a in state s, and γ ∈ [0, 1) is a discount factor [Sutton and Barto 1998]. We wish to find a policy, π(a | s), which selects an action for each state so as to maximize the expected sum of discounted rewards (the return). The value function

V^π(s) = E[ Σ_{t=0}^∞ γ^t r_t | s_0 = s, π ]

is defined as the expected discounted return achieved by policy π starting at state s. Similarly, the action-value function

Q^π(s, a) = E[ Σ_{t=0}^∞ γ^t r_t | s_0 = s, a_0 = a, π ]

is the expected return obtained by taking action a in state s, and then following policy π thereafter. While selecting actions using the value function requires a model of the transition probabilities, we would prefer to act without needing such a model. We can approach this problem by learning Q, which allows us to directly select the action which maximizes Q(s, a). We can learn Q for an optimal policy using a method such as Q-learning [Watkins and Dayan 1992]. In domains with a continuous state space, we can represent Q using parametric function approximation with a set of parameters ω and learn it with algorithms such as gradient-descent SARSA(λ) [Sutton and Barto 1998].
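For concreteness, a single gradient-descent SARSA(λ) update with a linear function approximator can be sketched as follows. This is a minimal sketch under standard textbook assumptions; the feature vectors and step sizes are placeholders, not the settings used in the experiments below.

```python
import numpy as np

def sarsa_lambda_update(w, z, phi_sa, phi_next, r, gamma=0.99, lam=0.9, alpha=0.01):
    """One gradient-descent SARSA(lambda) update for a linear Q(s, a) = w . phi(s, a).

    w: weight vector, z: eligibility trace, phi_sa: features of the current (s, a),
    phi_next: features of the next state-action pair (all zeros at terminal states).
    """
    delta = r + gamma * np.dot(w, phi_next) - np.dot(w, phi_sa)  # TD error
    z = gamma * lam * z + phi_sa                                 # accumulate traces
    w = w + alpha * delta * z                                    # gradient step on w
    return w, z

# One update from a zero-initialized approximator, reward 1, terminal next state.
w, z = sarsa_lambda_update(np.zeros(2), np.zeros(2),
                           np.array([1.0, 0.0]), np.zeros(2), r=1.0, alpha=0.1)
```

Here the TD error is 1 (no prior estimate), so the weight on the active feature moves by alpha.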
For problems with a continuous action space (A ⊆ ℝ^m), selecting the optimal action with respect to Q is nontrivial, as it requires finding the global maximum of a function over a continuous space. We can avoid this problem using a policy search algorithm, in which a class of policies parameterized by a set of parameters θ is given, transforming the problem into one of direct optimization over θ for an objective function J(θ). Several policy search approaches exist, including policy gradient methods, entropy-based approaches, path integral approaches, and sample-based approaches [Deisenroth, Neumann, and Peters 2013].
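To illustrate direct optimization over policy parameters, here is a toy sketch that ascends a finite-difference estimate of the gradient of J(θ). This is only a stand-in for intuition: practical policy search methods such as eNAC estimate the gradient from sampled episodes rather than by perturbing θ directly, and the objective below is an arbitrary example.

```python
import numpy as np

def finite_difference_search(J, theta, alpha=0.1, eps=1e-2, steps=100):
    """Hill-climb J(theta) using a central finite-difference gradient estimate."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            e = np.zeros_like(theta)
            e[i] = eps
            grad[i] = (J(theta + e) - J(theta - e)) / (2 * eps)  # dJ/dtheta_i
        theta += alpha * grad  # ascend the estimated gradient
    return theta

# Toy concave objective with its maximum at theta = (1, 1, 1).
best = finite_difference_search(lambda th: -np.sum((th - 1.0) ** 2), np.zeros(3))
```

In reinforcement learning, evaluating J(θ) itself requires running episodes, which is why sample-efficient gradient estimators are preferred over naive perturbation.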
Parameterized Tasks
A parameterized task is a problem defined by a task parameter vector given at the beginning of each episode. These parameters are fixed throughout the episode, and the goal is to learn a task-dependent policy. Kober et al. (2012) developed algorithms to adjust motor primitives to different task parameters, and applied them to learn table tennis and darts with different starting positions and targets. Da Silva et al. (2012) introduced the idea of a parameterized skill as a task-dependent parameterized policy: they sample a set of tasks, learn their associated policy parameters, and determine a mapping from task parameters to policy parameters. Deisenroth et al. (2014) applied a model-based method to learn task-dependent parameterized policies, used to learn a ball-hitting task and to solve a block manipulation problem. Parameterized tasks can be used as parameterized actions. For example, if we learn a parameterized task for kicking a ball to position x, this could be used as a parameterized action kick-to(x).
3 Parameterized Action MDPs
We consider MDPs where the state space is continuous (S ⊆ ℝ^n) and the actions are parameterized: there is a finite set of discrete actions A_d = {a_1, a_2, …, a_k}, and each a ∈ A_d has a set of continuous parameters X_a ⊆ ℝ^{m_a}. An action is a tuple (a, x) where a is a discrete action and x ∈ X_a are the parameters for that action. The action space is then given by

A = ∪_{a ∈ A_d} { (a, x) | x ∈ X_a },

which is the union of each discrete action with all possible parameters for that action. We refer to such MDPs as parameterized action MDPs (PAMDPs). Figure 1 depicts the different action spaces.
We apply a two-tiered approach to action selection: first selecting the discrete action, then selecting the parameters for that action. The discrete-action policy is denoted π^d(a | s). To select the parameters for the action, we define the action-parameter policy for each action a ∈ A_d as π^a(x | s). The overall policy is then given by

π(a, x | s) = π^d(a | s) π^a(x | s).

In other words, to select a complete action (a, x), we sample a discrete action a from π^d and then sample a parameter vector x from π^a. The discrete-action policy is defined by a set of parameters ω and is denoted π^d_ω. The action-parameter policy for action a is determined by a set of parameters θ_a, and is denoted π^a_{θ_a}. The set of all such parameters is given by Θ = [θ_{a_1}, …, θ_{a_k}].
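This two-tiered selection can be sketched directly: sample the discrete action from a softmax policy, then sample its parameters from a Gaussian action-parameter policy. The feature and weight shapes below are illustrative assumptions, and the concrete policy classes are only one common choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(v):
    e = np.exp(v - np.max(v))  # subtract max for numerical stability
    return e / e.sum()

def sample_action(phi, omega, Theta, Sigma):
    """Sample a complete action (a, x): a discrete action a from a softmax over
    omega @ phi(s), then parameters x from a Gaussian with mean Theta[a] @ phi(s)."""
    probs = softmax(omega @ phi)                           # discrete-action policy
    a = rng.choice(len(probs), p=probs)
    x = rng.multivariate_normal(Theta[a] @ phi, Sigma[a])  # action-parameter policy
    return a, x

# Example: 2 discrete actions, 3 state features, 2-dimensional parameters each.
phi = np.array([1.0, 0.5, -0.2])
omega = np.zeros((2, 3))                   # zero weights give a uniform softmax
Theta = [np.zeros((2, 3)), np.zeros((2, 3))]
Sigma = [np.eye(2), np.eye(2)]
a, x = sample_action(phi, omega, Theta, Sigma)
```

Note how ω only affects which discrete action is chosen, while Θ only affects the parameters, mirroring the factorization of π above.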
The first approach we consider is direct policy search: we use a policy search method to optimize the objective function

J(Θ, ω) = E_{s ∼ S_0}[ V^{π_{Θ,ω}}(s) ]

with respect to (Θ, ω), where s is a state sampled according to the initial state distribution S_0. J(Θ, ω) is the expected return of the policy π_{Θ,ω} starting from an initial state.
Our second approach is to alternate between updating the parameter-policy and learning an action-value function for the discrete actions. For any PAMDP with a fixed parameter-policy Θ, there exists a corresponding discrete-action MDP, M_Θ, in which each discrete action a is executed with parameters drawn from π^a_{θ_a}. We represent the action-value function for M_Θ using function approximation with parameters ω. For M_Θ, there exists an optimal set of representation weights which maximizes J(Θ, ω) with respect to ω. Let

W(Θ) = argmax_ω J(Θ, ω).

We can learn W(Θ) for a fixed Θ using a Q-learning algorithm. Finally, we define, for fixed Θ,

H(Θ) = J(Θ, W(Θ)),

which is the performance of the best discrete policy for that fixed Θ.
Algorithm 1 describes a method for alternately updating Θ and ω. The algorithm takes two input methods, P-UPDATE and Q-LEARN, and a positive integer parameter k, which determines the number of updates to Θ in each iteration. P-UPDATE(Θ, ω) should be a policy search method that updates Θ with respect to the objective function J(Θ, ω). Q-LEARN can be any algorithm for Q-learning with function approximation. We consider two main cases of the Q-PAMDP algorithm: Q-PAMDP(1) and Q-PAMDP(∞).

Q-PAMDP(1) performs a single update of Θ and then relearns ω to convergence. If at each step we update Θ only once, and then update ω until convergence, we are optimizing Θ with respect to H. In the next section we show that if we can find a local optimum with respect to H, then we have found a local optimum with respect to J.
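The alternation in Algorithm 1 can be sketched as follows. P-UPDATE and Q-LEARN are passed in as functions, and the stopping test is simplified to a fixed iteration count for illustration; this is a structural sketch, not the full algorithm with its convergence checks.

```python
def q_pamdp(Theta, omega, p_update, q_learn, k=1, iterations=100):
    """Sketch of Q-PAMDP: alternate k parameter-policy updates with relearning
    the discrete-action weights. k = 1 gives Q-PAMDP(1); performing a full
    optimization in the parameter-policy step corresponds to Q-PAMDP(inf)."""
    omega = q_learn(Theta, omega)          # learn Q for the initial parameter-policy
    for _ in range(iterations):
        for _ in range(k):                 # k updates to the parameter-policy
            Theta = p_update(Theta, omega)
        omega = q_learn(Theta, omega)      # relearn Q for the updated policy
    return Theta, omega

# Toy numeric stand-ins just to show the call pattern.
Theta, omega = q_pamdp(0, 0,
                       p_update=lambda T, w: T + 1,
                       q_learn=lambda T, w: w + 1,
                       k=1, iterations=3)
```

With these stand-ins, Θ receives one update per iteration and ω is relearned once initially and once per iteration.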
4 Theoretical Results
We now show that Q-PAMDP(1) converges to a local or global optimum under mild assumptions. We assume that iterating P-UPDATE converges to a local optimum of a given objective function; as the P-UPDATE step is a design choice, it can be selected to have the appropriate convergence property. Q-PAMDP(1) is equivalent to the sequence

ω_t = W(Θ_t),
Θ_{t+1} = P-UPDATE(Θ_t, J(·, ω_t)),

if Q-LEARN converges to the optimal weights W(Θ_t) for each given Θ_t.
Theorem 4.1 (Convergence to a Local Optimum).
For any Θ_0, if the sequence

Θ_{t+1} = P-UPDATE(Θ_t, H)    (1)

converges to a local optimum of H, then Q-PAMDP(1) converges to a local optimum of J.
Proof.
By definition of the sequence above, ω_t = W(Θ_t), so it follows that J(Θ_t, ω_t) = H(Θ_t). In other words, the objective function J(Θ, ω) equals H(Θ) if ω = W(Θ). Therefore, we can replace J with H in our update for Θ, to obtain the update rule Θ_{t+1} = P-UPDATE(Θ_t, H). Therefore, by equation 1, the sequence (Θ_t) converges to a local optimum Θ* of H. Let ω* = W(Θ*). As Θ* is a local optimum of H, by definition there exists an ε > 0 such that

H(Θ) ≤ H(Θ*) for all Θ with ‖Θ − Θ*‖ < ε.

Therefore, for any (Θ, ω) with ‖Θ − Θ*‖ < ε,

J(Θ, ω) ≤ H(Θ) ≤ H(Θ*) = J(Θ*, ω*).

Therefore (Θ*, ω*) is a local optimum of J. ∎
In summary, if we can locally optimize Θ, and set ω = W(Θ) at each step, then we will find a local optimum of J. The conditions for the previous theorem can be met by taking P-UPDATE to be a local optimization method such as a gradient-based policy search. A similar argument shows that if the sequence converges to a global optimum of H, then Q-PAMDP(1) converges to a global optimum of J.
One problem is that at each step we must relearn W(Θ) for the updated value of Θ. We now show that if updates to Θ are bounded and W is a continuous function, then the required updates to ω will also be bounded. Intuitively, we are supposing that a small update to Θ results in a small change in the weights specifying which discrete action to choose. The assumption that W is continuous is strong, and may not be satisfied by all PAMDPs. It is not necessary for the operation of Q-PAMDP(1), but when it is satisfied we do not need to completely relearn ω after each update to Θ. We show that by selecting an appropriate initial update rate α_0 we can shrink the differences in ω as desired.
Theorem 4.2 (Bounded Updates to ω).
If W is continuous with respect to Θ, and updates to Θ are of the form

Θ_{t+1} = Θ_t + α_t P-UPDATE(Θ_t, ω_t),

with the norm of each P-UPDATE bounded by

‖P-UPDATE(Θ_t, ω_t)‖ < c

for some c > 0, then for any desired difference in ω, ε > 0, there is an initial update rate α_0 > 0 such that

‖ω_{t+1} − ω_t‖ < ε.
Proof.
Let ε > 0, and let α_t ≤ α_0 for all t. As

Θ_{t+1} = Θ_t + α_t P-UPDATE(Θ_t, ω_t),

it follows that

‖Θ_{t+1} − Θ_t‖ = α_t ‖P-UPDATE(Θ_t, ω_t)‖ < α_0 c.

So we have ‖Θ_{t+1} − Θ_t‖ → 0 as α_0 → 0. As W is continuous, this means that

‖ω_{t+1} − ω_t‖ = ‖W(Θ_{t+1}) − W(Θ_t)‖ → 0 as α_0 → 0,

so there exists an α_0 > 0 for which ‖ω_{t+1} − ω_t‖ < ε. ∎
In other words, if our updates to Θ are bounded and W is continuous, we can always adjust the initial update rate α_0 so that the difference between ω_t and ω_{t+1} is bounded.
With Q-PAMDP(1) we want P-UPDATE to optimize H(Θ). One logical choice is a gradient update. The next theorem shows that the gradient of H is equal to the gradient of J(Θ, ω) when ω = W(Θ). This is useful as we can apply existing gradient-based policy search methods to compute the gradient of J with respect to Θ. The proof follows from the fact that we are at a global optimum of J with respect to ω, and so the gradient of J with respect to ω is zero. This theorem requires that W is differentiable (and therefore also continuous).
Theorem 4.3 (Gradient of H(Θ)).
If J(Θ, ω) is differentiable with respect to Θ and ω, and W(Θ) is differentiable with respect to Θ, then the gradient of H is given by ∇_Θ H(Θ) = ∇_Θ J(Θ, ω), where ω = W(Θ).
Proof.
If ω = W(Θ), then we can compute the gradient of H(Θ) = J(Θ, W(Θ)) by the chain rule:

∇_Θ H(Θ) = ∇_Θ J(Θ, ω) + ∇_ω J(Θ, ω) ∂W(Θ)/∂Θ,

where ω = W(Θ). Note that as ω = W(Θ) = argmax_{ω′} J(Θ, ω′) by the definitions of W and H, the gradient of J with respect to ω is zero, as ω is a global maximum of J(Θ, ·) for fixed Θ. Therefore, we have that

∇_Θ H(Θ) = ∇_Θ J(Θ, ω). ∎
To summarize, if W is continuous and P-UPDATE converges to a global or local optimum, then Q-PAMDP(1) will converge to a global or local optimum, respectively, and the Q-LEARN step will require only bounded updates if the update rate of the P-UPDATE step is bounded. As such, if P-UPDATE is a policy gradient update step, then by Theorem 4.1 Q-PAMDP(1) will converge to a local optimum, and by Theorem 4.2 the Q-LEARN step will require only a bounded adjustment of ω. This policy gradient step can use the gradient of J with respect to Θ (Theorem 4.3).
With Q-PAMDP(∞), each step performs a full optimization of Θ and then a full optimization of ω. The Θ step optimizes J(Θ, ω) for the current ω, not H(Θ), as we do not update ω while we update Θ. Q-PAMDP(∞) has the disadvantage of requiring global convergence properties for the P-UPDATE method.
Theorem 4.4 (Local Convergence of Q-PAMDP(∞)).
If at each step of Q-PAMDP(∞), for some bounded set Ω,

Θ_{t+1} = argmax_Θ J(Θ, ω_t),
ω_{t+1} = argmax_{ω ∈ Ω} J(Θ_{t+1}, ω),

then Q-PAMDP(∞) converges to a local optimum.
Proof.
By the definition of W, ω_{t+1} = W(Θ_{t+1}). Therefore this algorithm takes the form of direct alternating optimization of J. As such, it converges to a local optimum [Bezdek and Hathaway 2002]. ∎
Q-PAMDP(∞) has weaker convergence guarantees than Q-PAMDP(1), as it requires a globally convergent P-UPDATE. However, it has the potential to bypass nearby local optima [Bezdek and Hathaway 2002].
5 Experiments
We first consider a simplified robot soccer problem [Kitano et al. 1997] in which a single striker attempts to score a goal against a keeper. Each episode starts with the player at a random position along the bottom bound of the field. The player starts in possession of the ball, and the keeper is positioned between the ball and the goal. The game takes place in a 2D environment where the player and the keeper each have a position, velocity, and orientation, and the ball has a position and velocity, resulting in 14 continuous state variables.

An episode ends when the keeper possesses the ball, the player scores a goal, or the ball leaves the field. The reward for an action is 0 for a non-terminal state, 50 for a terminal goal state, and −d for a terminal non-goal state, where d is the distance of the ball to the goal. The player has two parameterized actions: kick-to(x, y), which kicks the ball towards position (x, y); and shoot-goal(h), which shoots the ball towards a position h along the goal line. Noise is added to each action. If the player is not in possession of the ball, it moves towards it. The keeper has a fixed policy: it moves towards the ball, and if the player shoots at the goal, the keeper moves to intercept the ball.
To score a goal, the player must shoot around the keeper. This means that at some positions it must shoot to the left of the keeper, and at others to the right. However, at no point should it shoot directly at the keeper, so an optimal policy is discontinuous. We therefore split the shoot-goal action into two parameterized actions: shoot-goal-left and shoot-goal-right. This allows us to use a simple action-selection policy instead of a complex continuous-action policy. Such a policy would be difficult to represent in a purely continuous action space, but is simple in a parameterized action setting.
We represent the action-value function for each discrete action using linear function approximation with Fourier basis features [Konidaris, Osentoski, and Thomas 2011]. As we have 14 state variables, we must be selective about which basis functions to use: we only use basis functions with at most two nonzero elements, and exclude all velocity state variables. We use a softmax discrete-action policy [Sutton and Barto 1998]. We represent the action-parameter policy for each action a as a normal distribution around a weighted sum of features,

π^a_{θ_a}(x | s) = N(θ_a^T ψ_a(s), Σ_a),

where θ_a is a matrix of weights, ψ_a(s) gives the features for state s, and Σ_a is a fixed covariance matrix. We use specialized features for each action. For the shoot-goal actions we use a simple linear basis (1, g), where g is the projection of the keeper onto the goal line. For kick-to we use linear features based on the position of the ball and the position of the keeper.

For the direct policy search approach, we use the episodic natural actor critic (eNAC) algorithm [Peters and Schaal 2008], computing the gradient of J(Θ, ω) with respect to (Θ, ω). For the Q-PAMDP approach we use the gradient-descent SARSA(λ) algorithm for Q-learning, and the eNAC algorithm for policy search. At each step we perform one eNAC update based on 50 episodes, and then refit ω using 50 gradient-descent SARSA(λ) episodes.
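The pruned Fourier basis described above can be sketched as follows: keep the constant feature plus every coefficient vector with at most two nonzero entries, rather than enumerating the full exponential basis. The order and scaling are illustrative assumptions, and states are assumed to be scaled to [0, 1].

```python
import numpy as np
from itertools import combinations, product

def fourier_basis(state, order=3, max_nonzero=2):
    """Features cos(pi * c . s) for coefficient vectors c with entries in
    {0, ..., order} and at most max_nonzero nonzero entries."""
    d = len(state)
    coeffs = [np.zeros(d)]                       # the constant feature
    for k in range(1, max_nonzero + 1):
        for idxs in combinations(range(d), k):   # which entries are nonzero
            for vals in product(range(1, order + 1), repeat=k):
                c = np.zeros(d)
                c[list(idxs)] = vals
                coeffs.append(c)
    return np.array([np.cos(np.pi * c @ state) for c in coeffs])

# With 14 state variables, order 3, and at most 2 nonzero coefficients, this
# yields 1 + 14*3 + C(14,2)*9 = 862 features instead of the full 4**14.
feats = fourier_basis(np.zeros(14))
```

This kind of pruning is what makes linear function approximation tractable with 14 state variables.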
Return is directly correlated with goal-scoring probability, so their graphs are close to identical. As it is easier to interpret, we plot goal-scoring probability in Figure 2. We can see that direct eNAC is outperformed by Q-PAMDP(1) and Q-PAMDP(∞). This is likely due to the difficulty of optimizing the action-selection parameters directly, rather than with Q-learning.

For both methods, the goal probability is greatly increased: while the initial policy rarely scores a goal, both Q-PAMDP(1) and Q-PAMDP(∞) increase the probability of a goal to roughly 35%. Direct eNAC converged to a local maximum of 15%. Finally, we include the performance of SARSA(λ) where the action parameters are fixed at their initial values; this achieves roughly 20% scoring probability. Both Q-PAMDP(1) and Q-PAMDP(∞) strongly outperform fixed-parameter SARSA, but eNAC does not. Figure 3 depicts a single episode using a converged Q-PAMDP(1) policy: the player draws the keeper out and strikes when the goal is open.
Next we consider the Platform domain, where the agent starts on a platform and must reach a goal while avoiding enemies. If the agent reaches the goal platform, touches an enemy, or falls into a gap between platforms, the episode ends. This domain is depicted in Figure 4. The reward for a step is the change in the agent's position for that step, divided by the total length of all the platforms and gaps. The agent has two primitive actions: run, which continues for a fixed period, and jump, which continues until the agent lands again. There are two different kinds of jumps: a high jump to get over enemies, and a long jump to get over gaps between platforms. The domain therefore has three parameterized actions: run, hop, and leap. The agent only takes actions while on the ground, and enemies only move when the agent is on their platform. The state space consists of four variables, representing the agent position, agent speed, enemy position, and enemy speed respectively. To learn Q, as in the previous domain, we use linear function approximation with the Fourier basis. We apply a softmax discrete-action policy based on Q, and a Gaussian parameter policy based on scaled parameter features ψ_a(s).
Figure 5 shows the performance of eNAC, Q-PAMDP(1), Q-PAMDP(∞), and SARSA with fixed parameters. Both Q-PAMDP(1) and Q-PAMDP(∞) outperformed the fixed-parameter SARSA baseline of 40%, reaching on average 50% and 65% of the total distance respectively, and both did much better than direct optimization using eNAC, which reached 10%. We suggest that Q-PAMDP(∞) outperforms Q-PAMDP(1) due to the nature of the Platform domain. Q-PAMDP(1) is best suited to domains with smooth changes in the action-value function with respect to changes in the parameter-policy. In the Platform domain, our initial policy is unable to make the first jump without modification, and once the policy can reach the second platform, the action-value function must change drastically to account for it. Q-PAMDP(1) may therefore be poorly suited to this domain, as the small change in parameters between failing to make the jump and actually making it results in a large change in the action-value function. Figure 6 shows a successfully completed episode of the Platform domain.
6 Related Work
Guestrin, Hauskrecht, and Kveton (2004) introduced an algorithm for solving factored MDPs with a hybrid discrete-continuous action space. However, their formalism has an action space with a mixed set of discrete and continuous components, whereas ours has distinct actions, each with a different number of continuous components. Furthermore, they assume the domain has a compact factored representation, and they only consider planning.
Rachelson (2009) encountered parameterized actions in the form of an action to wait for a given period of time in his research on time-dependent, continuous-time MDPs (TMDPs). He developed XMDPs, which are TMDPs with a parameterized action space [Rachelson 2009]. He developed a Bellman operator for this setting, and a later paper mentions that the TiMDPpoly algorithm can work with parameterized actions, although this refers specifically to the parameterized wait action [Rachelson, Fabiani, and Garcia 2009]. This research also takes a planning perspective, and only considers a time-dependent domain. Additionally, the size of the parameter space is the same for all actions.
Hoey, Schroder, and Alhothali (2013) considered mixed discrete-continuous actions in their work on Bayesian affect control theory. To approach this problem they use a form of POMCP, a Monte Carlo sampling algorithm [Silver and Veness 2010], with domain-specific adjustments to compute the continuous action components. They note that the discrete and continuous components of the action space reflect different aspects of control: the discrete component provides the “what,” while the continuous component describes the “how” [Hoey, Schroder, and Alhothali 2013].
In their research on symbolic dynamic programming (SDP) algorithms, Zamani, Sanner, and Fang (2012) considered domains with a set of discrete parameterized actions, each with a different parameter space. Symbolic dynamic programming is a form of planning for relational or first-order MDPs, where the MDP has a set of logical relationships defining its dynamics and reward function. Their algorithms represent the value function as an extended algebraic decision diagram (XADD), and are limited to MDPs with predefined logical relations.
A hierarchical MDP is an MDP where each action has subtasks. A subtask is itself an MDP with its own states and actions, which may in turn have their own subtasks. Hierarchical MDPs are well-suited to representing parameterized actions, as we could consider selecting the parameters for a discrete action to be a subtask. MAXQ is a method for value function decomposition of hierarchical MDPs [Dietterich 2000]; one possibility is to use MAXQ to learn the action-values in a parameterized action problem.
7 Conclusion
The PAMDP formalism models reinforcement learning domains with parameterized actions. Parameterized actions give us the adaptability of continuous action spaces while retaining distinct kinds of actions. They also allow for the simple representation of discontinuous policies without complex parameterizations. We have presented three approaches for model-free learning in PAMDPs: direct optimization, and two variants of the Q-PAMDP algorithm. We have shown that Q-PAMDP(1), with an appropriate P-UPDATE method, converges to a local or global optimum, and that Q-PAMDP(∞) with a global optimization step converges to a local optimum.
We have examined the performance of these approaches in the goal-scoring domain and the Platform domain. The robot soccer goal domain models the situation where a striker must outmaneuver a keeper to score a goal. There, Q-PAMDP(1) and Q-PAMDP(∞) outperformed eNAC and fixed-parameter SARSA, performing similarly well and learning policies that score goals roughly 35% of the time. In the Platform domain we found that both Q-PAMDP(1) and Q-PAMDP(∞) again outperformed eNAC and fixed-parameter SARSA.
References
[Bezdek and Hathaway 2002] Bezdek, J., and Hathaway, R. 2002. Some notes on alternating optimization. In Advances in Soft Computing. Springer. 288–300.

[da Silva, Konidaris, and Barto 2012] da Silva, B.; Konidaris, G.; and Barto, A. 2012. Learning parameterized skills. In Proceedings of the Twenty-Ninth International Conference on Machine Learning, 1679–1686.

[Deisenroth et al. 2014] Deisenroth, M.; Englert, P.; Peters, J.; and Fox, D. 2014. Multi-task policy search for robotics. In Proceedings of the Fourth International Conference on Robotics and Automation, 3876–3881.

[Deisenroth, Neumann, and Peters 2013] Deisenroth, M.; Neumann, G.; and Peters, J. 2013. A Survey on Policy Search for Robotics. Number 1–2. Now Publishers.

[Dietterich 2000] Dietterich, T. 2000. Hierarchical reinforcement learning with the MAXQ value function decomposition. Journal of Artificial Intelligence Research 13:227–303.

[Guestrin, Hauskrecht, and Kveton 2004] Guestrin, C.; Hauskrecht, M.; and Kveton, B. 2004. Solving factored MDPs with continuous and discrete variables. In Proceedings of the Twentieth Conference on Uncertainty in Artificial Intelligence, 235–242.

[Hoey, Schroder, and Alhothali 2013] Hoey, J.; Schroder, T.; and Alhothali, A. 2013. Bayesian affect control theory. In Proceedings of the Fifth International Conference on Affective Computing and Intelligent Interaction, 166–172. IEEE.

[Kitano et al. 1997] Kitano, H.; Asada, M.; Kuniyoshi, Y.; Noda, I.; Osawa, E.; and Matsubara, H. 1997. RoboCup: A challenge problem for AI. AI Magazine 18(1):73.

[Kober et al. 2012] Kober, J.; Wilhelm, A.; Oztop, E.; and Peters, J. 2012. Reinforcement learning to adjust parametrized motor primitives to new situations. Autonomous Robots 33(4):361–379.

[Konidaris, Osentoski, and Thomas 2011] Konidaris, G.; Osentoski, S.; and Thomas, P. 2011. Value function approximation in reinforcement learning using the Fourier basis. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence, 380–385.

[Peters and Schaal 2008] Peters, J., and Schaal, S. 2008. Natural actor-critic. Neurocomputing 71(7):1180–1190.

[Rachelson, Fabiani, and Garcia 2009] Rachelson, E.; Fabiani, P.; and Garcia, F. 2009. TiMDPpoly: an improved method for solving time-dependent MDPs. In Proceedings of the Twenty-First International Conference on Tools with Artificial Intelligence, 796–799. IEEE.

[Rachelson 2009] Rachelson, E. 2009. Temporal Markov Decision Problems: Formalization and Resolution. Ph.D. Dissertation, University of Toulouse, France.

[Silver and Veness 2010] Silver, D., and Veness, J. 2010. Monte-Carlo planning in large POMDPs. In Advances in Neural Information Processing Systems, volume 23, 2164–2172.

[Sutton and Barto 1998] Sutton, R., and Barto, A. 1998. Introduction to Reinforcement Learning. Cambridge, MA, USA: MIT Press.

[Watkins and Dayan 1992] Watkins, C., and Dayan, P. 1992. Q-learning. Machine Learning 8(3–4):279–292.

[Zamani, Sanner, and Fang 2012] Zamani, Z.; Sanner, S.; and Fang, C. 2012. Symbolic dynamic programming for continuous state and action MDPs. In Proceedings of the Twenty-Sixth AAAI Conference on Artificial Intelligence.