Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space

10/23/2018
by Ermo Wei, et al.

We explore Deep Reinforcement Learning in a parameterized action space. Specifically, we investigate how to achieve sample-efficient end-to-end training in these tasks. We propose a new compact architecture in which the parameter policy is conditioned on the output of the discrete action policy. We also propose two new methods, based on the state-of-the-art algorithms Trust Region Policy Optimization (TRPO) and Stochastic Value Gradient (SVG), to train such an architecture. We demonstrate that these methods outperform the state-of-the-art method, Parameterized Action DDPG, on test domains.
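To make the conditioning concrete, here is a minimal sketch of such a hierarchical parameterized-action policy: a discrete head selects an action type, and a separate parameter head, indexed by that choice, produces the action's continuous parameters. All names, shapes, and the use of untrained random linear layers are illustrative assumptions, not the paper's actual architecture or training procedure.

```python
import math
import random

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

class ParameterizedActionPolicy:
    """Toy two-level policy: discrete action head, then a per-action
    parameter head conditioned on the discrete choice (illustrative only)."""

    def __init__(self, state_dim, num_actions, param_dim, seed=0):
        rng = random.Random(seed)
        # Random linear layers stand in for trained networks.
        self.w_discrete = [[rng.gauss(0, 0.1) for _ in range(state_dim)]
                           for _ in range(num_actions)]
        # One parameter head per discrete action: this is where the
        # parameter policy is conditioned on the discrete action.
        self.w_param = [[[rng.gauss(0, 0.1) for _ in range(state_dim)]
                         for _ in range(param_dim)]
                        for _ in range(num_actions)]

    def act(self, state):
        # Discrete head: logits -> probabilities -> greedy action choice.
        logits = [sum(w * s for w, s in zip(row, state))
                  for row in self.w_discrete]
        probs = softmax(logits)
        action = max(range(len(probs)), key=lambda a: probs[a])
        # Parameter head for the chosen action; tanh bounds each parameter.
        params = [math.tanh(sum(w * s for w, s in zip(row, state)))
                  for row in self.w_param[action]]
        return action, params

policy = ParameterizedActionPolicy(state_dim=4, num_actions=3, param_dim=2)
action, params = policy.act([0.5, -0.2, 0.1, 0.9])
```

In a real implementation the two heads would be neural networks trained end-to-end (e.g. with the TRPO- or SVG-based methods the paper proposes), and the discrete choice would typically be sampled rather than taken greedily.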

