Discretizing Continuous Action Space for On-Policy Optimization

01/29/2019
by   Yunhao Tang, et al.
5

In this work, we show that discretizing action space for continuous control is a simple yet powerful technique for on-policy optimization. The explosion in the number of discrete actions can be efficiently addressed by a policy with factorized distribution across action dimensions. We show that the discrete policy achieves significant performance gains with state-of-the-art on-policy optimization algorithms (PPO, TRPO, ACKTR) especially on high-dimensional tasks with complex dynamics. Additionally, we show that an ordinal parameterization of the discrete distribution can introduce the inductive bias that encodes the natural ordering between discrete actions. This ordinal architecture further significantly improves the performance of PPO/TRPO.

READ FULL TEXT

page 6

page 17

research
05/14/2017

Discrete Sequential Prediction of Continuous Actions for Deep RL

It has long been assumed that high dimensional continuous control proble...
research
10/29/2020

Deep Jump Q-Evaluation for Offline Policy Evaluation in Continuous Action Space

We consider off-policy evaluation (OPE) in continuous action domains, su...
research
07/13/2020

DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space

It has been arduous to assess the progress of a policy learning algorith...
research
01/22/2020

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can ...
research
10/07/2021

Design Strategy Network: A deep hierarchical framework to represent generative design strategies in complex action spaces

Generative design problems often encompass complex action spaces that ma...
research
11/28/2019

Hierarchical model-based policy optimization: from actions to action sequences and back

We develop a normative framework for hierarchical model-based policy opt...
research
01/26/2023

Joint action loss for proximal policy optimization

PPO (Proximal Policy Optimization) is a state-of-the-art policy gradient...

Please sign up or login with your details

Forgot password? Click here to reset