Discrete Sequential Prediction of Continuous Actions for Deep RL

05/14/2017
by   Luke Metz, et al.

It has long been assumed that high-dimensional continuous control problems cannot be solved effectively by discretizing individual dimensions of the action space, because the number of bins over which policies would have to be learned grows exponentially with the number of dimensions. In this paper, we draw inspiration from the recent success of sequence-to-sequence models for structured prediction problems to develop policies over discretized spaces. Central to this method is the realization that complex functions over high-dimensional spaces can be modeled by neural networks that use next-step prediction. Specifically, we show how Q-values and policies over continuous spaces can be modeled using a next-step prediction model over discretized dimensions. With this parameterization, it is possible both to leverage the compositional structure of action spaces during learning and to compute approximate maxima over action spaces. On a simple example task, we demonstrate empirically that our method can perform global search, which effectively gets around the local optimization issues that plague DDPG and NAF. We apply the technique to off-policy (Q-learning) methods and show that our method can achieve state-of-the-art performance among off-policy methods on several continuous control tasks.
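To make the idea of next-step prediction over discretized dimensions concrete, here is a minimal sketch of greedy action selection, written as an illustration and not as the authors' implementation. It assumes a hypothetical callable `q_step(state, prefix, centers)` that returns one value per bin for the next action dimension, given the bins already chosen for earlier dimensions. In the paper this role is played by a learned neural network; the `toy_q_step` function below is a hand-written stand-in, and all names are illustrative.

```python
from typing import Callable, List, Sequence


def bin_centers(low: float, high: float, num_bins: int) -> List[float]:
    """Centers of num_bins equal-width bins covering [low, high]."""
    width = (high - low) / num_bins
    return [low + (i + 0.5) * width for i in range(num_bins)]


# Hypothetical interface: values for every bin of the NEXT dimension,
# conditioned on the state and on the values chosen so far (the prefix).
QStep = Callable[[Sequence[float], Sequence[float], Sequence[float]], List[float]]


def greedy_sequential_action(
    state: Sequence[float],
    q_step: QStep,
    num_dims: int,
    centers: Sequence[float],
) -> List[float]:
    """Pick one bin per action dimension, one dimension at a time.

    This needs num_dims * len(centers) value estimates instead of
    len(centers) ** num_dims for a joint search, at the price of the
    maximum being only approximate.
    """
    prefix: List[float] = []
    for _ in range(num_dims):
        values = q_step(state, prefix, centers)
        best = max(range(len(centers)), key=lambda i: values[i])
        prefix.append(centers[best])
    return prefix


def toy_q_step(
    state: Sequence[float],
    prefix: Sequence[float],
    centers: Sequence[float],
) -> List[float]:
    """Stand-in for a learned model (an assumption for this sketch).

    The first dimension prefers the bin closest to state[0]; every later
    dimension prefers the negation of the previously chosen value, so the
    choice genuinely depends on the prefix.
    """
    target = state[0] if not prefix else -prefix[-1]
    return [-(c - target) ** 2 for c in centers]


centers = bin_centers(-1.0, 1.0, 4)  # [-0.75, -0.25, 0.25, 0.75]
action = greedy_sequential_action([0.3], toy_q_step, num_dims=3, centers=centers)
```

With 4 bins and 3 dimensions, the loop queries 3 x 4 = 12 bin values, whereas enumerating every joint action would require 4^3 = 64. Because each dimension is chosen greedily given the earlier ones, the result is only as good as the per-dimension value estimates, which is consistent with the abstract describing the maximization as approximate.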


research
01/22/2020

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can ...
research
01/29/2019

Discretizing Continuous Action Space for On-Policy Optimization

In this work, we show that discretizing action space for continuous cont...
research
05/20/2017

Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning

Deep Reinforcement Learning (DRL) methods have performed well in an incr...
research
06/13/2018

Reinforcement Learning with Function-Valued Action Spaces for Partial Differential Equation Control

Recent work has shown that reinforcement learning (RL) is a promising ap...
research
02/17/2018

Learning to Race through Coordinate Descent Bayesian Optimisation

In the automation of many kinds of processes, the observable outcome can...
research
12/05/2018

Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction

Deep reinforcement learning (RL) algorithms have made great strides in r...
research
11/30/2022

Policy Optimization over General State and Action Spaces

Reinforcement learning (RL) problems over general state and action space...
