CAQL: Continuous Action Q-Learning

09/26/2019
by Moonkyung Ryu, et al.

Value-based reinforcement learning (RL) methods like Q-learning have shown success in a variety of domains. One challenge in applying Q-learning to continuous-action RL problems, however, is the continuous-action maximization (max-Q) required for the optimal Bellman backup. In this work, we develop CAQL, a class of algorithms for continuous-action Q-learning that can use several plug-and-play optimizers for the max-Q problem. Leveraging recent optimization results for deep neural networks, we show that max-Q can be solved optimally using mixed-integer programming (MIP). When the Q-function representation has sufficient power, MIP-based optimization gives rise to better policies and is more robust than approximate methods (e.g., gradient ascent, cross-entropy search). We further develop several techniques to accelerate inference in CAQL, which, despite their approximate nature, perform well. We compare CAQL with state-of-the-art RL algorithms on benchmark continuous-control problems that have different degrees of action constraints and show that CAQL outperforms policy-based methods in heavily constrained environments, often dramatically.
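To make the max-Q step concrete, here is a minimal pure-Python sketch; it is not the authors' implementation. It assumes a hypothetical one-hidden-layer ReLU Q-network over a scalar action in [-1, 1], with made-up weights (`W1`, `B1`, `W2`, `B2`) and the state held fixed. `max_q_exact` is not MIP: it relies on the fact that, with a single action dimension, the network is piecewise linear, so its maximum lies at an interval endpoint or a ReLU breakpoint. CAQL's MIP formulation is what extends exact maximization to multi-dimensional, constrained action spaces. `max_q_cem` is a generic cross-entropy-method search with hypothetical hyperparameters, and `bellman_target` forms the backup target y = r + γ · max over a' of Q(s', a').

```python
import math
import random

# Hypothetical toy Q-network: one hidden ReLU layer over a scalar action,
# with the (fixed) state folded into these made-up weights.
W1 = [1.5, -2.0, 0.8]   # hidden-unit weights on the action
B1 = [0.2, 0.5, -0.3]   # hidden-unit biases
W2 = [1.0, 0.7, -1.2]   # output weights
B2 = 0.1                # output bias


def q_value(a):
    """Q(s, a) = sum_j W2[j] * relu(W1[j] * a + B1[j]) + B2 for the fixed state."""
    return sum(w2 * max(0.0, w1 * a + b1) for w1, b1, w2 in zip(W1, B1, W2)) + B2


def max_q_exact(lo, hi):
    """Exact max over [lo, hi] for a SCALAR action only.

    Q is piecewise linear in a, so the maximizer is an endpoint or a ReLU
    breakpoint a = -B1[j] / W1[j]. This is a stand-in for MIP, not MIP itself.
    """
    candidates = [lo, hi]
    for w1, b1 in zip(W1, B1):
        if w1 != 0.0:
            breakpoint_a = -b1 / w1
            if lo <= breakpoint_a <= hi:
                candidates.append(breakpoint_a)
    best = max(candidates, key=q_value)
    return best, q_value(best)


def max_q_cem(lo, hi, iters=20, pop=64, n_elite=8, seed=0):
    """Approximate max via the cross-entropy method with a Gaussian sampler.

    Samples are clipped to the feasible box [lo, hi]; the sampler is refit to
    the top-n_elite actions each iteration. Hyperparameters are assumptions.
    """
    rng = random.Random(seed)
    mu, sigma = (lo + hi) / 2.0, (hi - lo) / 2.0
    for _ in range(iters):
        samples = [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(pop)]
        elite = sorted(samples, key=q_value, reverse=True)[:n_elite]
        mu = sum(elite) / n_elite
        # Small floor keeps sigma positive so rng.gauss stays well defined.
        sigma = math.sqrt(sum((x - mu) ** 2 for x in elite) / n_elite) + 1e-6
    return mu, q_value(mu)


def bellman_target(reward, gamma, max_q_next, done=False):
    """Backup target y = r + gamma * max_a' Q(s', a'); no bootstrap at terminal states."""
    return reward if done else reward + gamma * max_q_next


if __name__ == "__main__":
    a_star, q_star = max_q_exact(-1.0, 1.0)
    a_cem, q_cem = max_q_cem(-1.0, 1.0)
    print("exact:", a_star, q_star)
    print("cem:  ", a_cem, q_cem)
    print("target:", bellman_target(1.0, 0.99, q_star))
```

On this toy network the exact routine returns the boundary action a = -1 with Q = 1.85, and CEM should settle near the same action. The difference is that the exact routine certifies its answer, whereas CEM only reports the mean of its best samples, which is the distinction the abstract draws between MIP and approximate search.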

research
03/25/2019

Q-Learning for Continuous Actions with Cross-Entropy Guided Policies

Off-Policy reinforcement learning (RL) is an important class of methods ...
research
12/05/2018

Entropic Policy Composition with Generalized Policy Improvement and Divergence Correction

Deep reinforcement learning (RL) algorithms have made great strides in r...
research
10/22/2018

Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces

Value-based approaches can be difficult to use in continuous action spac...
research
10/19/2021

Continuous Control with Action Quantization from Demonstrations

In Reinforcement Learning (RL), discrete actions, as opposed to continuo...
research
09/20/2022

Soft Action Priors: Towards Robust Policy Transfer

Despite success in many challenging problems, reinforcement learning (RL...
research
10/28/2020

Learning to Represent Action Values as a Hypergraph on the Action Vertices

Action-value estimation is a critical component of many reinforcement le...
research
04/25/2018

Multiagent Soft Q-Learning

Policy gradient methods are often applied to reinforcement learning in c...