Solving Continuous Control via Q-learning

10/22/2022
by Tim Seyde, et al.

While there has been substantial success in applying actor-critic methods to continuous control, simpler critic-only methods such as Q-learning often remain intractable in the associated high-dimensional action spaces. However, most actor-critic methods come at the cost of added complexity: heuristics for stabilization, compute requirements as well as wider hyperparameter search spaces. We show that these issues can be largely alleviated via Q-learning by combining action discretization with value decomposition, framing single-agent control as cooperative multi-agent reinforcement learning (MARL). With bang-bang actions, performance of this critic-only approach matches state-of-the-art continuous actor-critic methods when learning from features or pixels. We extend classical bandit examples from cooperative MARL to provide intuition for how decoupled critics leverage state information to coordinate joint optimization, and demonstrate surprisingly strong performance across a wide variety of continuous control tasks.
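The combination of action discretization and value decomposition described above can be illustrated with a minimal sketch. It assumes that each action dimension is treated as one "agent" with two bang-bang bins, and that the joint value is the mean of the per-dimension utilities; the abstract does not state the aggregation rule, so the mean is an assumption here, as are all function names and the example numbers. The per-dimension utilities would normally come from a learned network, which is omitted.

```python
from typing import List, Sequence

# Bang-bang discretization: each action dimension takes one of two extreme values.
BANG_BANG = (-1.0, 1.0)


def greedy_bins(utilities: Sequence[Sequence[float]]) -> List[int]:
    """Pick the highest-utility bin independently for each action dimension."""
    return [max(range(len(q)), key=q.__getitem__) for q in utilities]


def greedy_action(utilities: Sequence[Sequence[float]]) -> List[float]:
    """Map the per-dimension greedy bins to a continuous bang-bang action vector."""
    return [BANG_BANG[i] for i in greedy_bins(utilities)]


def joint_value(utilities: Sequence[Sequence[float]], bins: Sequence[int]) -> float:
    """Joint Q-value as the mean of per-dimension utilities (assumed aggregation)."""
    return sum(q[i] for q, i in zip(utilities, bins)) / len(utilities)


def td_target(reward: float, discount: float,
              next_utilities: Sequence[Sequence[float]]) -> float:
    """One-step Q-learning target; the max decomposes across dimensions."""
    best = [max(q) for q in next_utilities]
    return reward + discount * sum(best) / len(best)


# Hypothetical utilities for a 3-dimensional action space, 2 bins each.
example_utilities = [[0.2, 0.5], [1.0, -1.0], [0.0, 0.3]]
example_action = greedy_action(example_utilities)
example_value = joint_value(example_utilities, greedy_bins(example_utilities))
example_target = td_target(1.0, 0.9, example_utilities)
```

Because the joint value is a mean of independent terms, the maximization over the joint action reduces to one argmax per dimension, so the cost grows linearly with the number of action dimensions rather than exponentially.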


