Online Meta-Critic Learning for Off-Policy Actor-Critic Methods

03/11/2020
by Wei Zhou, et al.

Off-Policy Actor-Critic (Off-PAC) methods have proven successful in a variety of continuous control tasks. Normally, the critic's action-value function is updated using temporal-difference learning, and the critic in turn provides a loss for the actor that trains it to take actions with higher expected return. In this paper, we introduce a novel and flexible meta-critic that observes the learning process and meta-learns an additional loss for the actor that accelerates and improves actor-critic learning. Compared to the vanilla critic, the meta-critic network is explicitly trained to accelerate the learning process. Compared to existing meta-learning algorithms, the meta-critic is learned rapidly online for a single task, rather than slowly over a family of tasks. Crucially, our meta-critic framework is designed for off-policy learners, which currently provide state-of-the-art reinforcement learning sample efficiency. We demonstrate that online meta-critic learning leads to improvements in a variety of continuous control environments when combined with the contemporary Off-PAC methods DDPG, TD3 and the state-of-the-art SAC.
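The loss structure described above can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: the function names (`td_target`, `critic_loss`, `actor_loss`) are hypothetical, and the meta-critic's contribution is reduced to a placeholder scalar `meta_loss`, since the abstract does not specify how that term is parameterized or trained.

```python
import numpy as np


def td_target(reward, next_q, done, gamma=0.99):
    """One-step temporal-difference target used to update the critic."""
    # When the episode has ended (done == 1) there is no bootstrapped value.
    return reward + gamma * (1.0 - done) * next_q


def critic_loss(q_pred, target):
    """Mean squared TD error between predicted Q-values and TD targets."""
    return float(np.mean((np.asarray(q_pred) - np.asarray(target)) ** 2))


def actor_loss(q_values, meta_loss=0.0):
    """Vanilla actor loss (maximize Q) plus an additional auxiliary term.

    ASSUMPTION: `meta_loss` stands in for the meta-learned loss; here it is
    just a number supplied by the caller, not a trained meta-critic network.
    """
    return float(-np.mean(np.asarray(q_values)) + meta_loss)


if __name__ == "__main__":
    target = td_target(reward=1.0, next_q=2.0, done=0.0, gamma=0.5)
    print("TD target:", target)
    print("critic loss:", critic_loss([1.0, 3.0], [0.0, 1.0]))
    print("actor loss:", actor_loss([1.0, 3.0], meta_loss=0.5))
```

In this sketch the auxiliary term simply shifts the actor's objective; in the setting the abstract describes, that term would come from a network trained online so that following it speeds up learning.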


