ReLU to the Rescue: Improve Your On-Policy Actor-Critic with Positive Advantages

06/02/2023
by   Andrew Jesson, et al.

In this paper, we introduce a novel method for enhancing the effectiveness of on-policy Deep Reinforcement Learning (DRL) algorithms. Current on-policy algorithms, such as Proximal Policy Optimization (PPO) and Asynchronous Advantage Actor-Critic (A3C), do not sufficiently account for cautious interaction with the environment. Our method addresses this gap by explicitly integrating cautious interaction in two critical ways: by maximizing a lower bound on the true value function plus a constant, thereby promoting conservative value estimation, and by incorporating Thompson sampling for cautious exploration. These features are realized through three surprisingly simple modifications to the A3C algorithm: processing advantage estimates through a ReLU function, spectral normalization, and dropout. We provide a theoretical proof that our algorithm maximizes the lower bound, which also grounds Regret Matching Policy Gradients (RMPG), a discrete-action on-policy method for multi-agent reinforcement learning. Our rigorous empirical evaluations across various benchmarks consistently demonstrate our approach's improved performance over existing on-policy algorithms. This research represents a substantial step towards more cautious and effective DRL algorithms, with the potential to unlock applications to complex, real-world problems.
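
To make the first of the three modifications concrete, the following is a minimal sketch of a policy loss in which advantage estimates pass through a ReLU before weighting the log-probabilities. It is an illustration under stated assumptions, not the authors' implementation: the function names, the one-step advantage estimate (return minus value), and the discrete softmax policy are all hypothetical choices made here, and spectral normalization and dropout are omitted because they act on the network rather than on the loss.

```python
import math


def relu(x):
    """Clip negative values to zero."""
    return x if x > 0.0 else 0.0


def log_softmax(logits):
    """Numerically stable log-probabilities for one state's action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def positive_advantage_policy_loss(logits_batch, actions, returns, values):
    """Batch mean of -ReLU(A) * log pi(a|s).

    Assumption: A = return - value, a simple advantage estimate chosen
    for illustration; the abstract does not specify the estimator.
    """
    total = 0.0
    for logits, action, ret, value in zip(logits_batch, actions, returns, values):
        advantage = ret - value
        log_prob = log_softmax(logits)[action]
        # Negative advantages are zeroed, so only actions that did
        # better than the value estimate contribute to the loss.
        total += -relu(advantage) * log_prob
    return total / len(actions)


# Hypothetical two-sample batch with a uniform two-action policy.
logits_batch = [[0.0, 0.0], [0.0, 0.0]]
actions = [0, 1]
returns = [1.0, 0.0]
values = [0.5, 0.5]
loss = positive_advantage_policy_loss(logits_batch, actions, returns, values)
```

In this toy batch the second sample has a negative advantage, so it contributes nothing to the loss; only the first sample, whose return exceeded its value estimate, influences the update.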


research
12/11/2019

Doubly Robust Off-Policy Actor-Critic Algorithms for Reinforcement Learning

We study the problem of off-policy critic evaluation in several variants...
research
08/30/2022

A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space

The research of extending deep reinforcement learning (drl) to multi-age...
research
06/10/2018

Distributional Advantage Actor-Critic

In traditional reinforcement learning, an agent maximizes the reward col...
research
03/06/2021

Visual Explanation using Attention Mechanism in Actor-Critic-based Deep Reinforcement Learning

Deep reinforcement learning (DRL) has great potential for acquiring the ...
research
02/14/2023

Conservative State Value Estimation for Offline Reinforcement Learning

Offline reinforcement learning faces a significant challenge of value ov...
research
12/22/2021

Alpha-Mini: Minichess Agent with Deep Reinforcement Learning

We train an agent to compete in the game of Gardner minichess, a downsiz...
research
04/03/2019

Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning

Deep Reinforcement Learning (DRL) algorithms are known to be data ineffi...
