Adversarially Guided Actor-Critic

02/08/2021
by   Yannis Flet-Berliac, et al.

Despite clear successes in deep reinforcement learning, actor-critic algorithms remain sample-inefficient in complex environments, particularly in tasks where efficient exploration is the bottleneck. These methods consider a policy (the actor) and a value function (the critic) whose respective losses are built from different motivations and approaches. This paper introduces a third protagonist: the adversary. While the adversary mimics the actor by minimizing the KL-divergence between their respective action distributions, the actor, in addition to learning to solve the task, tries to differentiate itself from the adversary's predictions. This novel objective encourages the actor to follow strategies that could not have been correctly predicted from previous trajectories, making its behavior innovative in tasks where the reward is extremely rare. Our experimental analysis shows that the resulting Adversarially Guided Actor-Critic (AGAC) algorithm leads to more exhaustive exploration. Notably, AGAC outperforms current state-of-the-art methods on a range of hard-exploration and procedurally-generated tasks.
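The interplay between actor and adversary described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes discrete (categorical) action distributions, and the function names, the coefficient c, and the log-ratio form of the actor's bonus are illustrative assumptions, since the abstract gives no formulas. The adversary's loss is the KL-divergence from the actor's distribution to its own prediction; the actor's bonus grows when it picks actions the adversary considered unlikely.

```python
import math

def softmax(logits):
    """Turn a list of logits into a categorical probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))

def adversary_loss(actor_batch, adversary_batch):
    """Mean KL between actor and adversary distributions over a batch.

    The adversary would minimize this quantity to mimic the actor.
    """
    kls = [kl_divergence(p, q) for p, q in zip(actor_batch, adversary_batch)]
    return sum(kls) / len(kls)

def actor_bonus(actor_probs, adversary_probs, action, c=0.1, eps=1e-12):
    """Hypothetical bonus for the actor (assumed form, not from the abstract).

    Positive when the actor assigns more probability to the chosen action
    than the adversary predicted; c is an assumed scaling coefficient.
    """
    return c * (math.log(actor_probs[action] + eps)
                - math.log(adversary_probs[action] + eps))

# Two states, three actions. In the first state the actor strongly prefers
# action 0 while the adversary predicts a uniform distribution; in the
# second state both are uniform.
actor_batch = [softmax([2.0, 0.0, 0.0]), softmax([0.0, 0.0, 0.0])]
adversary_batch = [softmax([0.0, 0.0, 0.0]), softmax([0.0, 0.0, 0.0])]

loss = adversary_loss(actor_batch, adversary_batch)
bonus_surprising = actor_bonus(actor_batch[0], adversary_batch[0], action=0)
bonus_predicted = actor_bonus(actor_batch[1], adversary_batch[1], action=1)
```

In this toy setup the bonus is zero wherever the adversary predicts the actor exactly, and positive where the actor deviates toward an action the adversary underestimated. In a full training loop such a bonus would presumably be combined with the usual task-driven advantage, with c trading off task reward against deviation from the adversary.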


research
06/23/2022

CGAR: Critic Guided Action Redistribution in Reinforcement Leaning

Training a game-playing reinforcement learning agent requires multiple i...
research
04/09/2021

Behavior-Guided Actor-Critic: Improving Exploration via Learning Policy Behavior Representation for Deep Reinforcement Learning

In this work, we propose Behavior-Guided Actor-Critic (BAC), an off-poli...
research
07/29/2020

Learning Object-conditioned Exploration using Distributed Soft Actor Critic

Object navigation is defined as navigating to an object of a given label...
research
09/08/2021

ADER: Adapting between Exploration and Robustness for Actor-Critic Methods

Combining off-policy reinforcement learning methods with function approx...
research
05/07/2020

Curious Hierarchical Actor-Critic Reinforcement Learning

Hierarchical abstraction and curiosity-driven exploration are two common...
research
07/06/2019

Entropic Regularization of Markov Decision Processes

An optimal feedback controller for a given Markov decision process (MDP)...
research
06/07/2023

Adaptive Frequency Green Light Optimal Speed Advisory based on Hybrid Actor-Critic Reinforcement Learning

Green Light Optimal Speed Advisory (GLOSA) system suggests speeds to veh...
