Decoupling Value and Policy for Generalization in Reinforcement Learning

02/20/2021
by   Roberta Raileanu, et al.
0

Standard deep reinforcement learning algorithms use a shared representation for the policy and value function. However, we argue that more information is needed to accurately estimate the value function than to learn the optimal policy. Consequently, the use of a shared representation for the policy and value function can lead to overfitting. To alleviate this problem, we propose two approaches which are combined to create IDAAC: Invariant Decoupled Advantage Actor-Critic. First, IDAAC decouples the optimization of the policy and value function, using separate networks to model them. Second, it introduces an auxiliary loss which encourages the representation to be invariant to task-irrelevant properties of the environment. IDAAC shows good generalization to unseen environments, achieving a new state-of-the-art on the Procgen benchmark and outperforming popular methods on DeepMind Control tasks with distractors. Moreover, IDAAC learns representations, value predictions, and policies that are more robust to aesthetic changes in the observations that do not change the underlying state of the environment.

READ FULL TEXT

page 2

page 24

page 27

research
09/09/2020

Phasic Policy Gradient

We introduce Phasic Policy Gradient (PPG), a reinforcement learning fram...
research
05/29/2022

Representation Gap in Deep Reinforcement Learning

Deep reinforcement learning gives the promise that an agent learns good ...
research
09/10/2015

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

This paper proposes GProp, a deep reinforcement learning algorithm for c...
research
09/16/2022

Look where you look! Saliency-guided Q-networks for visual RL tasks

Deep reinforcement learning policies, despite their outstanding efficien...
research
06/01/2020

Invariant Policy Optimization: Towards Stronger Generalization in Reinforcement Learning

A fundamental challenge in reinforcement learning is to learn policies t...
research
07/30/2021

Maximum Entropy Dueling Network Architecture

In recent years, there have been many deep structures for Reinforcement ...
research
10/18/2022

Rethinking Value Function Learning for Generalization in Reinforcement Learning

We focus on the problem of training RL agents on multiple training envir...

Please sign up or login with your details

Forgot password? Click here to reset