Representation Gap in Deep Reinforcement Learning

05/29/2022
by Qiang He, et al.

Deep reinforcement learning promises that an agent can learn a good policy from high-dimensional information, while representation learning removes irrelevant and redundant information and retains what is pertinent. We consider the representation capacity of the action value function and theoretically reveal an inherent property: a representation gap between it and its target action value function. This representation gap is favorable. However, through illustrative experiments, we show that the representation of the action value function grows similar to that of its target value function, i.e., the representation gap becomes undesirably inactive (representation overlap). Representation overlap results in a loss of representation capacity, which in turn leads to sub-optimal learning performance. To activate the representation gap, we propose a simple but effective framework, Policy Optimization from Preventing Representation Overlaps (POPRO), which regularizes the policy evaluation phase by making the representation of the action value function differ from that of its target. We also provide a convergence rate guarantee for POPRO. We evaluate POPRO on Gym continuous control suites. The empirical results show that POPRO using pixel inputs outperforms or matches the sample efficiency of methods that use state-based features.
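As a rough illustration of regularizing policy evaluation so that the critic's representation differs from its target's, the sketch below adds an overlap penalty to a standard TD loss. The abstract does not give the form of the regularizer, so everything specific here is an assumption made for illustration: the inner-product overlap measure, the linear last layer, the coefficient `beta`, and the function name `td_loss_with_overlap_penalty`. It is not the authors' implementation.

```python
import numpy as np

def td_loss_with_overlap_penalty(phi, phi_target, w, w_target,
                                 reward, gamma=0.99, beta=0.1):
    """Hypothetical TD loss with a representation-overlap penalty.

    phi:        (batch, d) critic features for (s, a)
    phi_target: (batch, d) target-critic features for (s', a')
    w, w_target: (d,) last-layer weights of critic and target critic
    reward:     (batch,) rewards
    beta:       assumed penalty coefficient (not from the source)
    """
    # Linear value heads on top of the learned representations.
    q = phi @ w
    td_target = reward + gamma * (phi_target @ w_target)

    # Standard mean squared TD error.
    td_error = np.mean((q - td_target) ** 2)

    # Assumed overlap measure: mean inner product between the two
    # representations. Larger values mean more similar features.
    overlap = np.mean(np.sum(phi * phi_target, axis=1))

    return td_error + beta * overlap, td_error, overlap


# Identical features overlap fully; orthogonal features do not.
phi = np.array([[1.0, 0.0], [0.0, 1.0]])
w = np.array([1.0, 1.0])
w_target = np.zeros(2)
reward = np.ones(2)

same = td_loss_with_overlap_penalty(phi, phi, w, w_target, reward)
orth = td_loss_with_overlap_penalty(phi, phi[::-1], w, w_target, reward)
```

In this toy setup the TD error is the same in both calls, so the difference between the two totals comes only from the penalty term, which is what pushes the two representations apart during training.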


research
02/20/2021

Decoupling Value and Policy for Generalization in Reinforcement Learning

Standard deep reinforcement learning algorithms use a shared representat...
research
10/22/2020

Reinforcement Learning with Combinatorial Actions: An Application to Vehicle Routing

Value-function-based methods have long played an important role in reinf...
research
11/18/2019

Gamma-Nets: Generalizing Value Estimation over Timescale

We present Γ-nets, a method for generalizing value function estimation o...
research
06/10/2019

Deep Reinforcement Learning with Discrete Normalized Advantage Functions for Resource Management in Network Slicing

Network slicing promises to provision diversified services with distinct...
research
11/19/2014

Compress and Control

This paper describes a new information-theoretic policy evaluation techn...
research
10/28/2021

Cooperative Deep Q-learning Framework for Environments Providing Image Feedback

In this paper, we address two key challenges in deep reinforcement learn...
research
08/17/2022

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (GCRL) has a wide range of poten...
