Learning State Representations from Random Deep Action-conditional Predictions

02/09/2021
by Zeyu Zheng, et al.

In this work, we study auxiliary prediction tasks defined by temporal-difference networks (TD networks). These networks are a language for expressing a rich space of general value function (GVF) prediction targets that may be learned efficiently with TD. Through analysis in an illustrative domain, we show that exploiting the full richness of TD networks, including both action-conditional predictions and temporally deep predictions, benefits the learning of state representations. Our main (and perhaps surprising) result is that deep action-conditional TD networks with random structures, which create random prediction questions about random features, yield state representations that are competitive with state-of-the-art hand-crafted value prediction and pixel control auxiliary tasks in both Atari games and DeepMind Lab tasks. We also show through stop-gradient experiments that learning the state representations solely via these unsupervised random TD network prediction tasks yields agents that outperform the end-to-end-trained actor-critic baseline.
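To make the idea of random, deep, action-conditional prediction questions concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the authors' generator: the helper names `random_question_network` and `td_targets`, the layered layout, and the absence of discounting are all hypothetical choices made for brevity. Each node asks "if I take this action now, what will my parent be at the next step?", where the parent is a random feature (first layer) or another prediction (deeper layers).

```python
import numpy as np


def random_question_network(num_features, num_actions, depth, nodes_per_layer, rng):
    """Build a random layered question structure (hypothetical helper).

    Each node is a pair (action, parent). Layer-0 parents index random
    features; parents in deeper layers index nodes of the previous layer,
    which is what makes the predictions temporally deep.
    """
    layers = []
    for d in range(depth):
        num_parents = num_features if d == 0 else nodes_per_layer
        actions = rng.integers(0, num_actions, size=nodes_per_layer)
        parents = rng.integers(0, num_parents, size=nodes_per_layer)
        layers.append((actions, parents))
    return layers


def td_targets(layers, next_features, next_predictions, action_taken):
    """One-step TD targets for every node on a single transition.

    next_features:    array (num_features,), random features of the next observation
    next_predictions: list of arrays, the agent's own predictions at the next step
    Returns (targets, masks). A node is only trained when its conditioning
    action equals the action actually taken (assumed masking scheme).
    """
    targets, masks = [], []
    for d, (actions, parents) in enumerate(layers):
        # Layer 0 bootstraps from features, deeper layers from predictions.
        source = next_features if d == 0 else next_predictions[d - 1]
        targets.append(source[parents])
        masks.append(actions == action_taken)
    return targets, masks


# Usage on made-up data: 4 random features, 3 actions, 2 layers of 5 nodes.
rng = np.random.default_rng(0)
layers = random_question_network(4, 3, depth=2, nodes_per_layer=5, rng=rng)
next_features = rng.standard_normal(4)
next_predictions = [rng.standard_normal(5) for _ in layers]
targets, masks = td_targets(layers, next_features, next_predictions, action_taken=1)
```

In a full agent, the masked squared error between the current predictions and these targets would serve as the auxiliary loss that shapes the state representation; that training loop is omitted here.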

