Finding Useful Predictions by Meta-gradient Descent to Improve Decision-making

11/18/2021
by Alex Kearney, et al.

In computational reinforcement learning, a growing body of work seeks to express an agent's model of the world through predictions about future sensations. In this manuscript we focus on predictions expressed as General Value Functions: temporally extended estimates of the accumulation of a future signal. One challenge is determining which of the infinitely many predictions an agent could make might support decision-making. In this work, we contribute a meta-gradient descent method by which an agent can directly specify what predictions it learns, independent of designer instruction. To that end, we introduce a partially observable domain suited to this investigation. We then demonstrate that, through interaction with the environment, an agent can independently select predictions that resolve the partial observability, resulting in performance similar to that achieved with expertly chosen value functions. By learning these predictions rather than manually specifying them, we enable the agent to identify useful predictions in a self-supervised manner, taking a step towards truly autonomous systems.
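
To make the idea of a General Value Function concrete, the following is a minimal sketch of one way such a prediction could be learned. It is an illustration under stated assumptions, not the method of the paper: the abstract does not name a learning rule, so tabular TD(0) is swapped in here, and the function name `td0_gvf`, the three-state cyclic stream, the signal (often called a cumulant), the discount `gamma`, and the step size are all hypothetical choices.

```python
import numpy as np

def td0_gvf(states, cumulants, n_states, gamma=0.5, step_size=0.1):
    """Tabular TD(0) estimate of a General Value Function (illustrative only).

    v[s] approximates the discounted sum of the future signal ("cumulant")
    that follows state s. All names and parameters here are assumptions.
    """
    v = np.zeros(n_states)
    for t in range(len(states) - 1):
        s, s_next = states[t], states[t + 1]
        # The cumulant at t + 1 is the signal observed on arriving in s_next.
        td_error = cumulants[t + 1] + gamma * v[s_next] - v[s]
        v[s] += step_size * td_error
    return v

# Hypothetical stream: the agent cycles through states 0 -> 1 -> 2 -> 0 ...
# and the signal of interest is 1 whenever state 0 is observed.
states = [t % 3 for t in range(3000)]
cumulants = [1.0 if s == 0 else 0.0 for s in states]
values = td0_gvf(states, cumulants, n_states=3)
print(values)
```

In this toy stream the estimate is largest in state 2, the state immediately before the signal occurs, which is the sense in which such a prediction is "temporally extended". Which signal and which discount to predict are exactly the choices that the abstract proposes to learn rather than hand-specify; this sketch fixes them by hand.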

research
06/13/2022

What Should I Know? Using Meta-gradient Descent for Predictive Feature Discovery in a Single Stream of Experience

In computational reinforcement learning, a growing body of work seeks to...
research
01/11/2022

Pavlovian Signalling with General Value Functions in Agent-Agent Temporal Decision Making

In this paper, we contribute a multi-faceted study into Pavlovian signal...
research
03/17/2022

The Frost Hollow Experiments: Pavlovian Signalling as a Path to Coordination and Communication Between Agents

Learned communication between agents is a powerful tool when approaching...
research
04/18/2019

When is a Prediction Knowledge?

Within Reinforcement Learning, there is a growing collection of research...
research
06/17/2016

Introspective Agents: Confidence Measures for General Value Functions

Agents of general intelligence deployed in real-world scenarios must ada...
research
12/30/2021

Learning Agent State Online with Recurrent Generate-and-Test

Learning continually and online from a continuous stream of data is chal...
research
09/25/2017

The Consciousness Prior

A new prior is proposed for representation learning, which can be combin...
