Meta-Gradient Reinforcement Learning with an Objective Discovered Online

07/16/2020
by Zhongwen Xu, et al.

Deep reinforcement learning includes a broad family of algorithms that parameterise an internal representation, such as a value function or policy, by a deep neural network. Each algorithm optimises its parameters with respect to an objective, such as Q-learning or policy gradient, that defines its semantics. In this work, we propose an algorithm based on meta-gradient descent that discovers its own objective, flexibly parameterised by a deep neural network, solely from interactive experience with its environment. Over time, this allows the agent to learn how to learn increasingly effectively. Furthermore, because the objective is discovered online, it can adapt to changes over time. We demonstrate that the algorithm discovers how to address several important issues in RL, such as bootstrapping, non-stationarity, and off-policy learning. On the Arcade Learning Environment (Atari), the meta-gradient algorithm adapts over time to learn with greater efficiency, eventually outperforming the median score of a strong actor-critic baseline.
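
The meta-gradient idea in the abstract can be illustrated with a toy sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the three-state problem, the linear target g_eta(r) = eta[0] * r + eta[1] standing in for the deep network, the reference returns used by the outer loss, the step sizes alpha and beta, and the function name run_meta_gradient. The agent's predictions theta are updated towards the learned target (inner update), and the target's parameters eta are updated by differentiating an outer loss through that inner update (meta-gradient step). Gradients are written out by hand so the block needs no libraries.

```python
def run_meta_gradient(steps=5000, alpha=0.5, beta=0.1):
    """Toy online meta-gradient loop; all gradients are derived by hand."""
    rewards = [1.0, 2.0, 3.0]                    # hypothetical reward seen in each of 3 states
    returns = [2.0 * r + 1.0 for r in rewards]   # hypothetical reference returns (outer loss only)
    theta = [0.0, 0.0, 0.0]                      # agent parameters: one prediction per state
    eta = [0.0, 0.0]                             # meta-parameters of the learned target

    for t in range(steps):
        s = t % 3                                # visit the states in a fixed cycle
        r, G = rewards[s], returns[s]

        # Inner update: one gradient step on 0.5 * (g_eta(r) - theta[s])**2,
        # where g_eta(r) = eta[0] * r + eta[1] is the learned target.
        target = eta[0] * r + eta[1]
        theta_new = theta[s] + alpha * (target - theta[s])

        # Outer loss 0.5 * (G - theta_new)**2, differentiated through the
        # inner update: d(theta_new)/d(eta) = alpha * [r, 1].
        err = G - theta_new
        eta[0] += beta * alpha * err * r
        eta[1] += beta * alpha * err

        theta[s] = theta_new
    return theta, eta


theta, eta = run_meta_gradient()
print("learned target parameters:", eta)
print("predictions per state:", theta)
```

In this toy setting the learned target should approach the reference returns, with eta tending towards (2, 1). A realistic agent would replace the table and the linear target with neural networks, compute the meta-gradient by automatic differentiation, and evaluate the outer loss on fresh experience rather than on the same sample.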


