How Should an Agent Practice?

12/15/2019
by Janarthanan Rajendran, et al.

We present a method for learning intrinsic reward functions to drive the learning of an agent during periods of practice in which extrinsic task rewards are not available. During practice, the environment may differ from the one available for training and evaluation with extrinsic rewards. We refer to this setup of alternating periods of practice and objective evaluation as practice-match, drawing an analogy to regimes of skill acquisition common for humans in sports and games. The agent must use its time in the practice environment effectively so that its performance improves during matches. In the proposed method, the intrinsic practice reward is learned through a meta-gradient approach that adapts the practice reward parameters to reduce the extrinsic match reward loss computed from matches. We illustrate the method on a simple grid world and evaluate it in two games in which the practice environment differs from the match environment: Pong with practice against a wall without an opponent, and PacMan with practice in a maze without ghosts. The results show that learning during both practice and match periods yields gains over learning in matches only.
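
The meta-gradient idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a two-action bandit, a one-parameter sigmoid policy, a table of intrinsic rewards `eta`, and exact expected gradients in place of sampled ones. The names `practice_step`, `match_loss`, `meta_gradient`, and `train`, and the step sizes `alpha` and `beta`, are all hypothetical. It shows only the chain rule that links the practice reward parameters to the match loss through one practice update.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def practice_step(theta, eta, alpha):
    """One expected policy-gradient step on the intrinsic (practice) reward.

    The policy picks action 1 with probability sigmoid(theta); eta[a] is the
    intrinsic reward for action a (an assumption made for this toy).
    """
    p = sigmoid(theta)
    # d/dtheta of the expected intrinsic reward p*eta[1] + (1-p)*eta[0]
    grad_theta = p * (1.0 - p) * (eta[1] - eta[0])
    return theta + alpha * grad_theta


def match_loss(theta, r_ext):
    """Negative expected extrinsic reward, available only in matches."""
    p = sigmoid(theta)
    return -(p * r_ext[1] + (1.0 - p) * r_ext[0])


def meta_gradient(theta, eta, alpha, r_ext):
    """Gradient of the post-practice match loss with respect to eta."""
    p = sigmoid(theta)
    p_new = sigmoid(practice_step(theta, eta, alpha))
    # Match loss differentiated with respect to the updated policy parameter.
    dloss_dtheta_new = -p_new * (1.0 - p_new) * (r_ext[1] - r_ext[0])
    # Updated policy parameter differentiated with respect to each eta[a].
    dtheta_new_deta = [-alpha * p * (1.0 - p), alpha * p * (1.0 - p)]
    return [dloss_dtheta_new * d for d in dtheta_new_deta]


def train(rounds=200, alpha=1.0, beta=1.0):
    theta, eta = 0.0, [0.0, 0.0]
    r_ext = [0.0, 1.0]  # toy extrinsic rewards: action 1 is the good one
    for _ in range(rounds):
        # Match: measure how the last practice reward affected match loss.
        grad_eta = meta_gradient(theta, eta, alpha, r_ext)
        # Practice: update the policy using only the intrinsic reward.
        theta = practice_step(theta, eta, alpha)
        # Meta-update: adapt the practice reward to reduce match loss.
        eta = [e - beta * g for e, g in zip(eta, grad_eta)]
    return theta, eta
```

In this toy the policy never sees the extrinsic reward directly. It improves only because `eta` is adapted so that practice updates push it toward lower match loss.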

research
12/11/2019

What Can Learned Intrinsic Rewards Capture?

Reinforcement learning agents can include different components, such as ...
research
05/22/2023

Developmental Curiosity and Social Interaction in Virtual Agents

Infants explore their complex physical and social environment in an orga...
research
05/18/2017

Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

The problem of sparse rewards is one of the hardest challenges in contem...
research
08/13/2018

Large-Scale Study of Curiosity-Driven Learning

Reinforcement learning algorithms rely on carefully engineering environm...
research
07/29/2021

Learning more skills through optimistic exploration

Unsupervised skill learning objectives (Gregor et al., 2016, Eysenbach e...
research
05/12/2019

Mega-Reward: Achieving Human-Level Play without Extrinsic Rewards

Intrinsic rewards are introduced to simulate how human intelligence work...
research
01/24/2020

Forecasting football matches by predicting match statistics

This paper considers the use of observed and predicted match statistics ...
