World Value Functions: Knowledge Representation for Learning and Planning

06/23/2022
by Geraud Nangue Tasse, et al.

We propose world value functions (WVFs), a type of goal-oriented general value function that represents how to solve not just a given task, but any other goal-reaching task in an agent's environment. This is achieved by equipping an agent with an internal goal space defined as all the world states where it experiences a terminal transition. The agent can then modify the standard task rewards to define its own reward function, which provably drives it to learn how to achieve all reachable internal goals, and the value of doing so in the current task. We demonstrate two key benefits of WVFs in the context of learning and planning. In particular, given a learned WVF, an agent can compute the optimal policy in a new task by simply estimating the task's reward function. Furthermore, we show that WVFs also implicitly encode the transition dynamics of the environment, and so can be used to perform planning. Experimental results show that WVFs can be learned faster than regular value functions, while their ability to infer the environment's dynamics can be used to integrate learning and planning methods to further improve sample efficiency.
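To make the idea concrete, below is a minimal sketch of how a WVF could be learned with tabular goal-conditioned Q-learning in a small discrete environment. The environment interface (reset/step), the penalty R_MIN, and the hyperparameters are illustrative assumptions rather than the paper's exact implementation; the ingredients taken from the abstract are the internal goal space (terminal states experienced so far), the modified reward that penalises terminating anywhere other than the commanded goal, and the task policy obtained by maximising over both goals and actions.

```python
import random
from collections import defaultdict

# Hedged sketch of WVF learning, assuming a small discrete environment with
# env.reset() -> state and env.step(action) -> (next_state, reward, done).
# ALPHA, GAMMA, EPSILON, R_MIN and the goal-selection scheme are assumptions
# for illustration, not the authors' implementation.

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1
R_MIN = -10.0            # penalty for terminating at a state other than the commanded goal

Q = defaultdict(float)   # WVF table: Q[(state, goal, action)]
goals = set()            # internal goal space: terminal states experienced so far

def greedy_action(state, goal, actions):
    return max(actions, key=lambda a: Q[(state, goal, a)])

def learn_wvf(env, actions, episodes=5000):
    for _ in range(episodes):
        state, done = env.reset(), False
        # Command a previously discovered internal goal (explore if none yet).
        goal = random.choice(tuple(goals)) if goals else None
        while not done:
            if goal is None or random.random() < EPSILON:
                action = random.choice(actions)
            else:
                action = greedy_action(state, goal, actions)
            next_state, reward, done = env.step(action)
            if done:
                goals.add(next_state)  # terminal transition => new internal goal
            # Update the WVF for every known internal goal (hindsight-style sweep).
            for g in goals:
                # Modified reward: penalise terminating anywhere other than g.
                r_bar = R_MIN if (done and next_state != g) else reward
                target = r_bar if done else r_bar + GAMMA * max(
                    Q[(next_state, g, a)] for a in actions)
                Q[(state, g, action)] += ALPHA * (target - Q[(state, g, action)])
            state = next_state
    return Q

def task_policy(state, actions):
    # Act on the current task by maximising over both goals and actions.
    return max(actions, key=lambda a: max(Q[(state, g, a)] for g in goals))
```

In this sketch, transferring to a new task would only require re-estimating that task's rewards at the internal goals before maximising over goals and actions, which is the zero-shot benefit the abstract describes.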
