Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

06/04/2022
by Dilip Arumugam et al.

The quintessential model-based reinforcement-learning agent iteratively refines its estimates or prior beliefs about the true underlying model of the environment. Recent empirical successes in model-based reinforcement learning with function approximation, however, eschew the true model in favor of a surrogate that, while ignoring various facets of the environment, still facilitates effective planning over behaviors. Recently formalized as the value equivalence principle, this algorithmic technique is perhaps unavoidable: real-world reinforcement learning demands consideration of a simple, computationally bounded agent interacting with an overwhelmingly complex environment, whose underlying dynamics likely exceed the agent's capacity for representation. In this work, we consider the scenario where agent limitations may entirely preclude identifying an exactly value-equivalent model, immediately giving rise to a trade-off between identifying a model simple enough to learn and one whose sub-optimality remains bounded. To address this problem, we introduce an algorithm that, using rate-distortion theory, iteratively computes an approximately-value-equivalent, lossy compression of the environment which an agent may feasibly target in lieu of the true model. We prove an information-theoretic, Bayesian regret bound for our algorithm that holds for any finite-horizon, episodic sequential decision-making problem. Crucially, our regret bound can be expressed in one of two possible forms, providing a performance guarantee for finding either the simplest model that achieves a desired sub-optimality gap or, alternatively, the best model given a limit on agent capacity.
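Rate-distortion functions of the kind invoked above are classically computed by Blahut-Arimoto-style alternating updates. As a rough illustration of that iterative machinery (not the paper's actual algorithm: the source distribution, distortion matrix, and function names below are all hypothetical placeholders), one can sketch the computation as follows:

```python
import numpy as np

def blahut_arimoto(p_x, distortion, beta, n_iters=500, tol=1e-10):
    """Blahut-Arimoto iterations for a rate-distortion trade-off.

    p_x:        source distribution over candidate models, shape [n].
    distortion: d[i, j], the loss of representing source symbol i by
                compressed symbol j, shape [n, m]. (Illustrative; the
                paper's distortion measure is value-based.)
    beta:       Lagrange multiplier trading rate against distortion.
    Returns the output marginal q(j), the channel q(j|i), and the
    achieved rate (nats) and expected distortion.
    """
    n, m = distortion.shape
    q = np.full(m, 1.0 / m)  # start from a uniform output marginal
    for _ in range(n_iters):
        # Channel update: q(j|i) proportional to q(j) * exp(-beta * d(i, j)).
        cond = q[None, :] * np.exp(-beta * distortion)
        cond /= cond.sum(axis=1, keepdims=True)
        # Marginal update: q(j) = sum_i p(i) q(j|i).
        q_new = p_x @ cond
        if np.max(np.abs(q_new - q)) < tol:
            q = q_new
            break
        q = q_new
    # Rate is the mutual information I(X; X_hat) under the current channel.
    rate = np.sum(p_x[:, None] * cond * np.log(cond / (q[None, :] + 1e-300)))
    exp_dist = np.sum(p_x[:, None] * cond * distortion)
    return q, cond, rate, exp_dist
```

Sweeping `beta` traces out the rate-distortion curve, which mirrors the two forms of the regret bound: fix a distortion (sub-optimality) target and read off the minimal rate, or fix a rate (agent capacity) and read off the best achievable distortion.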

