Deciding What to Learn: A Rate-Distortion Approach

01/15/2021
by Dilip Arumugam, et al.

Agents that learn to select optimal actions are a prominent focus of the sequential decision-making literature. In a complex environment, or under constraints on time and resources, however, synthesizing such an optimal policy can become infeasible. These scenarios give rise to an important trade-off between the information an agent must acquire to learn and the sub-optimality of the resulting policy. While an agent designer has a preference for how this trade-off is resolved, existing approaches further require that the designer translate these preferences into a fixed learning target for the agent. In this work, leveraging rate-distortion theory, we automate this process so that the designer need only express their preferences via a single hyperparameter, and the agent is endowed with the ability to compute its own learning targets that best achieve the desired trade-off. We establish a general bound on expected discounted regret for an agent that decides what to learn in this manner, and we present computational experiments that illustrate the expressiveness of designer preferences and even show improvements over Thompson sampling in identifying an optimal policy.
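
To illustrate how a single hyperparameter can set the trade-off between information acquired and sub-optimality, the following is a minimal sketch built on the classical Blahut-Arimoto iteration for the rate-distortion Lagrangian. The abstract does not say which algorithm the authors use, so everything here is an assumption made for illustration: the choice of Blahut-Arimoto, the function name `rate_distortion_target`, the parameter name `beta`, the finite set of candidate environments, and the use of per-environment regret as the distortion.

```python
import numpy as np


def rate_distortion_target(p_env, distortion, beta, iters=1000, tol=1e-12):
    """Hypothetical helper: Blahut-Arimoto for min I(env; action) + beta * E[distortion].

    p_env:      shape (n_env,), strictly positive probabilities over candidate environments.
    distortion: shape (n_env, n_act), loss of committing to each action in each environment.
    beta:       nonnegative trade-off weight; larger values penalize distortion more.
    Returns (cond, rate, expected_distortion), where cond[i, a] = p(action a | env i).
    """
    p_env = np.asarray(p_env, dtype=float)
    distortion = np.asarray(distortion, dtype=float)
    n_act = distortion.shape[1]

    def conditional(q):
        # cond[i, a] is proportional to q(a) * exp(-beta * d[i, a]); computed in log space.
        logits = np.log(q + 1e-300)[None, :] - beta * distortion
        logits -= logits.max(axis=1, keepdims=True)
        w = np.exp(logits)
        return w / w.sum(axis=1, keepdims=True)

    q = np.full(n_act, 1.0 / n_act)  # marginal over actions, initialized uniform
    for _ in range(iters):
        q_new = p_env @ conditional(q)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break

    cond = conditional(q)
    marg = p_env @ cond
    # Mutual information in nats; entries with cond == 0 contribute log(1) = 0.
    ratio = np.divide(cond, marg[None, :], out=np.ones_like(cond), where=cond > 0)
    rate = float(np.sum(p_env[:, None] * cond * np.log(ratio)))
    expected_distortion = float(np.sum(p_env[:, None] * cond * distortion))
    return cond, rate, expected_distortion


# Toy problem (made-up numbers): 3 candidate environments, 3 actions.
# Action 1 is near-optimal everywhere; actions 0 and 2 are optimal in one environment each.
means = np.array([[1.0, 0.9, 0.0],
                  [0.0, 0.9, 1.0],
                  [0.5, 0.9, 0.4]])
regret = means.max(axis=1, keepdims=True) - means  # distortion = regret of each action
p = np.full(3, 1.0 / 3.0)

for beta in (0.0, 2.0, 200.0):
    _, rate, dist = rate_distortion_target(p, regret, beta)
    print(f"beta={beta:6.1f}  rate={rate:.3f} nats  expected regret={dist:.3f}")
```

In this toy setup, `beta = 0` yields a target that carries no information about the environment, while a large `beta` pushes the target toward the optimal action in each environment at the cost of more bits. Intermediate values can settle on a near-optimal action that is cheaper to identify.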

research
10/26/2021

The Value of Information When Deciding What to Learn

All sequential decision-making agents explore so as to acquire knowledge...
research
06/04/2022

Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning

The quintessential model-based reinforcement-learning agent iteratively ...
research
10/22/2017

Hierarchical State Abstractions for Decision-Making Problems with Computational Constraints

In this semi-tutorial paper, we first review the information-theoretic a...
research
03/08/2022

Policy Regularization for Legible Behavior

In Reinforcement Learning interpretability generally means to provide in...
research
10/31/2017

Servant of Many Masters: Shifting priorities in Pareto-optimal sequential decision-making

It is often argued that an agent making decisions on behalf of two or mo...
research
02/24/2017

Bayes-Optimal Entropy Pursuit for Active Choice-Based Preference Learning

We analyze the problem of learning a single user's preferences in an act...
research
07/25/2022

Modelling non-reinforced preferences using selective attention

How can artificial agents learn non-reinforced preferences to continuous...
