Maximizing Information Gain in Partially Observable Environments via Prediction Reward

05/11/2020
by Yash Satsangi, et al.

Information gathering in a partially observable environment can be formulated as a reinforcement learning (RL) problem in which the reward depends on the agent's uncertainty. For example, the reward can be the negative entropy of the agent's belief over an unknown (or hidden) variable. Typically, the rewards of an RL agent are defined as a function of state-action pairs rather than of the agent's belief; this hinders the direct application of deep RL methods to such tasks. This paper tackles the challenge of using belief-based rewards for a deep RL agent by offering a simple insight: maximizing any convex function of the agent's belief can be approximated by instead maximizing a prediction reward, a reward based on prediction accuracy. In particular, we derive the exact error between negative entropy and the expected prediction reward. This insight provides theoretical motivation for several fields that use prediction rewards (namely visual attention, question answering systems, and intrinsic motivation) and highlights their connection to the usually distinct fields of active perception, active sensing, and sensor placement. Based on this insight, we present deep anticipatory networks (DANs), which enable an agent to take actions that reduce its uncertainty without performing explicit belief inference. We present two applications of DANs: building a sensor-selection system for tracking people in a shopping mall, and learning discrete models of attention on Fashion-MNIST and MNIST digit classification.
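The relationship the abstract appeals to can be illustrated with a minimal sketch (the helper names below are our own, not from the paper): for a belief b over K hypotheses, a predictor that guesses the most likely hypothesis earns expected 0/1 accuracy max_i b_i, and both this expected prediction reward and the negative entropy sum_i b_i log b_i are convex functions of b, minimized at the uniform belief and maximized at a point-mass belief.

```python
import numpy as np

def neg_entropy(belief):
    """Negative Shannon entropy of a belief: sum_i b_i * log(b_i)."""
    b = np.asarray(belief, dtype=float)
    nz = b[b > 0]  # convention: 0 * log(0) = 0
    return float(np.sum(nz * np.log(nz)))

def expected_prediction_reward(belief):
    """Expected 0/1 accuracy of a predictor that guesses the most
    likely hypothesis under the belief: max_i b_i."""
    return float(np.max(np.asarray(belief, dtype=float)))

# A fairly confident belief over three hypotheses:
b = [0.7, 0.2, 0.1]
print(neg_entropy(b), expected_prediction_reward(b))
```

Both quantities attain their minimum at the uniform belief (log(1/K) and 1/K, respectively) and their maximum at a deterministic belief (0 and 1); the paper's contribution is deriving the exact error between the two, which justifies training with prediction accuracy in place of a belief-entropy reward.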

Related research:
- Experimental Evidence that Empowerment May Drive Exploration in Sparse-Reward Environments (07/14/2021)
- VIME: Variational Information Maximizing Exploration (05/31/2016)
- Sensor Control for Information Gain in Dynamic, Sparse and Partially Observed Environments (11/03/2022)
- A conversion between utility and information (11/26/2009)
- Information Theoretic Characterization of Uncertainty Distinguishes Surprise From Accuracy Signals in the Brain (03/01/2020)
- Prior Preference Learning from Experts: Designing a Reward with Active Inference (01/22/2021)
- 'Indifference' methods for managing agent rewards (12/18/2017)
