Planning to Give Information in Partially Observed Domains with a Learned Weighted Entropy Model
In many real-world robotic applications, an autonomous agent must act within and explore a partially observed environment that is unobserved by its human teammate. We consider such a setting in which the agent can, while acting, transmit declarative information to the human that helps them understand aspects of this unseen environment. Importantly, we should expect the human to have preferences about what information they are given and when they are given it. In this work, we adopt an information-theoretic view of the human's preferences: the human scores a piece of information as a function of the induced reduction in weighted entropy of their belief about the environment state. We formulate this setting as a POMDP and give a practical algorithm for solving it approximately. Then, we give an algorithm that allows the agent to sample-efficiently learn the human's preferences online. Finally, we describe an extension in which the human's preferences are time-varying. We validate our approach experimentally in two planning domains: a 2D robot mining task and a more realistic 3D robot fetching task.
READ FULL TEXT