'Indifference' methods for managing agent rewards

12/18/2017
by Stuart Armstrong, et al.

Indifference is a class of methods used to control a reward-based agent by, for example, safely changing its reward or policy, or making the agent behave as if a certain outcome could never happen. These methods of control work even if the implications of the agent's reward are otherwise not fully understood. Though they all come out of similar ideas, indifference techniques can be classified as ways of achieving one or more of three distinct goals: rewards dependent on certain events (with no motivation for the agent to manipulate the probability of those events), effective disbelief that an event will ever occur, and seamless transition from one behaviour to another. There are five basic methods to achieve these three goals. This paper classifies and analyses these methods on POMDPs (though the methods are highly portable to other agent designs), and establishes their uses, strengths, and limitations. It aims to make the tools of indifference generally accessible and usable to agent designers.
