Curiosity Killed the Cat and the Asymptotically Optimal Agent

06/05/2020
by Michael K. Cohen, et al.

Reinforcement learners are agents that learn to pick actions that lead to high reward. Ideally, the value of a reinforcement learner's policy approaches optimality, where the optimal informed policy is the one that maximizes reward. Unfortunately, we show that if an agent is guaranteed to be "asymptotically optimal" in any (stochastically computable) environment, then subject to an assumption about the true environment, this agent will be either destroyed or incapacitated with probability 1; both of these are forms of traps as understood in the Markov Decision Process literature. Environments with traps pose a well-known problem for agents, but we are unaware of other work which shows that traps are not only a risk, but a certainty, for agents of a certain caliber. Much work in reinforcement learning uses an ergodicity assumption to avoid this problem. Often, doing theoretical research under simplifying assumptions prepares us to provide practical solutions even in the absence of those assumptions, but the ergodicity assumption in reinforcement learning may have led us entirely astray in preparing safe and effective exploration strategies for agents in dangerous environments. Rather than assuming away the problem, we present an agent with the modest guarantee of approaching the performance of a mentor, doing safe exploration instead of reckless exploration.
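As a rough intuition for why persistent exploration and traps interact badly, the following toy calculation may help. It is not the paper's construction or proof: it assumes a hypothetical environment with a single irreversible "trap" action and an agent that tries that action independently at step t with a fixed scheduled probability. The function names and both schedules are illustrative assumptions. Under a schedule whose probabilities sum to infinity (so the action keeps being tried), the chance of never entering the trap shrinks toward 0; under a summable schedule it stays bounded away from 0, but then the agent is no longer guaranteed to keep exploring.

```python
from math import prod


def survival_probability(trap_probs):
    """Probability of never taking the trap action, assuming each step's
    choice is independent (a simplifying assumption of this toy model)."""
    return prod(1.0 - p for p in trap_probs)


def harmonic_schedule(horizon):
    """Hypothetical schedule: try the trap action with probability 1/(t+1)
    at step t. These probabilities sum to infinity as the horizon grows."""
    return [1.0 / (t + 1) for t in range(1, horizon + 1)]


def summable_schedule(horizon):
    """Hypothetical schedule: try the trap action with probability 1/(t+1)^2
    at step t. These probabilities have a finite sum."""
    return [1.0 / (t + 1) ** 2 for t in range(1, horizon + 1)]


if __name__ == "__main__":
    for horizon in (10, 100, 1000, 10000):
        persistent = survival_probability(harmonic_schedule(horizon))
        fading = survival_probability(summable_schedule(horizon))
        print(f"T={horizon:>6}  persistent exploration: {persistent:.6f}  "
              f"fading exploration: {fading:.6f}")
```

In this toy model the products telescope: the harmonic schedule gives survival probability 1/(T+1) after T steps, while the summable schedule gives (T+2)/(2(T+1)), which tends to 1/2.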

