SMiRL: Surprise Minimizing RL in Dynamic Environments

12/11/2019
by Glen Berseth, et al.

All living organisms struggle against the forces of nature to carve out niches where they can maintain homeostasis. We propose that such a search for order amidst chaos might offer a unifying principle for the emergence of useful behaviors in artificial agents. We formalize this idea into an unsupervised reinforcement learning method called surprise minimizing RL (SMiRL). SMiRL trains an agent with the objective of maximizing the probability of observed states under a model trained on previously seen states. The resulting agents can acquire proactive behaviors that seek out and maintain stable conditions, such as balancing and damage avoidance, that are closely tied to an environment's prevailing sources of entropy, such as wind, earthquakes, and other agents. We demonstrate that our surprise minimizing agents can successfully play Tetris and Doom, control a humanoid to avoid falls, and navigate to escape enemy agents, all without any task-specific reward supervision. We further show that SMiRL can be used together with a standard task reward to accelerate reward-driven learning.
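The objective described above, rewarding the agent with the probability of each observed state under a model fit to previously seen states, can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the density model, so an independent Gaussian per state dimension is assumed here, and the names `RunningGaussian`, `smirl_reward`, and the `min_var` variance floor are hypothetical.

```python
import math


class RunningGaussian:
    """Diagonal Gaussian fit to every state seen so far (an assumed model choice)."""

    def __init__(self, dim, min_var=1e-2):
        self.dim = dim
        self.min_var = min_var          # hypothetical variance floor for stability
        self.n = 0                      # number of states seen
        self.total = [0.0] * dim        # running sum of states
        self.total_sq = [0.0] * dim     # running sum of squared states

    def update(self, state):
        """Add one observed state to the sufficient statistics."""
        self.n += 1
        for i, x in enumerate(state):
            self.total[i] += x
            self.total_sq[i] += x * x

    def log_prob(self, state):
        """Log-density of `state`; falls back to a unit Gaussian before any data."""
        logp = 0.0
        for i, x in enumerate(state):
            if self.n == 0:
                mean, var = 0.0, 1.0
            else:
                mean = self.total[i] / self.n
                var = max(self.total_sq[i] / self.n - mean * mean, self.min_var)
            logp += -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
        return logp


def smirl_reward(model, state):
    """Score the state under the model of *previous* states, then update the model."""
    reward = model.log_prob(state)
    model.update(state)
    return reward


if __name__ == "__main__":
    model = RunningGaussian(dim=2)
    for _ in range(5):
        smirl_reward(model, [0.0, 0.0])
    # A familiar state scores higher than a novel, "surprising" one.
    print(model.log_prob([0.0, 0.0]) > model.log_prob([1.0, 1.0]))
```

Under this sketch, an agent that keeps returning to states it has already seen collects higher reward than one that drifts into unfamiliar states, which matches the stability-seeking behavior the abstract describes. Any standard RL algorithm could, in principle, be trained on the value returned by `smirl_reward` in place of a task reward.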


