Seeking entropy: complex behavior from intrinsic motivation to occupy action-state path space

05/20/2022
by Jorge Ramírez-Ruiz, et al.

Intrinsic motivation generates behaviors that do not necessarily lead to immediate reward, but help exploration and learning. Here we show that agents whose sole goal is to maximize occupancy of future actions and states, that is, to keep moving and exploring in the long term, are capable of complex behavior without any reference to external rewards. We find that action-state path entropy is the only measure consistent with additivity and other intuitive properties of expected future action-state path occupancy. We provide analytical expressions that relate the optimal policy to the optimal state-value function, from which we prove uniqueness of the solution of the associated Bellman equation and convergence of our algorithm to the optimal state-value function. Using discrete and continuous state tasks, we show that 'dancing', hide-and-seek, and a basic form of altruistic behavior naturally result from entropy seeking without external rewards. Intrinsically motivated agents can objectively determine what states constitute rewards, exploiting them to ultimately maximize action-state path entropy.
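The abstract does not spell out the analytical expressions, so the following is only a minimal sketch of what entropy-seeking value iteration could look like in a small tabular setting. It assumes a per-step objective of `alpha` times the action entropy plus `beta` times the next-state entropy, discounted by `gamma`; under that assumption, maximizing over the policy gives a softmax policy and a log-sum-exp Bellman backup. The function name, the `available` mask, and the toy environment are all hypothetical and are not taken from the paper.

```python
import numpy as np

def entropy_seeking_value_iteration(P, available, alpha=1.0, beta=0.0,
                                    gamma=0.9, tol=1e-10, max_iter=10_000):
    """Sketch of value iteration for an entropy-only objective (no rewards).

    P[s, a, s2]     : transition probabilities, shape (S, A, S)
    available[s, a] : boolean mask of actions allowed in each state
                      (every state is assumed to have at least one)
    Returns the state values V (shape S) and the policy pi (shape S x A).
    """
    # Entropy of the next-state distribution for each (s, a), with 0*log(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_P = np.where(P > 0, np.log(P), 0.0)
    next_state_entropy = -(P * log_P).sum(axis=2)

    def masked_logits(V):
        logits = (beta * next_state_entropy + gamma * (P @ V)) / alpha
        return np.where(available, logits, -np.inf)

    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        logits = masked_logits(V)
        m = logits.max(axis=1, keepdims=True)
        # Assumed backup: V(s) = alpha * log sum_a exp(logits[s, a]).
        V_new = alpha * (m[:, 0] + np.log(np.exp(logits - m).sum(axis=1)))
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break

    # Softmax policy implied by the converged values.
    logits = masked_logits(V)
    pi = np.exp(logits - logits.max(axis=1, keepdims=True))
    pi /= pi.sum(axis=1, keepdims=True)
    return V, pi

# Toy example: state 0 can stay (action 0) or move to state 1 (action 1);
# state 1 is absorbing and offers a single action, so it carries no entropy.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, 0, 1] = 1.0
P[1, 1, 1] = 1.0  # masked out below; filled only to keep the row valid
available = np.array([[True, True], [True, False]])

V, pi = entropy_seeking_value_iteration(P, available)
print("V =", V)
print("pi =", pi)
```

In this toy setting the absorbing state has zero value because no further choices are possible there, and the agent in state 0 prefers to stay where its future options remain open, which is the qualitative behavior the abstract describes as seeking occupancy of action-state path space.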


