Unsupervised Real-Time Control through Variational Empowerment

10/13/2017
by Maximilian Karl, et al.

We introduce a methodology for efficiently computing a lower bound on empowerment, allowing it to be used as an unsupervised cost function for policy learning in real-time control. Empowerment, being the channel capacity between actions and states, maximises the influence of an agent on its near future. It has been shown to be a good model of biological behaviour in the absence of an extrinsic goal. However, empowerment is prohibitively hard to compute, especially in nonlinear continuous spaces. We introduce an efficient, amortised method for learning empowerment-maximising policies. We demonstrate that our algorithm can reliably handle continuous dynamical systems using system dynamics learned from raw data. The resulting policies consistently drive the agents into states where they can use their full potential.
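
To make "channel capacity between actions and states" concrete, here is a minimal sketch for a small discrete toy problem. It uses the classical Blahut-Arimoto iteration, not the variational, amortised method the abstract describes, and it assumes the transition table p(s' | a) for the current state is known exactly. The function names `empowerment` and `_kl_per_action` and the two toy states are hypothetical, chosen only for illustration.

```python
import numpy as np


def _kl_per_action(P, marginal):
    """KL(p(s'|a) || p(s')) in nats for every action a (rows of P)."""
    with np.errstate(divide="ignore", invalid="ignore"):
        # Entries with P == 0 contribute nothing to the KL sum.
        log_ratio = np.where(P > 0, np.log(P / marginal), 0.0)
    return np.sum(P * log_ratio, axis=1)


def empowerment(p_next_given_action, iters=500, tol=1e-12):
    """Channel capacity (nats) of the action -> next-state channel.

    p_next_given_action[a, s'] = p(s' | a) for one fixed current state.
    Returns (capacity, capacity-achieving action distribution).
    """
    P = np.asarray(p_next_given_action, dtype=float)
    w = np.full(P.shape[0], 1.0 / P.shape[0])  # start from uniform actions
    for _ in range(iters):
        kl = _kl_per_action(P, w @ P)
        new_w = w * np.exp(kl)  # Blahut-Arimoto reweighting step
        new_w /= new_w.sum()
        converged = np.max(np.abs(new_w - w)) < tol
        w = new_w
        if converged:
            break
    capacity = float(np.sum(w * _kl_per_action(P, w @ P)))
    return capacity, w


# Toy state A: each of two actions leads to a different next state.
state_a = [[1.0, 0.0],
           [0.0, 1.0]]
# Toy state B: both actions lead to the same next-state distribution.
state_b = [[0.5, 0.5],
           [0.5, 0.5]]

emp_a, _ = empowerment(state_a)
emp_b, _ = empowerment(state_b)
print(emp_a, emp_b)
```

In this toy setting the agent has more influence on its future in state A than in state B, so an empowerment-maximising policy would prefer A. Exact iteration of this kind needs an enumerable action and state set, which is why continuous systems call for a bound that can be estimated from samples.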

research
12/23/2019

Hamilton-Jacobi-Bellman Equations for Q-Learning in Continuous Time

In this paper, we introduce Hamilton-Jacobi-Bellman (HJB) equations for ...
research
02/22/2022

A Benchmark Comparison of Learned Control Policies for Agile Quadrotor Flight

Quadrotors are highly nonlinear dynamical systems that require carefully...
research
03/14/2019

On Applications of Bootstrap in Continuous Space Reinforcement Learning

In decision making problems for continuous state and action spaces, line...
research
08/03/2018

Structured Neural Network Dynamics for Model-based Control

We present a structured neural network architecture that is inspired by ...
research
02/26/2020

SACBP: Belief Space Planning for Continuous-Time Dynamical Systems via Stochastic Sequential Action Control

We propose a novel belief space planning technique for continuous dynami...
research
01/22/2020

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can ...
research
11/07/2022

C3PO: Learning to Achieve Arbitrary Goals via Massively Entropic Pretraining

Given a particular embodiment, we propose a novel method (C3PO) that lea...
