From Pixels to Torques: Policy Learning with Deep Dynamical Models

02/08/2015
by Niklas Wahlström, et al.

Data-efficient learning in continuous state-action spaces using very high-dimensional observations remains a key challenge in developing fully autonomous systems. In this paper, we consider one instance of this challenge, the pixels to torques problem, where an agent must learn a closed-loop control policy from pixel information only. We introduce a data-efficient, model-based reinforcement learning algorithm that learns such a closed-loop policy directly from pixel information. The key ingredient is a deep dynamical model that uses deep auto-encoders to learn a low-dimensional embedding of images jointly with a predictive model in this low-dimensional feature space. Joint learning ensures that not only static but also dynamic properties of the data are accounted for. This is crucial for long-term predictions, which lie at the core of the adaptive model predictive control strategy that we use for closed-loop control. Compared to state-of-the-art reinforcement learning methods for continuous states and actions, our approach learns quickly, scales to high-dimensional state spaces and is an important step toward fully autonomous learning from pixels to torques.
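
The joint objective described above can be illustrated with a small sketch. It is not the authors' implementation: the layer sizes, the single-layer tanh encoder, the linear decoder, the one-step latent transition, and all names (`W_enc`, `W_dec`, `A`, `B`, `joint_loss`, `rollout`) are assumptions chosen for brevity, and the data is random. The point is only that one loss combines a reconstruction term (static properties) with a prediction term (dynamic properties), so both shape the same low-dimensional embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: flattened image, latent feature, and control dimensions.
PIXELS, LATENT, ACTION = 64, 2, 1


def init_params(scale=0.1):
    """Randomly initialise encoder, decoder, and latent transition weights."""
    return {
        "W_enc": scale * rng.standard_normal((LATENT, PIXELS)),
        "W_dec": scale * rng.standard_normal((PIXELS, LATENT)),
        "A": scale * rng.standard_normal((LATENT, LATENT)),
        "B": scale * rng.standard_normal((LATENT, ACTION)),
    }


def encode(p, frames):
    """Map images (T, PIXELS) to low-dimensional features (T, LATENT)."""
    return np.tanh(frames @ p["W_enc"].T)


def decode(p, z):
    """Map features (T, LATENT) back to images (T, PIXELS)."""
    return z @ p["W_dec"].T


def predict_latent(p, z, u):
    """One-step prediction in feature space from features and controls."""
    return np.tanh(z @ p["A"].T + u @ p["B"].T)


def joint_loss(p, frames, actions):
    """Reconstruction error plus one-step prediction error, both in pixel space."""
    z = encode(p, frames)
    recon_err = np.mean((decode(p, z) - frames) ** 2)
    # Predict the features of frame t+1 from frame t and action t, then decode.
    z_next = predict_latent(p, z[:-1], actions[:-1])
    pred_err = np.mean((decode(p, z_next) - frames[1:]) ** 2)
    return recon_err + pred_err, recon_err, pred_err


def rollout(p, frame0, actions):
    """Long-term prediction: encode once, then iterate the latent model."""
    z = encode(p, frame0[None, :])
    predicted = []
    for u in actions:
        z = predict_latent(p, z, u[None, :])
        predicted.append(decode(p, z)[0])
    return np.stack(predicted)


params = init_params()
frames = rng.random((10, PIXELS))             # stand-in for an image sequence
actions = rng.standard_normal((10, ACTION))   # stand-in for applied torques

total, recon_err, pred_err = joint_loss(params, frames, actions)
future = rollout(params, frames[0], actions[:5])
```

In a model predictive control loop, a function like `rollout` would be evaluated for candidate action sequences and the first action of the best sequence applied, with the model refit as new images arrive. Minimising `total` with respect to all four weight matrices at once is what distinguishes joint learning from training the auto-encoder first and fitting the dynamics afterwards.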


research
10/08/2015

Data-Efficient Learning of Feedback Policies from Image Pixels using Deep Dynamical Models

Data-efficient reinforcement learning (RL) in continuous state-action sp...
research
10/28/2014

Learning deep dynamical models from image pixels

Modeling dynamical systems is important in many disciplines, e.g., contr...
research
01/09/2020

Closed-loop deep learning: generating forward models with back-propagation

A reflex is a simple closed loop control approach which tries to minimis...
research
04/07/2021

GEM: Group Enhanced Model for Learning Dynamical Control Systems

Learning the dynamics of a physical system wherein an autonomous agent o...
research
01/12/2019

Learning Accurate Extended-Horizon Predictions of High Dimensional Trajectories

We present a novel predictive model architecture based on the principles...
research
04/11/2016

A statistical learning strategy for closed-loop control of fluid flows

This work discusses a closed-loop control strategy for complex systems u...
research
02/12/2022

Learning by Doing: Controlling a Dynamical System using Causality, Control, and Reinforcement Learning

Questions in causality, control, and reinforcement learning go beyond th...
