Learning Continuous Control Policies by Stochastic Value Gradients

10/30/2015
by Nicolas Heess, et al.

We present a unified framework for learning continuous control policies using backpropagation. It supports stochastic control by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise. The product is a spectrum of general policy gradient algorithms that range from model-free methods with value functions to model-based methods without value functions. We use learned models but only require observations from the environment instead of observations from model-predicted trajectories, minimizing the impact of compounded model errors. We apply these algorithms first to a toy stochastic control problem and then to several physics-based control problems in simulation. One of these variants, SVG(1), shows the effectiveness of learning models, value functions, and policies simultaneously in continuous domains.
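
The idea of treating stochasticity as a deterministic function of exogenous noise can be illustrated with a minimal sketch. The one-dimensional, one-step problem below is a hypothetical toy chosen for illustration and is not taken from the paper: the linear policy, the additive dynamics, the quadratic reward, and all names (`pathwise_gradient`, `analytic_gradient`, `sigma_pi`, `sigma_env`) are assumptions. The dynamics are also assumed known here rather than learned. With the noise samples held fixed, the reward becomes a deterministic function of the policy parameter, so its gradient follows from the ordinary chain rule and can be averaged over samples.

```python
import random


def pathwise_gradient(theta, s, n_samples=20000, sigma_pi=0.3,
                      sigma_env=0.2, seed=0):
    """Monte Carlo estimate of d/dtheta E[r] for a one-step toy problem.

    Assumed (hypothetical) setup:
      policy:   a  = theta * s + sigma_pi * eta,  eta ~ N(0, 1)
      dynamics: s' = s + a + sigma_env * xi,      xi  ~ N(0, 1)
      reward:   r  = -(s')**2
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Exogenous noise: sampled first, then treated as a fixed input.
        eta = rng.gauss(0.0, 1.0)
        xi = rng.gauss(0.0, 1.0)
        a = theta * s + sigma_pi * eta
        s_next = s + a + sigma_env * xi
        # Chain rule with the noise held fixed:
        # dr/ds' = -2 s',  ds'/da = 1,  da/dtheta = s
        total += (-2.0 * s_next) * 1.0 * s
    return total / n_samples


def analytic_gradient(theta, s):
    # E[r] = -((1 + theta) * s)**2 - sigma_pi**2 - sigma_env**2,
    # so the noise variances drop out of the gradient.
    return -2.0 * (1.0 + theta) * s * s


estimate = pathwise_gradient(theta=-0.5, s=1.0)
exact = analytic_gradient(theta=-0.5, s=1.0)
print(estimate, exact)
```

In this toy case the sampled estimate should land close to the closed-form value. The same chain-rule structure is what backpropagation would compute automatically when the policy and model are neural networks.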

research
09/09/2019

Deterministic Value-Policy Gradients

Reinforcement learning algorithms such as the deep deterministic policy ...
research
11/30/2020

Model-based controlled learning of MDP policies with an application to lost-sales inventory control

Recent literature established that neural networks can represent good MD...
research
03/02/2016

Continuous Deep Q-Learning with Model-based Acceleration

Model-free reinforcement learning has been successfully applied to a ran...
research
08/28/2020

On the model-based stochastic value gradient for continuous reinforcement learning

Model-based reinforcement learning approaches add explicit domain knowle...
research
03/03/2023

Learning Stabilization Control from Observations by Learning Lyapunov-like Proxy Models

The deployment of Reinforcement Learning to robotics applications faces ...
research
06/28/2023

Continuous-Time q-learning for McKean-Vlasov Control Problems

This paper studies the q-learning, recently coined as the continuous-tim...
research
10/11/2022

Learning Control Policies for Region Stabilization in Stochastic Systems

We consider the problem of learning control policies in stochastic syste...
