Deep Value Model Predictive Control

10/08/2019
by Farbod Farshidian et al.

In this paper, we introduce an actor-critic algorithm called Deep Value Model Predictive Control (DMPC), which combines model-based trajectory optimization with value function estimation. The DMPC actor is a Model Predictive Control (MPC) optimizer whose objective function is defined in terms of a value function estimated by the critic. We show that our MPC actor is an importance sampler that minimizes an upper bound of the cross-entropy to the state distribution of the optimal sampling policy. In our experiments with a Ballbot system, we show that our algorithm can work with sparse and binary reward signals to efficiently solve obstacle avoidance and target reaching tasks. Compared to previous work, we show that including the value function in the running cost of the trajectory optimizer speeds up convergence. We also discuss the strategies necessary to robustify the algorithm in practice.
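For intuition, the actor-critic structure described above could be sketched roughly as follows. This is a minimal illustration only: the random-shooting optimizer stands in for the paper's MPC trajectory optimizer, and the names ValueNet, mpc_actor, dynamics, and the network sizes are assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch of a value-augmented MPC actor with a learned critic.
# The dynamics model and all names here are hypothetical placeholders.
import numpy as np
import torch
import torch.nn as nn


class ValueNet(nn.Module):
    """Critic: approximates the value function V(s)."""

    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, 1),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s).squeeze(-1)


def mpc_actor(state, value_net, dynamics, action_dim, horizon=20, n_samples=256):
    """Actor: a simple random-shooting MPC whose rollout cost is augmented
    with the critic's value estimate (a stand-in for a full trajectory optimizer)."""
    best_cost, best_first_action = np.inf, None
    for _ in range(n_samples):
        actions = np.random.uniform(-1.0, 1.0, size=(horizon, action_dim))
        s, cost = np.asarray(state, dtype=np.float32), 0.0
        for a in actions:
            s, running_cost = dynamics(s, a)  # model-based rollout step
            with torch.no_grad():
                v = value_net(torch.as_tensor(s, dtype=torch.float32)).item()
            cost += running_cost + v  # value term included in the running cost
        if cost < best_cost:
            best_cost, best_first_action = cost, actions[0]
    return best_first_action  # apply the first action, then re-plan (receding horizon)
```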

Related research

Variational Actor-Critic Algorithms (08/03/2021)
We introduce a class of variational actor-critic algorithms based on a v...

Soft Actor-Critic with Cross-Entropy Policy Optimization (12/21/2021)
Soft Actor-Critic (SAC) is one of the state-of-the-art off-policy reinfo...

Distributional Advantage Actor-Critic (06/10/2018)
In traditional reinforcement learning, an agent maximizes the reward col...

Safe Learning-based Gradient-free Model Predictive Control Based on Cross-entropy Method (02/24/2021)
In this paper, a safe and learning-based control framework for model pre...

Model-Augmented Actor-Critic: Backpropagating through Paths (05/16/2020)
Current model-based reinforcement learning approaches use the model simp...

On Constructing the Value Function for Optimal Trajectory Problem and its Application to Image Processing (03/20/2013)
We proposed an algorithm for solving Hamilton-Jacobi equation associated...

Learning State Representations from Random Deep Action-conditional Predictions (02/09/2021)
In this work, we study auxiliary prediction tasks defined by temporal-di...