Policy Search using Dynamic Mirror Descent MPC for Model Free Off Policy RL

10/23/2021
by   Soumya Rani Samineni, et al.

Recent works in Reinforcement Learning (RL) combine model-free (Mf)-RL algorithms with model-based (Mb)-RL approaches to get the best of both: the asymptotic performance of Mf-RL and the high sample efficiency of Mb-RL. Inspired by these works, we propose a hierarchical framework that integrates online learning for Mb trajectory optimization with off-policy methods for Mf-RL. In particular, two loops are proposed, where Dynamic Mirror Descent based Model Predictive Control (DMD-MPC) is used as the inner loop to obtain an optimal sequence of actions. These actions are in turn used to significantly accelerate the outer-loop Mf-RL. We show that our formulation is generic for a broad class of MPC-based policies and objectives, and includes some of the well-known Mb-Mf approaches. Based on this framework, we introduce two novel algorithms: one to increase the sample efficiency of off-policy RL, and one to guide end-to-end RL algorithms for online adaptation. The first, Dynamic-Mirror Descent Model Predictive RL (DeMoRL), uses the method of elite fractions for the inner loop and Soft Actor-Critic (SAC) as the off-policy RL for the outer loop. The second, the Dynamic-Mirror Descent Model Predictive Layer (DeMo Layer), is a special case of the hierarchical framework that guides linear policies trained using Augmented Random Search (ARS). Our experiments show faster convergence of the proposed DeMoRL, and better or equal performance compared to other Mf-Mb approaches on benchmark MuJoCo control tasks. The DeMo Layer was tested on the classical Cartpole task and on a custom-built quadruped trained using a linear policy.
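To make the inner loop more concrete, here is a minimal sketch of an elite-fraction update applied in a receding-horizon fashion. It is not the authors' implementation: the 1-D toy dynamics, the quadratic cost, the blending step size, and all function names and parameters (`rollout_cost`, `elite_update`, `shift`, `plan_and_act`) are assumptions made for illustration, and the outer-loop off-policy learner (SAC in the abstract) is omitted entirely.

```python
import random


def rollout_cost(state, actions, target=1.0, dt=0.1):
    """Cost of an action sequence under a hypothetical 1-D model x' = x + dt * a."""
    cost = 0.0
    for a in actions:
        state = state + dt * a
        # Quadratic distance to the target plus a small action penalty (assumed cost).
        cost += (state - target) ** 2 + 0.01 * a ** 2
    return cost


def elite_update(state, mean, n_samples=64, elite_frac=0.1,
                 sigma=0.5, step=0.5, rng=random):
    """One elite-fraction update of the mean action sequence."""
    # Sample candidate action sequences around the current mean.
    samples = [[m + rng.gauss(0.0, sigma) for m in mean]
               for _ in range(n_samples)]
    # Keep the lowest-cost fraction of the samples (the "elites").
    samples.sort(key=lambda seq: rollout_cost(state, seq))
    n_elite = max(1, int(elite_frac * n_samples))
    elites = samples[:n_elite]
    elite_mean = [sum(seq[t] for seq in elites) / n_elite
                  for t in range(len(mean))]
    # Blend the old mean with the elite mean (step size is an assumed parameter).
    return [(1.0 - step) * m + step * e for m, e in zip(mean, elite_mean)]


def shift(mean):
    """Move the plan one step forward in time, repeating the last action."""
    return mean[1:] + mean[-1:]


def plan_and_act(state, horizon=10, n_steps=30, seed=0, dt=0.1):
    """Receding-horizon loop: update the plan, apply its first action, shift."""
    rng = random.Random(seed)
    mean = [0.0] * horizon
    for _ in range(n_steps):
        mean = elite_update(state, mean, rng=rng)
        state = state + dt * mean[0]  # toy environment step with the same model
        mean = shift(mean)
    return state
```

In a full two-loop scheme of the kind the abstract describes, the actions produced by such an inner loop would be passed to the outer-loop learner; how that hand-off is done is specified in the paper, not in this sketch.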


