Bridging the gap between QP-based and MPC-based RL

05/18/2022
by Shambhuraj Sawant et al.

Reinforcement learning (RL) methods typically use deep neural networks (DNNs) to approximate the value functions and policies underlying a Markov decision process (MDP). Unfortunately, DNN-based RL suffers from a lack of explainability of the resulting policy. In this paper, we instead approximate the policy and value functions using optimization problems in the form of quadratic programs (QPs). We propose simple tools to promote structure in the QP, pushing it to resemble a linear model predictive control (MPC) scheme. A generic, unstructured QP offers high flexibility for learning, while a QP with the structure of an MPC scheme promotes the explainability of the resulting policy and additionally provides tools for its analysis. The tools we propose allow the trade-off between the former and the latter to be adjusted continuously during learning. We illustrate the workings of the proposed method, and the structure it produces, on a point-mass task.
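The abstract does not spell out the authors' QP formulation or their structure-promoting tools. As a hedged illustration only, the sketch below shows what "a QP with the structure of a linear MPC scheme" can look like for a point mass. It assumes a double-integrator model with sampling time `dt = 0.1`, stage weights `Q` and `R`, horizon `N`, and no state or input constraints, so the QP reduces to a single linear solve. All function names (`point_mass_model`, `condensed_mpc_qp`, `qp_policy`) are hypothetical. The policy is the first action of the QP solution, as in receding-horizon MPC.

```python
import numpy as np


def point_mass_model(dt=0.1):
    """Hypothetical point-mass (double-integrator) model; state s = [position, velocity]."""
    A = np.array([[1.0, dt],
                  [0.0, 1.0]])
    B = np.array([[0.5 * dt**2],
                  [dt]])
    return A, B


def condensed_mpc_qp(A, B, Q, R, N):
    """Condense an unconstrained N-step linear MPC into QP data (H, F).

    Cost: sum_{k=1..N} x_k^T Q x_k + sum_{k=0..N-1} u_k^T R u_k, with x_0 = s.
    Substituting x = Phi s + Gamma u gives 0.5 u^T H u + (F s)^T u + const.
    """
    n, m = B.shape
    Phi = np.zeros((N * n, n))        # maps the initial state to predicted states
    Gamma = np.zeros((N * n, N * m))  # maps stacked inputs to predicted states
    for k in range(1, N + 1):
        rows = slice((k - 1) * n, k * n)
        Phi[rows, :] = np.linalg.matrix_power(A, k)
        for j in range(k):
            Gamma[rows, j * m:(j + 1) * m] = np.linalg.matrix_power(A, k - 1 - j) @ B
    Qbar = np.kron(np.eye(N), Q)
    Rbar = np.kron(np.eye(N), R)
    H = 2.0 * (Gamma.T @ Qbar @ Gamma + Rbar)  # Hessian; positive definite when R > 0
    F = 2.0 * Gamma.T @ Qbar @ Phi             # couples the state into the linear term
    return H, F


def qp_policy(H, F, s, m, f=None):
    """Return the first m entries of argmin_u 0.5 u^T H u + (F s + f)^T u (unconstrained)."""
    g = F @ s if f is None else F @ s + f
    u = np.linalg.solve(H, -g)  # stationarity condition: H u + g = 0
    return u[:m]


if __name__ == "__main__":
    A, B = point_mass_model(dt=0.1)
    H, F = condensed_mpc_qp(A, B, Q=np.eye(2), R=0.1 * np.eye(1), N=10)
    # Mass displaced to position 1 at rest: the action pushes it back toward the origin.
    print(qp_policy(H, F, np.array([1.0, 0.0]), m=1))
```

In a generic, unstructured QP policy, `H`, `F` and `f` would be free learnable parameters (with `H` kept positive definite); here they are fully determined by the model and weights `(A, B, Q, R)`, which is what lets the learned policy be read as an MPC scheme. A practical MPC scheme would also include constraints, in which case a proper QP solver replaces the single linear solve.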


research
02/02/2021

Stability-Constrained Markov Decision Processes Using MPC

In this paper, we consider solving discounted Markov Decision Processes ...
research
01/04/2023

Learning-based MPC from Big Data Using Reinforcement Learning

This paper presents an approach for learning Model Predictive Control (M...
research
09/15/2021

Infusing model predictive control into meta-reinforcement learning for mobile robots in dynamic environments

The successful operation of mobile robots requires them to rapidly adapt...
research
12/14/2020

Safe Reinforcement Learning with Stability Safety Guarantees Using Robust MPC

Reinforcement Learning offers tools to optimize policies based on the da...
research
12/07/2021

Tailored neural networks for learning optimal value functions in MPC

Learning-based predictive control is a promising alternative to optimiza...
research
06/09/2020

In Proximity of ReLU DNN, PWA Function, and Explicit MPC

Rectifier (ReLU) deep neural networks (DNN) and their connection with pi...
research
02/19/2019

Hyperbolic Discounting and Learning over Multiple Horizons

Reinforcement learning (RL) typically defines a discount factor as part ...
