Value Summation: A Novel Scoring Function for MPC-based Model-based Reinforcement Learning

09/16/2022
by Mehran Raisi, et al.

This paper proposes a novel scoring function for the planning module of MPC-based model-based reinforcement learning (MBRL) methods, addressing the inherent bias of scoring trajectories with the reward function. The proposed method improves the learning efficiency of existing MPC-based MBRL methods by scoring trajectories with the discounted sum of values. It uses optimal trajectories to guide policy learning and updates its state-action value function from both real-world and augmented on-board data. Learning efficiency is evaluated in selected MuJoCo Gym environments and in learning locomotion skills for a simulated model of the Cassie robot. The results show that the proposed method outperforms current state-of-the-art algorithms in learning efficiency and average return.
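
To make the contrast in the abstract concrete, here is a minimal sketch of how an MPC planner might rank candidate trajectories by a discounted sum of values rather than a discounted sum of rewards. The abstract does not state the exact formula, so the form score = sum_t gamma^t * V(s_t) is an assumption, as are the helper name `discounted_sum` and the toy numbers (which stand in for model-predicted rewards and critic outputs).

```python
import numpy as np

def discounted_sum(per_step, gamma):
    """Return sum_t gamma**t * per_step[..., t] along the last axis."""
    per_step = np.asarray(per_step, dtype=float)
    horizon = per_step.shape[-1]
    discounts = gamma ** np.arange(horizon)  # [1, gamma, gamma^2, ...]
    return per_step @ discounts

# Toy data (hypothetical): 3 candidate trajectories, planning horizon of 3.
# rewards[i, t]: model-predicted reward at step t of candidate i.
# values[i, t]:  critic estimate V(s_t) at step t of candidate i.
rewards = np.array([[1.0, 1.0, 0.0],
                    [0.0, 0.5, 0.5],
                    [0.5, 0.0, 0.0]])
values = np.array([[2.0, 1.0, 0.0],
                   [3.0, 3.0, 3.0],
                   [1.0, 1.0, 1.0]])
gamma = 0.5

# Reward-based scoring only sees what happens inside the horizon.
reward_scores = discounted_sum(rewards, gamma)
# Value-based scoring (assumed form) also reflects estimated long-term return.
value_scores = discounted_sum(values, gamma)

best_by_reward = int(np.argmax(reward_scores))
best_by_value = int(np.argmax(value_scores))
print(best_by_reward, best_by_value)
```

In this toy case the two scores select different candidates: the reward sum favors the trajectory with the largest short-horizon payoff, while the value sum favors the one whose states the critic rates highly throughout. This only illustrates the general idea; the paper's actual scoring function may differ in detail.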

research
10/23/2020

Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning

Sample efficiency has been one of the major challenges for deep reinforc...
research
05/12/2021

Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning

Model based reinforcement learning (MBRL) uses an imperfect model of the...
research
03/30/2023

Switching Pushing Skill Combined MPC and Deep Reinforcement Learning for Planar Non-prehensile Manipulation

In this paper, a novel switching pushing skill algorithm is proposed to ...
research
07/05/2021

Hybrid and dynamic policy gradient optimization for bipedal robot locomotion

Controlling a non-statically bipedal robot is challenging due to the com...
research
01/19/2021

Meta-Reinforcement Learning for Adaptive Motor Control in Changing Robot Dynamics and Environments

This work developed a meta-learning approach that adapts the control pol...
research
12/10/2020

Blending MPC Value Function Approximation for Efficient Reinforcement Learning

Model-Predictive Control (MPC) is a powerful tool for controlling comple...
research
09/30/2019

Off-policy Multi-step Q-learning

In the past few years, off-policy reinforcement learning methods have sh...
