Bootstrap Advantage Estimation for Policy Optimization in Reinforcement Learning

10/13/2022
by Md Masudur Rahman, et al.

This paper proposes an advantage estimation approach based on data augmentation for policy optimization. Whereas existing methods apply data augmentation to the inputs used to learn the value and policy functions, our method uses data augmentation to compute a bootstrap advantage estimate. This Bootstrap Advantage Estimation (BAE) is then used to compute the gradient updates for the policy and value functions. To demonstrate the effectiveness of our approach, we conducted experiments on several environments drawn from three benchmarks: Procgen, DeepMind Control, and PyBullet. Together they cover both image-based and vector-based observations, and both discrete and continuous action spaces. We observe that our method reduces the policy and value losses more than the Generalized Advantage Estimation (GAE) method and eventually improves cumulative return. Furthermore, our method performs better than two recently proposed data augmentation techniques (RAD and DRAC). Overall, our method empirically outperforms the baselines in both sample efficiency and generalization, where the agent is tested in unseen environments.
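
The abstract does not state the BAE formula, so the following is only a minimal sketch of the general idea, not the paper's method. It shows standard GAE for one trajectory, followed by one possible way to fold augmentation into the advantage: averaging GAE over value estimates obtained from several augmented copies of the observations. The function `augmented_advantage`, its averaging rule, and the input `values_per_aug` are assumptions made for illustration.

```python
import numpy as np


def gae(rewards, values, gamma=0.99, lam=0.95):
    """Standard Generalized Advantage Estimation for a single trajectory.

    rewards: shape (T,)
    values:  shape (T + 1,), the last entry is the bootstrap value of the
             state reached after the final step.
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    T = len(rewards)
    adv = np.zeros(T)
    running = 0.0
    # Accumulate discounted TD residuals backwards in time.
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        adv[t] = running
    return adv


def augmented_advantage(rewards, values_per_aug, gamma=0.99, lam=0.95):
    """Hypothetical illustration (NOT the paper's exact BAE formula).

    values_per_aug: shape (m, T + 1), value estimates computed on m
    differently augmented copies of the same observations.
    Returns the mean of the m GAE estimates, shape (T,).
    """
    per_aug = np.stack([gae(rewards, v, gamma, lam) for v in values_per_aug])
    return per_aug.mean(axis=0)


if __name__ == "__main__":
    rewards = [1.0, 1.0]
    # Two assumed sets of value estimates, e.g. from two augmentations.
    values_per_aug = [[1.0, 1.0, 1.0], [2.0, 0.0, 0.0]]
    print(augmented_advantage(rewards, values_per_aug, gamma=1.0, lam=1.0))
```

In this sketch the augmentation affects only the advantage targets, which matches the abstract's contrast with methods that augment the network inputs directly; how the paper actually combines the augmented estimates should be taken from the full text.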

research
06/29/2021

Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation

The generalization gap in reinforcement learning (RL) has been a signifi...
research
02/21/2022

Don't Touch What Matters: Task-Aware Lipschitz Data Augmentation for Visual Reinforcement Learning

One of the key challenges in visual Reinforcement Learning (RL) is to le...
research
12/14/2022

Robust Policy Optimization in Deep Reinforcement Learning

The policy gradient method enjoys the simplicity of the objective where ...
research
02/17/2021

Time Matters in Using Data Augmentation for Vision-based Deep Reinforcement Learning

Data augmentation technique from computer vision has been widely conside...
research
04/26/2023

CROP: Towards Distributional-Shift Robust Reinforcement Learning using Compact Reshaped Observation Processing

The safe application of reinforcement learning (RL) requires generalizat...
research
08/25/2020

Ensuring Monotonic Policy Improvement in Entropy-regularized Value-based Reinforcement Learning

This paper aims to establish an entropy-regularized value-based reinforc...
research
06/03/2022

Estimation of Over-parameterized Models via Fitting to Future Observations

From a model-building perspective, in this paper we propose a paradigm s...
