Robust Policy Gradient against Strong Data Corruption

02/11/2021
by Xuezhou Zhang, et al.

We study the problem of robust reinforcement learning under adversarial corruption of both rewards and transitions. Our attack model assumes an adaptive adversary who can arbitrarily corrupt the reward and transition at every step within an episode, for at most an ϵ-fraction of the learning episodes. This attack model is strictly stronger than those considered in prior work. Our first result shows that no algorithm can find a better than O(ϵ)-optimal policy under our attack model. Next, we show that, surprisingly, the natural policy gradient (NPG) method retains a natural robustness property if the reward corruption is bounded, and can find an O(√(ϵ))-optimal policy. Consequently, we develop a Filtered Policy Gradient (FPG) algorithm that can tolerate even unbounded reward corruption and can find an O(ϵ^(1/4))-optimal policy. We emphasize that FPG is the first algorithm that can achieve a meaningful learning guarantee when a constant fraction of episodes are corrupted. Complementary to the theoretical results, we show that a neural implementation of FPG achieves strong robust learning performance on the MuJoCo continuous control benchmarks.
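To make the attack model concrete, the following is a minimal toy sketch of the episode-level corruption budget described above. It is not the paper's algorithm or experimental setup: the function name `run_episodes`, the coin-flip adversary, the uniform toy rewards, and the replacement value of -100 are all assumptions made for illustration, and only reward corruption is simulated (transitions are left out).

```python
import random


def run_episodes(num_episodes, horizon, epsilon, seed=0):
    """Toy simulation of an adversary limited to an epsilon-fraction of episodes.

    Returns (corrupted, returns): a list of booleans marking attacked episodes
    and the list of per-episode returns observed by the learner.
    """
    rng = random.Random(seed)
    # The adversary may corrupt at most this many episodes in total.
    budget = int(epsilon * num_episodes)
    corrupted, returns = [], []
    for _ in range(num_episodes):
        # Hypothetical adversary: attacks with probability 0.5 while budget remains.
        # A real adaptive adversary could choose based on the learner's history.
        attack = budget > 0 and rng.random() < 0.5
        if attack:
            budget -= 1
        episode_return = 0.0
        for _ in range(horizon):
            clean_reward = rng.random()  # toy clean reward in [0, 1)
            # Inside an attacked episode, every step's reward can be replaced
            # arbitrarily; -100.0 is just an illustrative choice.
            episode_return += -100.0 if attack else clean_reward
        corrupted.append(attack)
        returns.append(episode_return)
    return corrupted, returns


if __name__ == "__main__":
    flags, rets = run_episodes(num_episodes=200, horizon=5, epsilon=0.1)
    print("corrupted episodes:", sum(flags), "of", len(flags))
```

The point of the sketch is the budget: once it is spent, every remaining episode is clean, yet the corrupted returns can be arbitrarily far from the clean ones, which is why unbounded corruption is harder to tolerate than bounded corruption.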

