Objective Robustness in Deep Reinforcement Learning
We study objective robustness failures, a type of out-of-distribution robustness failure in reinforcement learning (RL). Objective robustness failures occur when an RL agent retains its capabilities off-distribution yet pursues the wrong objective. We provide the first explicit empirical demonstrations of objective robustness failures and argue that this type of failure is critical to address.
READ FULL TEXT