Can Agents Run Relay Race with Strangers? Generalization of RL to Out-of-Distribution Trajectories

04/26/2023
by Li-Cheng Lan, et al.

In this paper, we define, evaluate, and improve the "relay-generalization" performance of reinforcement learning (RL) agents on out-of-distribution "controllable" states. Ideally, an RL agent that has generally mastered a task should reach its goal starting from any controllable state of the environment, rather than memorizing a small set of trajectories. For example, a self-driving system should be able to take over control from a human in the middle of a drive and continue to drive the car safely. To evaluate this type of generalization in practice, we start the test agent from the middle of trajectories generated by other, independently well-trained "stranger" agents. With extensive experimental evaluation, we show that generalization failures on controllable states from stranger agents are prevalent. For example, in the Humanoid environment, a well-trained Proximal Policy Optimization (PPO) agent with only a 3.9% failure rate under regular testing failed on 81.6% of the states generated by well-trained stranger PPO agents. To improve relay generalization, we propose a novel method called Self-Trajectory Augmentation (STA), which resets the environment during training to the agent's own old states, selected according to the Q function. After applying STA to the Soft Actor-Critic (SAC) training procedure, we reduced SAC's failure rate under relay evaluation by more than a factor of three in most settings, without degrading agent performance or increasing the number of environment interactions required. Our code is available at https://github.com/lan-lc/STA.
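To make the evaluation protocol concrete, below is a minimal Python sketch of relay evaluation as described in the abstract. It assumes a Gymnasium-style environment and a hypothetical env.set_state(sim_state) helper that restores the simulator's internal state (e.g., qpos/qvel in MuJoCo); neither that function name nor the choice of hand-off point is specified by the paper itself.

```python
def relay_failure_rate(test_policy, stranger_trajs, env, max_steps=1000):
    """Relay evaluation sketch: drop the test agent into the middle of
    trajectories produced by independently trained "stranger" agents and
    count how often it subsequently fails (e.g., the Humanoid falls)."""
    failures = 0
    for traj in stranger_trajs:
        # traj is a list of (sim_state, observation) pairs recorded while
        # the stranger agent acted; hand over control at the midpoint.
        sim_state, obs = traj[len(traj) // 2]
        env.reset()
        env.set_state(sim_state)  # hypothetical restore hook (env-specific)
        for _ in range(max_steps):
            obs, reward, terminated, truncated, _ = env.step(test_policy(obs))
            if terminated:        # early termination counts as a relay failure
                failures += 1
                break
            if truncated:         # time limit reached: the relay succeeded
                break
    return failures / len(stranger_trajs)
```

The STA reset itself could look like the following sketch. The abstract only says that old states are chosen "according to the Q function", so the lowest-Q-among-k selection rule below is an illustrative assumption, not the paper's exact criterion.

```python
import random

def sta_reset(env, visited_states, q_fn, policy, k=32):
    """Reset the training environment to one of the agent's own old states.
    visited_states holds (sim_state, observation) pairs from past episodes."""
    candidates = random.sample(visited_states, min(k, len(visited_states)))
    # Assumed selection rule: pick the candidate where Q(s, pi(s)) is lowest,
    # i.e., a state the current policy is expected to handle worst.
    q_values = [q_fn(obs, policy(obs)) for _, obs in candidates]
    sim_state, obs = candidates[q_values.index(min(q_values))]
    env.reset()
    env.set_state(sim_state)  # same hypothetical restore hook as above
    return obs
```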

Related research

08/05/2019
DoorGym: A Scalable Door Opening Environment And Baseline Agent
Reinforcement Learning (RL) has brought forth ideas of autonomous robots...

10/29/2022
DeFIX: Detecting and Fixing Failure Scenarios with Reinforcement Learning in Imitation Learning Based Autonomous Driving
Safely navigating through an urban environment without violating any tra...

06/08/2020
Hallucinating Value: A Pitfall of Dyna-style Planning with Imperfect Environment Models
Dyna-style reinforcement learning (RL) agents improve sample efficiency ...

07/15/2022
Bootstrap State Representation using Style Transfer for Better Generalization in Deep Reinforcement Learning
Deep Reinforcement Learning (RL) agents often overfit the training envir...

05/04/2023
Simple Noisy Environment Augmentation for Reinforcement Learning
Data augmentation is a widely used technique for improving model perform...

07/26/2021
Playtesting: What is Beyond Personas
Playtesting is an essential step in the game design process. Game design...

06/04/2023
Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL
Reinforcement learning agents may sometimes develop habits that are effe...
