PAnDR: Fast Adaptation to New Environments from Offline Experiences via Decoupling Policy and Environment Representations

04/06/2022
by Tong Sang, et al.

Deep Reinforcement Learning (DRL) has been a promising solution to many complex decision-making problems. Nevertheless, its notoriously weak generalization across environments prevents widespread application of DRL agents in real-world scenarios. Although advances have been made recently, most prior works assume sufficient online interaction with training environments, which can be costly in practice. To this end, we focus on an offline-training-online-adaptation setting, in which the agent first learns from offline experiences collected in environments with different dynamics and then performs online policy adaptation in environments with new dynamics. In this paper, we propose Policy Adaptation with Decoupled Representations (PAnDR) for fast policy adaptation. In the offline training phase, the environment representation and the policy representation are learned through contrastive learning and policy recovery, respectively. The representations are further refined by mutual information optimization to make them more decoupled and complete. With the learned representations, a Policy-Dynamics Value Function (PDVF) (Raileanu et al., 2020) network is trained to approximate the values for different combinations of policies and environments. In the online adaptation phase, with the environment context inferred from a few experiences collected in new environments, the policy is optimized by gradient ascent with respect to the PDVF. Our experiments show that PAnDR outperforms existing algorithms in several representative policy adaptation problems.
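The online adaptation step described in the abstract (gradient ascent on the policy representation with respect to the PDVF, given an inferred environment context) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: `toy_pdvf` is a hypothetical quadratic stand-in for the trained PDVF network, the gradient is computed by finite differences rather than by backpropagation, and the names `adapt_policy_embedding`, `lr`, and `steps` are invented here.

```python
from typing import Callable, List

Vector = List[float]


def toy_pdvf(z_policy: Vector, z_env: Vector) -> float:
    """Hypothetical stand-in for a learned PDVF.

    Assumption: the value is highest when the policy embedding matches
    the environment embedding. A real PDVF would be a trained network.
    """
    return -sum((p - e) ** 2 for p, e in zip(z_policy, z_env))


def numerical_grad(f: Callable[[Vector], float], x: Vector, eps: float = 1e-5) -> Vector:
    """Central-difference gradient of f at x (autodiff would replace this)."""
    grad = []
    for i in range(len(x)):
        hi = x[:i] + [x[i] + eps] + x[i + 1:]
        lo = x[:i] + [x[i] - eps] + x[i + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad


def adapt_policy_embedding(
    z_policy: Vector,
    z_env: Vector,
    value_fn: Callable[[Vector, Vector], float] = toy_pdvf,
    lr: float = 0.1,
    steps: int = 100,
) -> Vector:
    """Gradient ascent on the policy embedding with the environment embedding fixed."""
    z = list(z_policy)
    for _ in range(steps):
        g = numerical_grad(lambda zp: value_fn(zp, z_env), z)
        # Ascent step: move the embedding in the direction that raises the value.
        z = [zi + lr * gi for zi, gi in zip(z, g)]
    return z


if __name__ == "__main__":
    z_env = [1.0, -2.0]   # pretend this was inferred from a few online transitions
    z_init = [0.0, 0.0]   # pretend this is the embedding of an offline policy
    z_adapted = adapt_policy_embedding(z_init, z_env)
    print("value before:", toy_pdvf(z_init, z_env))
    print("value after: ", toy_pdvf(z_adapted, z_env))
```

In this toy setting only the policy embedding is updated while the environment embedding stays fixed, which mirrors the division of roles the abstract describes. How the adapted representation is turned back into an executable policy is outside the scope of the sketch.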


research
07/06/2020

Fast Adaptation via Policy-Dynamics Value Functions

Standard RL algorithms assume fixed environment dynamics and require a s...
research
07/08/2020

Self-Supervised Policy Adaptation during Deployment

In most real world scenarios, a policy trained by reinforcement learning...
research
04/12/2022

Offline Distillation for Robot Lifelong Learning with Imbalanced Experience

Robots will experience non-stationary environment dynamics throughout th...
research
02/04/2023

Locally Constrained Policy Optimization for Online Reinforcement Learning in Non-Stationary Input-Driven Environments

We study online Reinforcement Learning (RL) in non-stationary input-driv...
research
02/01/2022

Generalizing to New Physical Systems via Context-Informed Dynamics Model

Data-driven approaches to modeling physical systems fail to generalize t...
research
05/29/2023

Experience Filter: Using Past Experiences on Unseen Tasks or Environments

One of the bottlenecks of training autonomous vehicle (AV) agents is the...
research
09/06/2022

Cross apprenticeship learning framework: Properties and solution approaches

Apprenticeship learning is a framework in which an agent learns a policy...
