Offline Distillation for Robot Lifelong Learning with Imbalanced Experience

by Wenxuan Zhou, et al.

Robots will experience non-stationary environment dynamics throughout their lifetimes: a robot's dynamics can change due to wear and tear, or its surroundings may change over time. Ideally, the robot should eventually perform well in all of the environment variations it has encountered, while still being able to learn quickly in a new environment. We investigate two challenges in this lifelong learning setting. First, existing off-policy algorithms struggle with the trade-off between being conservative enough to maintain performance in the old environment and learning efficiently in the new one. We propose the Offline Distillation Pipeline to break this trade-off by separating training into interleaved phases of online interaction and offline distillation. Second, training on the combined datasets from multiple environments across the lifetime can cause a significant performance drop compared to training on each dataset individually. Our hypothesis is that both the imbalanced quality and the imbalanced size of the datasets exacerbate the extrapolation error of the Q-function during offline training on the "weaker" dataset. We propose a simple fix: keeping the policy closer to the dataset during the distillation phase. In the experiments, we demonstrate these challenges and the proposed solutions on a simulated bipedal robot walking task across various environment changes. We show that the Offline Distillation Pipeline achieves better performance across all encountered environments without affecting data collection, and we provide a comprehensive empirical study supporting our hypothesis on the data imbalance issue.
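The abstract's "simple fix" of keeping the policy closer to the dataset during distillation can be illustrated with a behavior-cloning-regularized objective. The sketch below is an assumption on my part (a TD3+BC-style loss standing in for the paper's regularizer; the function name `distillation_loss` and the weighting scheme are hypothetical, not taken from the paper):

```python
import numpy as np

def distillation_loss(policy_actions, q_values, dataset_actions, alpha=2.5):
    """BC-regularized distillation objective (illustrative sketch).

    Minimizing this loss maximizes the Q-value of the policy's actions
    while penalizing deviation from the actions in the offline dataset,
    which keeps the distilled policy close to the data and limits
    Q-function extrapolation error on the weaker dataset.
    """
    # Behavior-cloning penalty: mean squared deviation from dataset actions.
    bc_penalty = np.mean((policy_actions - dataset_actions) ** 2)
    # Scale the Q term so the two losses are on comparable magnitudes
    # (the normalization trick used by TD3+BC).
    lam = alpha / (np.mean(np.abs(q_values)) + 1e-8)
    return -lam * np.mean(q_values) + bc_penalty
```

With identical policy and dataset actions the penalty vanishes and the loss reduces to the (scaled) negative Q term; as the policy drifts from the data, the penalty grows and pulls it back.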
