Goal-Conditioned Q-Learning as Knowledge Distillation

08/28/2022
by Alexander Levine, et al.
Many applications of reinforcement learning can be formalized as goal-conditioned environments, where, in each episode, there is a "goal" that affects the rewards obtained during that episode but does not affect the dynamics. Various techniques have been proposed to improve performance in goal-conditioned environments, such as automatic curriculum generation and goal relabeling. In this work, we explore a connection between off-policy reinforcement learning in goal-conditioned settings and knowledge distillation. In particular, the current Q-value function and the target Q-value estimate are both functions of the goal, and we would like to train the Q-value function to match its target for all goals. We therefore apply Gradient-Based Attention Transfer (Zagoruyko and Komodakis 2017), a knowledge distillation technique, to the Q-function update. We show empirically that this can improve the performance of goal-conditioned off-policy reinforcement learning when the space of goals is high-dimensional. We also show that this technique can be adapted to allow efficient learning in the case of multiple simultaneous sparse goals, where the agent can attain a reward by achieving any one of a large set of objectives, all specified at test time. Finally, to provide theoretical support, we give examples of classes of environments where (under some assumptions) standard off-policy algorithms require at least O(d^2) observed transitions to learn an optimal policy, while our proposed technique requires only O(d) transitions, where d is the dimensionality of the goal and state space.
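The idea of training the Q-function to match its target "for all goals" can be illustrated with a loss that penalizes both the value mismatch at the sampled goal and the mismatch between the two functions' gradients with respect to the goal. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the names `grad_wrt_goal`, `gradient_matching_loss`, and the weight `alpha` are hypothetical, the Q-function and target are passed in as plain functions of the goal (state and action held fixed), and gradients are taken by central finite differences rather than automatic differentiation.

```python
from typing import Callable, List

Vector = List[float]


def grad_wrt_goal(f: Callable[[Vector], float], goal: Vector, eps: float = 1e-5) -> Vector:
    """Central finite-difference gradient of f with respect to the goal vector."""
    grad = []
    for i in range(len(goal)):
        up = list(goal)
        down = list(goal)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2.0 * eps))
    return grad


def gradient_matching_loss(
    q_fn: Callable[[Vector], float],       # hypothetical: current Q-value as a function of the goal
    target_fn: Callable[[Vector], float],  # hypothetical: target Q-value estimate as a function of the goal
    goal: Vector,
    alpha: float = 1.0,                    # assumed weight on the gradient-matching term
) -> float:
    """Squared value error at the goal plus alpha times the squared goal-gradient mismatch."""
    value_error = (q_fn(goal) - target_fn(goal)) ** 2
    q_grad = grad_wrt_goal(q_fn, goal)
    target_grad = grad_wrt_goal(target_fn, goal)
    grad_error = sum((a - b) ** 2 for a, b in zip(q_grad, target_grad))
    return value_error + alpha * grad_error


# Toy usage: Q(g) = g[0], target(g) = 2 * g[0], evaluated at goal (1, 0).
# The value error is 1 and the gradient mismatch is 1, so with alpha = 0.5
# the loss is approximately 1.5.
example_loss = gradient_matching_loss(
    lambda g: g[0], lambda g: 2.0 * g[0], [1.0, 0.0], alpha=0.5
)
```

In this sketch the gradient term supplies one constraint per goal dimension from a single transition, whereas the value term supplies a single scalar. This may give some intuition for why matching gradients could help when the goal space is high-dimensional, though the abstract's sample-complexity claims rest on the paper's own assumptions and analysis.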

research
02/20/2019

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

Reinforcement learning is a promising framework for solving control prob...
research
04/27/2020

Evolutionary Stochastic Policy Distillation

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging ...
research
03/09/2023

GOATS: Goal Sampling Adaptation for Scooping with Curriculum Reinforcement Learning

In this work, we first formulate the problem of goal-conditioned robotic...
research
09/27/2019

Automated curricula through setter-solver interactions

Reinforcement learning algorithms use correlations between policies and ...
research
07/01/2021

Goal-Conditioned Reinforcement Learning with Imagined Subgoals

Goal-conditioned reinforcement learning endows an agent with a large var...
research
03/09/2020

Transfer Reinforcement Learning under Unobserved Contextual Information

In this paper, we study a transfer reinforcement learning problem where ...
research
08/17/2022

Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning

Goal-conditioned reinforcement learning (GCRL) has a wide range of poten...
