Distillation Strategies for Proximal Policy Optimization

01/23/2019
by Sam Green, et al.

Vision-based deep reinforcement learning (RL), like deep learning more broadly, typically benefits from high-capacity, relatively large convolutional neural networks (CNNs). However, a large network leads to higher inference costs (power, latency, silicon area, MAC count). Many inference optimizations have been developed for CNNs. Some optimization techniques offer theoretical efficiency gains, but designing actual hardware to support them is difficult. On the other hand, "distillation" is a simple, general-purpose optimization technique that is broadly applicable for transferring knowledge from a trained, high-capacity teacher network to an untrained, low-capacity student network. "DQN distillation" extended the original distillation idea to transfer the knowledge stored in a high-performance, high-capacity teacher Q-function trained via the Deep Q-Learning (DQN) algorithm. Our work adapts DQN distillation to the actor-critic Proximal Policy Optimization (PPO) algorithm. PPO is simple to implement and achieves much higher performance than the seminal DQN algorithm. We show that a distilled PPO student can attain far higher performance than a DQN teacher. We also show that a low-capacity distilled student generally outperforms a low-capacity agent trained directly in the environment. Finally, we show that distillation, followed by "fine-tuning" in the environment, enables the distilled PPO student to achieve parity with teacher performance. In general, the lessons learned in this work should transfer to other actor-critic RL algorithms.
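To make the core idea concrete, here is a minimal sketch (not the authors' code) of a policy-distillation update, assuming PyTorch and hypothetical `teacher` and `student` policy networks that map observations to action logits. The student is trained to match the teacher's action distribution, measured by the KL divergence, over observations collected by rolling out the teacher.

```python
import torch
import torch.nn.functional as F

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the action distribution.

    teacher_logits, student_logits: tensors of shape [batch, num_actions].
    """
    t_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    t_probs = t_log_probs.exp()
    # KL(T || S) = sum_a T(a) * (log T(a) - log S(a)), averaged over the batch
    return (t_probs * (t_log_probs - s_log_probs)).sum(dim=-1).mean()

def distill_step(teacher, student, optimizer, obs_batch):
    """One supervised distillation step on a batch of teacher-collected observations.

    `teacher`, `student`, and `obs_batch` are illustrative placeholders:
    any modules producing action logits and any batch of observations work.
    """
    with torch.no_grad():
        teacher_logits = teacher(obs_batch)   # teacher is frozen
    student_logits = student(obs_batch)
    loss = distillation_loss(teacher_logits, student_logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After distillation, the "fine-tuning" stage described in the abstract would continue training the same low-capacity student directly in the environment with the standard PPO objective.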
