GNN-RL Compression: Topology-Aware Network Pruning using Multi-stage Graph Embedding and Reinforcement Learning

by Sixing Yu, et al.

Model compression is an essential technique for deploying deep neural networks (DNNs) on power- and memory-constrained resources. However, existing model-compression methods often rely on human expertise and focus on parameters' local importance, ignoring the rich topology information within DNNs. In this paper, we propose a novel multi-stage graph embedding technique based on graph neural networks (GNNs) to identify DNN topology, and use reinforcement learning (RL) to find a suitable compression policy. We performed resource-constrained (i.e., FLOPs) channel pruning and compared our approach with state-of-the-art compression methods on over-parameterized DNNs (e.g., the ResNet family and VGG-16) and mobile-friendly DNNs (e.g., MobileNet-v1/v2 and ShuffleNet). The results demonstrate that our method can prune dense networks (e.g., VGG-16) by up to 80% of their original FLOPs while outperforming state-of-the-art methods, achieving up to 1.84% higher accuracy on ShuffleNet-v1. Furthermore, following our approach, the pruned VGG-16 achieved a noticeable 1.38× speed-up and a 141 MB reduction in GPU memory.
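The pipeline described above can be sketched in miniature: model the DNN as a graph of layers, run a few rounds of message passing to embed its topology, and map the embedding to per-layer pruning ratios that an RL agent would score against a FLOPs budget. The sketch below is illustrative only, not the authors' implementation; the graph, feature dimensions, policy head, and reward shape are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy layer graph: 4 layers in a chain plus one skip connection
# (ResNet-like). Edges represent data flow between layers.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1],
              [0, 1, 0, 0]], dtype=float)
A_hat = A + A.T + np.eye(4)               # symmetrize and add self-loops
D_inv = np.diag(1.0 / A_hat.sum(axis=1))  # degree normalization

X = rng.normal(size=(4, 8))               # node features (e.g., layer stats)
W1 = rng.normal(size=(8, 8)) * 0.1        # GNN stage-1 weights (illustrative)
W2 = rng.normal(size=(8, 4)) * 0.1        # GNN stage-2 weights (illustrative)
Wp = rng.normal(size=(4, 4)) * 0.1        # policy head: embedding -> ratios

def gnn_embed(X):
    """Two rounds of normalized message passing (GCN-style),
    then mean-pool node embeddings into one graph embedding."""
    H = np.tanh(D_inv @ A_hat @ X @ W1)
    H = np.tanh(D_inv @ A_hat @ H @ W2)
    return H.mean(axis=0)

def policy(emb):
    """Per-layer channel keep-ratios in (0, 1) via a sigmoid head."""
    return 1.0 / (1.0 + np.exp(-(Wp @ emb)))

def reward(ratios, flops_budget=0.5):
    """Toy reward an RL agent would maximize: a proxy for accuracy
    (prefers keeping channels) minus a penalty when the estimated
    FLOPs fraction exceeds the budget."""
    flops = ratios.mean()                 # crude FLOPs estimate
    acc_proxy = ratios.mean() ** 0.5
    return acc_proxy - 2.0 * max(0.0, flops - flops_budget)

# One "episode": embed the topology, emit a pruning policy, score it.
emb = gnn_embed(X)
ratios = policy(emb)
r = reward(ratios)
print(ratios, r)
```

In the actual method the reward would come from evaluating the pruned network's accuracy under the FLOPs constraint, and the GNN and policy weights would be trained by the RL algorithm rather than sampled at random.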


