Model Compression Methods for YOLOv5: A Review

07/21/2023
by Mohammad Jani, et al.

Over the past few years, extensive research has been devoted to enhancing YOLO object detectors. Since its introduction, eight major versions of YOLO have been released with the purpose of improving its accuracy and efficiency. While the evident merits of YOLO have led to its extensive use in many areas, deploying it on resource-limited devices poses challenges. To address this issue, various neural network compression methods have been developed, which fall into three main categories: network pruning, quantization, and knowledge distillation. The fruitful outcomes of model compression, such as lower memory usage and shorter inference time, make these methods favorable, if not necessary, for deploying large neural networks on hardware-constrained edge devices. In this review paper, we focus on pruning and quantization due to their relative modularity. We categorize these methods and analyze the practical results of applying them to YOLOv5. By doing so, we identify gaps in adapting pruning and quantization to YOLOv5 and provide future directions for further exploration. Among the several versions of YOLO, we specifically choose YOLOv5 for its excellent trade-off between recency and popularity in the literature. This is the first review paper that surveys pruning and quantization methods on YOLOv5 from an implementation point of view. Our study is also extendable to newer versions of YOLO, since deploying them on resource-limited devices still poses the same challenges. This paper targets readers interested in the practical deployment of model compression methods on YOLOv5, and in exploring compression techniques that can be applied to subsequent versions of YOLO.
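To make these categories concrete, the following minimal sketch applies two of them, magnitude pruning and post-training dynamic quantization, using PyTorch's built-in utilities. The toy module, its layer sizes, and the 30% sparsity ratio are illustrative assumptions, not YOLOv5's actual architecture or any method surveyed in the paper.

import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Hypothetical stand-in for one convolutional block of a detector backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.BatchNorm2d(16),
    nn.SiLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),
)

# Network pruning: zero out the 30% smallest-magnitude conv weights.
prune.l1_unstructured(model[0], name="weight", amount=0.3)
prune.remove(model[0], "weight")  # bake the sparsity into the weight tensor

# Quantization: convert linear layers to int8 for faster CPU inference.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 3, 32, 32)
print(quantized(x).shape)  # torch.Size([1, 10])

In practice, pruning is typically followed by fine-tuning to recover accuracy, and static or quantization-aware schemes are preferred when convolutions dominate the runtime; the trade-offs among these choices are exactly what the review examines for YOLOv5.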


