Fast On-the-fly Retraining-free Sparsification of Convolutional Neural Networks
Modern Convolutional Neural Networks (CNNs) are complex, encompassing millions of parameters. Their deployment imposes computational, storage and energy demands, particularly on embedded platforms. Existing approaches to pruning or sparsifying CNNs require retraining to maintain inference accuracy. Such retraining is not feasible in some contexts. In this paper, we explore the sparsification of CNNs by proposing three model-independent methods. Our methods are applied on-the-fly and require no retraining. We show that the weights of state-of-the-art models can be reduced by up to 73% (a compression factor of 3.7x) without incurring more than a 5% loss in accuracy. Additional fine-tuning gains only 8% in sparsity, which indicates that our on-the-fly methods are effective.
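The relation between the reported sparsity and the compression factor can be checked with a short sketch. The example below assumes plain magnitude thresholding, a generic retraining-free technique used here only for illustration; it is not claimed to be one of the three methods proposed in the paper. The function names, the toy weights and the threshold value are all hypothetical.

```python
def sparsify(weights, threshold):
    """Zero every weight whose magnitude is below threshold (no retraining)."""
    return [0.0 if abs(w) < threshold else w for w in weights]


def sparsity(weights):
    """Fraction of weights that are exactly zero."""
    return sum(1 for w in weights if w == 0.0) / len(weights)


def compression_factor(s):
    """Ratio of the original weight count to the remaining nonzero count."""
    return 1.0 / (1.0 - s)


# Hypothetical toy layer, not taken from any real model.
layer = [0.8, -0.02, 0.01, -0.5, 0.03, 0.004, -0.009, 0.3]
pruned = sparsify(layer, threshold=0.05)

print(sparsity(pruned))                     # 5 of 8 weights zeroed -> 0.625
print(round(compression_factor(0.73), 1))   # 73% sparsity -> 3.7
```

This factor counts only nonzero weights; a real sparse storage format also has to store the positions of those weights, so the practical saving is somewhat smaller.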