Partition Pruning: Parallelization-Aware Pruning for Deep Neural Networks

01/21/2019
by   Sina Shahhosseini, et al.

The parameters of modern neural networks require large amounts of memory; these parameters are what a network uses to perform its machine learning task when processing inputs. To speed up inference, we develop Partition Pruning, a scheme that reduces the number of parameters while taking parallelization into account. We evaluated the performance and energy consumption of parallel inference on the partitioned models: computing the pruned layers of TinyVGG16 in parallel achieved a 7.72x speedup and a 2.73x reduction in energy compared to running the unpruned model on a single accelerator. In addition, our method showed only a limited reduction in accuracy when partitioning fully connected layers.
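The abstract describes pruning a layer's parameters while keeping the partitions balanced for parallel execution. A minimal sketch of one way such parallelization-aware pruning could work, using NumPy: split a fully connected layer's weight matrix into one partition per accelerator, then prune the smallest-magnitude weights within each partition so every accelerator keeps an equal workload. The function name, the row-wise split, and the per-partition magnitude criterion are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def partition_prune(weights, num_partitions, sparsity):
    """Illustrative sketch (not the paper's exact method): split a
    fully connected layer's weight matrix row-wise into one partition
    per accelerator, then zero the smallest-magnitude weights within
    each partition so all partitions keep the same nonzero count,
    i.e. a balanced workload for parallel inference."""
    partitions = np.array_split(weights, num_partitions, axis=0)
    pruned = []
    for part in partitions:
        k = int(part.size * sparsity)  # number of weights to remove
        if k > 0:
            # k-th smallest absolute value is the pruning threshold
            thresh = np.partition(np.abs(part), k - 1, axis=None)[k - 1]
            part = np.where(np.abs(part) <= thresh, 0.0, part)
        pruned.append(part)
    return pruned

# Example: an 8x4 weight matrix split across 2 accelerators at 50% sparsity.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))
parts = partition_prune(W, num_partitions=2, sparsity=0.5)
for p in parts:
    print(np.count_nonzero(p))  # every partition retains the same count
```

Pruning within each partition, rather than globally, is what keeps the parallel workload balanced: a global magnitude threshold could concentrate the surviving weights on one accelerator and stall the others.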

