Fine-Grained Energy Profiling for Deep Convolutional Neural Networks on the Jetson TX1

Energy use is a key concern when migrating current deep learning applications to low-power heterogeneous devices such as mobile devices. Deep neural networks are typically designed and trained on high-end GPUs or servers, and deploying them on low-power devices requires additional processing steps, such as compressing the network to reduce its size or providing efficient device-specific software implementations. Migration is further complicated by the lack of tooling and the difficulty of measuring power and performance accurately and consistently across devices. We present a novel evaluation framework for measuring the energy and performance of deep neural networks, built on ARM's Streamline Performance Analyser and integrated with standard deep learning frameworks such as Caffe and cuDNN v5. We apply the framework to study the inference (image classification) behaviour of SqueezeNet on the Maxwell GPU of the NVIDIA Jetson TX1 and demonstrate the ability to measure the energy consumption of individual layers of the network.
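
To give a rough idea of what per-layer energy measurement involves (this is a minimal sketch, not the paper's Streamline-based instrumentation), the example below times each layer of a Caffe network individually and combines the timing with a power reading from the TX1's onboard INA3221 power monitor. The sysfs path, the model file names, and the single-sample power read are assumptions for illustration only.

```python
# Minimal sketch of per-layer energy estimation on a Jetson TX1.
# Assumes pycaffe is installed, that deploy.prototxt / squeezenet.caffemodel
# are available, and that the INA3221 rail is exposed at the sysfs path
# below (the exact path varies across L4T releases).
import time
import caffe

POWER_SENSOR = "/sys/bus/i2c/drivers/ina3221x/0-0040/iio_device/in_power0_input"  # assumed path, reports mW

def read_power_mw():
    with open(POWER_SENSOR) as f:
        return float(f.read())

caffe.set_mode_gpu()
net = caffe.Net("deploy.prototxt", "squeezenet.caffemodel", caffe.TEST)
# Input data is assumed to be preloaded into net.blobs["data"].

for name in net._layer_names:
    power_mw = read_power_mw()          # single sample; a real setup would average many readings
    start = time.time()
    net.forward(start=name, end=name)   # run exactly one layer
    elapsed_s = time.time() - start     # approximate: no explicit GPU synchronisation
    energy_mj = power_mw * elapsed_s    # crude estimate: E = P * t
    print("%-20s %6.2f ms  ~%8.2f mJ" % (name, elapsed_s * 1e3, energy_mj))
```

This layer-by-layer approach trades accuracy for simplicity; a profiler such as Streamline instead samples power and GPU activity continuously and attributes them to annotated regions, which avoids perturbing the normal end-to-end execution of the network.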
