perf4sight: A toolflow to model CNN training performance on Edge GPUs

08/12/2021
by Aditya Rajagopal, et al.

The increased memory and processing capabilities of today's edge devices create opportunities for greater edge intelligence. In the domain of vision, the ability to adapt a Convolutional Neural Network's (CNN) structure and parameters to the input data distribution leads to systems with lower memory footprint, latency and power consumption. However, due to the limited compute resources and memory budget on edge devices, it is necessary for the system to be able to predict the latency and memory footprint of the training process in order to identify favourable training configurations of the network topology and device combination for efficient network adaptation. This work proposes perf4sight, an automated methodology for developing accurate models that predict CNN training memory footprint and latency given a target device and network. This enables rapid identification of network topologies that can be retrained on the edge device with low resource consumption. With PyTorch as the framework and NVIDIA Jetson TX2 as the target device, the developed models predict training memory footprint and latency with 95 respectively for a wide range of networks, opening the path towards efficient network adaptation on edge GPUs.

research
03/01/2022

Low-Cost On-device Partial Domain Adaptation (LoCO-PDA): Enabling efficient CNN retraining on edge devices

With the increased deployment of Convolutional Neural Networks (CNNs) on...
research
06/15/2020

Now that I can see, I can improve: Enabling data-driven finetuning of CNNs on the edge

In today's world, a vast amount of data is being generated by edge devic...
research
07/19/2023

TinyTrain: Deep Neural Network Training at the Extreme Edge

On-device training is essential for user personalisation and privacy. Wi...
research
03/13/2023

HiSSNet: Sound Event Detection and Speaker Identification via Hierarchical Prototypical Networks for Low-Resource Headphones

Modern noise-cancelling headphones have significantly improved users' au...
research
02/03/2022

DistrEdge: Speeding up Convolutional Neural Network Inference on Distributed Edge Devices

As the number of edge devices with computing resources (e.g., embedded G...
research
07/14/2021

Memory-Aware Fusing and Tiling of Neural Networks for Accelerated Edge Inference

A rising research challenge is running costly machine learning (ML) netw...
research
04/28/2018

Low-memory convolutional neural networks through incremental depth-first processing

We introduce an incremental processing scheme for convolutional neural n...
