TASO: Time and Space Optimization for Memory-Constrained DNN Inference

05/21/2020
by Yuan Wen, et al.

Convolutional neural networks (CNNs) are used in many embedded applications, from industrial robotics and automation systems to biometric identification on mobile devices. State-of-the-art classification is typically achieved by large networks, which are prohibitively expensive to run on mobile and embedded devices with tightly constrained memory and energy budgets. We propose an approach for ahead-of-time, domain-specific optimization of CNN models, based on an integer linear programming (ILP) formulation for selecting the primitive operations that implement each convolutional layer. We optimize the trade-off between execution time and memory consumption by: 1) minimizing execution time across the whole network by selecting data layouts and primitive operations to implement each layer; and 2) allocating an appropriately sized workspace that reflects the upper bound of the per-layer memory footprint. These two optimization strategies can be used to run any CNN on any platform with a C compiler. Our evaluation with a range of popular ImageNet neural architectures (GoogleNet, AlexNet, VGG, ResNet and SqueezeNet) on the ARM Cortex-A15 yields speedups of 8x compared to greedy-algorithm-based primitive selection, and reduces the memory requirement by 2.2x while sacrificing only 15% of inference time compared to a solver that considers inference time alone. In addition, our optimization approach exposes a range of optimal points for different configurations along the Pareto frontier of the memory/latency trade-off, which can be selected under arbitrary system constraints.
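
The per-layer selection step described above can be sketched as a small ILP. The following Python sketch uses the open-source PuLP modeller; the layer set, primitive names, and all time/workspace numbers are illustrative placeholders rather than TASO's profiled costs, and the budget sweep at the end is only meant to show how re-solving under different memory caps traces out a memory/latency Pareto frontier of the kind the abstract describes.

```python
# Minimal sketch: per-layer primitive selection as an ILP (PuLP).
# All costs below are made-up placeholders, not TASO's measured numbers.
import pulp

# Hypothetical candidates per layer: primitive -> (time in ms, workspace in MB).
LAYERS = [
    {"direct": (9.0, 0.0), "im2col": (4.0, 6.0), "winograd": (3.0, 11.0)},
    {"direct": (7.0, 0.0), "im2col": (3.5, 4.0)},
    {"direct": (12.0, 0.0), "im2col": (5.0, 8.0), "winograd": (4.0, 14.0)},
]

def fastest_schedule(budget_mb):
    """Minimize total time with the peak per-layer workspace capped."""
    prob = pulp.LpProblem("primitive_selection_sketch", pulp.LpMinimize)
    x = {(i, p): pulp.LpVariable(f"x_{i}_{p}", cat="Binary")
         for i, cands in enumerate(LAYERS) for p in cands}

    # Objective: summed execution time of the chosen primitives.
    prob += pulp.lpSum(LAYERS[i][p][0] * x[i, p] for (i, p) in x)

    for i, cands in enumerate(LAYERS):
        # Exactly one primitive implements each layer.
        prob += pulp.lpSum(x[i, p] for p in cands) == 1
        # The workspace is reused from layer to layer, so the budget bounds
        # each layer's footprint (the peak), not the sum over layers.
        prob += pulp.lpSum(LAYERS[i][p][1] * x[i, p] for p in cands) <= budget_mb

    status = prob.solve(pulp.PULP_CBC_CMD(msg=False))
    if pulp.LpStatus[status] != "Optimal":
        return None
    choice = {i: p for (i, p), v in x.items() if v.value() > 0.5}
    time_ms = sum(LAYERS[i][choice[i]][0] for i in choice)
    peak_mb = max(LAYERS[i][choice[i]][1] for i in choice)
    return time_ms, peak_mb, choice

# Sweeping the cap and re-solving exposes one optimal point per budget,
# i.e. a memory/latency Pareto frontier.
for budget in (0.0, 4.0, 8.0, 16.0):
    result = fastest_schedule(budget)
    if result:
        t, m, choice = result
        print(f"budget {budget:5.1f} MB -> {t:5.1f} ms, peak {m:4.1f} MB, {choice}")
```

In the paper the time term would presumably come from profiling each primitive on the target device (here, the Cortex-A15) and the workspace term from each primitive's known scratch-memory requirement; the sketch simply hard-codes stand-in values for both.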


