Fused Depthwise Tiling for Memory Optimization in TinyML Deep Neural Network Inference

03/31/2023
by Rafael Stahl, et al.

Memory optimization for deep neural network (DNN) inference is gaining relevance with the emergence of TinyML, the deployment of DNN inference tasks on tiny, low-power microcontrollers. Applications such as audio keyword detection or radar-based gesture recognition are heavily constrained by the limited memory of such devices, because DNN inference requires large run-time buffers to store activations and other intermediate data. In this paper, we propose a new Fused Depthwise Tiling (FDT) method for the memory optimization of DNNs which, compared to existing tiling methods, reduces memory usage without inducing any run-time overhead. FDT applies to a larger variety of network layers than existing tiling methods, which focus on convolutions. It improves TinyML memory optimization significantly by reducing the memory of models where this was not possible before, and by providing alternative design points for models that show high run-time overhead with existing methods. To identify the best tiling configuration, we propose an end-to-end flow with a new path discovery method that applies FDT and existing tiling methods in a fully automated way, including the scheduling of operations and the planning of the buffer layout in memory. Out of seven evaluated models, FDT achieved significant memory reduction for two models, by 76.2% [...], where existing tiling methods could not be applied. Two other models showed significant run-time overhead with existing methods, and FDT provided alternative design points with no overhead but reduced memory savings.
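To make the idea of tiling an intermediate buffer along its depth (channel) dimension more concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes a hypothetical pair of fully connected layers with a ReLU in between, and the function names (`two_layer_untiled`, `two_layer_depthwise_tiled`) are invented for illustration. The hidden activation is computed one slice at a time, and each slice is immediately folded into the output as a partial sum, so the full hidden buffer never has to exist and no value is computed twice.

```python
def relu(values):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, v) for v in values]


def dense(weights, x):
    """Matrix-vector product; `weights` is a list of rows, each of len(x)."""
    return [sum(w * a for w, a in zip(row, x)) for row in weights]


def two_layer_untiled(w1, w2, x):
    """Reference: materializes the whole hidden activation at once."""
    hidden = relu(dense(w1, x))          # buffer of len(w1) values
    return dense(w2, hidden), len(hidden)


def two_layer_depthwise_tiled(w1, w2, x, num_tiles):
    """Illustrative depthwise tiling of the hidden activation.

    The first layer is split by output channel and the second layer by
    input channel; partial results are accumulated into the output.
    Returns the output and the largest hidden slice held at any time.
    """
    hidden_size = len(w1)
    step = -(-hidden_size // num_tiles)  # ceiling division
    y = [0.0] * len(w2)
    peak_hidden = 0
    for start in range(0, hidden_size, step):
        stop = min(start + step, hidden_size)
        # Only this slice of the hidden activation is alive right now.
        h_tile = relu(dense(w1[start:stop], x))
        peak_hidden = max(peak_hidden, len(h_tile))
        # Fold the slice into the output as a partial sum.
        for i, row in enumerate(w2):
            y[i] += sum(w * a for w, a in zip(row[start:stop], h_tile))
    return y, peak_hidden
```

In this sketch the tiled variant performs the same multiply-accumulate operations as the untiled one, only in a different order, which is why it adds no recomputation. The peak size of the hidden buffer shrinks roughly by a factor of `num_tiles`, at the cost of keeping the output accumulator alive across all tiles.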


