Pex: Memory-efficient Microcontroller Deep Learning through Partial Execution

11/30/2022
by Edgar Liberis, et al.

Embedded and IoT devices, largely powered by microcontroller units (MCUs), could be made more intelligent by leveraging on-device deep learning. One of the main challenges of neural network inference on an MCU is the extremely limited amount of read-write on-chip memory (SRAM, < 512 kB). SRAM is consumed by the neural network layer (operator) input and output buffers, which, traditionally, must be in memory (materialised) for an operator to execute. We discuss a novel execution paradigm for microcontroller deep learning, which modifies the execution of neural networks to avoid materialising full buffers in memory, drastically reducing SRAM usage with no computation overhead. This is achieved by exploiting the properties of operators, which can consume/produce a fraction of their input/output at a time. We describe a partial execution compiler, Pex, which produces memory-efficient execution schedules automatically by identifying subgraphs of operators whose execution can be split along the feature ("channel") dimension. Memory usage is reduced further by targeting memory bottlenecks with structured pruning, leading to the co-design of the network architecture and its execution schedule. Our evaluation of image and audio classification models: (a) establishes state-of-the-art performance in low SRAM usage regimes for the considered tasks, with accuracy improvements of up to +2.9%; (b) shows that SRAM usage can be reduced by applying partial execution alone, or by up to 10.5x when using the compiler-pruning co-design, while maintaining the classification accuracy compared to prior work; (c) uses the recovered SRAM to process higher resolution inputs instead, increasing accuracy by up to +3.9% on Visual Wake Words.
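To make the idea concrete, here is a minimal NumPy sketch (not the Pex compiler or its generated schedules) of partial execution for a pointwise-convolution, ReLU, pointwise-convolution subgraph: the intermediate tensor is computed and consumed one channel slice at a time, so its full buffer is never materialised, and the result matches full execution. The function names, tensor shapes, and the `split` parameter are illustrative assumptions.

```python
# Minimal sketch of channel-wise partial execution (illustrative, not Pex itself).
import numpy as np

def conv1x1(x, w):
    # x: (H, W, C_in), w: (C_in, C_out) -> (H, W, C_out)
    return x @ w

def full_execution(x, w1, w2):
    # Materialises the entire intermediate (hidden) buffer at once.
    hidden = np.maximum(conv1x1(x, w1), 0.0)
    return conv1x1(hidden, w2)

def partial_execution(x, w1, w2, split=4):
    # Accumulate the output while only ever holding a channel slice of the
    # intermediate tensor in memory; its peak size shrinks by roughly `split`.
    out = np.zeros((x.shape[0], x.shape[1], w2.shape[1]), dtype=x.dtype)
    for cols in np.array_split(np.arange(w1.shape[1]), split):
        hidden_slice = np.maximum(conv1x1(x, w1[:, cols]), 0.0)
        out += conv1x1(hidden_slice, w2[cols, :])
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 8, 16))
    w1 = rng.standard_normal((16, 64))   # expansion: 64 intermediate channels
    w2 = rng.standard_normal((64, 16))
    # Splitting along the channel dimension does not change the result,
    # because ReLU acts per channel and the second conv sums over channels.
    assert np.allclose(full_execution(x, w1, w2),
                       partial_execution(x, w1, w2))
```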


Related research

10/15/2021 · Differentiable Network Pruning for Microcontrollers
Embedded and personal IoT devices are powered by microcontroller units (...

07/04/2022 · CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution
Mobile devices run deep learning models for various purposes, such as im...

07/20/2020 · MCUNet: Tiny Deep Learning on IoT Devices
Machine learning on tiny IoT devices based on microcontroller units (MCU...

10/28/2021 · MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning
Tiny deep learning on microcontroller units (MCUs) is challenging due to...

08/09/2023 · FPGA Resource-aware Structured Pruning for Real-Time Neural Networks
Neural networks achieve state-of-the-art performance in image classifica...

07/20/2021 · NeurObfuscator: A Full-stack Obfuscation Tool to Mitigate Neural Architecture Stealing
Neural network stealing attacks have posed grave threats to neural netwo...

07/29/2020 · Towards a Formal Foundation of Intermittent Computing
Intermittently powered devices enable new applications in harsh or inacc...
