DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

08/17/2020
by   Alessio Burrello, et al.
13

The deployment of Deep Neural Networks (DNNs) on end-nodes at the extreme edge of the Internet-of-Things is a critical enabler to support pervasive Deep Learning-enhanced applications. Low-Cost MCU-based end-nodes have limited on-chip memory and often replace caches with scratchpads, to reduce area overheads and increase energy efficiency – requiring explicit DMA-based memory transfers between different levels of the memory hierarchy. Mapping modern DNNs on these systems requires aggressive topology-dependent tiling and double-buffering. In this work, we propose DORY (Deployment Oriented to memoRY) - an automatic tool to deploy DNNs on low cost MCUs with typically less than 1MB of on-chip SRAM memory. DORY abstracts tiling as a Constraint Programming (CP) problem: it maximizes L1 memory utilization under the topological constraints imposed by each DNN layer. Then, it generates ANSI C code to orchestrate off- and on-chip transfers and computation phases. Furthermore, to maximize speed, DORY augments the CP formulation with heuristics promoting performance-effective tile sizes. As a case study for DORY, we target GreenWaves Technologies GAP8, one of the most advanced parallel ultra-low power MCU-class devices on the market. On this device, DORY achieves up to 2.5x better MAC/cycle than the GreenWaves proprietary software solution and 18.1x better than the state-of-the-art result on an STM32-F746 MCU on single layers. Using our tool, GAP-8 can perform end-to-end inference of a 1.0-MobileNet-128 network consuming just 63 pJ/MAC on average @ 4.3 fps - 15.4x better than an STM32-F746. We release all our developments - the DORY framework, the optimized backend kernels, and the related heuristics - as open-source software.

READ FULL TEXT

page 4

page 10

page 14

research
07/06/2021

Impact of On-Chip Interconnect on In-Memory Acceleration of Deep Neural Networks

With the widespread use of Deep Neural Networks (DNNs), machine learning...
research
11/27/2022

HULK-V: a Heterogeneous Ultra-low-power Linux capable RISC-V SoC

IoT applications span a wide range in performance and memory footprint, ...
research
07/03/2023

A 3 TOPS/W RISC-V Parallel Cluster for Inference of Fine-Grain Mixed-Precision Quantized Neural Networks

The emerging trend of deploying complex algorithms, such as Deep Neural ...
research
08/14/2021

SIAM: Chiplet-based Scalable In-Memory Acceleration with Mesh for Deep Neural Networks

In-memory computing (IMC) on a monolithic chip for deep learning faces d...
research
06/20/2023

LightRidge: An End-to-end Agile Design Framework for Diffractive Optical Neural Networks

To lower the barrier to diffractive optical neural networks (DONNs) desi...
research
07/16/2021

DNN is not all you need: Parallelizing Non-Neural ML Algorithms on Ultra-Low-Power IoT Processors

Machine Learning (ML) functions are becoming ubiquitous in latency- and ...
research
06/19/2022

Productive Reproducible Workflows for DNNs: A Case Study for Industrial Defect Detection

As Deep Neural Networks (DNNs) have become an increasingly ubiquitous wo...

Please sign up or login with your details

Forgot password? Click here to reset