DNN is not all you need: Parallelizing Non-Neural ML Algorithms on Ultra-Low-Power IoT Processors

07/16/2021
by Enrico Tabanelli, et al.

Machine Learning (ML) functions are becoming ubiquitous in latency- and privacy-sensitive IoT applications, prompting a shift toward near-sensor processing at the extreme edge and, consequently, the increasing adoption of Parallel Ultra-Low Power (PULP) IoT processors. These compute- and memory-constrained parallel architectures must efficiently run a wide range of algorithms, including key Non-Neural ML kernels that compete favorably with Deep Neural Networks (DNNs) in accuracy under severe resource constraints. In this paper, we focus on enabling efficient parallel execution of Non-Neural ML algorithms on two RISC-V-based PULP platforms: GAP8, a commercial chip, and PULP-OPEN, a research platform running on an FPGA emulator. We tuned the parallel algorithms through fine-grained analysis and intensive optimization to maximize the speedup, considering two alternative Floating-Point (FP) emulation libraries on GAP8 and the native FPU support on PULP-OPEN. Experimental results show that a target-optimized emulation library yields an average 1.61x runtime improvement over a standard emulation library, while native FPU support reaches up to 32.09x. In terms of parallel speedup, our design improves on sequential execution by 7.04x on average on the targeted octa-core platforms. Lastly, we present a comparison with the ARM Cortex-M4 microcontroller (MCU), a widely adopted commercial solution for edge deployments, which is 12.87x slower than PULP-OPEN.
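
To make "parallel execution on an octa-core cluster" concrete, here is a minimal sketch of one common way to split a data-parallel loop across cores: static block partitioning, where each core computes a contiguous slice of the iterations from its own core ID. The abstract does not describe the paper's actual kernels or partitioning scheme, so everything here is an assumption for illustration: the k-NN distance kernel, the names `chunk_bounds`, `knn_distances_core` and `knn_distances`, and the sequential loop that stands in for the platform's cluster fork are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed core count: the paper targets octa-core clusters. */
#define NB_CORES 8

/* Hypothetical helper. Static block partitioning: core `core_id` gets the
 * half-open range [*start, *end) of a loop with `n` iterations. The first
 * (n % nb_cores) cores take one extra iteration, so per-core work differs
 * by at most one iteration. */
static void chunk_bounds(size_t n, int core_id, int nb_cores,
                         size_t *start, size_t *end)
{
    assert(nb_cores > 0 && core_id >= 0 && core_id < nb_cores);
    size_t base = n / (size_t)nb_cores;
    size_t rem = n % (size_t)nb_cores;
    size_t id = (size_t)core_id;
    *start = id * base + (id < rem ? id : rem);
    *end = *start + base + (id < rem ? 1 : 0);
}

/* Hypothetical per-core body of a k-NN distance kernel: squared Euclidean
 * distance between `query` and each of the `n` training rows
 * (row-major, `dim` features per row). */
static void knn_distances_core(const float *train, const float *query,
                               float *dist, size_t n, size_t dim,
                               int core_id, int nb_cores)
{
    size_t start, end;
    chunk_bounds(n, core_id, nb_cores, &start, &end);
    for (size_t i = start; i < end; i++) {
        float acc = 0.0f;
        for (size_t j = 0; j < dim; j++) {
            float d = train[i * dim + j] - query[j];
            acc += d * d;
        }
        dist[i] = acc; /* each core writes only its own slice: no data races */
    }
}

/* Host-side stand-in for the cluster fork: runs each core's share in turn.
 * On a real PULP cluster, every core would execute knn_distances_core
 * concurrently with its own ID, followed by a barrier. */
static void knn_distances(const float *train, const float *query,
                          float *dist, size_t n, size_t dim)
{
    for (int core = 0; core < NB_CORES; core++)
        knn_distances_core(train, query, dist, n, dim, core, NB_CORES);
}
```

With 10 iterations on 8 cores, cores 0 and 1 each take two iterations and the rest take one. Because no core depends on another's slice, the ideal speedup is 8x; the reported 7.04x average corresponds to a parallel efficiency of 7.04 / 8 = 0.88, i.e. 88%.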
