A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

10/20/2021
by Leonardo Ravaglia, et al.

In the last few years, research and development on Deep Learning models and techniques for ultra-low-power devices (in a word, TinyML) has mainly focused on a train-then-deploy assumption, with static models that cannot be adapted to newly collected data without cloud-based data collection and fine-tuning. Latent Replay-based Continual Learning (CL) techniques[1] enable online, serverless adaptation in principle, but so far they have still been too computation- and memory-hungry for ultra-low-power TinyML devices, which are typically based on microcontrollers. In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor. We rethink the baseline Latent Replay CL algorithm, leveraging quantization of the frozen stage of the model and of the Latent Replays (LRs) to reduce their memory cost with minimal impact on accuracy. In particular, 8-bit compression of the LR memory proves to be almost lossless (-0.26% accuracy drop) while requiring 4x less memory, while 7-bit can also be used with an additional minimal accuracy degradation (up to 5%). We also introduce optimized primitives for forward and backward propagation on the PULP processor. Our results show that, by combining these techniques, continual learning can be achieved in practice using less than 64MB of memory, an amount compatible with embedding in TinyML devices. On an advanced 22nm prototype of our platform, called VEGA, the proposed solution performs on average 65x faster than a low-power STM32 L4 microcontroller, being 37x more energy efficient: enough for a lifetime of 535h when learning a new mini-batch of data once every minute.
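The 4x memory saving from 8-bit LR compression follows directly from storing latent activations as integer codes instead of FP32 values. The following is a minimal NumPy sketch of per-tensor affine quantization of a latent replay buffer; function names and the example tensor shape are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def quantize_latents(x, n_bits=8):
    """Affine (asymmetric) quantization of latent activations to n_bits.
    Returns integer codes plus the (scale, zero_point) needed to decode.
    (Hypothetical helper, not from the paper's codebase.)"""
    qmax = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = lo
    q = np.clip(np.round((x - zero_point) / scale), 0, qmax).astype(np.uint8)
    return q, scale, zero_point

def dequantize_latents(q, scale, zero_point):
    """Reconstruct approximate FP32 latents from the integer codes."""
    return q.astype(np.float32) * scale + zero_point

# One stored latent activation map (shape chosen only for illustration).
latents = np.random.randn(32, 32, 64).astype(np.float32)
q, s, z = quantize_latents(latents, n_bits=8)

# uint8 codes take 4x less memory than the FP32 latents they replace.
ratio = latents.nbytes // q.nbytes  # 4
# Worst-case reconstruction error is bounded by half a quantization step.
err = np.abs(dequantize_latents(q, s, z) - latents).max()
```

During rehearsal, the frozen front-end's stored latents would be dequantized on the fly and fed to the trainable back-end, trading a small rounding error (the near-lossless -0.26% reported above) for the 4x smaller replay memory.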


Related research

07/22/2020 · Memory-Latency-Accuracy Trade-offs for Continual Learning on a RISC-V Extreme-Edge Node
AI-powered edge devices currently lack the ability to adapt their embedd...

08/29/2023 · On-Device Learning with Binary Neural Networks
Existing Continual Learning (CL) solutions only partially address the co...

12/02/2019 · Latent Replay for Real-Time Continual Learning
Training deep networks on light computational devices is nowadays very c...

08/11/2023 · Cost-effective On-device Continual Learning over Memory Hierarchy with Miro
Continual learning (CL) trains NN models incrementally from a continuous...

05/30/2023 · Reduced Precision Floating-Point Optimization for Deep Neural Network On-Device Learning on MicroControllers
Enabling On-Device Learning (ODL) for Ultra-Low-Power Micro-Controller U...

11/19/2019 · Online Learned Continual Compression with Stacked Quantization Module
We introduce and study the problem of Online Continual Compression, wher...

11/30/2019 · Quantized deep learning models on low-power edge devices for robotic systems
In this work, we present a quantized deep neural network deployed on a l...
