On-FPGA Training with Ultra Memory Reduction: A Low-Precision Tensor Method

04/07/2021
by Kaiqi Zhang, et al.

Various hardware accelerators have been developed for energy-efficient, real-time inference of neural networks on edge devices. Training, however, is still mostly performed on high-performance GPUs or servers: its huge memory and computing costs prevent neural networks from being trained on edge devices. This paper proposes a novel tensor-based training framework that offers orders-of-magnitude memory reduction during training. We propose a rank-adaptive tensorized neural network model and design a hardware-friendly low-precision algorithm to train it. We present an FPGA accelerator to demonstrate the benefits of this training method on edge devices. Our preliminary FPGA implementation achieves 59× speedup and 123× energy reduction compared to an embedded CPU, and 292× memory reduction compared to standard full-size training.
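The abstract does not give implementation details, but the core memory-saving idea — replacing a large weight matrix with a low-rank tensor factorization — can be illustrated with a small sketch. The following NumPy snippet shows a tensor-train (TT) factorized linear layer; the mode sizes, TT-ranks, and the `tt_matvec` helper are illustrative assumptions, not the paper's actual model or ranks.

```python
import numpy as np

# Illustrative sketch (not the paper's model): a 256x256 weight matrix is
# represented by three small TT cores instead of being stored explicitly.
in_modes, out_modes = (4, 8, 8), (4, 8, 8)   # 4*8*8 = 256 inputs/outputs
ranks = (1, 4, 4, 1)                          # TT-ranks; boundary ranks are 1
rng = np.random.default_rng(0)

# One core per mode pair, with shape (r_{k-1}, m_k, n_k, r_k).
cores = [rng.standard_normal((ranks[k], in_modes[k], out_modes[k], ranks[k + 1]))
         for k in range(3)]

def tt_matvec(cores, x):
    """Apply the TT-represented matrix to x without ever forming the matrix."""
    # Reshape x into its input modes and append a size-1 rank axis.
    t = x.reshape(in_modes)[..., None]
    for core in cores:
        # Contract the leading input mode and the rank axis with the core;
        # the core's output mode and next rank are appended at the end.
        t = np.einsum('i...r,rinj->...nj', t, core)
    return t.reshape(-1)

x = rng.standard_normal(256)
y = tt_matvec(cores, x)                       # y has shape (256,)

full_params = 256 * 256                       # 65536 for the dense matrix
tt_params = sum(c.size for c in cores)        # 1344 for the TT cores (~49x fewer)
print(y.shape, full_params, tt_params)
```

Storing only the cores is what makes the orders-of-magnitude memory reduction possible, and training updates the small cores directly rather than the full matrix.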


