Towards Memory-Efficient Neural Networks via Multi-Level in situ Generation

08/25/2021
by Jiaqi Gu, et al.

Deep neural networks (DNNs) have shown superior performance in a variety of tasks. As they rapidly evolve, their escalating computation and memory demands make it challenging to deploy them on resource-constrained edge devices. Though extensive efficient accelerator designs, from traditional electronics to emerging photonics, have been successfully demonstrated, they are still bottlenecked by expensive memory accesses due to the tremendous gap in bandwidth, power, and latency between electrical memory and computing cores. Previous solutions fail to fully leverage the ultra-fast computational speed of emerging DNN accelerators to break through the critical memory bound. In this work, we propose a general and unified framework to trade expensive memory transactions for ultra-fast on-chip computations, directly translating to performance improvement. We are the first to jointly explore the intrinsic correlations and bit-level redundancy within DNN kernels, and we propose a multi-level in situ generation mechanism with mixed-precision bases to achieve on-the-fly recovery of high-resolution parameters with minimal hardware overhead. Extensive experiments demonstrate that our proposed joint method can boost memory efficiency by 10-20x with comparable accuracy over four state-of-the-art designs when benchmarked on ResNet-18/DenseNet-121/MobileNetV2/V3 across various tasks.
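To make the core idea concrete, the minimal NumPy sketch below illustrates one plausible reading of "in situ generation with mixed-precision bases": store only a small, low-bit basis and low-bit coefficients off-chip, then regenerate the full-resolution kernel with a single cheap on-chip matmul. All function names, bit-widths, and the post-hoc SVD are illustrative assumptions; the paper learns its mixed-precision bases during training and targets photonic/electronic hardware, not a NumPy implementation.

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric fake-quantization to the given bit-width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    q = np.round(x / scale).clip(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

def decompose(W, rank, basis_bits=8, coeff_bits=4):
    """Factor W ~= C @ B with a small mixed-precision basis.

    Only the low-bit factors need to be stored; the high-resolution
    kernel is regenerated on the fly at compute time. (Illustrative:
    a post-hoc SVD stands in for the paper's learned bases.)
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    C = U[:, :rank] * s[:rank]   # per-output-channel coefficients
    B = Vt[:rank, :]             # shared basis vectors
    return quantize(C, coeff_bits), quantize(B, basis_bits)

def generate(C_q, B_q):
    """In situ generation: one cheap matmul recovers the full kernel."""
    return C_q @ B_q

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 256)).astype(np.float32)
C_q, B_q = decompose(W, rank=8)
W_hat = generate(C_q, B_q)

# Bits kept in memory vs. a dense fp32 kernel
# (quantization scales omitted from the count; they are negligible).
stored = C_q.size * 4 + B_q.size * 8
full = W.size * 32
print(f"memory reduction: {full / stored:.1f}x")
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

The trade this sketch makes explicit is the one the abstract describes: each layer pays one extra on-chip matrix multiply, which is cheap on ultra-fast accelerators, in exchange for a large reduction in off-chip memory traffic.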

