Towards Model-Size Agnostic, Compute-Free, Memorization-based Inference of Deep Learning

07/14/2023
by Davide Giacomini, et al.

The rapid advancement of deep neural networks has significantly improved various tasks, such as image and speech recognition. However, as the complexity of these models increases, so do the computational cost and the number of parameters, making it difficult to deploy them on resource-constrained devices. This paper proposes a novel memorization-based inference (MBI) approach that is compute-free and requires only lookups. Specifically, our work capitalizes on the inference mechanism of the recurrent attention model (RAM), where only a small window of the input (a glimpse) is processed in one time step, and the outputs from multiple glimpses are combined through a hidden vector to determine the overall classification output. By leveraging the low dimensionality of glimpses, our inference procedure stores key-value pairs comprising the glimpse location, patch vector, etc. in a table. Computations are obviated during inference by using the table to read out key-value pairs, performing compute-free inference by memorization. By exploiting Bayesian optimization and clustering, the necessary lookups are reduced and accuracy is improved. We also present in-memory computing circuits to quickly look up the key vector matching an input query. Compared to competitive compute-in-memory (CIM) approaches, MBI improves energy efficiency by almost 2.7x over a multilayer perceptron (MLP)-CIM and by almost 83x over a ResNet20-CIM for MNIST character recognition.
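
To make the lookup idea concrete, below is a minimal Python sketch of memorization-based inference as a nearest-key table read-out. The class name `MBITable`, the key layout (glimpse location plus patch features), and the Euclidean nearest-neighbor match are illustrative assumptions, not the authors' implementation; the paper additionally uses Bayesian optimization and clustering to reduce the number of lookups and performs the key match with in-memory computing circuits rather than in software.

```python
# Hypothetical sketch of memorization-based inference (MBI) via key-value lookup.
# Assumed structure for illustration only; not the authors' code.
import numpy as np


class MBITable:
    """Stores key vectors (e.g., glimpse location + patch features) and their
    associated values (e.g., the next hidden vector or a class label)."""

    def __init__(self):
        self.keys = []    # list of 1-D key vectors memorized offline
        self.values = []  # value read out when the corresponding key matches

    def add(self, key, value):
        self.keys.append(np.asarray(key, dtype=np.float32))
        self.values.append(value)

    def lookup(self, query):
        # Replaces a neural-network forward pass with a nearest-key search;
        # an in-memory-computing macro would perform this match in hardware.
        keys = np.stack(self.keys)
        dists = np.linalg.norm(keys - np.asarray(query, dtype=np.float32), axis=1)
        return self.values[int(np.argmin(dists))]


# Example usage: one lookup per glimpse query.
table = MBITable()
table.add(key=[0.1, 0.2, 0.3], value="class_3")  # memorized during an offline pass
table.add(key=[0.8, 0.1, 0.4], value="class_7")

print(table.lookup([0.11, 0.22, 0.31]))  # -> "class_3"
```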
