Neural Cache: Bit-Serial In-Cache Acceleration of Deep Neural Networks

05/09/2018
by   Charles Eckert, et al.

This paper presents the Neural Cache architecture, which repurposes cache structures to transform them into massively parallel compute units capable of running inference for deep neural networks. Techniques are proposed for performing in-situ arithmetic in SRAM arrays, creating efficient data mappings, and reducing data movement. The Neural Cache architecture is capable of fully executing convolutional, fully connected, and pooling layers in-cache. The proposed architecture also supports quantization in-cache. Our experimental results show that, for the Inception v3 model, the proposed architecture improves inference latency by 18.3x over a state-of-the-art multi-core CPU (Xeon E5) and by 7.7x over a server-class GPU (Titan Xp). Neural Cache improves inference throughput by 12.4x over the CPU (2.2x over the GPU), while reducing power consumption by 50% over the CPU (53% over the GPU).
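To give a feel for what "bit-serial" arithmetic across many parallel lanes can look like, here is a minimal software sketch. It is not the paper's circuit or data mapping: it assumes a transposed layout in which bit i of every operand is stored together in one "plane", and it models each plane as a Python integer so that a single bitwise operation touches all lanes at once. The helper names (`to_bit_planes`, `from_bit_planes`, `bit_serial_add`) are hypothetical and exist only for this illustration.

```python
def to_bit_planes(values, n_bits):
    """Transpose unsigned ints into n_bits bit-planes.

    Plane i is an int whose bit j holds bit i of values[j]
    (an assumed layout for illustration only).
    """
    planes = []
    for i in range(n_bits):
        plane = 0
        for lane, v in enumerate(values):
            plane |= ((v >> i) & 1) << lane
        planes.append(plane)
    return planes


def from_bit_planes(planes, n_lanes):
    """Inverse of to_bit_planes: rebuild one int per lane."""
    values = []
    for lane in range(n_lanes):
        v = 0
        for i, plane in enumerate(planes):
            v |= ((plane >> lane) & 1) << i
        values.append(v)
    return values


def bit_serial_add(a_planes, b_planes):
    """Add two bit-plane operands one bit position at a time.

    Each step is a full adder applied to every lane in parallel:
    the loop count depends on the bit width, not on the lane count.
    """
    carry = 0
    out = []
    for a, b in zip(a_planes, b_planes):
        out.append(a ^ b ^ carry)            # sum bit for all lanes
        carry = (a & b) | (carry & (a ^ b))  # carry-out for all lanes
    out.append(carry)                        # extra plane keeps the final carry
    return out


a = [3, 5, 250, 0]
b = [1, 7, 10, 255]
sums = from_bit_planes(bit_serial_add(to_bit_planes(a, 8), to_bit_planes(b, 8)), len(a))
```

In this sketch an 8-bit addition takes eight steps regardless of how many lanes are packed into each plane, which is the trade-off bit-serial designs make: higher latency per operation in exchange for very wide parallelism.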

