Sidebar: Scratchpad Based Communication Between CPUs and Accelerators

10/23/2019
by Ayoosh Bansal, et al.

Hardware accelerators for neural networks have shown great promise for both performance and power. These accelerators are most efficient when optimized for a fixed functionality, but this inflexibility limits the longevity of the hardware as the underlying neural network algorithms and structures continue to change. We propose and evaluate a flexible design paradigm for accelerators that work in close coordination with host processors. The relatively static matrix operations are implemented in specialized accelerators, while fast-evolving functions, such as activations, are computed on the host processor. This architecture is enabled by a low-latency shared buffer we call Sidebar. Sidebar memory is shared between the accelerator and the host, exists outside of the program address space, and holds intermediate data only. We show that a generalized, DMA-dependent flexible accelerator design performs poorly in both performance and energy compared to an equivalent fixed-function accelerator. A Sidebar-based accelerator design achieves near-identical performance and energy to the equivalent fixed-function accelerator while still providing all the flexibility of computing activations on the host processor.
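
The division of labor described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the authors' implementation: the names `Sidebar`, `accelerator_matvec`, `host_activation`, and `run_layer` are hypothetical, the "accelerator" is ordinary Python code, and the shared buffer is a plain list standing in for a hardware scratchpad.

```python
# Toy software model only; every name here is hypothetical, not from the paper.
from typing import Callable, List

Matrix = List[List[float]]


class Sidebar:
    """Stand-in for the shared scratchpad: holds intermediate data only."""

    def __init__(self, size: int) -> None:
        self.buf: List[float] = [0.0] * size


def accelerator_matvec(weights: Matrix, x: List[float], sidebar: Sidebar) -> int:
    """Fixed-function part: write a matrix-vector product into the sidebar."""
    for i, row in enumerate(weights):
        sidebar.buf[i] = sum(w * v for w, v in zip(row, x))
    return len(weights)  # number of valid intermediate values


def host_activation(sidebar: Sidebar, n: int, fn: Callable[[float], float]) -> None:
    """Flexible part: the host applies any activation in place."""
    for i in range(n):
        sidebar.buf[i] = fn(sidebar.buf[i])


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def run_layer(weights: Matrix, x: List[float], sidebar: Sidebar,
              activation: Callable[[float], float]) -> List[float]:
    """One layer: accelerator computes, host activates, result is read out."""
    n = accelerator_matvec(weights, x, sidebar)
    host_activation(sidebar, n, activation)
    return sidebar.buf[:n]


sidebar = Sidebar(size=8)
weights = [[1.0, -2.0], [0.5, 0.5]]
out = run_layer(weights, [3.0, 4.0], sidebar, relu)
print(out)  # [0.0, 3.5]
```

In this model, changing the activation means passing a different Python function to `run_layer`. The matrix routine stays untouched, which mirrors the flexibility argument made in the abstract.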


