An Application-Specific VLIW Processor with Vector Instruction Set for CNN Acceleration

04/10/2019
by Andreas Bytyn, et al.

In recent years, neural networks have surpassed classical algorithms in areas such as object recognition, e.g. in the well-known ImageNet challenge. As a result, great effort is being put into developing fast and efficient accelerators, especially for Convolutional Neural Networks (CNNs). In this work we present ConvAix, a fully C-programmable processor which, contrary to many existing architectures, does not rely on a hard-wired array of multiply-and-accumulate (MAC) units. Instead, it maps computations onto independent vector lanes, making use of a carefully designed vector instruction set. The presented processor is targeted towards latency-sensitive applications and is capable of executing up to 192 MAC operations per cycle. ConvAix operates at a target clock frequency of 400 MHz in 28 nm CMOS, thereby offering state-of-the-art performance with proper flexibility within its target domain. Simulation results for several 2D convolutional layers from well-known CNNs (AlexNet, VGG-16) show an average ALU utilization of 72.5% with 16-bit fixed-point arithmetic. Compared to other well-known but less flexible designs, ConvAix offers competitive energy efficiency of up to 497 GOP/s/W while surpassing them in terms of area efficiency and processing speed.


Related research

- 08/21/2021 — Reconfigurable co-processor architecture with limited numerical precision to accelerate deep convolutional neural networks
  Convolutional Neural Networks (CNNs) are widely used in deep learning ap...
- 06/16/2023 — Sparq: A Custom RISC-V Vector Processor for Efficient Sub-Byte Quantized Inference
  Convolutional Neural Networks (CNNs) are used in a wide range of applica...
- 05/02/2022 — PSCNN: A 885.86 TOPS/W Programmable SRAM-based Computing-In-Memory Processor for Keyword Spotting
  Computing-in-memory (CIM) has attracted significant attention in recent...
- 05/18/2019 — Low-power Programmable Processor for Fast Fourier Transform Based on Transport Triggered Architecture
  This paper describes a low-power processor tailored for fast Fourier tra...
- 04/16/2019 — Processing-In-Memory Acceleration of Convolutional Neural Networks for Energy-Efficiency, and Power-Intermittency Resilience
  Herein, a bit-wise Convolutional Neural Network (CNN) in-memory accelera...
- 10/13/2019 — eCNN: A Block-Based and Highly-Parallel CNN Accelerator for Edge Inference
  Convolutional neural networks (CNNs) have recently demonstrated superior...
