Origami: A 803 GOp/s/W Convolutional Network Accelerator

12/14/2015
by   Lukas Cavigelli, et al.
0

An ever increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow and superresolution. Hardware acceleration of these algorithms is essential to adopt these improvements in embedded and mobile computer vision systems. We present a new architecture, design and implementation as well as the first reported silicon measurements of such an accelerator, outperforming previous work in terms of power-, area- and I/O-efficiency. The manufactured device provides up to 196 GOp/s on 3.09 mm^2 of silicon in UMC 65nm technology and can achieve a power efficiency of 803 GOp/s/W. The massively reduced bandwidth requirements make it the first architecture scalable to TOp/s performance.

READ FULL TEXT

page 3

page 5

page 6

page 9

page 10

page 11

page 13

page 14

research
05/06/2018

SqueezeJet: High-level Synthesis Accelerator Design for Deep Convolutional Neural Networks

Deep convolutional neural networks have dominated the pattern recognitio...
research
03/19/2017

Algorithms for Semantic Segmentation of Multispectral Remote Sensing Imagery using Deep Learning

Deep convolutional neural networks (DCNNs) have been used to achieve sta...
research
07/07/2023

ITA: An Energy-Efficient Attention and Softmax Accelerator for Quantized Transformers

Transformer networks have emerged as the state-of-the-art approach for n...
research
11/08/2017

Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS

Many modern video processing pipelines rely on edge-aware (EA) filtering...
research
12/01/2022

TCN-CUTIE: A 1036 TOp/s/W, 2.72 uJ/Inference, 12.2 mW All-Digital Ternary Accelerator in 22 nm FDX Technology

Tiny Machine Learning (TinyML) applications impose uJ/Inference constrai...
research
02/08/2019

Software-Defined FPGA Accelerator Design for Mobile Deep Learning Applications

Recently, the field of deep learning has received great attention by the...
research
05/07/2020

Optimizing Temporal Convolutional Network inference on FPGA-based accelerators

Convolutional Neural Networks are extensively used in a wide range of ap...

Please sign up or login with your details

Forgot password? Click here to reset