Hydra: An Accelerator for Real-Time Edge-Aware Permeability Filtering in 65nm CMOS

11/08/2017
by Manuel Eggimann, et al.

Many modern video processing pipelines rely on edge-aware (EA) filtering methods. However, recent high-quality methods are difficult to run in real time on embedded hardware because of their computational load. To address this, we propose an area-efficient, real-time-capable hardware implementation of a high-quality EA method. In particular, we focus on the recently proposed permeability filter (PF), which delivers promising quality and performance in HDR tone mapping, disparity estimation, and optical flow estimation. We present an efficient hardware accelerator that implements a tiled variant of the PF with low on-chip memory requirements and significantly reduced external memory bandwidth (6.4x lower than the non-tiled PF). The design has been taped out in 65 nm CMOS technology, can filter 720p grayscale video at 24.8 Hz, and achieves a high compute density of 6.7 GFLOPS/mm² (12x higher than embedded GPUs when scaled to the same technology node). The low area and bandwidth requirements make the accelerator well suited for integration into SoCs, where the silicon area budget is constrained and external memory is typically a heavily contended resource.
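To make the idea of a permeability-style edge-aware filter more concrete, the following is a minimal 1D sketch in Python. It is not the accelerator's implementation and is not taken from the text above: the permeability function, the parameters `sigma` and `alpha`, and the function name `permeability_filter_1d` are all assumptions chosen for illustration, and the tiling and fixed hardware details of the actual design are not modeled. The sketch only shows the general principle: neighboring samples are averaged through weights that collapse towards zero across strong intensity edges.

```python
def permeability_filter_1d(signal, sigma=0.1, alpha=2.0):
    """One pass of a simplified, hypothetical 1D permeability-style filter.

    sigma and alpha are illustrative parameters (assumptions), controlling
    how quickly the permeability drops as the intensity difference grows.
    """
    n = len(signal)
    # perm[i] is the permeability between samples i and i + 1:
    # close to 1 in flat regions, close to 0 across strong edges.
    perm = [
        1.0 / (1.0 + abs((signal[i] - signal[i + 1]) / sigma) ** alpha)
        for i in range(n - 1)
    ]

    # Left-to-right recursion: accumulated values and accumulated weights.
    left, left_w = [0.0] * n, [0.0] * n
    for i in range(1, n):
        left[i] = perm[i - 1] * (left[i - 1] + signal[i - 1])
        left_w[i] = perm[i - 1] * (left_w[i - 1] + 1.0)

    # Right-to-left recursion, mirrored.
    right, right_w = [0.0] * n, [0.0] * n
    for i in range(n - 2, -1, -1):
        right[i] = perm[i] * (right[i + 1] + signal[i + 1])
        right_w[i] = perm[i] * (right_w[i + 1] + 1.0)

    # Normalized weighted average of each sample with its neighborhood.
    return [
        (left[i] + signal[i] + right[i]) / (left_w[i] + 1.0 + right_w[i])
        for i in range(n)
    ]


if __name__ == "__main__":
    step = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print(permeability_filter_1d(step, sigma=0.01))
```

With a small `sigma`, the step edge in the example stays sharp while the flat regions on either side are averaged. A 2D filter in this style would typically apply such passes along rows and then columns; how the real design schedules and tiles those passes is beyond what this sketch covers.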


