Fast Integral Histogram Computations on GPU for Real-Time Video Analytics

11/06/2017
by Mahdieh Poostchi, et al.

In many multimedia content analytics frameworks, feature likelihood maps represented as histograms play a critical role in the overall algorithm. Integral histograms provide an efficient computational framework for extracting multi-scale, histogram-based regional descriptors in constant time, and these descriptors are the principal building blocks of many video content analytics frameworks. We evaluate four different mappings of the integral histogram computation onto Graphics Processing Units (GPUs) using different kernel optimization strategies. Our kernels perform cumulative sums on row and column histograms in a cross-weave or wavefront scan order and use different data organization and scheduling methods, which are shown to critically affect the utilization of GPU resources (cores and shared memory). Tiling the 3-D array into smaller regular data blocks significantly improves the efficiency of the computation compared to a strip-based organization. The tiled integral histogram using a diagonal wavefront scan has the best performance, at about 300.4 frames/sec for 640 x 480 images and 32 bins, a speedup factor of about 120 on a GTX Titan X graphics card compared to a single-threaded sequential CPU implementation. Double-buffering is exploited to overlap computation and communication across a sequence of images. Mapping the integral histogram bin computations onto multiple GPUs enables us to process 32 GB of integral histogram data (for a 64 MB image and 128 bins) at a frame rate of 0.73 Hz, with a speedup factor of 153X over a single-threaded CPU implementation and 45X over a 16-threaded CPU implementation.
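To make the idea concrete, the following is a minimal sequential sketch in plain Python of a cross-weave integral histogram and the constant-time region query it enables. It is not the authors' GPU kernel: the function names `integral_histogram` and `region_histogram`, the uniform binning rule, and the `max_value` parameter are assumptions introduced here for illustration only.

```python
def integral_histogram(image, bins, max_value=256):
    """Build H so that H[b][y][x] counts the pixels of bin b in image[0..y][0..x].

    Assumes `image` is a non-empty list of equal-length rows of integers
    in [0, max_value) and that bins are uniform (an illustrative choice).
    """
    height, width = len(image), len(image[0])

    # One binary plane per bin: 1 where the pixel falls into that bin.
    H = [[[0] * width for _ in range(height)] for _ in range(bins)]
    for y in range(height):
        for x in range(width):
            H[image[y][x] * bins // max_value][y][x] = 1

    # Cross-weave scan: a horizontal cumulative sum over every row,
    # followed by a vertical cumulative sum over every column.
    for plane in H:
        for row in plane:
            for x in range(1, width):
                row[x] += row[x - 1]
        for y in range(1, height):
            for x in range(width):
                plane[y][x] += plane[y - 1][x]
    return H


def region_histogram(H, top, left, bottom, right):
    """Histogram of the inclusive rectangle, using four lookups per bin."""
    def at(plane, y, x):
        # Positions above or left of the image contribute zero.
        return plane[y][x] if y >= 0 and x >= 0 else 0

    return [
        at(p, bottom, right) - at(p, top - 1, right)
        - at(p, bottom, left - 1) + at(p, top - 1, left - 1)
        for p in H
    ]


if __name__ == "__main__":
    image = [
        [0, 64, 128, 255],
        [10, 70, 130, 200],
        [0, 0, 255, 255],
    ]
    H = integral_histogram(image, bins=4)
    print(region_histogram(H, 0, 0, 2, 3))  # histogram of the whole image
    print(region_histogram(H, 1, 1, 2, 2))  # histogram of a 2 x 2 sub-window
```

Each bin plane is independent of the others, and within one plane the row scans (and then the column scans) are independent of each other. That independence is what a GPU mapping can exploit; the cost of a region query does not depend on the size of the rectangle.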


research
06/26/2017

GPU-acceleration for Large-scale Tree Boosting

In this paper, we present a novel massively parallel algorithm for accel...
research
05/13/2021

Efficient executions of Pipelined Conjugate Gradient Method on Heterogeneous Architectures

The Preconditioned Conjugate Gradient (PCG) method is widely used for so...
research
03/25/2021

ButterFly BFS – An Efficient Communication Pattern for Multi Node Traversals

Breadth-First Search (BFS) is a building block used in a wide array of g...
research
05/09/2017

Multi-Scale Spatially Weighted Local Histograms in O(1)

Weighting pixel contribution considering its location is a key feature i...
research
11/20/2012

Tera-scale Astronomical Data Analysis and Visualization

We present a high-performance, graphics processing unit (GPU)-based fram...
research
02/02/2021

Mobile-end Tone Mapping based on Integral Image and Integral Histogram

Wide dynamic range (WDR) image tone mapping is in high demand in many ap...
research
05/10/2023

Fast Event-based Double Integral for Real-time Robotics

Motion deblurring is a critical ill-posed problem that is important in m...
