Fast MPEG-CDVS Encoder with GPU-CPU Hybrid Computing

by Lingyu Duan, et al.
City University of Hong Kong
Nanyang Technological University
Peking University

The Compact Descriptors for Visual Search (CDVS) standard from the ISO/IEC Moving Picture Experts Group (MPEG) has succeeded in enabling interoperability for efficient and effective image retrieval by standardizing the bitstream syntax of compact feature descriptors. However, the intensive computation of the CDVS encoder hinders its wide deployment in industry for large-scale visual search. In this paper, we revisit the merits of the low-complexity design of CDVS core techniques and present a very fast CDVS encoder that leverages the massively parallel execution resources of the GPU. We shift the computation-intensive and parallel-friendly modules to state-of-the-art GPU platforms, where thread block allocation and memory access are jointly optimized to eliminate performance loss. In addition, operations with heavy data dependence are allocated to the CPU, sparing the GPU extra and unnecessary computation. Furthermore, we demonstrate that the proposed fast CDVS encoder works well with convolutional neural network approaches, which likewise leverage the advantages of GPU platforms, yielding significant performance improvements. Comprehensive experiments on benchmarks show that the fast CDVS encoder using GPU-CPU hybrid computing is promising for scalable visual search.
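The partitioning idea in the abstract can be illustrated with a small, hedged sketch. It is not the authors' encoder: the abstract does not say which CDVS modules run on which device, so both stages below are hypothetical placeholders, and a thread pool merely stands in for the GPU. The function names (`local_contrast`, `parallel_stage`, `sequential_stage`) are invented for illustration. The point is only the split itself: per-pixel work with no dependence between outputs goes to the parallel side, and a global, order-dependent selection stays on the sequential side.

```python
from concurrent.futures import ThreadPoolExecutor


def local_contrast(image, x, y):
    """Hypothetical per-pixel response: |pixel - mean of its 3x3 neighborhood|.

    Each output depends only on the input image, never on other outputs,
    which is what makes this kind of work parallel-friendly.
    """
    h, w = len(image), len(image[0])
    total, count = 0, 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:  # skip out-of-bounds neighbors
                total += image[ny][nx]
                count += 1
    return abs(image[y][x] - total / count)


def parallel_stage(image, workers=4):
    """Stand-in for the GPU side: rows are processed independently.

    Assumption: a thread pool is used here only to mimic the mapping of
    independent work items onto parallel execution resources.
    """
    w = len(image[0])

    def process_row(y):
        return [local_contrast(image, x, y) for x in range(w)]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(process_row, range(len(image))))


def sequential_stage(responses, k):
    """Stand-in for the CPU side: keep the k strongest responses.

    The result depends on a global ordering of all candidates, so it is
    kept on the sequential side in this sketch.
    """
    candidates = [
        (value, y, x)
        for y, row in enumerate(responses)
        for x, value in enumerate(row)
    ]
    # Strongest response first; ties broken by position for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1], c[2]))
    return [(y, x) for _, y, x in candidates[:k]]


if __name__ == "__main__":
    # 5x5 toy image with a single bright pixel in the center.
    image = [[0] * 5 for _ in range(5)]
    image[2][2] = 9
    print(sequential_stage(parallel_stage(image), k=1))
```

In a real GPU implementation the parallel stage would be a kernel, and the thread block allocation and memory access pattern the abstract mentions would have to be tuned together; this sketch does not model either.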

