DeepAI AI Chat
Log In Sign Up

ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM

by   Yusuke Matsui, et al.

We accelerate the 4-bit product quantization (PQ) on the ARM architecture. Notably, the drastic performance of the conventional 4-bit PQ strongly relies on x64-specific SIMD register, such as AVX2; hence, we cannot yet achieve such good performance on ARM. To fill this gap, we first bundle two 128-bit registers as one 256-bit component. We then apply shuffle operations for each using the ARM-specific NEON instruction. By making this simple but critical modification, we achieve a dramatic speedup for the 4-bit PQ on an ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over the naive PQ with the same accuracy.


page 1

page 2

page 3

page 4


Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantizatio...

Accelerated Nearest Neighbor Search with Quick ADC

Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a f...

Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime

Deep Learning has been one of the most disruptive technological advancem...

Derived Codebooks for High-Accuracy Nearest Neighbor Search

High-dimensional Nearest Neighbor (NN) search is central in multimedia s...

Fast Implementation of Morphological Filtering Using ARM NEON Extension

In this paper we consider speedup potential of morphological image filte...

Adding 32-bit Mode to the ACL2 Model of the x86 ISA

The ACL2 model of the x86 Instruction Set Architecture was built for the...