
ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM

03/03/2022
by Yusuke Matsui, et al.

We accelerate 4-bit product quantization (PQ) on the ARM architecture. The high performance of conventional 4-bit PQ relies heavily on x64-specific SIMD registers, such as those provided by AVX2; hence, comparable performance has not yet been achieved on ARM. To fill this gap, we first bundle two 128-bit registers into one 256-bit component. We then apply a shuffle operation to each register using ARM-specific NEON instructions. With this simple but critical modification, we achieve a dramatic speedup for 4-bit PQ on the ARM architecture. Experiments show that the proposed method consistently achieves a 10x improvement over naive PQ with the same accuracy.
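The bundling idea can be illustrated with a small pure-Python emulation. This is only a sketch under stated assumptions, not the authors' implementation: `shuffle128` stands in for a 16-lane byte table lookup of the kind a 128-bit SIMD register supports, `shuffle256` treats a pair of such registers as one 256-bit component, and all function names, the two-subspace layout, and the nibble packing are hypothetical choices made for illustration.

```python
import random

def shuffle128(table, idx):
    """Emulate a 16-lane byte shuffle: lane j of the result is table[idx[j]].

    Assumption: 4-bit codes, so every index fits in 0..15 (masked to be safe).
    """
    assert len(table) == 16 and len(idx) == 16
    return [table[i & 0x0F] for i in idx]

def shuffle256(table_pair, idx_pair):
    """Treat two 128-bit halves as one 256-bit component and shuffle each half."""
    return tuple(shuffle128(t, i) for t, i in zip(table_pair, idx_pair))

def distances_bundled(table_pair, packed_codes):
    """Approximate distances for 16 database vectors with 2 sub-spaces.

    Each packed byte holds two 4-bit codes (hypothetical layout):
    low nibble for sub-space 0, high nibble for sub-space 1.
    """
    lo = [c & 0x0F for c in packed_codes]  # codes for sub-space 0
    hi = [c >> 4 for c in packed_codes]    # codes for sub-space 1
    d0, d1 = shuffle256(table_pair, (lo, hi))
    return [a + b for a, b in zip(d0, d1)]

def distances_naive(table_pair, packed_codes):
    """Reference: one scalar table lookup per code, no bundling."""
    return [table_pair[0][c & 0x0F] + table_pair[1][c >> 4] for c in packed_codes]

if __name__ == "__main__":
    rng = random.Random(0)
    # Two 16-entry tables of 8-bit quantized distances, one per sub-space.
    tables = ([rng.randrange(256) for _ in range(16)],
              [rng.randrange(256) for _ in range(16)])
    codes = [rng.randrange(256) for _ in range(16)]
    assert distances_bundled(tables, codes) == distances_naive(tables, codes)
```

The bundled and naive paths return the same values; on real hardware the gain would come from performing the 16 lookups of each half in a single instruction rather than one at a time.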


08/14/2019

Differentiable Soft Quantization: Bridging Full-Precision and Low-Bit Neural Networks

Hardware-friendly network quantization (e.g., binary/uniform quantizatio...
04/24/2017

Accelerated Nearest Neighbor Search with Quick ADC

Efficient Nearest Neighbor (NN) search in high-dimensional spaces is a f...
07/18/2022

Accelerating Deep Learning Model Inference on Arm CPUs with Ultra-Low Bit Quantization and Runtime

Deep Learning has been one of the most disruptive technological advancem...
05/16/2019

Derived Codebooks for High-Accuracy Nearest Neighbor Search

High-dimensional Nearest Neighbor (NN) search is central in multimedia s...
02/19/2020

Fast Implementation of Morphological Filtering Using ARM NEON Extension

In this paper we consider speedup potential of morphological image filte...
10/10/2018

Adding 32-bit Mode to the ACL2 Model of the x86 ISA

The ACL2 model of the x86 Instruction Set Architecture was built for the...