GPU Accelerated Similarity Self-Join for Multi-Dimensional Data

09/26/2018
by Michael Gowanlock, et al.

The self-join finds all objects in a dataset that are within a search distance, epsilon, of each other, which makes it a building block of many algorithms. We advance a GPU-accelerated self-join algorithm targeted towards high-dimensional data. The massive parallelism and high aggregate memory bandwidth afforded by the GPU make the architecture well-suited for data-intensive workloads. We leverage a grid-based, GPU-tailored index to perform range queries. We propose the following optimizations: (i) a trade-off between candidate set filtering and index search overhead by exploiting properties of the index; (ii) reordering the data based on variance in each dimension to improve the filtering power of the index; and (iii) a pruning method for reducing the number of expensive distance calculations. Across most scenarios on real-world and synthetic datasets, our algorithm outperforms the parallel state-of-the-art approach. Exascale systems are converging on heterogeneous distributed-memory architectures. We show that an entity partitioning method can be utilized to achieve a balanced workload, and thus good scalability for multi-GPU or distributed-memory self-joins.
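
As a rough illustration of the ideas in the abstract, the following is a minimal CPU-side sketch in Python, not the authors' GPU implementation. It bins points into a grid with cell width epsilon, searches only adjacent cells, and confirms candidates with a Euclidean distance check. The function names `reorder_by_variance` and `self_join`, the `indexed_dims` parameter, and the early exit from the distance loop are all assumptions made for illustration: indexing only the first few dimensions is one possible reading of the filtering versus search-overhead trade-off, and the early exit is a simple stand-in for the pruning method, which the abstract does not specify.

```python
from collections import defaultdict
from itertools import product
import math


def reorder_by_variance(points):
    """Return the points with dimensions sorted by decreasing variance.

    Hypothetical helper: putting high-variance dimensions first means the
    indexed dimensions spread points across more grid cells.
    """
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    variances = [sum((p[d] - means[d]) ** 2 for p in points) / n
                 for d in range(dims)]
    order = sorted(range(dims), key=lambda d: -variances[d])
    return [tuple(p[d] for d in order) for p in points]


def self_join(points, eps, indexed_dims=2):
    """Return sorted index pairs (i, j), i < j, with distance <= eps.

    Only the first `indexed_dims` dimensions are indexed (an assumption);
    fewer indexed dimensions means fewer cells to visit but more candidates.
    """
    k = min(indexed_dims, len(points[0]))

    # Build the grid: cell width eps in each indexed dimension.
    grid = defaultdict(list)
    for i, p in enumerate(points):
        cell = tuple(math.floor(p[d] / eps) for d in range(k))
        grid[cell].append(i)

    eps_sq = eps * eps
    result = []
    for cell, members in grid.items():
        # Points within eps of each other lie in the same or adjacent cells.
        for offset in product((-1, 0, 1), repeat=k):
            neighbor = tuple(c + o for c, o in zip(cell, offset))
            candidates = grid.get(neighbor, ())  # .get avoids adding cells
            for i in members:
                for j in candidates:
                    if i >= j:
                        continue  # report each unordered pair once
                    dist_sq = 0.0
                    for a, b in zip(points[i], points[j]):
                        dist_sq += (a - b) ** 2
                        if dist_sq > eps_sq:
                            break  # stop early once the bound is exceeded
                    else:
                        result.append((i, j))
    return sorted(result)
```

Because permuting dimensions does not change Euclidean distances, calling `self_join(reorder_by_variance(points), eps)` returns the same pairs as `self_join(points, eps)`; only the amount of candidate filtering done by the grid changes.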

research
03/12/2018

GPU Accelerated Self-join for the Distance Similarity Metric

The self-join finds all objects in a dataset within a threshold of each ...
research
10/10/2018

Technical Report: KNN Joins Using a Hybrid Approach: Exploiting CPU/GPU Workload Characteristics

This paper studies finding the K nearest neighbors (KNN) of all points i...
research
01/02/2019

Vector and Line Quantization for Billion-scale Similarity Search on GPUs

Billion-scale high-dimensional approximate nearest neighbour (ANN) searc...
research
09/22/2022

Computing Double Precision Euclidean Distances using GPU Tensor Cores

Tensor cores (TCs) are a type of Application-Specific Integrated Circuit...
research
06/08/2019

GSI: GPU-friendly Subgraph Isomorphism

Subgraph isomorphism is a well-known NP-hard problem that is widely used...
research
08/05/2022

Towards Fast Theta-join: A Prefiltering and Amalgamated Partitioning Approach

As one of the most useful online processing techniques, the theta-join o...
research
09/18/2023

Speculative Progressive Raycasting for Memory Constrained Isosurface Visualization of Massive Volumes

New web technologies have enabled the deployment of powerful GPU-based c...