Block-Parallel IDA* for GPUs (Extended Manuscript)

05/08/2017
by Satoru Horie, et al.

We investigate GPU-based parallelization of Iterative-Deepening A* (IDA*). We show that straightforward thread-based parallelization techniques, previously proposed for massively parallel SIMD processors, perform poorly on GPUs due to warp divergence and load imbalance. We propose Block-Parallel IDA* (BPIDA*), which assigns the search of a subtree to a block (a group of threads with access to fast shared memory) rather than to a single thread. On the 15-puzzle, BPIDA* on an NVIDIA GRID K520 with 1536 CUDA cores achieves a speedup of 4.98 over a highly optimized sequential IDA* implementation on a Xeon E5-2670 core.
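The core mechanism is easiest to see in code. Below is a minimal CUDA sketch of the block-per-subtree idea, not the paper's implementation: the 15-puzzle is replaced by a toy implicit binary tree, the heuristic is a zero placeholder, and all identifiers (Node, bpida_kernel, STACK_CAP) are illustrative. What it preserves is the structure described in the abstract: each block owns one subtree and keeps its depth-first stack in fast shared memory, with the threads of the block popping and expanding nodes cooperatively.

#include <cstdio>
#include <cuda_runtime.h>

// Toy search node: the 15-puzzle state is replaced by an index into an
// implicit binary tree (children of state s are 2s+1 and 2s+2).
struct Node {
    unsigned long long state;
    int g;  // cost from this block's subtree root
};

// Placeholder admissible heuristic; a real solver would use Manhattan
// distance or pattern databases here.
__device__ __forceinline__ int heuristic(unsigned long long) { return 0; }

#define STACK_CAP 2048  // per-block shared-memory stack capacity

// One block searches one subtree. Threads cooperate through a DFS stack
// kept in fast shared memory, the core idea behind BPIDA*.
__global__ void bpida_kernel(const unsigned long long *roots, int bound,
                             unsigned long long goal, int *found) {
    __shared__ Node stack[STACK_CAP];
    __shared__ int top;   // shared stack pointer
    __shared__ int done;  // uniform loop-exit flag

    if (threadIdx.x == 0) {
        stack[0].state = roots[blockIdx.x];
        stack[0].g = 0;
        top = 1;
    }
    __syncthreads();

    for (;;) {
        // Thread 0 decides termination so all threads break together.
        if (threadIdx.x == 0)
            done = (top == 0) || (atomicAdd(found, 0) != 0);
        __syncthreads();
        if (done) break;

        // Pop up to blockDim.x nodes; thread i takes the i-th one.
        int n = top;
        __syncthreads();  // everyone has read the old stack pointer
        int batch = min(n, (int)blockDim.x);
        bool active = threadIdx.x < (unsigned)batch;
        Node cur;
        if (active) cur = stack[n - 1 - threadIdx.x];
        if (threadIdx.x == 0) top = n - batch;
        __syncthreads();  // all pops copied out before slots are reused

        if (active) {
            if (cur.state == goal) atomicExch(found, 1);
            // Push children whose f = g + h stays within the current bound.
            for (int c = 1; c <= 2; ++c) {
                Node child;
                child.state = 2 * cur.state + c;
                child.g = cur.g + 1;
                if (child.g + heuristic(child.state) <= bound) {
                    int slot = atomicAdd(&top, 1);
                    if (slot < STACK_CAP) stack[slot] = child;
                    else atomicSub(&top, 1);  // overflow guard (sketch only)
                }
            }
        }
        __syncthreads();  // pushes finished before the next termination check
    }
}

int main() {
    // One subtree per block: hand each block a grandchild of the tree root.
    const int kBlocks = 4;
    unsigned long long h_roots[kBlocks] = {3, 4, 5, 6};
    unsigned long long *d_roots;
    int *d_found, h_found = 0;
    cudaMalloc(&d_roots, sizeof(h_roots));
    cudaMalloc(&d_found, sizeof(int));
    cudaMemcpy(d_roots, h_roots, sizeof(h_roots), cudaMemcpyHostToDevice);

    // IDA* outer loop: repeat the bounded search with an increasing bound.
    for (int bound = 0; bound < 20 && !h_found; ++bound) {
        cudaMemset(d_found, 0, sizeof(int));
        bpida_kernel<<<kBlocks, 32>>>(d_roots, bound, 42ULL, d_found);
        cudaMemcpy(&h_found, d_found, sizeof(int), cudaMemcpyDeviceToHost);
        if (h_found) printf("goal reachable within bound %d\n", bound);
    }
    cudaFree(d_roots);
    cudaFree(d_found);
    return 0;
}

Because all threads in a block walk the same subtree and execute the same expansion code at each step, warps stay largely convergent, and the shared-memory stack keeps node accesses off slow global memory; this is the advantage the abstract attributes to the block-parallel design over thread-based parallelization.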


