One machine, one minute, three billion tetrahedra
This paper presents a new scalable parallelization scheme for generating the 3D Delaunay triangulation of a given set of points. Our first contribution is an efficient serial implementation of the incremental Delaunay insertion algorithm. A simple dedicated data structure and a number of improvements to the insertion algorithm make it three times faster than reference implementations. Our second contribution is a multi-threaded version of the Delaunay kernel that inserts vertices concurrently. Moore curve coordinates are used to partition the point set, which avoids heavy synchronization overheads. Conflicts are managed by modifying the partition with a simple rescaling of the space-filling curve. The performance of our implementation has been measured on three different processors (Intel Core i7, Intel Xeon Phi and AMD EPYC), on which we were able to compute 3 billion tetrahedra in 53 seconds. This corresponds to a generation rate of over 55 million tetrahedra per second which is, to the best of our knowledge, three times the rate reached by the current fastest implementation. Finally, we show how this very efficient parallel Delaunay triangulation can be integrated into a Delaunay refinement mesh generator that takes the boundary of the domain to be meshed as input.
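The partitioning idea can be illustrated with a minimal sketch. It is not the paper's implementation: it swaps in a Morton (Z-order) curve for the Moore curve because the index is simpler to compute, it assumes coordinates already normalized to [0, 1), and the names `spread_bits`, `morton_key` and `partition_points` are hypothetical. It only shows how sorting points along a space-filling curve and cutting the sorted list into contiguous chunks gives each thread a spatially compact subset; conflict handling and the rescaling of the curve are not shown.

```python
import random


def spread_bits(v: int, bits: int = 10) -> int:
    """Insert two zero bits between each of the low `bits` bits of v."""
    out = 0
    for i in range(bits):
        out |= ((v >> i) & 1) << (3 * i)
    return out


def morton_key(p, bits: int = 10) -> int:
    """Z-order index of a 3D point whose coordinates lie in [0, 1).

    Assumption: Morton order stands in for the Moore curve of the abstract.
    """
    scale = 1 << bits
    # Quantize each coordinate to an integer grid cell, clamped to the grid.
    x, y, z = (min(int(c * scale), scale - 1) for c in p)
    return (spread_bits(x, bits)
            | (spread_bits(y, bits) << 1)
            | (spread_bits(z, bits) << 2))


def partition_points(points, n_parts: int, bits: int = 10):
    """Sort points along the curve and cut them into n_parts contiguous chunks."""
    ordered = sorted(points, key=lambda p: morton_key(p, bits))
    n = len(ordered)
    # Chunk boundaries chosen so that chunk sizes differ by at most one point.
    bounds = [n * k // n_parts for k in range(n_parts + 1)]
    return [ordered[bounds[k]:bounds[k + 1]] for k in range(n_parts)]


rng = random.Random(0)
points = [(rng.random(), rng.random(), rng.random()) for _ in range(1000)]
parts = partition_points(points, n_parts=8)
```

Each entry of `parts` could then be handed to one thread. Points that are close on the curve tend to be close in space, so most insertions would touch only tetrahedra inside the thread's own region.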