1 Introduction
The alpha complex of a set of points in is a subset of the Delaunay triangulation. A size parameter determines the set of simplices (tetrahedra, triangles, edges, and vertices) of the Delaunay triangulation that are included in the alpha complex. It is an elegant representation of the shape of the set of points Edelsbrunner et al. (1983); Edelsbrunner and Mücke (1994); Edelsbrunner (2011) and has found various applications, particularly in molecular modeling and molecular graphics. The atoms in a biomolecule are represented by weighted points in and the region occupied by the molecule is represented by the union of balls centered at these points. The geometric shape of a biomolecule determines its function, namely how it interacts with other biomolecules. The alpha complex represents the geometric shape of the molecule very efficiently. It has been widely used for computing and studying geometric features such as cavities and channels Liang et al. (1998b, c); Dundas et al. (2006); Masood et al. (2015); Sridharamurthy et al. (2016); Masood and Natarajan (2016); Krone et al. (2016). Further, an alpha complex based representation is also crucial for accurate computation of geometric properties like volume and surface area Liang et al. (1998a); Edelsbrunner and Koehl (2005); Mach and Koehl (2011).
Advances in imaging technology has resulted in a significant increase in the size of molecular structure data. This necessitates the developments of efficient methods for storing, processing, and querying these structures. In this paper, we study the problem of efficient construction of the alpha complex with particular focus on point distributions that are typical of biomolecules. In particular, we present a parallel algorithm for computing the alpha complex and an efficient GPU implementation that outperforms existing methods. In contrast to existing algorithms, our algorithm does not require the explicit construction of the Delaunay triangulation.
1.1 Related work
The Delaunay triangulation has been studied within the field of computational geometry for several decades and numerous algorithms have been proposed for its construction Aurenhammer et al. (2013). Below, we describe only a few methods that are most relevant to this paper. A tetrahedron belongs to the Delaunay triangulation of a set of points in if and only if it satisfies the empty circumsphere property, namely no point is contained within the circumsphere of the tetrahedron. The BowyerWatson algorithm Bowyer (1981); Watson (1981) and the incremental insertion algorithm by Guibas and Stolfi Guibas et al. (1992) are based on the above characterization of the Delaunay triangulation. In both methods, points are inserted incrementally and the triangulation is locally updated to ensure that the Delaunay property is satisfied. The incremental insertion method followed by bistellar flipping works in higher dimensions also Edelsbrunner and Shah (1996) and can construct the Delaunay triangulation in time in the worst case, where is the number of input points in . A second approach to constructing the Delaunay triangulation is based on its equivalence to the convex hull of the points lifted onto a dimensional paraboloid Edelsbrunner and Seidel (1986).
A third divideandconquer approach partitions the inputs points into two or multiple subsets, constructs the Delaunay triangulation for each partition, and merges the pieces of the triangulation finally. The merge procedure depends on the ability to order the edges incident on a vertex and hence works only in . Extension to requires that the merge procedure be executed first Cignoni et al. (1998). The divideandconquer strategy directly extends to a parallel algorithm Nanjappa (2012); Cao et al. (2014). The DeWall algorithm Cignoni et al. (1998) partitions the input point set into two halves and first constructs the triangulation of points lying within the boundary region of the two partitions. The Delaunay triangulation of the two holes is then constructed in parallel. The process is repeated recursively resulting in increased parallelism. Cao et al. Cao et al. (2014) have developed a GPU parallel algorithm, gDel3D, that constructs the Delaunay triangulation in two stages. In the first stage, points are inserted in parallel followed by flipping to obtain an approximate Delaunay triangulation. In the second stage, a star splaying procedure works locally to convert nonDelaunay tetrahedra into Delaunay tetrahedra. The algorithm can be extended to construct the weighted Delaunay triangulation for points with weights. Cao et al. report upto speed up over a sequential implementation for constructing the weighted Delaunay triangulation of 3 million weighted points.
All existing algorithms for constructing the alpha complex Edelsbrunner and Mücke (1994); Mach and Koehl (2011); Da et al. (2017) require that the Delaunay triangulation be computed first. Simplices that belong to the alpha complex are identified using a size filtration in a second step. In the case of biomolecules, the size of the alpha complex is a small fraction of the size of the Delauanay triangulation and hence the Delaunay triangulation construction is often the bottleneck in the alpha complex computation. The key difficulty lies in the absence of a direct characterization of simplices that belong to the alpha complex.
1.2 Summary of results
We propose an algorithm that avoids the expensive Delaunay triangulation computation and instead directly computes the alpha complex for biomolecules. Key contributions of this paper are summarized below:

A new characterization of the alpha complex – a set of conditions necessary and sufficient for a simplex to be a part of the alpha complex.

A new algorithm for computing the alpha complex of a set of weighted points in . The algorithm identifies simplices of the alpha complex in decreasing order of dimension without computing the complete weighted Delaunay triangulation.

An efficient CUDA based parallel implementation of this algorithm for biomolecular data that can compute the alpha complex for a 10 million point dataset in approximately 10 seconds.

A proof of correctness of the algorithm and comprehensive experimental validation to demonstrate that it outperforms existing methods.
While the experimental results presented here focus on biomolecular data, the algorithm is applicable to data from other application domains as well. In particular, the efficient GPU implementation may be used for points that arise in smoothed particle hydrodynamics (SPH) simulations, atomistic simulations in material science, and particle systems that appear in computational fluid dynamics (CFD).
2 Background
In this section, we review the necessary background on Delaunay triangulations required to describe the algorithm and also establish a new characterization of the alpha complex that does not require the Delaunay triangulation. For a detailed description of Delaunay triangulations, alpha complexes, and related structures, we refer the reader to various books on the topic Aurenhammer et al. (2013); Edelsbrunner (2001, 2010).
Let denote a set of balls or weighted points, where represents a ball centered at with radius . We limit our discussion to balls in , so . Further, we assume that the points in are in general position, i.e., no 2 points have the same location, no 3 points are collinear, no 4 points are coplanar, and no subset of 5 points are equidistant from a point in . Such configurations are called degeneracies. In practice, a degenerate input can be handled via symbolic perturbation Edelsbrunner and Mücke (1990).
2.1 Simplex and simplicial complex
A dimensional simplex is defined as the convex hull of affinely independent points. Assuming the centres of balls in are in general position, all sized subsets of form a simplex . For simplicity, we sometimes use instead of the center to refer to points incident on a simplex. For example, we may write .
A nonempty strict subset of is also a simplex but with dimension smaller than . Such a simplex is called a face of . Specifically, a dimensional face of is referred to as a facet of . A set of simplices is called a simplicial complex if: 1) a simplex implies that all faces of also belong to , and 2) for two simplices , , either or .
2.2 Power distance and weighted Voronoi diagram
The power distance between a point and a ball is defined as
The weighted Voronoi diagram is an extension of the Voronoi diagram to weighted points. It is a partition of based on proximity to input balls in terms of the power distance. Points that are closer to the ball compared to all other balls () constitute the Voronoi region of . Points equidistant from two balls , and closer to the two balls compared to other balls constitute a Voronoi face. Similarly, points equidistant from three balls and fours balls constitute Voronoi edges and Voronoi vertices of the weighted Voronoi diagram, respectively. Figure 1 shows the weighted Voronoi diagram for a set of 2D weighted points or disks on the plane. Similar to the unweighted case, the Voronoi regions of the weighted Voronoi diagram are convex and linear. However, the weights may lead to a configuration where the Voronoi region of is disjoint from . This occurs when is contained within another ball . Further, the Voronoi region of may even be empty.
2.3 Weighted Delaunay triangulation
The weighted Delaunay triangulation is the the dual of the weighted Voronoi diagram, see Figure 1. It is a simplicial complex consisting of simplices that are dual to the cells of the weighted Voronoi diagram. The following equivalent definition characterizes a simplex belonging to a Delaunay triangulation
Definition 1 (Weighted Delaunay Triangulation)
A simplex = belongs to the weighted Delaunay triangulation of if and only if there exists a point such that
 DT1:

, and
 DT2:

for .
A point that satisfies the above two conditions, DT1 and DT2, is called a witness for . We call a point that minimizes the distance and satisfies both conditions as the closest witness, denoted by . This minimum distance is called the size of the simplex . A point that minimizes the distance and satisfies DT1 is called the of simplex . The distance is called the of the simplex . Clearly, the of a simplex is lower bounded by its . Figure 2 shows the two possible scenarios, namely when and .
2.4 Alpha complex
Given a parameter , we can construct a subset of the weighted Delaunay triangulation by filtering simplices whose is less than or equal to , see Figures 1 and 1. The resulting subset is a subcomplex of the Delaunay complex and is denoted :
The following equivalent definition characterizes simplices of the alpha complex without explicitly referring to the Delaunay triangulation.
Definition 2 (Alpha complex)
A simplex = belongs to the alpha complex of if and only if there exists a point such that the following three conditions are satisfied:
 AC1:

,
 AC2:

for , and
 AC3:

or of is at most .
3 Algorithm
We now describe an algorithm to compute the alpha complex and prove its correctness. The algorithm utilizes the characterizing conditions introduced above. It first identifies the tetrahedra that belong to the alpha complex, followed by the set of triangles, edges and vertices. Figure 3 illustrates the algorithm as applied to disks on the plane.
3.1 Outline
The alpha complex consists of simplices of dimensions 0–3, , where is the set of dimensional simplices in . We initialize and construct in five steps described below:
 Step 1:

Compute the set of all simplices such that . Let this set be denoted by .
 Step 2:

For all tetrahedra , check condition AC2 using . If satisfies AC2 then insert it into .
 Step 3:

Insert all triangles that are incident on tetrahedra in into . Let , where denotes the set of facets of tetrahedra in . For all triangles , check condition AC2 using . If satisfies AC2 then insert it into .
 Step 4:

Insert all edges incident on triangles in into . Let , where denotes the set of facets of triangles in . For all edges , check condition AC2 using . If satisfies AC2 then insert it into .
 Step 5:

Insert all end points of edges in into . Let , where denotes the set of balls incident on edges in . For all balls , check condition AC2 using . If satisfies AC2 then insert it into .
Step 1 selects simplices that satisfy AC3. Step 2 recognizes tetrahedra that belong to the alpha complex by checking AC2 using . Triangle faces of these tetrahedra also belong to . The other “dangling” triangles belong to if they satisfy AC2. Step 4 identify edges similarly. First all edge faces of triangles in are inserted followed by those “dangling” edges that satisfy AC2. Vertices are identified similarly in Step 5.
3.2 Proof of correctness
We now prove that that the algorithm described above correctly computes the alpha complex of the given set of weighted points by proving the following four claims. Each claim states that the set of simplices computed in Steps 2, 3, 4 and 5 are exactly the simplices belonging to the alpha complex.
Claim 1.
Step 2 computes correctly.
Proof.
We assume that the input is nondegenerate. So, for a tetrahedron , is the only point that satisfies condition AC1. In Step 2 of the proposed algorithm, we check if AC2 holds for . If yes, then is a witness for , i.e., . Further, since and , we have size thereby satisfying AC3. Therefore, belongs to because it satisfies all three conditions. ∎
We now prove that the algorithm correctly identifies the triangles of the alpha complex.
Lemma 1.
A triangle belongs to if and only if it satisfies AC2 with .
Proof.
We first prove the backward implication, namely if satisfies AC2 with , then . Note that satisfies AC1 by definition. Further, it satisfies AC2 by assumption and hence . We also have because . So, thereby satisfying AC3. The triangle with satisfies all three conditions and hence belongs to .
We will now prove the forward implication via contradiction. Suppose there exists a triangle that belongs to but does not satisfy AC2 with . In other words, there exists a ball for which . Let denote the set of all such balls . The set of points that are equidistant from the three balls corresponding to form a line perpendicular to the plane containing called the radical axis. Each ball partitions the radical axis into two halfintervals based on whether the point on radical axis is closer to or to , see Figure 4. Let denote the half interval consisting of points that are closer to compared to . Let denote the intersection of all such half intervals . We have assumed that , so there must exist a closest witness and it has to lie within . Thus, is nonempty. In fact, for some and is exactly the end point of . This implies that is also a closest witness for the tetrahedron . So, belongs to and its size is equal to . However, this means that , a contradiction. So, the forward implication in the lemma is true. ∎
Claim 2.
Step 3 computes correctly.
Proof.
If a simplex belongs to then naturally all of its faces also belong to . The algorithm includes such triangles into and remove them from to obtain the set of free triangles that have and are not incident on any tetrahedron in . It follows directly from Lemma 1 that AC2 is a necessary and sufficient condition for a triangle in to belong to . Hence, Step 3 correctly computes the triangles belonging to . ∎
The above arguments need to be extended to prove that the edges of the alpha complex are also correctly identified.
Lemma 2.
An edge belongs to if and only if it satisfies the condition AC2 with .
Proof.
First, we assume that satisfies AC2 with . The point satisfies AC1 by definition. Further, it satisfies AC2 by assumption and hence . We also have because . So, thereby satisfying AC3. The edge with satisfies all three conditions and hence belongs to .
We will prove the forward implication via contradiction. Suppose there exists an edge that belongs to but does not satisfy AC2 with . In other words, there exists a ball such that . Let denote the set of all such balls . The set of points that are equidistant from the two balls corresponding to form a plane perpendicular to the line containing called the radical plane. Each ball partitions the radical plane into two halfplanes based on whether the point on the radical plane is closer to or to , see Figure 5. Let denote the halfplane consisting of points that are closer to compared to . Let denote the intersection of all such halfplanes . We have assumed that , so there must exist a closest witness and it has to lie within . Thus, is nonempty. In fact, lies on the envelope of because it minimizes the distance to . Let lie on the bounding line corresponding to for some . This implies that is also a closest witness for the triangle . So, belongs to and its size is equal to . However, this means that , a contradiction. So, the forward implication in the lemma is also true. ∎
Claim 3.
Step 4 computes correctly.
Proof.
All edge faces of triangles in naturally belong to . Step 4 inserts all edges incident on triangles in into as valid edges and removes them from to obtain the set of free edges . It follows directly from Lemma 2 that AC2 is a necessary and sufficient condition for an edge to belong to . Therefore, Step 4 correctly computes the edges belonging to . ∎
Claim 4.
Step 5 computes correctly.
Proof.
All vertices incident on naturally belong to . Step 5 inserts all such vertices in as valid vertices and removes them from to obtain the set of free vertices . Next, the vertices in for which the center of the ball satisfies AC2 are also inserted into . Clearly, these vertices also satisfy AC3 because they belong to . The condition AC1 is not relevant for dimensional simplices. Therefore, these vertices clearly belong to the alpha complex. Similar to Lemmas 1 and 2, it is easy to prove that checking for AC2 for is necessary and sufficient condition to decide whether a vertex in belongs to the alpha complex or not. That is, it is possible to show that vertices in alpha complex that have nonempty Voronoi regions but do not satisfy AC2 for would be incident on some edge in , and therefore must have been already detected by Step 4 and hence can not belong to . Therefore, Step 5 correctly computes the vertices belonging to . ∎
4 Parallel Algorithm for Biomolecules
Although the algorithm as described above is provably correct, a straight forward implementation will be extremely inefficient with a worst case running time of , where is the number of weighted points in . This is because Step 1 requires time to generate all possible tetrahedra. In later steps, we need effort per simplex to check AC2. However, the input corresponds to atoms in a biomolecule. We show how certain properties of biomolecules can be leveraged to develop a fast parallel implementation.
4.1 Biomolecular data characteristics
Atoms in a biomolecule are well distributed. The following three properties of biomolecules are most relevant:

The radius of an atom is bounded and very small compared to the size of the biomolecule. The typical radius of an atom in a protein molecule ranges between 1Å to 2Å Bondi (1964).

There is a lower bound on the distance between the centres of two atoms. This is called the van der Waals contact distance, beyond which the two atoms start repelling each other. In the case of atoms in protein molecules, this distance is at least 1Å. This property together with the upper bound on atomic radii ensures that no atom is completely contained inside another. This means that the weighted Voronoi regions corresponding to the atoms in a biomolecule can be always be assumed to be nonempty.

Structural biologists are interested in small values of . The two crucial values are 0Å and 1.4Å. The former corresponds to using van der Waals radius and the latter corresponds to the radius of water molecule, which acts as the solvent.
In the light of the above three properties, we can say that the number of simplices of the alpha complex that are incident on a weighted point (atom) is independent of the total number of input atoms and is bounded by a constant Halperin and Overmars (1998).
4.2 Acceleration data structure
The algorithm will benefit from an efficient method for accessing points of that belong to a local neighborhood of a given weighted point. We store the weighted points in a gridbased data structure. Let denote the radius of the largest atom and assume that the value of the parameter is available as input. First, we construct a grid with cells of size and then bin the input atoms into the grid cells. In our implementation, we do not store the grid explicitly because it may contain several empty cells. Instead, we compute the cell index for each input atom and sort the list of atoms by cell index to ensure that atoms that belong to a particular cell are stored at consecutive locations. The cell index is determined based on a row major or column major order. Alternatively, a spacefilling curves like the Hilbert curve could also be used to order the cells.
After the atoms are stored in grid cells, the alpha complex is computed in two stages. In the first stage, we employ a bottom up approach to obtain a conservative estimate of the edges, triangles, and tetrahedra belonging to the alpha complex. The false positives from the first stage are removed in a subsequent pruning stage resulting in the correct alpha complex.
4.3 Potential simplices
The first stage essentially corresponds to Step 1 of the algorithm described in the previous section. We compute the set of potential simplices for which . However, for efficiency reasons we process the simplices in the order of increasing dimension. First, we identify edges that satisfy the AC3 condition. Given the size of the grid cell, end points of edges that satisfy the condition either lie within the same grid cell or in adjacent cells. So, the grid data structure substantially reduces the time required to compute the list of potential edges . Beginning from this set of edges, we construct the set of all possible triangles and retain the triangles whose is no greater than , resulting in the set . Finally, we use the triangles in to construct the list of tetrahedra that satisfy the condition. The above procedure works because the of a simplex is always greater than or equal to the of its faces. The set of simplices identified in this stage contains all simplices of the alpha complex. False positives are pruned in the the second stage described below.
4.4 Pruning
The second stage corresponds to Steps 25 of the algorithm and processes the simplices in the decreasing order of dimension. This stage checks the characterizing condition AC2 to prune into . The tetrahedra are processed by checking if any of the input balls are closer to the than the balls incident on the tetrahedron. If yes, the tetrahedron is pruned away. Else, the tetrahedron is recognized as belonging to the alpha complex and inserted into . Triangles incident on these tetrahedra also belong to the alpha complex and are inserted into after they are removed from the list of potential triangles . Next, the triangles in are processed by checking if they satisfy AC2. If yes, they are inserted into . Else, they are pruned away. All edges incident on triangles belonging to are inserted into and removed from the set . Next, the edges in are processed by checking if they satisfy AC2. Edges that satisfy AC2 are inserted into and the others are pruned away. All the vertices in are directly inserted into without AC2 check because for biomolecular data we assume that Voronoi regions of all the atoms are nonempty. The check for condition AC2 for each simplex is again made efficient by the use of the grid data structure. Atoms that may violate AC2 lie within the same cell as that containing the or within the adjacent cell. Atoms that lie within other cells may be safely ignored.
4.5 CUDA implementation
We use the CUDA framework 6 and the thrust library 29 within CUDA to develop a parallel implementation of the algorithm that executes on the many cores of the GPU. The grid computation is implemented as a CUDA kernel where each atom is processed in parallel. The potential simplices computation and pruning stages are broken down into multiple CUDA kernels and parallelized differently in order to increase efficiency. We now describe the parallelization strategy in brief. For computing the set of potential edges , the initial enumeration of all possible edges is parallelized per atom where atom indices are used to ensure that no duplicate edges are generated. Subsequently, the condition is checked for the edges in parallel. Similarly, for computing potential triangles , the initial enumeration of all possible triangles is parallelized per atom, while the condition is checked within a separate kernel and parallelized per triangle. The same strategy is used for computing the set of potential tetrahedra . The pruning stage is parallelized per tetrahedron, triangle, and edge as required.
4.6 Handling large data sizes
Typical protein structures consist of upto 100,000 atoms. Our implementation can handle datasets of this size easily for reasonable values of . However, the size of datasets is ever increasing. Protein complexes that are available nowadays may consist of millions of atoms, necessitating smart management of GPU memory while handling such data sets.
We propose two strategies and implement one of them. The first strategy is to partition the grid by constructing an octree data structure and choosing an appropriate level in the octree to create partitions. Each partition together with its border cells can be processed independently of other partitions. So, we can copy one partition and its border to the GPU memory, compute its alpha complex, and copy the results back from GPU to CPU memory. After all the partitions are processed, the list of simplices can be concatenated followed by duplicate removal to generate the final alpha complex.
The second strategy is to partition the sorted list of atoms into equal sized chunks and to process each chunk independently. Here, we assume that the complete list of atoms together with the grid data structure fits in the GPU memory. This is a reasonable assumption considering that datasets containing several million atoms can easily fit on modern GPUs, which typically have at least 2GB video memory. Also, the main difficulty in handling large protein structures is managing the large lists of simplices generated within the intermediate steps of the algorithm, when compared to handling the input list of atoms or the output list of simplices. We compute the alpha complex by executing the algorithm in multiple passes. Each pass computes the alpha complex for a single chunk and copies it back to the CPU memory. We have implemented this second strategy and can handle data sizes of upto 16 million atoms on a GPU with 2GB of memory. Results are reported in the next section.
5 Experimental Results
We now present results of computational experiments that demonstrate that the parallel algorithm is fast in practice and significantly better than the stateoftheart. We also performed runtime profiling to better understand the bottlenecks and effect of the parameter on the runtime. All experiments, unless stated otherwise, were performed on a Linux system with an nVidia GTX 660 Ti graphics card running CUDA 8.0 and a 2.0GHz Intel Xeon octa core processor with 16 GB of main memory. The default number of threads per block was set at for all the CUDA kernels.
Mach and Koehl describe two techniques for computing alpha complex of biomolecules called AlphaVol and UnionBall in their paper Mach and Koehl (2011). Both approaches construct the weighted Delaunay triangulation of input atoms first followed by a filtering step to obtain the alpha complex. UnionBall
is the stateoftheart technique for alpha complex computation for biomolecules on multicore CPU. It uses heuristics and optimizations specific to biomolecular data to improve upon
AlphaVol. For biomolecules containing 5 million atoms, AlphaVol takes approximately 8600 seconds for computing the alpha complex, while UnionBall takes approximately 150 seconds. Our method computes the alpha complex in less than 3 seconds for similar sized data, see Table 1.5.1 Comparison with gReg3D
PDB id  Atoms  gReg3D  Simplex  Speed up  

Simplices  Time(ms)  Simplices  Time(ms)  
0.0  1GRM  260  932  13  6295  117  14.8  9.0 
1U71  1505  5696  13  40878  115  13.9  11.1  
3N0H  1509  5739  14  41244  137  13.9  10.0  
4HHB  4384  38796  29  150141  193  25.8  6.6  
2J1N  8142  29642  18  227719  229  13.0  12.7  
1K4C  16068  62851  27  446383  347  14.1  12.9  
2OAU  16647  123175  56  466586  344  26.4  6.2  
1AON  58674  262244  65  1650841  879  15.9  13.5  
1X9P*  217920  924086  113  6142811  2555  15.0  22.6  
1IHM*  677040  2713083  277  –  –  –  –  
4CWU*  5905140  23450403  2709  –  –  –  –  
3IYN*  5975700  24188892  2874  –  –  –  –  
1.0  1GRM  260  1598  15  6295  117  25.4  7.9 
1U71  1505  10828  17  40878  115  26.5  8.5  
3N0H  1509  10965  30  41244  137  26.6  4.6  
4HHB  4384  65987  86  150141  193  44.0  2.2  
2J1N  8142  58205  30  227719  229  25.6  7.6  
1K4C  16068  118467  52  446383  347  26.5  6.7  
2OAU  16647  199101  159  466586  344  42.7  2.2  
1AON  58674  495683  160  1650841  879  30.0  5.5  
1X9P*  217920  1653778  196  6142811  2555  26.9  13.0  
1IHM*  677040  5058507  605  –  –  –  –  
4CWU*  5905140  44411353  5118  –  –  –  –  
3IYN*  5975700  45790463  5501  –  –  –  – 
We are not aware of any algorithm that can compute the alpha complex directly without first constructing the complete Delaunay triangulation. In order to compare the performance, we chose the stateoftheart parallel algorithm for computing the weighted Delaunay triangulation in 3D, gReg3D Cao et al. (2014). The CUDA implementation of gReg3D is available in the public domain. Table 1 compares the running times of our proposed algorithm with that of gReg3D for 12 different biomolecules at and . As evident from the table, we consistently observe significant speedup over gReg3D. The observed speedup is as high as 22 for the biomolecule 1X9P at , one of the largest molecules in our dataset. Clearly, the speedup goes down for when compared to because of the increased number of simplices in the output alpha complex. We also report the number of simplices in the alpha complex compared to the total number of simplices in the Delaunay triangulation under the column ‘Simplex’. This makes it clear why the speedup goes down as is increased from 0 to 1. For example, for the protein 1AON, the fraction of alpha complex simplices increases from to as is increased from 0 to 1. Correspondingly, the speedup decreases from 13.5 to 5.5.
We repeated the experiment on a MS Windows system with an nVidia GTX 980 Ti card running CUDA 8.0 and observed similar speedups. However, the individual runtimes both for our algorithm and for gReg3D were higher on the GTX 980 Ti. We believe that the reason for this increased runtime is that the MS Windows system was utilizing the GPU resources for its various GUI tasks whereas the Linux system did not require as many GPU cycles.
The starred entries in Table 1 are results for execution using the data partitioning approach. This is necessitated because these four large molecules generate large intermediate simplex lists that can not fit into the GPU memory if all the atoms in the molecule are processed at once. We observe that gReg3D is able to successfully compute the Delaunay complex for only one out of these four large molecules and runs out of GPU memory for the remaining three molecules.
5.2 Runtime profiling
PDB id  Atoms  Simplices  Memory  Grid  Potential Simplices  Pruning  Total  

Edges  Tris  Tets  Tets  Tris  Edges  time  
0.0  1GRM  260  932  0.8  1.0  2.8  1.1  1.0  1.2  3.1  1.9  13.0 
1U71  1505  5696  0.9  0.8  2.3  1.7  1.0  1.8  2.6  1.9  13.0  
3N0H  1509  5739  0.7  0.9  2.3  1.4  2.5  1.8  2.5  1.5  13.7  
4HHB  4384  38796  1.1  0.8  2.7  2.0  6.0  8.3  4.6  3.7  29.2  
2J1N  8142  29642  1.1  1.2  3.7  1.6  1.3  2.0  3.9  3.2  18.1  
1K4C  16068  62851  1.7  2.0  4.3  1.5  1.6  3.8  7.6  4.3  26.9  
2OAU  16647  123175  2.1  1.3  4.7  4.3  5.8  21.9  9.5  6.0  55.5  
1AON  58674  262244  4.6  2.8  11.1  5.6  4.1  16.7  10.9  9.4  65.2  
1.0  1GRM  260  1598  1.2  1.4  2.7  1.2  2.0  1.8  2.7  1.8  14.7 
1U71  1505  10828  0.9  0.8  2.3  1.5  3.4  2.5  3.2  2.5  17.1  
3N0H  1509  10965  0.9  1.4  2.4  1.7  14.6  2.6  3.5  2.9  30.0  
4HHB  4384  65987  1.5  1.9  3.4  5.4  23.7  32.8  12.4  4.8  86.0  
2J1N  8142  58205  1.4  1.1  3.4  2.5  3.6  9.0  4.6  4.5  30.3  
1K4C  16068  118467  2.1  1.8  6.1  3.0  3.4  19.5  8.3  7.9  52.2  
2OAU  16647  199101  3.0  1.7  6.0  10.2  28.3  90.0  12.1  7.5  158.9  
1AON  58674  495683  6.3  2.0  12.4  9.9  12.5  87.9  17.9  11.0  159.9 
The two stages of our parallel algorithm (potential simplices and pruning) are further divided into 3 steps each, corresponding to the computation of edges, triangles, and tetrahedra respectively. A grid computation step precedes the two stages. We study the computation effort for each of these seven steps of the algorithm. We also report the time spent in memory transfers from CPU to GPU and viceversa. Thus, we report the split up of the total runtime into eight categories namely, ‘Memory transfer’, ‘Grid computation’, ‘Potential edges’, ‘Potential triangles’, ‘Potential tetrahedra’, ‘AC2 tetrahedra’, ‘AC2 triangles’ and ‘AC2 edges’.
Table 2 summarizes the observed split up of runtime for 8 different biomolecules. Figure 6 shows the actual time spent for different steps and Figure 7 shows relative time spent for each step. From these figures, it is clear that the pruning stage consumes the maximum amount of time. The pruning stage involves checking the neighboring balls for violations of the AC2 condition for each simplex. Specially, the tetrahedra pruning step (red) takes approximately of the total time required for alpha complex computation.
We performed additional experiments to determine the average split up over multiple runs. We computed the relative time spent for each step for different values of between 0.0 and 2.0. These observations are reported in Figure 8. It is clear that the memory transfers and grid computation combined do not take more than of the total time. The potential simplices estimation stage consumes of the time. However, the pruning stage is most expensive, taking up roughly of the computation time. Pruning tetrahedra step takes up of the time on average. This suggests that this step should be the focus of the optimization efforts in future. It should be noted that proportion of time spent for each step depends on the distribution of atoms in the biomolecule as well as the value of . This explains the significant deviation from the averages as shown by the error bars.
5.3 Effect of the value of
We also performed experiments to observe the runtime performance as the value of is varied between 0.0 and 2.0. Figures 9 and 10 show the results for the proteins 1K4C and 1AON, respectively. We also show how the number of simplices in the computed alpha complex increases as the value of is increased. The runtime and total number of simplices follow a nearlinear trend. However, increase in time required for pruning, especially for pruning the tetrahedra, is greater than time required for other steps of the algorithm. Note that although both graphs appear linear, this is not guaranteed behavior for other input. The scaling behavior depends on the distribution of the atoms in the molecule and on the range of values for which the experiment is conducted.
5.4 Numerical issues
PDB id  Atoms  Simplices  Misclassified Simplices  Error rate  
Edges  Triangles  Tetrahedra  Total  
0.0  1GRM  260  932  0  0  0  0  0.0000 
1U71  1505  5696  0  0  0  0  0.0000  
3N0H  1509  5739  0  0  0  0  0.0000  
4HHB  4384  38796  0  0  0  0  0.0000  
2J1N  8142  29642  0  0  0  0  0.0000  
1K4C  16068  62851  15  33  16  64  0.0010  
2OAU  16647  123175  12  21  5  38  0.0003  
1AON  58674  262244  22  39  21  82  0.0003  
1.0  1GRM  260  1598  0  0  0  0  0.0000 
1U71  1505  10828  0  0  0  0  0.0000  
3N0H  1509  10965  0  0  0  0  0.0000  
4HHB  4384  65987  0  0  0  0  0.0000  
2J1N  8142  58205  0  0  0  0  0.0000  
1K4C  16068  118467  20  34  14  68  0.0006  
2OAU  16647  199101  10  22  10  42  0.0002  
1AON  58674  495683  10  26  21  57  0.0001 
The proposed algorithm requires computation of for each simplex, which in turn requires solving systems of linear equations. These computations require higher precision than is available on the GPU. So, the results may contain numerical errors. These numerical errors ultimately manifest as misclassification of a simplex as belonging to or not. We performed extensive experimentation and observed that the alpha complex computed is correct in several cases. In cases where the results are not correct, the number of false positives and negatives (extra or missing simplices) is extremely small as compared to the number of simplices in the alpha complex. We observed a worst case error rate of in our experiments, see Table 3. This error rate is tolerable for several applications. If exact computation is required, we could use a tolerance threshold to tag some simplices as requiring further checks, which can in turn be performed on the CPU using a multiprecision library. Our implementation can be easily extended to use such a hybrid strategy. We plan to implement this in future.
6 Conclusions
We proposed a novel parallel algorithm to compute the alpha complex for biomolecular data that does not require prior computation of the complete Delaunay triangulation. The novel characterization of simplices that belong to the alpha complex may be of independent interest. The algorithm was implemented using CUDA, which exploits the characteristics of the atom distribution in biomolecules to achieve speedups of upto compared to the the stateoftheart parallel algorithm for computing the weighted Delaunay triangulation and upto speedup over the stateoftheart implementation that is optimized for biomolecules. In future work, we plan to further improve the runtime efficiency of the parallel implementation and to resolve the numerical issues using real arithmetic.
References
 Voronoi diagrams and delaunay triangulations. World Scientific. External Links: ISBN 9789814447638 Cited by: §1.1, §2.
 Van der Waals volumes and radii. The Journal of Physical Chemistry 68 (3), pp. 441–451. Cited by: item 1.
 Computing Dirichlet tessellations. The Computer Journal 24 (2), pp. 162–166. Cited by: §1.1.
 A GPU accelerated algorithm for 3D Delaunay triangulation. In Proceedings of the 18th meeting of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 47–54. Cited by: §1.1, §5.1.
 DeWall: a fast divide and conquer Delaunay triangulation algorithm in . ComputerAided Design 30 (5), pp. 333–341. Cited by: §1.1.
 [6] (2017) CUDA Zone. Note: https://developer.nvidia.com/cudazone[Online; accessed 19December2017] Cited by: §4.5.
 3D alpha shapes. In CGAL User and Reference Manual, External Links: Link Cited by: §1.1.
 CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic acids research 34 (2), pp. W116–W118. Cited by: §1.
 On the shape of a set of points in the plane. IEEE Transactions on Information Theory 29 (4), pp. 551–559. Cited by: §1.
 The geometry of biomolecular solvation. In Discrete and Computational Geometry, pp. 243–275. Cited by: §1.
 Threedimensional alpha shapes. ACM Transactions on Graphics 13 (1), pp. 43–72. Cited by: §1.1, §1.
 Computational topology: an introduction. Amer. Math. Soc.. Cited by: §2.
 Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Transactions on Graphics 9 (1), pp. 66–104. Cited by: §2.
 Voronoi diagrams and arrangements. Discrete & Computational Geometry 1 (1), pp. 25–44. Cited by: §1.1.
 Incremental topological flipping works for regular triangulations. Algorithmica 15 (3), pp. 223–241. Cited by: §1.1.
 Geometry and topology for mesh generation. Cambridge University Press, New York, NY, USA. External Links: ISBN 0521793092 Cited by: §2.
 Alpha shapes – a survey. In Tesellations in the Sciences: Virtues, Techniques and Applications of Geometric Tilings, R. van de Weygaert, G. Vegter, J. Ritzerveld, and V. Icke (Eds.), Cited by: §1.
 Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica 7 (1), pp. 381–413. Cited by: §1.1.
 Spheres, molecules, and hidden surface removal. Computational Geometry 11 (2), pp. 83–102. Cited by: §4.1.
 Visual analysis of biomolecular cavities: state of the art. Computer Graphics Forum 35 (3), pp. 527–551. Cited by: §1.
 Analytical shape computation of macromolecules: i. molecular area and volume through alpha shape. Proteins Structure Function and Genetics 33 (1), pp. 1–17. Cited by: §1.
 Analytical shape computation of macromolecules: II. inaccessible cavities in proteins. Proteins Structure Function and Genetics 33 (1), pp. 18–29. Cited by: §1.
 Anatomy of protein pockets and cavities. Protein Science 7 (9), pp. 1884–1897. Cited by: §1.
 Geometric measures of large biomolecules: surface, volume, and pockets. Journal of Computational Chemistry 32 (14), pp. 3023–3038. Cited by: §1.1, §1, §5.
 An integrated geometric and topological approach to connecting cavities in biomolecules. In IEEE Pacific Visualization Symposium (PacificVis), pp. 104–111. Cited by: §1.
 ChExVis: a tool for molecular channel extraction and visualization. BMC Bioinformatics 16 (1), pp. 1–19. Cited by: §1.
 Delaunay triangulation in on the GPU. Ph.D. Thesis, National University of Singapore. Cited by: §1.1.
 Extraction of robust voids and pockets in proteins. In Visualization in Medicine and Life Sciences III: Towards Making an Impact, L. Linsen, H. Hege, and B. Hamann (Eds.), pp. 329–349. External Links: ISBN 9783319245232 Cited by: §1.
 [29] (2017) Thrust. Note: https://developer.nvidia.com/thrust[Online; accessed 19December2017] Cited by: §4.5.
 Computing the ndimensional Delaunay tessellation with application to Voronoi polytopes. The Computer Journal 24 (2), pp. 167–172. Cited by: §1.1.
Comments
There are no comments yet.