Parallel Computation of Alpha Complex for Biomolecules

08/16/2019 ∙ by Talha Bin Masood, et al. ∙ BITS Pilani indian institute of science 0

Alpha complex, a subset of the Delaunay triangulation, has been extensively used as the underlying representation for biomolecular structures. We propose a GPU based parallel algorithm for the computation of the alpha complex, which exploits the knowledge of typical spatial distribution and sizes of atoms in a biomolecule. Unlike existing methods, this algorithm does not require prior construction of the Delaunay triangulation. The algorithm computes the alpha complex in two stages. The first stage proceeds in a bottom up fashion and computes a superset of the edges, triangles, and tetrahedra belonging to the alpha complex. The false positives from this estimation stage are removed in a subsequent pruning stage to obtain the correct alpha complex. Computational experiments on several biomolecules demonstrate the superior performance of the algorithm, upto 50x when compared to existing methods that are optimized for biomolecules.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 25

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

The alpha complex of a set of points in is a subset of the Delaunay triangulation. A size parameter determines the set of simplices (tetrahedra, triangles, edges, and vertices) of the Delaunay triangulation that are included in the alpha complex. It is an elegant representation of the shape of the set of points Edelsbrunner et al. (1983); Edelsbrunner and Mücke (1994); Edelsbrunner (2011) and has found various applications, particularly in molecular modeling and molecular graphics. The atoms in a biomolecule are represented by weighted points in and the region occupied by the molecule is represented by the union of balls centered at these points. The geometric shape of a biomolecule determines its function, namely how it interacts with other biomolecules. The alpha complex represents the geometric shape of the molecule very efficiently. It has been widely used for computing and studying geometric features such as cavities and channels Liang et al. (1998b, c); Dundas et al. (2006); Masood et al. (2015); Sridharamurthy et al. (2016); Masood and Natarajan (2016); Krone et al. (2016). Further, an alpha complex based representation is also crucial for accurate computation of geometric properties like volume and surface area Liang et al. (1998a); Edelsbrunner and Koehl (2005); Mach and Koehl (2011).

Advances in imaging technology has resulted in a significant increase in the size of molecular structure data. This necessitates the developments of efficient methods for storing, processing, and querying these structures. In this paper, we study the problem of efficient construction of the alpha complex with particular focus on point distributions that are typical of biomolecules. In particular, we present a parallel algorithm for computing the alpha complex and an efficient GPU implementation that outperforms existing methods. In contrast to existing algorithms, our algorithm does not require the explicit construction of the Delaunay triangulation.

1.1 Related work

The Delaunay triangulation has been studied within the field of computational geometry for several decades and numerous algorithms have been proposed for its construction Aurenhammer et al. (2013). Below, we describe only a few methods that are most relevant to this paper. A tetrahedron belongs to the Delaunay triangulation of a set of points in if and only if it satisfies the empty circumsphere property, namely no point is contained within the circumsphere of the tetrahedron. The Bowyer-Watson algorithm Bowyer (1981); Watson (1981) and the incremental insertion algorithm by Guibas and Stolfi Guibas et al. (1992) are based on the above characterization of the Delaunay triangulation. In both methods, points are inserted incrementally and the triangulation is locally updated to ensure that the Delaunay property is satisfied. The incremental insertion method followed by bi-stellar flipping works in higher dimensions also Edelsbrunner and Shah (1996) and can construct the Delaunay triangulation in time in the worst case, where is the number of input points in . A second approach to constructing the Delaunay triangulation is based on its equivalence to the convex hull of the points lifted onto a -dimensional paraboloid Edelsbrunner and Seidel (1986).

A third divide-and-conquer approach partitions the inputs points into two or multiple subsets, constructs the Delaunay triangulation for each partition, and merges the pieces of the triangulation finally. The merge procedure depends on the ability to order the edges incident on a vertex and hence works only in . Extension to requires that the merge procedure be executed first Cignoni et al. (1998). The divide-and-conquer strategy directly extends to a parallel algorithm Nanjappa (2012); Cao et al. (2014). The DeWall algorithm Cignoni et al. (1998) partitions the input point set into two halves and first constructs the triangulation of points lying within the boundary region of the two partitions. The Delaunay triangulation of the two holes is then constructed in parallel. The process is repeated recursively resulting in increased parallelism. Cao et al. Cao et al. (2014) have developed a GPU parallel algorithm, gDel3D, that constructs the Delaunay triangulation in two stages. In the first stage, points are inserted in parallel followed by flipping to obtain an approximate Delaunay triangulation. In the second stage, a star splaying procedure works locally to convert non-Delaunay tetrahedra into Delaunay tetrahedra. The algorithm can be extended to construct the weighted Delaunay triangulation for points with weights. Cao et al. report upto speed up over a sequential implementation for constructing the weighted Delaunay triangulation of 3 million weighted points.

All existing algorithms for constructing the alpha complex Edelsbrunner and Mücke (1994); Mach and Koehl (2011); Da et al. (2017) require that the Delaunay triangulation be computed first. Simplices that belong to the alpha complex are identified using a size filtration in a second step. In the case of biomolecules, the size of the alpha complex is a small fraction of the size of the Delauanay triangulation and hence the Delaunay triangulation construction is often the bottleneck in the alpha complex computation. The key difficulty lies in the absence of a direct characterization of simplices that belong to the alpha complex.

1.2 Summary of results

We propose an algorithm that avoids the expensive Delaunay triangulation computation and instead directly computes the alpha complex for biomolecules. Key contributions of this paper are summarized below:

  • A new characterization of the alpha complex – a set of conditions necessary and sufficient for a simplex to be a part of the alpha complex.

  • A new algorithm for computing the alpha complex of a set of weighted points in . The algorithm identifies simplices of the alpha complex in decreasing order of dimension without computing the complete weighted Delaunay triangulation.

  • An efficient CUDA based parallel implementation of this algorithm for biomolecular data that can compute the alpha complex for a 10 million point dataset in approximately 10 seconds.

  • A proof of correctness of the algorithm and comprehensive experimental validation to demonstrate that it outperforms existing methods.

While the experimental results presented here focus on biomolecular data, the algorithm is applicable to data from other application domains as well. In particular, the efficient GPU implementation may be used for points that arise in smoothed particle hydrodynamics (SPH) simulations, atomistic simulations in material science, and particle systems that appear in computational fluid dynamics (CFD).

2 Background

In this section, we review the necessary background on Delaunay triangulations required to describe the algorithm and also establish a new characterization of the alpha complex that does not require the Delaunay triangulation. For a detailed description of Delaunay triangulations, alpha complexes, and related structures, we refer the reader to various books on the topic Aurenhammer et al. (2013); Edelsbrunner (2001, 2010).

Figure 1: 2D weighted Delaunay triangulation and alpha complex. LABEL: A set of weighted points in shown as disks. LABEL: The weighted Voronoi diagram of . Voronoi edges and vertices are highlighted in green. LABEL: The weighted Delaunay complex is the dual of the weighted Voronoi diagram. LABEL: The alpha complex for is shown in red. This is the dual of the intersection of the weighted Voronoi diagram and union of balls. LABEL: The alpha complex shown for an . It is the dual of the intersection of the weighted Voronoi diagram and union of balls after growing them to have radius .

Let denote a set of balls or weighted points, where represents a ball centered at with radius . We limit our discussion to balls in , so . Further, we assume that the points in are in general position, i.e., no 2 points have the same location, no 3 points are collinear, no 4 points are coplanar, and no subset of 5 points are equidistant from a point in . Such configurations are called degeneracies. In practice, a degenerate input can be handled via symbolic perturbation Edelsbrunner and Mücke (1990).

2.1 Simplex and simplicial complex

A -dimensional simplex is defined as the convex hull of affinely independent points. Assuming the centres of balls in are in general position, all sized subsets of form a simplex . For simplicity, we sometimes use instead of the center to refer to points incident on a simplex. For example, we may write .

A non-empty strict subset of is also a simplex but with dimension smaller than . Such a simplex is called a face of . Specifically, a -dimensional face of is referred to as a facet of . A set of simplices is called a simplicial complex if: 1) a simplex implies that all faces of also belong to , and 2) for two simplices , , either or .

2.2 Power distance and weighted Voronoi diagram

The power distance between a point and a ball is defined as

The weighted Voronoi diagram is an extension of the Voronoi diagram to weighted points. It is a partition of based on proximity to input balls in terms of the power distance. Points that are closer to the ball compared to all other balls () constitute the Voronoi region of . Points equidistant from two balls , and closer to the two balls compared to other balls constitute a Voronoi face. Similarly, points equidistant from three balls and fours balls constitute Voronoi edges and Voronoi vertices of the weighted Voronoi diagram, respectively. Figure 1 shows the weighted Voronoi diagram for a set of 2D weighted points or disks on the plane. Similar to the unweighted case, the Voronoi regions of the weighted Voronoi diagram are convex and linear. However, the weights may lead to a configuration where the Voronoi region of is disjoint from . This occurs when is contained within another ball . Further, the Voronoi region of may even be empty.

2.3 Weighted Delaunay triangulation

The weighted Delaunay triangulation is the the dual of the weighted Voronoi diagram, see Figure 1. It is a simplicial complex consisting of simplices that are dual to the cells of the weighted Voronoi diagram. The following equivalent definition characterizes a simplex belonging to a Delaunay triangulation

Definition 1 (Weighted Delaunay Triangulation)

A simplex = belongs to the weighted Delaunay triangulation of if and only if there exists a point such that

DT1:

, and

DT2:

for .

Figure 2: and of a simplex. LABEL: A set of weighted points. Two edges (bold) belong to the Delaunay triangulation. LABEL: This of edge is equal to its . Points , , and are witnesses. Each one is equidistant from and and farther away from other disks in . The distance is proportional to the length of the tangent to the disk that represents the weighted point. The next closest disk from these points is . In this case, and coincide and hence . LABEL:  is also a Delaunay edge. The location of a neighboring disk could lead to a different configuration. The point is closest to and among all the points that are equidistant from both. However is closer to as compared to and . The closest point that satisfies DT1 and DT2 is farther away, hence of is greater than its .

A point that satisfies the above two conditions, DT1 and DT2, is called a witness for . We call a point that minimizes the distance and satisfies both conditions as the closest witness, denoted by . This minimum distance is called the size of the simplex . A point that minimizes the distance and satisfies DT1 is called the   of simplex . The distance is called the of the simplex . Clearly, the of a simplex is lower bounded by its . Figure 2 shows the two possible scenarios, namely when and .

2.4 Alpha complex

Given a parameter , we can construct a subset of the weighted Delaunay triangulation by filtering simplices whose is less than or equal to , see Figures 1 and 1. The resulting subset is a subcomplex of the Delaunay complex and is denoted :

The following equivalent definition characterizes simplices of the alpha complex without explicitly referring to the Delaunay triangulation.

Definition 2 (Alpha complex)

A simplex = belongs to the alpha complex of if and only if there exists a point such that the following three conditions are satisfied:

AC1:

,

AC2:

for , and

AC3:

or of is at most .

3 Algorithm

We now describe an algorithm to compute the alpha complex and prove its correctness. The algorithm utilizes the characterizing conditions introduced above. It first identifies the tetrahedra that belong to the alpha complex, followed by the set of triangles, edges and vertices. Figure 3 illustrates the algorithm as applied to disks on the plane.

3.1 Outline

The alpha complex consists of simplices of dimensions 0–3, , where is the set of -dimensional simplices in . We initialize and construct in five steps described below:

Step 1:

Compute the set of all simplices such that . Let this set be denoted by .

Step 2:

For all tetrahedra , check condition AC2 using . If satisfies AC2 then insert it into .

Step 3:

Insert all triangles that are incident on tetrahedra in into . Let , where denotes the set of facets of tetrahedra in . For all triangles , check condition AC2 using . If satisfies AC2 then insert it into .

Step 4:

Insert all edges incident on triangles in into . Let , where denotes the set of facets of triangles in . For all edges , check condition AC2 using . If satisfies AC2 then insert it into .

Step 5:

Insert all end points of edges in into . Let , where denotes the set of balls incident on edges in . For all balls , check condition AC2 using . If satisfies AC2 then insert it into .

Step 1 selects simplices that satisfy AC3. Step 2 recognizes tetrahedra that belong to the alpha complex by checking AC2 using . Triangle faces of these tetrahedra also belong to . The other “dangling” triangles belong to if they satisfy AC2. Step 4 identify edges similarly. First all edge faces of triangles in are inserted followed by those “dangling” edges that satisfy AC2. Vertices are identified similarly in Step 5.

Figure 3: Illustration of the proposed algorithm in 2D. LABEL: The set of disks grown by the parameter . LABEL: First, compute the set of edges whose  (red). The triangles that satisfy this condition are also computed but they are not shown here. LABEL: Next, identify the triangles that satisfy AC2 (red). LABEL: Collect edges in that are not incident on triangles in into . Check if these edges satisfy AC2 with . For example, the edge does not satisfy this condition because is closer to than and . LABEL: Only one edge survives the AC2 check and thus belongs to . LABEL: The alpha complex is obtained as the union of , and .

3.2 Proof of correctness

We now prove that that the algorithm described above correctly computes the alpha complex of the given set of weighted points by proving the following four claims. Each claim states that the set of simplices computed in Steps 2, 3, 4 and 5 are exactly the simplices belonging to the alpha complex.

Claim 1.

Step 2 computes correctly.

Proof.

We assume that the input is non-degenerate. So, for a tetrahedron , is the only point that satisfies condition AC1. In Step 2 of the proposed algorithm, we check if AC2 holds for . If yes, then is a witness for , i.e., . Further, since and , we have size thereby satisfying AC3. Therefore, belongs to because it satisfies all three conditions. ∎

Figure 4: The radical axis of a triangle is drawn such that is at the origin. A ball divides the radical axis into two half intervals. Points in the half interval are closer to as compared to , i.e. for all , . Consider the set of balls that are closer to as compared to So, does not contain . The intersection of these intervals, denoted by , is equal to one of the intervals . For example, here . The end point of the interval is the closest witness for the tetrahedron .

We now prove that the algorithm correctly identifies the triangles of the alpha complex.

Lemma 1.

A triangle belongs to if and only if it satisfies AC2 with .

Proof.

We first prove the backward implication, namely if satisfies AC2 with , then . Note that satisfies AC1 by definition. Further, it satisfies AC2 by assumption and hence . We also have because . So, thereby satisfying AC3. The triangle with satisfies all three conditions and hence belongs to .

We will now prove the forward implication via contradiction. Suppose there exists a triangle that belongs to but does not satisfy AC2 with . In other words, there exists a ball for which . Let denote the set of all such balls . The set of points that are equidistant from the three balls corresponding to form a line perpendicular to the plane containing called the radical axis. Each ball partitions the radical axis into two half-intervals based on whether the point on radical axis is closer to or to , see Figure 4. Let denote the half interval consisting of points that are closer to compared to . Let denote the intersection of all such half intervals . We have assumed that , so there must exist a closest witness and it has to lie within . Thus, is non-empty. In fact, for some and is exactly the end point of . This implies that is also a closest witness for the tetrahedron . So, belongs to and its size is equal to . However, this means that , a contradiction. So, the forward implication in the lemma is true. ∎

Claim 2.

Step 3 computes correctly.

Proof.

If a simplex belongs to then naturally all of its faces also belong to . The algorithm includes such triangles into and remove them from to obtain the set of free triangles that have and are not incident on any tetrahedron in . It follows directly from Lemma 1 that AC2 is a necessary and sufficient condition for a triangle in to belong to . Hence, Step 3 correctly computes the triangles belonging to . ∎

Figure 5: The radical plane of an edge is drawn such that is at the origin. A ball divides the radical plane into two half planes. The half plane consists of points that are closer to as compared to , i.e. for all , . Let denote the set of balls that are closer to as compared to . The half planes do not contain . The intersection of these half planes, denoted by , is a convex region (yellow). The power distance from to a point is minimized at a point on the boundary of the convex region . But the boundary of is a union of line segments that bound half planes . Here, the point at which the distance is minimum lies on the boundary of the half plane . This point is the closest witness for the triangle .

The above arguments need to be extended to prove that the edges of the alpha complex are also correctly identified.

Lemma 2.

An edge belongs to if and only if it satisfies the condition AC2 with .

Proof.

First, we assume that satisfies AC2 with . The point satisfies AC1 by definition. Further, it satisfies AC2 by assumption and hence . We also have because . So, thereby satisfying AC3. The edge with satisfies all three conditions and hence belongs to .

We will prove the forward implication via contradiction. Suppose there exists an edge that belongs to but does not satisfy AC2 with . In other words, there exists a ball such that . Let denote the set of all such balls . The set of points that are equidistant from the two balls corresponding to form a plane perpendicular to the line containing called the radical plane. Each ball partitions the radical plane into two half-planes based on whether the point on the radical plane is closer to or to , see Figure 5. Let denote the half-plane consisting of points that are closer to compared to . Let denote the intersection of all such half-planes . We have assumed that , so there must exist a closest witness and it has to lie within . Thus, is non-empty. In fact, lies on the envelope of because it minimizes the distance to . Let lie on the bounding line corresponding to for some . This implies that is also a closest witness for the triangle . So, belongs to and its size is equal to . However, this means that , a contradiction. So, the forward implication in the lemma is also true. ∎

Claim 3.

Step 4 computes correctly.

Proof.

All edge faces of triangles in naturally belong to . Step 4 inserts all edges incident on triangles in into as valid edges and removes them from to obtain the set of free edges . It follows directly from Lemma 2 that AC2 is a necessary and sufficient condition for an edge to belong to . Therefore, Step 4 correctly computes the edges belonging to . ∎

Claim 4.

Step 5 computes correctly.

Proof.

All vertices incident on naturally belong to . Step 5 inserts all such vertices in as valid vertices and removes them from to obtain the set of free vertices . Next, the vertices in for which the center of the ball satisfies AC2 are also inserted into . Clearly, these vertices also satisfy AC3 because they belong to . The condition AC1 is not relevant for -dimensional simplices. Therefore, these vertices clearly belong to the alpha complex. Similar to Lemmas 1 and 2, it is easy to prove that checking for AC2 for is necessary and sufficient condition to decide whether a vertex in belongs to the alpha complex or not. That is, it is possible to show that vertices in alpha complex that have non-empty Voronoi regions but do not satisfy AC2 for would be incident on some edge in , and therefore must have been already detected by Step 4 and hence can not belong to . Therefore, Step 5 correctly computes the vertices belonging to . ∎

4 Parallel Algorithm for Biomolecules

Although the algorithm as described above is provably correct, a straight forward implementation will be extremely inefficient with a worst case running time of , where is the number of weighted points in . This is because Step 1 requires time to generate all possible tetrahedra. In later steps, we need effort per simplex to check AC2. However, the input corresponds to atoms in a biomolecule. We show how certain properties of biomolecules can be leveraged to develop a fast parallel implementation.

4.1 Biomolecular data characteristics

Atoms in a biomolecule are well distributed. The following three properties of biomolecules are most relevant:

  • The radius of an atom is bounded and very small compared to the size of the biomolecule. The typical radius of an atom in a protein molecule ranges between 1Å to 2Å Bondi (1964).

  • There is a lower bound on the distance between the centres of two atoms. This is called the van der Waals contact distance, beyond which the two atoms start repelling each other. In the case of atoms in protein molecules, this distance is at least 1Å. This property together with the upper bound on atomic radii ensures that no atom is completely contained inside another. This means that the weighted Voronoi regions corresponding to the atoms in a biomolecule can be always be assumed to be non-empty.

  • Structural biologists are interested in small values of . The two crucial values are 0Å and 1.4Å. The former corresponds to using van der Waals radius and the latter corresponds to the radius of water molecule, which acts as the solvent.

In the light of the above three properties, we can say that the number of simplices of the alpha complex that are incident on a weighted point (atom) is independent of the total number of input atoms and is bounded by a constant Halperin and Overmars (1998).

4.2 Acceleration data structure

The algorithm will benefit from an efficient method for accessing points of that belong to a local neighborhood of a given weighted point. We store the weighted points in a grid-based data structure. Let denote the radius of the largest atom and assume that the value of the parameter is available as input. First, we construct a grid with cells of size and then bin the input atoms into the grid cells. In our implementation, we do not store the grid explicitly because it may contain several empty cells. Instead, we compute the cell index for each input atom and sort the list of atoms by cell index to ensure that atoms that belong to a particular cell are stored at consecutive locations. The cell index is determined based on a row major or column major order. Alternatively, a space-filling curves like the Hilbert curve could also be used to order the cells.

After the atoms are stored in grid cells, the alpha complex is computed in two stages. In the first stage, we employ a bottom up approach to obtain a conservative estimate of the edges, triangles, and tetrahedra belonging to the alpha complex. The false positives from the first stage are removed in a subsequent pruning stage resulting in the correct alpha complex.

4.3 Potential simplices

The first stage essentially corresponds to Step 1 of the algorithm described in the previous section. We compute the set of potential simplices for which . However, for efficiency reasons we process the simplices in the order of increasing dimension. First, we identify edges that satisfy the AC3 condition. Given the size of the grid cell, end points of edges that satisfy the condition either lie within the same grid cell or in adjacent cells. So, the grid data structure substantially reduces the time required to compute the list of potential edges . Beginning from this set of edges, we construct the set of all possible triangles and retain the triangles whose is no greater than , resulting in the set . Finally, we use the triangles in to construct the list of tetrahedra that satisfy the condition. The above procedure works because the of a simplex is always greater than or equal to the of its faces. The set of simplices identified in this stage contains all simplices of the alpha complex. False positives are pruned in the the second stage described below.

4.4 Pruning

The second stage corresponds to Steps 2-5 of the algorithm and processes the simplices in the decreasing order of dimension. This stage checks the characterizing condition AC2 to prune into . The tetrahedra are processed by checking if any of the input balls are closer to the than the balls incident on the tetrahedron. If yes, the tetrahedron is pruned away. Else, the tetrahedron is recognized as belonging to the alpha complex and inserted into . Triangles incident on these tetrahedra also belong to the alpha complex and are inserted into after they are removed from the list of potential triangles . Next, the triangles in are processed by checking if they satisfy AC2. If yes, they are inserted into . Else, they are pruned away. All edges incident on triangles belonging to are inserted into and removed from the set . Next, the edges in are processed by checking if they satisfy AC2. Edges that satisfy AC2 are inserted into and the others are pruned away. All the vertices in are directly inserted into without AC2 check because for biomolecular data we assume that Voronoi regions of all the atoms are non-empty. The check for condition AC2 for each simplex is again made efficient by the use of the grid data structure. Atoms that may violate AC2 lie within the same cell as that containing the or within the adjacent cell. Atoms that lie within other cells may be safely ignored.

4.5 CUDA implementation

We use the CUDA framework 6 and the thrust library 29 within CUDA to develop a parallel implementation of the algorithm that executes on the many cores of the GPU. The grid computation is implemented as a CUDA kernel where each atom is processed in parallel. The potential simplices computation and pruning stages are broken down into multiple CUDA kernels and parallelized differently in order to increase efficiency. We now describe the parallelization strategy in brief. For computing the set of potential edges , the initial enumeration of all possible edges is parallelized per atom where atom indices are used to ensure that no duplicate edges are generated. Subsequently, the condition is checked for the edges in parallel. Similarly, for computing potential triangles , the initial enumeration of all possible triangles is parallelized per atom, while the condition is checked within a separate kernel and parallelized per triangle. The same strategy is used for computing the set of potential tetrahedra . The pruning stage is parallelized per tetrahedron, triangle, and edge as required.

4.6 Handling large data sizes

Typical protein structures consist of upto 100,000 atoms. Our implementation can handle datasets of this size easily for reasonable values of . However, the size of datasets is ever increasing. Protein complexes that are available nowadays may consist of millions of atoms, necessitating smart management of GPU memory while handling such data sets.

We propose two strategies and implement one of them. The first strategy is to partition the grid by constructing an octree data structure and choosing an appropriate level in the octree to create partitions. Each partition together with its border cells can be processed independently of other partitions. So, we can copy one partition and its border to the GPU memory, compute its alpha complex, and copy the results back from GPU to CPU memory. After all the partitions are processed, the list of simplices can be concatenated followed by duplicate removal to generate the final alpha complex.

The second strategy is to partition the sorted list of atoms into equal sized chunks and to process each chunk independently. Here, we assume that the complete list of atoms together with the grid data structure fits in the GPU memory. This is a reasonable assumption considering that datasets containing several million atoms can easily fit on modern GPUs, which typically have at least 2GB video memory. Also, the main difficulty in handling large protein structures is managing the large lists of simplices generated within the intermediate steps of the algorithm, when compared to handling the input list of atoms or the output list of simplices. We compute the alpha complex by executing the algorithm in multiple passes. Each pass computes the alpha complex for a single chunk and copies it back to the CPU memory. We have implemented this second strategy and can handle data sizes of upto 16 million atoms on a GPU with 2GB of memory. Results are reported in the next section.

5 Experimental Results

We now present results of computational experiments that demonstrate that the parallel algorithm is fast in practice and significantly better than the state-of-the-art. We also performed runtime profiling to better understand the bottlenecks and effect of the parameter on the runtime. All experiments, unless stated otherwise, were performed on a Linux system with an nVidia GTX 660 Ti graphics card running CUDA 8.0 and a 2.0GHz Intel Xeon octa core processor with 16 GB of main memory. The default number of threads per block was set at for all the CUDA kernels.

Mach and Koehl describe two techniques for computing alpha complex of biomolecules called AlphaVol and UnionBall in their paper Mach and Koehl (2011). Both approaches construct the weighted Delaunay triangulation of input atoms first followed by a filtering step to obtain the alpha complex. UnionBall

is the state-of-the-art technique for alpha complex computation for biomolecules on multi-core CPU. It uses heuristics and optimizations specific to biomolecular data to improve upon

AlphaVol. For biomolecules containing 5 million atoms, AlphaVol takes approximately 8600 seconds for computing the alpha complex, while UnionBall takes approximately 150 seconds. Our method computes the alpha complex in less than 3 seconds for similar sized data, see Table 1.

5.1 Comparison with gReg3D

PDB id Atoms gReg3D Simplex Speed up
Simplices Time(ms) Simplices Time(ms)
0.0 1GRM 260 932 13 6295 117 14.8 9.0
1U71 1505 5696 13 40878 115 13.9 11.1
3N0H 1509 5739 14 41244 137 13.9 10.0
4HHB 4384 38796 29 150141 193 25.8 6.6
2J1N 8142 29642 18 227719 229 13.0 12.7
1K4C 16068 62851 27 446383 347 14.1 12.9
2OAU 16647 123175 56 466586 344 26.4 6.2
1AON 58674 262244 65 1650841 879 15.9 13.5
1X9P* 217920 924086 113 6142811 2555 15.0 22.6
1IHM* 677040 2713083 277
4CWU* 5905140 23450403 2709
3IYN* 5975700 24188892 2874
1.0 1GRM 260 1598 15 6295 117 25.4 7.9
1U71 1505 10828 17 40878 115 26.5 8.5
3N0H 1509 10965 30 41244 137 26.6 4.6
4HHB 4384 65987 86 150141 193 44.0 2.2
2J1N 8142 58205 30 227719 229 25.6 7.6
1K4C 16068 118467 52 446383 347 26.5 6.7
2OAU 16647 199101 159 466586 344 42.7 2.2
1AON 58674 495683 160 1650841 879 30.0 5.5
1X9P* 217920 1653778 196 6142811 2555 26.9 13.0
1IHM* 677040 5058507 605
4CWU* 5905140 44411353 5118
3IYN* 5975700 45790463 5501
Table 1: Runtime comparison of the proposed algorithm with gReg3D on an nVidia GTX 660 Ti graphics card. Timings are reported in milliseconds. Simplex refers to the size of the alpha complex as a percentage of the size of the weighted Delaunay triangulation. The last column shows the speedup in runtime of our algorithm over gReg3D. ‘*’ indicates the data was partitioned and processed in chunks. ‘–’ indicates that the code could not execute due to insufficient memory.

We are not aware of any algorithm that can compute the alpha complex directly without first constructing the complete Delaunay triangulation. In order to compare the performance, we chose the state-of-the-art parallel algorithm for computing the weighted Delaunay triangulation in 3D, gReg3D Cao et al. (2014). The CUDA implementation of gReg3D is available in the public domain. Table 1 compares the running times of our proposed algorithm with that of gReg3D for 12 different biomolecules at and . As evident from the table, we consistently observe significant speedup over gReg3D. The observed speedup is as high as 22 for the biomolecule 1X9P at , one of the largest molecules in our dataset. Clearly, the speedup goes down for when compared to because of the increased number of simplices in the output alpha complex. We also report the number of simplices in the alpha complex compared to the total number of simplices in the Delaunay triangulation under the column ‘Simplex’. This makes it clear why the speedup goes down as is increased from 0 to 1. For example, for the protein 1AON, the fraction of alpha complex simplices increases from to as is increased from 0 to 1. Correspondingly, the speedup decreases from 13.5 to 5.5.

We repeated the experiment on a MS Windows system with an nVidia GTX 980 Ti card running CUDA 8.0 and observed similar speedups. However, the individual runtimes both for our algorithm and for gReg3D were higher on the GTX 980 Ti. We believe that the reason for this increased runtime is that the MS Windows system was utilizing the GPU resources for its various GUI tasks whereas the Linux system did not require as many GPU cycles.

The starred entries in Table 1 are results for execution using the data partitioning approach. This is necessitated because these four large molecules generate large intermediate simplex lists that can not fit into the GPU memory if all the atoms in the molecule are processed at once. We observe that gReg3D is able to successfully compute the Delaunay complex for only one out of these four large molecules and runs out of GPU memory for the remaining three molecules.

5.2 Runtime profiling

PDB id Atoms Simplices Memory Grid Potential Simplices Pruning Total
Edges Tris Tets Tets Tris Edges time
0.0 1GRM 260 932 0.8 1.0 2.8 1.1 1.0 1.2 3.1 1.9 13.0
1U71 1505 5696 0.9 0.8 2.3 1.7 1.0 1.8 2.6 1.9 13.0
3N0H 1509 5739 0.7 0.9 2.3 1.4 2.5 1.8 2.5 1.5 13.7
4HHB 4384 38796 1.1 0.8 2.7 2.0 6.0 8.3 4.6 3.7 29.2
2J1N 8142 29642 1.1 1.2 3.7 1.6 1.3 2.0 3.9 3.2 18.1
1K4C 16068 62851 1.7 2.0 4.3 1.5 1.6 3.8 7.6 4.3 26.9
2OAU 16647 123175 2.1 1.3 4.7 4.3 5.8 21.9 9.5 6.0 55.5
1AON 58674 262244 4.6 2.8 11.1 5.6 4.1 16.7 10.9 9.4 65.2
1.0 1GRM 260 1598 1.2 1.4 2.7 1.2 2.0 1.8 2.7 1.8 14.7
1U71 1505 10828 0.9 0.8 2.3 1.5 3.4 2.5 3.2 2.5 17.1
3N0H 1509 10965 0.9 1.4 2.4 1.7 14.6 2.6 3.5 2.9 30.0
4HHB 4384 65987 1.5 1.9 3.4 5.4 23.7 32.8 12.4 4.8 86.0
2J1N 8142 58205 1.4 1.1 3.4 2.5 3.6 9.0 4.6 4.5 30.3
1K4C 16068 118467 2.1 1.8 6.1 3.0 3.4 19.5 8.3 7.9 52.2
2OAU 16647 199101 3.0 1.7 6.0 10.2 28.3 90.0 12.1 7.5 158.9
1AON 58674 495683 6.3 2.0 12.4 9.9 12.5 87.9 17.9 11.0 159.9
Table 2: Time spent within different steps of the algorithm. Timings are reported in milliseconds for memory transfer, grid computation, computing potential simplices, and pruning. The last column shows the total time taken for all steps.
(a)
(b)
Figure 6: Time spent for different steps of the algorithm.
(a)
(b)
Figure 7: Proportion of time spent for different steps during the execution of the algorithm. The pruning stage (AC2 tetrahedra, triangles and edges checks) takes significantly more effort compared to other steps. Also, the time spent for this step increases with .
Figure 8: Split up of time spent at different steps. Average proportion of effort spent at different steps of computation was obtained after executing the algorithm for all the molecules in our dataset at various values of varying from to . As evident from the error bars, there is significant variability. But, in general the pruning stage of the algorithm, specially the tetrahedra computation step takes the maximum time.

The two stages of our parallel algorithm (potential simplices and pruning) are further divided into 3 steps each, corresponding to the computation of edges, triangles, and tetrahedra respectively. A grid computation step precedes the two stages. We study the computation effort for each of these seven steps of the algorithm. We also report the time spent in memory transfers from CPU to GPU and vice-versa. Thus, we report the split up of the total runtime into eight categories namely, ‘Memory transfer’, ‘Grid computation’, ‘Potential edges’, ‘Potential triangles’, ‘Potential tetrahedra’, ‘AC2 tetrahedra’, ‘AC2 triangles’ and ‘AC2 edges’.

Table 2 summarizes the observed split up of runtime for 8 different biomolecules. Figure 6 shows the actual time spent for different steps and Figure 7 shows relative time spent for each step. From these figures, it is clear that the pruning stage consumes the maximum amount of time. The pruning stage involves checking the neighboring balls for violations of the AC2 condition for each simplex. Specially, the tetrahedra pruning step (red) takes approximately of the total time required for alpha complex computation.

We performed additional experiments to determine the average split up over multiple runs. We computed the relative time spent for each step for different values of between 0.0 and 2.0. These observations are reported in Figure 8. It is clear that the memory transfers and grid computation combined do not take more than of the total time. The potential simplices estimation stage consumes of the time. However, the pruning stage is most expensive, taking up roughly of the computation time. Pruning tetrahedra step takes up of the time on average. This suggests that this step should be the focus of the optimization efforts in future. It should be noted that proportion of time spent for each step depends on the distribution of atoms in the biomolecule as well as the value of . This explains the significant deviation from the averages as shown by the error bars.

5.3 Effect of the value of

Figure 9: Running time for varying values of for 1K4C. The number of simplices in the output alpha complex is also shown (black line). The number of simplices increases almost linearly with as expected from the distribution of atoms in typical biomolecules. The running time also increases almost linearly with for this molecule. Also, the fraction of time spent for tetrahedra computation step (red) increases with .
Figure 10: Running time for varying values of for 1AON. The number of simplices in the output alpha complex is also shown (black line). The running time increases almost linearly with for this molecule.

We also performed experiments to observe the runtime performance as the value of is varied between 0.0 and 2.0. Figures 9 and 10 show the results for the proteins 1K4C and 1AON, respectively. We also show how the number of simplices in the computed alpha complex increases as the value of is increased. The runtime and total number of simplices follow a near-linear trend. However, increase in time required for pruning, especially for pruning the tetrahedra, is greater than time required for other steps of the algorithm. Note that although both graphs appear linear, this is not guaranteed behavior for other input. The scaling behavior depends on the distribution of the atoms in the molecule and on the range of values for which the experiment is conducted.

5.4 Numerical issues

PDB id Atoms Simplices Misclassified Simplices Error rate
Edges Triangles Tetrahedra Total
0.0 1GRM 260 932 0 0 0 0 0.0000
1U71 1505 5696 0 0 0 0 0.0000
3N0H 1509 5739 0 0 0 0 0.0000
4HHB 4384 38796 0 0 0 0 0.0000
2J1N 8142 29642 0 0 0 0 0.0000
1K4C 16068 62851 15 33 16 64 0.0010
2OAU 16647 123175 12 21 5 38 0.0003
1AON 58674 262244 22 39 21 82 0.0003
1.0 1GRM 260 1598 0 0 0 0 0.0000
1U71 1505 10828 0 0 0 0 0.0000
3N0H 1509 10965 0 0 0 0 0.0000
4HHB 4384 65987 0 0 0 0 0.0000
2J1N 8142 58205 0 0 0 0 0.0000
1K4C 16068 118467 20 34 14 68 0.0006
2OAU 16647 199101 10 22 10 42 0.0002
1AON 58674 495683 10 26 21 57 0.0001
Table 3: Incorrectly identified simplices of the alpha complex.

The proposed algorithm requires computation of for each simplex, which in turn requires solving systems of linear equations. These computations require higher precision than is available on the GPU. So, the results may contain numerical errors. These numerical errors ultimately manifest as misclassification of a simplex as belonging to or not. We performed extensive experimentation and observed that the alpha complex computed is correct in several cases. In cases where the results are not correct, the number of false positives and negatives (extra or missing simplices) is extremely small as compared to the number of simplices in the alpha complex. We observed a worst case error rate of in our experiments, see Table 3. This error rate is tolerable for several applications. If exact computation is required, we could use a tolerance threshold to tag some simplices as requiring further checks, which can in turn be performed on the CPU using a multi-precision library. Our implementation can be easily extended to use such a hybrid strategy. We plan to implement this in future.

6 Conclusions

We proposed a novel parallel algorithm to compute the alpha complex for biomolecular data that does not require prior computation of the complete Delaunay triangulation. The novel characterization of simplices that belong to the alpha complex may be of independent interest. The algorithm was implemented using CUDA, which exploits the characteristics of the atom distribution in biomolecules to achieve speedups of upto compared to the the state-of-the-art parallel algorithm for computing the weighted Delaunay triangulation and upto speedup over the state-of-the-art implementation that is optimized for biomolecules. In future work, we plan to further improve the runtime efficiency of the parallel implementation and to resolve the numerical issues using real arithmetic.

References

  • F. Aurenhammer, R. Klein, and D. Lee (2013) Voronoi diagrams and delaunay triangulations. World Scientific. External Links: ISBN 978-981-4447-63-8 Cited by: §1.1, §2.
  • A. Bondi (1964) Van der Waals volumes and radii. The Journal of Physical Chemistry 68 (3), pp. 441–451. Cited by: item 1.
  • A. Bowyer (1981) Computing Dirichlet tessellations. The Computer Journal 24 (2), pp. 162–166. Cited by: §1.1.
  • T. Cao, A. Nanjappa, M. Gao, and T. Tan (2014) A GPU accelerated algorithm for 3D Delaunay triangulation. In Proceedings of the 18th meeting of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, pp. 47–54. Cited by: §1.1, §5.1.
  • P. Cignoni, C. Montani, and R. Scopigno (1998) DeWall: a fast divide and conquer Delaunay triangulation algorithm in . Computer-Aided Design 30 (5), pp. 333–341. Cited by: §1.1.
  • [6] (2017) CUDA Zone. Note: https://developer.nvidia.com/cuda-zone[Online; accessed 19-December-2017] Cited by: §4.5.
  • T. K. F. Da, S. Loriot, and M. Yvinec (2017) 3D alpha shapes. In CGAL User and Reference Manual, External Links: Link Cited by: §1.1.
  • J. Dundas, Z. Ouyang, J. Tseng, A. Binkowski, Y. Turpaz, and J. Liang (2006) CASTp: computed atlas of surface topography of proteins with structural and topographical mapping of functionally annotated residues. Nucleic acids research 34 (2), pp. W116–W118. Cited by: §1.
  • H. Edelsbrunner, D. Kirkpatrick, and R. Seidel (1983) On the shape of a set of points in the plane. IEEE Transactions on Information Theory 29 (4), pp. 551–559. Cited by: §1.
  • H. Edelsbrunner and P. Koehl (2005) The geometry of biomolecular solvation. In Discrete and Computational Geometry, pp. 243–275. Cited by: §1.
  • H. Edelsbrunner and E.P. Mücke (1994) Three-dimensional alpha shapes. ACM Transactions on Graphics 13 (1), pp. 43–72. Cited by: §1.1, §1.
  • H. Edelsbrunner (2010) Computational topology: an introduction. Amer. Math. Soc.. Cited by: §2.
  • H. Edelsbrunner and E. P. Mücke (1990) Simulation of simplicity: a technique to cope with degenerate cases in geometric algorithms. ACM Transactions on Graphics 9 (1), pp. 66–104. Cited by: §2.
  • H. Edelsbrunner and R. Seidel (1986) Voronoi diagrams and arrangements. Discrete & Computational Geometry 1 (1), pp. 25–44. Cited by: §1.1.
  • H. Edelsbrunner and N. R. Shah (1996) Incremental topological flipping works for regular triangulations. Algorithmica 15 (3), pp. 223–241. Cited by: §1.1.
  • H. Edelsbrunner (2001) Geometry and topology for mesh generation. Cambridge University Press, New York, NY, USA. External Links: ISBN 0-521-79309-2 Cited by: §2.
  • H. Edelsbrunner (2011) Alpha shapes – a survey. In Tesellations in the Sciences: Virtues, Techniques and Applications of Geometric Tilings, R. van de Weygaert, G. Vegter, J. Ritzerveld, and V. Icke (Eds.), Cited by: §1.
  • L. J. Guibas, D. E. Knuth, and M. Sharir (1992) Randomized incremental construction of Delaunay and Voronoi diagrams. Algorithmica 7 (1), pp. 381–413. Cited by: §1.1.
  • D. Halperin and M. H. Overmars (1998) Spheres, molecules, and hidden surface removal. Computational Geometry 11 (2), pp. 83–102. Cited by: §4.1.
  • M. Krone, B. Kozlikova, N. Lindow, M. Baaden, D. Baum, J. Parulek, H.-C. Hege, and I. Viola (2016) Visual analysis of biomolecular cavities: state of the art. Computer Graphics Forum 35 (3), pp. 527–551. Cited by: §1.
  • J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, and S. Subramaniam (1998a) Analytical shape computation of macromolecules: i. molecular area and volume through alpha shape. Proteins Structure Function and Genetics 33 (1), pp. 1–17. Cited by: §1.
  • J. Liang, H. Edelsbrunner, P. Fu, P.V. Sudhakar, and S. Subramaniam (1998b) Analytical shape computation of macromolecules: II. inaccessible cavities in proteins. Proteins Structure Function and Genetics 33 (1), pp. 18–29. Cited by: §1.
  • J. Liang, H. Edelsbrunner, and C. Woodward (1998c) Anatomy of protein pockets and cavities. Protein Science 7 (9), pp. 1884–1897. Cited by: §1.
  • P. Mach and P. Koehl (2011) Geometric measures of large biomolecules: surface, volume, and pockets. Journal of Computational Chemistry 32 (14), pp. 3023–3038. Cited by: §1.1, §1, §5.
  • T. B. Masood and V. Natarajan (2016) An integrated geometric and topological approach to connecting cavities in biomolecules. In IEEE Pacific Visualization Symposium (PacificVis), pp. 104–111. Cited by: §1.
  • T. B. Masood, S. Sandhya, N. Chandra, and V. Natarajan (2015) ChExVis: a tool for molecular channel extraction and visualization. BMC Bioinformatics 16 (1), pp. 1–19. Cited by: §1.
  • A. Nanjappa (2012) Delaunay triangulation in on the GPU. Ph.D. Thesis, National University of Singapore. Cited by: §1.1.
  • R. Sridharamurthy, T. B. Masood, H. Doraiswamy, S. Patel, R. Varadarajan, and V. Natarajan (2016) Extraction of robust voids and pockets in proteins. In Visualization in Medicine and Life Sciences III: Towards Making an Impact, L. Linsen, H. Hege, and B. Hamann (Eds.), pp. 329–349. External Links: ISBN 978-3-319-24523-2 Cited by: §1.
  • [29] (2017) Thrust. Note: https://developer.nvidia.com/thrust[Online; accessed 19-December-2017] Cited by: §4.5.
  • D. F. Watson (1981) Computing the n-dimensional Delaunay tessellation with application to Voronoi polytopes. The Computer Journal 24 (2), pp. 167–172. Cited by: §1.1.