Many applications in science and engineering require computing “features” in a shape that is finitely represented by a simplicial complex. These features sometimes include topological features such as “holes” and “tunnels” present in the shape. A concise definition of these otherwise vague notions can be obtained by considering homology groups and their representative cycles. In particular, a one-dimensional homology basis, that is, a set of independent cycles in the 1-skeleton of the input simplicial complex whose homology classes form a basis for the first homology group, can be taken as a representative of the “holes” and “tunnels” present in the shape. However, instead of an arbitrary basis, one would like to have a homology basis whose representative cycles are small under some suitable metric, thus bringing the ‘geometry’ into the picture along with the topology.
When the input complex is a graph with vertices and edges, the homology basis coincides with what is called the cycle basis and its minimality is measured with respect to the total weights of the cycles assuming non-negative weights on the edges. A number of efficient algorithms have been designed to compute such a minimal cycle basis for a weighted graph [1, 6, 7, 5, 8]. The best known algorithm for this case runs in .
When the input is a simplicial complex, a one-dimensional homology basis is determined by the simplices of dimension up to 2. Thus, without loss of generality, we can assume that the complex has dimension at most 2, that is, it consists of vertices, edges, and triangles. The 1-skeleton of the complex is a graph (weighted if the edges are). Therefore, one can consider a minimal cycle basis in the 1-skeleton. However, the presence of triangles makes some of these basis elements trivial in homology. Therefore, the computation of a minimal homology basis in a simplicial complex differs from that of a minimal cycle basis in a graph. In this paper, we show that the efficient algorithms of  for computing a minimal cycle basis can be adapted to computing a minimal homology basis in a simplicial complex (by combining them with an algorithm  to compute the so-called annotations). In the process we improve the current best time complexity bound for computing a minimal homology basis and also extend these results to more general measures.
More specifically, for the special case of a combinatorial -manifold with weights on the edges, Erickson and Whittlesey  gave an -time algorithm to compute a minimal homology basis where is the total number of simplices and is the rank of the first homology group. Dey et al.  and Chen and Friedman  generalized the results above to arbitrary simplicial complexes. Busaryev et al.  improved the running time of this generalization from  to where  is the matrix multiplication exponent. This gives the best known worst-case time algorithm when . In Section 3, combining the divide and conquer approach of  with the use of annotations , we develop an improved algorithm to compute a minimal 1-dimensional homology basis for an arbitrary simplicial complex in only time. Considering , this gives the first worst-case time algorithm for the problem.
We can further improve the time complexity if we allow for approximations. An algorithm to compute a 2-approximate minimal homology basis is given in Section 4, running in expected time.
All of the above algorithms operate by computing a set of candidate cycles that necessarily includes at least one minimal homology basis and then selecting one of these minimal bases. The standard proof  of the fact that the candidate set includes a minimal basis uses the specific distance function based on the shortest path metric and a size function that assigns the total weight of the edges in a cycle as its size. In Section 5, we identify general conditions on the distance and size functions under which the divide and conquer algorithm still works without any loss in time complexity. This allows us to consider distance functions beyond the shortest path metric and size functions beyond the total weight of edges, as we illustrate with two examples. Specifically, we can now compute a minimal homology basis whose size is induced by a general map for any metric space .
2 Background and notations
In this paper, we are interested in computing a minimal basis for the 1-dimensional homology group of a simplicial complex over the field $\mathbb{Z}_2$. In this section we briefly introduce the relevant concepts; the details appear in standard books on algebraic topology such as .
Let $K$ be a connected simplicial complex. A $d$-chain $c$ is a formal sum $c = \sum_i a_i \sigma_i$, where the $\sigma_i$ are the $d$-simplices of $K$ and the $a_i$ are coefficients with $a_i \in \mathbb{Z}_2$. We use $C_d(K)$ to denote the group of $d$-chains, which is formed by the set of $d$-chains together with addition. Note that there is a one-to-one correspondence between the chain group $C_d(K)$ and the family of subsets of $K_d$, where $K_d$ is the set of all $d$-simplices. Thus $C_d(K)$ is isomorphic to the space $\mathbb{Z}_2^{n_d}$, where $n_d$ is the number of $d$-simplices in $K$. Naturally, all $d$-simplices in $K$ form a basis of $C_d(K)$ in which the $i$-th bit of the coordinate vector of a $d$-chain indicates whether the corresponding $d$-simplex appears in the chain.
The boundary of a $d$-simplex is the sum of all its $(d-1)$-faces. This can be interpreted and extended to a $d$-chain as a boundary map $\partial_d : C_d(K) \to C_{d-1}(K)$, where the boundary of a chain is defined as the sum of the boundaries of its simplices. A $d$-cycle $c$ is a $d$-chain with empty boundary, $\partial_d c = 0$. Since $\partial_d$ commutes with addition, we have the group of $d$-cycles, $Z_d(K)$, which is the kernel of $\partial_d$, $Z_d(K) = \ker \partial_d$. A $d$-boundary $c$ is a $d$-chain that is the boundary of a $(d+1)$-chain, $c = \partial_{d+1} b$ for some $b \in C_{d+1}(K)$. The group of $d$-boundaries $B_d(K)$ is the image of $\partial_{d+1}$, that is, $B_d(K) = \mathrm{im}\, \partial_{d+1}$. Notice that $B_d(K)$ is a subgroup of $Z_d(K)$. Hence we can consider the quotient $Z_d(K)/B_d(K)$, which constitutes the $d$-dimensional homology group, denoted $H_d(K)$. Each element in $H_d(K)$, called a homology class, is an equivalence class of $d$-cycles whose difference is always in $B_d(K)$. Two cycles are said to be homologous if they are in the same homology class.
Under $\mathbb{Z}_2$ coefficients, the groups , , and are all vector spaces. A basis of a vector space is a set of vectors of minimal cardinality that generates the entire vector space. We are concerned with the homology bases of and particularly of (more formally below). We use to denote the dimension of vector space and use to denote the 1-st Betti number of , which is the dimension of vector space .
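The definitions above can be made concrete with a minimal sketch that computes the first Betti number over $\mathbb{Z}_2$ as the dimension of the cycle space minus the dimension of the boundary space. The function names (`rank_gf2`, `boundary_rows`, `first_betti`) and the toy complex are illustrative assumptions, and the plain elimination used here is not the matrix-multiplication-time method referred to in the paper.

```python
from itertools import combinations

def rank_gf2(rows):
    """Rank over Z_2 of a matrix whose rows are given as integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit
    return len(pivots)

def boundary_rows(simplices, faces):
    """One bitmask per simplex: bit j is set iff faces[j] is a facet of it."""
    index = {f: j for j, f in enumerate(faces)}
    rows = []
    for s in simplices:
        mask = 0
        for f in combinations(s, len(s) - 1):
            mask |= 1 << index[f]
        rows.append(mask)
    return rows

def first_betti(vertices, edges, triangles):
    """dim H_1 = dim ker(boundary on edges) - dim im(boundary on triangles)."""
    d1 = boundary_rows(edges, [(v,) for v in vertices])
    d2 = boundary_rows(triangles, edges)
    dim_cycles = len(edges) - rank_gf2(d1)
    dim_boundaries = rank_gf2(d2)
    return dim_cycles - dim_boundaries

# Hypothetical toy complex: simplices are sorted tuples of vertex ids.
vertices = [0, 1, 2, 3]
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 3)]
hollow = first_betti(vertices, edges, [])           # two independent cycles
filled = first_betti(vertices, edges, [(0, 1, 2)])  # one cycle becomes trivial
```

Filling the triangle (0, 1, 2) turns one of the two independent graph cycles into a boundary, which is exactly why a minimal cycle basis of the 1-skeleton need not be a homology basis.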
A set of cycles , with , that generates the cycle space is called its cycle basis.
For any -cycle , let denote its homology class. A set of homology classes that constitutes a basis of is called a homology basis. For simplicity, we also say a set of cycles is a homology basis if their corresponding homology classes form a basis for .
Let be a size function that assigns a non-negative weight to each cycle . A cycle or homology basis is called minimal if is minimal among all bases of () or () respectively.
To compute a minimal homology basis of a simplicial complex , it is necessary to have a way to represent and distinguish homology classes of cycles. Annotated simplices have been used for this purpose in earlier works: for example, Erickson and Whittlesey  and Borradaile et al.  used them for computing optimal homology cycles in surface embedded graphs. Here we use a version termed annotation from , which gives an algorithm to compute them in matrix multiplication time for general simplicial complexes. An annotation for a -simplex is a -bit binary vector, where . The annotation of a cycle , which is the sum of annotations of all simplices in , provides the coordinate vector of the homology class of in a pre-determined homology basis. More formally,
Definition 1 (Annotation)
Let be a simplicial complex and be the set of -simplices in . An annotation for -simplices is a function with the following property: any two -cycles and are homologous if and only if
Given an annotation , the annotation of any -cycle is defined by .
Proposition 2.1 ()
There is an algorithm that annotates the -simplices in a simplicial complex with simplices in time.
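As an illustration of how annotations are used once they are available, the sketch below stores each edge annotation as an integer bitmask and obtains the annotation of a cycle by XOR-ing the annotations of its edges. The toy complex and its hand-built annotation are assumptions made for this example; the annotation algorithm of the proposition above is not reproduced here.

```python
from functools import reduce

# Hypothetical toy complex: vertices 0..3, a filled triangle (0, 1, 2) and an
# unfilled triangle (0, 2, 3), so the first homology group has rank 1.
# Hand-built annotation: one bit per edge, stored as an int.
annotation = {
    (0, 1): 0, (1, 2): 0, (2, 3): 0,  # spanning-tree edges get the zero vector
    (0, 2): 0,                        # closes the filled triangle: stays zero
    (0, 3): 1,                        # closes the hole
}

def cycle_annotation(cycle_edges):
    """Sum over Z_2 (XOR) of the annotations of the edges of a cycle."""
    return reduce(lambda acc, e: acc ^ annotation[e], cycle_edges, 0)

def homologous(c1, c2):
    """Two cycles are homologous iff their annotations coincide."""
    return cycle_annotation(c1) == cycle_annotation(c2)

filled_boundary = [(0, 1), (1, 2), (0, 2)]  # bounds the filled triangle
hole = [(0, 2), (2, 3), (0, 3)]             # goes around the hole
outer = [(0, 1), (1, 2), (2, 3), (0, 3)]    # hole plus the filled boundary
```

Here `hole` and `outer` differ by the boundary of the filled triangle, so they receive the same annotation, while `filled_boundary` receives the zero vector.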
3 Minimal homology basis
In this section, we describe an efficient algorithm to compute a minimal homology basis of the 1-dimensional homology group . The algorithm uses the divide and conquer technique from  where they compute a minimal cycle basis in a weighted graph. The authors in  adapted it for computing optimal homology basis in surface embedded graphs. We adapt it here to simplicial complexes using edge annotations .
More specifically, let be a simplicial complex with simplices. Since we are only interested in a 1-dimensional homology basis, it is sufficient to consider all simplices with dimension up to 2, namely vertices, edges, and triangles. Hence we assume that contains only simplices of dimension at most 2.
Assume that the edges in are weighted with non-negative weights.
Given any homology basis where , we define the size of a cycle as the total weight of its edges. As defined in Section 2, the problem of computing a minimal homology basis of is now to find a basis such that the sum of is the smallest.
The high-level algorithm to compute such a minimal homology basis of group proceeds as follows. First, we annotate all 1-simplices using the algorithm of . Then we compute a candidate set of cycles which includes a minimal homology basis. Finally, we extract such a minimal homology basis from the candidate set.
We now describe the step to compute a candidate set of cycles that contains a minimal homology basis. We use the shortest path tree approach which dates back to Horton’s algorithm for a minimal cycle basis of a graph . It was also applied in other earlier works, e.g. [9, 2]. We first generate a candidate set for every vertex , where is the set of vertices of . Then we take the union of all and denote as , i.e. . To compute , first we construct a shortest path tree rooted at . Let denote the unique path connecting two vertices and in . Then each nontree edge generates a cycle . The union of all such cycles constitutes the candidate set of the vertex , i.e. where is the set of tree edges in . Note that the number of cycles in is for each vertex . Hence there are candidate cycles in in total. They, together with their sizes, can be computed in time.
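The candidate set construction can be sketched as follows, assuming the 1-skeleton is given as a list of weighted edges of a simple connected graph. The names `shortest_path_tree` and `candidate_cycles` are illustrative. Each candidate is recorded as a triple (size, root, non-tree edge) rather than as an explicit edge list, and the recorded size is the length of the closed walk formed by the two tree paths and the edge.

```python
import heapq

def shortest_path_tree(adj, root):
    """Dijkstra from root; returns (dist, parent) with parent[root] = None."""
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, parent

def candidate_cycles(weighted_edges):
    """One candidate (size, root, edge) per root p and non-tree edge of T_p."""
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    candidates = []
    for p in adj:
        dist, parent = shortest_path_tree(adj, p)
        for u, v, w in weighted_edges:
            # Assumes a simple graph, so (u, v) is a tree edge exactly when
            # one endpoint is the parent of the other.
            if parent.get(u) == v or parent.get(v) == u:
                continue
            # Closed walk: tree path p..u, the edge (u, v), tree path v..p.
            candidates.append((dist[u] + dist[v] + w, p, (u, v)))
    return candidates

# Hypothetical input: a unit-weight triangle graph.
triangle = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0)]
cands = candidate_cycles(triangle)
```

With n vertices and m edges in a connected graph, every root contributes m − n + 1 candidates, in line with the counting argument in the paragraph above.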
3.1 Computing a minimal homology basis
What remains is to compute a minimal homology basis from the candidate set . To achieve this, we modify the divide and conquer approach from  which improved the algorithm of  for computing a minimal cycle basis of a graph with non-negative weights.
This approach uses an auxiliary set of support vectors  that helps select a minimal homology basis from a larger set containing at least one minimal basis; in our case, this larger set is .
A support vector is a vector in the space of -dimensional binary vectors . The use of support vectors along with annotations requires us to perform more operations without increasing the complexity of the divide and conquer approach. Let denote the annotation of a cycle . First, we define the function:
We say a cycle is orthogonal to a support vector if and is non-orthogonal if . We would choose cycles , , iteratively from a set guaranteed to contain a minimal homology basis and add them to the minimal homology basis. During the procedure, the algorithm always maintains a set of support vectors with the following properties:
form a basis of .
If have already been computed, , .
Suppose that in addition to properties (1) and (2), we have the following additional condition to choose s, then the set constitutes a minimal homology basis.
If have already been computed, is chosen so that is the shortest cycle with .
If we keep the same support vectors, after we select a new cycle , may not hold which means the property (2) may not hold. Therefore, we update the support vectors after computing so that the orthogonality condition (2) holds. If chosen with condition (3), the cycle becomes independent of the cycles previously chosen as stated below:
For any , if property (1) and (2) hold, then for any cycle with , is independent of .
Proof. By property (2), . If is not independent of , then the annotation of the cycle can be written as , where and at least one . Since , we have . It follows that there exists at least one , , with , which contradicts property (2). Therefore, is independent of . ∎
The set computed by maintaining properties (1), (2) and (3) is a minimal homology basis.
Taking advantage of the above theorem, we aim to compute a homology basis iteratively while maintaining conditions (1), (2), and (3).
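To make properties (1), (2), and condition (3) concrete, here is a minimal sketch of the simpler, non-recursive way of maintaining support vectors: after each cycle is selected, every later support vector that is non-orthogonal to it is replaced by its sum with the current one. This is the classical iterative update, not the divide and conquer procedure described in this section, and it is slower. The input format (pairs of size and annotation bitmask) and the name `select_basis` are assumptions for illustration.

```python
def inner(s, a):
    """Inner product over Z_2 of two bit vectors stored as ints."""
    return bin(s & a).count("1") & 1

def select_basis(candidates, g):
    """Pick g cycles from (size, annotation) pairs.

    Assumes the candidates contain a homology basis; otherwise next() raises
    StopIteration.
    """
    candidates = sorted(candidates)
    support = [1 << i for i in range(g)]  # S_i starts with only bit i set
    basis = []
    for i in range(g):
        # Condition (3): the shortest candidate non-orthogonal to S_i.
        chosen = next(c for c in candidates if inner(support[i], c[1]))
        basis.append(chosen)
        # Restore property (2): make the later S_j orthogonal to the new cycle.
        for j in range(i + 1, g):
            if inner(support[j], chosen[1]):
                support[j] ^= support[i]
    return basis

# Hypothetical candidates for g = 2, as (size, annotation) pairs.
cands = [(2, 0b11), (3, 0b01), (10, 0b10)]
basis = select_basis(cands, 2)
```

In this toy input the cycle of size 10 is never selected: once the cycle with annotation 11 is chosen, the second support vector becomes 11, and the cycle with annotation 01 is the shortest one non-orthogonal to it.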
3.1.1 Maintaining support vectors and computing shortest cycles.
Now we describe the algorithm CycleBasis() (given in Algorithm 1) that computes a minimal homology basis. In this algorithm, we first initialize each support vector so that only the -th bit is set to 1. Then the main computation is done by calling the procedure ExtendBasis().
The procedure ExtendBasis(, ) (Algorithm 2) is recursive and extends the current partial basis by adding new cycles. It modifies the divide and conquer approach of  to maintain properties (1), (2), and (3). It calls a routine Update to maintain orthogonality using annotations. To choose the shortest cycle satisfying condition (3), it calls ShortestCycle() in the base case () (see line 3 of Algorithm 2). We describe the recursion and the base case below.
At a high level, the procedure ExtendBasis(, ) recurses on by first calling itself to obtain the next cycles in the minimal homology basis, during which the support vectors are updated. Then it calls the procedure Update(, ) to maintain the orthogonality property (2). It uses the already updated support vectors to update so that . Finally, the procedure ExtendBasis(, ) calls itself as ExtendBasis(, ) to extend the basis by elements.
We describe Update(, ) in words and omit its pseudocode. Let denote the desired output vectors after the update. To ensure properties (1) and (2), we will enforce that the vector is of the form where . We just need to determine the coefficients so that where and . We will also compute for and every edge where is defined as the standard inner product of and under , which is important later when we compute the shortest cycle orthogonal to a support vector in the procedure ShortestCycle().
where recall that -bit vector is the annotation of a cycle . Let denote a matrix where row contains the bit . It is not difficult to see that , and that is invertible, which means that since the computations are under .
The next step is to update the value to for every edge in , and . Note that the coefficients are now known and the updated vectors are , . Thus for every edge , , . Let be the number of edges in and be the matrix where its entry is . Set where is the identity matrix. Let be a matrix whose entry is . Thus we have . Since the matrix and matrix are already known, the matrix can be computed in time by chopping to number of submatrices and performing matrix multiplications of two size matrices. After that, can be easily retrieved from the matrix in constant time.
Base case for selecting a shortest cycle.
We now implement the procedure ShortestCycle() for the base case to compute the shortest cycle non-orthogonal to , i.e. the shortest cycle satisfying . We assign a label to each vertex and . Labeling has been used to solve many graph related problems previously [2, 3, 8].
Given a vertex and the shortest path tree rooted at , let for any vertex denote the unique tree path in from to , and let denote the value . Let denote the parent of in tree and denote the edge between and . Then . Thus for a fixed , we can traverse the tree from the root to the leaves and compute the label for all vertices in time as for every edge is already precomputed earlier in the procedure Update and can be queried in O(1) time. Thus the total time to compute labels for all is .
Now given a fixed vertex and the shortest path tree , we consider every cycle , where is a non-tree edge. We partition the cycle into three parts: the tree path , the tree path and the edge . Thus , which can be computed in time as all labels are precomputed. Note that there are cycles in the candidate set to be computed. It follows that, in total time, one can compute for all cycles and find the smallest one.
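The labeling scheme of the last two paragraphs can be sketched as follows: one traversal of the tree computes a one-bit label per vertex, after which the value of each candidate cycle is the XOR of the two endpoint labels and the value of the non-tree edge. The data layout (a `children` map for the tree and a dictionary `edge_val` holding the precomputed inner product of the support vector with each edge annotation) is an assumption for illustration.

```python
def vertex_labels(children, root, edge_val):
    """label[v] = sum over Z_2 of edge_val along the tree path root..v."""
    label = {root: 0}
    stack = [root]
    while stack:
        u = stack.pop()
        for v in children.get(u, []):
            label[v] = label[u] ^ edge_val[frozenset((u, v))]
            stack.append(v)
    return label

def cycle_value(label, edge_val, u, v):
    """Value of the cycle through the root closed by the non-tree edge (u, v)."""
    return label[u] ^ label[v] ^ edge_val[frozenset((u, v))]

# Hypothetical data: a filled triangle (0, 1, 2), a hole (0, 2, 3), a support
# vector with its single bit set, and a tree rooted at vertex 0.
edge_val = {
    frozenset((0, 1)): 0, frozenset((1, 2)): 0, frozenset((2, 3)): 0,
    frozenset((0, 2)): 0, frozenset((0, 3)): 1,
}
children = {0: [1, 2, 3]}
label = vertex_labels(children, 0, edge_val)
trivial = cycle_value(label, edge_val, 1, 2)     # boundary of filled triangle
nontrivial = cycle_value(label, edge_val, 2, 3)  # goes around the hole
```

Each query touches three stored bits, which is what makes the per-cycle cost constant once the labels for the current root are in place.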
3.2 Correctness and time complexity
To prove the correctness of Algorithm 1, it is crucial to guarantee that the support vectors s and the cycles s satisfy the desirable properties. First, the set of support vectors is a basis of because of the construction of s in the procedure Update. The property that , holds, because the procedure Update ensures that is taken as a non-trivial solution to a set of linear equations , , which always admits at least one solution. Similarly, for any , there exists at least one cycle such that the equation holds since both and at this point only form partial basis of a space with dimension . In the base case, ShortestCycle computes this cycle satisfying exactly this property. Then, Theorem 3.1 ensures the correctness of the algorithm.
The total running time of our algorithm is and the analysis is as follows. The time to annotate edges and construct the candidate set is from Proposition 2.1 and the candidate set construction in Section 3. When computing the basis, the time of the procedure CycleBasis is dominated by the time of ExtendBasis. For each , the time complexity of ExtendBasis(,) is bounded by the following recurrence:
Note that in the recursion, only the second parameter counts for the time complexity. Actually for each , the time complexity of ShortestCycle() in the base case is only as we argued earlier, that is, . Then the recurrence solves to . It follows that . Combined with the time for computing annotations and constructing the candidate set, the time complexity is .
4 An approximate minimal homology basis of
In this section, we present an algorithm to compute an approximate minimal 1-dimensional homology basis, where the approximation is defined as follows.
Definition 2 (Approximate minimal homology basis)
Suppose is a minimal homology basis for , and let denote the sequence of sizes of cycles in sorted in non-decreasing order. A set of cycles is a -approximate minimal homology basis for if (i) form a basis for ; and (ii) let denote the sequence of sizes of cycles in in non-decreasing order, then for any , .
In what follows, we provide a 2-approximation algorithm running in time. At a high level, we first compute a candidate set of cycles that is guaranteed to contain a 2-approximate minimal homology basis. We then extract a 2-approximate basis from the candidate set .
First, we explain the construction of a candidate set of cycles. Recall that in Section 3.1, we compute candidate cycles, each of which has the form , formed by together with the two tree-paths from the root to each endpoint of within the shortest path tree . We now apply the algorithm by Kavitha et al.  which can compute a smaller candidate set of cycles which is guaranteed to contain a 2-approximate minimal cycle basis (not homology basis) for graph (i.e., the 1-skeleton of the complex ) in expected time. Here, a cycle basis of the graph where is simply a set of cycles such that any other cycle from can be represented uniquely as a linear combination of cycles in . A minimal cycle basis is a cycle basis whose total weight is smallest among all cycle bases. A cycle basis is a -approximate minimal cycle basis if its total weight is at most times that of the minimal cycle basis, i.e., at most .
Now let the size of a cycle be the total weight of all edges in . Then, it turns out that, not only contains a -approximate minimal cycle basis w.r.t. this size, it also satisfies the following stronger property as proven in .
Proposition 2 ([12, Lemma 6.3])
There exists a minimal cycle basis such that, for any , there is a subset of the computed candidate set so that (i) and (ii) each cycle in has size at most .
Next, we prove that a candidate set satisfying conditions in Proposition 2 is guaranteed to also contain a 2-approximate minimal homology basis. We remark that if Proposition 2 does not hold, then the sole condition that contains a -approximate minimal cycle basis is not sufficient to guarantee that it also contains a -approximate minimal homology basis for any constant . A counter-example is given at the end of this section.
Given a set of cycles satisfying Proposition 2, there exists a minimal homology basis such that contains cycles with (i) form a homology basis, and (ii) , for .
Proof. Let be a minimal cycle basis which satisfies Proposition 2. It is known that it contains a minimal homology basis, which we set as .
Now by Proposition 2, for each , there exists a subset such that and , .
Assume w.l.o.g. that cycles in are in non-decreasing order of their sizes.
We now prove the lemma inductively. In particular,
Claim-A: For any , we show that there exists such that for each , (Cond-1) ; and (Cond-2) are independent.
The base case is straightforward: We can simply take as any cycle from that is not null-homologous (which must exist as is not null-homologous).
Now suppose the claim holds for . Consider the case for . By the induction hypothesis, there exists such that (Cond-1) and (Cond-2) hold. Now consider cycles in . Let denote the subgroup of generated by the homology classes of all cycles in . Since spans , the rank of is at least , which means there always exists a cycle such that is independent of . By definition of , there is an index such that which satisfies both (Cond-1) and (Cond-2). Thus Claim-A holds for as well.
The lemma then follows when . ∎
So far we have proved that the new candidate set always contains a 2-approximate minimal homology basis. What remains is to describe how to compute such an approximate basis from the candidate set . First, as in Algorithm 1, we compute the annotation of all edges in time. Let denote the annotation of an edge in the complex ; recall that is a -bit vector with . Also recall that given a cycle , its annotation represents the homology class of this cycle, and two cycles are homologous if and only if they have the same annotation vectors.
Now order the cycles in , where , in non-decreasing order of their sizes. We will compute the annotation of all cycles in and put them in the matrix , whose -th column represents the annotation vector for the cycle . Since contains a homology basis of (Lemma 1), .
First, we explain how to compute the annotation matrix efficiently. Let denote all edges from . Let denote the matrix where ; that is, non-zero entries of the -th column encode all edges in the cycle . Let denote the matrix where the -th column encodes the annotation of edge . It is easy to see that . Instead of computing the multiplication directly, we partition the matrix top-down into submatrices each of size at most . For each such submatrix, its multiplication with can be done in matrix multiplication time. Thus computing the multiplication takes time in total as . In other words, we can compute the annotation matrix in as .
We now compute a 2-approximate minimal homology basis from . Here we use so-called earliest basis. Specifically, in general, given a matrix with rank , the set of column vectors is called an earliest basis for the vector space spanned by all columns in (or simply, for ), if the column indices are the lexicographically smallest index set such that the corresponding columns of have full rank.
Proposition 3 ()
Let be an matrix of rank with entries over where , then there is an time algorithm to compute the earliest basis of .
Let be the indices of columns in the earliest basis of . This can be done in time by the above proposition as . The cycles corresponding to these columns form a homology basis by the properties of annotations .
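The notion of an earliest basis can be illustrated with a minimal sketch that scans the columns from left to right and keeps a column exactly when it is linearly independent over $\mathbb{Z}_2$ of the columns kept so far. This plain elimination is only meant to show what is being computed; it is not the fast algorithm of Proposition 3. Columns are assumed to be given as integer bitmasks, already sorted by cycle size, and the name `earliest_basis` is illustrative.

```python
def earliest_basis(columns):
    """Indices of the lexicographically smallest set of independent columns.

    columns: list of ints, each the bitmask of a column vector over Z_2,
    listed in priority order (here: non-decreasing cycle size).
    """
    pivots = {}  # leading bit position -> reduced column with that leading bit
    chosen = []
    for idx, col in enumerate(columns):
        while col:
            top = col.bit_length() - 1
            if top not in pivots:
                pivots[top] = col   # independent of all earlier columns
                chosen.append(idx)
                break
            col ^= pivots[top]      # reduce by an earlier pivot
    return chosen

# Hypothetical annotation columns for five cycles sorted by size.
cols = [0b00, 0b11, 0b11, 0b01, 0b10]
picked = earliest_basis(cols)
```

The first column is null-homologous and the third is homologous to the second, so the scan keeps the second and fourth columns and skips the rest.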
Finally, we note that the earliest basis of has the lexicographically smallest size sequence. Hence its total size is at most the size of the 2-approximate minimal homology basis specified in Lemma 1. Putting everything together, we conclude with the following theorem.
The algorithm above computes a 2-approximate minimal homology basis of the 1-dimensional homology group of a simplicial complex with non-negative weights in expected time.
Since an approximate minimal homology basis still forms a basis for , computing it is at least as hard as computing the rank of . Currently the best algorithm for rank computation for general simplicial complexes runs in (the matrix multiplication time). Hence the best we can expect for computing an approximate minimal homology basis is perhaps (versus the time complexity of the exact algorithm from Section 3.1). We remark that we can also develop an algorithm that computes a -approximate minimal homology basis in time , where is an integer; indeed, as the approximation factor reaches , the time complexity becomes (which is the best time known for rank computation). The framework of this algorithm follows closely an approach by Kavitha et al. in , and we thus omit the details here.
4.0.2 A counter-example.
Figure 1 gives an example which shows that, without Proposition 2, it is not guaranteed that a candidate set containing a -approximate minimal cycle basis includes a -approximate minimal homology basis. Let the size of a 1-cycle in shown in the figure be the sum of the weights of all edges in the cycle. There is only one minimal cycle basis in this figure, namely and , as shown in Figure 1(b). The minimal homology basis of should be . However, consider the candidate set which contains 4 cycles as shown in Figure 1(c): and . It is easy to check that these 4 cycles in form a 2-approximate minimal cycle basis. However, the smallest homology basis contained in , namely is not a 2-approximate minimal homology basis.
We can make this example into a counter-example for any constant factor approximation, by adding more ’s (triangles) to the sequence, each of which is larger than the previous one and is also filled in. In other words, the optimal homology basis remains , while the smallest-size homology basis from the 2-approximate minimal cycle basis is .
5 Generalizing the size measure
The 1-skeleton of the simplicial complex is the set of vertices and edges in . If there are non-negative weights defined on edges in , it is natural to use the induced shortest path distance in (viewed as a weighted graph) as a metric for vertices in . One can then measure the “size” of a cycle to be the sum of edge weights. Indeed, this is the distance and the size measure considered in Sections 3 and 4. In this section, we show that the algorithmic framework in Algorithm 1 can in fact be applied to a more general family of size measures. Specifically, first, we introduce what we call the path-dominated distance between vertices of (which is not necessarily a metric). Based on such distance function, we then define a family of “size-functions” under which measure we can always compute a minimal homology basis using Algorithm 1. The shortest-path distance/size measure used in Section 3, and the geodesic ball-based measure proposed in  are both special cases of our more general concepts. We also present another natural path-dominated distance function induced by a (potentially complex) map defined on the vertex set of (where is another metric space, say ). As a result, we can use Algorithm 1 to compute the shortest 1-st homology basis of induced by a map .
5.1 Path-dominated distance
Given a connected simplicial complex , suppose we are given a distance function . We now introduce the following path-dominated distance function.
Definition 3 (Path-dominated distance)