The article in front of you is a much improved version of [W2], and is part of a series titled ’ALLSAT compressed with wildcards’. For more about this series as a whole see [W4, Section 9]. While the present article focuses on bare mathematics and algorithmics, three promising applications are outlined at the end of this introduction.
A simplicial complex (also called set ideal) based on a set is a family of subsets (called faces) such that from , , follows . Without further mention, in this article all structures will be finite. In particular all simplicial complexes contain maximal faces, called the facets of . Henceforth we stick to . A face of cardinality is a -face, and the set of all -faces is denoted as . The numbers are the face-numbers of the simplicial complex. Also important will be the minimal nonfaces of . For instance, if consists of all independent sets of a matroid then the facets are the bases of the matroid, and the minimal nonfaces are its circuits. The purpose of this article is to retrieve (from either the facets or the minimal nonfaces) the following data:
an enumeration of ;
an enumeration of for one arbitrary ;
the cardinality ;
the face-numbers for all .
We contribute to these well-researched problems on the theoretic, but more so on the practical side. Specifically, when computational efficiency lacks a theoretic underpinning it will be evidenced otherwise. The four tasks above can be phrased in terms of Boolean functions, but speaking of simplicial complexes is catchier. Like most (unfortunately not all) authors we take enumeration as a synonym for generation, thus not to be confused with mere counting. While task matches , there is a mismatch between and . Here is why: If we change to the calculation of one , then this (essentially) is just as hard. Our main effort will go into and because we strive for a compressed enumeration in both cases.
We start compression with the don’t-care symbol ’2’ (other authors write ) which, say, in signifies that both bitstrings and are allowed. This leads to 012-rows. For instance, the model set of a term like is the 012-row . There is a bijection between 012-rows of length and interval sublattices (sometimes called ’cubes’) of the Boolean lattice . For instance represents a 16-element cube with smallest and largest element and respectively. Each 01-row (=bitstring) can be viewed as an improper 012-row. As usual is isomorphic to the powerset of . Apart from ’2’, novel types of wildcards will be introduced. For instance, we will encounter 012e-rows and later even 012men-rows like (see Table 10), where respectively mean: at least one 1 here, at least one 0 here, at least one 1 and 0 here.
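To make the don’t-care symbol concrete, here is a minimal sketch (plain Python; the string encoding of a 012-row is our own choice, not the article’s) that expands a 012-row into the set of bitstrings it represents:

```python
from itertools import product

def expand_012_row(row):
    """Enumerate all bitstrings matched by a 012-row.

    row: a string over {'0', '1', '2'}; '2' is the don't-care symbol,
    so each '2' may independently be 0 or 1.
    """
    choices = [('0', '1') if c == '2' else (c,) for c in row]
    return [''.join(bits) for bits in product(*choices)]
```

A row with `k` don’t-cares thus expands to exactly `2**k` bitstrings, matching the 16-element cube of a length-4 all-2 row mentioned above.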
Here comes the Section break-up. Section 2 serves to disentangle, once and for all, so-called Hypergraph Dualization from the remaining Sections of our article. In Sections 3 to 7 the simplicial complex whose facets are
will accompany us. Although is ’simply’ , making such unions disjoint requires considerable effort.
In Section 3 we exclusively rely on the minimal nonfaces of simplicial complexes. With them our four problems can be solved smoothly. In contrast Sections 4 to 7 exclusively employ the facets of simplicial complexes to solve the four problems. Specifically, our novel solution to the -hard problems in Section 4, and similarly in Section 5, challenges Binary Decision Diagrams, i.e. another framework that springs to mind for counting models of fixed cardinality.
As to the core Section 6, note that amounts to enumerating the model set of a Boolean function given in disjunctive normal form (DNF). It is known that this can be done, bitstring by bitstring, in polynomial total time. If instead of 01-rows (=bitstrings) one allows (and encourages) disjoint 012-rows, one speaks of an exclusive sum of products (ESOP). For instance, our ’naive algorithm’ in 6.2 (based on binary search and depth-first search) transforms a simplicial complex given by its facets (i.e. an anti-monotone Boolean function in DNF) into an ESOP. Yet little compression is achieved. Our ’partitioning e-algorithm’ in 6.3 does better in this respect. In fact 012-rows are generalized to 012e-rows. It still operates in polynomial total time (Theorem 2), and numerical experiments show that it compares favorably to Mathematica’s standard ESOP command.
Section 7 tackles in two ways. One is efficient but lacks a neat theoretic assessment, whereas the other solves in slow one-by-one fashion but provably (Theorem 3) works in polynomial total time. Section 8 glimpses at a ’Second partitioning e-algorithm’ which generalizes 012e-rows to 012men-rows.
We now briefly mention three applications of our algorithms. The first application concerns a popular area of Data Mining that goes under the name Frequent Set Mining. Specifically the partitioning e-algorithm can compress all frequent sets from a knowledge of either the maximal frequent sets (i.e. the facets), or the minimal infrequent sets [W2, Section 8]. The second application concerns combinatorial commutative algebra, keywords being face-numbers, -polynomial, partitionable simplicial complex [W2, Section 7], [DKM]. The third application tackles the classic inclusion-exclusion formula with its inhumanely many summands. It is vexing that, on top of that, many summands are often zero. Pleasantly, the nonzero summands match a simplicial complex (aka ’nerve’) and our methods can be used to isolate the nerve beforehand. This and other features speed up classic inclusion-exclusion [W5].
2 Disentangling Hypergraph Dualization
In Sections 3 to 5 we employ an algorithm that resembles Hypergraph Dualization but should not be confused with it. Let us first recall that by Hypergraph Dualization (HD) one means the calculation of all minimal transversals of a hypergraph (= set system) . This has plenty of applications and many algorithms for HD have been proposed. The major unsolved problem is whether all minimal transversals can be generated in polynomial total time.
Consider any simplicial complex with facets and so on. Putting for any it holds that
Vice versa, suppose is given by its minimal nonfaces , and so forth. It then holds that
Notice the different types of sets and being complemented in (2) and (3). It follows from (2) that the minimal ’s with , i.e. the minimal nonfaces , are exactly the minimal transversals of the hypergraph .
Recall that in Section 3 we shall enumerate (in compact fashion) using its minimal nonfaces. In view of the easy going in Section 3 some readers may scorn the later Sections, and instead imagine HD being applied beforehand in order to recover the minimal nonfaces. Trouble is, the minimal nonfaces often by far outnumber the facets. On a small scale this is testified by (1) and (4) below. The upshot is that considering the facets as the only available data, as in Sections 4 to 7, is arguably worthwhile.
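For tiny instances, the minimal transversals spoken of above can be checked by brute force. The following sketch (an exponential-time illustration only, not one of the serious HD algorithms from the literature) filters all subsets of the universe:

```python
from itertools import combinations

def minimal_transversals(universe, hypergraph):
    """Brute-force all minimal transversals (= minimal hitting sets)
    of a hypergraph.  Exponential in |universe|; for illustration only."""
    hits = lambda T: all(T & E for E in hypergraph)  # T meets every edge
    transversals = [frozenset(c)
                    for r in range(len(universe) + 1)
                    for c in combinations(universe, r)
                    if hits(set(c))]
    # keep only the inclusion-minimal ones
    return [T for T in transversals
            if not any(S < T for S in transversals)]
```

For example, the hypergraph {{1,2},{2,3}} has the minimal transversals {2} and {1,3}.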
3 Assessing from its minimal nonfaces
Suppose that was given not by its facets listed in (1), but by its minimal nonfaces, which are these:
For instance, is not a subset of any in (1), but each 2-element subset of is contained in some . In Subsection 3.1 we shall see how task can be carried out smoothly for (or any ) if the minimal nonfaces are known. In Subsection 3.2 the same is done for . Problems and are dealt with in 3.3.
3.1 As for , note that coincides in view of (3) with the set of all noncovers of , i.e. for all . Hence applying the (noncover) -algorithm (this previously published algorithm will be discussed further as we go along) to delivers as a disjoint union of so-called -rows; in our case they are in Table 1.
Table 1: Compressing with the noncover -algorithm
By definition (say) encodes a certain set of length 9 bitstrings, each of which corresponds, in the usual way, to a face of . Specifically, we put zeros, ones, and twos. The value of for can freely be chosen as or , thus 2 is a don’t-care symbol. As to the wildcards of type (one may have more than one such wildcard per -row, as illustrated in 3.2.1), by definition they demand “at least one 0 here”. Thus belongs to iff , and or ). Viewed as sets of bitstrings our -rows are mutually disjoint; for instance since , whereas . It is evident that using Table 1 a ’traditional’ enumeration of face-by-face is easily achieved. However, compressions as in Table 1 are more useful, e.g. for optimization purposes.
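The membership test just described can be sketched as follows (Python; the encoding of a row by a dict of fixed symbols plus a list of wildcard position blocks is our own hypothetical choice):

```python
def in_c_row(bitstring, fixed, c_blocks):
    """Membership test for a row with don't-cares and 'at least one
    0 here' wildcards (hypothetical encoding).

    fixed:    dict position -> '0'/'1'/'2' for the non-wildcard part;
    c_blocks: list of position tuples, each demanding at least one 0.
    """
    for pos, sym in fixed.items():
        if sym != '2' and bitstring[pos] != sym:
            return False          # a fixed 0 or 1 is violated
    # every wildcard block must contain at least one 0
    return all(any(bitstring[p] == '0' for p in block) for block in c_blocks)
```

Two rows with the same fixed part but different wildcard patterns can still be disjoint, exactly as observed for the rows of Table 1.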
As announced in Section 2, the -algorithm bears an interesting relationship to HD. On the one hand it is superior to HD in that it yields not just its facets, but the total simplicial complex in a compressed format. By the same token it is inferior to HD in that often only the facets are required. An extra effort would be required to sieve them from the -rows. In our case to from (1) are to be found in respectively.
3.2 Here comes the theoretic assessment of .
Theorem 1: Assume the minimal nonfaces of the simplicial complex are known. Then can be represented as a disjoint union of many -rows in polynomial total time .
Proof. The minimal nonfaces in (4) suggest viewing (or any ) as the model set of the Boolean function
This is a Horn-CNF since each clause has at most one positive literal (in fact none). Generally, if is a Horn-CNF with clauses then the Horn--algorithm of [W1, Cor.6] enumerates as a union of many disjoint -rows in total polynomial time .
When the Horn-CNF has only negative clauses, as in (5), the Horn -algorithm boils down to the noncover -algorithm that we glimpsed in Section 3.1. Notice that the total polynomial time achieved in Theorem 1 is more than can be said about competing methods; more on that in Section 5. Obviously in view of disjoint rows. In practice, as we shall see in related circumstances (Section 6.3), often the gap between and is large.
3.3 As to problem , i.e. the enumeration of all -faces from the minimal nonfaces, this can be handled by processing the rows of Table 1 individually. Trouble is, unlike Theorem 1, it doesn’t yield a polynomial total time procedure to enumerate because there can be -rows for which is empty. For instance, choosing and the -row in Table 1 has . Snubbing ’the issue’ let us nevertheless refine our idea, willing to forsake a theoretic assessment. The gist of it is in 3.3.1, the subtleties follow in 3.3.2.
3.3.1 To fix ideas take as . The set of all -faces in can succinctly be written as
Generally the wildcard means “exactly many ’s here”. As to , it is obviously the disjoint union of and . These rows give rise to and in Table 2. The whole of Table 2 represents as disjoint union of -rows.
Table 2: Compressing by processing Table 1 with the -algorithm
In particular, the number of 3-faces in is
We hasten to add, when only the sheer size of is required, it can be calculated faster, such as in Section 5.
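The arithmetic behind a table such as Table 2 is a sum of binomial coefficients. The sketch below (Python; the encoding by counts of fixed 1s, don’t-cares and wildcard lengths is our own simplification) counts the bitstrings with exactly k ones in a single row whose wildcards demand “at least one 0 here”:

```python
from math import comb

def count_k_faces(num_ones, num_twos, c_lengths, k):
    """Number of bitstrings with exactly k ones in a row having
    `num_ones` fixed 1s, `num_twos` don't-cares, and wildcards of the
    given lengths, each demanding 'at least one 0 here'."""
    need = k - num_ones              # ones still to distribute
    if need < 0:
        return 0
    def ways(blocks, ones_left):
        if not blocks:               # remaining ones go to don't-cares
            return comb(num_twos, ones_left) if ones_left <= num_twos else 0
        eps = blocks[0]
        # j ones inside the block; j = eps (all ones) is forbidden
        return sum(comb(eps, j) * ways(blocks[1:], ones_left - j)
                   for j in range(min(eps - 1, ones_left) + 1))
    return ways(list(c_lengths), need)
```

Summing over all k recovers the row cardinality, a useful sanity check.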
3.3.2 What happens if, other than in Table 1, the -rows that constitute feature several -wildcards per row? For instance if
and , how would the esteemed reader enumerate one-by-one? This is less obvious than it seems at first glance, but according to [W3, Thm. 2] these kinds of enumeration are doable one-by-one in total polynomial time. Yet here we strive for a more compact enumeration, i.e. a disjoint union of -rows. That leads to the Flag of Bosnia (FoB) (a name that improves the previously introduced ’Flag of Papua’), which comes in two Types:
Figure 1: Flag of Bosnia of Type 1 Figure 2: Flag of Bosnia of Type 0
Thus can be written as the disjoint union of the four -rows that constitute Figure 2. By “multiplying out” the six FoBes of Type 0 associated to up to in (6) we would obtain as a disjoint union of many -rows. However, many of these 360 rows feature more than ten 1’s and thus should not be created in the first place. Fortunately it is easy to predict which -rows must be built. Because of we build those concatenations of rows of FoBes which have
Our Flags of Bosnia, call them to , have rows whose numbers of 1’s are respectively. Hence, in order to find, say, all ’s with (which is , we write 5 in all possible ways as a number composition, i.e. as an ordered sum of non-negative integers subject to the mentioned bounds:
For instance demands that the rows
of respectively be concatenated to yield . Adding the “rigid” entries of in (6) to gives
From this particular constituent -row of we obtain :
The sketched method will be called the -algorithm.
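The bounded number compositions used above can be enumerated with a short recursion. This is a minimal sketch (Python; function name and interface are our own):

```python
def bounded_compositions(total, bounds):
    """All ways to write `total` as an ordered sum (c_1, ..., c_m)
    with 0 <= c_i <= bounds[i]."""
    if not bounds:
        return [()] if total == 0 else []
    return [(c,) + rest
            for c in range(min(total, bounds[0]) + 1)
            for rest in bounded_compositions(total - c, bounds[1:])]
```

Each composition then dictates which rows of the respective Flags of Bosnia are concatenated.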
3.4 As to the counting problem (C), the cardinality of is readily obtained from Table 1:
Generally if is a -row with and with many -wildcards of length respectively, then
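The cardinality formula just stated is a plain product: every don’t-care doubles the count, and a wildcard of length eps demanding “at least one 0 here” contributes all 0-1 patterns except the all-ones one. A minimal sketch (Python, our own naming):

```python
def card_c_row(num_twos, c_lengths):
    """Cardinality of a row with `num_twos` don't-cares and wildcards
    of the given lengths, each demanding 'at least one 0 here'."""
    card = 2 ** num_twos              # free positions
    for eps in c_lengths:
        card *= 2 ** eps - 1          # exclude the all-ones pattern
    return card
```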
As to problem , each face-number of can again be calculated directly from Table 1, using the coefficients of some auxiliary polynomial. Details will be given in Section 5 in a very similar scenario.
4 Calculating the cardinality of from its facets
Recall from the Introduction that in Sections 4 to 7 we exclusively rely on the facets of simplicial complexes when tackling our four problems . Further recall from 3.1 that the (noncover) -algorithm outputs all noncovers of a set system in a compact fashion. In a dual way the (transversal) -algorithm of [W3] outputs in a compact way all transversals of a set system .
Consider now the simplicial complex whose facets are listed in (1). If we apply the -algorithm to then in view of (2) it outputs the set filter as a disjoint union of seven -rows:
Table 3: Compressing with the transversal -algorithm
An -wildcard requires the bitstrings to have “at least one 1 here”. Hence one calculates the cardinality of -rows as we did for -rows in (9). It follows that
This coincides with obtained in (9).
4.1 There is another way to calculate , i.e. using (classic) inclusion-exclusion. Since this involves the addition and subtraction of terms the procedure is only viable for small . In contrast, as shown in [W2, Sec.3.1] the -algorithm can handle much larger values .
5 Calculating the face-numbers of from its facets
Consider a generic -row
It is easy to see that the number Card of -element sets in equals the coefficient of in the row-polynomial
Details on the complexity of calculating these coefficients can be found in [W3, Theorem 1]. Here we simply apply the Mathematica command Expand to the polynomial induced by in Table 3 and obtain
Thus e.g. Card. Let be the number of -element transversals of , i.e. the number of -element sets of . By the above, all numbers
are readily calculated. Hence the face-numbers of (or any simplicial complex given by its facets) can be calculated with this ’subtraction trick’:
For instance . This matches (6) which was computed by other means. In view of the #P-hardness of we regard our threefold approach
call it the face-number e-algorithm, as a nice way to get the face-numbers from the facets. Apart from inclusion-exclusion (similar remarks as in 4.1 apply) and Binary Decision Diagrams (BDDs), few if any frameworks exist for counting fixed-cardinality models of Boolean functions . True, given a BDD of this task can be done in time linear in the size of the BDD (this exercise of Knuth is discussed in [W6]), but calculating the BDD in the first place cannot be done in total polynomial time, given that a random has a BDD of expected size . ’That’s just theory’, BDD aficionados may say. Be that as it may, the author is unfit to orchestrate a contest between the face-number e-algorithm and BDDs because BDDs are not hardwired in Mathematica (the fact that BDDs are supported by Python was nevertheless of some use in [W6]) and he exclusively programs in Mathematica. However, another method (exclusive sums of products) is hardwired in Mathematica and will be pitted against our wildcard technique in 6.4.
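As a stand-in for the Mathematica Expand step, the row-polynomial of a 012e-row can be computed with elementary coefficient arithmetic. The sketch below (Python; the coefficient-list representation is our own) uses the factors x for a fixed 1, (1+x) for a don’t-care, and (1+x)^eps − 1 for an e-wildcard of length eps (“at least one 1 here”):

```python
def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def row_polynomial(num_ones, num_twos, e_lengths):
    """Row-polynomial of a 012e-row: the coefficient of x^k counts
    the k-element sets in the row."""
    poly = [0] * num_ones + [1]          # factor x^num_ones
    for _ in range(num_twos):
        poly = poly_mul(poly, [1, 1])    # times (1 + x)
    for eps in e_lengths:
        block = [1]
        for _ in range(eps):
            block = poly_mul(block, [1, 1])
        block[0] -= 1                    # drop the all-zeros pattern
        poly = poly_mul(poly, block)     # times (1+x)^eps - 1
    return poly
```

Summing all coefficients recovers the cardinality of the row, consistent with the counting formulas of Sections 3 and 4.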
6 Enumeration of from its facets: The partitioning e-algorithm
Before we present in 6.3, and numerically evaluate in 6.4, the partitioning e-algorithm for compressing from a knowledge of its facets, we review two earlier lines of attack in 6.1 and 6.2.
6.1. We begin with the framework of -subsemilattices . If the set of meet-irreducibles (or any -generating set) is known then can be generated one-by-one in polynomial total time by a variety of algorithms. These algorithms are of interest, e.g., in Formal Concept Analysis [GO]. Ganter’s NextClosure algorithm [GO, p. 44] was the first and is still popular.
The relation to simplicial complexes is that they are highly specific -subsemilattices because ’closed under subsets’ implies ’-closed’. The structure of is easily detected: If is the set of all facets then
, and a moment’s thought confirms that a non-facet belongs to iff there is an index such that with , and such that implies . However, the fine structure of should be rather irrelevant since alone uniquely determines .
In [KP], which was inspired by NextClosure, not only the individual faces but all covering pairs of faces are generated from the facets in polynomial total time. That only is relevant is reflected by the fact that in [KP] the -subsemilattice defined by plays a prominent role. In [BM], which similarly caters for algebraic combinatorists, the individual faces are organized in a tree-structure. This supports various combinatorial operations (such as contracting edges) but offers no compression.
6.2. Our second framework is Boolean functions. Specifically the complements of the facets of match the terms of a Boolean function in DNF with model set . For instance, yields in view of (1) the DNF
Indeed, if satisfies, say, the last term in (17) then
. (Of course the DNF represents the same function as the CNF in (5).)
By orthogonalizing a Boolean function one means finding an equivalent DNF such that the model sets of any two distinct terms are disjoint. Such a DNF is often called an exclusive sum of products (ESOP). In our terminology orthogonalizing means representing as a disjoint union of 012-rows. If is given as a DNF, then one way [W4] to orthogonalize is to combine binary search with depth-first search. Although this was likely done before, the author could not find a reference. In any case, it is worthwhile seeing how the procedure simplifies for anti-monotone DNFs. As seen in this amounts to enumerating a simplicial complex given by its facets to . Instead of depth-first search we choose the equivalent, more visual framework of a Last-In-First-Out (LIFO) stack. The following definitions are handy. Call a 012-row (of length ) feasible if (which amounts to for some ). Call final if (which amounts to for some ).
Initially the LIFO stack contains only the feasible row . The top 012-row of the LIFO stack is always picked. Its ’first’ digit 2 (with respect to a fixed ordering of the index set ) is turned to 0 and 1 respectively (binary search). This yields 012-rows and . By induction was feasible. It follows that is feasible, but not necessarily . These one or two feasible 012-rows replace on the LIFO stack. Furthermore, was not final since by induction the LIFO stack contains no final 012-rows. It follows that is not final either, but could be. If is final then it does not go on the LIFO stack (being an exception to what was said above) but rather on the ’final stack’. As soon as the LIFO stack is empty the union of the 012-rows in the final stack is disjoint and equals . We call this method the naive algorithm.
Few of the delivered 012-rows may be proper. That also depends on the particular ordering of the index set of the 012-rows. For instance, using the natural ordering the naive algorithm represents our 52-element example as a disjoint union of nineteen 012-rows. The minimum (=13) and maximum (=44) number of final 012-rows are obtained (e.g.) for the orderings and respectively. See also Section 6.4.
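The naive algorithm of 6.2 can be sketched in a few lines (Python; facets are given as sets of positions, rows as strings, both our own encoding choices, and facets are assumed nonempty):

```python
def naive_esop(n, facets):
    """Naive orthogonalization of a simplicial complex given by its
    facets: split on the first '2' (binary search) with a LIFO stack.
    Returns disjoint 012-rows whose union is the whole complex."""
    def min_set(row):                    # all don't-cares set to 0
        return {i for i, c in enumerate(row) if c == '1'}
    def max_set(row):                    # all don't-cares set to 1
        return {i for i, c in enumerate(row) if c in '12'}
    feasible = lambda row: any(min_set(row) <= F for F in facets)
    final = lambda row: any(max_set(row) <= F for F in facets)

    start = '2' * n                      # feasible: its min element is {}
    if final(start):
        return [start]
    stack, out = [start], []             # stack holds feasible, non-final rows
    while stack:
        row = stack.pop()
        i = row.index('2')               # a feasible non-final row has a '2'
        for bit in '01':
            son = row[:i] + bit + row[i + 1:]
            if not feasible(son):
                continue                 # the 1-son may be infeasible
            (out if final(son) else stack).append(son)
    return out
```

For the facets {0,1} and {1,2} over three points the output rows partition the six faces of the complex.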
6.3. As far as the author can survey the ESOP landscape, wildcards beyond ’2’ (offering potentially higher compression) have not been used yet. That happens now, again merely for anti-monotone DNFs, and in the framework of simplicial complexes . Thus suppose has facets to , and by induction we have obtained for some a representation
with -rows . If is the -row matching then evidently
and so the key problem is this: For a given -row and -row recompress the set difference as disjoint union of -rows. Let us do away with the two extreme cases first. First, iff thus iff either a 1 or -wildcard of falls into zeros. Second, iff , thus iff zeros. For instance .
Table 4: Recompressing the type set difference
In all other cases we focus on the flexible (i.e. ) symbols of , thus for in Table 4 the symbols on the positions 1 to 11. The only way for to detach itself from (the ’plebs’ in) is to employ those flexible symbols of that are “above” a of , in the sense that they occupy a position which in is occupied by . For the particular and in Table 4 a bitstring detaches itself from iff ones. Depending on whether the smallest element of ones belongs to (this partition is dictated by the wildcard pattern of ), the bitstring belongs to exactly one of the sons . Notice that a variant of a Type 1 Flag of Bosnia, whose lower triangular part is rendered boldface, appears in Table 4.
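The wildcard-free special case of this recompression step is the classic difference of two cubes: detaching from at the first disagreeing position, then agreeing with it and detaching at the next, and so on (the flag pattern). A minimal sketch (Python, plain 012-rows only; the article’s version handles e-wildcards as well):

```python
def cube_difference(r, s):
    """Write the 012-row r minus the 012-row s as disjoint 012-rows."""
    flip = lambda b: '1' if b == '0' else '0'
    out, prefix = [], list(r)
    for i, (a, b) in enumerate(zip(r, s)):
        if b == '2' or a == b:
            continue                  # no constraint at this position
        if a != '2':
            return [r]                # fixed bits disagree: already disjoint
        row = prefix.copy()
        row[i] = flip(b)              # detach from s at position i
        out.append(''.join(row))
        prefix[i] = b                 # from now on agree with s up to i
    return out
```

The resulting rows are pairwise disjoint by construction: they first agree with s on an initial segment of the constrained positions and then differ from it.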
The powersets of the five facets of (see (1)) are listed as the first five -rows in Table 5. Applying detachment repeatedly yields:
One verifies that
which matches (11). We call this method the partitioning e-algorithm, as opposed to the transversal -algorithm of Section 4.
Table 5: Compressing with the partitioning e-algorithm
Theorem 2: Let be the facets of an (otherwise unknown) simplicial complex . Then the partitioning e-algorithm enumerates as a union of disjoint 012e-rows in time .
Proof. By induction assume that for some the decomposition (18) has been achieved. If some 012e-row is contained in then neither nor any of its sons and grandsons will survive in the long run. Thus is a dud, i.e. it causes work without benefit. Moreover, unless is cancelled right away, it is impossible to predict the algorithm’s total time. Fortunately, letting be the unique largest set in (thus is obtained by setting all ’s and ’s to ), it holds that
Testing for all whether with costs . In other words, that is the cost of pruning the righthand side of (18) from duds. What is the cost to get from a (pruned) representation (18) to a (not yet pruned) representation (19)? Because has at most sons (which is clear from Table 4), and ’writing down’ each son is obvious (i.e. costs ), the asked for cost is . Hence the overall cost is
6.4 As in 3.1, it can only be proven that , yet the numerical experiments below show that often . Specifically, for various values of we produced at random subsets (=facets) of , for simplicity all of uniform cardinality (=facet size). We compute the precise but (to save space) record only the rounded cardinality of the ensuing simplicial complex . Furthermore the number of final 012e-rows spawned by the partitioning e-algorithm, and its running time in seconds are recorded. Likewise is the number of exclusive products (= disjoint 012-rows) delivered by the Mathematica command BooleanConvert (option ’ESOP’), and is the corresponding running time. The fact that the partitioning e-algorithm is implemented in high-level Mathematica code, whereas BooleanConvert is hardwired, obviously disadvantages the partitioning e-algorithm. To what degree is hard to say but this is clear: Whenever the partitioning e-algorithm is faster than BooleanConvert, the former would look better still on a level playing field. Generally speaking, the partitioning e-algorithm dislikes many short facets (look at (w,h,fs)=(50,1000,10) ), but likes few large facets. Interestingly, in such situations it may even best the time of Mathematica’s hardwired SatisfiabilityCount: It took the partitioning e-algorithm 1114 seconds to squeeze the faces housed in 70 facets of size 300 into a mere 707518 many 012e-rows, whereas SatisfiabilityCount (which we only asked to count the faces) was stopped after fourteen fruitless hours. Whenever SatisfiabilityCount delivered a number, it coincided with the number of faces readily derived from the output of the partitioning e-algorithm. Hence, due to their very different methods of computation, both algorithms are very likely correct. A frownie :-( in Table 6 means that the algorithm ran, without finishing, for at least 5 hours. So much about versus .
As to versus , these numbers are more telling since they are independent of the particular implementations of the two algorithms. In all instances we found , for instance 637 many 012e-rows versus 11134 many 012-rows in the (2000,70,30) instance. Not only is larger than , the Mathematica command MemoryInUse (whatever its units) shows that also internally BooleanConvert is more memory-intensive than the partitioning e-algorithm. For example, in our random instance of type the before/after measurements were and for the partitioning e-algorithm, but and