    # Fast Multi-Subset Transform and Weighted Sums Over Acyclic Digraphs

The zeta and Moebius transforms over the subset lattice of n elements and the so-called subset convolution are examples of unary and binary operations on set functions. While their direct computation requires O(3^n) arithmetic operations, less naive algorithms only use 2^n poly(n) operations, nearly linear in the input size. Here, we investigate a related n-ary operation that takes n set functions as input and maps them to a new set function. This operation, we call multi-subset transform, is the core ingredient in the known inclusion–exclusion recurrence for weighted sums over acyclic digraphs, which extends Robinson's recurrence for the number of labelled acyclic digraphs. Prior to this work the best known complexity bound was the direct O(3^n). By reducing the task to multiple instances of rectangular matrix multiplication, we improve the complexity to O(2.985^n).

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

In this paper, we consider the following problem. We are given a finite set and, for each element , a function from the subsets of to some ring . The task is to compute the function given by

 g(T)=∑S⊆T∏i∈Tfi(S),T⊆U. (1)

We shall call the multi-subset transform of . While the present study of this operation on set functions stems from a particular application to weighted counting of acyclic digraphs, which we will introduce later in this section, we believe the multi-subset transform could also have applications elsewhere.

A straightforward computation of the multi-subset transform requires arithmetic operations (i.e., additions and multiplications in the ring ) when has elements. In the light of the input size , one could hope for an algorithm that requires operations. Some support for optimism is provided by the close relation to two similar operations on set functions: the zeta transform of and the subset convolution of and , given respectively by

 (fζ)(T)=∑S⊆Tf(S)and(f1∗f2)(T)=∑S⊆Tf1(S)f2(T∖S),T⊆U;

these unary and binary operations can be performed using [Yates1937, Kennes1992] and [Bjorklund2007] arithmetic operations, thus significantly beating the naive -computation. Indeed, consider the seemingly innocent replacement of “” by “” or “” in (1): either one would yield a variant that immediately (and efficiently) reduces to the zeta transform. Likewise, replacing the factor in the product by would give us an instance of subset convolution. The present authors do not see how to fix these “broken reductions”—the multi-subset transform could be a substantially harder problem not admiting a nearly linear-time algorithm. One might even be tempted to hypothesize that one cannot reduce the base of the exponential complexity below the constant . We refute this hypothesis:

The multi-subset transform can be computed using arithmetic operations.

We obtain our result by a reduction to rectangular matrix multiplication (RMM). The basic idea is to split the ground set into two halves and and divide the product over into two smaller products accordingly. In this way we can view (1) as a matrix product of dimensions . The two rectangular matrices are sparse, with at most non-zero elements out of the total . The challenge is to exploit the sparsity. Known algorithms for general sparse matrix multiplication [Yuster2005, Kaplan2006] turn out to be insufficient for getting beyond the bound (see Section 2.1 for details). Fortunately, in our case the sparsity occurs in a special, structured form that enables better control of zero-entries, and thereby a more efficient reduction to dense RMM. To get the best available constant base in the exponential bound, we call upon the recently improved fast RMM algorithms [Gall2018].

### 1.1 Application to weighted counting of acyclic digraphs

Let be the number of labeled acyclic digraphs on nodes. Robinson [Robinson1973] and Harary and Palmer [Harary1973], independently discovered the following inclusion–exclusion recurrence:

 an=n∑s=1(−1)s−1(ns)2s(n−s)an−s.

To see why the formula holds, view as the number of sink nodes (i.e., nodes with no children), each of which can choose its parent nodes freely form the remaining nodes.

Tian and He [Tian2009] generalized the recurrence to weighted counting of acyclic digraphs on a given set of nodes . Now every acyclic digraph on is assigned a modular weight, that is, a real-valued weight that factorizes into node-wise weights , where is the set of parents of node in . This counting problem has applications particularly in so-called Bayesian learning of Bayesian networks from data [Friedman2003, Tian2009, Talvitie2019]. Letting denote the weighted sum of acyclic digraphs on , we have

 aV=∑D∏i∈Vwi(Di)=∑∅≠S⊆V(−1)|S|−1(∏i∈S∑Di⊆V∖Swi(Di))aV∖S. (2)

The recurrence enables computing using arithmetic operations [Tian2009].

We will apply Theorem 1 to lower the base of the exponential bound:

The sum over acyclic digraphs with modular weights can be computed using arithmetic operations.

### 1.2 Related work

There are numerous previous applications of fast matrix multiplication algorithms to decision, optimization, and counting problems. Here we only mention a few that are most related to the present work.

Williams [Williams2005] employs fast square matrix multiplication to count all variable assignments that satisfy a given number of constraints, each involving at most two variables. By a simple reduction, this yields the fastest known algorithm for the Max-2-CSP problem. The present work is based on the same idea of viewing the product of a group of low-arity functions as a large matrix; this general idea is also studied in the doctoral thesis of the first author [Koivisto2004, Sects. 3.3 and 3.6], including reductions to RMM, however, without concrete applications.

Björklund, Kaski, and Kowalik [Bjorklund2017] apply fast RMM to show the following: Given a nonnegative integer and three mappings , , from the subsets of an -element set to some ring, one can sum up the products over all pairwise disjoint triplets of -sets using ring operations, where and are constants independent of and . Consequently, one can count the occurrences of constant-size paths (or any other small-pathwidth patterns) faster than in the “meet-in-the-middle time” [Bjorklund2017]. While the involvement of set functions and set relations bear a resemblance to those in multi-subset transform, the reduction of Björklund et al. is based on solving an appropriately constructed system of linear equations, and is thus very different from the combinatorial approach taken in the present work.

## 2 Fast multi-subset transform: proof of Theorem 1

We will develop an algorithm for multi-subset transform in several steps. In Section 2.1 we give the basic reduction to RMM and the idea of splitting the sum over into several smaller sums. Then, in Section 2.2 we present a simple implementation of the splitting idea, and get our first below- algorithm. This algorithm is improved upon in Section 2.3, yielding the claimed complexity bound. We end this section by presenting a more sophisticated splitting scheme in Section 2.4. We have not succeed to give a satisfactory analysis of its complexity. Yet, our numerical calculations suggest the bound .

We will denote by , for , the smallest value such that the product of an matrix by an can be computed using arithmetic operations for any constant ; for a formal definition of , see Gall and Urrutia [Gall2018]. Thus, the exponent of square matrix multiplication is .

We will make repeated use of the following facts about binomial coefficients:

###### Fact .

For integers and we have

 (2n)−1/2b(kn)n≤(nk)≤k∑j=0(nj)≤b(kn)n=2nH(k/n),

where

 b(x):=x−x(1−x)x−1andH(x):=log2b(x),x∈[0,1].

This can be proven using Stirling’s approximation to factorials.

###### Fact .

Let be a positive integer. The function is increasing in and strictly decreasing in .

This can be proven by observing that the ratio equals , and is thus decreasing in , and is greater or equal to exactly when .

### 2.1 Basic reduction to rectangular matrix multiplication

Assume without loss of generality that is even. Let us arbitrarily partition into two disjoint sets and , both of size . If , denote by and respectively the intersections and . Furthermore, write so that .

Armed with this notation, we write the multi-subset transform of set functions as

 g(T)=G(T1,T2):=∑S⊆UF1(T1,S)F2(T2,S),T⊆U, (3)

where we define

 Fp(Tp,S):=[S∩Up⊆Tp]∏i∈Tpfi(S),p=1,2.

Here the Iverson’s bracket notation evaluates to if is true, and to otherwise.

We can write the representation (3) in terms of a matrix product as

 G=F1F⊤2,

where is an matrix indexed in and is an matrix indexed in . As above, we will write the index pair in parentheses (not as subscripts).

Applying fast RMM without any further tricks already yields a somewhat competitive asymptotic complexity bound. To see this, recall that denotes the exponent of RMM of dimensions . Since [Gall2018], we get that , and thus , can be computed using arithmetic operations. If the lower bound was tight, we would achieve the bound .

So far, we have ignored the sparsity of the matrices . An entry is zero whenever the intersection is not contained in . Thus, out of the entries of , at most are nonzero. In general, one can compute a matrix product of dimensions using operations, provided that the matrices have at most non-zero entries, irrespective of [Kaplan2006]. This result applies to our case, but with the best known upper bound for [Gall2014], it only yields a bound . A direct reduction to multiple multiplications of sparse square matrices [Yuster2005] yields an even worse bound, (calculations omitted). Output-sensitive sparse matrix multiplication algorithms [Amossen2009] will not work either, as our output matrix is dense in general.

Luckily, in our case, we can make more efficient use of the sparsity. We will decompose the matrix product into a sum of smaller matrix products, as formulated by the following representation (the proof is trivial and omitted): Let be a set partition of . Let be the submatrix of obtained by removing all colums but those in , for and . Then

 G=M∑q=1Gq, where Gq=F1qF⊤2q.

We will also apply this decomposition after removing some rows from the matrices . Then the index sets may be different for different . To properly define the entry-wise addition in these cases, we simply make the convention that the missing entries equal zero.

To employ a fast RMM algorithm we will call a function . The function returns the product , where each is obtained from by only keeping the rows and the columns . Note that we do not show the input matrices explicitly in the function call, as the submatrices will always be extracted from and .

### 2.2 A simple below-3 algorithm

We apply Lemma 2.1 with and split the columns to those that are smaller than a threshold and to those that are at least as large:

 S1={S⊆U:|S|<σn}andS2={S⊆U:|S|≥σn}.

We assume is an integer and that . We will optimize the parameter later. The idea is to call fast RMM only for summing over the columns and to handle the remaining columns in a brute-force manner. The algorithm is given in Figure 1.

Consider first the computation of the matrix . We compute using fast RMM. The computational complexity depends on the number of columns in the matrices and . Letting be the number of columns, the required number of operations for the matrix multiplication of dimensions is , where . We have

 C=|S1|=σn∑s=0(ns)≤b(σ)n, (4)

where the inequality follows by Fact 2.

Consider then the computation of the matrix . To compute , for , it suffices to compute the sum of the products over all columns whose size is at least . Thus, the required number pairs to be considered is at most

 B:=n∑s=σn(ns)2n−s≤n(nσn)2n(1−σ)≤n(21−σb(σ))n (5)

where the penultimate inequality follows by Fact 2 (since ) and the last by Fact 2.

Let us finally combine the bounds in (4) and (5).

For any , the number of operations required by is

 O(2n(ω(2H(σ))+ϵ)/2+n2n(1−σ+H(σ))).

It remains to choose so as to optimize the bound. Clearly the first term is increasing and the second term is decreasing in . Thus, the bound is (asymptotically) minimized by choosing a that makes equal to . There are two obstacles to implement this idea: first, we only know upper bounds for , for various ; second, no closed-form expression is known for the best upper bounds—upper bounds for have been computed and reported only at some points [Gall2018].

Due to these complications, we resort to the following facts:

###### Fact ([Gall2018]).

The exponent of RMM satisfies .

###### Fact .

Let and . The exponent of RMM satisfies .

(This follows by reducing the larger RMM instance trivially to multiple smaller instances.)

Combining these two facts yields an upper bound:

 ω(2H(σ))≤ω(1.75)+2H(σ)−1.75≤1.271591+2H(σ).

Now, solving gives

 σ=1−1.271591/2=0.3642045.

With this choice of the complexity bound becomes .

### 2.3 A faster below-3 algorithm

Next we give a slightly faster algorithm to compute . This will allow us to choose a larger threshold , thus also rendering the computation of faster.

Instead of computing directly using fast RMM, we now compute some rows and columns of in a brute-force manner and only apply fast RMM to the remaining smaller matrix. Specifically, the agorithm only calls fast RMM to compute the entries where the sizes of and exceed . We assume that is an integer and that . We will optimize the parameter together with later. The algorithm is given in Figure 2. The correctness of the algorithm being clear, we proceed to analysing the complexity in terms of the required number of arithmetic operations.

Consider first the computation of an entry where . The number of pairs satisfying and is given by

 B′:=3hτh∑t=0(ht)2t≤3hh(hτh)2τh≤h(3⋅2τb(τ))h; (6)

the penultimate inequality follows by Fact 2 (since ) and the last inequality by Fact 2.

Similarly, computing the entries for all and such that requires at most additions and multiplications.

It remains to compute the entries for and such that . This can be computed as a product of two matrices (submatrices of and ) whose sizes are at most and , where is as before and

 R:=h∑j=τh+1(hj)≤b(τ)h, (7)

where the inequality follows by Fact 2 (since ).

Let us combine the bounds in (6) and (7): For any , the number of operations required by is

 O(n(3⋅2τb(τ))n/2+b(τ)(ω(k)+ϵ)n/2+n(21−σb(σ))n),% where k=2logb(τ)b(σ).

To set the parameters and , we resort to the bound (Fact 2.2 and Fact 2.2). Balancing the latter two terms in the bound yields the equation

 (1.271591+k)H(τ)=2(1−σ+H(σ)).

Equivalently, . Solving for and equating the first and the third term in the bound leaves us the equation

 log23+τ+H(τ)=1.271591⋅H(τ)+2H(1−0.6357955⋅H(τ)).

By numerical calculations we find one solution in the valid range, , and correspondingly . With these choices the complexity bound becomes . This completes the proof of Theorem 1.

### 2.4 A covering based algorithm

The previous algorithms were based on pruning some columns and rows of the matrices and , and applying fast RMM to the remaining multiplication of two reduced matrices. Now, we take a different approach and reduce the original problem instance into multiple, smaller RMM instances applying Lemma 2.1 with some . To this end, we cover the columns by multiple groups such that the columns in one group contain a large block of zero entries (in the same set of rows) in the matrices and .

It will be convenient to consider sets of fixed sizes. For a set and a nonnegative integer , write for the set of all -element subsets of . Let fix the sizes of the intersection of a column with the sets and . We wish to cover the set (of set pairs) by a small number of sets of the form , where the sets and are of some fixed sizes and . The following classic result [Erdos1974] shows that this covering design problem has an efficient solution:

[[Erdos1974]] Let be the minimum number of subsets of of size such that every subset of size is contained by at least one of the sets. We have

 c(v,k,s)(ks)(vs)−1≤1+ln(ks).

In particular, is within the factor of the obvious lower bound .

Although the work needed for constructing a covering does not contribute to the number of operations in the ring , a remark is in order if one is interested in the required number of other operations. The authors are not aware of any deterministic algorithm for constructing an optimal covering in time polynomial in , while asymptotically optimal randomized polynomial-time algorithms are known [Gordon1996].

Fortunately, for our purposes it suffices to run the well known greedy algorithm that iteratively picks a set that covers the largest number of yet uncovered elements. It finds a set cover whose size is within a logarithmic factor of the optimum, which is sufficient in our context. Furthermore, it can be implemented to run in time linear in the input size [Cormen2009, Ex. 35.3–3], which is in our case (with ).

From now on, we assume that for we are given a set family that has the desired coverage property, i.e., is a set cover of , so that for every column satisfying , there is a pair such that , . In what follows, we will assume that some appropriate values of , are chosen based on ; we will return back to the issue of finding good values at the end of this subsection.

For each pair , we construct a submatrix of as follows: remove from all columns not covered by , and all rows whose intersection with contains less than elements (as otherwise we cannot have and the entry vanishes). We construct a matrix analogously by removing columns and rows from . The dimensions of the matrix product are , where

 R1:=k1∑j=s1(k1j)2h−k1,C′:=(k1s1)(k2s2),R2:=k2∑j=s2(k2j)2h−k2.

Algorithm , given in Figure 3, organizes the reduction to multiple RMM instances like this using Lemma 2.1. Specifically, from the set cover of the columns it extracts a set partition by trivially keeping track of the already covered columns.

To analyze the complexity of the algorithm, let us first bound the dimensions , , and for fixed , , , . We aim at bounds of the form for some , and therefore parameterize the set sizes as

 sp=σphandkp=κph,p=1,2.

Thus . In what follows, we let evaluate to if .

We have

 R1≤Nβ1,C′≤Nα1+α2,R2≤Nβ2,

where

 (8)
###### Proof.

The bound for follows directly from the definitions of , , and from Fact 2.

For the bound on (equivalently ), suppose first that . Then using the simple inequality gives the claimed bound. Otherwise, and thus, by Fact 2, , implying the claimed bound. ∎

It remains to turn the bounds on the dimensions to a bound on the complexity of the corresponding RMM and sum up these bounds over the multiple matrix multiplication tasks. For any , the number of operations required by is , where

 γ=max0≤σ1≤10≤σ2≤1minσ1≤κ1≤1σ2≤κ2≤1H(σ1)+H(σ2)−α1−α2+β1+β2+β∗(ω(α1+α2β∗)−2), (9)

with and as defined in (8), and .

###### Proof.

Let .

Consider first the complexity of a single matrix multiplication with fixed , for . By Lemma 2.4 we obtain an upper bound by taking matrix multiplications of dimensions . This gives us the upper bound , where . Note that we used only a half of —we will need the other half for tolerating a nonzero underestimation that is due to minimizing over reals. We will return to this issue at the end of the proof.

Consider then the number of matrix multiplications for fixed , for . By Theorem 2.4 and by the approximation ratio of the greedy algorithm, the number is at most

 n4(hs1)(hs2)(k1s1)−1(k2s2)−1 ≤ n5b(σ1)hb(σ2)hb(σ1/κ1)−κ1hb(σ2/κ2)−κ2h = n5NH(σ1)+H(σ2)−α1−α2.

Here we used Fact 2 to bound the binomial coefficients, observing that .

Now, combine the above two bounds, recall that , and observe that replacing the sum over by the maximum over is compensated by adding a factor of to the bound. The algorithm can select optimal and by optimizing the upper bound, which costs yet another factor of . Due to the constant in the exponent, we can ignore the factor in the asymptotic complexity bound.

To complete the proof, we show that for any values of and (hence also for the optimal values) and for any large enough integer , there are rational numbers such that (i) are integers and (ii) , where

 Γ(σ1,σ2,κ1,κ2):=H(σ1)+H(σ2)+β1+β2+β∗(ω(α1+α2β∗)−(α1+α2β∗)−2). (10)

Note that we rearranged some terms in (9

), for a reason that will be revealed in a moment.

We will consider two cases: either or is near the boundary values or , or both are in , where is a small constant. We choose such that if or , then regardless of ,

 Γ(σ1,σ2,1,1)≤ω(1)+ϵ/2,

and symmetrically for . To see that this is possible, observe first that at we have , , and thus

 Γ(σ1,σ2,1,1) = β1+β2+β∗(ω(α1+α2β∗)−2) ≤

where if . Observe that . Since and ,

 Γ(σ1,σ2,1,1)≤α1+α2−α∗+β1+β2+ω(1)−2≤ω(1)+H(σ1).

For the latter inequality we used the facts that if and that if . Finally, we observe that tends to when tends to or .

On the other hand, we have the lower bound , since and . We may thus restrict out attention to the domain

 Λc:={(σ1,σ2,κ1,κ2):c≤σ1,σ2≤1−c,σ1≤κ1≤1,σ2≤κ2≤1}.

We now show that is continuous on . Observe first that the functions , , and are continuous on (as ). We also have that is continuous and strictly positive (as ) and that is continuous (as for all ).

Since the domain is compact, we have that is uniformly continuous on . This in turn implies that there is a such that (ii) holds whenever , implying that we can make both (i) and (ii) hold for all by putting . ∎

Now we know that the complexity of the algorithm is , but we do not know how large is. Unlike for the simpler algorithms given in the previous subsections, we cannot just select some values of the parameters and and bound from above by , as defined in (10), for we do not know the maximizing values of . Since is uniformly continuous on the domain , one could in principle prove any fixed strict upper bound on with a sufficiently large, finite computation. While at the present time the authors have not produced such a proof, evaluations of at various values of the four parameters suggest the following:

###### Conjecture .

The number of operations required by is .

## 3 Fast weighted counting of acyclic digraphs: proof of Theorem 2

Let us write the inclusion–exclusion recurrence (2) as a multi-subset transform: Without loss of generality, suppose . Let and

 g(T)=∑S⊆T∏i∈Tfi(S), (11)

where

 fi(S)=⎧⎪ ⎪ ⎪ ⎪ ⎪⎨⎪ ⎪ ⎪ ⎪ ⎪⎩0if 0∉S or % |S