1 Introduction
Numerous problems in machine learning on discrete domains involve learning set functions, i.e., functions that map subsets of some ground set $N$ of size $n$ to the real numbers. In recommender systems, for example, such set functions express diversity among sets of articles and their relevance w.r.t. a given need Sharma et al. (2019); Balog et al. (2019); in sensor placement tasks, they express the informativeness of sets of sensors Krause et al. (2008); in combinatorial auctions, they express valuations for sets of items Brero et al. (2019). A key challenge is to estimate such a set function from a small number of observed evaluations. Without structural assumptions, an exponentially large (in $n$) number of queries is needed. Thus, a key question is which families of set functions can be efficiently learnt while capturing important applications. One key property is sparsity in the Fourier domain Stobbe and Krause (2012); Amrollahi et al. (2019); Weissteiner et al. (2020b).
The Fourier transform for set functions is classically known as the orthogonal Walsh–Hadamard transform (WHT) Bernasconi et al. (1996); De Wolf (2008); Li and Ramchandran (2015); Cheraghchi and Indyk (2017). Using the WHT, it is possible to learn set functions with at most $k$ nonzero Fourier coefficients from $O(nk)$ evaluations Amrollahi et al. (2019). In this paper, we consider an alternative family of non-orthogonal Fourier transforms, recently introduced in the context of discrete signal processing on set functions (DSSP) Püschel (2018); Püschel and Wendler (2020). In particular, we present the first efficient algorithms which (under mild assumptions on the Fourier coefficients) learn Fourier-sparse set functions, requiring at most $O(n^2k)$ evaluations. In contrast, naively computing the Fourier transform requires $2^n$ evaluations and $O(n2^n)$ operations Püschel and Wendler (2020).
Importantly, sparsity in the WHT domain does not imply sparsity in the alternative Fourier domains we consider, or vice versa. Thus, we significantly expand the class of set functions that can be efficiently learnt. One natural example of set functions that are sparse in one of the non-orthogonal transforms, but not in the WHT, are certain preference functions considered by Djolonga et al. (2016) in the context of recommender systems and auctions. In recommender systems, each item may cover the set of needs that it satisfies for a customer. If needs are covered by several items at once, or if items depend on each other to provide value, there are substitutability or complementarity effects between the respective items. A natural way to learn such set functions is to compute their respective sparse Fourier transforms.
Contributions. In this paper we develop, analyze, and evaluate novel algorithms for computing the sparse Fourier transform under the various notions of Fourier basis introduced by Püschel (2018):

We are the first to introduce an efficient algorithm to compute the sparse Fourier transform under the recent notions of non-orthogonal Fourier basis for set functions Püschel (2018); Püschel and Wendler (2020). In contrast to the naive fast Fourier transform algorithm, which requires $2^n$ queries and $O(n2^n)$ operations, our sparse Fourier transform requires at most $O(nk)$ queries and $O(nk^2)$ operations to compute the $k$ nonzero coefficients of a $k$-Fourier-sparse set function. The algorithm works in all cases up to a null set of pathological set functions.

We then further extend our algorithm to handle an even larger class of Fourier-sparse set functions with $O(n^2k)$ queries and $O(n^2k + nk^2)$ operations using filtering techniques.

We demonstrate the effectiveness of our algorithms in two real-world set function learning tasks: learning surrogate objective functions for sensor placement tasks and preference elicitation in combinatorial auctions. The sensor placements obtained by our learnt surrogates are indistinguishable from the ones obtained using the compressive-sensing-based WHT by Stobbe and Krause (2012). However, our algorithm does not require prior knowledge of the Fourier support and runs significantly faster. In the preference elicitation task, the non-orthogonal basis naturally captures the structure of the valuation functions: only half as many Fourier coefficients were required in our basis as in the WHT basis.
All proofs are in the appendix.
2 Fourier Transforms for Set Functions
We introduce background and definitions for set functions and associated Fourier bases, following the framework of discrete signal processing on set functions (DSSP) introduced by Püschel (2018); Püschel and Wendler (2020). DSSP generalizes key concepts from classical signal processing, including shift, convolution (filtering), and Fourier transform, to the powerset domain. The approach follows a general procedure that derives these concepts from a suitable definition of the shift operation Püschel and Moura (2006, 2008).
Set functions. We consider a ground set $N = \{1, \dots, n\}$. An associated set function maps each subset of $N$ to a real value:
(1) $s\colon 2^N \to \mathbb{R},\; A \mapsto s(A)$.
Each set function can be identified with a $2^n$-dimensional vector $\mathbf{s} = (s(A))_{A \subseteq N}$ by fixing an order on the subsets. We choose the lexicographic order on the corresponding set indicator vectors.

Shifts. Classical convolution (e.g., on images) is associated with the translation operator. Analogously, DSSP considers different versions of "set translations," called models 1–5. One choice (model 4) is $(T_Q s)(A) = s(A \cup Q)$ for $Q \subseteq N$. The shift operators are parameterized by the powerset monoid $(2^N, \cup)$, since the equality $T_Q(T_R s) = T_{Q \cup R}\,s$ holds for all $Q, R \subseteq N$, and $s$.
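For illustration, the model-4 shift and its monoid property can be checked numerically. The following sketch is ours (not from the paper) and represents a set function as a dictionary keyed by frozensets:

```python
import itertools, random

def subsets(items):
    """All subsets of `items` as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

N = frozenset(range(1, 5))
rng = random.Random(0)
s = {A: rng.random() for A in subsets(N)}   # a random set function on 2^N

def shift(Q, f):
    """Model-4 shift: (T_Q f)(A) = f(A | Q)."""
    return {A: f[A | Q] for A in subsets(N)}

Q, R = frozenset({1, 3}), frozenset({3, 4})
assert shift(Q, shift(R, s)) == shift(Q | R, s)   # T_Q T_R = T_{Q u R}
```

The last assertion holds exactly, since $(A \cup Q) \cup R = A \cup (Q \cup R)$.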
Convolutional filters. The corresponding linear, shift-equivariant convolution is given by
(2) $h * s = \sum_{Q \subseteq N} h(Q)\, T_Q s$.
Namely, $(h * s)(A) = \sum_{Q \subseteq N} h(Q)\, s(A \cup Q)$, for all $A \subseteq N$. Convolving with $h$ is a linear mapping called a filter; $h$ is also a set function.
Fourier transform and convolution theorem. The Fourier transform (FT) simultaneously diagonalizes all filters. Thus, different definitions of set shifts yield different notions of Fourier transform. For the shift chosen above, the Fourier transform of $s$ takes the form
(3) $\hat{s}(B) = \sum_{A \subseteq B} (-1)^{|B \setminus A|}\, s(N \setminus A)$, for $B \subseteq N$,
with the inverse
(4) $s(A) = \sum_{B \subseteq N \setminus A} \hat{s}(B)$, for $A \subseteq N$.
As a consequence we obtain the convolution theorem
(5) $\widehat{h * s}(B) = \bar{h}(B)\,\hat{s}(B)$.
Interestingly, $\bar{h}$ (the so-called frequency response) is computed differently than $\hat{s}$, namely as
(6) $\bar{h}(B) = \sum_{A \subseteq N \setminus B} h(A)$.
In matrix form, with respect to the chosen order of $2^N$, the Fourier transform and its inverse are
(7) $F = \begin{pmatrix} 0 & 1 \\ 1 & -1 \end{pmatrix}^{\otimes n}$ and $F^{-1} = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}^{\otimes n}$,
respectively, in which $M^{\otimes n}$ denotes the $n$-fold Kronecker product of the matrix $M$. Thus, the Fourier transform and its inverse can be computed in $O(n2^n)$ operations.
The columns of $F^{-1}$ form the Fourier basis and can be viewed as indexed by $B \subseteq N$. The $B$-th column is given by $(\iota_B(A))_{A \subseteq N}$, where $\iota_B(A) = 1$ if $A \cap B = \emptyset$ and $\iota_B(A) = 0$ otherwise. The basis is not orthogonal, as can be seen from the structure of the factors in (7).
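To make (3) and (4) concrete, the following Python sketch (illustrative, not the paper's implementation) computes the model-4 transform and its inverse directly from the definitions and checks that they invert each other:

```python
import itertools, random

def subsets(items):
    """All subsets of `items` as frozensets."""
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

def fourier(s, N):
    """(3): shat(B) = sum over A subseteq B of (-1)^{|B \\ A|} * s(N \\ A)."""
    return {B: sum((-1) ** len(B - A) * s[N - A] for A in subsets(B))
            for B in subsets(N)}

def inverse_fourier(shat, N):
    """(4): s(A) = sum over B subseteq N \\ A of shat(B)."""
    return {A: sum(shat[B] for B in subsets(N - A)) for A in subsets(N)}

N = frozenset(range(1, 5))
rng = random.Random(0)
s = {A: rng.uniform(-1, 1) for A in subsets(N)}
shat = fourier(s, N)
s2 = inverse_fourier(shat, N)
assert all(abs(s[A] - s2[A]) < 1e-9 for A in s)   # perfect round trip
```

Both directions cost $O(4^n)$ here; the point of the fast (and later sparse) algorithms is to avoid this naive enumeration.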
Example and interpretation. We start by considering a special class of preference functions that, e.g., model customers in a recommender system Djolonga et al. (2016). Preference functions naturally occur in machine learning tasks on discrete domains such as recommender systems and auctions, in which they are used, e.g., to model complementarity and substitution effects between goods. Goods complement each other when their combined utility is greater than the sum of their individual utilities. Analogously, goods substitute each other when their combined utility is smaller than the sum of their individual utilities. Formally, a preference function is given by
(8) $p(A) = \sum_{a \in A} u_a + \sum_{d=1}^{D} \Big( \max_{a \in A} w_{ad} - \sum_{a \in A} w_{ad} \Big) + \sum_{e=1}^{E} \Big( \sum_{a \in A} v_{ae} - \max_{a \in A} v_{ae} \Big).$
Equation (8) is composed of a modular part parametrized by $u \in \mathbb{R}^n$, a repulsive part parametrized by $W = (w_{ad})$, with $W \in \mathbb{R}_{\geq 0}^{n \times D}$, and an attractive part parametrized by $V = (v_{ae})$, with $V \in \mathbb{R}_{\geq 0}^{n \times E}$. The repulsive part captures substitution effects and the attractive part complementarity effects.
In Lemma 1 we show that these preference functions are indeed Fourier-sparse.
Lemma 1.
Preference functions of the form (8) are Fourier-sparse w.r.t. model 4 with at most $n(D + E + 1) + 1$ nonzero Fourier coefficients.
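A small numerical check of Lemma 1 (our own illustrative sketch; the sparsity bound $n(D+E+1)+1$ used in the assertion is our reconstruction of the count in Lemma 1): we build a random preference function of the form (8) and count its nonzero model-4 coefficients via the definition (3).

```python
import itertools, random

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

def fourier(s, N):
    # model-4 transform (3)
    return {B: sum((-1) ** len(B - A) * s[N - A] for A in subsets(B))
            for B in subsets(N)}

def preference(A, u, W, V):
    """(8): modular part + repulsive (max - sum) + attractive (sum - max) parts.
    We take p(empty set) = 0."""
    if not A:
        return 0.0
    val = sum(u[a] for a in A)
    for d in range(len(W[0])):                      # repulsive, W >= 0
        val += max(W[a][d] for a in A) - sum(W[a][d] for a in A)
    for e in range(len(V[0])):                      # attractive, V >= 0
        val += sum(V[a][e] for a in A) - max(V[a][e] for a in A)
    return val

n, D, E = 5, 1, 1
rng = random.Random(1)
u = [rng.uniform(-1, 1) for _ in range(n)]
W = [[rng.uniform(0, 1) for _ in range(D)] for _ in range(n)]
V = [[rng.uniform(0, 1) for _ in range(E)] for _ in range(n)]
N = frozenset(range(n))
s = {A: preference(A, u, W, V) for A in subsets(N)}
support = [B for B, c in fourier(s, N).items() if abs(c) > 1e-9]
assert len(support) <= n * (D + E + 1) + 1 < 2 ** n   # sparse, far below 2^n
```

The bound follows because each max-term is a coverage function with nested covers, contributing a chain of at most $n$ frequencies, while the modular parts contribute only $\emptyset$ and singletons.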
Motivated by Lemma 1, we call set functions that are Fourier-sparse w.r.t. model 4 generalized preference functions. Formally, a generalized preference function is defined in terms of a collection of distinct subsets $U_1, \dots, U_n$ of some universe $U$ and a weight function $w\colon U \to \mathbb{R}$. The weight of a set $S \subseteq U$ is $w(S) = \sum_{u \in S} w(u)$. Then, the corresponding generalized preference function is
(9) $s(A) = w\big( \bigcup_{a \in A} U_a \big)$, for $A \subseteq N$.
For nonnegative weights, $s$ is called a weighted coverage function Krause and Golovin (2014), but here we allow general (signed) weights. Thus, generalized preference functions are generalized coverage functions. Generalized coverage functions can be visualized by a bipartite graph, see Fig. 1. In recommender systems, $U_a$ could model the customer needs covered by item $a$. Then, the score that a customer associates with a set of items corresponds to the needs covered by the items in that set. Substitution as well as complementarity effects occur if the needs covered by items overlap (e.g., $U_a \cap U_b \neq \emptyset$).
Interestingly, the Fourier coefficients in (3) of a generalized coverage function are
(10) $\hat{s}(B) = -w\Big( \bigcap_{b \in B} U_b \setminus \bigcup_{a \in N \setminus B} U_a \Big)$, for $\emptyset \neq B \subseteq N$, with $\hat{s}(\emptyset) = s(N)$,
which corresponds to the (negative) weights of the fragments of the Venn diagram of the sets $U_1, \dots, U_n$ (Fig. 1). If the universe contains fewer than $2^n - 1$ elements, some fragments are empty and thus have weight zero, i.e., such functions are Fourier-sparse.
Other shifts and Fourier bases. There are several other natural definitions of shifts, each with its respective shift-equivariant convolution, associated Fourier basis, and thus notion of Fourier sparsity. Püschel and Wendler (2020) call these variants models 1–5, with 5 being the classical definition that yields the WHT and 4 the version introduced above. Table 1 collects the key concepts, also including model 3.
The notions of Fourier sparsity can differ dramatically. For example, consider the coverage function for which there is only one element (of unit weight) in the universe and this element is covered by all sets $U_a$, i.e., $s(A) = 1$ for $A \neq \emptyset$ and $s(\emptyset) = 0$. Then, $\hat{s}(\emptyset) = 1$, $\hat{s}(N) = -1$, and $\hat{s}(B) = 0$ otherwise w.r.t. model 4, and $\hat{s}(\emptyset) = 1 - 2^{-n}$ and $\hat{s}(B) = -2^{-n}$ for all $B \neq \emptyset$ w.r.t. the WHT.
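The example above can be verified directly; this sketch (ours) compares the model-4 and WHT supports of the single-element coverage function for $n = 4$:

```python
import itertools

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

n = 4
N = frozenset(range(n))
# one universe element of unit weight, covered by every item:
s = {A: (1.0 if A else 0.0) for A in subsets(N)}

# model-4 coefficients, eq. (3)
shat4 = {B: sum((-1) ** len(B - A) * s[N - A] for A in subsets(B))
         for B in subsets(N)}
# WHT coefficients: 2^{-n} * sum_A (-1)^{|A & B|} s(A)
shat5 = {B: sum((-1) ** len(A & B) * s[A] for A in subsets(N)) / 2 ** n
         for B in subsets(N)}

supp4 = {B for B, c in shat4.items() if abs(c) > 1e-9}
supp5 = {B for B, c in shat5.items() if abs(c) > 1e-9}
assert supp4 == {frozenset(), N}   # shat(empty) = 1, shat(N) = -1: 2-sparse
assert len(supp5) == 2 ** n        # dense w.r.t. the WHT
```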
Remark 1.
For the same reason, the preference functions in (8) with $D + E \geq 1$ are dense w.r.t. the WHT basis.
The Fourier bases have appeared in different contexts before. For example, (3) can be related to the W-transform, which has been used by Chakrabarty and Huang (2012) to test coverage functions.
3 Learning Fourier-Sparse Set Functions
We now present our algorithm for learning Fourier-sparse set functions w.r.t. model 4. One of our main contributions is that the presented derivation also applies to the other models. In particular, we derive the variants for models 3 and 5 from Table 1 in the appendix.
Definition 1.
A set function $s$ is called $k$-Fourier-sparse if its support satisfies
(11) $|\mathrm{supp}(\hat{s})| = |\{B \subseteq N \colon \hat{s}(B) \neq 0\}| \leq k$.
Thus, exactly learning a $k$-Fourier-sparse set function is equivalent to computing its nonzero Fourier coefficients and associated support. Formally, we want to solve:
Problem 1 (Sparse FT).
Given oracle access to query a $k$-Fourier-sparse set function $s$, compute its Fourier support $\mathrm{supp}(\hat{s})$ and associated Fourier coefficients.
3.1 Sparse FT with Known Support
First, we consider the simpler problem of computing the Fourier coefficients if the Fourier support (or a small enough superset $\mathcal{S} \supseteq \mathrm{supp}(\hat{s})$) is known. In this case, the solution boils down to selecting queries $\mathcal{A} \subseteq 2^N$ such that the linear system of equations
(12) $F^{-1}_{\mathcal{A},\mathcal{S}}\, \hat{\mathbf{s}}_{\mathcal{S}} = \mathbf{s}_{\mathcal{A}}$
admits a unique solution. Here, $\mathbf{s}_{\mathcal{A}} = (s(A))_{A \in \mathcal{A}}$ is the vector of queries, $F^{-1}_{\mathcal{A},\mathcal{S}}$ is the submatrix of $F^{-1}$ obtained by selecting the rows indexed by $\mathcal{A}$ and the columns indexed by $\mathcal{S}$, and $\hat{\mathbf{s}}_{\mathcal{S}}$ are the unknown Fourier coefficients we want to compute.
Theorem 1 (Theorem 1 of Püschel and Wendler (2020)).
Let $s$ be $k$-Fourier-sparse with $\mathrm{supp}(\hat{s}) \subseteq \mathcal{S}$. Let $\mathcal{A} = \{N \setminus B \colon B \in \mathcal{S}\}$. Then $F^{-1}_{\mathcal{A},\mathcal{S}}$ is invertible and $\hat{s}$ can be perfectly reconstructed from the queries $\{s(N \setminus B) \colon B \in \mathcal{S}\}$.
Consequently, we can solve Problem 1 if we have a way to discover a suitable $\mathcal{S}$, which is what we do next.
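Because the system of Theorem 1 is unitriangular when the queries are ordered by nondecreasing $|B|$, it can be solved by simple substitution. A Python sketch (ours, for illustration):

```python
import itertools

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

def recover_known_support(query, N, support):
    """Theorem 1: since s(N \\ B) = sum over C subseteq B of shat(C), the
    system (12) is unitriangular in any order of nondecreasing |B|."""
    shat = {}
    for B in sorted(support, key=len):
        # note: C < B tests proper subset inclusion for frozensets
        shat[B] = query(N - B) - sum(shat[C] for C in shat if C < B)
    return shat

N = frozenset(range(6))
true = {frozenset(): 0.5, frozenset({1}): -1.0, frozenset({1, 3}): 2.0}
# synthesize s via the inverse transform (4): sum of shat(B) over B disjoint from A
s = {A: sum(c for B, c in true.items() if not (B & A)) for A in subsets(N)}
rec = recover_known_support(lambda A: s[A], N, set(true))
assert all(abs(rec[B] - true[B]) < 1e-9 for B in true)
```

With $|\mathcal{S}| = k$, this uses exactly $k$ queries and $O(k^2)$ additions (ignoring subset tests).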
3.2 Sparse FT with Unknown Support
In the following we present our algorithm to solve Problem 1. As mentioned, the key challenge is to determine the Fourier support w.r.t. (3). The initial skeleton is similar to the algorithm Recover Coverage by Chakrabarty and Huang (2012), who used it to test coverage functions. Here we take the novel view of Fourier analysis to expand it to a sparse Fourier transform algorithm for all set functions. Doing so creates challenges, since the connection to a positive weight function (see (9)) is lost. Using the framework in Section 2, we are going to analyze and address them.
Let $M \subseteq N$, and consider the associated restriction of a set function $s$ to $M$:
(13) $s_M\colon 2^M \to \mathbb{R},\; A \mapsto s(A)$.
The Fourier coefficients of $s$ and those of the restriction $s_M$ (taken w.r.t. the ground set $M$) can be related as (proof in appendix):
(14) $\hat{s}_M(B) = \sum_{B' \subseteq N \setminus M} \hat{s}(B \cup B')$, for $B \subseteq M$.
We observe that, if the Fourier coefficients on the right-hand side of (14) do not cancel, knowing $\mathrm{supp}(\hat{s}_M)$ contains information about the sparsity of $\hat{s}_{M'}$, for $M \subseteq M'$. To be precise, for $i \in N \setminus M$, the relation
(15) $\hat{s}_M(B) = \hat{s}_{M \cup \{i\}}(B) + \hat{s}_{M \cup \{i\}}(B \cup \{i\})$
implies that $\hat{s}_{M \cup \{i\}}(B)$ and $\hat{s}_{M \cup \{i\}}(B \cup \{i\})$ both must be zero whenever $\hat{s}_M(B)$ is zero, assuming Fourier coefficients do not cancel. As a consequence, we can construct
(16) $\mathrm{supp}(\hat{s}_{M \cup \{i\}}) \subseteq \mathrm{supp}(\hat{s}_M) \cup \{B \cup \{i\} \colon B \in \mathrm{supp}(\hat{s}_M)\}$.
As a result, we can solve Problem 1 with our algorithm SSFT, under mild conditions on the coefficients, by successively computing the nonzero Fourier coefficients of restricted set functions along the chain
(17) $\emptyset = N_0 \subset N_1 \subset \dots \subset N_n = N$, with $N_i = \{1, \dots, i\}$.
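A compact Python sketch of the resulting algorithm (our illustrative re-implementation of SSFT under the no-cancellation assumption; the paper's actual implementation additionally includes the safeguards of Remark 2 below):

```python
import random

def solve_triangular(query, M, support, eps=1e-9):
    """Coefficients of the restriction s_M given supp(shat_M) subseteq support:
    s(M \\ B) = sum over C subseteq B of shat_M(C) is unitriangular (cf. Theorem 1)."""
    shat = {}
    for B in sorted(support, key=len):
        coef = query(M - B) - sum(shat[C] for C in shat if C < B)
        if abs(coef) > eps:
            shat[B] = coef
    return shat

def ssft(query, n, eps=1e-9):
    """Sparse FT along the chain (17): N_i = {1, ..., i}. Assumes that no
    cancellations occur in (15), i.e., the input is not pathological."""
    support = {frozenset()}               # candidate support for shat_{N_0}
    shat = {}
    for i in range(n + 1):
        M = frozenset(range(1, i + 1))    # ground set N_i
        shat = solve_triangular(query, M, support, eps)
        # (16): candidate frequencies for the next restriction
        support = set(shat) | {B | {i + 1} for B in shat}
    return shat

# random 4-sparse set function over n = 6 (positive coefficients: no cancellations)
rng = random.Random(3)
n, N = 6, frozenset(range(1, 7))
freqs = [frozenset(), frozenset({2}), frozenset({2, 5}), frozenset({1, 3, 4})]
true = {B: rng.uniform(1, 2) for B in freqs}
query = lambda A: sum(c for B, c in true.items() if not (B & A))   # inverse FT (4)
rec = ssft(query, n)
assert set(rec) == set(true)
assert all(abs(rec[B] - true[B]) < 1e-6 for B in true)
```

For a $k$-sparse target, each chain element introduces at most $2k$ candidate frequencies, and queries from earlier iterations can be reused.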
Remark 2 (Implementation of SSFT).
For practical reasons, we only process a bounded number of subsets in line 6. In line 11, we consider a Fourier coefficient whose magnitude is below a threshold $\epsilon$ (a hyperparameter) as zero.
Analysis. We consider set functions that are $k$-Fourier-sparse (but not $(k-1)$-Fourier-sparse) with support $\mathcal{S} = \{B_1, \dots, B_k\}$, i.e., elements of $\mathcal{X}_{\mathcal{S}} = \{s \colon \mathrm{supp}(\hat{s}) = \mathcal{S}\}$, which is isomorphic to
(18) $(\mathbb{R} \setminus \{0\})^k \subset \mathbb{R}^k$.
Let $\lambda$ denote the Lebesgue measure on $\mathbb{R}^k$.
Pathological set functions. SSFT fails to compute the Fourier coefficients of $s \in \mathcal{X}_{\mathcal{S}}$ if a cancellation occurs in (14), i.e., if $\hat{s}_{N_i}(B) = 0$ despite $\hat{s}(B \cup B') \neq 0$ for some $B' \subseteq N \setminus N_i$. Thus, the set of pathological set functions can be written as the finite union of kernels
(19) $\ker\big(s \mapsto \hat{s}_{N_i}(B)\big) = \big\{s \colon \sum_{B' \subseteq N \setminus N_i} \hat{s}(B \cup B') = 0\big\}$,
taken over the pairs $(i, B)$ for which some $B \cup B' \in \mathcal{S}$, intersected with $\mathcal{X}_{\mathcal{S}}$.
Theorem 2.
Using prior notation, the set of pathological set functions for SSFT is given by
(20) $\mathcal{P}_{\mathcal{S}} = \mathcal{X}_{\mathcal{S}} \cap \bigcup_{i, B} \ker\big(s \mapsto \hat{s}_{N_i}(B)\big)$
and has Lebesgue measure zero, i.e., $\lambda(\mathcal{P}_{\mathcal{S}}) = 0$.
Complexity. By reusing queries and computations from the $i$-th iteration of SSFT in the $(i+1)$-th iteration, we obtain:
Theorem 3.
SSFT requires at most $O(nk)$ queries and $O(nk^2)$ operations.
3.3 Shrinking the Set of Pathological Fourier Coefficients
According to Theorem 2, the set of pathological Fourier coefficients for a given support has measure zero. Unfortunately, however, this set includes important classes of set functions, including graph cuts (in the case of unit weights) and hypergraph cuts.^{1} ^{1}As an example, consider the cut function associated with the graph $G = (V, E)$ with $V = \{1, 2\}$ and $E = \{\{1, 2\}\}$, using $N = V$: it satisfies $s(\emptyset) = \hat{s}_{N_0}(\emptyset) = 0$ despite $\hat{s} \neq 0$, so SSFT terminates with an empty support.
Solution. The key idea to exclude these and further narrow down the set of pathological cases is to use the convolution theorem (5), i.e., the fact that we can modulate Fourier coefficients by filtering. Concretely, we choose a random filter $h$ such that SSFT works for $h * s$ with probability one. $\hat{s}$ is then obtained from $\widehat{h * s}$ by dividing by the frequency response $\bar{h}$. We keep the associated overhead in $O(n)$ by choosing a one-hop filter, i.e., $h(Q) = 0$ for $|Q| > 1$. Motivated by the fact that, e.g., the product of a Rademacher random variable (which would lead to cancellations) and a normally distributed random variable is again normally distributed, we sample our filtering coefficients i.i.d. from a normal distribution. We call the resulting algorithm SSFT+, shown above.
Analysis. Building on the analysis of SSFT, recall that $\mathcal{X}_{\mathcal{S}}$ denotes the set of $k$-Fourier-sparse (but not $(k-1)$-Fourier-sparse) set functions with support $\mathcal{S}$ and that $\mathcal{P}_{\mathcal{S}} \subseteq \mathcal{X}_{\mathcal{S}}$ are the elements satisfying (20). For $0 \leq i \leq n$, $B \subseteq N_i$, and $j \in N$, let
(21) $\varphi_{i,B}(s) = \hat{s}_{N_i}(B)$ and $\varphi_{i,B,j}(s) = \sum_{B' \subseteq N \setminus N_i,\, j \notin B \cup B'} \hat{s}(B \cup B')$.
Theorem 4.
With probability one with respect to the randomness of the filtering coefficients, the set of pathological set functions for SSFT+ has the form (using prior notation)
(22) $\mathcal{P}^{+}_{\mathcal{S}} = \mathcal{X}_{\mathcal{S}} \cap \bigcup_{i,B} \Big( \ker \varphi_{i,B} \cap \bigcap_{j \in N} \ker \varphi_{i,B,j} \Big)$.
Theorem 4 shows that SSFT+ correctly processes $s \in \mathcal{X}_{\mathcal{S}}$ with $\varphi_{i,B}(s) = 0$ iff, for every such pair $(i, B)$, there is an element $j \in N$ for which $\varphi_{i,B,j}(s) \neq 0$.
Theorem 5.
If $\mathcal{P}_{\mathcal{S}}$ is nonempty, $\mathcal{P}^{+}_{\mathcal{S}}$ is a proper subset of $\mathcal{P}_{\mathcal{S}}$. In particular, $s \in \mathcal{P}^{+}_{\mathcal{S}}$ implies $\hat{s}_{N_i \cup \{j\}}(B) = 0$, for all $j \in N \setminus (N_i \cup B)$ with $(i, B)$ as in (22).
Complexity. There is a trade-off between the number of nonzero filtering coefficients used and the size of the set of pathological set functions. For example, for the one-hop filters used, computing $(h * s)(A)$ requires $n + 1$ queries of $s$.
Theorem 6.
The query complexity of SSFT+ is $O(n^2k)$ and the algorithmic complexity is $O(n^2k + nk^2)$.
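The effect of one-hop filtering can be illustrated numerically (our own sketch): for the unit-weight cut function of a path graph, which is pathological for SSFT since $s(\emptyset) = 0$, a random one-hop filter yields a filtered function that almost surely no longer vanishes at $\emptyset$, while the convolution theorem (5) lets us undo the filtering coefficient-wise.

```python
import itertools, random

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

n = 4
N = frozenset(range(1, n + 1))
rng = random.Random(4)

# unit-weight cut function of the path 1-2-3-4: pathological for SSFT, s(empty) = 0
edges = [(1, 2), (2, 3), (3, 4)]
s = {A: float(sum((a in A) != (b in A) for a, b in edges)) for A in subsets(N)}

# random one-hop filter: h(Q) = 0 for |Q| > 1, remaining coefficients i.i.d. normal
h0 = rng.gauss(0, 1)
h1 = {i: rng.gauss(0, 1) for i in N}
# (h * s)(A) = h(empty) s(A) + sum_i h({i}) s(A | {i}): n + 1 queries of s each
hs = {A: h0 * s[A] + sum(h1[i] * s[A | {i}] for i in N) for A in subsets(N)}

def fourier(f):   # model-4 transform (3)
    return {B: sum((-1) ** len(B - A) * f[N - A] for A in subsets(B))
            for B in subsets(N)}

shat, hshat = fourier(s), fourier(hs)
freq_resp = {B: h0 + sum(h1[i] for i in N - B) for B in subsets(N)}   # (6)
# convolution theorem (5): filtering multiplies each coefficient by the response
assert all(abs(hshat[B] - freq_resp[B] * shat[B]) < 1e-9 for B in subsets(N))
assert abs(hs[frozenset()]) > 1e-9   # filtered function no longer vanishes at empty set
```

Dividing `hshat[B]` by `freq_resp[B]` (nonzero with probability one) recovers `shat[B]`, which is how SSFT+ undoes the filtering.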
4 Related Work
We briefly discuss related work on learning set functions.
Fourier-sparse learning. There is a substantial body of research concerned with learning Fourier-/WHT-sparse set functions Stobbe and Krause (2012); Scheibler et al. (2013); Kocaoglu et al. (2014); Li and Ramchandran (2015); Cheraghchi and Indyk (2017); Amrollahi et al. (2019). Recently, Amrollahi et al. (2019) imported ideas from the hashing-based sparse Fourier transform algorithm Hassanieh et al. (2012) to the set function setting. The resulting algorithms compute the WHT of WHT-sparse set functions with a query complexity of $O(nk)$ for general frequencies, of $O(kd \log n)$ for low-degree (at most $d$) frequencies, and with comparable guarantees for low-degree set functions that are only approximately sparse. To the best of our knowledge, this latest work improves on previous algorithms, such as the ones by Scheibler et al. (2013), Kocaoglu et al. (2014), Li and Ramchandran (2015), and Cheraghchi and Indyk (2017), providing the best guarantees in terms of both query complexity and runtime. E.g., Scheibler et al. (2013) utilize similar hashing/aliasing ideas to derive sparse WHT algorithms that work under random support (the frequencies are uniformly distributed on $2^N$) and random coefficient (the coefficients are samples from continuous distributions) assumptions. Kocaoglu et al. (2014) propose a method to compute the WHT of a Fourier-sparse set function that satisfies a so-called unique sign property using a number of queries polynomial in $n$ and $k$.
In a different line of work, Stobbe and Krause (2012) utilize results from compressive sensing to compute the WHT of WHT-sparse set functions for which a superset of the support is known. This approach can also be used to find a Fourier-sparse approximation, and its theoretical query complexity is logarithmic in the size of the candidate superset. In practice, it even seems to be more query-efficient than the hashing-based WHT (see the experimental section of Amrollahi et al. (2019)), but it suffers from a high computational complexity, which scales at least linearly with the size of that superset. Regarding coverage functions, to our knowledge, there has not been any work in the compressive sensing literature for the non-orthogonal Fourier bases, which do not satisfy RIP properties and hence lack sparse recovery and robustness guarantees.
In summary, all prior work on Fourier-based methods for learning set functions was based on the WHT. Our work leverages the broader framework of signal processing with set functions proposed by Püschel and Wendler (2020), which provides a larger class of Fourier transforms and thus new types of Fourier sparsity.
Other learning paradigms. Other lines of work for learning set functions include methods based on new neural architectures Dolhansky and Bilmes (2016); Zaheer et al. (2017); Weiss et al. (2017), methods based on backpropagation through combinatorial solvers Djolonga and Krause (2017); Tschiatschek et al. (2018); Wang et al. (2019); Vlastelica et al. (2019), kernel-based methods Buathong et al. (2020), and methods based on other succinct representations such as decision trees Feldman et al. (2013) and disjunctive normal forms Raskhodnikova and Yaroslavtsev (2013).
5 Empirical Evaluation
We evaluate the two variants of our algorithm (SSFT and SSFT+) for model 4 on three classes of real-world set functions. First, we approximate the objective functions of sensor placement tasks by Fourier-sparse functions and evaluate the quality of the resulting surrogate objective functions. Second, we learn facility location functions (i.e., preference functions) that are used to determine cost-effective sensor placements in water networks Leskovec et al. (2007). Finally, we learn simulated bidders from a spectrum auctions test suite Weiss et al. (2017).
Benchmark learning algorithms. We compare our algorithm against three state-of-the-art algorithms for learning WHT-sparse set functions: the compressive-sensing-based approach CSWHT Stobbe and Krause (2012), the hashing-based approach HWHT Amrollahi et al. (2019), and the robust version of the hashing-based approach RWHT Amrollahi et al. (2019). For our algorithm we fix the two hyperparameters from Remark 2 across all experiments. CSWHT requires a superset of the (unknown) Fourier support, which we set to all $B$ up to a fixed cardinality, together with a parameter for the expected sparsity. For HWHT we used the exact algorithm without low-degree assumption and set the expected sparsity parameter accordingly. For RWHT we used the robust algorithm without low-degree assumption and set the expected sparsity parameter accordingly unless specified otherwise.
5.1 Sensor Placement Tasks
We consider a discrete set of sensors located at different fixed positions measuring a quantity of interest, e.g., temperature, amount of rainfall, or traffic data, and want to find an informative subset of sensors subject to a budget constraint on the number of sensors selected (e.g., due to hardware costs). To quantify the informativeness of subsets of sensors, we fit a multivariate normal distribution to the sensor measurements Krause et al. (2008) and associate each subset $A$ of sensors with its information gain Srinivas et al. (2010)
(23) $F(A) = \tfrac{1}{2} \log\det\big( I + \sigma^{-2} \Sigma_A \big)$,
where $\Sigma_A$ is the submatrix of the covariance matrix $\Sigma$ that is indexed by the sensors in $A$, $\sigma^2$ the noise variance, and $I$ the identity matrix. We construct two covariance matrices this way: one for temperature measurements from 46 sensors at Intel Research Berkeley and one for velocity data from 357 sensors deployed under a highway in California.
The information gain is a submodular set function and thus can be approximately maximized using the greedy algorithm by Nemhauser et al. (1978) to obtain informative subsets. We do the same using Fourier-sparse surrogates of $F$ and compute the information gain of the resulting subsets. As a baseline, we place sensors at random and compute their information gain. Figure 2 shows our results. The x-axes correspond to the cardinality constraint used during maximization and the y-axes to the information gain obtained by the respective informative subsets. In addition, we report next to the legend the execution time and number of queries needed by the successful experiments.
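For concreteness, a self-contained sketch (ours) of (23) and the greedy baseline; the covariance matrix below is made up for the example:

```python
import math

def logdet(M):
    """Log-determinant of a small positive-definite matrix (Gaussian elimination)."""
    M = [row[:] for row in M]
    acc = 0.0
    for j in range(len(M)):
        acc += math.log(M[j][j])           # product of pivots = determinant
        for i in range(j + 1, len(M)):
            f = M[i][j] / M[j][j]
            M[i] = [a - f * b for a, b in zip(M[i], M[j])]
    return acc

def info_gain(A, cov, noise=1.0):
    """(23): F(A) = 1/2 * logdet(I + noise^{-2} Sigma_A), Sigma_A indexed by A."""
    idx = sorted(A)
    if not idx:
        return 0.0
    sub = [[(1.0 if i == j else 0.0) + cov[a][b] / noise ** 2
            for j, b in enumerate(idx)] for i, a in enumerate(idx)]
    return 0.5 * logdet(sub)

def greedy(objective, ground, budget):
    """Greedy maximization (Nemhauser et al. 1978): add the best element per round."""
    A = frozenset()
    for _ in range(budget):
        best = max((e for e in ground if e not in A),
                   key=lambda e: objective(A | {e}) - objective(A))
        A = A | {best}
    return A

# toy covariance: sensors 0 and 1 are highly correlated, sensor 2 is not
cov = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.2],
       [0.1, 0.2, 1.0]]
placement = greedy(lambda A: info_gain(A, cov), range(3), 2)
assert placement == frozenset({0, 2})   # greedy avoids the redundant correlated pair
```

In the experiments, `objective` is replaced by the learnt Fourier-sparse surrogate, which is cheap to evaluate via (4).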
Interpretation of results. HWHT only works for the Berkeley data. For the other dataset it is not able to reconstruct enough Fourier coefficients to provide a meaningful result. The likely reason is that the target set function is not exactly Fourier-sparse, which can cause an excessive number of collisions in the hashing step. In contrast, CSWHT is noise-robust and yields sensor placements that are indistinguishable from the ones obtained by maximizing the true objective function in the first task. However, for the California data, CSWHT times out. In contrast, SSFT and RWHT work well on both tasks. In the first task, SSFT is on par with CSWHT in terms of sensor placement quality and significantly faster, despite requiring more queries. On the California data, SSFT yields sensor placements of similar quality as the ones obtained by RWHT while requiring orders of magnitude fewer queries and less time.
5.2 Learning Preference Functions
Table 2: Number of queries (in thousands), number of Fourier coefficients recovered, and relative reconstruction error of SSFT, SSFT+, and HWHT for the local, regional, and national bidder types.
We now consider a class of preference functions that are used for cost-effective contamination detection in water networks Leskovec et al. (2007). The networks stem from the Battle of the Water Sensor Networks (BWSN) challenge Ostfeld et al. (2008). The junctions and pipes of each BWSN network define a graph. Additionally, each BWSN network has dynamic parameters such as time-varying water consumption demand patterns, opening and closing valves, and so on.
To determine a cost-effective subset of sensors (e.g., given a maximum budget), Leskovec et al. (2007) make use of facility location functions of the form
(24) $s(A) = \sum_{i=1}^{m} \max_{j \in A} u_{ij}$,
where $U = (u_{ij}) \in \mathbb{R}_{\geq 0}^{m \times n}$ is a utility matrix. Each row corresponds to an event (e.g., contamination of the water network at some junction) and the entry $u_{ij}$ quantifies the utility of the $j$-th sensor in case of the $i$-th event. It is straightforward to see that (24) is a preference function of the form (8) with $E = 0$, $W = U^{\top}$, and modular weights $u_a = \sum_i u_{ia}$. Thus, these functions are sparse w.r.t. model 4 and dense w.r.t. the WHT (see Lemma 1 and Remark 1).
Leskovec et al. (2007) determined three different utility matrices that take into account the fraction of events detected, the detection time, and the population affected, respectively. The matrices were obtained by costly simulation of millions of possible contamination events in a 48-hour timeframe. For our experiments we select one of the utility matrices and obtain subnetworks by selecting the $n$ columns that provide the maximum utility, i.e., we select the columns $j$ with the largest $\sum_i u_{ij}$.
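A small sketch (ours) illustrating the claimed sparsity gap on a toy utility matrix (random entries, for illustration only): the model-4 support respects a bound of the form $n(m+1)+1$, consistent with Lemma 1 for $D = m$ and $E = 0$, while the WHT support is generically dense.

```python
import itertools, random

def subsets(items):
    items = sorted(items)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in itertools.combinations(items, r)]

n, m = 4, 2                                   # n sensors, m events (toy sizes)
rng = random.Random(5)
U = [[rng.uniform(0, 1) for _ in range(n)] for _ in range(m)]

N = frozenset(range(n))
def facility_location(A):                     # (24): s(A) = sum_i max_{j in A} u_ij
    return sum(max(U[i][j] for j in A) for i in range(m)) if A else 0.0

s = {A: facility_location(A) for A in subsets(N)}
shat4 = {B: sum((-1) ** len(B - A) * s[N - A] for A in subsets(B))
         for B in subsets(N)}                 # model-4 transform (3)
shat5 = {B: sum((-1) ** len(A & B) * s[A] for A in subsets(N)) / 2 ** n
         for B in subsets(N)}                 # WHT

supp4 = [B for B, c in shat4.items() if abs(c) > 1e-9]
supp5 = [B for B, c in shat5.items() if abs(c) > 1e-9]
assert len(supp4) <= n * (m + 1) + 1          # model-4 sparsity (D = m, E = 0)
assert len(supp4) < len(supp5)                # WHT support is larger
```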
Table 3: Number of queries and runtime (in seconds) of SSFT compared with the full WHT and with RWHT (run with the expected sparsity parameter set to increasing multiples of the model-4 sparsity) for the facility location functions of subnetworks of increasing size.
In Table 3 we compare the sparsity of the corresponding facility location function in model 4 against its sparsity in the WHT. For small subnetworks, we compute the full WHT and select the largest coefficients. For larger subnetworks, we compute the largest WHT coefficients using RWHT. The model 4 coefficients are always computed using SSFT. If the facility location function is $k$-sparse w.r.t. model 4, we set the expected sparsity parameter of RWHT to increasing multiples of $k$ up to the first for which the algorithm runs out of memory. We report the number of queries, time, number of Fourier coefficients, and relative reconstruction error. For RWHT experiments that require less than one hour we report average results over 10 runs (indicated by italic font). For the larger subnetworks, the relative error cannot be computed exactly and thus is estimated by sampling 100,000 sets uniformly at random and computing the relative error between the real facility location function and the estimate on the sample.
Interpretation of results. The considered facility location functions are indeed sparse w.r.t. model 4 and dense w.r.t. the WHT. As expected, SSFT outperforms RWHT in this scenario, which can be seen from the lower number of queries, the reduced time, and an error of exactly zero for SSFT. This experiment shows that certain classes of set functions of practical relevance are better represented in the model 4 basis than in the WHT basis.
5.3 Preference Elicitation in Auctions
In combinatorial auctions a set of goods is auctioned to a set of bidders. Each bidder is modeled as a set function that maps each bundle of goods to its subjective value for this bidder. The problem of learning bidder valuation functions from queries is known as the preference elicitation problem Brero et al. (2019). Our experiment sketches an approach under the assumption of Fourier sparsity.
As is common in this field Weissteiner et al. (2020a, b), we resort to simulated bidders. Specifically, we use the multi-region valuation model (MRVM) from the spectrum auctions test suite Weiss et al. (2017). In MRVM, 98 goods are auctioned off to 10 bidders of different types (3 local, 4 regional, and 3 national). We learn these bidders using the prior Fourier-sparse learning algorithms, this time including SSFT+, but excluding CSWHT, since a suitable superset of the Fourier support is not known in this scenario. Table 2 shows the results: means and standard deviations of the number of queries, number of Fourier coefficients, and relative error (estimated using 10,000 samples) taken over the bidder types and 25 runs.
Interpretation of results. First, we note that SSFT+ can indeed improve over SSFT for set functions that are relevant in practice. Namely, SSFT+ consistently learns sparse representations for local and regional bidders, while SSFT fails. HWHT also achieves perfect reconstruction for local and regional bidders. For the remaining bidders, none of the methods achieves perfect reconstruction, which indicates that those bidders do not admit a sparse representation. Second, we observe that, for the local and regional bidders, only half as many coefficients are required in the non-orthogonal model 4 basis as in the WHT basis. Third, SSFT+ requires fewer queries than HWHT in the Fourier-sparse cases.
6 Conclusion
We introduced an algorithm for learning set functions that are sparse with respect to various generalized, nonorthogonal Fourier bases. In doing so, our work significantly expands the set of efficiently learnable set functions. As we explained, the new notions of sparsity connect well with preference functions in recommender systems, which we consider an exciting avenue for future research.
Ethical Statement
Our approach is motivated by a range of real world applications, including modeling preferences in recommender systems and combinatorial auctions, that require the modeling, processing, and analysis of set functions, which is notoriously difficult due to their exponential size. Our work adds to the tool set that makes working with set functions computationally tractable. Since the work is of foundational and algorithmic nature we do not see any immediate ethical concerns. In case that the models estimated with our algorithms are used for making decisions (such as recommendations, or allocations in combinatorial auctions), of course additional care has to be taken to ensure that ethical requirements such as fairness are met. These questions are complementary to our work.
References
 Efficiently Learning Fourier Sparse Set Functions. In Advances in Neural Information Processing Systems, pp. 15094–15103. Cited by: Appendix E, §1, §1, §4, §4, §5.
 Transparent, Scrutable and Explainable User Models for Personalized Recommendation. In Proc. Conference on Research and Development in Information Retrieval (ACM SIGIR), pp. 265–274. Cited by: §1.
 On the Fourier analysis of Boolean functions. preprint, pp. 1–24. Cited by: Table 4, §1.

Fourier meets Möbius: fast subset convolution.
In
Proc ACM Symposium on Theory of Computing
, pp. 67–74. Cited by: Table 4.  Machine Learningpowered Iterative Combinatorial Auctions. Note: arXiv preprint arXiv:1911.08042 Cited by: §1, §5.3.

Kernels over Sets of Finite Sets using RKHS Embeddings, with Application to Bayesian (Combinatorial) Optimization
. InInternational Conference on Artificial Intelligence and Statistics
, pp. 2731–2741. Cited by: §4.  Testing coverage functions. In International Colloquium on Automata, Languages, and Programming, pp. 170–181. Cited by: Table 4, Appendix A, §2, §3.2.
 Nearly optimal deterministic algorithm for sparse walshhadamard transform. ACM Transactions on Algorithms (TALG) 13 (3), pp. 1–36. Cited by: §1, §4.
 A brief introduction to Fourier analysis on the Boolean cube. Theory of Computing, pp. 1–20. Cited by: §1.
 Differentiable learning of submodular models. In Advances in Neural Information Processing Systems, pp. 1013–1023. Cited by: §4.
 Variational inference in mixed probabilistic submodular models. In Advances in Neural Information Processing Systems, pp. 1759–1767. Cited by: Appendix A, §1, §2.
 Deep Submodular Functions: Definitions and Learning. In Advances in Neural Information Processing Systems, pp. 3404–3412. Cited by: §4.
 Representation, approximation and learning of submodular functions using lowrank decision trees. In Conference on Learning Theory, pp. 711–740. Cited by: §4.
 Nearly Optimal Sparse Fourier Transform. In Proc. ACM Symposium on Theory of Computing, pp. 563–578. Cited by: §4.
 Vector spaces as unions of proper subspaces. Linear algebra and its applications 431 (9), pp. 1681–1686. Cited by: §C.2.
 Sparse polynomial learning and graph sketching. In Advances in Neural Information Processing Systems, pp. 3122–3130. Cited by: §4.
 Submodular function maximization.. Cited by: §2.
 Nearoptimal Sensor Placements in Gaussian processes: Theory, Efficient Algorithms and Empirical Studies. Journal of Machine Learning Research 9, pp. 235–284. Cited by: §1, §5.1.
 Costeffective outbreak detection in networks. In Proc. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 420–429. Cited by: §5.2, §5.2, §5.2, §5.

An active learning framework using sparsegraph codes for sparse polynomials and graph sketching
. In Advances in Neural Information Processing Systems, pp. 2170–2178. Cited by: §1, §4.  An analysis of approximations for maximizing submodular set functions — I. Mathematical programming 14 (1), pp. 265–294. Cited by: §5.1.
 The battle of the water sensor networks (bwsn): a design challenge for engineers and algorithms. Journal of Water Resources Planning and Management 134 (6), pp. 556–568. Cited by: §5.2.
 Algebraic signal processing theory: Foundation and 1D time. IEEE Trans. on Signal Processing 56 (8), pp. 3572–3585. Cited by: §2.
 Algebraic Signal Processing Theory. Note: arXiv preprint arXiv:cs/0612077v1 Cited by: §2.
 Discrete signal processing with set functions. Note: arXiv preprint arXiv:2001.10290 Cited by: Appendix D, Appendix E, item 1, §1, §2, §4, Theorem 1.
 A discrete signal processing framework for set functions. In Proc. International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4359–4363. Cited by: item 1, §1, §2.
 Learning pseudo-Boolean k-DNF and Submodular Functions. In Proc. ACM-SIAM Symposium on Discrete Algorithms, pp. 1356–1368. Cited by: §4.
 A Fast Hadamard Transform for Signals with Sublinear Sparsity. In Proc. Annual Allerton Conference on Communication, Control, and Computing, pp. 1250–1257. Cited by: Appendix E, §4.
 Learning from sets of items in recommender systems. ACM Trans. on Interactive Intelligent Systems (TiiS) 9 (4), pp. 1–26. Cited by: §1.
 Gaussian process optimization in the bandit setting: no regret and experimental design. In Proc. International Conference on Machine Learning (ICML), pp. 1015–1022. Cited by: §5.1.
 Learning Fourier Sparse Set Functions. In Artificial Intelligence and Statistics, pp. 1125–1133. Cited by: Appendix E, item 3, §1, §4, §4, §5.
 Differentiable submodular maximization. In Proc. International Joint Conference on Artificial Intelligence, pp. 2731–2738. Cited by: §4.
 Differentiation of Blackbox Combinatorial Solvers. arXiv preprint arXiv:1912.02175. Cited by: §4.

SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. arXiv preprint arXiv:1905.12149. Cited by: §4.
 SATS: a universal spectrum auction test suite. In Proceedings of the 16th Conference on Autonomous Agents and Multi-Agent Systems, pp. 51–59. Cited by: §4, §5.3, §5.
 Deep Learning-powered Iterative Combinatorial Auctions. In 34th AAAI Conference on Artificial Intelligence, Cited by: §5.3.
 Fourier Analysis-based Iterative Combinatorial Auctions. Note: arXiv preprint arXiv:2009.10749 Cited by: §1, §5.3.
 Sampling signals on meet/join lattices. In Proc. Global Conference on Signal and Information Processing (GlobalSIP), Cited by: Appendix E.
 Deep sets. In Advances in Neural Information Processing Systems, pp. 3391–3401. Cited by: §4.
Appendix
Appendix A Preference Functions
Let $N$ denote our ground set. For this section, we assume $N = \{1, \dots, n\}$.
An important aspect of our work is that certain set functions are sparse in one basis but not in the others. In this section we show that preference functions Djolonga et al. (2016) indeed constitute a class of set functions that are sparse w.r.t. model 4 (see Table 4, which we replicate from the main text for convenience) and dense w.r.t. model 5 (= WHT basis). Preference functions occur naturally in machine learning tasks on discrete domains such as recommender systems and auctions, where they are used, e.g., to model complementarity and substitution effects between goods. Goods complement each other when their combined utility is greater than the sum of their individual utilities: a pair of shoes is more useful than the two shoes individually, and a round trip has higher utility than the combined individual utilities of the outward and return flights. Analogously, goods substitute each other when their combined utility is smaller than the sum of their individual utilities: e.g., it might not be necessary to buy a second pair of glasses if you already have one. Formally, a preference function $h\colon 2^N \to \mathbb{R}$ is given by
(25) $h(A) = m(A) + \sum_{d=1}^{D} \max_{a \in A} R_{a,d} - \sum_{e=1}^{E} \max_{a \in A} C_{a,e}$.
Equation (25) is composed of a modular part $m(A) = \sum_{a \in A} u_a$ parametrized by $u \in \mathbb{R}^n$, a repulsive part parametrized by $R \in \mathbb{R}^{n \times D}$, with $R \geq 0$, and an attractive part parametrized by $C \in \mathbb{R}^{n \times E}$, with $C \geq 0$.
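To make the definition concrete, the following is a minimal NumPy sketch of evaluating a preference function under the reconstructed notation above; the names `u`, `R`, `C`, and `preference_function` are illustrative, not from the paper:

```python
import numpy as np

def preference_function(A, u, R, C):
    """Evaluate h(A) = m(A) + sum_d max_{a in A} R[a, d]
                      - sum_e max_{a in A} C[a, e]  (cf. Eq. (25))."""
    A = sorted(A)
    if not A:
        return 0.0  # empty modular sum and empty maxima are 0
    modular = float(np.sum(u[A]))                   # m(A) = sum_{a in A} u_a
    repulsive = float(R[A, :].max(axis=0).sum())    # substitution effects
    attractive = float(C[A, :].max(axis=0).sum())   # complementarity effects
    return modular + repulsive - attractive

# Two goods, one repulsive and one attractive dimension
u = np.array([1.0, 2.0])
R = np.array([[1.0], [3.0]])   # shared need: max instead of sum -> substitutes
C = np.array([[0.5], [0.5]])   # joint value: subtracted max -> complements
print(preference_function({0, 1}, u, R, C))  # 3 + 3 - 0.5 = 5.5
```

Because the repulsive columns are aggregated by a maximum rather than a sum, adding a second good that covers the same need contributes less than its standalone value, which is exactly the substitution effect described above.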
Lemma 2.
Preference functions of the form (25) are Fourier-sparse w.r.t. model 4.
Proof.
In order to prove that preference functions are sparse w.r.t. model 4 we exploit the linearity of the Fourier transform. That is, we show that $h$ is Fourier-sparse by writing it as a sum of Fourier-sparse set functions. In particular, there are only two types of summands (= set functions):
First, the modular part $m$, with $m(A) = \sum_{a \in A} u_a$, is a set function whose nonzero Fourier coefficients are supported on the empty set and the singletons $\{a\}$, for $a \in N$; these are at most $n + 1$ coefficients.
Second, the summands $A \mapsto \max_{a \in A} R_{a,d}$, for $d \in \{1, \dots, D\}$, and $A \mapsto -\max_{a \in A} C_{a,e}$, for $e \in \{1, \dots, E\}$, are weighted and negatively weighted coverage functions, respectively. In order to see that $A \mapsto \max_{a \in A} R_{a,d}$ is a weighted coverage function, fix $d$ and observe that the codomain of this summand is $\{0, R_{1,d}, \dots, R_{n,d}\}$. Let $\sigma$ denote the permutation that sorts $R_{\cdot,d}$ in ascending order, i.e., $R_{\sigma(1),d} \leq \dots \leq R_{\sigma(n),d}$. Let $U = \{1, \dots, n\}$ denote the universe. We set $w_i = R_{\sigma(1),d}$ for $i = 1$ and $w_i = R_{\sigma(i),d} - R_{\sigma(i-1),d}$ for $i > 1$. Let $A \subseteq N$ with $A \neq \emptyset$. Let the set covered by $A$ be $U_A = \bigcup_{a \in A} \{1, \dots, \sigma^{-1}(a)\}$. Notice that $U_A = \{1, \dots, \sigma^{-1}(a^*)\}$, and, because the weights telescope, we have, for all such $A$,
(26) $\sum_{i \in U_A} w_i = R_{a^*,d}$,
where $a^*$ is the element in $A$ that satisfies $\sigma^{-1}(a^*) \geq \sigma^{-1}(a)$ for all $a \in A$. Now, observe that $\sigma^{-1}(a^*) \geq \sigma^{-1}(a)$ is equivalent to $R_{a^*,d} \geq R_{a,d}$, for all $a \in A$. Thus, by definition of $a^*$ we have $\sum_{i \in U_A} w_i = \max_{a \in A} R_{a,d}$.
The same construction works for $A \mapsto \max_{a \in A} C_{a,e}$. Weighted coverage functions with nonnegative weights are Fourier-sparse with respect to the W-transform Chakrabarty and Huang (2012) and Fourier-sparse with respect to model 4 (one additional coefficient for $\emptyset$). The preference function $h$ is thus a sum of a modular set function, $D$ sparse weighted coverage functions that require at most $n + 1$ additional Fourier coefficients each, and $E$ sparse negatively weighted coverage functions that require at most $n + 1$ additional Fourier coefficients each. Therefore, $h$ has at most $(D + E + 1)(n + 1)$ nonzero Fourier coefficients w.r.t. model 4. ∎
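The coverage construction in the proof can be checked numerically. The sketch below (with hypothetical helper names) builds the covering sets $U_{\{a\}}$ and the telescoping weights for a single nonnegative column $r = R_{\cdot,d}$ and verifies that summing the weights over the covered elements reproduces the maximum on every nonempty subset:

```python
import itertools
import numpy as np

def coverage_representation(r):
    """Represent A -> max_{a in A} r[a] (r >= 0) as a weighted coverage
    function: a covers U_{a} = {1, ..., sigma^{-1}(a)}, with telescoping
    weights w_1 = r_(1) and w_i = r_(i) - r_(i-1) over the sorted values."""
    order = np.argsort(r, kind="stable")
    rank = np.empty(len(r), dtype=int)
    rank[order] = np.arange(1, len(r) + 1)             # sigma^{-1}
    w = np.diff(np.concatenate(([0.0], np.sort(r))))   # telescoping weights
    cover = {a: set(range(1, rank[a] + 1)) for a in range(len(r))}
    return cover, w

r = np.array([2.0, 5.0, 3.0])
cover, w = coverage_representation(r)
subsets = itertools.chain.from_iterable(
    itertools.combinations(range(3), k) for k in (1, 2, 3))
for A in subsets:
    covered = set().union(*(cover[a] for a in A))      # U_A
    assert abs(sum(w[i - 1] for i in covered) - r[list(A)].max()) < 1e-9
```

The weights sum to the largest entry of $r$, and each covering set is a prefix of the sorted order, which is why the union over $A$ is again a prefix and the sum telescopes to the maximum.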
Remark 3.
The construction in the second part of the proof of Lemma 2 shows that preference functions with strictly positive $R$ (or $C$) are dense w.r.t. the WHT basis, because there is an element in $U$ that is covered by all covering sets $U_{\{a\}}$.
Appendix B SSFT: Support Discovery
In this section we prove the equations necessary for the support discovery mechanism of SSFT.
Let $s\colon 2^N \to \mathbb{R}$ be a set function and let $B \subseteq N$. As before, we denote the restriction of $s$ to $2^B$ by
(27) $s|_B\colon 2^B \to \mathbb{R};\ A \mapsto s(A)$.
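As an illustration of (27), a restriction can be implemented as a thin wrapper around the query oracle; `restrict` is a hypothetical helper for exposition, not part of the paper's algorithms:

```python
def restrict(s, B):
    """Return the restriction s|_B: 2^B -> R of a set function s: 2^N -> R,
    i.e., the function that agrees with s on every subset A of B (Eq. (27))."""
    B = frozenset(B)
    def s_B(A):
        A = frozenset(A)
        if not A <= B:
            raise ValueError("A must be a subset of B")
        return s(A)
    return s_B

# Example: restrict the cardinality function on N = {0, 1, 2} to B = {0, 1}
s = lambda A: len(A)
s_B = restrict(s, {0, 1})
print(s_B({0}))  # 1
```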
Recall the problem we want to solve and our algorithms (Fig. 3) for doing so (under mild assumptions on the Fourier coefficients).
Problem 2 (Sparse Fourier transform).
Given oracle access to query a Fourier-sparse set function $s$, compute its Fourier support $\mathrm{supp}(\hat{s})$ and the associated Fourier coefficients.