Symmetric positive definite matrices arise in many areas in a variety of guises: covariances, kernels, graph Laplacians, or otherwise. A basic computation with such matrices is evaluation of the bilinear form $u^\top f(A)v$, where $f$ is a matrix function and $u$, $v$ are given vectors. If $f(A) = A^{-1}$, we speak of computing a bilinear inverse form (BIF) $u^\top A^{-1} v$. For example, $e_i^\top A^{-1} e_i$ (with $e_i$ the $i$-th canonical vector) is the $i$-th diagonal entry of the inverse.
In this paper, we are interested in efficiently computing BIFs, primarily due to their importance in several machine learning contexts, e.g., evaluation of Gaussian density at a point, the Woodbury matrix inversion lemma, implementation of MCMC samplers for Determinantal Point Processes (Dpp), computation of graph centrality measures, and greedy submodular maximization (see Section 2).
When $A$ is large, it is preferable to compute $u^\top A^{-1} u$ iteratively rather than to first compute $A^{-1}$ (using Cholesky) at a cost of $O(n^3)$ operations. One could think of using conjugate gradients to solve $Ax = u$ approximately, and then obtain $u^\top A^{-1} u \approx u^\top x$. But several applications require precise bounds on numerical estimates of $u^\top A^{-1} u$ (e.g., in MCMC-based Dpp samplers such bounds help decide whether to accept or reject a transition in each iteration; see Section 5.1), which necessitates a more finessed approach.
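As a sketch of the CG route (not the quadrature approach pursued below), one might estimate $u^\top A^{-1}u$ as follows; function and variable names are ours, and the point is precisely that this gives only an approximation, with no certified interval:

```python
import numpy as np

def bif_via_cg(A, u, tol=1e-10, max_iter=None):
    """Estimate u^T A^{-1} u by running conjugate gradients on A x = u
    and returning u^T x.  Yields a point estimate only; no certified
    lower/upper bounds, which is what motivates the quadrature approach."""
    n = len(u)
    max_iter = max_iter or n
    x = np.zeros(n)
    r = u.copy()            # residual u - A x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return u @ x
```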
Gauss quadrature is one such approach. Originally proposed by Gauss for approximating integrals, Gauss- and Gauss-type quadrature (i.e., Gauss-Lobatto and Gauss-Radau quadrature) have since found application to bilinear forms, including computation of $u^\top A^{-1}u$. Bai et al. also show that Gauss and (right) Gauss-Radau quadrature yield lower bounds, while Gauss-Lobatto and (left) Gauss-Radau yield upper bounds on the BIF.
However, despite its long history and voluminous existing work (see e.g., ), our understanding of Gauss-type quadrature for matrix problems is far from complete. For instance, it is not known whether the bounds on BIFs improve with more quadrature iterations; nor is it known how the bounds obtained from Gauss, Gauss-Radau and Gauss-Lobatto quadrature compare with each other. We do not even know how fast the iterates of Gauss-Radau or Gauss-Lobatto quadrature converge.
Contributions. We address all the aforementioned problems and make the following main contributions:
We show that the lower and upper bounds generated by Gauss-type quadrature monotonically approach the target value (Theorems 4 and 6; Corr. 7). Furthermore, we show that for the same number of iterations, Gauss-Radau quadrature yields bounds superior to those given by Gauss or Gauss-Lobatto, but somewhat surprisingly all three share the same convergence rate.
We demonstrate implications of our results for two tasks: (i) scalable Markov chain sampling from a Dpp; and (ii) running a greedy algorithm for submodular optimization. In these applications, quadrature accelerates computations, and the bounds aid early stopping. Indeed, on large-scale sparse problems our methods yield speedups of several orders of magnitude.
Many of these methods may be applied to computing BIFs. But they provide only approximations, not intervals that bound the target value. Thus, a black-box use of these methods may change the execution of an algorithm whose decisions (e.g., whether to make a transition in a Markov chain) rely on the BIF value lying within a specific interval. Such changes can break the correctness of the algorithm.
2 Motivating Applications
BIFs are important to numerous problems. We recount below several notable examples: in all cases, efficient computation of bounds on BIFs is key to making the algorithms practical.
Determinantal Point Processes. A determinantal point process (Dpp) is a distribution over subsets of a ground set $\mathcal{Y} = \{1, \dots, N\}$. In its L-ensemble form, a Dpp uses a positive semidefinite kernel matrix $L$, and to a set $S \subseteq \mathcal{Y}$ assigns probability $\Pr(S) \propto \det(L_S)$, where $L_S$ is the submatrix of $L$ indexed by the entries in $S$. If we restrict to $|S| = k$, we obtain a $k$-Dpp. Dpps are widely used in machine learning; see, e.g., the survey by Kulesza & Taskar.
Exact sampling from a ($k$-)Dpp requires an eigendecomposition of $L$, which is prohibitive. For large $N$, Metropolis-Hastings (MH) or Gibbs sampling are preferred and state-of-the-art. Therein the core task is to compute transition probabilities (expressions involving BIFs), which are compared with a random scalar threshold.
For MH [7, 1], the transition probability from a current subset (state) $S$ to $S \cup \{i\}$, for $i \notin S$, is governed by the determinant ratio $\det(L_{S \cup i})/\det(L_S) = L_{ii} - L_{i,S} L_S^{-1} L_{S,i}$; the move to $S \setminus \{i\}$, for $i \in S$, is governed by the reciprocal ratio $\det(L_{S \setminus i})/\det(L_S)$. In a $k$-Dpp, the moves are swaps, with transition probabilities given by products of two such ratios for replacing $i \in S$ by $j \notin S$. We illustrate this application in greater detail in Section 5.1.
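To make the determinant ratio concrete, here is a minimal numpy sketch (function name and interface are ours) that evaluates it through the BIF, i.e., one linear solve in $L_S$ instead of two determinants:

```python
import numpy as np

def det_ratio_add(L, S, i):
    """Ratio det(L_{S ∪ {i}}) / det(L_S), i.e. the Schur complement
    L_ii - L_{i,S} L_S^{-1} L_{S,i}, computed via one linear solve
    instead of two determinant evaluations."""
    if len(S) == 0:
        return L[i, i]
    LS = L[np.ix_(S, S)]
    b = L[S, i]                      # column L_{S,i}
    x = np.linalg.solve(LS, b)       # L_S^{-1} L_{S,i}
    return L[i, i] - L[i, S] @ x     # the bilinear inverse form appears here
```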
Submodular optimization, Sensing. Algorithms for maximizing submodular functions can equally benefit from efficient BIF bounds. Given a positive definite matrix $K$, the set function $F(S) = \log\det(K_S)$ is submodular: for all $S \subseteq T$ and $i \notin T$, it holds that $F(S \cup i) - F(S) \ge F(T \cup i) - F(T)$.
Finding the set that maximizes $F$ is a key task for MAP inference with Dpps, matrix approximations by column selection [11, 59], and sensing (Krause et al.). For the latter, we model spatial phenomena (temperature, pollution) via Gaussian Processes and select locations to maximize the joint entropy of the observed variables, or the mutual information between observed and unobserved variables.
Greedy algorithms for this task rely on marginal gains of the form $F(S \cup i) - F(S) = \log\big(K_{ii} - K_{i,S} K_S^{-1} K_{S,i}\big)$ for $S \subseteq \mathcal{Y}$ and $i \notin S$. The algorithms compare those gains to a random threshold, or find an item with the largest gain. In both cases, efficient BIF bounds offer speedups. They can be combined with lazy and stochastic greedy algorithms.
Network Analysis, Centrality. When analyzing relationships and information flows between connected entities in a network, such as people, organizations, computers, smart hardware, etc. [54, 40, 2, 17, 16, 9], an important question is how to measure the popularity, centrality, or importance of a node.
Several existing popularity measures can be expressed as the solution of a large-scale linear system. For example, PageRank is the solution of a linear system in the (normalized) adjacency matrix of the web graph, and Bonacich centrality is the solution of $(I - \beta A)x = A\mathbf{1}$, where $A$ is the adjacency matrix. When computing local estimates, i.e., only a few entries of the solution, we obtain exactly the task of computing BIFs [61, 39]. Moreover, we may only need local estimates to an accuracy sufficient for determining which entry is larger, a setting where our quadrature-based bounds on BIFs will be useful.
Scientific Computing. In computational physics, BIFs are used for estimating selected entries of the inverse of a large sparse matrix. More generally, BIFs can help in estimating the trace of the inverse, a computational substep in lattice Quantum Chromodynamics [15, 22], some signal processing tasks, and Gaussian Process (GP) regression, e.g., for estimating predictive variances. In numerical linear algebra, BIFs are used in rational approximations, evaluation of Green's functions, and selective inversion of sparse matrices [41, 42, 39]. Notable uses include the design of preconditioners and uncertainty quantification.
Benefiting from fast iterative bounds. Many of the above examples use BIFs to rank values, to identify the largest value or compare them to a scalar or to each other. In such cases, we first compute fast, crude lower and upper bounds on a BIF, refining iteratively, just as far as needed to determine the comparison. Figure 1 in Section 4.4 illustrates the evolution of these bounds, and Section 5 explains details.
3 Background on Gauss Quadrature
For convenience, we begin by recalling key aspects of Gauss quadrature, as applied to computing $u^\top f(A) u$ for an $N \times N$ symmetric positive definite matrix $A$ with simple eigenvalues, arbitrary vectors $u, v$, and a matrix function $f$. (The summary in this section is derived from various sources: [24, 4, 29]. Experts can skim this section to collect our notation before moving on to Section 4, which contains our new results.) For a more detailed account of the relevant background on Gauss-type quadratures, please refer to Appendix A.
It suffices to consider $u^\top f(A)u$, thanks to the polarization identity
$$u^\top f(A)v = \tfrac{1}{4}\big[(u+v)^\top f(A)(u+v) - (u-v)^\top f(A)(u-v)\big].$$
Let $A = Q\Lambda Q^\top$ be the eigendecomposition of $A$, where $Q$ is orthonormal. Letting $\tilde u = Q^\top u$, we then have
$$u^\top f(A)u = \tilde u^\top f(\Lambda)\tilde u = \sum\nolimits_{i=1}^{N} f(\lambda_i)\,\tilde u_i^2.$$
Toward computing $u^\top f(A)u$, a key conceptual step is to write the above sum as the Riemann-Stieltjes integral
$$u^\top f(A)u = \int_a^b f(t)\, d\mu(t),$$
where $a \le \lambda_{\min}$, $b \ge \lambda_{\max}$, and $\mu$ is the piecewise constant measure defined by (assuming the eigenvalues are ordered increasingly)
$$\mu(t) = \begin{cases} 0, & t < \lambda_1,\\ \sum_{i=1}^{j}\tilde u_i^2, & \lambda_j \le t < \lambda_{j+1},\\ \sum_{i=1}^{N}\tilde u_i^2, & t \ge \lambda_N. \end{cases}$$
The integral is then approximated by a quadrature rule with remainder,
$$\int_a^b f(t)\,d\mu(t) = \sum\nolimits_{j=1}^{k}\omega_j f(\theta_j) + \sum\nolimits_{l=1}^{m}\nu_l f(\tau_l) + R_k[f], \qquad (3.2)$$
where the first two sums denote the $k$-th degree approximation and $R_k[f]$ denotes the remainder term. In representation (3.2) the weights $\omega_j$, $\nu_l$ and quadrature nodes $\theta_j$ are unknown, while the values $\tau_l$ are prescribed and lie outside the interval of integration $[a,b]$.
Different choices of these parameters yield different quadrature rules: $m = 0$ gives Gauss quadrature; $m = 1$ with $\tau_1 = a$ ($\tau_1 = b$) gives left (right) Gauss-Radau quadrature; $m = 2$ with $\tau_1 = a$ and $\tau_2 = b$ yields Gauss-Lobatto quadrature; while general $m$ yields Gauss-Christoffel quadrature.
The weights $\omega_j$, $\nu_l$ and nodes $\theta_j$ are chosen such that the rule is exact whenever $f$ is a polynomial of degree less than $2k + m$. For Gauss quadrature, we can recursively build the (tridiagonal) Jacobi matrix
$$J_k = \begin{pmatrix} \alpha_1 & \beta_1 & & \\ \beta_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \beta_{k-1} \\ & & \beta_{k-1} & \alpha_k \end{pmatrix}$$
and obtain from its spectrum the desired weights and nodes. Theorem 1 makes this more precise.
If $J_k$ has the eigendecomposition $J_k = P \Theta P^\top$, then for Gauss quadrature Thm. 1 yields
$$\sum\nolimits_{j=1}^{k} \omega_j f(\theta_j) = \|u\|_2^2 \, e_1^\top P f(\Theta) P^\top e_1 = \|u\|_2^2 \big[f(J_k)\big]_{1,1}. \qquad (3.4)$$
Given $A$ and $u$, our task is to compute $\|u\|_2$ and the Jacobi matrix $J_k$. For BIFs we have $f(t) = t^{-1}$, so (3.4) becomes $\|u\|_2^2 (J_k^{-1})_{1,1}$, which can be computed recursively using the Lanczos algorithm. For Gauss-Radau and Gauss-Lobatto quadrature we can compute modified versions of the Jacobi matrix (one each for left Gauss-Radau, right Gauss-Radau, and Gauss-Lobatto) based on $J_k$. The corresponding nodes and weights, and thus the Gauss-Radau and Gauss-Lobatto approximations, are then obtained from these modified Jacobi matrices, analogously to Gauss quadrature. Aggregating all these computations yields an algorithm that iteratively obtains bounds on $u^\top A^{-1}u$. The combined procedure, Gauss Quadrature Lanczos (GQL), is summarized as Algorithm 1. The complete algorithm may be found in Appendix A.
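The following is a minimal sketch of this procedure in numpy (names and structure are ours, not the paper's Algorithm 1): a plain Lanczos recurrence, the Gauss lower bound $\|u\|^2 (J^{-1})_{11}$ at each step, and a left Gauss-Radau upper bound obtained by Golub's modification, which alters the last diagonal entry of $J$ so that a prescribed node $a \le \lambda_{\min}$ becomes an eigenvalue.

```python
import numpy as np

def gauss_bif_bounds(A, u, k, a=None):
    """Sketch of Gauss-quadrature bounds on u^T A^{-1} u via Lanczos.

    Per iteration, the Gauss rule gives a lower bound ||u||^2 (J^{-1})_11
    for the current Jacobi matrix J; prescribing a node at `a` (a lower
    bound on the smallest eigenvalue) gives a left Gauss-Radau upper
    bound.  No reorthogonalization is done here (see Section 5.4)."""
    n = len(u)
    unorm2 = float(u @ u)
    alphas, betas = [], []
    q_prev = np.zeros(n)
    q = u / np.sqrt(unorm2)
    beta = 0.0
    history = []
    for j in range(k):
        w = A @ q - beta * q_prev
        alpha = float(q @ w)
        w = w - alpha * q
        alphas.append(alpha)
        # Jacobi matrix of the first j+1 Lanczos steps.
        J = np.diag(alphas) + np.diag(betas, 1) + np.diag(betas, -1)
        e1 = np.zeros(j + 1)
        e1[0] = 1.0
        lower = unorm2 * np.linalg.solve(J, e1)[0]   # Gauss: lower bound
        upper = None
        if a is not None and j >= 1:
            # Left Gauss-Radau: alter the last diagonal entry so that
            # `a` becomes an eigenvalue of the modified Jacobi matrix.
            ek = np.zeros(j)
            ek[-1] = 1.0
            delta = np.linalg.solve(J[:-1, :-1] - a * np.eye(j),
                                    (betas[-1] ** 2) * ek)[-1]
            Jr = J.copy()
            Jr[-1, -1] = a + delta
            upper = unorm2 * np.linalg.solve(Jr, e1)[0]
        history.append((lower, upper))
        beta = float(np.linalg.norm(w))
        betas.append(beta)
        if beta < 1e-12:      # Krylov space exhausted: bounds are exact
            break
        q_prev, q = q, w / beta
    return history
```

On a small SPD test matrix the lower bounds increase monotonically toward $u^\top A^{-1}u$ while the Radau values stay above it, matching the behavior established in Section 4.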
It turns out that the bounds given by Gauss quadrature have a close relation to the approximation error of conjugate gradient (CG) applied to a suitable problem. Since we know the convergence rate of CG, we can obtain from it the following estimate on the relative error of Gauss quadrature.
Theorem 3 (Relative error Gauss quadrature).
The $k$-th iterate of Gauss quadrature satisfies the relative error bound
$$\frac{u^\top A^{-1}u - g_k}{u^\top A^{-1}u} \;\le\; 4\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{2k},$$
where $g_k$ denotes the $k$-th Gauss iterate and $\kappa$ is the condition number of $A$.
In other words, Thm. 3 shows that the iterates of Gauss quadrature have a linear (geometric) convergence rate.
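For reference, the standard CG convergence bound that drives this rate is the following well-known estimate (see, e.g., Golub & Meurant), stated for the iterates $x_k$ of CG on $Ax = u$ with solution $x^\star$:

```latex
\frac{\|x_k - x^\star\|_A}{\|x_0 - x^\star\|_A}
  \le 2\left(\frac{\sqrt{\kappa}-1}{\sqrt{\kappa}+1}\right)^{k}
```

With $x_0 = 0$, the Gauss quadrature error after $k$ iterations equals $\|x_k - x^\star\|_A^2$ while $u^\top A^{-1}u = \|x_0 - x^\star\|_A^2$, so squaring the CG bound yields a linear rate with a factor of $\big((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\big)^2$ per iteration.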
4 Main Theoretical Results
In this section we summarize our main theoretical results; detailed proofs may be found in Appendix B. The key questions that we answer are: (i) do the bounds on $u^\top A^{-1}u$ generated by GQL improve monotonically with each iteration; (ii) how tight are these bounds; and (iii) how fast do Gauss-Radau and Gauss-Lobatto iterations converge? Our answers not only fill gaps in the literature on quadrature, but provide a theoretical base for speeding up algorithms for some applications (see Sections 2 and 5).
4.1 Lower Bounds
Our first result shows that both Gauss and right Gauss-Radau quadratures give iteratively better lower bounds on $u^\top A^{-1}u$. Moreover, with the same number of iterations, right Gauss-Radau yields tighter bounds.
Theorem 4 (Lower bounds).
Let $g_k$ and $\tilde g_k$ denote the $k$-th Gauss and right Gauss-Radau iterates, respectively. Then $\tilde g_k$ yields better bounds than $g_k$ but worse bounds than $g_{k+1}$; more precisely, $g_k \le \tilde g_k \le g_{k+1} \le u^\top A^{-1} u$.
Theorem 5 (Relative error right Gauss-Radau).
For each iteration $k$, the right Gauss-Radau iterate satisfies
4.2 Upper Bounds
Our second result compares Gauss-Lobatto with left Gauss-Radau quadrature.
Theorem 6 (Upper bounds).
Let $\ell_k$ and $s_k$ denote the $k$-th Gauss-Lobatto and left Gauss-Radau iterates, respectively. Then $s_k$ gives better upper bounds than $\ell_k$ but worse than $\ell_{k+1}$; more precisely, $u^\top A^{-1}u \le \ell_{k+1} \le s_k \le \ell_k$.
This shows that bounds given by both Gauss-Lobatto and left Gauss-Radau become tighter with each iteration. For the same number of iterations, left Gauss-Radau provides a tighter bound than Gauss-Lobatto.
Combining the above two theorems, we obtain the following corollary for all four Gauss-type quadratures.
Corollary 7 (Monotonicity).
With increasing $k$, Gauss and right Gauss-Radau quadrature give increasingly better lower bounds, and left Gauss-Radau and Gauss-Lobatto quadrature give increasingly better upper bounds on $u^\top A^{-1}u$.
4.3 Convergence rates
Our next two results state linear convergence rates for left Gauss-Radau quadrature and Gauss-Lobatto quadrature applied to computing the BIF $u^\top A^{-1}u$.
Theorem 8 (Relative error left Gauss-Radau).
For each $k$, the left Gauss-Radau iterate satisfies
Theorem 8 shows that the error again decreases linearly, and that it also depends on the accuracy of our estimate of the smallest eigenvalue, which determines the range of integration. Using the relations between left Gauss-Radau and Gauss-Lobatto, we readily obtain the following corollary.
Corollary 9 (Relative error Gauss-Lobatto).
For each $k$, the Gauss-Lobatto iterate satisfies
Remarks. All aforementioned results assumed that $A$ is strictly positive definite with simple eigenvalues. In Appendix C, we show similar results for the more general case where $A$ is only required to be symmetric, and $u$ lies in the space spanned by eigenvectors of $A$ corresponding to distinct positive eigenvalues.
4.4 Empirical Evidence
Next, we empirically verify the theoretical results shown above. We generate a random sparse symmetric matrix whose nonzero entries are standard normal, and shift its diagonal entries to make its smallest eigenvalue positive, thus making the matrix positive definite. We randomly sample the vector $u$ from a standard normal distribution. Figure 1 illustrates how the lower and upper bounds given by the four quadrature rules evolve with the number of iterations.
Figure 1 (b) and (c) show the sensitivity of the rules (except Gauss quadrature) to the estimates of the extremal eigenvalues.
The plots in Figure 1 agree with the theoretical results. First, all quadrature rules are seen to yield iteratively tighter bounds. The bounds obtained by the Gauss-Radau quadrature are superior to those given by Gauss and Gauss-Lobatto quadrature (also numerically verified). Notably, the bounds given by all quadrature rules converge very fast – within 25 iterations they yield reasonably tight bounds.
It is valuable to see how the bounds are affected if we do not have good approximations to the extremal eigenvalues $\lambda_{\min}$ and $\lambda_{\max}$. Since Gauss quadrature does not depend on these approximations, its bounds remain the same in (a), (b), (c). Left Gauss-Radau depends on the quality of the estimate of $\lambda_{\min}$, and, with a poor approximation, takes more iterations to converge (Figure 1(b)). Right Gauss-Radau depends on the quality of the estimate of $\lambda_{\max}$; thus, with a poor approximation, its bounds become worse (Figure 1(c)). However, its bounds are never worse than those obtained by Gauss quadrature.
Finally, Gauss-Lobatto depends on both estimates, so its bounds become worse whenever we lack a good approximation to $\lambda_{\min}$ or $\lambda_{\max}$. Nevertheless, its quality is lower-bounded by left Gauss-Radau, as stated in Thm. 6.
5 Algorithmic Results and Applications
Our theoretical results show that Gauss-Radau quadrature provides good lower and upper bounds to BIFs. More importantly, these bounds get iteratively tighter at a linear rate, finally becoming exact (see Appendix B). However, in many applications motivating our work (see Section 2), we do not need exact values of BIFs; bounds that are tight enough suffice for the algorithms to proceed. As a result, all these applications benefit from our theoretical results that provide iteratively tighter bounds. This idea translates into a retrospective framework for accelerating methods whose progress relies on knowing an interval containing the BIF. Whenever the algorithm takes a step (transition) that depends on a BIF (e.g., as in the next section, a state transition in a sampler if the BIF exceeds a certain threshold), we compute rough bounds on its value. If the bounds suffice to take the critical decision (e.g., decide the comparison), then we stop the quadrature. If they do not suffice, we take one or more additional iterations of quadrature to tighten the bound. Algorithm 2 makes this idea explicit.
We illustrate our framework by accelerating: (i) Markov chain sampling for (-)Dpps; and (ii) maximization of a (specific) nonmonotone submodular function.
5.1 Retrospective Markov Chain (-)Dpp
First, we use our framework to accelerate iterative samplers for Determinantal Point Processes. Specifically, we discuss MH sampling ; the variant for Gibbs sampling follows analogously.
The key insight is that all state transitions of the Markov chain rely on a comparison between a random scalar and a quantity involving a bilinear inverse form. Given the current set $S$, assume we propose to add element $i$ to $S$. The probability $p$ of transitioning to state $S \cup \{i\}$ is governed by the determinant ratio $\det(L_{S \cup i})/\det(L_S) = L_{ii} - L_{i,S} L_S^{-1} L_{S,i}$. To decide whether to accept this transition, we sample $t \sim \mathrm{Unif}[0,1]$; if $p > t$ then we accept the transition, otherwise we remain at $S$. Hence, we need to compute $p$ just accurately enough to decide whether $p > t$. To do so, we can use the aforementioned lower and upper bounds on the BIF $L_{i,S} L_S^{-1} L_{S,i}$.
Let $p_k^-$ and $p_k^+$ be the lower and upper bounds on $p$ induced by the bounds on this BIF in the $k$-th iteration of Gauss quadrature. If $p_k^- > t$, we can safely accept the transition; if $p_k^+ \le t$, we can safely reject it. Only if $p_k^- \le t < p_k^+$ can we not make a decision yet, and we therefore retrospectively perform one more iteration of Gauss quadrature to obtain tighter bounds $p_{k+1}^-$ and $p_{k+1}^+$. We continue until the bounds are sharp enough to safely decide whether to make the transition. Note that in each iteration we make the same decision as we would with the exact value of the BIF, and hence the resulting algorithm (Alg. 3) is an exact Markov chain for the Dpp. In each iteration, it calls Alg. 4, which uses step-wise lazy Gauss quadrature to decide the comparison while stopping as early as possible.
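The early-stopping comparison itself is simple; here is a minimal sketch (names are ours) where `next_bounds` stands in for one refinement step of the quadrature:

```python
def decide_with_bounds(threshold, next_bounds, max_iter=100):
    """Decide whether value > threshold from iteratively refined
    (lower, upper) bounds, stopping as early as possible.

    `next_bounds()` is a placeholder for one more quadrature iteration;
    it returns the current interval containing the exact value."""
    for _ in range(max_iter):
        lo, up = next_bounds()
        if lo > threshold:      # even the lower bound clears t: accept
            return True
        if up <= threshold:     # even the upper bound fails: reject
            return False
    # budget exhausted without separation: fall back to the midpoint
    return (lo + up) / 2 > threshold
```

As long as the interval eventually separates from the threshold, the decision is identical to the one made with the exact value, which is what preserves exactness of the Markov chain.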
If we condition the Dpp on observing a set of a fixed cardinality $k$, we obtain a $k$-Dpp. The MH sampler for this process is similar, but a state transition corresponds to swapping two elements (adding and removing at the same time). Assume the current set is $S$. If we propose to delete $i$ from $S$ and add $j$, then the corresponding transition probability is again a ratio of determinants, which reduces to two BIFs.
Again, we sample $t \sim \mathrm{Unif}[0,1]$, but now we must compute two quantities, and hence two sets of lower and upper bounds: one pair for the BIF associated with removing $i$, and one pair for the BIF associated with adding $j$, each refined by Gauss quadrature iterations. If the bounds certify that the transition probability exceeds $t$, we can safely accept the transition; if they certify that it falls below $t$, we can safely reject it; otherwise, we tighten the bounds via additional Gauss-Radau iterations.
Refinements. We could perform one iteration for both quantities, but it may be that one set of bounds is already sufficiently tight while the other is loose. A straightforward idea is to judge the tightness of the lower and upper bounds by their difference (gap), and to decide accordingly which quadrature to iterate further. But the bounds for the two BIFs are not symmetric and contribute differently to the transition decision. Assuming that in practice the impact is larger when the gap is larger, we tighten the bounds for whichever quantity currently has the larger gap. Details of the final algorithm with this refinement are shown in Appendix D.
5.2 Retrospective Double Greedy Algorithm
As indicated in Section 2, a number of applications, including sensing and information maximization with Gaussian Processes, rely on maximizing a submodular function given as $F(S) = \log\det(K_S)$. In general, this function may be non-monotone. In this case, an algorithm of choice is the double greedy algorithm of Buchbinder et al.
The double greedy algorithm starts with two sets $X = \emptyset$ and $Y = \mathcal{Y}$ and serially iterates through all elements to construct a near-optimal subset. At iteration $i$, it includes element $i$ in $X$ with probability $p_i$, and with probability $1 - p_i$ it excludes $i$ from $Y$. The decisive value $p_i$ is determined by the marginal gains $a_i = F(X \cup i) - F(X)$ and $b_i = F(Y \setminus i) - F(Y)$: with $x^+ = \max(x, 0)$, one sets $p_i = a_i^+ / (a_i^+ + b_i^+)$.
For the log-det function, we obtain $a_i = \log\big(K_{ii} - K_{i,X} K_X^{-1} K_{X,i}\big)$ and $b_i = -\log\big(K_{ii} - K_{i,Y \setminus i} K_{Y \setminus i}^{-1} K_{Y \setminus i,\, i}\big)$, where both Schur complements involve BIFs. In other words, at iteration $i$ the algorithm uniformly samples $t \in [0,1]$ and then checks whether $t < p_i$; if true, it adds $i$ to $X$, otherwise it removes $i$ from $Y$.
This essential decision, whether to retain or discard an element, again involves bounding BIFs, for which we can take advantage of our framework and profit from the typical sparsity of the data. Concretely, we retrospectively compute lower and upper bounds on these BIFs, and hence on the gains $a_i$ and $b_i$. If the bounds certify $t < p_i$, we safely add $i$ to $X$; if they certify $t \ge p_i$, we safely remove $i$ from $Y$; otherwise we compute tighter bounds by further iterating the quadrature.
As before, the bounds for $a_i$ and $b_i$ may not contribute equally to the decision. We can again apply the refinement from Section 5.1: we tighten the bounds for whichever gain currently has the larger gap. The resulting algorithm is shown in Appendix E.
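As a concrete reference point, here is a minimal double greedy loop for the log-det objective with exact (non-lazy) gains, assuming the standard rule $p_i = a_i^+/(a_i^+ + b_i^+)$ of Buchbinder et al.; function names are ours. The retrospective variant would replace the exact Schur complements with iteratively tightened quadrature bounds:

```python
import numpy as np

def log_schur(K, S, i):
    """log(K_ii - K_{i,S} K_S^{-1} K_{S,i}); equals F(S ∪ {i}) - F(S)
    for F(S) = log det(K_S)."""
    S = sorted(S)
    if not S:
        return np.log(K[i, i])
    x = np.linalg.solve(K[np.ix_(S, S)], K[S, i])
    return np.log(K[i, i] - K[i, S] @ x)

def double_greedy_logdet(K, rng=None):
    """Double greedy for F(S) = log det(K_S) with exact marginal gains
    (a sketch of the plain algorithm, not the retrospective one)."""
    rng = rng or np.random.default_rng()
    n = K.shape[0]
    X, Y = set(), set(range(n))
    for i in range(n):
        a = log_schur(K, X, i)                 # F(X ∪ i) - F(X)
        b = -log_schur(K, Y - {i}, i)          # F(Y \ i) - F(Y)
        ap, bp = max(a, 0.0), max(b, 0.0)
        p = 1.0 if ap + bp == 0 else ap / (ap + bp)   # convention if both gains <= 0
        if rng.uniform() < p:
            X.add(i)
        else:
            Y.discard(i)
    return X  # X == Y after the last iteration
```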
Data. For all datasets, we add $10^{-3}$ times the identity matrix to ensure positive definiteness.
5.3 Empirical Evidence
We perform experiments on both synthetic and real-world datasets to test the impact of our retrospective quadrature framework in applications. We focus on (-)Dpp sampling and the double greedy algorithm for the log-det objective.
5.3.1 Synthetic Datasets
We generate small sparse matrices using methods similar to Section 4.4; for ($k$-)Dpp sampling and double greedy we use matrices of different sizes, and we vary the density of the matrices. The running time and speedup are shown in Figure 2.
The results suggest that our framework greatly accelerates both Dpp sampling and submodular maximization. The speedups are particularly pronounced for sparse matrices. As the matrices become very sparse, the original algorithms profit from sparsity too, and the difference shrinks a little. Overall, we see that our framework has the potential to lead to substantial speedups for algorithms involving bilinear inverse forms.
5.3.2 Real Datasets
We further test our framework on real-world datasets of varying sizes. We selected 6 datasets: four of small/medium size, which have been used in prior work, and two large ones. The first two small/medium-sized datasets, Abalone and Wine (available at http://archive.ics.uci.edu/ml/), are popular regression datasets, for which we construct sparse kernel matrices with an RBF kernel; the bandwidth and cut-off parameters are chosen as in prior work. The other two small/medium-sized datasets are GR (arXiv General Relativity collaboration graph) and HEP (arXiv High Energy Physics collaboration graph), where the kernel matrices are graph Laplacians. The final two large datasets are Epinions (who-trusts-whom network of Epinions) and Slashdot (Slashdot social network from Feb. 2009), both available at https://snap.stanford.edu/data/, with large Laplacian matrices. Dataset statistics are shown in Tab. 1.
The running times in Tab. 2 suggest that the iterative bounds from quadrature significantly accelerate (-)Dpp sampling and double greedy on real data. Our algorithms lead to speedups of up to a thousand times.
On the large sparse matrices, the “standard” double greedy algorithm did not finish within 24 hours, due to the expensive matrix operations involved. With our framework, the algorithm needs only 15 minutes.
To our knowledge, these results are the first to run Dpp sampling and double greedy for information gain on datasets of this scale.
5.4 Numerical details
As seen in Alg. 1, the quadrature algorithm is built upon Lanczos iterations. Although in theory Lanczos iterations construct a set of orthogonal Lanczos vectors, in practice the constructed vectors usually lose orthogonality after some iterations due to rounding errors. One way to deal with this problem is to reorthogonalize the vectors, either completely at each iteration or selectively [parlett1979lanczos]. Alternatively, an equivalent Lanczos iteration proposed in [paige1971computation] uses a different recurrence to improve local orthogonality. Further discussion of the numerical stability of the method lies beyond the scope of this paper.
For Gauss quadrature on $u^\top A^{-1}u$, the convergence rate of the bounds depends on the condition number of $A$. We can use preconditioning techniques to obtain a well-conditioned matrix and proceed with that. Concretely, observe that for any non-singular $P$,
$$u^\top A^{-1} u = (Pu)^\top (P A P^\top)^{-1} (Pu).$$
Thus, if $P A P^\top$ is well-conditioned, we can run Gauss quadrature on it with the vector $Pu$.
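The identity can be sanity-checked numerically; in this toy sketch the particular $P$ is arbitrary (merely nonsingular), not a recommended preconditioner:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)                               # SPD matrix
P = np.tril(rng.standard_normal((n, n))) + 3 * np.eye(n)  # nonsingular
u = rng.standard_normal(n)

bif = u @ np.linalg.solve(A, u)                           # u^T A^{-1} u
bif_prec = (P @ u) @ np.linalg.solve(P @ A @ P.T, P @ u)  # preconditioned form
assert np.isclose(bif, bif_prec)  # identical BIF, possibly better conditioning
```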
There exist various ways to obtain good preconditioners for an SPD matrix, including methods for efficiently constructing sparse approximate inverses [benzi1996sparse]. If $A$ happens to be an SDD matrix, we can use the techniques of [cheng2014scalable] to construct an approximate sparse inverse in near-linear time.
6 Conclusion
In this paper we present a general and powerful computational framework for algorithms that rely on computations of bilinear inverse forms. The framework uses Gauss quadrature methods to lazily and iteratively tighten bounds, and is supported by our new theoretical results. We analyze properties of the various types of Gauss quadrature for approximating bilinear inverse forms and show that all bounds become monotonically tighter with the number of iterations; that the bounds given by Gauss-Radau are superior to those obtained from the other Gauss-type quadratures; and that both lower and upper bounds enjoy a linear convergence rate. We empirically verify the efficiency of our framework and obtain speedups of up to a thousand times for two popular examples: maximizing information gain and sampling from determinantal point processes.
This research was partially supported by NSF CAREER award 1553284 and a Google Research Award.
- Anari et al.  Anari, Nima, Gharan, Shayan Oveis, and Rezaei, Alireza. Monte Carlo Markov chain algorithms for sampling strongly Rayleigh distributions and determinantal point processes. In COLT, 2016.
- Atzori et al.  Atzori, Luigi, Iera, Antonio, and Morabito, Giacomo. The Internet of Things: A survey. Computer networks, 54(15):2787–2805, 2010.
- Bai & Golub  Bai, Zhaojun and Golub, Gene H. Bounds for the trace of the inverse and the determinant of symmetric positive definite matrices. Annals of Numerical Mathematics, pp. 29–38, 1996.
- Bai et al.  Bai, Zhaojun, Fahey, Gark, and Golub, Gene H. Some large-scale matrix computation problems. Journal of Computational and Applied Mathematics, pp. 71–89, 1996.
- Bekas et al.  Bekas, Constantine, Kokiopoulou, Effrosyni, and Saad, Yousef. An estimator for the diagonal of a matrix. Applied numerical mathematics, pp. 1214–1229, 2007.
- Bekas et al.  Bekas, Constantine, Curioni, Alessandro, and Fedulova, Irina. Low cost high performance uncertainty quantification. In Proceedings of the 2nd Workshop on High Performance Computational Finance, 2009.
- Belabbas & Wolfe  Belabbas, Mohamed-Ali and Wolfe, Patrick J. Spectral methods in machine learning and new strategies for very large datasets. Proceedings of the National Academy of Sciences, pp. 369–374, 2009.
- Benzi & Golub  Benzi, Michele and Golub, Gene H. Bounds for the entries of matrix functions with applications to preconditioning. BIT Numerical Mathematics, pp. 417–438, 1999.
- Benzi & Klymko  Benzi, Michele and Klymko, Christine. Total communicability as a centrality measure. J. Complex Networks, pp. 124–149, 2013.
- Bonacich  Bonacich, Phillip. Power and centrality: A family of measures. American Journal of Sociology, pp. 1170–1182, 1987.
- Boutsidis et al.  Boutsidis, Christos, Mahoney, Michael W., and Drineas, Petros. An improved approximation algorithm for the column subset selection problem. In SODA, pp. 968–977, 2009.
- Brezinski  Brezinski, Claude. Error estimates for the solution of linear systems. SIAM Journal on Scientific Computing, pp. 764–781, 1999.
- Brezinski et al.  Brezinski, Claude, Fika, Paraskevi, and Mitrouli, Marilena. Estimations of the trace of powers of positive self-adjoint operators by extrapolation of the moments. Electronic Transactions on Numerical Analysis, pp. 144–155, 2012.
- Buchbinder et al.  Buchbinder, Niv, Feldman, Moran, Naor, Joseph, and Schwartz, Roy. A tight linear time (1/2)-approximation for unconstrained submodular maximization. In FOCS, 2012.
- Dong & Liu  Dong, Shao-Jing and Liu, Keh-Fei. Stochastic estimation with noise. Physics Letters B, pp. 130–136, 1994.
- Estrada & Higham  Estrada, Ernesto and Higham, Desmond J. Network properties revealed through matrix functions. SIAM Review, pp. 696–714, 2010.
- Fenu et al.  Fenu, Caterina, Martin, David R., Reichel, Lothar, and Rodriguez, Giuseppe. Network analysis via partial spectral factorization and Gauss quadrature. SIAM Journal on Scientific Computing, pp. A2046–A2068, 2013.
- Fika & Koukouvinos  Fika, Paraskevi and Koukouvinos, Christos. Stochastic estimates for the trace of functions of matrices via Hadamard matrices. Communications in Statistics-Simulation and Computation, 2015.
- Fika & Mitrouli  Fika, Paraskevi and Mitrouli, Marilena. Estimation of the bilinear form for Hermitian matrices. Linear Algebra and its Applications, 2015.
- Fika et al.  Fika, Paraskevi, Mitrouli, Marilena, and Roupa, Paraskevi. Estimates for the bilinear form with applications to linear algebra problems. Electronic Transactions on Numerical Analysis, pp. 70–89, 2014.
- Freericks  Freericks, James K. Transport in multilayered nanostructures. The Dynamical Mean-Field Theory Approach, Imperial College, London, 2006.
- Frommer et al.  Frommer, Andreas, Lippert, Thomas, Medeke, Björn, and Schilling, Klaus. Numerical Challenges in Lattice Quantum Chromodynamics: Joint Interdisciplinary Workshop of John Von Neumann Institute for Computing, Jülich, and Institute of Applied Computer Science, Wuppertal University, August 1999, volume 15. Springer Science & Business Media, 2012.
- Gauss  Gauss, Carl F. Methodus nova integralium valores per approximationem inveniendi. apvd Henricvm Dieterich, 1815.
- Gautschi  Gautschi, Walter. A survey of Gauss-Christoffel quadrature formulae. In EB Christoffel, pp. 72–147. Springer, 1981.
- Gillenwater et al.  Gillenwater, Jennifer, Kulesza, Alex, and Taskar, Ben. Near-optimal MAP inference for determinantal point processes. In NIPS, 2012.
- Gittens & Mahoney  Gittens, Alex and Mahoney, Michael W. Revisiting the Nyström method for improved large-scale machine learning. ICML, 2013.
- Golub  Golub, Gene H. Some modified matrix eigenvalue problems. SIAM Review, pp. 318–334, 1973.
- Golub & Meurant  Golub, Gene H. and Meurant, Gérard. Matrices, moments and quadrature II; how to compute the norm of the error in iterative methods. BIT Numerical Mathematics, pp. 687–705, 1997.
- Golub & Meurant  Golub, Gene H. and Meurant, Gérard. Matrices, moments and quadrature with applications. Princeton University Press, 2009.
- Golub & Welsch  Golub, Gene H. and Welsch, John H. Calculation of Gauss quadrature rules. Mathematics of computation, pp. 221–230, 1969.
- Golub et al.  Golub, Gene H., Stoll, Martin, and Wathen, Andy. Approximation of the scattering amplitude and linear systems. Elec. Tran. on Numerical Analysis, pp. 178–203, 2008.
- Hestenes & Stiefel  Hestenes, Magnus R. and Stiefel, Eduard. Methods of conjugate gradients for solving linear systems. J. Research of the National Bureau of Standards, pp. 409–436, 1952.
- Hough et al.  Hough, J. Ben, Krishnapur, Manjunath, Peres, Yuval, and Virág, Bálint. Determinantal processes and independence. Probability Surveys, 2006.
- Kang  Kang, Byungkon. Fast determinantal point process sampling with application to clustering. In NIPS, pp. 2319–2327, 2013.
- Krause et al.  Krause, Andreas, Singh, Ajit, and Guestrin, Carlos. Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies. JMLR, pp. 235–284, 2008.
- Kulesza & Taskar  Kulesza, Alex and Taskar, Ben. Determinantal point processes for machine learning. arXiv:1207.6083, 2012.
- Kwok & Adams  Kwok, James T. and Adams, Ryan P. Priors for diversity in generative latent variable models. In NIPS, pp. 2996–3004, 2012.
- Lanczos  Lanczos, Cornelius. An iteration method for the solution of the eigenvalue problem of linear differential and integral operators. United States Governm. Press Office Los Angeles, CA, 1950.
- Lee et al.  Lee, Christina E., Ozdaglar, Asuman E., and Shah, Devavrat. Solving systems of linear equations: Locally and asynchronously. arXiv, abs/1411.2647, 2014.
- Leskovec et al.  Leskovec, Jure, Lang, Kevin J., Dasgupta, Anirban, and Mahoney, Michael W. Statistical properties of community structure in large social and information networks. In WWW, pp. 695–704, 2008.
- Lin et al. [2011a] Lin, Lin, Yang, Chao, Lu, Jianfeng, and Ying, Lexing. A fast parallel algorithm for selected inversion of structured sparse matrices with application to 2D electronic structure calculations. SIAM Journal on Scientific Computing, pp. 1329–1351, 2011a.
- Lin et al. [2011b] Lin, Lin, Yang, Chao, Meza, Juan C., Lu, Jianfeng, Ying, Lexing, and E, Weinan. Selinv–an algorithm for selected inversion of a sparse symmetric matrix. ACM Transactions on Mathematical Software, 2011b.
- Lobatto  Lobatto, Rehuel. Lessen over de differentiaal-en integraal-rekening: Dl. 2 Integraal-rekening, volume 1. Van Cleef, 1852.
- Meurant  Meurant, Gérard. The computation of bounds for the norm of the error in the conjugate gradient algorithm. Numerical Algorithms, pp. 77–87, 1997.
- Meurant  Meurant, Gérard. Numerical experiments in computing bounds for the norm of the error in the preconditioned conjugate gradient algorithm. Numerical Algorithms, pp. 353–365, 1999.
- Meurant  Meurant, Gérard. The Lanczos and conjugate gradient algorithms: from theory to finite precision computations, volume 19. SIAM, 2006.
- Minoux  Minoux, Michel. Accelerated greedy algorithms for maximizing submodular set functions. In Optimization Techniques, pp. 234–243. Springer, 1978.
- Mirzasoleiman et al.  Mirzasoleiman, Baharan, Badanidiyuru, Ashwinkumar, Karbasi, Amin, Vondrák, Jan, and Krause, Andreas. Lazier than lazy greedy. In AAAI, 2015.
- Nemhauser et al.  Nemhauser, George L., Wolsey, Laurence A., and Fisher, Marshall L. An analysis of approximations for maximizing submodular set functions–I. Mathematical Programming, pp. 265–294, 1978.
- Page et al.  Page, Lawrence, Brin, Sergey, Motwani, Rajeev, and Winograd, Terry. The PageRank citation ranking: bringing order to the web. Stanford InfoLab, 1999.
- Radau  Radau, Rodolphe. Étude sur les formules d’approximation qui servent à calculer la valeur numérique d’une intégrale définie. J. de Mathématiques Pures et Appliquées, pp. 283–336, 1880.
- Rasmussen & Williams  Rasmussen, Carl E. and Williams, Christopher K. I. Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006.
- Rocková & George  Rocková, Veronika and George, Edward I. Determinantal priors for variable selection, 2015.
- Scott  Scott, John. Social network analysis. Sage, 2012.
- Sherman & Morrison  Sherman, Jack and Morrison, Winifred J. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. The Annals of Mathematical Statistics, pp. 124–127, 1950.
- Shewchuk  Shewchuk, Jonathan R. An introduction to the conjugate gradient method without the agonizing pain, 1994.
- Sidje & Saad  Sidje, Roger B. and Saad, Yousef. Rational approximation to the Fermi–Dirac function with applications in density functional theory. Numerical Algorithms, pp. 455–479, 2011.
- Stoer & Bulirsch  Stoer, Josef and Bulirsch, Roland. Introduction to numerical analysis, volume 12. Springer Science & Business Media, 2013.
- Sviridenko et al.  Sviridenko, Maxim, Vondrák, Jan, and Ward, Justin. Optimal approximation for submodular and supermodular optimization with bounded curvature. In SODA, 2015.
- Tang & Saad  Tang, Jok M. and Saad, Yousef. A probing method for computing the diagonal of a matrix inverse. Numerical Linear Algebra with Applications, pp. 485–501, 2012.
- Wasow  Wasow, Wolfgang R. A note on the inversion of matrices by random walks. Mathematical Tables and Other Aids to Computation, pp. 78–81, 1952.
- Wilf  Wilf, Herbert S. Mathematics for the Physical Sciences. Wiley, New York, 1962.
Appendix A Further Background on Gauss Quadrature
We present below a more detailed summary of material on Gauss quadrature to make the paper self-contained.
A.1 Selecting weights and nodes
We have described that the Riemann-Stieltjes integral can be expressed as
$$\int_a^b f(t)\,d\mu(t) = \sum_{i=1}^{N} \omega_i f(t_i) + R_N[f],$$
where $\sum_{i=1}^{N} \omega_i f(t_i)$ denotes the $N$th degree approximation and $R_N[f]$ denotes a remainder term. The weights $\{\omega_i\}_{i=1}^N$ and nodes $\{t_i\}_{i=1}^N$ are chosen such that for all polynomials of degree less than $2N$, denoted $f \in \mathbb{P}_{2N-1}$, we have exact interpolation
$$\int_a^b f(t)\,d\mu(t) = \sum_{i=1}^{N} \omega_i f(t_i).$$
One way to compute weights and nodes is to set $f(t) = t^k$ for $k = 0, 1, \ldots, 2N-1$ and then solve the resulting nonlinear system exactly. But there is an easier way to obtain weights and nodes, namely by using polynomials orthogonal with respect to the measure $\mu$. Specifically, we construct a sequence of orthogonal polynomials $p_0(t), p_1(t), \ldots$ such that $p_i(t)$ is a polynomial in $t$ of degree exactly $i$, and $p_i$, $p_j$ are orthogonal, i.e., they satisfy
$$\int_a^b p_i(t)\,p_j(t)\,d\mu(t) = \delta_{ij}.$$
The roots of $p_N$ are distinct, real, and lie in the interval $(a, b)$; they form the nodes for Gauss quadrature (see, e.g., [29, Ch. 6]).
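As a concrete illustration (our own sketch, not part of the paper), take the Lebesgue measure $d\mu(t) = dt$ on $[-1, 1]$, for which the orthogonal polynomials are the Legendre polynomials: the roots of the degree-$N$ Legendre polynomial reproduce the $N$ Gauss nodes, and the resulting rule is exact on $\mathbb{P}_{2N-1}$.

```python
import numpy as np

# For dmu(t) = dt on [-1, 1] the orthogonal polynomials are the Legendre
# polynomials; the roots of the degree-N one are the N Gauss nodes.
N = 5
legendre_N = np.polynomial.legendre.Legendre.basis(N)
roots = np.sort(legendre_N.roots())

nodes, weights = np.polynomial.legendre.leggauss(N)  # reference Gauss rule
assert np.allclose(roots, np.sort(nodes))

# The N-point rule integrates polynomials of degree < 2N exactly.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal(2 * N)        # random polynomial of degree 2N - 1
poly = np.polynomial.Polynomial(coeffs)
exact = poly.integ()(1.0) - poly.integ()(-1.0)
quad = weights @ poly(nodes)
assert np.isclose(quad, exact)
```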
Consider the two monic polynomials whose roots serve as quadrature nodes:
$$\pi_N(t) = \prod_{i=1}^{N} (t - t_i), \qquad \tilde{\pi}_M(t) = \prod_{j=1}^{M} (t - z_j),$$
where the $t_i$ are the free nodes, the $z_j$ are the prescribed nodes ($M = 0$ for Gauss, $M = 1$ for Gauss-Radau, and $M = 2$ for Gauss-Lobatto quadrature), and $\tilde{\pi}_0(t) \equiv 1$ for consistency. We further denote $\hat{\pi}_M(t) = \pm\tilde{\pi}_M(t)$, where the sign is taken to ensure $\hat{\pi}_M(t) \geq 0$ on $[a, b]$. Then, for $1 \leq i \leq N$, we calculate the quadrature weights as
$$\omega_i = \int_a^b \frac{\pi_N(t)\,\hat{\pi}_M(t)}{(t - t_i)\,\pi_N'(t_i)\,\hat{\pi}_M(t_i)}\, d\mu(t),$$
where $\pi_N'$ denotes the derivative of $\pi_N$ with respect to $t$. When $M = 0$ the quadrature degenerates to Gauss quadrature and we have
$$\omega_i = \int_a^b \frac{\pi_N(t)}{(t - t_i)\,\pi_N'(t_i)}\, d\mu(t).$$
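The Gauss ($M = 0$) weight formula $\omega_i = \int \pi_N(t) / \big((t - t_i)\,\pi_N'(t_i)\big)\, d\mu(t)$ can be checked numerically. The sketch below (our own, for $d\mu(t) = dt$ on $[-1, 1]$) evaluates it by exact polynomial division and integration, and compares against reference Gauss-Legendre weights.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Check the M = 0 weight formula for dmu(t) = dt on [-1, 1].
N = 4
nodes, ref_weights = np.polynomial.legendre.leggauss(N)

pi_N = Polynomial.fromroots(nodes)      # monic polynomial with the nodes as roots
weights = []
for t_i in nodes:
    # pi_N(t) / (t - t_i): exact division since t_i is a root of pi_N.
    quotient = pi_N // Polynomial([-t_i, 1.0])
    antideriv = quotient.integ()
    weights.append((antideriv(1.0) - antideriv(-1.0)) / pi_N.deriv()(t_i))

assert np.allclose(weights, ref_weights)
```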
Although we have specified how to select nodes and weights for quadrature, these ideas cannot be applied directly to our problem because the measure $\mu$ is unknown. Indeed, calculating the measure explicitly would require knowing the entire spectrum of $A$, which is as good as explicitly computing $f(A)$, hence untenable for us. The next section shows how to circumvent the difficulties posed by the unknown measure $\mu$.
A.2 Gauss Quadrature Lanczos (GQL)
The key idea for circumventing our lack of knowledge of $\mu$ is to recursively construct a sequence of polynomials, called Lanczos polynomials, whose construction ensures orthogonality with respect to $\mu$. Concretely, we construct the Lanczos polynomials via the following three-term recurrence:
$$\gamma_j\, p_j(t) = (t - \alpha_j)\, p_{j-1}(t) - \gamma_{j-1}\, p_{j-2}(t), \qquad p_{-1}(t) \equiv 0, \quad p_0(t) \equiv 1, \tag{A.1}$$
while ensuring $\int_a^b p_i(t)\,p_j(t)\,d\mu(t) = \delta_{ij}$. We can express (A.1) in matrix form by writing
$$t\, P_N(t) = J_N P_N(t) + \gamma_N\, p_N(t)\, e_N, \tag{A.2}$$
where $P_N(t) = [p_0(t), p_1(t), \ldots, p_{N-1}(t)]^\top$, $e_N$ is the $N$th canonical unit vector, and $J_N$ is the tridiagonal matrix
$$J_N = \begin{bmatrix} \alpha_1 & \gamma_1 & & \\ \gamma_1 & \alpha_2 & \ddots & \\ & \ddots & \ddots & \gamma_{N-1} \\ & & \gamma_{N-1} & \alpha_N \end{bmatrix}.$$
This matrix is known as the Jacobi matrix, and is closely related to Gauss quadrature. The following well-known theorem makes this relation precise.
Theorem 10. The eigenvalues of $J_N$ form the nodes of Gauss-type quadratures. The weights are given by the squares of the first elements of the normalized eigenvectors of $J_N$.
Thus, if $J_N$ has the eigendecomposition $J_N = Q_N \Lambda_N Q_N^\top$, then for Gauss quadrature Thm. 10 yields
$$\sum_{i=1}^{N} \omega_i f(t_i) = e_1^\top Q_N f(\Lambda_N) Q_N^\top e_1 = e_1^\top f(J_N)\, e_1. \tag{A.3}$$
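As an illustrative sketch (our own code, not from the paper): for the Legendre case the Jacobi matrix has $\alpha_j = 0$ and $\gamma_j = j/\sqrt{4j^2 - 1}$, and the eigendecomposition route of Thm. 10 (the Golub-Welsch construction) recovers the Gauss-Legendre nodes and weights, scaled by $\mu_0 = \int d\mu = 2$.

```python
import numpy as np

# Golub-Welsch for the Legendre measure: eigenvalues of the Jacobi matrix
# are the Gauss nodes; squared first eigenvector components, scaled by
# mu_0 = 2, are the weights.
N = 6
j = np.arange(1, N)
gamma = j / np.sqrt(4.0 * j**2 - 1.0)
J = np.diag(gamma, 1) + np.diag(gamma, -1)   # alpha_j = 0 on the diagonal

eigvals, eigvecs = np.linalg.eigh(J)
nodes = eigvals
weights = 2.0 * eigvecs[0, :] ** 2

ref_nodes, ref_weights = np.polynomial.legendre.leggauss(N)
assert np.allclose(nodes, ref_nodes)
assert np.allclose(weights, ref_weights)
```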
We now specialize to our main focus, $f(A) = A^{-1}$, for which we prove more precise results. In this case, (A.3) becomes $e_1^\top J_N^{-1} e_1 = (J_N^{-1})_{1,1}$. The task now is to compute $(J_N^{-1})_{1,1}$ and, given $J_N$, to obtain the Jacobi matrix $J_{N+1}$.
Fortunately, we can calculate $J_N$ efficiently and iteratively using the Lanczos algorithm. Suppose we have an estimate $J_N$; in iteration $N+1$ of Lanczos, we compute the tridiagonal coefficients $\alpha_{N+1}$ and $\gamma_N$ and add them to this estimate to form $J_{N+1}$. As to $(J_{N+1}^{-1})_{1,1}$, assuming we have already computed $b_N := (J_N^{-1})_{1,1}$, letting $c_N := (J_N^{-1})_{1,N}$ and $d_N := (J_N^{-1})_{N,N}$ and invoking the Sherman-Morrison identity we obtain the recursion:
$$b_{N+1} = b_N + \frac{\gamma_N^2\, c_N^2}{\alpha_{N+1} - \gamma_N^2\, d_N},$$
where $c_N$ and $d_N$ can be recursively computed using a Cholesky-like factorization of $J_N$ [29, p. 31].
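To make the pipeline concrete, the following sketch (our own; variable names and the random test problem are not from the paper) runs plain Lanczos without reorthogonalization to build $J_N$, applies a bordered-inverse (Sherman-Morrison-style) recursion for $b_j = (J_j^{-1})_{1,1}$, and checks that $\|u\|^2 b_N$ approximates the bilinear inverse form $u^\top A^{-1} u$. For clarity it recomputes $J_{j-1}^{-1}$ explicitly to read off $c_{j-1}$ and $d_{j-1}$, rather than updating them via a Cholesky-like factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 50, 20
B = rng.standard_normal((n, n))
A = B @ B.T + n * np.eye(n)                # symmetric positive definite
u = rng.standard_normal(n)

# Lanczos tridiagonalization started from u / ||u||.
q_prev, q = np.zeros(n), u / np.linalg.norm(u)
alpha, gamma = [], []
for j in range(N):
    w = A @ q - (gamma[-1] * q_prev if gamma else 0.0)
    a = q @ w
    w = w - a * q
    alpha.append(a)
    if j < N - 1:
        g = np.linalg.norm(w)
        gamma.append(g)
        q_prev, q = q, w / g

J = np.diag(alpha) + np.diag(gamma, 1) + np.diag(gamma, -1)

# Recursion over leading principal submatrices J_j of J.
b = 1.0 / alpha[0]
for j in range(2, N + 1):
    Jprev_inv = np.linalg.inv(J[: j - 1, : j - 1])  # explicit only for clarity
    c = Jprev_inv[0, -1]                            # (J_{j-1}^{-1})_{1, j-1}
    d = Jprev_inv[-1, -1]                           # (J_{j-1}^{-1})_{j-1, j-1}
    b += gamma[j - 2] ** 2 * c**2 / (alpha[j - 1] - gamma[j - 2] ** 2 * d)

assert np.isclose(b, np.linalg.inv(J)[0, 0])        # recursion matches direct
estimate = (u @ u) * b                              # Gauss estimate of the BIF
exact = u @ np.linalg.solve(A, u)
assert np.isclose(estimate, exact, rtol=1e-6)
```

After enough Lanczos iterations the Gauss estimate agrees with the direct solve to high precision; with few iterations it yields the lower bounds discussed in the main text.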
For Gauss-Radau quadrature, we need to modify $J_{N+1}$ so that it has a prescribed eigenvalue. More precisely, we extend $J_N$ to $\tilde{J}_{N+1}$ for left Gauss-Radau (