Graph Sketching Against Adaptive Adversaries Applied to the Minimum Degree Algorithm

April 11, 2018 · Matthew Fahrbach et al. · Georgia Institute of Technology and Carnegie Mellon University

Motivated by the study of matrix elimination orderings in combinatorial scientific computing, we utilize graph sketching and local sampling to give a data structure that provides access to approximate fill degrees of a matrix undergoing elimination in O(polylog(n)) time per elimination and query. We then study the problem of using this data structure in the minimum degree algorithm, which is a widely-used heuristic for producing elimination orderings for sparse matrices by repeatedly eliminating the vertex with (approximate) minimum fill degree. This leads to a nearly-linear time algorithm for generating approximate greedy minimum degree orderings. Despite extensive studies of algorithms for elimination orderings in combinatorial scientific computing, our result is the first rigorous incorporation of randomized tools in this setting, as well as the first nearly-linear time algorithm for producing elimination orderings with provable approximation guarantees. While our sketching data structure readily works in the oblivious adversary model, by repeatedly querying and greedily updating itself, it enters the adaptive adversary model, where the underlying sketches become prone to failure due to dependencies between the updates and their internal randomness. We show how to use an additional sampling procedure to circumvent this problem and to create an independent access sequence. Our technique for decorrelating the interleaved queries and updates to this randomized data structure may be of independent interest.


1 Introduction

Randomization has played an increasingly fundamental role in the design of modern data structures. The current best algorithms for fully-dynamic graph connectivity [KKM13, NSW17, NS17, Wul17], shortest paths [HKN14, ACK17], graph spanners [BKS12], maximal matchings [BGS15, Sol16], and the dimensionality-reductions of large matrices [Woo14, CMP16, KLM17, KPPS17] all critically rely on randomization. An increasing majority of these data structures operate under the oblivious adversary model, which assumes that updates are generated independently of the internal randomness used in the data structure. In contrast, many applications of data structures are adaptive—meaning that subsequent updates may depend on the output of previous queries. A classical example of this paradigm is the combination of greedy algorithms with data structures, including Dijkstra’s algorithm for computing shortest paths and Kruskal’s algorithm for finding minimum spanning trees. The limitations imposed by adaptive adversaries are beginning to receive attention in the dynamic connectivity [NS17, NSW17] and spanner [BK16] literature, but even for these problems there remains a substantial gap between algorithms that work in the adaptive adversary model and those that work only against oblivious adversaries [BKS12, KKM13].

Motivated by a practically important example of adaptive invocations to data structures for greedy algorithms, we study the minimum degree algorithm for sparse matrix factorization and linear system solving [DRSL16]. This heuristic for precomputing an efficient pivot ordering is ubiquitous in numerical linear algebra libraries that handle large sparse matrices [Mat17], and relies on a graph-theoretic interpretation of Gaussian elimination. In particular, the variables and nonzeros in a linear system correspond to vertices and edges in a graph, respectively. When the variable associated with a vertex $v$ is eliminated, a clique is induced on the neighborhood of $v$, and then $v$ is deleted from the graph. This heuristic repeatedly eliminates the vertex of minimum degree in this graph, which corresponds to the variable with the fewest nonzeros in its row and column.

Computing elimination orderings that minimize the number of additional nonzeros, known as fill, has been shown to be computationally hard [Yan81, NSS00], even in parameterized settings [KST99, FV13, WAPL14, BCK16, CS17]. However, the practical performance of direct methods has greatly benefited from more efficient algorithms for analyzing elimination orderings [ADD04, DGLN04]. Tools such as elimination trees [Liu90, GNP94] can implicitly represent fill in time that is nearly-linear in the number of original nonzeros, which allows for efficient prediction and reorganization of future computation and, more importantly, memory bandwidth. In contrast to the abundance of algorithms built on examining elimination orderings via implicit representation [HP07, NS12], surprisingly little attention has been given to producing elimination orderings implicitly. In the survey by Heggernes et al. [HEKP01], the authors give an $O(n^2 m)$ time algorithm for computing a minimum degree ordering, which is more than the cost of Gaussian elimination itself and significantly more than the nearly-linear time algorithms for analyzing such orderings [GNP94].

Main Results.

We begin our study by combining implicit representations of fill with graph sketching. The nonzero entries of a partially eliminated matrix can be represented as the set of vertices reachable within two hops in a graph that undergoes edge contractions [GNP94]. This allows us to incorporate $\ell_0$-sketches [Coh97], which were originally developed to estimate the cardinality of reachable sets of vertices in directed graphs. By augmenting $\ell_0$-sketches with suitable data structures, we obtain the following result for dynamically maintaining fill structure.

Theorem 1.1.

Against an oblivious adversary, we can maintain $(1 \pm \epsilon)$-approximations to the degrees of the graph representation of a matrix undergoing elimination in $O(\operatorname{polylog}(n))$ time per operation.

We also give an exact version of this data structure for cases where the minimum degree is always small (e.g., empirical performance of Gaussian elimination on grid graphs [BMMR97]). Ignoring issues of potential dependent randomness, the approximation guarantees of this data structure provide us with an ordering that we call an approximate greedy minimum degree ordering, where at each step a vertex whose degree is close to the minimum is pivoted. It is unclear if such an ordering approximates a true minimum degree ordering, but such guarantees are more quantifiable than previous heuristics for approximating minimum degree orderings [ADD96, HEKP01].

However, using this randomized data structure in a greedy manner exposes the severe limitations of data structures that only work in the oblivious adversary model. The updates (i.e., the vertices we eliminate) depend on the output of previous minimum-degree queries, and hence on the data structure's own internal randomness. The main result in this paper is an algorithm that uses dynamic sketching, as well as an additional routine for estimating degrees via local sampling, to generate an approximate greedy minimum degree sequence in nearly-linear time against adaptive adversaries.

Theorem 1.2.

Given an $n \times n$ matrix with nonzero graph structure containing $m$ nonzeros, we can produce a $(1+\epsilon)$-approximate greedy minimum degree ordering in $O(m \operatorname{polylog}(n))$ time for any constant $\epsilon > 0$.

Techniques.

Several components of our algorithm are highly tailored to the minimum degree algorithm. For example, our dynamic sketches and local degree estimation routine depend on the implicit representation of intermediate states of Gaussian elimination [GNP94]. That said, our underlying randomized techniques (e.g., $\ell_0$-sketches [Coh97] and wedge sampling [KP17, ELRS17]) are new additions to combinatorial scientific computing.

The primary focus of this paper is modifying the guarantees in the oblivious adversary model from Theorem 1.1 to work within a greedy loop (i.e., an adaptive adversary) to give Theorem 1.2. However, we do not accomplish this by making the queries deterministic or worst-case as in [BK16, NS17, NSW17]. Instead, we use an external randomized routine for estimating fill degrees to create a fixed sequence of updates. The randomness within the sketching data structure then becomes independent of the update sequence, but its internal state is still highly useful for determining which vertices could have approximate minimum degree. We then efficiently construct the update sequence using recent developments for randomized graph algorithms that use exponential random variables [MPX13, MPVX15]. Our use of sketching can also be viewed as a pseudodeterministic algorithm whose goal is to efficiently recover a particular sequence of vertices [GG11, GGR13]. We believe that both of these views are valuable to the study of randomness and for better understanding the relationship between oblivious and adaptive adversaries.

Organization.

In Section 2 we formalize the implicit representation of fill and variants of minimum degree orderings. In Section 3 we give an overview of our results, along with a brief description of the algorithms and techniques we employ. The use of sketching and sampling to obtain our exact and approximate algorithms is presented in Section 4 and Section 5, respectively. We also detail our derandomization routine in Section 5, which is crucial for using our randomized data structure against an adaptive adversary. In Section 6 we demonstrate how to estimate fill degrees via local sampling, and in Section 7 we show how to maintain sketches as vertices are pivoted. Lastly, in Section 8 we discuss hardness results for computing the minimum degree of a vertex in a partially eliminated system and also for producing a minimum degree ordering.

2 Preliminaries

We assume that function arguments are pointers to objects instead of the objects themselves, and thus passing an object of size $O(n)$ does not cost $O(n)$ time and space. This is essentially the “pass by reference” construct in high-level programming languages.

2.1 Gaussian Elimination and Fill Graphs

Gaussian elimination is the process of repeatedly eliminating variables from a system of linear equations, while maintaining an equivalent system on the remaining variables. Algebraically, this involves taking an equation involving a target variable and subtracting (a scaled version of) this equation from all others involving the target variable. We assume throughout the paper that the systems are symmetric positive definite (SPD) and thus the diagonal will remain positive, allowing for any pivot order. This further implies that we can apply elimination operations to columns in order to isolate the target variable, resulting in the Schur complement.

A particularly interesting fact about Gaussian elimination is that the numerical Schur complement is unique irrespective of the pivoting order. Under the now standard assumption that nonzero elements do not cancel each other out [GL89], this commutative property also holds for the combinatorial nonzero structure. By interpreting the nonzero structure of a symmetric matrix $A$ as an adjacency matrix for a graph $G$, we can define the change to the nonzero structure of $A$ under elimination as a graph-theoretic operation on $G$ analogous to the Schur complement.

Our notation extends that of Gilbert, Ng, and Peyton [GNP94], who worked with known elimination orderings and treated the entire fill pattern (i.e., additional nonzero entries) statically. Because we work with partially eliminated states, we will need to distinguish between the eliminated and remaining vertices in $G$. We implicitly address this by letting $x$ and $y$ denote eliminated vertices and by letting $u$, $v$, and $w$ denote remaining vertices. The following definition of a fill graph allows us to determine the nonzero structure on the remaining variables of a partially eliminated system.

Definition 2.1.

The fill graph $G^+ = (V^+, E^+)$ is the graph on the remaining vertices $V^+$ such that the edge $(u, v)$ is in $E^+$ if $u$ and $v$ are connected by a (possibly empty) path of eliminated vertices.

This characterization of fill means that we can readily compute the fill degree of a remaining vertex $v$, denoted by $\deg^+(v)$, in a partially eliminated state without explicitly constructing the matrix. We can also iteratively form $G^+$ from the original graph $G$ by repeatedly removing an eliminated vertex along with its incident edges, and then adding edges between all of the neighbors of the removed vertex to form a clique. This operation gives the nonzero structure of the Schur complement.
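To make the elimination operation concrete, here is a minimal Python sketch of a single pivot on the nonzero-structure graph, assuming an adjacency-set representation; the function name and representation are illustrative, not the paper's implementation.

```python
from itertools import combinations

def eliminate(adj, v):
    """Pivot vertex v in the nonzero-structure graph.

    adj: dict mapping each vertex to the set of its neighbors.
    Eliminating v induces a clique on its neighborhood and then removes v,
    mirroring the nonzero structure of the Schur complement.
    """
    neighbors = adj.pop(v)
    for u in neighbors:
        adj[u].discard(v)
    # Connect every pair of former neighbors of v (the fill edges).
    for u, w in combinations(neighbors, 2):
        adj[u].add(w)
        adj[w].add(u)
    return adj

# Path a - b - c: eliminating b creates the fill edge (a, c).
adj = {'a': {'b'}, 'b': {'a', 'c'}, 'c': {'b'}}
eliminate(adj, 'b')   # adj is now {'a': {'c'}, 'c': {'a'}}
```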

Lemma 2.2.

For any graph $G$ and vertex $v$, given an elimination ordering we can compute the fill degree of $v$ at the step when $v$ is eliminated in time linear in the size of $G$.

Proof.

Mark all the vertices appearing before $v$ in the ordering as eliminated, and mark the rest as remaining. Run a breadth-first search from $v$ that terminates at remaining vertices (not including $v$). Let $S$ be the set of remaining vertices where the search terminated. By the definition of the fill graph we have $\deg^+(v) = |S|$. ∎
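The proof translates directly into a short breadth-first search; the following sketch (reusing the adjacency-set representation from the previous example) computes the fill degree of a remaining vertex exactly as described.

```python
from collections import deque

def fill_degree(adj, eliminated, v):
    """Fill degree of remaining vertex v: the number of remaining vertices
    reachable from v through (possibly empty) paths of eliminated vertices.

    adj: dict vertex -> set of neighbors in the ORIGINAL graph.
    eliminated: set of vertices already pivoted (v must not be in it).
    """
    seen = {v}
    reached = set()            # remaining vertices where the search stops
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w in seen:
                continue
            seen.add(w)
            if w in eliminated:
                queue.append(w)    # keep walking through eliminated vertices
            else:
                reached.add(w)     # terminate the search at remaining vertices
    return len(reached)
```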

This kind of path finding among eliminated vertices adds an additional layer of complexity to our data structures. To overcome this, we contract eliminated vertices into their connected components (with respect to the subgraph induced on the eliminated vertices of $G$), which leads to the component graph.

Definition 2.3.

The component graph is obtained from $G$ by contracting the edges between eliminated vertices: its vertex set consists of the resulting component vertices together with the vertices that have not been eliminated, and its edge set is implicitly given by the contractions.

Note that the component graph is quasi-bipartite, as the contraction rule implies there are no edges between component vertices. It will be useful to refer to two different kinds of neighborhoods in a component graph: for any vertex, its component neighbors are its neighbors that are component vertices, and its remaining neighbors are its neighbors that are remaining vertices.

Figure 1: The (a) original graph, (b) fill graph, and (c) component graph after pivoting two vertices (marked in red).

For example, let us consider Figure 1. The original graph has seven vertices, and the algorithm decides to pivot the two vertices marked in red. Eliminating these vertices induces a clique on their remaining neighbors in the fill graph, because each pair of these neighbors is connected by a path through the eliminated vertices. Our algorithms implicitly maintain the fill graph by maintaining the component graph, where the two eliminated vertices merge to form a single component vertex with edges to all of their remaining neighbors in the original graph. Note that an upper bound for the number of edges in a component graph is the number of edges in the original graph. We repeatedly exploit this property when proving the time and space bounds of our algorithms.
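One simple way to maintain the component graph implicitly is with a union-find structure over the eliminated vertices; the sketch below is an illustrative piece of bookkeeping under the representation assumed above, not the data structure developed later in the paper.

```python
class ComponentGraph:
    """Implicit component graph: eliminated vertices are contracted into
    components via union-find; remaining vertices stay as themselves."""

    def __init__(self, adj):
        self.adj = {u: set(nbrs) for u, nbrs in adj.items()}  # original edges
        self.parent = {u: u for u in adj}
        self.eliminated = set()

    def find(self, u):
        while self.parent[u] != u:
            self.parent[u] = self.parent[self.parent[u]]   # path halving
            u = self.parent[u]
        return u

    def pivot(self, v):
        """Mark v as eliminated and merge it with adjacent eliminated components."""
        self.eliminated.add(v)
        for u in self.adj[v]:
            if u in self.eliminated:
                self.parent[self.find(u)] = self.find(v)

    def component_neighbors(self, v):
        """Component vertices adjacent to the remaining vertex v."""
        return {self.find(u) for u in self.adj[v] if u in self.eliminated}

    def remaining_neighbors(self, v):
        """Remaining vertices adjacent to v in the original graph."""
        return {u for u in self.adj[v] if u not in self.eliminated}
```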

2.2 Minimum Degree Orderings

The minimum degree algorithm is a greedy heuristic for reducing the cost of solving sparse linear systems that repeatedly eliminates the variable involved in the fewest number of equations [GL89]. Although there are many situations where this is suboptimal, it is remarkably effective and widely used in practice. For example, the approximate minimum degree algorithm (AMD) [ADD96] is a heuristic for generating minimum degree orderings that plays an integral role in the sparse linear algebra packages in MATLAB [Mat17], Mathematica [Wol18], and Julia [BKSE12].

For any elimination ordering $(v_1, v_2, \ldots, v_n)$, we let $G_t$ be the graph with vertices $v_1, \ldots, v_t$ marked as eliminated and $v_{t+1}, \ldots, v_n$ marked as remaining. We denote the corresponding sequence of fill graphs by $G_0^+, G_1^+, \ldots, G_n^+$, where $G_0^+ = G$ and $G_n^+$ is the empty graph. Throughout the paper, we frequently use the notation $[k] = \{1, 2, \ldots, k\}$ when iterating over sets.

Definition 2.4.

A minimum degree ordering is an elimination ordering $(v_1, v_2, \ldots, v_n)$ such that for all $t$, the vertex $v_{t+1}$ has minimum fill degree in $G_t^+$. Concretely, this means that
$$\deg_{G_t^+}(v_{t+1}) \;=\; \min_{u \text{ remaining in } G_t^+} \deg_{G_t^+}(u).$$

The data structures we use for finding the vertices with minimum fill degree are randomized, so we need to be careful to not introduce dependencies between different steps of the algorithm when several vertices are of minimum degree. To avoid this problem, we simply require that the lexicographically-least vertex be eliminated in the event of a tie.

Our notion of approximating a minimum degree ordering is based on finding, at each step, a vertex whose degree is close to the minimum in $G_t^+$. Note that this is the goal of the AMD algorithm.

Definition 2.5.

A $(1+\epsilon)$-approximate greedy minimum degree ordering is an elimination ordering $(v_1, v_2, \ldots, v_n)$ such that at each step $t$, we have
$$\deg_{G_t^+}(v_{t+1}) \;\le\; (1+\epsilon) \min_{u \text{ remaining in } G_t^+} \deg_{G_t^+}(u).$$

This decision process has no lookahead, and thus does not in any way approximate the minimum possible total fill incurred during Gaussian elimination, which is known to be NP-complete [Yan81].
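For reference, the greedy rule of Definition 2.4 with lexicographic tie-breaking can be written directly as follows, reusing the `eliminate` sketch from Section 2.1; this naive implementation is far slower than the algorithms developed in this paper and serves only to pin down the ordering being approximated.

```python
def min_degree_ordering(adj):
    """Naive greedy minimum degree ordering with lexicographic tie-breaking.

    adj: dict vertex -> set of neighbors (consumed destructively); the
    vertices must be mutually comparable for the tie-break.  This reference
    implementation pays for every fill clique explicitly, so it can be far
    more expensive than the nearly-linear time algorithms in this paper.
    """
    order = []
    while adj:
        # Lexicographically-least vertex of minimum fill degree.
        v = min(adj, key=lambda u: (len(adj[u]), u))
        order.append(v)
        eliminate(adj, v)   # clique on the neighbors of v, then delete v
    return order
```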

2.3 Related Works

Gaussian Elimination and Fill.

The study of pivoting orderings is a fundamental question in combinatorial scientific computing. Work by George [Geo73] led to the study of nested dissection algorithms, which utilize separators to give provably smaller fill bounds for planar [RTL76, LRT79] and separable graphs [GT87, AY10]. A side effect of this work is the better characterization of fill via component graphs [Liu85], which is used to compute the total fill-in of a specific elimination ordering [GNP94]. This characterization is also used to construct elimination trees, which are ubiquitous in scientific computing to preallocate memory and optimize cache behaviors [Liu90].

Finding Low-Fill Orderings.

The goal of an elimination ordering is to minimize the total fill. Unfortunately, this problem is NP-complete [Yan81, BS90]. Algorithms that approximate the minimum fill-in within polynomial factors have been studied [NSS00], as well as algorithms [KST99, FV13] and hardness results [WAPL14, BCK16, CS17] for parameterized variants. Partially due to the high overhead of the previous algorithms, the minimum degree heuristic remains as one of the most widely-used methods for generating low-fill orderings [GL89].

Somewhat surprisingly, we were not able to find prior works that compute the minimum degree ordering in time faster than the bounds for MMD and AMD below, or works that utilize the implicit representation of fill provided by elimination trees. (We use speculative language here due to the vastness of the literature on variants of minimum degree algorithms.) On the other hand, there are various heuristics for finding minimum degree-like orderings, including multiple minimum degree (MMD) [Liu85] and the approximate minimum degree algorithm (AMD) [ADD96]. While both of these methods run extremely well in practice, they have theoretically tight performances of $O(n^2 m)$ for MMD and $O(nm)$ for AMD [HEKP01]. Furthermore, AMD is not always guaranteed to produce a vertex of approximate minimum degree.

Removing Dependencies in Randomized Algorithms.

Our size estimators are dynamic—the choice of the pivot, which directly affects the subsequent fill graph, is a result of the randomness used to generate the pivot in the previous step—and prone to dependency problems. Independence between the access sequence and internal randomness is a common requirement in recent works on data structures for maintaining spanning trees, spanners, and matchings [BGS15, KKM13, Sol16]. Often these algorithms only have guarantees in the oblivious adversary model, which states that the adversary can choose the graph and the sequence of updates, but it cannot choose updates adaptively in response to the randomly-guided choices of the algorithm.

Recent works in randomized dimensionality-reduction have approached this issue of dependency by injecting additional randomness to preserve independence [LS15]. Quantifying the amount of randomness that is “lost” over the course of an algorithm has recently been characterized using mutual information [KNP17], but their results do not allow for us to consider adversarial vertex pivots. Our analysis also has tenuous connections to recent works utilizing matrix martingales to analyze repeated introductions of randomness into graph sparsification algorithms [KS16, KPPS17].

3 Overview

We discuss the main components of our algorithms in three parts. In Section 3.1 we explore how dynamic graph sketching can be used to construct a randomized data structure that maintains approximate degrees under vertex eliminations. In Section 3.2 we demonstrate how data structures that work against oblivious adversaries can fail against adaptive adversaries. We also describe our approach to circumvent this problem for approximate minimum degree sequences. In Section 3.3 we discuss a local degree estimation routine (the new key primitive) in the context of estimating the number of nonzero columns of a matrix via sampling. Finally, in Section 3.4 we explain the implications of our results to the study of algorithms for computing elimination orderings.

3.1 Dynamically Sketching Fill Graphs

The core problem of estimating fill degrees can be viewed as estimating the cardinality of sets undergoing unions and deletion of elements. Cardinality estimation algorithms in the streaming algorithm literature often trade accuracy for space [FM85, CM05], but our degree-approximation data structures use sketching to trade space for accuracy and more efficient update operations.

We first explain the connection between computing fill degrees and estimating the size of reachable sets. Assume for simplicity that no edges exist between the remaining vertices in the component graph. Split each remaining vertex $v$ into two vertices $v_{\mathrm{in}}$ and $v_{\mathrm{out}}$, and replace every edge between $v$ and a component vertex $c$ by the directed edges $(v_{\mathrm{out}}, c)$ and $(c, v_{\mathrm{in}})$. The fill degree of $v$ is then the number of remaining vertices reachable from $v_{\mathrm{out}}$ within two hops (not including $v$ itself). Cohen [Coh97] developed a nearly-linear time size-estimation framework for reachability problems using sketching and randomized size estimators. Adapting this framework to our setting for fill graphs leads to the following kind of $\ell_0$-sketch data structure. We refer to this set of remaining vertices as the 1-neighborhood of $v$, and we call its cardinality the 1-degree of $v$.

Definition 3.1.

A 1-neighborhood $\ell_0$-sketch of a graph is constructed as follows:

  1. Each vertex $u$ independently generates a random key $R(u)$ uniformly from $[0, 1]$.

  2. Then each vertex $v$ determines which of its neighbors (including itself) has the smallest key. We denote this vertex by the function $\operatorname{minimizer}(v)$.

To give some intuition for how sketching is used to estimate cardinality, observe that choosing keys independently and uniformly at random essentially assigns a random vertex of the 1-neighborhood of $v$ to be $\operatorname{minimizer}(v)$. Therefore, the key value $R(\operatorname{minimizer}(v))$ is correlated with the 1-degree of $v$. This correlation is the cornerstone of sketching. If we construct $O(\epsilon^{-2}\log n)$ independent sketches, then by concentration we can use an order statistic of $R(\operatorname{minimizer}(v))$ over all sketches to give a $(1\pm\epsilon)$-approximation of the 1-degree of $v$ with high probability. We give the full details in Appendix A.
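The following toy Python sketch illustrates Definition 3.1 and the order-statistic idea on a static graph. For simplicity it draws keys from the exponential distribution, as in Cohen's original framework, rather than the uniform distribution used in the paper (the two behave almost identically for large degrees); the estimator and constants here are illustrative and differ from the exact procedure in Appendix A.

```python
import random

def build_sketch(neighborhoods):
    """One 1-neighborhood sketch: every vertex draws a random key, and each
    vertex records the smallest key in its closed 1-neighborhood.

    neighborhoods: dict v -> set of vertices in v's 1-neighborhood (incl. v).
    """
    key = {u: random.expovariate(1.0) for u in neighborhoods}
    return {v: min(key[u] for u in nbrs) for v, nbrs in neighborhoods.items()}

def estimate_degree(sketches, v):
    """Estimate the 1-degree of v from its minimum keys across sketches.

    The minimum of (d + 1) Exp(1) keys is Exp(d + 1), so the maximum-
    likelihood estimate of d + 1 is (number of sketches) / (sum of minima).
    """
    mins = [sketch[v] for sketch in sketches]
    return len(mins) / sum(mins) - 1.0   # subtract 1 for v itself

# Toy example: a star with center 'c' and 50 leaves.
nbhd = {'c': {'c'} | set(range(50))}
nbhd.update({i: {i, 'c'} for i in range(50)})
sketches = [build_sketch(nbhd) for _ in range(400)]
print(estimate_degree(sketches, 'c'))   # concentrates around 50
```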

To maintain sketches of the fill graph as it undergoes vertex eliminations, we first need to implicitly maintain the component graph (Lemma 6.1). We demonstrate how to efficiently propagate key values in a sketch as vertices are pivoted in Section 7. For now, it is sufficient to know that each vertex in a sketch has an associated min-heap that it uses to report and update its minimizer. Because eliminating vertices leads to edge contractions in the component graph, there is an additional layer of intricacies that we need to resolve using amortized analysis.

Suppose $v$ is the vertex eliminated as we go from step $t-1$ to step $t$. The sketch propagates this information to relevant vertices in the graph using a two-level notification mechanism. The neighbors of $v$ are informed first, and then they notify their neighbors about the change, all the while updating the key values in their heaps. We outline the subroutine that accomplishes this:

  1. Update the min-heaps of every remaining neighbor of $v$.

  2. For each component neighbor $c$ of $v$, if the minimum in its heap changes, then propagate this change to the remaining neighbors of $c$, and merge $c$ with $v$.

While it is simple enough to see that this algorithm correctly maintains key values, bounding its running time is nontrivial and requires a careful amortized analysis to show that the bottleneck operation is the merging of component vertices.

We can merge two min-heaps in time, so merging at most heaps takes time in total. To bound the cost of heap updates due to merges, we define the potential of the component graph as

where is the sum of the original degrees of vertices merged into . Using the fact that a merge operation only informs neighbors of at most one of the two merged vertices, we are able to show that the number of updates to produce is of the order of . It follows that the total number of updates is at most , which gives us a total update cost of .

3.2 Correlation and Decorrelation

We now discuss how we use the randomized sketching data structure within a greedy algorithm. We start with a simple concrete example to illustrate a problem that an adaptive adversary can cause. Consider a data structure that uses sketching to estimate the cardinality of a subset $A \subseteq [n]$ under the insertion and deletion of elements. This data structure randomly generates a subset of keys $S \subseteq [n]$ such that $|S| = \Theta(\epsilon^{-2}\log n)$, and it returns as its estimate the scaled intersection
$$\frac{n}{|S|} \cdot |A \cap S|,$$
which is guaranteed to be within an $\epsilon n$ additive error of the true value $|A|$ by Chernoff bounds, assuming that $S$ is generated independently of $A$. Clearly this cardinality-estimation algorithm works in the oblivious adversary model.

However, an adaptive adversary can use answers to previous queries to infer the set of secret keys $S$ through updates and queries. Consider the following scheme in Figure 2, which returns a set that is essentially $S$.


Initialize $A \leftarrow [n]$. For each $i = 1$ to $n$: Delete $i$ from $A$. If the estimated size of $A$ changed, reinsert $i$ into $A$. Return $A$.


Figure 2: An adaptive routine that amplifies the error of a cardinality-estimation scheme that uses a fixed underlying sketch.
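The scheme of Figure 2 can be reproduced in a few lines. In the hedged sketch below (with illustrative function names), the returned set consists almost exactly of the secret keys, yet the estimator still reports a value near $n$.

```python
import random

def make_estimator(n, sample_size):
    """Sampling-based cardinality estimator with a FIXED secret sample S."""
    secret = set(random.sample(range(n), sample_size))
    def estimate(current_set):
        return len(current_set & secret) * n / sample_size
    return estimate

def adaptive_attack(n, sample_size):
    """The scheme from Figure 2: after the loop, A is essentially the secret
    sample, yet the estimator still reports a value close to n."""
    estimate = make_estimator(n, sample_size)
    A = set(range(n))
    for i in range(n):
        before = estimate(A)
        A.discard(i)
        if estimate(A) != before:   # the estimate moved, so i must be in S
            A.add(i)                # reinsert i, as in Figure 2
    return A, estimate(A)

A, reported = adaptive_attack(10_000, 100)
print(len(A), reported)   # about 100 true elements, but reported ~ 10,000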

While the updates performed by a greedy algorithm are less extreme than this, in the setting where we maintain the cardinality of the smallest of many dynamic sets, having access to elements in the minimizer does allow for this kind of sketch deduction. Any accounting of correlation (in the standard sense) also allows for worst-case kinds of adaptive behavior, similar to the scheme above.

To remove potential correlation, we use an external routine that is analogous to the local degree-estimation algorithm used in the approximate minimum degree algorithm, which runs in time close to the degree it estimates. In this simplified example, suppose for each cardinality query that the data structure first regenerates $S$. Then the probability that any fixed element belongs to $S$ is $|S|/n$. Stepping through all $i \in [n]$, it follows that the expected number of deletions is small, and hence $A$ remains close to size $n$ with high probability.
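A minimal sketch of the re-randomized variant discussed above: the only change is that a fresh sample is drawn inside every query, so no fixed secret set persists across queries for the adversary to learn (function names are illustrative, and the cost of resampling on every query is exactly what the local estimation routine described next is designed to control).

```python
import random

def make_fresh_estimator(n, sample_size):
    """Same scaled-intersection estimator, but with fresh randomness per query."""
    def estimate(current_set):
        sample = set(random.sample(range(n), sample_size))   # fresh sample
        return len(current_set & sample) * n / sample_size
    return estimate
```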

Reinjecting randomness is a standard method for decorrelating a data structure across steps. However, if we extend this example to the setting where we maintain the cardinality of $n$ sets (similar to our minimum degree algorithm), then the previous idea requires that we reestimate the size of every set to determine the one with minimum cardinality. As a result, this approach is prohibitively expensive. However, these kinds of cardinality estimations are actually local—meaning that it is sufficient to instead work with a small and accurate subset of candidate sets. If we compute the set with minimum cardinality among the candidates using an external estimation scheme, then this decision is independent of the random choices in the sketching data structure, which then allows us to use the sketching data structure to generate the list of candidates.

Our algorithm for generating an approximate greedy minimum degree ordering relies on a similar external routine called EstimateFill1Degree, which locally estimates the fill 1-degree of a vertex $v$ at any step of the algorithm in time roughly proportional to the degree of $v$ in the original graph. We further describe this estimator in Section 3.3 and present the full sampling algorithm in Section 6. In Section 5 we show that to generate an approximate greedy minimum degree sequence, it is sufficient to pivot at each step the vertex minimizing an exponentially perturbed version of these external estimates, which we call the $\delta$-decayed minimum over all external estimates.

Analogous to the discussion about the set cardinality estimation above, evaluating the degrees of every remaining vertex using EstimateFill1Degree at each step is expensive and leads to a total cost of roughly $nm$. However, we can reincorporate the sketching data structure and use the following observations about the perturbation coefficient involving the exponential random variable to sample a small number of candidate vertices that contains the $\delta$-decayed minimum.

  • For a set of vertices whose degrees are within a $(1+\epsilon)$ factor of each other, it suffices to randomly select and consider a small number of them by generating the highest order statistics of exponential random variables in decreasing order.

  • By the memoryless property of the exponential distribution, each time we call EstimateFill1Degree, with constant probability the call is for the vertex we eventually pivot. Therefore, we can charge the cost of these evaluations to the original edge count and retain a nearly-linear running time.

Invoking EstimateFill1Degree only on the candidate vertices allows us to efficiently find the $\delta$-decayed minimizer in each step, which leads to the nearly-linear runtime as stated in Theorem 1.2. The key idea is that any dependence on the $\ell_0$-sketches stops after the candidates are generated, since their degrees only depend on the randomness of an external cardinality-estimation routine.

3.3 Local Estimation of Fill Degrees

A critical part of the approximate min-degree algorithm is the EstimateFill1Degree function, which estimates the fill 1-degree of a vertex $v$ using fresh randomness and oracle queries to the component graph. At the beginning of Section 6 we show how to construct a $0/1$-matrix $M$ where each row corresponds to the remaining neighborhood of a component neighbor of $v$. The number of nonzero columns in $M$ is equal to the fill 1-degree of $v$. Using only the following matrix operations (which correspond to component graph oracle queries), we analyze the more general problem of counting the number of nonzero columns in a matrix. We note that this technique is closely related to recent results in wedge sampling for triangle counting [KP17, ELRS17].

  • An operation that returns the number of nonzero elements in row $i$ of $M$.

  • An operation that returns a column index chosen uniformly at random from the nonzero entries of row $i$ of $M$.

  • An operation that returns the value of the entry $M_{ij}$.

Lemma (Nonzero column estimator). There is a routine EstimateNonzeroColumns using the three operations above that takes as input (implicit) access to a matrix $M$ and an error $\epsilon$, and returns a $(1\pm\epsilon)$-approximation to the number of nonzero columns in $M$ with high probability. The expected total number of operations used is $O\big((r + c)\,\epsilon^{-2}\operatorname{polylog}(c)\big)$, where $r$ is the number of rows and $c$ is the number of columns in $M$.
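The following hedged Python sketch illustrates the column-counting estimator using hypothetical oracle names in place of the three operations above. It uses plain Monte Carlo sampling with fixed sample counts rather than the paper's stopping-rule and hitting-time refinements, so its guarantees are weaker, but the underlying estimator is the same: each sampled nonzero entry contributes the reciprocal of (an estimate of) its column sum.

```python
import random

def estimate_nonzero_columns(rows, samples=2000, col_samples=200):
    """Estimate the number of nonzero columns of a 0/1 matrix, given as a
    list of rows (each row is a set of column indices with a nonzero entry).

    The three oracle operations from the text correspond here to:
      row size          -> len(rows[i])
      sample from row   -> random.choice(tuple(rows[i]))
      query an entry    -> j in rows[i]
    """
    r = len(rows)
    row_sizes = [len(row) for row in rows]
    nnz = sum(row_sizes)

    total = 0.0
    for _ in range(samples):
        # Sample a uniformly random nonzero entry (i, j) of the matrix.
        i = random.choices(range(r), weights=row_sizes)[0]
        j = random.choice(tuple(rows[i]))
        # Estimate the j-th column sum by sampling rows uniformly.
        hits = sum(j in rows[random.randrange(r)] for _ in range(col_samples))
        col_sum = max(1.0, r * hits / col_samples)
        # Each nonzero entry contributes 1 / (its column sum) to the count.
        total += 1.0 / col_sum
    return nnz * total / samples

rows = [{0, 1}, {1, 2}, {1}]            # nonzero columns: {0, 1, 2}
print(estimate_nonzero_columns(rows))   # concentrates near 3
```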

We now give an overview of how EstimateNonzeroColumns works. Let $\widehat{M}$ be the normalized version of $M$ where every nonzero entry is divided by its column sum. The sum of the nonzero entries in $\widehat{M}$ is the number of nonzero columns in $M$. If we uniformly sample a nonzero entry of $\widehat{M}$, then the mean of this distribution is the number of nonzero columns divided by the number of nonzero entries of $M$. Because random variables sampled from this distribution take their values in $[0, 1]$, we can estimate their mean using an EstimateMean subroutine (Lemma 6.3; a short illustrative sketch follows the steps below), which does the following:

  1. Set a threshold $T$ depending on the accuracy of the desired estimate.

  2. Sample independent random variables from the distribution until their sum first exceeds $T$.

  3. Return $T$ divided by the number of samples drawn.
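A minimal sketch of this stopping rule, assuming only that the samples lie in $[0, 1]$; the threshold and return value follow the steps above, while the precise accuracy/cost trade-off is the subject of Lemma 6.3.

```python
import random

def estimate_mean(sample, threshold):
    """Stopping-rule mean estimator for a distribution supported on [0, 1]:
    draw samples until their running sum first exceeds `threshold`, then
    return threshold / (number of samples).  Larger thresholds give tighter
    estimates at the cost of more samples.
    """
    total, count = 0.0, 0
    while total <= threshold:
        total += sample()
        count += 1
    return threshold / count

# Example: Bernoulli(0.3) samples; the estimate concentrates near 0.3.
print(estimate_mean(lambda: float(random.random() < 0.3), threshold=500))
```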

Using the matrix operations above, we can easily sample indices of nonzero entries in $M$, but evaluating an entry of $\widehat{M}$ requires that we know its column sum in $M$. Therefore, to compute this column sum we estimate the mean of a Bernoulli distribution on the corresponding column of $M$, defined by selecting an entry of the column uniformly at random. This distribution has mean equal to the column sum divided by the number of rows, and it is amenable to sampling using the provided operations.

While the previous estimator works satisfactorily, we show how to combine these distributions and use a hitting time argument to reduce the sample complexity by a factor of . Specifically, for a fixed column, we consider a random variable that has a limited number of attempts to find a nonzero entry by uniformly sampling rows. By optimizing the number of attempts, we can reduce our error overhead in the runtime at the expense of a perturbation to the approximation.

3.4 Significance to Combinatorial Scientific Computing

Despite the unlikelihood of theoretical gains for solving linear systems by improved direct methods for sparse Gaussian elimination, we believe our study could influence combinatorial scientific computing in several ways. First, we provide evidence in Section 8 for the nonexistence of nearly-linear time algorithms for finding exact minimum degree orderings by proving conditional hardness results. Our reduction uses the observation that determining if a graph can be covered by a particular union of cliques (or equivalently, that the fill graph is a clique after eliminating certain vertices) is equivalent to the orthogonal vectors problem [Wil05]. Assuming the strong exponential time hypothesis, this leads to a conditional hardness result for computing a minimum degree ordering. However, we believe that this result is suboptimal and that a more careful construction could lead to an even stronger hardness bound.

On the other hand, advances in minimum degree algorithms cannot be justified in practice solely by worst-case asymptotic arguments. In general, nested dissection orderings are asymptotically superior in quality to minimum degree orderings [HR98]. Furthermore, methods based on Krylov spaces, multiscale analysis, and iterative methods [Gut07, GGLN13] are becoming increasingly popular as they continue to improve state-of-the-art solvers for large sparse systems. Such advancements are also starting to be reflected in theoretical works. As a result, from both a theoretical and practical perspective, we believe that the most interesting question related to minimum degree algorithms is whether or not such sequences lead to computational gains for problems of moderate size.

In our approximate minimum degree algorithm, the polylogarithmic overhead and convenient choice of constants preclude it from readily impacting elimination orderings in practice. However, the underlying sketching technique is quite flexible. For example, consider modifying the dynamic $\ell_0$-sketches such that:

  1. Each vertex maintains several random numbers in the range $[0, 1]$.

  2. Each component vertex maintains the smallest numbers of its remaining neighbors.

  3. Each remaining vertex then maintains the smallest numbers among its component neighbors.

If we repeatedly eliminate the vertex whose median is largest, this routine is similar to using multiple copies of the previous type of sketch. We can analyze this variant against an oblivious adversary using slight modifications to our original sketching algorithm [Mas00]. Although our analysis demonstrates that new tools are necessary for studying its behavior within a greedy algorithm, we experimentally observed desirable behavior for such sketches. Therefore, we plan to continue studying this kind of adaptive graph sketching both theoretically and experimentally.

4 Sketching Algorithms for Computing Degrees

Let us recall a few relevant definitions from Section 2 for convenience. For a given vertex elimination sequence $(v_1, v_2, \ldots, v_n)$, let $G_t^+$ denote the fill graph obtained by pivoting vertices $v_1, \ldots, v_t$, and let $d_t$ denote the minimum degree in $G_t^+$. An $\ell_0$-sketch data structure consists of the following:

  1. Each vertex $u$ independently generates a key $R(u)$ from $[0, 1]$ uniformly at random.

  2. Then each vertex $v$ determines which neighbor (including itself) has the smallest key value. We denote this neighbor by $\operatorname{minimizer}(v)$.

In this section we show that if an $\ell_0$-sketch can efficiently be maintained for a dynamic graph, then we can use the same set of sketches at each step to determine the vertex with minimum fill degree and eliminate it. We explore the dynamic $\ell_0$-sketch data structure for efficiently propagating key values under pivots in detail in Section 7 (and for now we interface it via Theorem 4.4). This technique leads to improved algorithms for computing the minimum degree ordering of a graph, which we analyze in three different settings.

First, we consider the case where the minimum degree at each step is bounded. In this case we choose a fixed number of $\ell_0$-sketches and keep track of every minimizer of a vertex over all of the sketch copies. Note that we can always use $n - 1$ as a trivial upper bound on the minimum fill degree.

Theorem 4.1.

There is an algorithm DeltaCappedMinDegree that, when given a graph with a lexicographically-first min-degree ordering whose minimum degree is always bounded by $\Delta$, outputs this ordering with high probability in expected time $O(m\,\Delta\operatorname{polylog}(n))$ and uses $O(m\,\Delta\operatorname{polylog}(n))$ space.

Next, we relax the bound on the minimum degrees over all steps of the algorithm and allow the time and space complexity to be output sensitive by adaptively increasing the number of $\ell_0$-sketches as the algorithm progresses.

Theorem 4.2.

There is an algorithm OutputSensitiveMinDegree that, when given a graph with a lexicographically-first min-degree ordering whose minimum degrees are $d_1, d_2, \ldots, d_n$, outputs this ordering with high probability in expected time $O(m \cdot \max_t d_t \cdot \operatorname{polylog}(n))$ and uses $O(m \cdot \max_t d_t \cdot \operatorname{polylog}(n))$ space.

Lastly, we modify the algorithm to compute an approximate minimum degree vertex at each step. By maintaining $k = O(\epsilon^{-2}\log n)$ copies of the $\ell_0$-sketch data structure, we are able to accurately approximate the 1-degree of a vertex using the $\lceil k/e \rceil$-th order statistic of the key values of its minimizers, where $e$ denotes the base of the natural logarithm. We abstract this idea using the following approximate degree data structure, which when given an elimination ordering directly leads to a nearly-linear time algorithm.

Theorem 4.3.

There is a data structure ApproxDegreeDS that supports the following methods:

  • ApproxDegreeDS_Pivot$(u)$, which pivots a remaining vertex $u$.

  • ApproxDegreeDS_Report$()$, which provides balanced binary search tree (BST) containers $B_1, B_2, \ldots$ such that all the vertices in bucket $B_i$ have 1-degree in the range $\big[(1+\epsilon)^{i-1}, (1+\epsilon)^{i+1}\big]$.

The memory usage of this data structure is . Furthermore, if the pivots are picked independently from the randomness used in this data structure (i.e., we work under the oblivious adversary model) then:

  • The total cost of all the calls to ApproxDegreeDS_Pivot is bounded by .

  • The cost of each call to ApproxDegreeDS_Report is bounded by .

4.1 Computing the Exact Minimum Degree Ordering

We first consider the case where the minimum degree in each of the fill graphs is at most $\Delta$. In this case, we maintain $k = O(\Delta \log n)$ copies of the $\ell_0$-sketch data structure. By a coupon collector argument, any vertex with degree at most $\Delta$ contains all of its neighbors in its list of minimizers with high probability. This implies that for each step $t$, we can obtain the exact minimum degree in $G_t^+$ with high probability. Figure 3 briefly describes the data structures we will maintain for this version of the algorithm.


Global Variables: graph $G$ that undergoes pivots; degree cap $\Delta$; $k$, the number of sketches, set to $O(\Delta \log n)$; $k$ independent $\ell_0$-sketch data structures.
For each vertex $v$, a balanced binary search tree $\textsf{minimizers}(v)$ that stores $\operatorname{minimizer}(v)$ across all $k$ $\ell_0$-sketches. A balanced binary tree $\textsf{size\_of\_minimizers}$ on all vertices, with the key of $v$ set to the number of distinct elements in $\textsf{minimizers}(v)$.


Figure 3: Global variables for the $\Delta$-capped min-degree algorithm DeltaCappedMinDegree.

Note that if we can efficiently maintain the data structures in Figure 3, then querying the minimum element in $\textsf{size\_of\_minimizers}$ returns the (lexicographically-least) vertex with minimum degree. Theorem 4.4 demonstrates that we can maintain the $\ell_0$-sketch data structures efficiently.

Theorem 4.4.

Given i.i.d. random variables $R(u)$ associated with each vertex $u$, there is a data structure DynamicSketch that, for each vertex $v$, maintains the vertex with minimum key value among $v$ and its neighbors in the fill graph. This data structure supports the following methods:

  • A query method, which returns $\operatorname{minimizer}(v)$ for a remaining vertex $v$ in $O(1)$ time.

  • PivotVertex$(v)$, which pivots a remaining vertex $v$ and returns the list of all remaining vertices whose value of $\operatorname{minimizer}(\cdot)$ changed immediately after this pivot.

The memory usage of this data structure is $O(m)$. Moreover, for any choice of key values:

  • The total cost of all the pivots is $O(m \operatorname{polylog}(n))$.

  • The total size of all lists returned by PivotVertex over all steps is $O(m \operatorname{polylog}(n))$.

This theorem relies on intermediate data structures described in Section 7, so we defer the proof until the end of that section. Note that this DynamicSketch data structure will be essential to all three min-degree algorithms.

Now consider a sketch of $G_t^+$ and a vertex $v$ with fill degree $d$. By symmetry of the key values, each vertex in the 1-neighborhood of $v$ is the minimizer of $v$ with probability $1/(d+1)$. Therefore, if we maintain $O(\Delta \log n)$ independent $\ell_0$-sketches, we can ensure that we have an accurate estimation of the minimum fill degree with high probability. The pseudocode for this routine is given in Figure 4. We formalize the probability guarantees in Lemma 4.5 and Lemma 4.6, which are essentially a restatement of [Coh97, Theorem 2.1].
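The coupon-collector claim follows from a calculation of the following form (constants are illustrative):

```latex
\Pr\bigl[\, u \notin \textsf{minimizers}(v) \,\bigr]
  \;=\; \Bigl(1 - \frac{1}{\deg^{+}(v) + 1}\Bigr)^{k}
  \;\le\; \Bigl(1 - \frac{1}{\Delta + 1}\Bigr)^{k}
  \;\le\; e^{-k/(\Delta + 1)} .
```

Taking $k = c\,(\Delta + 1)\ln n$ copies makes this probability at most $n^{-c}$, and union bounds over the at most $\Delta + 1$ members of each 1-neighborhood and over all vertices keep the overall failure probability polynomially small.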


Input: graph $G$, threshold $\Delta$. Output: exact lexicographically-first min-degree ordering $(u_1, \ldots, u_n)$. For each step $t = 1$ to $n$: Set $u_t$ to the minimum element of $\textsf{size\_of\_minimizers}$. Pivot $u_t$ using the update subroutine below. Return $(u_1, \ldots, u_n)$.



Input: vertex $u$ to be pivoted. Output: updated global state. For each sketch $j = 1$ to $k$: compute $L_j$, the set of vertices in the $j$-th sketch whose minimizers changed after pivoting out $u$. For each $v \in L_j$: Update the values corresponding to sketch $j$ in $\textsf{minimizers}(v)$. Update the entry for $v$ in $\textsf{size\_of\_minimizers}$ with the size of $\textsf{minimizers}(v)$.


Figure 4: Pseudocode for the exact $\Delta$-capped min-degree algorithm, which utilizes the global data structures for DeltaCappedMinDegree defined in Figure 3.
Lemma 4.5.

With high probability, for all remaining vertices $v$ such that $\deg^+(v) \le \Delta$, the set $\textsf{minimizers}(v)$ contains the entire 1-neighborhood of $v$.

Proof.

The only way this can fail is if at least one neighbor of $v$ or $v$ itself is not present in $\textsf{minimizers}(v)$. Let $u$ be an arbitrary vertex in the 1-neighborhood of $v$. The probability of $u$ not being the minimizer of $v$ in any of the $k$ sketches is at most $(1 - 1/(\Delta+1))^{k}$, which is polynomially small for $k = O(\Delta \log n)$.

We can upper bound the probability that there exists a vertex of the 1-neighborhood of $v$ not in $\textsf{minimizers}(v)$ using a union bound over its at most $\Delta + 1$ members.

Using a second union bound over the event that there exists a remaining vertex $v$ with $\deg^+(v) \le \Delta$ whose 1-neighborhood is not contained in $\textsf{minimizers}(v)$ completes the proof. ∎

Lemma 4.6.

With high probability, for all remaining vertices $v$ with $\deg^+(v) > \Delta$, the set $\textsf{minimizers}(v)$ contains more than $\Delta$ distinct vertices.

Proof.

We first upper bound the probability of the event $|\textsf{minimizers}(v)| \le \Delta$ for a fixed vertex $v$. Let $S$ be any subset of the 1-neighborhood of $v$ of size $\Delta$. Using the assumption that $\deg^+(v) > \Delta$, the probability that every sketch minimizer of $v$ lands in $S$ is exponentially small in $k/\Delta$.

Next, sum over all choices of the set $S$ and use a union bound. Note that this overcounts events, but it suffices for an upper bound. It follows that the probability of the event is polynomially small.

Using a second union bound over the event that there exists a remaining vertex $v$ with $\deg^+(v) > \Delta$ and $|\textsf{minimizers}(v)| \le \Delta$ completes the proof. ∎

Proof of Theorem 4.1.

The algorithm correctly pivots the minimum degree vertex by Lemma 4.5 and Lemma 4.6. For the space complexity, each of the -sketch data structures uses memory by Theorem 4.4, and for each vertex there is a corresponding balanced binary search tree minimizers which uses space. We also have size_of_minimizers, which uses space. Therefore, since we assume , the total space is .

For the running time, Theorem 4.4 gives a cost of across all PivotVertex calls per sketch, and thus a total cost of over all sketches. Theorem 4.4 also states the sum of  (the length of the update lists) across all steps is at most . Each of these updates leads to two BST updates, so the total overhead is , which is an equal order term. ∎

4.2 Modifying the Algorithm to be Output Sensitive by Adaptive Sketching

If we do away with the condition that minimum fill degrees are bounded above by $\Delta$, then the number of copies of the $\ell_0$-sketch data structure needed depends only on the values of the minimum fill degree at each step. Therefore, we can modify DeltaCappedMinDegree to potentially be more efficient by adaptively maintaining the required number of sketches.

To accurately estimate degrees in $G_t^+$ we need $O(d_t \log n)$ copies of the $\ell_0$-sketch data structure, but we do not know the values of $d_t$ a priori. To rectify this, consider the following scheme that adaptively keeps a sufficient number of copies of the $\ell_0$-sketch data structures. First, initialize an estimate $\Delta$ of the minimum degree in $G_0^+$. Then for each step $t$, update $\Delta$ according to:

  1. Let $d$ be the candidate minimum degree in $G_t^+$ computed using $O(\Delta \log n)$ sketches.

  2. If $d > \Delta$, then set $\Delta \leftarrow 2\Delta$ and repeat.

The core idea of the routine above is that if the candidate minimum degree is at most $\Delta$, then with high probability the true minimum degree is also at most $\Delta$. It follows that using $O(\Delta \log n)$ sketching data structures guarantees the minimum degree estimate is correct with high probability.

Proof of Theorem 4.2.

The proof is analogous to that of Theorem 4.1. The upper bound for the minimum degrees is now $O(\max_t d_t)$, and so the time and space complexities follow. ∎

4.3 Computing an Approximate Minimum Degree

To avoid bounding the minimum fill degree over all steps and to make the running time independent of the output, we modify the previous algorithms to obtain an approximate min-degree vertex at each step. We reduce the number of $\ell_0$-sketches to $k = O(\epsilon^{-2}\log n)$ and use the reciprocal of the $\lceil k/e \rceil$-th order statistic to approximate the cardinality of $\textsf{minimizers}(v)$ (and hence the 1-degree of $v$) to obtain a nearly-linear time approximation algorithm.

There is, however, a subtle issue with the randomness involved with this algorithm. A necessary condition for the algorithm to succeed as intended is that the sketches at each step are independent of the past decisions of the algorithm. Therefore, we must remove all dependencies between previous and current queries. In Section 3.2 we demonstrate how correlations between steps can amplify. To avoid this problem, we must decorrelate the current state of the sketches from earlier pivoting updates to the data structures. We carefully address this issue in Section 5. Instead of simply selecting a vertex with an approximate min-degree, this algorithm instead requires access to all vertices whose estimated degree is within a certain range of values. Therefore, this approximation algorithm uses a bucketing data structure, as opposed to the previous two versions that output the vertex to be pivoted. Figure 5 describes the global data structures for this version of the algorithm.


Global Variables: graph $G$, error tolerance $\epsilon$; $k$, the number of sketches, set to $O(\epsilon^{-2}\log n)$; $k$ independent $\ell_0$-sketch data structures.
For each vertex $v$, a balanced binary search tree $\textsf{minimizers}(v)$ that stores $\operatorname{minimizer}(v)$ across all $k$ $\ell_0$-sketches and maintains the element in $\textsf{minimizers}(v)$ with rank $\lceil k/e \rceil$.
A balanced binary tree $\textsf{quantile}$ over all vertices, whose key for $v$ is the $\lceil k/e \rceil$-ranked element in $\textsf{minimizers}(v)$.


Figure 5: Global variables and data structures for ApproxDegreeDS, which returns (implicit) partitions of vertices into buckets with $(1+\epsilon)$-approximate degrees.

To successfully use fewer sketches, for a given vertex $v$ we estimate the cardinality of the set of its minimizers via its order statistics instead of using the exact cardinality as we did before with the binary search tree $\textsf{minimizers}(v)$. Exploiting correlations in the order statistics of sketches is often the underlying idea behind efficient cardinality estimation. In particular, we make use of the following lemma, which is essentially a restatement of [Coh97, Propositions 7.1 and 7.2].

Lemma 4.7.

Suppose that we have $k$ copies of the $\ell_0$-sketch data structure, for $k = O(\epsilon^{-2}\log n)$. Let $v$ be any remaining vertex, and let $x$ denote the $\lceil k/e \rceil$-ranked key value in the list $\textsf{minimizers}(v)$. Then, with high probability, we can recover a $(1\pm\epsilon)$-approximation to the 1-degree of $v$ from $x$.

In [Coh97] the random keys are drawn from the exponential distribution (and hence the minimum key value is as well), whereas we assume that each key is drawn independently from the uniform distribution. When the number of keys is large enough, however, the minimum of random variables from either distribution is almost identically distributed. For completeness, we prove Lemma 4.7 for keys drawn from the uniform distribution in Appendix A.
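To see why an order statistic of the minimizers' key values encodes the cardinality, note the following standard fact about uniform keys (the exact rank and correction used by the estimator behind Lemma 4.7 are given in Appendix A and may differ from this illustration): for a vertex $v$ whose 1-neighborhood has size $d$, each sketch's minimum key is the minimum of $d$ i.i.d. uniform keys, so for any $x \in (0, 1)$,

```latex
\Pr\bigl[\, R(\operatorname{minimizer}(v)) \le x \,\bigr] \;=\; 1 - (1 - x)^{d}.
```

The $\lceil k/e \rceil$-ranked value among the $k$ observed minima therefore concentrates around the $1/e$-quantile of this distribution, which is approximately $\ln\!\bigl(\tfrac{e}{e-1}\bigr)/d$, so inverting the observed order statistic recovers $d$ up to $(1\pm\epsilon)$ factors once $k = \Omega(\epsilon^{-2}\log n)$.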

This idea leads to the following subroutine for providing implicit access to all vertices with approximately the same degree. This is critical for our nearly-linear time algorithm, and we explain its intricacies in Section 5. The pseudocode for this subroutine is given in Figure 6.


Input: vertex $u$ to be pivoted. Output: updated global state. For each sketch $j = 1$ to $k$: compute $L_j$, the set of vertices in the $j$-th sketch whose minimizers changed after we pivot out $u$. For each $v \in L_j$: Update the values corresponding to sketch $j$ in $\textsf{minimizers}(v)$, which in turn updates its $\lceil k/e \rceil$-ranked quantile. Update the entry for $v$ in $\textsf{quantile}$ with the new value of the $\lceil k/e \rceil$-ranked quantile of $\textsf{minimizers}(v)$.



ApproxDegreeDS_Report() Output: approximate bucketing of the vertices by their fill 1-degrees. For each bucket index $i$: Set $B_i$ to be the split binary tree in $\textsf{quantile}$ that contains all nodes with $\lceil k/e \rceil$-ranked quantiles in the range corresponding to bucket $i$.
Return $(B_1, B_2, \ldots)$.


Figure 6: Pseudocode for the data structure that returns pointers to binary trees containing partitions of the remaining vertices into sets with $(1+\epsilon)$-approximate degrees.

Observe that because 1-degrees are bounded by , whenever we call ApproxDegreeDS_Report we have with high probability by Lemma 4.7. Therefore, this data structure can simply return pointers to the first element in each of the partitions .

Proof of Theorem 4.3.

By construction, all vertices in have their -ranked quantile in the range

By Lemma 4.7, the 1-degree of any vertex in bucket lies in the range

with high probability, which is within the claimed range for .

The proof of time and space complexities is similar to that of Theorem 4.1. Letting the number of sketches be $k = O(\epsilon^{-2}\log n)$ instead of $O(\Delta \log n)$ proves the space bound. One of the main differences in this data structure is that we need to store information about the $\lceil k/e \rceil$-ranked quantiles. These queries can be supported in $O(\log n)$ time by augmenting a balanced binary search tree with information about sizes of the subtrees in standard ways (e.g., [CLRS09, Chapter 14]). It follows that the total cost of all calls to ApproxDegreeDS_Pivot is $O(m\,\epsilon^{-2}\operatorname{polylog}(n))$. To analyze each call to ApproxDegreeDS_Report, we use standard splitting operations for binary search trees (e.g., treaps [SA96]), which allows us to construct each bucket in $O(\log n)$ time. ∎

Note that there will be overlaps between the 1-degree intervals, so determining which bucket contains a given vertex is ambiguous if its order statistic is near the boundary of an interval.

An immediate corollary of Theorem 4.3 is that we can provide access to approximate min-degree vertices for a fixed sequence of updates by always returning an entry from the first nonempty bucket.

Corollary 4.8.

For a fixed elimination ordering, we can find $(1+\epsilon)$-approximate minimum degree vertices in each of the intermediate states in $O(m\,\epsilon^{-2}\operatorname{polylog}(n))$ total time.

It is also possible to adaptively choose the number of sketches for the $(1+\epsilon)$-approximate minimum degree algorithm by using a subroutine that is similar to the one in Section 4.2.

5 Generating Decorrelated Sequences

In this section we present a nearly-linear time $(1+\epsilon)$-approximate marginal min-degree algorithm. This algorithm relies on degree approximation via sketching, as described in Theorem 4.3. In particular, it uses the randomized data structure ApproxDegreeDS, which provides access to buckets of vertices where the $i$-th bucket contains vertices with fill 1-degree approximately $(1+\epsilon)^i$.

Theorem 5.1.

There is an algorithm ApproxMinDegreeSequence that produces a $(1+\epsilon)$-approximate marginal min-degree ordering in $O(m\,\epsilon^{-2}\operatorname{polylog}(n))$ expected time with high probability.

At each step of this algorithm, reporting any member of the first nonempty bucket gives an approximate minimum degree vertex to pivot. However, such a choice must not have any dependence on the randomness used to get to this step, and more importantly, it should not affect pivoting decisions in future steps. To address this issue, we introduce an additional layer of randomization that decorrelates the $\ell_0$-sketches and the choice of vertices to pivot. Most of this section focuses on our technique for efficiently decorrelating such sequences.

The pseudocode for ApproxMinDegreeSequence is given in Figure 7. This algorithm makes use of the following global data structures and subroutines.

  • ApproxDegreeDS: Returns buckets of vertices with approximately equal 1-degrees (Section 4.3).

  • ExpDecayedCandidates: Takes a sequence of values that are within a $(1+\epsilon)$ factor of each other, randomly perturbs the elements, and returns the new ($\delta$-decayed) sequence (Section 5.2).

  • EstimateFill1Degree: Gives a $(1\pm\epsilon)$-approximation to the 1-degree of any vertex (Section 6).

We give the formal statement for EstimateFill1Degree in the following result.

Theorem 5.2.

There is a data structure that maintains a component graph under (adversarial) vertex pivots in a total of $O(m\operatorname{polylog}(n))$ time and supports the operation EstimateFill1Degree$(v, \epsilon)$, which given a vertex $v$ and error threshold $\epsilon$, returns with high probability a $(1\pm\epsilon)$-approximation to the fill 1-degree of $v$ by making a number of oracle queries to the component graph roughly proportional to the original degree of $v$.


Input: graph $G$ with $n$ vertices, error $\epsilon$. Output: $(1+\epsilon)$-approximate marginal min-degree sequence $(u_1, \ldots, u_n)$. Set a smaller error $\epsilon' = \Theta(\epsilon)$. Initialize the approximate degree reporting data structure ApproxDegreeDS. For each $t = 1$ to $n$: Compute approximate buckets of 1-degrees (implicitly) via ApproxDegreeDS_Report.
Let $i$ be the index of the minimum nonempty bucket, and form the candidate list from the lowest buckets. Perturb and rank the candidate vertices by their approximate 1-degree via ExpDecayedCandidates.
Trim the candidate list so that its entries remain within the allowed approximation range.
Let $u_t$ be the vertex that is the minimizer, over all trimmed candidates, of the perturbed estimate returned by EstimateFill1Degree$(\cdot, \epsilon')$; pivot $u_t$.
Return $(u_1, \ldots, u_n)$.


Figure 7: Pseudocode for the $(1+\epsilon)$-approximate marginal minimum degree ordering algorithm.

The most important part of this algorithm is arguably the use of exponential random variables to construct a list of candidates that is completely uncorrelated with the randomness used to generate the $\ell_0$-sketches and the choice of previous vertex pivots. The next subsection summarizes some desirable properties of exponential distributions that we exploit for efficient perturbations.

5.1 Exponential Random Variables

The exponential distribution is a continuous analog of the geometric distribution that describes the time between events in a Poisson point process. We utilize well-known facts about its order statistics, which have also appeared in the study of fault tolerance and distributed graph decompositions [MPX13]. For a rate parameter $\lambda > 0$, the exponential distribution $\mathrm{Exp}(\lambda)$ is defined by the probability density function (PDF)
$$f(x; \lambda) = \lambda e^{-\lambda x} \quad \text{for } x \ge 0.$$
We will also make use of its cumulative distribution function (CDF)
$$F(x; \lambda) = 1 - e^{-\lambda x}.$$
A crucial property of the exponential distribution is that it is memoryless. This means that for any rate $\lambda > 0$ and $s, t \ge 0$, an exponentially distributed random variable $X \sim \mathrm{Exp}(\lambda)$ satisfies the relation
$$\Pr[X > s + t \mid X > s] = \Pr[X > t].$$

A substantial portion of our analysis relies on the order statistics of exponential random variables. Given random variables $X_1, X_2, \ldots, X_n$, the $i$-th order statistic $X_{(i)}$ is the value of the $i$-th minimum random variable. A useful fact about i.i.d. exponential random variables is that the difference between consecutive order statistics also follows an exponential distribution. The algorithmic consequence of this property is that we can sample the smallest (or largest) order statistics of $n$ exponential random variables in increasing (or decreasing) order without ever generating all $n$ random variables.

Lemma 5.3 ([Fel71]).

Let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics of $n$ i.i.d. random variables drawn from the distribution $\mathrm{Exp}(\lambda)$. Then the variables $X_{(1)}, X_{(2)} - X_{(1)}, \ldots, X_{(n)} - X_{(n-1)}$ are independent, and the density of $X_{(i)} - X_{(i-1)}$ is given by the distribution $\mathrm{Exp}((n - i + 1)\lambda)$.

One approach to prove Lemma 5.3 uses the i.i.d. assumption to show that the CDF of $X_{(1)}$ is
$$\Pr\bigl[X_{(1)} \le x\bigr] = 1 - e^{-n\lambda x}.$$
This proves that $X_{(1)}$ follows an exponential distribution with rate $n\lambda$. Conditioning on $X_{(1)}$, we see that $X_{(2)} - X_{(1)}$ follows an exponential distribution with rate $(n-1)\lambda$ by the memoryless property. Therefore, one can repeat this argument to get the density of $X_{(i)} - X_{(i-1)}$ for all $i$ up to $n$.
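One standard way to generate only the largest order statistics, in decreasing order, is to produce descending uniform order statistics and map them through the inverse exponential CDF; the sketch below is illustrative and not necessarily the routine used by ExpDecayedCandidates.

```python
import math
import random

def top_exponential_order_statistics(n, k, rate=1.0):
    """Return the k largest of n i.i.d. Exp(rate) variables, in decreasing
    order, without generating all n of them.

    Descending uniform order statistics satisfy U_(n) = V^(1/n) and
    U_(n-i) = U_(n-i+1) * V^(1/(n-i)) for fresh uniforms V; applying the
    inverse CDF x = -ln(1 - u) / rate converts them to exponentials.
    """
    order_stats = []
    log_u = 0.0
    for i in range(k):
        log_u += math.log(random.random()) / (n - i)   # log of next-largest uniform
        # 1 - u is computed stably via expm1 to avoid cancellation near u = 1.
        order_stats.append(-math.log(-math.expm1(log_u)) / rate)
    return order_stats

# The three largest of a million Exp(1) variables, in decreasing order.
print(top_exponential_order_statistics(10**6, 3))
```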

5.2 Implicitly Sampling $\delta$-Decayed Minimums

The key idea in this section is the notion of $\delta$-decay, which we use to slightly perturb approximate 1-degree sequences. It is motivated by the need to decorrelate the list of vertices grouped approximately by their 1-degree from previous sources of randomness in the algorithm. In the following definition, $n$ is the number of vertices in the original graph before pivoting and $\delta$ is a constant.

Definition 5.4.

Given a sequence $d_1, d_2, \ldots, d_k$, we construct the corresponding $\delta$-decayed sequence by independently sampling the exponential random variables

where as in line 1 in ApproxMinDegreeSequence, and letting

We say that the -decayed minimum of is the value