Learning Graphs from Linear Measurements: Fundamental Trade-offs and Applications

We consider a specific graph learning task: reconstructing a symmetric matrix that represents an underlying graph from linear measurements. We study fundamental trade-offs between the number of measurements (sample complexity), the complexity of the graph class, and the probability of error, first by deriving a necessary condition (fundamental limit) on the number of measurements, and then, by considering a two-stage recovery scheme, a sufficient condition for recovery. Furthermore, assuming the measurements are Gaussian IID, we prove upper and lower bounds on the (worst-case) sample complexity. In the special cases of the uniform distribution on trees with n nodes and the Erdős–Rényi (n,p) class, the fundamental trade-offs are tight up to multiplicative factors. Applying Kirchhoff's matrix tree theorem, our results are extended to the scenario where part of the topology information is known a priori. In addition, we design and implement a polynomial-time (in n) algorithm based on the two-stage recovery scheme. Simulations for several canonical graph classes and IEEE power system test cases demonstrate the effectiveness of the proposed algorithm for accurate topology and parameter recovery.

1 Introduction

1.1 Background

Symmetric matrices are ubiquitous constructs in graphical models, with examples such as the adjacency matrix and the (generalized) Laplacian of an undirected graph. A major challenge in graph learning is inferring the graph parameters embedded in those graph-based matrices from historical data or real-time measurements. In contrast to traditional statistical inference methods [16, 17, 42], model-based graph learning, such as physically-motivated models and graph signal processing (GSP) [20], takes advantage of additional data structure offered freely by nature. Among different measurement models for graph learning, linear models have been used and analyzed commonly for different tasks, e.g., linear structural equation models (SEMs) [27, 26], linear graph measurements [5], and generalized linear cascade models [38]. Despite the extra effort required for data collection, processing and storage, model-based graph learning often guarantees provable sample complexity, which is frequently significantly lower than the empirical number of measurements needed by traditional inference methods. In many problem settings, having computationally efficient algorithms with low sample complexity is important. One reason is that the graph parameters may change on a short time-scale, making sample complexity a vital metric for guaranteeing that learning can be accomplished with a limited number of measurements.

Taking a modern power system as a concrete example, due to the increasing penetration of distributed energy resources, the network parameters are subject to rapid changes. The need to prevent cascading failures also sometimes requires reconfiguring the network connectivity, which can destabilize the system [28]. Driven by these trends, analyzing the fundamental limits of parameter identification and designing a corresponding scheme that is efficient in both computational and sample complexity become increasingly critical. Beyond scenarios where stable measurements are a scarce resource, understanding sample complexity and having a practical identification scheme can furthermore bridge the theory-to-application gap and benefit existing algorithms in electric grids. For instance, many real-time or nearly real-time graph algorithms based on temporal data, such as (real-time) optimal power flow [36, 34, 44], real-time contingency analysis [35] and frequency control [29], either require full and accurate knowledge of the network, or can be improved if certain estimates are (partially) accessible.

In this work, we consider a general graph learning problem where the measurements and the underlying matrix to be recovered can be represented as, or approximated by, a linear system. A graph matrix Y(G) with respect to an underlying graph G (see Definition 2.1) is defined as an n×n symmetric matrix whose nonzero (i,j)-th entries correspond to edges connecting node i and node j, where n is the number of nodes of the underlying undirected graph. The diagonal entries can be arbitrary. The measurements are summarized as two (m×n) real or complex matrices A and B satisfying

 A = BY(G) + Z (1)

where Z denotes additive noise.
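As a concrete illustration of the measurement model in (1), the following sketch builds a graph matrix for an assumed toy edge set (the weights and variable names are ours, purely for illustration) and generates noiseless linear measurements:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4

# Hypothetical sparse graph on n nodes: edge weights are illustrative assumptions.
edges = {(0, 1): 1.5, (1, 2): -0.7, (2, 3): 2.0, (4, 5): 0.3}

# Graph matrix Y(G): symmetric, nonzero off-diagonal entries exactly at the edges;
# diagonal entries may be arbitrary (here simply zero).
Y = np.zeros((n, n))
for (i, j), w in edges.items():
    Y[i, j] = Y[j, i] = w

B = rng.standard_normal((m, n))  # m x n generator matrix (Gaussian IID rows)
A = B @ Y                        # Eq. (1) with Z = 0 (noiseless measurements)

assert np.allclose(Y, Y.T)       # Y is symmetric
assert A.shape == (m, n)         # m linear measurements of the n-node graph
```

The recovery problem studied in the paper is the reverse direction: given A and B, reconstruct the sparse symmetric Y.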

We focus on the following problems:

• Fundamental Trade-offs: What is the minimum number of linear measurements m required for reconstructing the symmetric matrix Y(G)? Is there an algorithm that asymptotically achieves recovery with the minimum number of measurements? As a special case, can we characterize the sample complexity when the measurements are Gaussian IID, i.e., when the entries of the generator matrix B are IID normally distributed?

• Applications to Electrical Grids: Do the theoretical guarantees on sample complexity result in a practical algorithm (in terms of both sample and computational complexity) for recovering electric grid topology and parameters?

1.2 Related Work

1.2.1 Graph Learning Aspects

Algorithms for learning sparse graphical model structures have a rich tradition in the literature. For general MRFs, learning the underlying graph structure is known to be NP-hard [11]. However, when the underlying graph is a tree, the classical Chow-Liu algorithm [16] offers an efficient approach to structure estimation. Recent results contribute to an extensive understanding of the Chow-Liu algorithm. The authors in [42] analyzed the error exponent and showed experimental results for chain graphs and star graphs. For pairwise binary MRFs with bounded maximum degree, [40] provides sufficient conditions for correct graph selection. Similar achievability results for Ising models are in [7]. Model-based graph learning has been emerging recently; assuming the measurements form linear SEMs, the authors in [27, 26] showed theoretical guarantees on the sample complexity for learning a directed acyclic graph (DAG) structure, under mild conditions on the class of graphs.

From a converse perspective, information-theoretic tools have been widely applied to derive fundamental limits for learning graph structures. For a Markov random field (MRF) with bounded maximum degree, [40] derived necessary conditions on the number of samples for estimating the underlying graph structure using Fano's inequality (see [22]). For Ising models, [8] combined Fano's inequality with typicality to derive weak and strong converses. Similar techniques have also been applied to Gaussian graphical models [6, 25]. Fundamental limits for noisy compressed sensing have been extensively studied in [4] under an information-theoretic framework.

1.2.2 Parameter Identification of Power Systems

Graph learning has been widely used in electric grid applications such as state estimation [14, 15] and topology identification [41, 19]. Most of the literature focuses on topology identification or change detection, but there is not much recent work on joint topology and parameter recovery, with the notable exceptions of [31, 46, 37]. Moreover, there is little exploration of the fundamental performance limits (estimation error and sample complexity) of topology and parameter identification for power networks, with the exception of [48], where a sparsity condition is provided for exact recovery of outage lines. Based on single-type measurements (either current or voltage), correlation analysis has been applied to topology identification [43, 32, 12]. Approximating the measurements as normally distributed random variables, [41] proposed an approach for topology identification with limited measurements. A graphical learning-based approach was provided by [18]. Recently, data-driven methods were studied for parameter estimation [46]. In [47], a linear system similar to (3) was combined with regression to recover the symmetric graph parameters (the admittance matrix of the power network), where the matrix B is required to have full column rank, implying that at least n measurements are necessary. Sparse recovery ([13, 39]), however, suggests that recovering the graph matrix may take far fewer measurements by fully utilizing the sparsity of Y(G). Some experimental results for recovering the topology of a power network based on compressed sensing algorithms are reported in [9]. Nonetheless, in the worst case, some of the columns (or rows) of Y(G) may be dense vectors consisting of many non-zeros, prohibiting us from applying compressed sensing algorithms to recover each of the columns (or rows) separately. Moreover, the columns to be recovered may not share the same support set, so many distributed compressed sensing schemes (cf. [10]) are not directly applicable in this situation. This motivates our two-stage recovery scheme, which handles the difficulty that, for a randomly chosen graph, some of the columns (or rows) of the corresponding graph matrix may not be sparse.

1.3 Our Contributions

We demonstrate that the linear system in (1) can be used to learn the topology and parameters of a graph. Our framework can be applied to perform system identification in electrical grids by leveraging synchronous nodal current and voltage measurements obtained from phasor measurement units (PMUs).

The main results of this paper are summarized here.

1. Fundamental Trade-offs: In Theorem 3.1, we derive a general lower bound on the probability of error for topology identification (defined in (4)). In Section 3.2, we describe a simple two-stage recovery scheme combining ℓ1-norm minimization with an additional step called consistency-checking. For any arbitrarily chosen distribution, we characterize it using the definition of (μ,K)-sparsity (see Definition 3.1) and argue that if a graph is drawn according to such a distribution, then the number of measurements required for exact recovery is bounded from above as in Theorem 3.2.

2. (Worst-case) Sample Complexity: We focus on the case when the generator matrix B has Gaussian IID entries in Section 4. Under this assumption, we provide upper and lower bounds on the worst-case sample complexity in Theorem 4.2. We show two applications of Theorem 4.2, to the uniform sampling of trees and to the Erdős–Rényi model, in Corollaries 4.1 and 4.2, respectively.

3. (Heuristic) Algorithm: Motivated by the two-stage recovery scheme, a heuristic algorithm with polynomial (in n) running time is reported in Section 6, together with simulation results for power system test cases validating its performance in Section 7.

1.4 Outline of the Paper

The remaining content is organized as follows. In Section 2, we specify our models. In Section 3.1, we present the converse result as a fundamental limit for recovery. The achievability result is provided in Section 3.2. We present our main result on the worst-case sample complexity for Gaussian IID measurements in Section 4. A heuristic algorithm together with simulation results are reported in Sections 6 and 7.

2 Model and Definitions

2.1 Notation

Let F denote a field that can either be the set of real numbers R or the set of complex numbers C. The set of all n×n symmetric matrices whose entries are in F is denoted by S^n(F). The imaginary unit is denoted by i. Throughout this work, log denotes the binary logarithm (base 2) and ln denotes the natural logarithm (base e). We use E[·] to denote the expectation of a random variable if the underlying probability distribution is clear. The mutual information is denoted by I(·;·). The (differential) entropy is denoted by H(·) and, in particular, we use H_b(·) for the binary entropy. To distinguish random variables and their realizations, we follow the convention and denote the former by capital letters (e.g., X) and the latter by lower-case letters (e.g., x). The symbol c is used to designate a constant.

Matrices are denoted in boldface (e.g., A, B and Y). The t-th row, the j-th column and the (i,j)-th entry of a matrix M are denoted by M^(t), M_j and M_{i,j}, respectively. For notational convenience, let S be a subset of {1, …, n}. Denote by S^c the complement of S and by M_S the sub-matrix consisting of the columns of the matrix M whose indices are chosen from S. The notation M^T denotes the transpose of a matrix and det(M) calculates its determinant. For the sake of notational simplicity, we use big-O notation (O, o, Ω, ω, Θ) to quantify asymptotic behavior. Table 1 summarizes the notation used throughout the paper.

2.2 Graphical Model

Denote by N = {1, …, n} the set of nodes and consider an undirected graph G = (N, E) (with no self-loops) whose edge set E contains the desired topology information. The degree of node j is denoted by d_j. The connectivity between the nodes is unknown, and our goal is to determine it by learning the associated graph matrix Y(G) from linear measurements.

Definition 2.1 (Graph Matrix).

Provided with an underlying graph G = (N, E), a symmetric matrix Y(G) ∈ S^n(F) is called a graph matrix if the following conditions hold:

 Y_{i,j}(G) is { ≠ 0        if i ≠ j and (i,j) ∈ E
               = 0        if i ≠ j and (i,j) ∉ E
               arbitrary  otherwise (i = j). }
Remark 1.

Our theorems can be generalized to recover a broader class of symmetric matrices, as long as the matrix M to be recovered satisfies: (1) knowing M gives full knowledge of the topology of G; (2) the number of non-zero entries in each column of M has the same order as the degree of the corresponding node, i.e., there is a positive constant c such that ||M_j||_0 ≤ c·d_j for all j ∈ N. To keep the presentation clear, we consider specifically the case M = Y(G).
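Under Definition 2.1, knowing Y(G) immediately reveals the topology: the edge set is the support of the off-diagonal entries. A minimal sketch (the helper name and the toy weights are our own illustration):

```python
import numpy as np

def edges_from_graph_matrix(Y, tol=1e-12):
    """Recover the edge set E of G from a graph matrix Y(G):
    (i, j) is an edge iff i != j and Y[i, j] != 0 (Definition 2.1)."""
    n = Y.shape[0]
    return {(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(Y[i, j]) > tol}

# Toy 4-node example (weights are assumptions, chosen for illustration only).
Y = np.array([[ 9.0, -2.0,  0.0,  0.0],
              [-2.0,  9.0, -1.0,  0.0],
              [ 0.0, -1.0,  9.0, -4.0],
              [ 0.0,  0.0, -4.0,  9.0]])

assert edges_from_graph_matrix(Y) == {(0, 1), (1, 2), (2, 3)}  # a path graph
```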

In this work, we employ a probabilistic model and assume that the graph G is chosen randomly from a candidacy set G_n (of graphs with n nodes) according to some distribution P_{G_n}. Neither the candidacy set nor the distribution is known to the estimator. For simplicity, we often omit the subscripts of G_n and P_{G_n}.

Example 2.1.

We exemplify some possible choices of the candidacy set and distribution:

1. (Mesh Network) When G represents a transmission (mesh) power network and no prior information is available, the corresponding candidacy set G_n consists of all graphs with n nodes and G is selected uniformly at random from G_n. Moreover, |G_n| = 2^{n(n−1)/2} in this case.

2. (Radial Network) When G represents a distribution (radial) power network and no other prior information is available, the corresponding candidacy set G_n contains all spanning trees of the complete graph with n buses and G is selected uniformly at random from G_n; the cardinality |G_n| = n^{n−2} follows from Cayley's formula.

3. (Radial Network with Prior Information) When G represents a distribution (radial) power network and we further know that some of the buses cannot be connected (which may be inferred from locational/geographical information), the corresponding candidacy set G_n is the set of spanning trees of a sub-graph G' = (N, E') with n buses. An edge (i,j) belongs to E' if and only if buses i and j are not known to be disconnected. The size of G_n is given by Kirchhoff's matrix tree theorem (cf. [45]). See Theorem 5.1.

4. (Erdős–Rényi model) In a more general setting, G can be a random graph chosen from an ensemble of graphs according to a certain distribution. When a graph is sampled according to the Erdős–Rényi model ER(n, p), each possible edge of G appears independently with probability p. We denote the corresponding graph distribution by P_ER for convenience.
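The candidacy-set sizes in cases 2 and 3 can be checked numerically. The sketch below counts spanning trees via Kirchhoff's matrix tree theorem (any cofactor of the graph Laplacian) and verifies Cayley's formula n^(n−2) for the complete graph; the function name is our own:

```python
import numpy as np

def spanning_tree_count(adj):
    """Kirchhoff's matrix tree theorem: the number of spanning trees of a graph
    equals any cofactor of its Laplacian L = D - A."""
    L = np.diag(adj.sum(axis=1)) - adj
    return round(np.linalg.det(L[1:, 1:]))  # delete row/column 0, take the det

n = 5
K_n = np.ones((n, n)) - np.eye(n)           # complete graph on n nodes
# Radial network, no prior information (case 2): Cayley's formula n^(n-2).
assert spanning_tree_count(K_n) == n ** (n - 2)   # 5^3 = 125

# Radial network with prior information (case 3): if buses 0 and 1 are known
# to be disconnected, the candidacy set shrinks accordingly.
G_prior = K_n.copy()
G_prior[0, 1] = G_prior[1, 0] = 0.0
assert 0 < spanning_tree_count(G_prior) < n ** (n - 2)
```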

The next subsection is devoted to describing the available measurements.

2.3 Linear System of Measurements

Suppose the measurements are sampled discretely and indexed by the elements of the set {1, …, m}. As a general framework, the measurements are collected in two matrices A and B, defined as follows.

Definition 2.2 (Generator and Measurement Matrices).

Let m be a positive integer. The generator matrix B is an m×n random matrix and the measurement matrix A is an m×n matrix with entries selected from F; together they satisfy the linear system (1):

 A = BY(G) + Z

where Y(G) ∈ S^n(F) is the graph matrix to be recovered, with underlying graph G, and Z denotes the random additive noise. We call the recovery noiseless if Z = 0. Our goal is to recover the matrix Y(G) from the given matrices A and B.

2.4 Applications to Electrical Grids

Various applications fall into the framework in (1). Here we present two examples of the graph identification problem in power systems. The measurements are modeled as time series data obtained via nodal sensors at each node, e.g., PMUs, smart switches, or smart meters.

2.4.1 Example 1: Nodal Current and Voltage Measurements

We assume data is obtained over a short time interval during which the unknown parameters in the network are time-invariant. Y ∈ S^n(C) denotes the nodal admittance matrix of the network and is defined as

 Y_{i,j} := { −y_{i,j}                 if i ≠ j
            y_i + Σ_{k≠i} y_{i,k}    if i = j }   (2)

where y_{i,j} is the admittance of line (i,j) and y_i is the self-admittance of bus i. Note that if two buses i and j are not connected, then y_{i,j} = 0.
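As a sketch of (2), the snippet below assembles the admittance matrix from hypothetical line and self-admittances (the numeric values are ours, purely illustrative) and generates voltage-current pairs obeying Ohm's law as in (3):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
# Hypothetical line admittances y_ij and self-admittances y_i (illustrative values).
line = {(0, 1): 2.0, (1, 2): 1.0, (2, 3): 4.0}
y_self = np.array([0.1, 0.1, 0.1, 0.1])

# Nodal admittance matrix, Eq. (2): Y[i,j] = -y_ij off the diagonal,
# Y[i,i] = y_i + sum_{k != i} y_ik on the diagonal.
Y = np.zeros((n, n))
for (i, j), y in line.items():
    Y[i, j] = Y[j, i] = -y
for i in range(n):
    Y[i, i] = y_self[i] - Y[i].sum()  # Y[i,i] is still 0 here, so this adds sum y_ik

assert Y[0, 2] == 0.0                 # buses 0 and 2 are not connected
assert np.isclose(Y[0, 0], 2.1)       # y_0 + y_01 = 0.1 + 2.0

# Simultaneous voltage/current pairs satisfy Eq. (3): I_t = Y V_t
# (in row form, I_t^T = V_t^T Y since Y is symmetric).
m = 3
V = rng.standard_normal((m, n))       # rows V_t form the generator matrix B
I = V @ Y                             # rows I_t form the measurement matrix A
```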

The corresponding generator and measurement matrices are formed by simultaneously measuring both currents (or, equivalently, power injections) and voltages at each node and at each time step. For each t = 1, …, m, the nodal current injections are collected in an n-dimensional random vector I_t. Concatenating the I_t into a matrix, we get A = [I_1, …, I_m]^T. The generator matrix B = [V_1, …, V_m]^T is constructed analogously from the nodal voltages. Each pair of measurement vectors from A and B must satisfy Kirchhoff's and Ohm's laws,

 It=YVt,t=1,…,m. (3)

In matrix notation, (3) is equivalent to A = BY, which is a noiseless version of the linear system defined in (1).

Compared with obtaining only one of the current, power injection or voltage measurements (as in, for example, [43, 42, 32]), collecting simultaneous current-voltage pairs doubles the amount of data to be acquired and stored. There are benefits, however. First, exploiting the physical law relating voltage and current not only enables us to identify the topology of a power network but also to recover the parameters of the admittance matrix. Furthermore, dual-type measurements significantly reduce the sample complexity for topology recovery compared with the results for single-type measurements.

2.4.2 Example 2: Nodal Power Injections and Phase Angles

Similar to the previous example, at each time t, denote by p_t(j) and θ_t(j) the active nodal power injection and the voltage phase angle at node j, respectively. The matrices A and B are constructed in a similar way by concatenating the vectors p_t and θ_t. The matrix representation of the DC power flow model can be expressed as the linear system A = B(CSC^T), which belongs to the general class represented in (1). Here, the diagonal matrix S is the susceptance matrix whose e-th diagonal entry represents the susceptance of the e-th edge in E, and C is the node-to-link incidence matrix of the graph. The vertex-edge incidence matrix is defined as

 C_{j,e} := { 1    if bus j is the source of e
            −1   if bus j is the target of e
            0    otherwise }

(although the underlying network is a directed graph, when considering the fundamental limit for topology identification we still refer to the recovery of an undirected graph G). Note that Y = CSC^T specifies both the network topology and the susceptances of the power lines.
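A small sketch of this DC power flow construction, with an assumed 4-bus edge list and susceptance values (ours, for illustration): the resulting Y = CSC^T is a weighted Laplacian, so it is symmetric with zero row sums.

```python
import numpy as np

# Directed edge list (source, target) for a toy 4-bus network and line
# susceptances (values are assumptions for illustration).
edges = [(0, 1), (1, 2), (1, 3)]
b = np.array([10.0, 5.0, 8.0])

n, E = 4, len(edges)
# Vertex-edge incidence matrix C: +1 at the source, -1 at the target of each edge.
C = np.zeros((n, E))
for e, (s, t) in enumerate(edges):
    C[s, e], C[t, e] = 1.0, -1.0

# DC power flow: p_t = C S C^T theta_t, so Y = C S C^T plays the role of the
# graph matrix in Eq. (1) (a weighted Laplacian here).
Y = C @ np.diag(b) @ C.T
assert np.allclose(Y, Y.T)               # symmetric
assert np.allclose(Y.sum(axis=1), 0.0)   # Laplacian rows sum to zero
assert Y[0, 1] == -10.0                  # off-diagonal = -susceptance of line (0,1)
```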

2.5 Probability of Error as the Recovery Metric

We now define the error criteria considered in this paper. We refer to finding the edge set E of G from the matrices A and B as the topology identification problem, and to recovering the graph matrix Y(G) from A and B as the parameter reconstruction problem.

Definition 2.3.

Let X = X(A, B) be a function or algorithm that returns an estimated graph matrix given inputs A and B. The probability of error for topology identification is defined to be the probability that the estimated edge set is not equal to the correct edge set:

 ε_T := P(∃ i ≠ j : sign(X_{i,j}) ≠ sign(Y_{i,j}(G)))   (4)

where the probability is taken over the randomness in G, B and Z. The probability of error for noiseless parameter reconstruction is defined to be the probability that the estimate X is not equal to the original graph matrix Y(G):

 ε_P := sup_{Y ∈ 𝒴(G)} P(X ≠ Y(G))   (5)

where 𝒴(G) is the set of all graph matrices that are consistent with the underlying graph G, and the probability is taken over the randomness in B and Z. (In this exploratory work, we assume the measurements are noiseless and the algorithms seek to recover each entry of the graph matrix exactly. When the measurements are noisy, Theorem 3.1 still provides general converse results as trade-offs between the number of measurements needed and the probability of error defined in (5).)

Remark 2.

Note that for a fixed noiseless parameter reconstruction algorithm, the corresponding ε_P is always greater than or equal to ε_T. We use ε_P as the error metric in this work and refer to it as the probability of error in the remainder of this paper.

3 Fundamental Trade-offs

We discuss the fundamental trade-offs of the parameter recovery problem defined in Sections 2.2 and 2.3. The converse result is summarized in Theorem 3.1 as an inequality involving the probability of error and the distributions of the underlying graph, the generator matrix and the noise. Next, in Section 3.2, we focus on a particular two-stage scheme and show in Theorem 3.2 that, under certain conditions, the probability of error is asymptotically zero (in n).

3.1 Converse

The following theorem states the fundamental limit.

Theorem 3.1 (Converse).

The probability of error for topology identification is bounded from below as

 ε_T ≥ 1 − [H(B) − H(Z) + ln 2] / H(G_n)   (6)

where H(B), H(Z) and H(G_n) are the (differential) entropies (in base e) of the random variables B, Z and the probability distribution P_{G_n}, respectively.

Proof.

The graph G is chosen from a discrete set G_n according to some probability distribution P. As previously introduced, Fano's inequality [22] plays an important role in deriving fundamental limits. We especially focus on an extended version of it. Similar generalizations appear in many places, e.g., [4, 40] and [24].

Lemma 1 (Generalized Fano’s inequality).

Let G be a random graph and let A and B be the measurement and generator matrices defined in Sections 2.2 and 2.3. Suppose the original graph G is selected from a nonempty candidacy set G_n according to a probability distribution P. Let Ĝ denote the estimated graph. Then the conditional probability of error for estimating G from B given A is always bounded from below as

 P(Ĝ ≠ G | A) ≥ 1 − [I(G; B | A) + ln 2] / H(G_n)   (7)

where the randomness is over the selections of the original graph G and the estimated graph Ĝ.

In (7), the term I(G; B | A) denotes the conditional mutual information (base e) between G and B conditioned on A, where the underlying integrals are taken over F^{m×n}. Furthermore, the conditional mutual information is bounded from above by the differential entropies of B and Z. It follows that

 I(G;B|A) = H(B|A) − H(B|G,A)   (8)
          ≤ H(B|A) − H(B|Y,A)   (9)
          = H(B|A) − H(Z)       (10)
          ≤ H(B) − H(Z).        (11)

Here, Eq. (8) follows from the definitions of mutual information and differential entropy. Moreover, knowing Y, the graph G can be inferred; thus, H(B|G,A) ≥ H(B|Y,A) yields (9). Recalling the linear system in (1), we obtain (10). Furthermore, (11) holds since conditioning does not increase entropy, i.e., H(B|A) ≤ H(B).

Plugging (11) into (7) and averaging over A,

 ε_P ≥ ε_T ≥ 1 − [H(B) − H(Z) + ln 2] / H(G_n),

which yields the desired (6). ∎

3.2 Achievability

In this subsection, we consider the achievability for noiseless parameter reconstruction. The proofs rely on constructing a two-stage recovery scheme (Algorithm 1), which contains two steps: column-retrieving and consistency-checking. The worst-case running time of this scheme depends on the underlying distribution P. (Although for certain distributions the computational complexity is not polynomial in n, the scheme still provides insights on the fundamental trade-offs between the number of samples and the probability of error for recovering graph matrices. Furthermore, motivated by the scheme, a polynomial-time heuristic algorithm is provided in Section 6 and experimental results are reported in Section 7.) The scheme is presented as follows.

3.2.1 Two-stage Recovery Scheme

Retrieving columns

In the first stage, using ℓ1-norm minimization, we recover each column of Y(G) based on (1) (with no noise):

 minimize    ||X_j||_{ℓ1}    (12)
 subject to  B X_j = A_j,    (13)
             X_j ∈ F^n.      (14)

Let X_j denote the j-th retrieved column, a length-n vector obtained by solving (12)-(14). We do not restrict the methods for solving the ℓ1-norm minimization in (12)-(14), as long as there is a unique solution for sparse columns with fewer than μ non-zeros (the parameter μ is defined in Definition 3.1 below).
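One standard way to solve (12)-(14) is the basis pursuit linear program: split the variable into positive and negative parts and minimize their sum subject to the measurement constraints. The sketch below is one possible solver, not the one prescribed by the scheme; the function name and test sizes are ours:

```python
import numpy as np
from scipy.optimize import linprog

def retrieve_column(B, a):
    """Basis pursuit sketch for the column-retrieval step (12)-(14):
    minimize ||x||_1 subject to B x = a, via the LP reformulation
    x = x_pos - x_neg with x_pos, x_neg >= 0."""
    m, n = B.shape
    c = np.ones(2 * n)                 # objective: sum(x_pos) + sum(x_neg)
    A_eq = np.hstack([B, -B])          # B (x_pos - x_neg) = a
    res = linprog(c, A_eq=A_eq, b_eq=a, bounds=(0, None), method="highs")
    return res.x[:n] - res.x[n:]

rng = np.random.default_rng(0)
n, m = 30, 15
B = rng.standard_normal((m, n))        # Gaussian IID generator matrix
x_true = np.zeros(n)
x_true[[3, 17]] = [1.0, -2.0]          # a 2-sparse column of Y(G)
x_hat = retrieve_column(B, B @ x_true)
assert np.allclose(x_hat, x_true, atol=1e-5)  # exact recovery from m << n rows
```

For a sparse column, m can be far smaller than n; dense columns are exactly the case the consistency-checking stage is designed to catch.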

Checking consistency

In the second stage, we check for errors in the decoded columns using the symmetry of the graph matrix Y(G). Specifically, we fix a subset S ⊆ N of a given size |S| = n − K for some integer K. Then we check whether X_{i,j} = X_{j,i} for all i, j ∈ S. If not, we choose a different set of the same size. This procedure stops when either we find such a subset of columns, or we have gone through all possible subsets without finding one. In the latter case, an error is declared and the recovery is unsuccessful. In the former case, we accept the columns X_j, j ∈ S, as correct. For each remaining column X_j, j ∈ S^c, we take its entries X_{i,j}, i ∈ S, from the accepted columns (by symmetry) and use them to compute the other entries X_{i,j}, i ∈ S^c, of X_j using (13):

 B_{S^c} X_j^{S^c} = A_j − B_S X_j^S,   j ∈ S^c.   (15)

We combine X_j^S and X_j^{S^c} to obtain a new estimate X_j for each j ∈ S^c. Together with the columns X_j, j ∈ S, that we have accepted, they form the estimated graph matrix X.
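The consistency-checking stage can be sketched as a brute-force search over subsets (exponential in the worst case, which matches the caveat about running time in Section 3.2; the names below are ours):

```python
import numpy as np
from itertools import combinations

def consistency_check(X, size):
    """Search for a subset S of column indices with |S| = size such that the
    decoded columns agree with symmetry: X[i, j] == X[j, i] for all i, j in S.
    Returns the first such S, or None if every subset fails."""
    n = X.shape[1]
    for S in combinations(range(n), size):
        sub = X[np.ix_(S, S)]
        if np.allclose(sub, sub.T):
            return set(S)
    return None

# Toy example: columns 0-3 decoded correctly (symmetric block), column 4 corrupted.
Y = np.diag(np.arange(5.0))
Y[0, 1] = Y[1, 0] = 1.0
X = Y.copy()
X[2, 4] = 7.0                      # corrupt column 4 (breaks symmetry)
S = consistency_check(X, 4)
assert S == {0, 1, 2, 3}           # the corrupted column is excluded
```

In the full scheme the excluded columns in S^c would then be re-solved via (15) using the accepted entries.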

3.2.2 (μ,K)-sparse Distribution

We now analyze the sample complexity of the two-stage scheme. Let d_j(G) denote the degree of node j in G. Denote by N_Large(μ) the set of nodes having degree greater than the threshold parameter μ ≥ 0:

 N_Large(μ) := {j ∈ N : d_j(G) > μ}.   (16)

Making use of (16), we define the following set of graphs, with a counting parameter K ≥ 0:

 G_n(μ, K) := {G ∈ G_n : |N_Large(μ)| ≤ K}.

The following definition characterizes graph distributions.

Definition 3.1 (Sparse Distribution).

A graph distribution P defined on G_n is said to be (μ, K)-sparse with parameter ρ if, assuming G ∼ P,

 P(G ∉ G_n(μ, K)) ≤ ρ.   (17)
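The membership test behind (17) is simple to state in code. The sketch below estimates ρ for an Erdős–Rényi ensemble by Monte Carlo, approximating node degrees as independent Binomial(n−1, p) draws (a simplification of the true joint degree distribution); the thresholds μ and K are illustrative choices of ours, not the constants from Lemma 3:

```python
import numpy as np

rng = np.random.default_rng(0)

def in_sparse_class(degrees, mu, K):
    """G is in G_n(mu, K) iff at most K nodes have degree exceeding mu (Eq. 16)."""
    return int(np.sum(degrees > mu)) <= K

# Monte Carlo estimate of rho = P(G not in G_n(mu, K)) for ER(n, p).
n, p, trials = 200, 0.02, 300
mu, K = 5 * n * p, 0               # illustrative thresholds, not the paper's constants
fails = 0
for _ in range(trials):
    # Approximate ER(n, p) degrees as independent Binomial(n - 1, p) samples.
    degrees = rng.binomial(n - 1, p, size=n)
    fails += not in_sparse_class(degrees, mu, K)
rho_hat = fails / trials
assert 0.0 <= rho_hat <= 1.0
```

With μ well above the mean degree np, the empirical failure rate is typically tiny, which is the qualitative content of Lemma 3.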

The following lemmas provide examples of sparse distributions. Denote by P_tree the uniform distribution on the set of all trees with n nodes.

Lemma 2.

For suitable choices of the threshold parameter μ and the counting parameter K (made explicit in Appendix .4), the distribution P_tree is (μ, K)-sparse.

Denote by P_ER the graph distribution of the Erdős–Rényi model ER(n, p).

Lemma 3.

For suitable choices of μ and K (made explicit in Appendix .4), the distribution P_ER is (μ, K)-sparse.

The threshold and counting parameters in both examples are tight, as indicated in Corollaries 4.1 and 4.2. The proofs of Lemmas 2 and 3 are postponed to Appendix .4.

3.2.3 Analysis of the Scheme

We now present another of our main theorems, which makes use of the restricted isometry property (cf. [13, 39]). Given a generator matrix B, the corresponding restricted isometry constant, denoted by δ_s(B), is the smallest positive number such that

 (1 − δ_s) ||x||^2 ≤ ||B_S x||^2 ≤ (1 + δ_s) ||x||^2

for all subsets S of size |S| ≤ s and all vectors x of conformal dimension.

Denote by spark(B) the smallest number of columns of the matrix B that are linearly dependent (see [21] for the requirements on the spark of the generator matrix that guarantee the desired recovery criteria). Consider the models defined in Sections 2.2 and 2.3.

Theorem 3.2 (Achievability).

Suppose the generator matrix B has restricted isometry constants satisfying δ_{3μ} + 3δ_{4μ} < 2 and, furthermore, spark(B) > 2K. If the distribution P is (μ, K)-sparse with parameter ρ, then the probability of error for the two-stage scheme to recover a graph matrix of G satisfies ε_P ≤ ρ.

Proof.

First, the theory of compressed sensing (see [13, 39]) implies that if the generator matrix B has restricted isometry constants satisfying δ_{3μ} + 3δ_{4μ} < 2, then all columns with at most μ non-zeros are correctly recovered by the ℓ1-minimization in (12)-(14). It remains to show that the consistency-check in our scheme works, which is summarized as the following lemma.

Lemma 4 (Consistency-check).

Suppose the matrix B has restricted isometry constants satisfying δ_{3μ} + 3δ_{4μ} < 2 and, furthermore, suppose spark(B) > 2K. If G ∈ G_n(μ, K), then the columns passing the consistency-check, i.e., those X_j, j ∈ S, such that X_{i,j} = X_{j,i} for all i, j ∈ S, are correctly decoded, and together with (15), the two-stage scheme always returns the original (correct) graph matrix.

The proof of Lemma 4 can be found in Appendix .3. Making use of Lemma 4, it follows that the scheme succeeds provided G ∈ G_n(μ, K). Since the distribution is (μ, K)-sparse with parameter ρ, (17) must be satisfied. Thus, the probability of error is at most ρ. ∎

4 Gaussian IID Measurements

In this section, we consider the special regime where the entries of the generator matrix B are Gaussian IID random variables. Utilizing the converse in Theorem 3.1 and the achievability in Theorem 3.2, the Gaussian IID assumption allows the derivation of explicit expressions for the sample complexity as upper and lower bounds on the number of measurements m. Combining these with the results in Lemmas 2 and 3, we show that the corresponding lower and upper bounds match each other for the graph distributions P_tree and P_ER (under certain conditions on n and p).

For the convenience of presentation, in the remainder of the paper we restrict the measurements to be chosen from R, although the theorems can be generalized to complex measurements. In realistic scenarios, for instance a power network, besides the measurements collected from the nodes, nominal state values, e.g., operating currents and voltages, are known to the system designer a priori. Representing the nominal current and voltage values at the nodes by Ī and V̄ respectively, the measurements in A and B are centered around the matrices Ā := [Ī, …, Ī]^T and B̄ := [V̄, …, V̄]^T. All rows of Ā are identical, as are all rows of B̄, because the graph parameters are time-invariant and so are the nominal values. Without system fluctuations and noise, the nominal values satisfy the linear system in (1), i.e.,

 Ā = B̄ Y.   (18)

Knowing Ā and B̄ is not sufficient to infer the network parameters (the entries of the graph matrix Y), since the rank of the matrix B̄ is one. However, measurement fluctuations can be used to facilitate the recovery of Y. The deviations from the nominal values are denoted by additive perturbation matrices Ã and B̃ such that A = Ā + Ã and B = B̄ + B̃, where Ã and B̃ are m×n matrices consisting of additive perturbations. Thus, putting (3) and (18) together, the equations above imply that Ã = B̃Y, where we extract the perturbation matrices Ã and B̃. We specifically consider the case when the additive perturbation B̃ is a matrix with Gaussian IID entries. Without loss of generality, we suppose the mean is zero and the variance is one. For simplicity, in the remainder of this paper we slightly abuse notation and replace the perturbation matrices Ã and B̃ by A and B (with B assumed Gaussian IID) when the context is clear. Moreover, throughout this section, we focus on the case when the measurements are noiseless.

The next theorem implies that Gaussian IID random variables are not an arbitrary selection: they are the most "informative" measurements in the sense that, among all measurement vectors with fixed mean and covariance, the differential entropy is maximized by the normal distribution.

Theorem 4.1.

Suppose the row measurements B^(t) of the generator matrix are identically distributed random vectors with zero mean and covariance K. The probability of error for noiseless recovery is bounded from below as

 ε_P ≥ 1 − [ (m/2) ln((2πe)^{2n} det K) + ln 2 ] / H(G_n)   (19)

where H(G_n) is the entropy (in base e) of the graph distribution P_{G_n}.

Remark 3.

It can be inferred from the theorem that, to ensure a small probability of error, the number of samples m must be at least linear in n, the size of the graph, given that the graph, as a mesh network, is chosen uniformly at random from all graphs on n nodes (see Example 2.1 (a)). On the other hand, as corollaries, under the assumption of Gaussian IID measurements, corresponding thresholds on m are necessary to keep the probability of error small if the graph is chosen uniformly at random from the spanning trees of the complete graph, or from the spanning trees of a known sub-graph, as in Examples 2.1 (b) and (c), respectively. The theorem can be generalized to complex measurements at the cost of additional multiplicative constants.
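As a numerical illustration of the remark, the sketch below evaluates the converse bound for the mesh-network case, assuming the form ε ≥ 1 − [(m/2) ln((2πe)^{2n} det K) + ln 2]/H(G_n) with identity covariance (det K = 1) and H(G_n) = (n(n−1)/2) ln 2 for the uniform distribution over all graphs; the function and the chosen values of m are our own illustration:

```python
import math

def error_lower_bound(m, n, log_det_K=0.0):
    """Converse bound sketch for the mesh-network case, assuming
    eps >= 1 - [(m/2) ln((2*pi*e)^(2n) det K) + ln 2] / H(G_n),
    with H(G_n) = (n(n-1)/2) ln 2 for the uniform distribution over all graphs."""
    H_graphs = (n * (n - 1) / 2) * math.log(2)
    info = 0.5 * m * (2 * n * math.log(2 * math.pi * math.e) + log_det_K)
    return 1 - (info + math.log(2)) / H_graphs

n = 1000
# The bound forces m to grow linearly in n: with m = n / 20 measurements,
# the probability of error is still bounded away from zero.
assert error_lower_bound(n // 20, n) > 0.5
assert error_lower_bound(10 * n, n) < 0.5  # the bound becomes vacuous for large m
```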

Proof.

The proof is based on Theorem 3.1. The key fact used is that the entropy is maximized when is distributed normally with zero mean and covariance , for all ,

 H(B) \;\leq\; \sum_{t=1}^{m} H(B(t)) \;\leq\; \frac{1}{2}\,m\ln\!\left((2\pi e)^{2n}\det K\right). \qquad (20)

Substituting the above into Theorem 3.1 gives (19). ∎
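As a quick numerical sanity check on the maximum-entropy fact behind (20) (our illustration, not part of the proof): among unit-variance distributions, the differential entropy of the standard normal, (1/2)ln(2πe), exceeds that of, e.g., the unit-variance uniform distribution, ln(2√3).

```python
import math

# Differential entropy of N(0, 1): (1/2) * ln(2*pi*e) ≈ 1.4189 nats
h_gauss = 0.5 * math.log(2 * math.pi * math.e)

# Differential entropy of Uniform[-sqrt(3), sqrt(3)], which also has
# unit variance: ln(2*sqrt(3)) ≈ 1.2425 nats
h_unif = math.log(2 * math.sqrt(3.0))

print(h_gauss, h_unif)  # the Gaussian entropy is strictly larger
```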

4.1 Sample Complexity for Sparse Distributions

We consider the worst-case sample complexity for recovering graphs generated according to a sequence of sparse distributions, defined similarly to Definition 3.1 to characterize the asymptotic behavior of graph distributions.

Definition 4.1 (Sequence of Sparse Distributions).

A sequence of graph distributions is said to be -sparse if, assuming a sequence of graphs is chosen as , the sequences and guarantee that

 \lim_{n\to\infty}\mathbb{P}_{G_n}\!\left(G_n\notin\mathcal{G}_n\big(\mu(n),K(n)\big)\right)=0. \qquad (21)

In what follows, we sometimes write and as and for simplicity. Based on the sequence of sparse distributions defined above, we show the following theorem, which provides upper and lower bounds on the worst-case sample complexity with Gaussian IID measurements.

Theorem 4.2 (Worst-case Sample Complexity).

Suppose that the generator matrix has Gaussian IID entries with mean zero and variance one and assume the sequences and satisfy and for all . For any sequence of distributions that is -sparse, the two-stage scheme guarantees that using measurements. Conversely, there exists a -sparse sequence of distributions such that the number of measurements must satisfy to make the probability of error less than for all .

Remark 4.

The upper bound on that we are able to show differs from the lower bound by a sub-linear term . In particular, when the term dominates , the lower and upper bounds become tight up to a multiplicative factor.

Proof.

The first part is based on Theorem 3.2. Under the assumption on the generator matrix , using Gordon’s escape-through-the-mesh theorem, Theorem in [39] implies that any columns with are correctly recovered by the minimization in (12)-(14) with probability at least , as long as the number of measurements satisfies , and (if , the multiplicative constant increases but our theorem still holds). Similar results were first proved by Candès et al. in [13] (see their Theorem ). Therefore, applying the union bound, the probability that all the -sparse columns are recovered simultaneously is at least . On the other hand, conditioned on the event that all the -sparse columns are recovered, Theorem 3.2 shows that is sufficient for the two-stage scheme to succeed. Since each entry of is an IID Gaussian random variable with zero mean and unit variance, if , then with probability one the spark of is greater than , verifying the statement.

The converse follows directly from Theorem 4.1. Consider the uniform distribution on . Then . Let be parameters such that . To bound the size of , we partition into and with and . First, we assume that the nodes in form a -regular graph. For each node in , we construct edges and connect them to the other nodes in uniformly at random. A graph constructed this way always belongs to , unless the added edges create more than nodes with degree larger than . Therefore, as ,

 \left|\mathcal{G}_n(\mu,K)\right| \;\geq\; \rho\cdot e^{1/4}\,\frac{\binom{N-1}{\phi}^{N}\binom{\binom{N}{2}}{\phi N/2}}{\binom{N(N-1)}{\phi N}}\cdot\binom{n-1}{M}^{\alpha K} \qquad (22)

where , and . The first term is the fraction of the constructed graphs that lie in . The second term in (22) counts the total number of -regular graphs [33], and the last term is the total number of graphs created by adding new edges for the nodes in . If , there exists a constant small enough such that . If , then for any fixed node in , the probability that its degree exceeds is

 \sum_{i=\phi+1}^{\alpha K}\binom{\alpha K}{i}\beta^{i}(1-\beta)^{\alpha K-i} \;\leq\; \sum_{i=\phi+1}^{\alpha K} e^{\alpha K\,h\left(\frac{i}{\alpha K}\right)}\beta^{i} \;\leq\; (\alpha K)^{2}\beta^{\phi+1}

where is in base . Take and . The condition guarantees that . Letting be the assignment function for each node in , we check that

 (\alpha K)^{2}\beta^{\phi+1} \;\leq\; \frac{1}{4n} \;\leq\; \frac{1}{\digamma(n)}\left(1-\frac{1}{\digamma(n)}\right)^{N} \;\leq\; \frac{1}{en}.

Therefore, applying the Lovász local lemma, the probability that all the nodes in have degree at most can be bounded from below by if , which in turn is a lower bound on . Taking logarithms,

 H\!\left(U_{\mathcal{G}_n(\mu,K)}\right) \;\geq\; \ln\!\left(\rho\cdot e^{1/4}\,\frac{\binom{N-1}{\phi}^{N}\binom{\binom{N}{2}}{\phi N/2}}{\binom{N(N-1)}{\phi N}}\cdot\binom{n-1}{M}^{\alpha K}\right) \qquad (23)
 \;=\; \Omega\!\left(n^{2}h(\epsilon)+n^{1-3/\mu}K\right) \qquad (24)

where . In (23), we have used Stirling’s approximation and the assumption that . Continuing from (24), since , for sufficiently large ,

 H\!\left(U_{\mathcal{G}_n(\mu,K)}\right)=\Omega\!\left(n\mu\log\frac{n}{\mu}+n^{1-3/\mu}K\right). \qquad (25)

Substituting (25) into (19) and noting that , when , it must hold that

 m=\Omega\!\left(\mu\log(n/\mu)+K/n^{3/\mu}\right)

to ensure that is smaller than . ∎

4.1.1 Uniform Sampling of Trees

As an application of Theorem 4.2, we characterize the sample complexity of the uniform sampling of trees.

Corollary 4.1.

Suppose that the generator matrix has Gaussian IID entries with mean zero and variance one and assume . There exists an algorithm that guarantees using measurements. Conversely, the number of measurements must satisfy to make the probability of error less than .

Sketch of Proof: The achievability follows from combining Theorem 4.2 and Lemma 2, by setting . Substituting into (19) yields the desired result for the converse.∎

4.1.2 Erdős-Rényi (n,p) model

Similarly, the following corollary is shown by recalling Lemma 3.

Corollary 4.2.

Assume with . Under the same conditions in Corollary 4.1, there exists an algorithm that guarantees using measurements. Conversely, the number of measurements must satisfy to make the probability of error less than .

Sketch of Proof: Taking and , we check that and . The assumptions on guarantee that , whence . The choice of and ensures that the sequence of distributions is -sparse. Theorem 4.2 then implies that is sufficient for achieving a vanishing probability of error. For the second part of the corollary, substituting into (19) yields the desired result.∎
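The sparsity assumption behind Corollary 4.2 is easy to check empirically. The sketch below, with illustrative parameters of our own choosing, samples an Erdős–Rényi graph and confirms that the degrees concentrate around (n−1)p, so the sampled graph falls in a sparse candidacy set with high probability:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 0.03  # illustrative values; E[degree] = (n - 1) * p ≈ 12

# Sample the upper triangle of the adjacency matrix IID Bernoulli(p),
# then symmetrize to obtain an undirected Erdos-Renyi graph.
upper = np.triu(rng.random((n, n)) < p, k=1)
adj = upper | upper.T
deg = adj.sum(axis=1)

print(deg.mean(), deg.max())  # mean near (n-1)*p; max only a small factor larger
```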

5 Structure-based Parameter Recovery

Often in practice, some prior information about the graph topology is available. For example, in a power system, besides knowing that the transmission network is radial, we may also know, from locational and geographical information or past records, that some of the nodes in are not connected by a power line. The candidacy set then becomes smaller, allowing a potential improvement in sample complexity. Applying Kirchhoff’s theorem (cf. [45]), stated below, our results extend to such practical situations.

Theorem 5.1 (Kirchhoff’s Theorem).

Let be a connected graph with labeled nodes. Then the number of spanning trees, denoted by , is given by the product of and all non-zero eigenvalues of the (unnormalized) Laplacian matrix of :

 \kappa(H)=\frac{1}{n}\,\lambda_{1}\lambda_{2}\cdots\lambda_{n-1}=\det\!\left(L'_{H}\right) \qquad (26)

where denotes the reduced Laplacian (a cofactor) of , obtained by deleting the first row and column of the Laplacian matrix .
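Kirchhoff's theorem is straightforward to verify numerically. The sketch below (our illustration) counts spanning trees as the determinant of the reduced Laplacian; for the complete graph it reproduces Cayley's formula n^(n−2):

```python
import numpy as np

def num_spanning_trees(adj):
    """Count spanning trees of a connected graph via Kirchhoff's theorem:
    the determinant of the Laplacian with its first row and column deleted."""
    adj = np.asarray(adj, dtype=float)
    L = np.diag(adj.sum(axis=1)) - adj
    return round(np.linalg.det(L[1:, 1:]))

# Complete graph on n nodes: Cayley's formula gives n**(n-2) spanning trees.
n = 5
K_n = np.ones((n, n)) - np.eye(n)
print(num_spanning_trees(K_n))  # 125 = 5**3
```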

Therefore, if we know a priori that the topology to be recovered is a spanning tree of some known underlying graph , then the size of the candidacy set is given by . Let denote the uniform distribution on . As a remark, when we have no additional information about the underlying graph and only know that is a spanning tree, is the complete graph on nodes and . The following corollary is obtained as a direct application.

Corollary 5.1.

Under the same assumption of Corollary 4.1, if , then the number of measurements must satisfy

 m=\Omega\!\left(\frac{1}{n}\log\Big(\prod_{j=1}^{n-1}\lambda_{j}\Big)\right)

to make the probability of error less than . Here, denote the non-zero eigenvalues of the Laplacian matrix of .

Sketch of Proof: The proof follows along the same lines as those of Corollaries 4.1 and 4.2. Substituting into (19) gives the bound.∎

The next achievability result follows straightforwardly by noting that the number of unknown entries in the -th column of the graph matrix is at most .

Corollary 5.2.

Under the same assumption of Corollary 4.1, if , then the following upper bound on the number of measurements is sufficient to achieve a vanishing probability of error :

Here, denotes the -th diagonal entry of the (unnormalized) Laplacian matrix of .

6 Heuristic Algorithm

We present in this section an algorithm motivated by the consistency-checking step in the proof of achievability (see Section 3.2). Instead of checking the consistency of every subset of consisting of nodes, which is what the two-stage scheme does and which requires operations, we compute an estimate of each column of the graph matrix independently and then assign each column a score based on its symmetric consistency with the other columns of the matrix. The lower the score, the closer the column estimate is to the ground truth . Using the scoring function, we rank the columns, select a subset of them as “correct”, and eliminate this subset from the system. The size of the subset determines the number of iterations. Heuristically, this procedure yields a polynomial-time algorithm for estimating the graph matrix .
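The column-wise recovery and symmetric-consistency scoring just described can be sketched as follows. This is our simplified rendition, not the paper's exact algorithm: each column is estimated by an ℓ1-regularized least-squares solve (a basic ISTA loop standing in for the ℓ1-minimization), and each estimated column j is scored by how far it is from matching row j of the estimated matrix; lower scores mark more trustworthy columns.

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_column(A, b, lam=1e-3, iters=500):
    """Estimate one column x from measurements b = A x via l1-regularized
    least squares (ISTA), a stand-in for the independent l1-minimization."""
    L = np.linalg.norm(A, 2) ** 2       # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        x = soft_threshold(x - A.T @ (A @ x - b) / L, lam / L)
    return x

def symmetric_scores(X_hat):
    """Score column j by its symmetric inconsistency ||X[:, j] - X[j, :]||."""
    return np.linalg.norm(X_hat - X_hat.T, axis=0)

rng = np.random.default_rng(0)
n, m = 12, 9
adj = np.zeros((n, n))                  # toy graph matrix: a path Laplacian
for i in range(n - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0
X = np.diag(adj.sum(axis=1)) - adj
A = rng.standard_normal((m, n))         # Gaussian IID generator matrix
B = A @ X                               # noiseless measurements

X_hat = np.column_stack([ista_column(A, B[:, j]) for j in range(n)])
scores = symmetric_scores(X_hat)        # rank columns; lowest = most consistent
```

In the full algorithm, the lowest-scoring columns would be fixed, eliminated from the system, and the procedure repeated on the reduced problem.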

The algorithm proceeds in four steps.

6.0.1 Step 1. Initialization

Let matrices and be given and set the number of columns fixed in each iteration to be an integer such that . For the first iteration, set , , and .

For each iteration , we perform the remaining three steps. The system dimension is reduced by after each iteration.

6.0.2 Step 2. Independent ℓ1-minimization

For all , we solve the following -minimization:

 Xj(r)=arg