1 Introduction
1.1 Background
Symmetric matrices are ubiquitous in graphical models, with examples such as the adjacency matrix and the (generalized) Laplacian of an undirected graph. A major challenge in graph learning is inferring graph parameters embedded in those graph-based matrices from historical data or real-time measurements. In contrast to traditional statistical inference methods [16, 17, 42], model-based graph learning, such as physically-motivated models and graph signal processing (GSP) [20], takes advantage of additional data structures offered freely by nature. Among different measurement models for graph learning, linear models have been used and analyzed widely for different tasks, e.g., linear structural equation models (SEMs) [27, 26], linear graph measurements [5], generalized linear cascade models [38], etc. Despite the extra effort required for data collection, processing and storage, model-based graph learning often guarantees provable sample complexity, which is usually significantly lower than the empirical number of measurements needed with traditional inference methods. In many problem settings, having computationally efficient algorithms with low sample complexity is important. One reason is that the graph parameters may change on a short timescale, making sample complexity a vital metric to guarantee that the learning can be accomplished with a limited number of measurements.
Taking a modern power system as a concrete example, due to the increasing scale of distributed energy resources, the network parameters are subject to rapid changes. The need to prevent cascading failures also sometimes involves reconfiguring network connectivity, which can undesirably destabilize the system [28]. Constrained by these issues arising from such megatrends, analyzing the fundamental limits of parameter identification and designing a corresponding scheme that is efficient in both computational and sample complexity become increasingly critical. Beyond scenarios in which stable measurements are scarce, understanding sample complexity and having a practical identification scheme can also bridge the theory-to-application gap and benefit existing algorithms in electric grids. For instance, many real-time or near real-time graph algorithms based on temporal data, such as (real-time) optimal power flow [36, 34, 44], real-time contingency analysis [35] and frequency control [29], either require full and accurate knowledge of the network, or can be improved if certain estimates are (partially) accessible.
In this work, we consider a general graph learning problem where the measurements and the underlying matrix to be recovered can be represented as, or approximated by, a linear system. A graph matrix with respect to an underlying graph (see Definition 2.1) is defined as an $n \times n$ symmetric matrix whose nonzero $(i,j)$-th off-diagonal entries correspond to edges connecting node $i$ and node $j$, where $n$ is the number of nodes of the underlying undirected graph. The diagonal entries can be arbitrary. The measurements are summarized as two $m \times n$ real or complex matrices $\mathbf{A}$ and $\mathbf{B}$ satisfying
$$\mathbf{B} = \mathbf{A}\mathbf{Y} + \mathbf{Z} \qquad (1)$$
where $\mathbf{Z}$ denotes additive noise and $\mathbf{Y}$ is the graph matrix to be recovered.
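To make the setup concrete, the following sketch instantiates one consistent reading of (1): an $m \times n$ generator matrix, an $n \times n$ symmetric graph matrix, and additive noise. All sizes, the path-graph topology, and the noise level are illustrative assumptions, not the paper's prescribed setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 6, 4  # assumed sizes: n nodes, m linear measurements

# Hypothetical graph matrix for the path graph 1-2-...-6: symmetric,
# off-diagonal support equal to the edge set, arbitrary diagonal.
Y = np.zeros((n, n))
for i in range(n - 1):
    Y[i, i + 1] = Y[i + 1, i] = -1.0
np.fill_diagonal(Y, 2.0)

A = rng.standard_normal((m, n))         # generator matrix
Z = 0.01 * rng.standard_normal((m, n))  # additive noise
B = A @ Y + Z                           # measurement matrix per (1)
```

The recovery problem then asks for the sparse symmetric factor given only the two $m \times n$ matrices.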
We focus on the following problems:

Fundamental Trade-offs: What is the minimum number of linear measurements required for reconstructing the symmetric graph matrix? Is there an algorithm that asymptotically achieves recovery with the minimum number of measurements? As a special case, can we characterize the sample complexity when the measurements are Gaussian IID, i.e., when the entries of the generator matrix are IID normally distributed?
Applications to Electrical Grids Do the theoretical guarantees on sample complexity result in a practical algorithm (in terms of both sample and computational complexity) for recovering electric grid topology and parameters?
1.2 Related Work
1.2.1 Graph Learning Aspects
Algorithms for learning sparse graphical model structures have a rich tradition in the literature. For general MRFs, learning the underlying graph structure is known to be NP-hard [11]. However, when the underlying graph is a tree, the classical Chow-Liu algorithm [16] offers an efficient approach to structure estimation. Recent results contribute to an extensive understanding of the Chow-Liu algorithm. The authors in [42] analyzed the error exponent and showed experimental results for chain graphs and star graphs. For pairwise binary MRFs with bounded maximum degree, [40] provides sufficient conditions for correct graph selection. Similar achievability results for Ising models appear in [7]. Model-based graph learning has been emerging recently; assuming the measurements form linear SEMs, the authors in [27, 26] showed theoretical guarantees on the sample complexity for learning a directed acyclic graph (DAG) structure, under mild conditions on the class of graphs.
From a converse perspective, information-theoretic tools have been widely applied to derive fundamental limits for learning graph structures. For a Markov random field (MRF) with bounded maximum degree, [40] derived necessary conditions on the number of samples for estimating the underlying graph structure using Fano's inequality (see [22]). For Ising models, [8] combined Fano's inequality with typicality to derive weak and strong converses. Similar techniques have also been applied to Gaussian graphical models [6, 25]. Fundamental limits for noisy compressed sensing have been extensively studied in [4] under an information-theoretic framework.
1.2.2 Parameter Identification of Power Systems
Graph learning has been widely used in electric grid applications such as state estimation [14, 15] and topology identification [41, 19]. Most of the literature focuses on topology identification or change detection, and there is little recent work on joint topology and parameter recovery, with the notable exceptions of [31, 46, 37]. Moreover, there is little exploration of the fundamental performance limits (estimation error and sample complexity) of topology and parameter identification for power networks, with the exception of [48], where a sparsity condition is provided for exact recovery of outage lines. Based on single-type measurements (either current or voltage), correlation analysis has been applied to topology identification [43, 32, 12]. Approximating the measurements as normally distributed random variables, [41] proposed an approach for topology identification with limited measurements. A graphical-learning-based approach was provided by [18]. Recently, data-driven methods were studied for parameter estimation [46]. In [47], a linear system similar to (3) was combined with regression to recover the symmetric graph parameters (the admittance matrix of the power network), where the generator matrix is required to have full column rank, implying that at least $n$ measurements are necessary. Sparse recovery ([13, 39]), however, suggests that recovering the graph matrix may take far fewer measurements by fully exploiting its sparsity. Some experimental results for recovering the topology of a power network based on compressed sensing algorithms are reported in [9]. Nonetheless, in the worst case, some of the columns (or rows) of the graph matrix may be dense vectors with many nonzeros, preventing us from applying compressed sensing algorithms to recover each of the columns (or rows) separately. Moreover, the columns to be recovered may not share the same support set, so many distributed compressed sensing schemes (cf. [10]) are not directly applicable in this situation. This motivates a new two-stage recovery scheme that handles the difficulty that, for a randomly chosen graph, some of the columns (or rows) of the corresponding graph matrix may not be sparse.
1.3 Our Contributions
We demonstrate that the linear system in (1) can be used to learn the topology and parameters of a graph. Our framework can be applied to perform system identification in electrical grids by leveraging synchronous nodal current and voltage measurements obtained from phasor measurement units (PMUs).
The main results of this paper are summarized here.

Fundamental Trade-offs: In Theorem 3.1, we derive a general lower bound on the probability of error for topology identification (defined in (4)). In Section 3.2, we describe a simple two-stage recovery scheme combining $\ell_1$-norm minimization with an additional step called consistency-checking. For an arbitrarily chosen graph distribution, we characterize it using the definition of sparsity (see Definition 3.1) and show that if a graph is drawn according to such a distribution, then the number of measurements required for exact recovery is bounded from above as in Theorem 3.2.

(Worst-case) Sample Complexity: We focus on the case when the generator matrix has Gaussian IID entries in Section 4. Under this assumption, we provide upper and lower bounds on the worst-case sample complexity in Theorem 4.2. We show two applications of Theorem 4.2, to the uniform sampling of trees and to the Erdős-Rényi model, in Corollaries 4.1 and 4.2, respectively.
1.4 Outline of the Paper
The remaining content is organized as follows. In Section 2, we specify our models. In Section 3.1, we present the converse result as a fundamental limit for recovery. The achievability result is provided in Section 3.2. We present our main result, the worst-case sample complexity for Gaussian IID measurements, in Section 4. A heuristic algorithm together with simulation results is reported in Sections 6 and 7.
2 Model and Definitions
2.1 Notation
Let $\mathbb{F}$ denote a field that can either be the set of real numbers $\mathbb{R}$ or the set of complex numbers $\mathbb{C}$. The set of all $n \times n$ symmetric matrices with entries in $\mathbb{F}$ is denoted by $\mathbb{S}^{n}$. The imaginary unit is denoted by $\mathrm{i}$. Throughout the work, $\log(\cdot)$ denotes the binary logarithm with base 2 and $\ln(\cdot)$ denotes the natural logarithm with base $e$. We use $\mathbb{E}[\cdot]$ to denote the expectation of random variables if the underlying probability distribution is clear. The mutual information is denoted by $I(\cdot\,;\cdot)$. The (differential) entropy is denoted by $H(\cdot)$ and, in particular, we use $H_b(\cdot)$ for the binary entropy function. To distinguish random variables and their realizations, we follow the convention and denote the former by capital letters (e.g., $X$) and the latter by lower-case letters (e.g., $x$). The symbol $c$ is used to designate a constant. Matrices are denoted in boldface (e.g., $\mathbf{A}$, $\mathbf{B}$ and $\mathbf{Y}$). The $i$-th row, the $j$-th column and the $(i,j)$-th entry of a matrix $\mathbf{M}$ are denoted by $\mathbf{M}_{i}$, $\mathbf{M}^{j}$ and $\mathbf{M}_{i,j}$, respectively. For notational convenience, let $\mathcal{S}$ be a subset of column indices. Denote by $\mathcal{S}^{c}$ the complement of $\mathcal{S}$ and by $\mathbf{M}_{\mathcal{S}}$ the submatrix consisting of the columns of $\mathbf{M}$ whose indices are chosen from $\mathcal{S}$. The notation $\mathbf{M}^{\top}$ denotes the transpose of a matrix and $\det(\mathbf{M})$ its determinant. For the sake of notational simplicity, we use big-O notation ($O$, $o$, $\Omega$, $\omega$, $\Theta$) to quantify asymptotic behavior. Table 1 summarizes the notation used throughout the paper.
Graph Model
$n$: Number of nodes
$d_i$: Degree of node $i$
$\mathcal{V}$: Set of nodes
$\mathcal{E}$: Set of edges
$G$: Underlying random graph
$\mathcal{G}_n$: Candidacy set consisting of graphs with $n$ vertices
$\mu$: Probability distribution of $G$ in $\mathcal{G}_n$
$\mathbf{Y}$: $n \times n$ symmetric matrix to be recovered
Measurements
$m$: Number of measurements
$\mathcal{M}$: Set of measurement indexes
$\mathbf{A}$, $\mathbf{B}$: Matrices of measurements
$\mathbf{M}_{i}$: $i$-th row of the matrix $\mathbf{M}$
$\mathbf{M}^{j}$: $j$-th column of the matrix $\mathbf{M}$
$\mathbf{M}_{i,j}$: $(i,j)$-th entry of the matrix $\mathbf{M}$
$\mathbf{Z}$: Additive output noise
Others
$\mathbb{R}$: Set of real numbers
$\mathbb{C}$: Set of complex numbers
$\mathbb{F}$: Either $\mathbb{R}$ or $\mathbb{C}$
$\mathrm{i}$: Imaginary unit
$\mathrm{sign}(\cdot)$: Sign function
2.2 Graphical Model
Denote by $\mathcal{V} = \{1, \ldots, n\}$ the set of nodes and consider an undirected graph $G = (\mathcal{V}, \mathcal{E})$ (with no self-loops) whose edge set $\mathcal{E}$ contains the desired topology information. The degree of node $i$ is denoted by $d_i$. The connectivity between the nodes is unknown, and our goal is to determine it by learning the associated graph matrix using linear measurements.
Definition 2.1 (Graph Matrix).
Provided with an underlying graph , a symmetric matrix is called a graph matrix if the following conditions hold:
Remark 1.
Our theorems can be generalized to recover a broader class of symmetric matrices, as long as the matrix $\mathbf{Y}$ to be recovered satisfies: (1) knowing $\mathbf{Y}$ gives full knowledge of the topology of $G$; (2) the number of nonzero entries in a column of $\mathbf{Y}$ has the same order as the degree of the corresponding node, i.e., there is a positive constant $c$ such that the number of nonzeros in the $i$-th column is at most $c\,d_i$ for all $i$. For clarity of presentation, we consider specifically the case of graph matrices as in Definition 2.1.
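As a small illustration of Definition 2.1 and Remark 1, the sketch below builds one admissible graph matrix for a made-up weighted graph (using a Laplacian-style diagonal, which is just one allowed choice since the diagonal may be arbitrary) and confirms that the off-diagonal support reveals the edge set:

```python
import numpy as np

# Hypothetical weighted graph on 5 nodes, given as an edge -> weight map.
edges = {(0, 1): 1.5, (1, 2): 0.7, (1, 3): 2.0, (3, 4): 1.2}
n = 5

# Graph matrix: symmetric, Y[i, j] != 0 exactly when {i, j} is an edge.
# Here the diagonal holds the sum of incident weights (Laplacian-style),
# but Definition 2.1 allows any diagonal.
Y = np.zeros((n, n))
for (i, j), w in edges.items():
    Y[i, j] = Y[j, i] = -w
    Y[i, i] += w
    Y[j, j] += w

# Knowing Y gives full knowledge of the topology (Remark 1, condition (1)).
recovered = {(i, j) for i in range(n) for j in range(i + 1, n) if Y[i, j] != 0}
assert recovered == set(edges)
```

Note also that the number of nonzeros in each column is the degree of the corresponding node plus one, matching condition (2) of Remark 1.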
In this work, we employ a probabilistic model and assume that the graph $G$ is chosen randomly from a candidacy set $\mathcal{G}_n$ (of graphs with $n$ nodes) according to some distribution $\mu_n$. Neither the candidacy set nor the distribution is known to the estimator. For simplicity, we often omit the subscripts of $\mathcal{G}_n$ and $\mu_n$.
Example 2.1.
We exemplify some possible choices of the candidacy set and distribution:

(Mesh Network) When $G$ represents a transmission (mesh) power network and no prior information is available, the corresponding candidacy set $\mathcal{G}_n$ consists of all graphs with $n$ nodes, and $G$ is selected uniformly at random from $\mathcal{G}_n$. Moreover, $|\mathcal{G}_n| = 2^{\binom{n}{2}}$ in this case.

(Radial Network) When $G$ represents a distribution (radial) power network and no other prior information is available, the corresponding candidacy set is the set of all spanning trees of the complete graph with $n$ buses, and $G$ is selected uniformly at random from it; its cardinality, $n^{n-2}$, follows from Cayley's formula.

(Radial Network with Prior Information) When $G$ represents a distribution (radial) power network and we further know that some of the buses cannot be connected (which may be inferred from locational/geographical information), the corresponding candidacy set is the set of spanning trees of a subgraph with $n$ buses, where an edge is included if and only if it is not ruled out by the prior information. The size of this candidacy set is given by Kirchhoff's matrix tree theorem (cf. [45]); see Theorem 5.1.

(Erdős-Rényi Model) In a more general setting, $G$ can be a random graph chosen from an ensemble of graphs according to a certain distribution. When a graph is sampled according to the Erdős-Rényi model, each possible edge of $G$ is included IID with probability $p$. We denote the corresponding graph distribution by $\mu_{\mathrm{ER}}$ for convenience.
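The tree and Erdős-Rényi candidacy sets above can be sampled directly. The sketch below (function names are ours) draws a uniformly random labeled tree by decoding a random Prüfer sequence, which is exactly uniform because Cayley's $n^{n-2}$ labeled trees are in bijection with the $n^{n-2}$ Prüfer sequences, and draws an Erdős-Rényi graph by independent coin flips:

```python
import random

def uniform_random_tree(n, rng):
    """Sample a labeled tree on n nodes uniformly by decoding a random
    Prufer sequence (O(n^2) here, for simplicity)."""
    if n <= 2:
        return [(0, 1)] if n == 2 else []
    prufer = [rng.randrange(n) for _ in range(n - 2)]
    degree = [1] * n
    for v in prufer:
        degree[v] += 1
    edges = []
    for v in prufer:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges

def erdos_renyi(n, p, rng):
    """Include each of the C(n, 2) possible edges independently with prob. p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]

rng = random.Random(7)
tree = uniform_random_tree(8, rng)
assert len(tree) == 7  # a tree on n nodes always has n - 1 edges
```

Such samplers are useful later for empirically checking recovery schemes against the candidacy sets of Example 2.1.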
The next section is devoted to describing available measurements.
2.3 Linear System of Measurements
Suppose the measurements are sampled discretely and indexed by the elements of the set $\mathcal{M} = \{1, \ldots, m\}$. As a general framework, the measurements are collected in two matrices $\mathbf{A}$ and $\mathbf{B}$, defined as follows.
Definition 2.2 (Generator and Measurement Matrices).
Let $m$ be a positive integer denoting the number of measurements. The generator matrix $\mathbf{A}$ is an $m \times n$ random matrix and the measurement matrix $\mathbf{B}$ is an $m \times n$ matrix with entries in $\mathbb{F}$ that satisfy the linear system (1):
where $\mathbf{Y}$ is a graph matrix to be recovered, with an underlying graph $G$, and $\mathbf{Z}$ denotes the random additive noise. We call the recovery noiseless if $\mathbf{Z} = \mathbf{0}$. Our goal is to resolve the matrix $\mathbf{Y}$ based on the given matrices $\mathbf{A}$ and $\mathbf{B}$.
2.4 Applications to Electrical Grids
Various applications fall into the framework in (1). Here we present two examples of the graph identification problem in power systems. The measurements are modeled as time series data obtained via nodal sensors at each node, e.g., PMUs, smart switches, or smart meters.
2.4.1 Example 1: Nodal Current and Voltage Measurements
We assume the data is obtained from a short time interval over which the unknown parameters in the network are time-invariant. Let $\mathbf{Y}$ denote the nodal admittance matrix of the network, defined as

$$\mathbf{Y}_{j,k} = \begin{cases} -y_{j,k} & \text{if } j \ne k, \\ y_{j,j} + \sum_{l \ne j} y_{j,l} & \text{if } j = k, \end{cases} \qquad (2)$$

where $y_{j,k}$ is the admittance of the line $(j,k)$ and $y_{j,j}$ is the self-admittance of bus $j$. Note that if two buses $j$ and $k$ are not connected, then $y_{j,k} = 0$.
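A numeric sketch of the admittance matrix construction in (2), with made-up line admittances and self-admittance terms for a 4-bus example:

```python
import numpy as np

# Hypothetical line admittances y_jk for each line (j, k), plus assumed
# self-admittances y_kk at each bus.
lines = {(0, 1): 2 - 6j, (1, 2): 1 - 3j, (1, 3): 1 - 4j, (2, 3): 0.5 - 1.5j}
shunt = [0.1j, 0.1j, 0.1j, 0.1j]
n = 4

# Nodal admittance matrix per (2): Y[j, k] = -y_jk off the diagonal,
# Y[k, k] = y_kk + sum of admittances of lines incident to bus k.
Y = np.zeros((n, n), dtype=complex)
for (j, k), y in lines.items():
    Y[j, k] = Y[k, j] = -y
    Y[j, j] += y
    Y[k, k] += y
Y += np.diag(shunt)
```

If two buses are not connected, the corresponding off-diagonal entry stays zero, so the sparsity pattern of the matrix is exactly the line topology.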
The corresponding generator and measurement matrices are formed by simultaneously measuring both currents (or, equivalently, power injections) and voltages at each node at each time step. For each $t \in \mathcal{M}$, the nodal current injections are collected in an $n$-dimensional random vector $\mathbf{b}_t$. Concatenating these vectors, we obtain the measurement matrix $\mathbf{B}$. The generator matrix $\mathbf{A}$ is constructed analogously from the nodal voltage vectors $\mathbf{a}_t$. Each pair of measurement vectors from $\mathbf{A}$ and $\mathbf{B}$ must satisfy Kirchhoff's and Ohm's laws,

$$\mathbf{b}_t = \mathbf{Y}\mathbf{a}_t, \quad t \in \mathcal{M}. \qquad (3)$$

In matrix notation, (3) is equivalent to $\mathbf{B} = \mathbf{A}\mathbf{Y}$, which is a noiseless version of the linear system defined in (1).
Compared with obtaining only one of the current, power injection or voltage measurements (as in, e.g., [43, 42, 32]), collecting simultaneous current-voltage pairs doubles the amount of data to be acquired and stored. There are, however, benefits. First, exploiting the physical law relating voltage and current not only enables us to identify the topology of a power network but also to recover the parameters of the admittance matrix. Furthermore, dual-type measurements significantly reduce the sample complexity for topology recovery compared with the results for single-type measurements.
2.4.2 Example 2: Nodal Power Injections and Phase Angles
Similar to the previous example, at each time $t$, denote by $\mathbf{b}_t$ and $\mathbf{a}_t$ the active nodal power injections and the phases of the voltages at the nodes, respectively. The matrices $\mathbf{A}$ and $\mathbf{B}$ are constructed in a similar way by concatenating the vectors $\mathbf{a}_t$ and $\mathbf{b}_t$. The matrix representation of the DC power flow model can be expressed as a linear system $\mathbf{B} = \mathbf{A}\,(\mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^{\top})$, which belongs to the general class represented in (1). Here, the diagonal matrix $\boldsymbol{\Lambda}$ is the susceptance matrix whose $e$-th diagonal entry represents the susceptance of the $e$-th edge in $\mathcal{E}$, and $\mathbf{M}$ is the node-to-link incidence matrix of the graph. The vertex-edge incidence matrix (although the underlying network is treated as directed for this construction, when considering the fundamental limit for topology identification we still refer to the recovery of an undirected graph) is defined by $\mathbf{M}_{i,e} = 1$ if node $i$ is the head of edge $e$, $-1$ if it is the tail, and $0$ otherwise.
Note that $\mathbf{M}\boldsymbol{\Lambda}\mathbf{M}^{\top}$ specifies both the network topology and the susceptances of the power lines.
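The DC model above can be sketched numerically; the edge list, susceptances, and phase angles below are made up for illustration. The incidence-matrix product is the symmetric weighted Laplacian that the framework treats as the graph matrix:

```python
import numpy as np

# Hypothetical 4-node network with 3 lines and per-line susceptances.
edges = [(0, 1), (1, 2), (1, 3)]
b = np.array([5.0, 3.0, 4.0])
n = 4

# Node-to-link incidence matrix M: +1 at an (arbitrarily chosen) head,
# -1 at the tail of each edge; M diag(b) M^T is orientation-independent.
M = np.zeros((n, len(edges)))
for e, (i, j) in enumerate(edges):
    M[i, e], M[j, e] = 1.0, -1.0

L = M @ np.diag(b) @ M.T          # weighted Laplacian = graph matrix
theta = np.array([0.0, 0.1, -0.05, 0.2])
p = L @ theta                     # DC power flow: injections from angles
assert abs(p.sum()) < 1e-12       # injections balance (L has zero row sums)
```

Stacking many $(\boldsymbol{\theta}_t, \mathbf{p}_t)$ pairs as rows yields exactly a noiseless instance of (1) with the Laplacian as the unknown.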
2.5 Probability of Error as the Recovery Metric
We now define the error criteria considered in this paper. We refer to finding the edge set of $G$ from the matrices $\mathbf{A}$ and $\mathbf{B}$ as the topology identification problem, and to recovering the graph matrix $\mathbf{Y}$ from $\mathbf{A}$ and $\mathbf{B}$ as the parameter reconstruction problem.
Definition 2.3.
Let $f$ be a function or algorithm that returns an estimated graph matrix $\hat{\mathbf{Y}} = f(\mathbf{A}, \mathbf{B})$ given the inputs $\mathbf{A}$ and $\mathbf{B}$. The probability of error for topology identification is defined to be the probability that the estimated edge set $\hat{\mathcal{E}}$ is not equal to the correct edge set $\mathcal{E}$:
(4) 
where the probability is taken over the randomness in $G$, $\mathbf{A}$ and $\mathbf{Z}$. (In this exploratory work, we assume the measurements are noiseless and algorithms seek to recover each entry of the graph matrix exactly. When the measurements are noisy, Theorem 3.1 still provides general converse results as trade-offs between the number of measurements needed and the probability of error defined in (5).) The probability of error for noiseless parameter reconstruction is defined to be the probability that the estimate $\hat{\mathbf{Y}}$ is not equal to the original graph matrix $\mathbf{Y}$:
(5) 
where $\mathcal{Y}(G)$ is the set of all graph matrices that are consistent with the underlying graph $G$, and the probability is taken over the randomness in $G$ and $\mathbf{A}$.
Remark 2.
Note that for a fixed noiseless parameter reconstruction algorithm, the probability of error for parameter reconstruction is always greater than or equal to the probability of error for topology identification. We use the former, (5), as the error metric in this work and refer to it as the probability of error in the remainder of this paper.
3 Fundamental Tradeoffs
We discuss fundamental trade-offs for the parameter recovery problem defined in Sections 2.2 and 2.3. The converse result is summarized in Theorem 3.1 as an inequality involving the probability of error and the distributions of the underlying graph, the generator matrix and the noise. Next, in Section 3.2, we focus on a particular two-stage scheme and show in Theorem 3.2 that, under certain conditions, the probability of error is asymptotically zero (in $n$).
3.1 Converse
The following theorem states the fundamental limit.
Theorem 3.1 (Converse).
The probability of error for topology identification is bounded from below as
(6) 
where $h(\mathbf{A})$ and $h(\mathbf{Z})$ are the differential entropies (in base 2) of the random matrices $\mathbf{A}$ and $\mathbf{Z}$, and $H(\mu)$ is the entropy of the probability distribution $\mu$, respectively.
Proof.
The graph $G$ is chosen from a discrete set $\mathcal{G}_n$ according to some probability distribution $\mu$. As previously introduced, Fano's inequality [22] plays an important role in deriving fundamental limits. We especially focus on an extended version of it. Similar generalizations appear in many places, e.g., [4, 40] and [24].
Lemma 1 (Generalized Fano’s inequality).
Let $G$ be a random graph and let $\mathbf{A}$ and $\mathbf{B}$ be the measurement matrices defined in Sections 2.2 and 2.3. Suppose the original graph $G$ is selected from a nonempty candidacy set $\mathcal{G}_n$ according to a probability distribution $\mu$. Let $\hat{G}$ denote the estimated graph. Then the conditional probability of error for estimating $G$ from $\mathbf{B}$ given $\mathbf{A}$ is always bounded from below as
(7) 
where the randomness is over the selections of the original graph $G$ and the estimated graph $\hat{G}$.
In (7), the term $I(G; \mathbf{B} \mid \mathbf{A})$ denotes the conditional mutual information (base 2) between $G$ and $\mathbf{B}$ conditioned on $\mathbf{A}$, defined via the usual integral over the joint distribution. Furthermore, the conditional mutual information is bounded from above by the differential entropies of $\mathbf{B}$ and $\mathbf{Z}$. It follows that
(8)  
(9)  
(10)  
(11) 
Here, Eq. (8) follows from the definitions of mutual information and differential entropy. Moreover, knowing $\mathbf{Y}$, the graph $G$ can be inferred; this yields (9). Recalling the linear system in (1), we obtain (10). Furthermore, (11) holds by standard properties of differential entropy.
3.2 Achievability
In this subsection, we consider achievability for noiseless parameter reconstruction. The proofs rely on constructing a two-stage recovery scheme (Algorithm 1), which contains two steps: column retrieving and consistency checking. The worst-case running time of this scheme depends on the underlying distribution. (Although for certain distributions the computational complexity is not polynomial in $n$, the scheme still provides insights on the fundamental trade-offs between the number of samples and the probability of error for recovering graph matrices. Furthermore, motivated by the scheme, a polynomial-time heuristic algorithm is provided in Section 6, and experimental results are reported in Section 7.) The scheme is presented as follows.
3.2.1 Twostage Recovery Scheme
Retrieving columns
In the first stage, using $\ell_1$-norm minimization, we recover each column of $\mathbf{Y}$ based on (1) (with no noise):
(12)  
(13)  
(14) 
Let $\hat{\mathbf{y}}_j$ denote the $j$-th retrieved column, a length-$n$ vector. We do not restrict the methods for solving the $\ell_1$-norm minimization in (12)-(14), as long as there is a unique solution for sparse columns with fewer nonzeros than the threshold parameter defined in Definition 3.1 below.
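As one possible solver for this column-retrieving stage (the scheme does not prescribe one), the $\ell_1$-norm minimization can be cast as a linear program and handed to SciPy; the sizes and the 2-sparse test column below are assumptions for illustration:

```python
import numpy as np
from scipy.optimize import linprog

def l1_min_column(A, b):
    """Basis pursuit min ||y||_1 s.t. A y = b, posed as an LP over (y, t)
    with -t <= y <= t and objective sum(t)."""
    m, n = A.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    A_ub = np.block([[np.eye(n), -np.eye(n)],     #  y - t <= 0
                     [-np.eye(n), -np.eye(n)]])   # -y - t <= 0
    A_eq = np.hstack([A, np.zeros((m, n))])       #  A y = b
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(2 * n), A_eq=A_eq, b_eq=b,
                  bounds=[(None, None)] * n + [(0, None)] * n, method="highs")
    return res.x[:n]

rng = np.random.default_rng(1)
n, m = 10, 6
y_true = np.zeros(n)
y_true[2], y_true[7] = 1.0, -2.0        # a 2-sparse column of the graph matrix
A = rng.standard_normal((m, n))         # Gaussian generator matrix
y_hat = l1_min_column(A, A @ y_true)
# With enough Gaussian measurements, y_hat equals y_true with high probability.
```

Any basis-pursuit or greedy solver with a uniqueness guarantee for sufficiently sparse columns can be substituted here.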
Checking consistency
In the second stage, we check for errors in the decoded columns using the symmetry of the graph matrix $\mathbf{Y}$. Specifically, we fix a subset $T \subseteq \{1, \ldots, n\}$ of a given size. Then we check whether $\hat{Y}_{i,j} = \hat{Y}_{j,i}$ for all $i, j \in T$. If not, we choose a different subset of the same size. This procedure repeats until either we find such a subset of columns, or we have gone through all possible subsets without finding one. In the latter case, an error is declared and the recovery is unsuccessful. In the former case, we accept the columns indexed by $T$ as correct, take their entries as correct, and use them to compute the remaining entries of $\mathbf{Y}$ via the linear equations in (13):
(15)
We combine the accepted and computed entries to obtain a new estimate for each remaining column. Together with the accepted columns indexed by $T$, they form the estimated graph matrix $\hat{\mathbf{Y}}$.
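A minimal sketch of the consistency-checking stage: search over index subsets of a given size for a mutually symmetric block among the retrieved columns. The toy matrix (with one deliberately corrupted column) and the tolerance are assumptions; filling in the remaining entries via (15) is omitted.

```python
import itertools
import numpy as np

def find_consistent_subset(Y_hat, t, tol=1e-8):
    """Return the first subset T of t column indices whose retrieved columns
    are mutually symmetric (Y_hat[i, j] == Y_hat[j, i] for i, j in T),
    or None if every subset fails (the scheme then declares an error)."""
    n = Y_hat.shape[0]
    for T in itertools.combinations(range(n), t):
        block = Y_hat[np.ix_(T, T)]
        if np.allclose(block, block.T, atol=tol):
            return T
    return None

# Toy retrieved matrix: columns 0-3 were decoded correctly; column 4 was
# corrupted during column retrieval, so it breaks symmetry.
Y_hat = np.array([[2., -1., 0., 0., 9.],
                  [-1., 2., -1., 0., 9.],
                  [0., -1., 2., -1., 9.],
                  [0., 0., -1., 2., 9.],
                  [0., 0., 0., -1., 2.]])
assert find_consistent_subset(Y_hat, 4) == (0, 1, 2, 3)
assert find_consistent_subset(Y_hat, 5) is None
```

The exhaustive subset search is what makes the worst-case running time distribution-dependent, motivating the polynomial-time heuristic of Section 6.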
3.2.2 sparse Distribution
We now analyze the sample complexity of the two-stage scheme. Let $d_i$ denote the degree of node $i$ in $G$. Denote by $\mathcal{H}$ the set of nodes having degrees greater than a threshold parameter:
(16)
Making use of (16), we define the following set of graphs, with a counting parameter:
The following definition characterizes graph distributions.
Definition 3.1 (Sparse Distribution).
A graph distribution $\mu$ defined on $\mathcal{G}_n$ is said to be sparse if, for $G$ drawn according to $\mu$,
(17) 
The following lemmas provide examples of sparse distributions. Denote by $\mu_{\mathrm{tree}}$ the uniform distribution on the set of all trees with $n$ nodes.
Lemma 2.
For suitable choices of the parameters in Definition 3.1, the distribution $\mu_{\mathrm{tree}}$ is sparse.
Denote by $\mu_{\mathrm{ER}}$ the graph distribution for the Erdős-Rényi model.
Lemma 3.
For suitable choices of the parameters in Definition 3.1, the distribution $\mu_{\mathrm{ER}}$ is sparse.
3.2.3 Analysis of the Scheme
We now present another of our main theorems, which makes use of the restricted isometry property (cf. [13, 39]). Given a generator matrix $\mathbf{A}$, the restricted isometry constant of order $s$, denoted by $\delta_s$, is the smallest positive number such that

$$(1 - \delta_s)\|\mathbf{x}\|_2^2 \le \|\mathbf{A}_{\mathcal{S}}\,\mathbf{x}\|_2^2 \le (1 + \delta_s)\|\mathbf{x}\|_2^2$$

for all subsets $\mathcal{S}$ of size $|\mathcal{S}| \le s$ and all vectors $\mathbf{x}$.
Denote by $\mathrm{spark}(\mathbf{A})$ the smallest number of columns of $\mathbf{A}$ that are linearly dependent (see [21] for the requirements on the spark of the generator matrix needed to guarantee desired recovery criteria). Consider the models defined in Sections 2.2 and 2.3.
Theorem 3.2 (Achievability).
Suppose the generator matrix $\mathbf{A}$ has restricted isometry constants satisfying the standard compressed sensing recovery condition and, furthermore, that its spark is sufficiently large. If the distribution $\mu$ is sparse, then the probability of error for the two-stage scheme to recover a graph matrix of $G$ vanishes asymptotically.
Proof.
First, the theory of compressed sensing (see [13, 39]) implies that if the generator matrix has restricted isometry constants satisfying the recovery condition, then all sufficiently sparse columns are correctly recovered using the $\ell_1$ minimization in (12)-(14). It remains to show that the consistency check in our scheme works, which is summarized in the following lemma.
Lemma 4 (Consistencycheck).
Suppose the matrix $\mathbf{A}$ has restricted isometry constants satisfying the recovery condition and that its spark is sufficiently large. If a collection of columns passes the consistency check, i.e., $\hat{Y}_{i,j} = \hat{Y}_{j,i}$ for all $i, j$ in the chosen subset, then these columns are correctly decoded and, together with (15), the two-stage scheme always returns the original (correct) graph matrix.
4 Gaussian IID Measurements
In this section, we consider the special regime in which the entries of the generator matrix $\mathbf{A}$ are Gaussian IID random variables. Utilizing the converse in Theorem 3.1 and the achievability in Theorem 3.2, the Gaussian IID assumption allows the derivation of explicit expressions for the sample complexity as upper and lower bounds on the number of measurements $m$. Combining with the results in Lemmas 2 and 3, we show that the corresponding lower and upper bounds match each other for the graph distributions $\mu_{\mathrm{tree}}$ and $\mu_{\mathrm{ER}}$ (under certain conditions on the parameters).
For convenience of presentation, in the remainder of the paper we restrict the measurements to be real-valued, although the theorems can be generalized to complex measurements. In realistic scenarios, for instance a power network, besides the measurements collected from the nodes, nominal state values, e.g., operating currents and voltages, are known to the system designer a priori. Representing the nominal values at the nodes by fixed vectors, the measurements in $\mathbf{A}$ and $\mathbf{B}$ are centered around rank-one nominal matrices $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ whose rows repeat those nominal values.
The rows of $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ are identical because the graph parameters are time-invariant, and so are the nominal values. Without system fluctuations and noise, the nominal values satisfy the linear system in (1), i.e.,
(18) 
Knowing $\bar{\mathbf{A}}$ and $\bar{\mathbf{B}}$ is not sufficient to infer the network parameters (the entries of the graph matrix $\mathbf{Y}$), since the rank of $\bar{\mathbf{A}}$ is one. However, measurement fluctuations can be used to facilitate the recovery of $\mathbf{Y}$. The deviations from the nominal values are captured by additive perturbation matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ such that $\mathbf{A} = \bar{\mathbf{A}} + \tilde{\mathbf{A}}$ and $\mathbf{B} = \bar{\mathbf{B}} + \tilde{\mathbf{B}}$. Putting (3) and (18) together, these equations imply $\tilde{\mathbf{B}} = \tilde{\mathbf{A}}\mathbf{Y}$, a linear system in the extracted perturbation matrices. We specifically consider the case when the additive perturbation $\tilde{\mathbf{A}}$ is a matrix with Gaussian IID entries. Without loss of generality, we suppose the mean is zero and the variance is one. For simplicity, in the remainder of this paper we slightly abuse notation and replace the perturbation matrices $\tilde{\mathbf{A}}$ and $\tilde{\mathbf{B}}$ by $\mathbf{A}$ and $\mathbf{B}$ (with $\mathbf{A}$ Gaussian IID) when the context is clear. Moreover, throughout this section, we focus on the case when the measurements are noiseless. The next theorem implies that Gaussian IID random variables are not arbitrary selections: they are the most "informative" measurements in the sense that, among measurement vectors with fixed mean and covariance, the normal distribution achieves maximal entropy.
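The reduction from raw measurements centered at a rank-one nominal operating point to a Gaussian IID perturbation system can be simulated as follows. The graph matrix and sizes are made up, and since the number of samples here exceeds the number of nodes, ordinary least squares stands in for the recovery step:

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 5, 40

# Hypothetical symmetric graph matrix (diagonal plus two edges).
Y = np.diag(np.arange(1.0, n + 1))
Y[0, 1] = Y[1, 0] = -0.5
Y[2, 4] = Y[4, 2] = -1.0

v_nom = rng.standard_normal(n)   # nominal voltages (one operating point)
i_nom = Y @ v_nom                # nominal currents, satisfying (18)

# Raw measurements: each row repeats the nominal values plus IID Gaussian
# fluctuations; the rank-one nominal part alone cannot identify Y.
dA = rng.standard_normal((m, n))
A = np.tile(v_nom, (m, 1)) + dA
B = A @ Y                        # noiseless currents

# Subtracting the nominal point leaves the perturbation system dA Y = dB,
# whose generator matrix is Gaussian IID as assumed in this section.
dB = B - np.tile(i_nom, (m, 1))
Y_hat = np.linalg.lstsq(dA, dB, rcond=None)[0]
assert np.allclose(Y_hat, Y, atol=1e-6)
```

The interesting regime studied below is instead the sparse one, where far fewer than $n$ rows suffice.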
Theorem 4.1.
Suppose the rows of the generator matrix $\mathbf{A}$ are identically distributed random vectors with zero mean and a fixed covariance. The probability of error for noiseless recovery is bounded from below as
(19)
where $H(\mu)$ is the entropy (in base 2) of the graph distribution $\mu$.
Remark 3.
It can be inferred from the theorem that, when the graph is a mesh network chosen uniformly at random from $\mathcal{G}_n$ (see Example 2.1 (a)), the number of samples must be at least linear in $n$, the size of the graph, to ensure a small probability of error. On the other hand, as corollaries, under the assumption of Gaussian IID measurements, corresponding minimum numbers of measurements are necessary to keep the probability of error small if the graph is chosen uniformly at random as in Example 2.1 (b), or sampled according to the distribution in Example 2.1 (c), respectively. The theorem can be generalized to complex measurements by adding additional multiplicative constants.
Proof.
4.1 Sample Complexity for Sparse Distributions
We consider the worst-case sample complexity for recovering graphs generated according to a sequence of sparse distributions, defined similarly to Definition 3.1 to characterize the asymptotic behavior of graph distributions.
Definition 4.1 (Sequence of Sparse Distributions).
A sequence of graph distributions $\{\mu_n\}$ is said to be sparse if, for a sequence of graphs $G_n$ chosen according to $\mu_n$, the associated parameter sequences guarantee that
(21) 
In the remainder, we sometimes drop the subscripts of these sequences for simplicity. Based on the sequence of sparse distributions defined above, we show the following theorem, which provides upper and lower bounds on the worst-case sample complexity with Gaussian IID measurements.
Theorem 4.2 (Worstcase Sample Complexity).
Suppose that the generator matrix has Gaussian IID entries with mean zero and variance one and assume the sequences and satisfy and for all . For any sequence of distributions that is sparse, the two-stage scheme guarantees that using measurements. Conversely, there exists a sparse sequence of distributions such that the number of measurements must satisfy to make the probability of error less than for all .
Remark 4.
The upper bound on that we are able to show differs from the lower bound by a sublinear term . In particular, when the term dominates , the lower and upper bounds become tight up to a multiplicative factor.
Proof.
The first part is based on Theorem 3.2. Under the assumption on the generator matrix, using Gordon's escape-through-the-mesh theorem, a theorem in [39] implies that sufficiently sparse columns are correctly recovered using the $\ell_1$ minimization in (12)-(14) with high probability, provided the number of measurements is large enough (if the sparsity grows further, the multiplicative constant increases but our theorem still holds). Similar results were first proved by Candès et al. in [13]. Therefore, applying the union bound, the probability that all of the sparse columns are recovered simultaneously remains correspondingly high. On the other hand, conditioned on all the sparse columns being recovered, Theorem 3.2 shows that the stated number of measurements is sufficient for the two-stage scheme to succeed. Since each entry of $\mathbf{A}$ is an IID Gaussian random variable with zero mean and variance one, with probability one the spark of $\mathbf{A}$ exceeds the required threshold, verifying the statement.
The converse follows directly from Theorem 4.1. Consider the uniform distribution on . Then . Let be parameters such that . To bound the size of , we partition into and with and . First, we assume that the nodes in form a regular graph. For each node in , construct edges and connect them to the other nodes in with uniform probability. A graph constructed in this way always belongs to , unless the added edges create more than nodes with degrees larger than . Therefore, as ,
(22) 
where , and . The first term denotes the fraction of the constructed graphs that are in . The second term in (22) counts the total number of regular graphs [33], and the last term is the total number of graphs created by adding new edges for the nodes in . If , there exists a constant small enough such that . If , for any fixed node in , the probability that its degree is larger than is
where is in base . Take and . The condition guarantees that . Letting be the assignment function for each node in , we check that
Therefore, applying the Lovász local lemma, the probability that all the nodes in  have degree less than or equal to  can be bounded from below by  if , which in turn is a lower bound on . Therefore, taking the logarithm,
(23)  
(24) 
where . In (23), we have used Stirling’s approximation and the assumption that . Continuing from (24), since , for sufficiently large ,
(25) 
Substituting (25) into (19) and noting that , when , it must hold that
to ensure that is smaller than . ∎
4.1.1 Uniform Sampling of Trees
As one of the applications of Theorem 4.2, we characterize the sample complexity of the uniform sampling of trees.
Corollary 4.1.
Suppose that the generator matrix has Gaussian IID entries with mean zero and variance one and assume . There exists an algorithm that guarantees using measurements. Conversely, the number of measurements must satisfy to make the probability of error less than .
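To make the sampled ensemble concrete, a uniformly random spanning tree of the complete graph can be generated with Wilson's algorithm (loop-erased random walks). This sketch is our own illustration of the measurement model's input distribution, not part of the recovery scheme:

```python
import random

def wilson_uniform_spanning_tree(n, seed=0):
    """Sample a uniformly random spanning tree of the complete graph K_n
    via Wilson's algorithm (loop-erased random walks)."""
    rng = random.Random(seed)
    in_tree = {0}                  # root the tree at node 0
    parent = {}
    for start in range(1, n):
        if start in in_tree:
            continue
        u = start
        while u not in in_tree:    # random walk until the current tree is hit
            parent[u] = rng.randrange(n - 1)
            if parent[u] >= u:     # pick a uniform neighbor != u
                parent[u] += 1
            u = parent[u]
        u = start                  # retrace, keeping only the loop-erased path
        while u not in in_tree:
            in_tree.add(u)
            u = parent[u]
    return parent                  # n - 1 edges (u, parent[u])

tree = wilson_uniform_spanning_tree(8)
print(len(tree))                   # a spanning tree of K_8 has 7 edges
```

Each run returns one of the possible labeled spanning trees with equal probability, which is exactly the uniform distribution the corollary's sample-complexity bounds refer to.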
4.1.2 Erdős–Rényi Model
Similarly, the following corollary is shown by recalling Lemma 3.
Corollary 4.2.
Assume with . Under the same conditions in Corollary 4.1, there exists an algorithm that guarantees using measurements. Conversely, the number of measurements must satisfy to make the probability of error less than .
Sketch of Proof: Taking  and , we check that  and . The assumptions on  guarantee that , whence . The choice of  and  ensures that the sequence of distributions is sparse. Theorem 4.2 implies that  is sufficient for achieving a vanishing probability of error. For the second part of the corollary, substituting  into (19) yields the desired result.∎
5 Structurebased Parameter Recovery
Often in practice, some prior information about the graph topology is available. For example, in a power system, besides knowing that the transmission network is a radial network, we may additionally know from locational and geographical information or past records that some of the nodes in  are not connected through a power line; then the size of the candidacy set becomes smaller, allowing a potential improvement in sample complexity. Applying Kirchhoff's theorem (cf. [45]) stated below, our results extend to such practical situations.
Theorem 5.1 (Kirchhoff’s Theorem).
Let $G$ be a connected graph with $n$ labeled nodes. Then the number of spanning trees of $G$, denoted by $t(G)$, is given by the product of $\frac{1}{n}$ and all nonzero eigenvalues $\lambda_1, \ldots, \lambda_{n-1}$ of the (unnormalized) Laplacian matrix $L$ of $G$:
$$t(G) = \frac{1}{n}\prod_{i=1}^{n-1} \lambda_i = \det\bigl(\widetilde{L}\bigr), \qquad (26)$$
where $\widetilde{L}$ denotes the reduced Laplacian of $G$ (a cofactor), obtained by deleting the first row and column from the Laplacian matrix $L$.
Therefore, if we know a priori that the topology to be recovered is a spanning tree lying in some known underlying graph , then the size of the candidacy set is given by . Let  denote the uniform distribution on . As a remark, when we have no additional information about the underlying graph and only know that  is a spanning tree,  becomes the complete graph with  nodes and . The following corollary is obtained as a direct application.
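As a quick numerical check of Kirchhoff's theorem, the spanning-tree count of a small graph can be computed as the determinant of the reduced Laplacian. This is our own sketch (the helper name is illustrative):

```python
import numpy as np

def spanning_tree_count(adj):
    """Count spanning trees via Kirchhoff's theorem: the determinant
    of the reduced Laplacian (first row and column deleted)."""
    A = np.asarray(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A           # unnormalized Laplacian
    return round(np.linalg.det(L[1:, 1:]))   # any cofactor of L

# Complete graph K_4: Cayley's formula gives 4^(4-2) = 16 spanning trees.
K4 = 1 - np.eye(4)
print(spanning_tree_count(K4))               # 16
```

The determinant agrees with Cayley's formula for the complete graph, and for a sparser underlying graph it directly gives the (smaller) candidacy-set size used above.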
Corollary 5.1.
Under the same assumption of Corollary 4.1, if , then the number of measurements must satisfy
to make the probability of error less than . Here, denote the nonzero eigenvalues of the Laplacian matrix of .
Sketch of Proof: The proof follows along the same lines as those of Corollaries 4.1 and 4.2. Substituting  into (14) gives the bound.∎
The next achievability result follows straightforwardly by noting that the number of unknown entries in the th column of the graph matrix  is at most .
Corollary 5.2.
Under the same assumption of Corollary 4.1, if , then the following upper bound on the number of measurements is sufficient to achieve a vanishing probability of error :
Here,  denotes the th diagonal entry of the (unnormalized) Laplacian matrix of .
6 Heuristic Algorithm
We present in this section an algorithm motivated by the consistency-checking step in the proof of achievability (see Section 3.2). Instead of checking the consistency of each subset of  consisting of  nodes, as the two-stage scheme does and which requires  operations, we compute an estimate for each column of the graph matrix independently and then assign a score to each column based on its symmetric consistency with respect to the other columns in the matrix. The lower the score, the closer the estimated matrix column is to the ground truth . Using a scoring function, we rank the columns, select a subset of them as "correct", and then eliminate this subset from the system. The size of the subset determines the number of iterations. Heuristically, this procedure yields a polynomial-time algorithm for computing an estimate of the graph matrix .
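The scoring idea can be sketched as follows. This is an illustrative toy of our own (the function names and the specific score are assumptions, not the paper's exact scoring function): since the graph matrix is symmetric, a column estimate that disagrees with the corresponding row of the other estimates is likely wrong.

```python
import numpy as np

def symmetry_scores(X_hat):
    """Score each column estimate by its symmetric inconsistency:
    for a symmetric graph matrix, column i should match row i."""
    return np.linalg.norm(X_hat - X_hat.T, axis=0)

def select_columns(X_hat, l):
    """Return the indices of the l lowest-scoring columns, i.e. those
    most consistent with the rest of the estimate."""
    return np.argsort(symmetry_scores(X_hat))[:l]

# A symmetric ground truth with one corrupted column estimate.
rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5))
X = M + M.T                                  # symmetric ground truth
X_hat = X.copy()
X_hat[:, 4] += rng.standard_normal(5)        # column 4 is poorly estimated
print(4 in select_columns(X_hat, 3))         # False: the bad column is ranked last
```

Fixing the selected columns and eliminating them from the linear system shrinks the problem at each iteration, as described in the initialization step below.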
The algorithm proceeds in four steps.
6.0.1 Step 1. Initialization
Let matrices and be given and set the number of columns fixed in each iteration to be an integer such that . For the first iteration, set , , and .
For each iteration , we perform the remaining three stages. The system dimension is reduced by after each iteration.
6.0.2 Step 2. Independent minimization
For all , we solve the following minimization: