# Learning Graphs from Smooth and Graph-Stationary Signals with Hidden Variables

Network-topology inference from (vertex) signal observations is a prominent problem across data-science and engineering disciplines. Most existing schemes assume that observations from all nodes are available, but in many practical environments, only a subset of nodes is accessible. A natural (and sometimes effective) approach is to disregard the role of unobserved nodes, but this ignores latent network effects, deteriorating the quality of the estimated graph. Differently, this paper investigates the problem of inferring the topology of a network from nodal observations while taking into account the presence of hidden (latent) variables. Our schemes assume the number of observed nodes is considerably larger than the number of hidden variables and build on recent graph signal processing models to relate the signals and the underlying graph. Specifically, we go beyond classical correlation and partial correlation approaches and assume that the signals are smooth and/or stationary in the sought graph. The assumptions are codified into different constrained optimization problems, with the presence of hidden variables being explicitly taken into account. Since the resulting problems are ill-conditioned and non-convex, the block matrix structure of the proposed formulations is leveraged and suitable convex-regularized relaxations are presented. Numerical experiments over synthetic and real-world datasets showcase the performance of the developed methods and compare them with existing alternatives.


## I Introduction

Recent years have witnessed the rise of problems involving datasets with non-Euclidean support. A popular approach to deal with this type of data consists in exploiting graphs to generalize a wide range of classical information-processing techniques to those irregular domains. This graph-based perspective has been successfully applied to a number of applications (with power, communications, social, geographical, genetics, and brain networks being notable examples [19, 42, 44, 30, 33]) and has attracted the attention of researchers from different areas, including statistics, machine learning, and signal processing (SP). For the latter, graph SP (GSP) has generalized a number of tools originally conceived to process signals with regular support (time or space) to signals defined on heterogeneous domains represented by a graph, providing new insights and efficient algorithms [42, 34, 35, 10, 22]. The core assumption of GSP is that the properties of the graph signals can be explained by the influence of the network, whose topology is codified in the so-called graph-shift operator (GSO), a square matrix whose non-zero entries identify the edges of the graph.

Although networks may exist as physical entities, oftentimes they are abstract mathematical representations with nodes describing variables and links describing pairwise relationships between them. More importantly for the paper at hand, such relationships may not be always known a priori. In the scenarios where the graph is unknown, it is possible to learn the graph from a set of nodal observations under the fundamental assumption that there exists a relationship between the properties of the observed signals and the topology of the sought graph. The described task represents a prominent problem commonly referred to as network topology inference, which is also known as graph learning [17, 31, 38, 37, 24, 36]. Noteworthy approaches include correlation networks [19], partial correlations and (Gaussian) Markov random fields [27, 13, 19, 20], sparse structural equation models [6, 2], GSP-based approaches [25, 11, 31, 38, 37], as well as their non-linear generalizations [18, 41], to name a few.

The standard network-inference approach in the aforementioned works is to assume that observations from all the nodes of the graph are available. In certain environments, however, only observations from a subset of nodes are available, with the remaining nodes being unobserved or hidden. The existence of hidden (latent) nodes constitutes a relevant and challenging problem, since closely related values at two observed nodes may be explained not only by an edge between the two nodes but also by a third latent node connected to both of them. Moreover, because there are no observations from the hidden nodes, modeling their influence renders the network-inference problem substantially more challenging and ill-posed. Except for direct pairwise methods, which can be trivially generalized to the setup at hand, most existing approaches require important modifications to deal with hidden nodes. Network-inference works that have looked at the problem of hidden variables include examples in the context of Gaussian graphical model selection [8, 48], inference of linear Bayesian networks [1], nonlinear regression [26], and brain connectivity [9], to name a few. Nonetheless, there are still a number of effective network-inference methods (including most in the context of GSP) that have not considered the presence of latent unobserved nodes.

Motivated by the previous discussion, in this paper we approach the problem of network topology inference with hidden variables by leveraging two fundamental concepts of the GSP framework: smoothness [42] and stationarity [23, 32]. A signal being smooth on a graph implies that the signal values at two neighboring nodes are close, so that the signal varies slowly across the graph. This fairly general assumption has been successfully exploited to infer the topology of the graph when values from all nodes are observed [11, 16, 47]. From a different perspective, assuming that a random process is stationary on a graph is tantamount to assuming that the covariance matrix of the random process is a polynomial of the GSO, which has been leveraged in the context of network inference to develop new algorithms and establish important links between graph stationarity and classical correlation and partial-correlation approaches [38, 40, 29]. Although the assumptions of smoothness and stationarity have been successfully adopted in the context of the network-topology inference problem, a formulation robust to the presence of hidden variables is still missing. To fill this gap, this paper builds on our previous work and investigates how the presence of hidden variables impacts the classical definitions of graph smoothness and stationarity. It then formulates the network-recovery problem as a constrained optimization that explicitly accounts for the modified definitions. Key to our formulation are the consideration of a block matrix factorization approach and the exploitation of the low rankness and sparsity patterns present in the blocks related to hidden variables. A range of formulations is presented, and suitable (convex and non-convex) relaxations to deal with the sparsity and low-rank terms are considered. While our focus is on learning the connections among observed nodes, some of our approaches also reveal information related to links involving hidden nodes; a further investigation of this matter is left as future work.

To summarize, our contributions are as follows: (i) we analyze the influence of hidden variables on graph smoothness and graph stationarity; (ii) we propose several optimization problems to solve the topology inference problem with hidden variables when the observed signals are smooth, stationary, or both; and (iii) we present an extensive evaluation of the proposed models through both synthetic and real experiments.

The remainder of the paper is organized as follows. Section II introduces basic GSP concepts leveraged during the paper. Section III formalizes the problem at hand. Sections IV and V respectively detail the proposed topology inference algorithms for smooth and stationary signals, with Section VI combining both assumptions and considering that the signals are both smooth and stationary. The numerical evaluation of the proposed methods is presented in Section VII, and Section VIII provides some concluding remarks.

## II Fundamentals of graph signal processing

In this section, we introduce basic GSP concepts that help to explain the relationship between the observed signals and the topology of the underlying graph.

GSO and graph signals. Let $\mathcal{G}=(\mathcal{V},\mathcal{E})$ be an undirected and weighted graph with $N$ nodes, where $\mathcal{V}$ and $\mathcal{E}$ represent the vertex and edge set, respectively. The weighted adjacency matrix $A\in\mathbb{R}^{N\times N}$ is a sparse matrix encoding the topology of the graph $\mathcal{G}$, with $A_{ij}$ capturing the weight of the edge between nodes $i$ and $j$, and with $A_{ij}=0$ if $i$ and $j$ are not connected. A general representation of the graph is the GSO $S\in\mathbb{R}^{N\times N}$, where $S_{ij}\neq 0$ if and only if $i=j$ or $(i,j)\in\mathcal{E}$. Typical choices for the GSO are the adjacency matrix $A$ [34], the combinatorial graph Laplacian $L:=\mathrm{diag}(A\mathbf{1})-A$ [42], and their degree-normalized variants. Since the graph is undirected, the GSO is symmetric and can be diagonalized as $S=V\Lambda V^\top$, where the orthogonal matrix $V$ collects the eigenvectors of the GSO and the diagonal matrix $\Lambda$ its eigenvalues. A graph signal can be denoted as a vector $x\in\mathbb{R}^N$, where $x_i$ represents the signal value observed at node $i$. A common tool to model the relationship between the signal and its underlying graph are graph filters. A graph filter $H$ is a linear operator defined as a polynomial of the GSO of the form

$$H=\sum_{l=0}^{L-1}h_lS^l=V\Big(\sum_{l=0}^{L-1}h_l\Lambda^l\Big)V^\top=V\,\mathrm{diag}(\tilde{h})\,V^\top, \qquad (1)$$

where $L-1$ is the filter degree, $\{h_l\}_{l=0}^{L-1}$ represent the filter coefficients, and $\tilde{h}\in\mathbb{R}^N$ denotes the frequency response of the graph filter. Since $H$ is a polynomial of $S$, it readily follows that both matrices have the same eigenvectors.
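The fact that a graph filter commutes with its GSO (equivalently, shares its eigenvectors) can be checked numerically. The following sketch, assuming numpy, uses an arbitrary 4-node cycle graph and arbitrary filter coefficients, neither of which is taken from the paper:

```python
import numpy as np

# Adjacency matrix of a 4-node cycle graph, used here as the GSO S
# (an arbitrary toy choice; the Laplacian would work equally well).
S = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

# Arbitrary filter coefficients h_0, h_1, h_2 of a degree-2 filter.
h = [1.0, 0.5, 0.25]

# H = sum_l h_l S^l, computed directly from the definition in (1).
H = sum(hl * np.linalg.matrix_power(S, l) for l, hl in enumerate(h))

# Since H is a polynomial of the symmetric matrix S, it is symmetric
# and commutes with S (equivalently, both share eigenvectors).
commutator = H @ S - S @ H
```

Here `commutator` is numerically zero, illustrating the eigenvector-sharing property used throughout the paper.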

Graph stationarity. A random graph signal $x$ is stationary on the graph $\mathcal{G}$ if it can be represented as the output of a graph filter $H$ with a zero-mean white signal $w$ as input, i.e., $x=Hw$ with $\mathbb{E}[w]=\mathbf{0}$ and $\mathbb{E}[ww^\top]=I$. In turn, if $x$ is stationary, then its covariance is given by

$$C=\mathbb{E}[xx^\top]=H\,\mathbb{E}[ww^\top]\,H^\top=HH^\top=H^2, \qquad (2)$$

where the last equality follows from $H$ being symmetric. In the spectral domain, it can be seen from (2) that the GSO and the covariance matrix share the same eigenvectors [23, 32, 14]. Therefore, graph stationarity implies that the matrices $C$ and $S$ commute, i.e., $CS=SC$, which is a relevant property to be exploited later on.
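The implication "stationary signal ⇒ covariance commutes with the GSO" can be illustrated with a small numerical example (a sketch assuming numpy; the graph and filter coefficients are arbitrary toy choices):

```python
import numpy as np

# GSO of a toy graph: adjacency matrix of a 4-node cycle.
S = np.array([[0., 1., 0., 1.],
              [1., 0., 1., 0.],
              [0., 1., 0., 1.],
              [1., 0., 1., 0.]])

# A graph filter H defined as a polynomial of S (arbitrary coefficients).
H = 0.8 * np.eye(4) + 0.4 * S + 0.1 * S @ S

# For x = H w with w zero-mean white (E[ww^T] = I), equation (2) gives
# C = H H^T = H^2, since H is symmetric.
C = H @ H.T

# Stationarity implies that C and S commute: CS = SC.
commutator = C @ S - S @ C
```

Because `C` is itself a polynomial of `S`, the commutator is numerically zero, which is exactly the property exploited in Section V.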

Graph smoothness. A graph signal is considered smooth on a graph if the signal values at two connected nodes are “close”, or equivalently, if the difference between the signal values at neighboring nodes is small. A common approach to quantify the smoothness of a graph signal $x$ is by means of the quadratic form [17]

$$\sum_{(i,j)\in\mathcal{E}}A_{ij}(x_i-x_j)^2=x^\top Lx, \qquad (3)$$

which quantifies how much the signal changes with respect to the notion of similarity encoded in the weights of $A$. This measure will be referred to as the “local variation” (LV) of $x$. Note that, if the goal is to obtain the mean LV of the $M$ graph signals collected in the matrix $X=[x_1,\ldots,x_M]\in\mathbb{R}^{N\times M}$, this can be achieved by computing

$$\frac{1}{M}\sum_{m=1}^{M}x_m^\top Lx_m=\frac{1}{M}\sum_{m=1}^{M}\mathrm{tr}(x_mx_m^\top L)=\mathrm{tr}(\hat{C}L), \qquad (4)$$

where $\hat{C}:=\frac{1}{M}XX^\top$ denotes the sample estimate of the covariance of $x$.
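The identity in (4), i.e., that the mean of the quadratic forms equals $\mathrm{tr}(\hat{C}L)$, is easy to verify numerically (a sketch assuming numpy; the weighted 3-node graph and the random signals are arbitrary):

```python
import numpy as np

# Weighted adjacency and combinatorial Laplacian of a 3-node toy graph.
A = np.array([[0., 2., 0.],
              [2., 0., 1.],
              [0., 1., 0.]])
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
M = 50
X = rng.standard_normal((3, M))      # M random graph signals (columns)

# Mean local variation computed signal by signal via the quadratic form (3).
lv_mean = np.mean([x @ L @ x for x in X.T])

# Same quantity via the sample covariance, as in (4).
C_hat = X @ X.T / M
lv_trace = np.trace(C_hat @ L)
```

Both quantities agree, which is why the topology-inference formulations below can be written directly in terms of the sample covariance rather than the raw signals.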

## III Influence of hidden variables in the topology inference model

The current section is devoted to formally posing the topology-inference problem when only observations from a subset of nodes of the graph are available. We present a general formulation and highlight the influence of the hidden variables.

Denote as $X=[x_1,\ldots,x_M]\in\mathbb{R}^{N\times M}$ the collection of $M$ signals defined on top of the unknown graph $\mathcal{G}$ with $N$ nodes. Then, we consider that we only observe the values of $X$ at a subset of nodes $\mathcal{O}\subset\mathcal{V}$ with cardinality $O$. In contrast, the values corresponding to the remaining $H=N-O$ nodes in the subset $\mathcal{H}=\mathcal{V}\setminus\mathcal{O}$ stay hidden¹ (¹With a slight abuse of notation, we use $H$ to denote the number of hidden nodes and the square matrix $H$ to denote a generic graph filter.). For simplicity and without loss of generality, let the observed nodes correspond to the first $O$ nodes of the graph, so the values of the given signals at $\mathcal{O}$ are collected in the submatrix $X_O\in\mathbb{R}^{O\times M}$, which is formed by the first $O$ rows of the matrix $X$. As explained in the previous section, these observations can be used to form the sample covariance matrix. When doing so, it is important to notice that the matrices $S$ and $\hat{C}$, which respectively represent the GSO and the sample covariance matrix associated with the full graph $\mathcal{G}$ and the signals $X$, present the following block structure

$$S=\begin{bmatrix}S_O & S_{OH}\\ S_{HO} & S_H\end{bmatrix}, \qquad \hat{C}=\begin{bmatrix}\hat{C}_O & \hat{C}_{OH}\\ \hat{C}_{HO} & \hat{C}_H\end{bmatrix}. \qquad (5)$$

The matrix $S_O\in\mathbb{R}^{O\times O}$ denotes the GSO describing the connections between the observed nodes, while the remaining blocks model the edges involving hidden nodes. Similarly, $\hat{C}_O\in\mathbb{R}^{O\times O}$ denotes the sample covariance of the observed signals, and the other blocks denote the submatrices of $\hat{C}$ involving signal values from the hidden nodes. Since $\mathcal{G}$ is undirected, both $S$ and $\hat{C}$ are symmetric, and thus, $S_{HO}=S_{OH}^\top$ and $\hat{C}_{HO}=\hat{C}_{OH}^\top$.
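Once the observed nodes are permuted to come first, the block partition in (5) amounts to simple submatrix indexing. A minimal sketch (assuming numpy, with hypothetical sizes $N=6$ and $O=4$):

```python
import numpy as np

N, O = 6, 4                     # hypothetical total and observed node counts
rng = np.random.default_rng(1)

# A symmetric GSO for the full (unknown) graph.
S = rng.standard_normal((N, N))
S = (S + S.T) / 2

# Block partition as in (5); by symmetry, S_HO = S_OH^T.
S_O  = S[:O, :O]                # observed-observed block (O x O)
S_OH = S[:O, O:]                # observed-hidden block (O x H)
S_H  = S[O:, O:]                # hidden-hidden block (H x H)

# The full GSO is recovered by stitching the blocks back together.
S_rebuilt = np.block([[S_O, S_OH], [S_OH.T, S_H]])
```

The same indexing applies verbatim to the covariance blocks of $\hat{C}$.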

With the previous definitions in place, the problem of graph learning/network topology inference in the presence of hidden variables is formally introduced next.

###### Problem 1

Let $\mathcal{G}$ be a graph with $N$ nodes and GSO $S\in\mathbb{R}^{N\times N}$, and suppose that both are unknown. Given the nodal subset $\mathcal{O}$ with cardinality $O<N$, and the observations $X_O\in\mathbb{R}^{O\times M}$ corresponding to the values of $M$ graph signals observed at the nodes in $\mathcal{O}$, find the underlying graph structure encoded in $S_O$ under the assumptions that:
(AS1) The number of hidden variables (nodes) is substantially smaller than the number of observed nodes, i.e., $H\ll O$; and
(AS2) There exists a (known) property relating the full graph signals $X$ to the GSO $S$.

Despite having observations from $O$ nodes, there are still $H$ nodes that remain unseen and influence the observed signals $X_O$, rendering the inference problem challenging and severely ill-conditioned. To make the problem more tractable, (AS1) ensures that the number of hidden variables is small. Assumption (AS2) is more generic and establishes that there is a known relationship between the graph signals and the full graph $\mathcal{G}$. The particular relationship is further developed in the following sections, where we assume that $X$ is either smooth (Section IV) or stationary (Section V) on $\mathcal{G}$. The key issue to address is how (AS2), which involves the full signals and GSO, translates to the submatrices $S_O$, $S_{OH}$, $\hat{C}_O$, and $\hat{C}_{OH}$ in (5).

Given the above considerations, a general formulation to solve Problem 1 is as follows

$$\hat{S}_O=\operatorname*{arg\,min}_{S_O}\; f(S_O) \quad \text{s.t.}\quad X_O\in\mathcal{X}(S),\;\; S_O\in\mathcal{S}, \qquad (6)$$

where $f$ is a (preferably convex) function that promotes desirable properties on the sought graph. Typical examples include the $\ell_1$ norm, the Frobenius norm, the spectral radius, or linear combinations of those [24]. Note that the first constraint in (6) (referred to as the observation constraint) takes into account that $\mathcal{X}(S)$ involves the full matrices $X$ and $S$ but only $X_O$ is observed. It is also important to remark that, as will be apparent in the following sections, for observations that are either smooth or stationary on the graph, the constraint can be reformulated in terms of the (sample) covariance matrices $\hat{C}$ and $\hat{C}_O$. Regarding the second constraint in (6), the set $\mathcal{S}$ collects the requirements for $S_O$ to be a specific type of GSO. A typical example is the set of adjacency matrices

$$\mathcal{A}:=\{A \,:\, A_{ij}\geq 0;\; A=A^\top;\; A_{ii}=0;\; A\mathbf{1}\geq\mathbf{1}\}, \qquad (7)$$

where we require the GSO to have non-negative weights, be symmetric, and have no self-loops, and the last constraint rules out the trivial solution $A=\mathbf{0}$ by imposing that every node has at least one neighbor. Analogously, the set of combinatorial Laplacian matrices is

$$\mathcal{L}:=\{L \,:\, L_{ij}\leq 0 \text{ for } i\neq j;\; L=L^\top;\; L\mathbf{1}=\mathbf{0};\; L\succeq 0\}, \qquad (8)$$

where we require the GSO to be a positive semidefinite matrix, have non-positive off-diagonal values, have positive entries on its diagonal, and have the constant vector as an eigenvector (i.e., the sum of the entries of each row must be zero). Lastly, we want to stress that the objective and the constraints can be alternatively formulated based on the full GSO $S$, provided that we know that the structural properties (for instance, sparsity in the objective and non-negative entries in the constraints) also hold for the non-observed parts of $S$. Such an approach is suitable when the interest goes beyond $S_O$ and spans the estimation of the links involving the nodes in $\mathcal{H}$.

Hidden variables in correlation and partial-correlation networks: Before discussing our specific solutions to Problem 1, a relevant question is how classical topology-inference approaches (namely, correlation and partial-correlation networks) handle the problem of latent nodal variables. The so-called direct methods consider that a link between nodes $i$ and $j$ exists based only on a pairwise similarity metric between the signals observed at $i$ and $j$. Within this class of methods, correlation networks set the similarity metric to the correlation and, as a result, the estimated adjacency matrix corresponds to a (thresholded) version of $\hat{C}$. Given their simplicity, the generalization of direct methods to setups where hidden variables are present is straightforward and is simply given by a (thresholded) version of $\hat{C}_O$. Nevertheless, a high correlation between two nodes can be due to global network effects rather than to the direct influence among pairs of neighbors, calling for more involved topology-inference methods. To that end, partial-correlation methods, including the celebrated graphical Lasso (GL) algorithm [19], propose estimating the graph as a matrix of partial-correlation coefficients, which boils down to assuming that the connectivity pattern can be identified from $\Theta:=C^{-1}$, with $\Theta$ being known as the precision matrix. When hidden variables are present, the precision matrix associated with the observed nodes is given by $S_O-B$, with $B$ being a low-rank matrix since $\mathrm{rank}(B)\leq H\ll O$. Leveraging this structure, the authors in [8] modified the GL algorithm to deal with hidden variables via a maximum-likelihood estimator augmented with a nuclear-norm regularizer to promote low rankness in $B$. The resulting algorithm is known as latent variable graphical Lasso (LVGL) and is given by

$$\max_{S_O-B\succeq 0,\; B\succeq 0}\;\; \log\det(S_O-B)-\mathrm{tr}\big(\hat{C}_O(S_O-B)\big)-\lambda_1\|S_O\|_1-\lambda_2\|B\|_*, \qquad (9)$$

where $\hat{C}_O$ represents the sample covariance of the observed data and $\lambda_1$ and $\lambda_2$ are regularization constants [8].

Rather than assuming that the relation between $X$ and $S$ postulated in (AS2) is given by either correlations or partial correlations, this paper looks at setups where the operational assumption is that the observed signals are: i) smooth on the graph; ii) stationary on the graph; or iii) both smooth and stationary. Sections IV-VI deal with each of those three setups. Section VII evaluates numerically the performance of the developed algorithms and compares it with that of classical correlation and LVGL schemes.

## IV Topology inference from smooth signals

In this section, we address Problem 1 by particularizing (6) to the case of the signals $X$ being smooth on $\mathcal{G}$.

As explained in Section II, a natural way of measuring the smoothness of (a set of) graph signals is to leverage the graph Laplacian and compute their LV as $\mathrm{tr}(\hat{C}L)$ [cf. (4)]. As a result, in this section we set $S=L$ and focus on estimating $L_O$. Recall that, due to the existence of hidden variables, the whole covariance matrix $\hat{C}$ is not observed. To account for this and leveraging the block definitions of $L$ and $\hat{C}$ introduced in (5), we can rewrite the LV of our dataset as

$$\mathrm{tr}(\hat{C}L)=\mathrm{tr}(\hat{C}_OL_O)+2\,\mathrm{tr}(\hat{C}_{OH}L_{OH}^\top)+\mathrm{tr}(\hat{C}_HL_H), \qquad (10)$$

where only $\hat{C}_O$ is assumed to be known and the influence of the hidden variables in the LV has been made explicit.

Although the block-wise smoothness presented in (10) could be directly employed to approach the network-topology inference as an optimization problem, most of the submatrices are not known and need to be estimated. Incorporating the bilinear terms $\hat{C}_{OH}L_{OH}^\top$ and $\hat{C}_HL_H$ would directly render the problem non-convex. To circumvent this issue, we lift the problem by defining the matrix $K:=\hat{C}_{OH}L_{OH}^\top$. Since (AS1) guarantees that $\mathrm{rank}(K)\leq H\ll O$, we exploit the low-rank structure of the matrix $K$ in our formulation. Correspondingly, we also define the matrix $R:=\hat{C}_HL_H$ and note that, since $R$ is the product of two positive semidefinite matrices, it has non-negative eigenvalues and, as a result, it holds that $\mathrm{tr}(R)\geq 0$.
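The low-rankness of the lifted variable is straightforward to see: an $O\times O$ matrix formed as the product of an $O\times H$ block and an $H\times O$ block has rank at most $H$. A toy illustration (assuming numpy; the random blocks are hypothetical stand-ins for the covariance and Laplacian cross blocks):

```python
import numpy as np

O, H = 8, 2                     # (AS1): far fewer hidden than observed nodes
rng = np.random.default_rng(2)

# Random stand-ins for the O x H cross blocks of covariance and Laplacian.
C_OH = rng.standard_normal((O, H))
L_OH = rng.standard_normal((O, H))

# The lifted O x O variable K = C_OH L_OH^T has rank at most H << O,
# since it is the product of a tall (O x H) and a fat (H x O) factor.
K = C_OH @ L_OH.T
rank_K = np.linalg.matrix_rank(K)
```

This is precisely the structure that the nuclear-norm regularizer in the formulations below is designed to promote.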

With these considerations in mind, the network topology inference from smooth signals is formulated as

$$\min_{L_O,K,R}\;\; \mathrm{tr}(\hat{C}_OL_O)+2\,\mathrm{tr}(K)+\mathrm{tr}(R)+\alpha\|L_O\|_{F,\mathrm{off}}^2-\beta\mathbf{1}^\top\log(\mathrm{diag}(L_O))+\gamma\|K\|_* \qquad (11)$$
$$\text{s.t.}\quad \mathrm{tr}(\hat{C}_OL_O)+2\,\mathrm{tr}(K)+\mathrm{tr}(R)\geq 0,\quad \mathrm{tr}(R)\geq 0,\quad L_O\in\bar{\mathcal{L}},$$

where $\|\cdot\|_{F,\mathrm{off}}$ denotes the Frobenius norm excluding the elements of the diagonal. This term, together with the weight $\alpha$, serves to control the sparsity of $L_O$. Furthermore, the logarithmic barrier rules out the trivial solution $L_O=\mathbf{0}$. The nuclear norm $\|K\|_*$ is a convex regularizer that promotes low-rank solutions for the matrix $K$ and is typically employed as a surrogate for the (non-convex) rank constraint. The adoption of the nuclear norm, together with the consideration of the matrices $K$ and $R$, ensures the convexity of (11), so a globally optimal solution can be efficiently found. The weights $\alpha$, $\beta$, and $\gamma$ control the trade-off between the regularizers, the first constraint ensures that the LV is non-negative, and the second constraint captures the fact that $R$, being the product of two positive semidefinite matrices, has a non-negative trace. (From an algorithmic point of view, it is worth noticing that the matrix $R$ always appears as $\mathrm{tr}(R)$ in (11). As a result, if convenient to reduce the numerical burden, one can replace $\mathrm{tr}(R)$ with a scalar $r\geq 0$ and optimize over $r$ in lieu of $R$. See the related formulation in (13) for details.)

The last point to discuss in detail is the form of $\bar{\mathcal{L}}$. Mathematically, the set $\bar{\mathcal{L}}$ is equivalent to the set of combinatorial Laplacians $\mathcal{L}$, but with the condition $L\mathbf{1}=\mathbf{0}$ replaced by $L\mathbf{1}\geq\mathbf{0}$, i.e., $\bar{\mathcal{L}}:=\{L_{ij}\leq 0 \text{ for } i\neq j;\; L=L^\top;\; L\mathbf{1}\geq\mathbf{0};\; L\succeq 0\}$. The modification is required because, strictly speaking, $L_O$ is not a combinatorial Laplacian. The existence of links between the elements in $\mathcal{O}$ and the hidden nodes in $\mathcal{H}$ gives rise to non-zero (negative) entries in $L_{OH}$ and, as a result, the sum of the off-diagonal elements of each row of $L_O$ can be smaller in absolute value than the associated diagonal element (which accounts for the links in both $\mathcal{O}$ and $\mathcal{H}$). Intuitively, the more relaxed condition enlarges the set of feasible solutions, rendering the inference process harder to solve, an issue that has been observed when running the numerical experiments. Moreover, when estimating the diagonal of $L_O$ we are indirectly estimating the number of edges between observed and hidden nodes. This could potentially be leveraged to estimate links with non-observed nodes, but this entails a more challenging problem that goes beyond the scope of the paper. An approach to bypass some of these issues is analyzed next.

### IV-A Exploiting the Laplacian of the observed adjacency matrix

The Laplacian offers a neat way to measure the smoothness of graph signals [cf. (3)]. However, when addressing the problem of estimating the Laplacian from smooth signals under the presence of hidden nodes, we must face the challenges associated with the submatrix $L_O$ not being a Laplacian itself. As discussed in the preceding paragraphs, this requires dropping some of the Laplacian constraints from the optimization, leading to a looser recovery framework. To circumvent these issues, rather than estimating $L_O$, this section looks at the problem of estimating $\tilde{L}_O$, the Laplacian associated with the observed adjacency matrix $A_O$. In contrast to $L_O$, the matrix $\tilde{L}_O$ is a proper combinatorial Laplacian ($\tilde{L}_O\in\mathcal{L}$) and, hence, the original Laplacian constraints can be restored. The remainder of this section is devoted to reformulating (11) in terms of $\tilde{L}_O$.

Upon defining the diagonal matrices $D_O:=\mathrm{diag}(A_O\mathbf{1})$ and $D_{OH}:=\mathrm{diag}(A_{OH}\mathbf{1})$, which count the number of observed and hidden neighbors for the nodes in $\mathcal{O}$, the matrix $L_O$ is expressed as $L_O=\tilde{L}_O+D_{OH}$. With this equivalence, the smoothness penalty in (11) is rewritten as

$$\mathrm{tr}(\hat{C}L)=\mathrm{tr}(\hat{C}_O\tilde{L}_O)+\mathrm{tr}(\hat{C}_OD_{OH})+2\,\mathrm{tr}(K)+\mathrm{tr}(R)=\mathrm{tr}(\hat{C}_O\tilde{L}_O)+2\,\mathrm{tr}(\tilde{K})+\mathrm{tr}(R), \qquad (12)$$

where $\tilde{K}:=K+\frac{1}{2}\hat{C}_OD_{OH}$. Because the entries of $D_{OH}$ depend on the presence of edges between the observed and the hidden nodes, if the graph is sparse, the diagonal of $D_{OH}$ is sparse as well. Furthermore, since the sparsity pattern of the diagonal of $D_{OH}$ depends on the matrix $A_{OH}$, it follows that the column-sparsity pattern of $\hat{C}_OD_{OH}$ matches that of $D_{OH}$, and thus, $\tilde{K}$ is also low rank.

With these considerations in mind, we reformulate the optimization in (11), replacing $L_O$ with $\tilde{L}_O$, resulting in the following convex optimization problem

$$\min_{\tilde{L}_O,\tilde{K},r}\;\; \mathrm{tr}(\hat{C}_O\tilde{L}_O)+2\,\mathrm{tr}(\tilde{K})+r+\alpha\|\tilde{L}_O\|_{F,\mathrm{off}}^2-\beta\mathbf{1}^\top\log(\mathrm{diag}(\tilde{L}_O))+\gamma_*\|\tilde{K}\|_*+\gamma_{2,1}\|\tilde{K}\|_{2,1} \qquad (13)$$
$$\text{s.t.}\quad \mathrm{tr}(\hat{C}_O\tilde{L}_O)+2\,\mathrm{tr}(\tilde{K})+r\geq 0,\quad r\geq 0,\quad \tilde{L}_O\in\mathcal{L},$$

where $\bar{\mathcal{L}}$ in (11) has been replaced with $\mathcal{L}$ in (13), the set of all valid combinatorial Laplacian matrices defined in (8). Moreover, knowing that the matrix $R$ only appears through its trace, we replace $\mathrm{tr}(R)$ with the non-negative scalar variable $r$ to alleviate the numerical burden. Note that, although we replaced $K$ with $\tilde{K}$, the terms previously associated with $K$ in (11) remain unchanged in (13). Nonetheless, while the original matrix $K$ is low rank because it is the product of a tall matrix and a fat matrix, the low-rankness of $\tilde{K}$ is a byproduct of the sparsity of the graph. More precisely, the matrix $\tilde{K}$ involves the product of the square (full-rank) matrix $\hat{C}_O$ and the diagonal matrix $D_{OH}$. Since the diagonal of $D_{OH}$ is sparse, such a product gives rise to a matrix with several zero columns, with the rank of the resulting matrix coinciding with the number of non-zero columns. We exploit this structure by further regularizing the matrix $\tilde{K}$ with the $\ell_{2,1}$ norm.

Indeed, two different configurations of (13) can be obtained depending on the values of the regularization constants. Setting $\gamma_{2,1}=0$, we promote a low-rank solution for $\tilde{K}$ by applying the nuclear-norm regularization. Since the nuclear-norm minimization does not ensure the desired column sparsity of $\tilde{K}$, an alternative is to set $\gamma_*=0$ and rely on the penalty $\|\tilde{K}\|_{2,1}$. The computation of $\|\tilde{K}\|_{2,1}$ can be understood as a two-step process where one first obtains the $\ell_2$ norm of each of the columns of $\tilde{K}$ and, then, computes the $\ell_1$ norm of the resulting row vector. This regularization is commonly known as the group-Lasso penalty [49, 43] and has been used in a number of sparse-recovery problems. The results in Section VII will illustrate that the formulation in (13) succeeds in promoting the desired column-sparsity pattern when appropriate values are used for the hyperparameters $\gamma_*$ and $\gamma_{2,1}$. Note also that, by looking at the non-zero columns of $\tilde{K}$, the nodes in $\mathcal{O}$ with connections to hidden nodes can be identified.
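The group-Lasso penalty and the way its non-zero columns flag the nodes connected to hidden nodes can be sketched as follows (assuming numpy; the column-sparse matrix is a hypothetical stand-in for the variable being regularized):

```python
import numpy as np

# A hypothetical column-sparse matrix: only columns 1 and 4 are non-zero,
# i.e., only observed nodes 1 and 4 have links to hidden nodes.
K_tilde = np.zeros((5, 5))
K_tilde[:, 1] = [1., 2., 0., 1., 3.]
K_tilde[:, 4] = [0., 1., 1., 0., 2.]

# l2,1 norm: first the l2 norm of each column, then the l1 norm (sum)
# of the resulting non-negative row vector.
col_norms = np.linalg.norm(K_tilde, axis=0)
l21_norm = col_norms.sum()

# Non-zero columns identify observed nodes with links to hidden nodes.
connected_to_hidden = np.flatnonzero(col_norms)
```

Minimizing this penalty drives entire columns to zero at once, which is exactly the column-sparsity pattern discussed above.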

## V Topology inference from stationary signals

In this section, instead of relying on the smoothness of the signals $X$, we approach Problem 1 by modifying (AS2) and considering that the data is stationary on the sought graph. The assumption of $X$ being stationary on $\mathcal{G}$ is tantamount to the matrices $C$ and $S$ sharing the same eigenvectors [23]. As a result, the approach for the fully observable case is to use the observations to estimate the sample covariance $\hat{C}$ and then rely on the eigenvectors of the sample covariance to estimate $S$ [38]. However, when dealing with hidden variables, there is no obvious way to obtain the eigenvectors of the full covariance using as input the submatrix $\hat{C}_O$. To bypass this problem, instead of requiring the eigenvectors of $C$ and $S$ to be the same, our approach is to require that $C$ and $S$ commute, i.e., that the equation $CS=SC$ must hold [39]. To see why this condition leads to a more tractable formulation, let us leverage the block structure of $S$ and $C$ described in (5). It readily follows that the upper-left submatrix of size $O\times O$ on both sides of the equality is given by

$$C_OS_O+C_{OH}S_{OH}^\top=S_OC_O+S_{OH}C_{OH}^\top. \qquad (14)$$

The above expression succeeds in relating the sought $S_O$ with $C_O$, which can be efficiently estimated using $X_O$. Furthermore, (14) reveals that when hidden variables are present, we cannot simply ask $S_O$ and $C_O$ to commute, but we also need to account for the associated terms $C_{OH}S_{OH}^\top$ and $S_{OH}C_{OH}^\top$.
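The block identity in (14) can be double-checked numerically by building a covariance that commutes with the GSO and comparing both sides (a sketch assuming numpy; the graph and the covariance polynomial are arbitrary toy choices):

```python
import numpy as np

N, O = 6, 4
rng = np.random.default_rng(3)

# Symmetric GSO, and a covariance defined as a polynomial of S so that
# CS = SC holds exactly (graph stationarity).
S = rng.standard_normal((N, N))
S = (S + S.T) / 2
C = np.eye(N) + 0.5 * S + 0.2 * S @ S

S_O, S_OH = S[:O, :O], S[:O, O:]
C_O, C_OH = C[:O, :O], C[:O, O:]

# Upper-left O x O block of CS = SC, exactly as written in (14).
lhs = C_O @ S_O + C_OH @ S_OH.T
rhs = S_O @ C_O + S_OH @ C_OH.T
```

Note that dropping the cross terms (i.e., asking only `C_O` and `S_O` to commute) would break the equality, which is why the hidden-variable terms must be modeled explicitly.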

Implementing steps similar to those in Section IV, we can lift the problem by defining the matrix $K:=C_{OH}S_{OH}^\top$ and leveraging the fact that $\mathrm{rank}(K)\leq H$, due to (AS1). Note that this matrix $K$ is equivalent to the one defined in Section IV, with the only difference that now we use a block from the generic GSO $S$ instead of the Laplacian $L$. Moreover, since both $C$ and $S$ are symmetric matrices, we have that $S_{OH}C_{OH}^\top=K^\top$. Then, under the general assumption that graphs are typically sparse, we can approach Problem 1 with stationary observations by solving

$$\min_{S_O,K}\;\; \|S_O\|_0 \qquad (15)$$
$$\text{s.t.}\quad C_OS_O+K=S_OC_O+K^\top,\quad \mathrm{rank}(K)\leq H,\quad S_O\in\mathcal{S},$$

where the $\ell_0$ norm promotes sparse solutions, the equality constraint ensures commutativity of the GSO and the covariance while accounting for latent nodes, and the rank constraint captures the low rank of $K$ due to (AS1).

Regarding the specific choice of the GSO, when the interest is in the Laplacian matrix we set $S=L$ and estimate $\tilde{L}_O$, the Laplacian of the observed adjacency matrix. Then, the matrix $K$ is replaced with $\tilde{K}$, which accounts for the fact that $\tilde{L}_O$, rather than $L_O$, is used in (14). This was further motivated in Section IV-A, and the discussion provided there also applies here.

The presence of the rank constraint and the $\ell_0$ norm renders (15) non-convex and computationally hard to solve. Furthermore, the first constraint assumes perfect knowledge of $C_O$, which may not always represent a practical setup. These issues are addressed in the next section.

### V-A Convex and robust stationary topology inference

A natural approach to deal with (15) is to relax the non-convex terms, replacing the $\ell_0$ norm with the $\ell_1$ norm and the rank constraint with the nuclear norm, their closest convex surrogates. Furthermore, in most practical scenarios the ensemble covariance $C_O$ is not known and one must rely on its sampled counterpart $\hat{C}_O$. This requires relaxing the equality constraint and replacing it with a constraint that guarantees that the terms on the left-hand side and right-hand side are similar but not necessarily the same. Taking all these considerations into account, the relaxed convex topology-inference problem is

$$\min_{S_O,K}\;\; \|S_O\|_1+\eta\|K\|_* \qquad (16)$$
$$\text{s.t.}\quad \|\hat{C}_OS_O+K-S_O\hat{C}_O-K^\top\|_F^2\leq\epsilon,\quad S_O\in\mathcal{S},$$

where $\eta$ controls the low rankness of $K$. Regarding the (relaxed) stationarity constraint, the squared Frobenius norm has been adopted to measure the similarity between the matrices at hand, but other (convex) distances could alternatively be used. It is also important to note that the value of the non-negative constant $\epsilon$ should be selected based on prior knowledge of the noise level present in the observations and, more importantly, the number of samples used to estimate the covariance. Clearly, if the number of samples is smaller than $O$, the matrix $\hat{C}_O$ is not full rank, notably increasing the size of the feasible set. On the other hand, as the covariance estimate becomes more accurate, one can set $\epsilon$ closer to zero. This drastically reduces the degrees of freedom of the formulation and, as a result, makes it more likely that the solution to (16) coincides with the actual GSO.

Remark 1 (Reweighted algorithm): The formulation in (16) is convex and robust. However, while replacing the original norm with the convex norm constitutes a common approach, it is well-known that non-convex surrogates can lead to sparser solutions. Indeed, a more sophisticated alternative in the context of sparse recovery is to define as a small positive number and replace the norm with a (non-convex) logarithmic penalty [7]. An efficient way to handle the non-convexity of the logarithmic penalty is to rely on a majorization-minimization (MM) approach [46], which considers an iterative linear approximation to the concave objective and leads to an iterative re-weighted minimization. To be specific, with being the iteration index, adopting such an approach for the problem in (16) results in

$$S_O^{(t+1)}:=\operatorname*{arg\,min}_{S_O,\,K}\ \sum_{i,j=1}^{O}[W^{(t)}]_{ij}\,|[S_O]_{ij}|+\eta\|K\|_*\quad\text{s.t.}\quad C_O S_O+K=S_O C_O+K^\top,\ \ S_O\in\mathcal{S}, \tag{17}$$

with $W^{(t)}$ being defined as $[W^{(t)}]_{ij}=\big(|[S_O^{(t)}]_{ij}|+\delta\big)^{-1}$. Since the iterative algorithm penalizes (assigns a larger weight to) entries of $S_O^{(t)}$ that are close to zero, the obtained solution is typically sparser, at the expense of a higher computational cost. Finally, note that the absolute values can be removed whenever the non-negativity constraint $[S_O]_{ij}\geq 0$ is enforced.
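The re-weighting rule underlying (17) can be sketched in a few lines; the example below (with arbitrary toy values) only illustrates the MM weight update, abstracting away the inner convex solve:

```python
import numpy as np

def reweight(S_current, delta=1e-3):
    """MM weights for the logarithmic penalty: [W]_ij = 1/(|[S]_ij| + delta).

    Entries of the current estimate that are close to zero receive a large
    weight and are therefore pushed towards exactly zero at the next solve.
    """
    return 1.0 / (np.abs(S_current) + delta)

# Illustration on a toy estimate (values chosen arbitrarily):
S_t = np.array([[0.0, 0.9],
                [0.9, 0.01]])
W_t = reweight(S_t)
print(W_t)  # near-zero entries get weights orders of magnitude larger
```

Note how the zero entry receives weight $1/\delta = 1000$ while the entries near $0.9$ receive weight close to $1$, so the next weighted-$\ell_1$ solve barely penalizes the strong links.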

### V-B Exploiting structure through alternating optimization

In the previous section, the product of the unknown matrices $C_{OH}$ and $S_{OH}$ was absorbed into the matrix $K$. Since such a matrix is low rank, the convex nuclear norm was used to promote low-rank solutions while preserving convexity. However, when implementing this approach, other properties (such as $S_{OH}$ being sparse) were ignored. A reasonable question is, hence, whether the judicious incorporation of this additional structure outweighs the potential loss of convexity. In this section, we propose an efficient alternating non-convex algorithm that accounts for the additional structure present in our setup. Its associated recovery performance (along with comparisons to its convex counterparts) will be tested in Section VII.

A well-established approach in low-rank optimization is to factorize the matrix of interest as the product of a tall and a fat matrix, which here boils down to replacing $K$ with the original submatrices $C_{OH}$ and $S_{OH}$. Moreover, when the value of $H$, which determines the size of $C_{OH}$ and $S_{OH}$, is unknown, a principled approach is to rely on an upper bound on $H$ and add the Frobenius terms $\|S_{OH}\|_F^2$ and $\|C_{OH}\|_F^2$ to the objective function (see, e.g., [45] for a formal derivation of this approach). In our particular setup, this factorization has the additional benefit of $S_{OH}$ being sparse. Then, the resulting non-convex optimization problem is given by

$$\min_{S_O,\,S_{OH},\,C_{OH}}\ \sum_{i,j=1}^{O}\log\big(|[S_O]_{ij}|+\delta\big)+\nu\sum_{i,j=1}^{O,H}\log\big(|[S_{OH}]_{ij}|+\delta\big)+\eta\big(\|S_{OH}\|_F^2+\|C_{OH}\|_F^2\big)$$
$$+\,\rho\,\|\hat{C}_O S_O+C_{OH}S_{OH}^\top-S_O\hat{C}_O-S_{OH}C_{OH}^\top\|_F^2\quad\text{s.t.}\quad S_O\in\mathcal{S},\ S_{OH}\in\mathcal{S}_{OH}. \tag{18}$$

Clearly, problem (18) guarantees that the rank of the matrix $C_{OH}S_{OH}^\top$ is upper bounded by the size of its composing factors $C_{OH}$ and $S_{OH}$. In this case, sparse solutions for $S_O$ and $S_{OH}$ are promoted by means of the (concave) logarithmic penalty introduced in Remark 1. The robust commutativity constraint is placed in the objective function as a penalty term, and the set $\mathcal{S}_{OH}$ captures the fact that $S_{OH}$ is a block of the GSO. In its simplest form, we have that $\mathcal{S}_{OH}=\{S_{OH}:[S_{OH}]_{ij}\geq 0\}$ if the GSO is the adjacency matrix, and $\mathcal{S}_{OH}=\{S_{OH}:[S_{OH}]_{ij}\leq 0\}$ if it is set to the Laplacian matrix.
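The rank bound induced by the factorization can be checked numerically; in this short sketch the dimensions are arbitrary placeholder choices:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical sizes: O observed nodes, Hn hidden nodes (Hn << O).
O, Hn = 10, 3
C_OH = rng.standard_normal((O, Hn))  # tall factor
S_OH = rng.standard_normal((O, Hn))  # tall factor (sparse in the paper's setup)

K = C_OH @ S_OH.T  # O x O matrix built from the two factors
print(np.linalg.matrix_rank(K))  # at most Hn, here 3
```

Any choice of the factors yields $\mathrm{rank}(C_{OH}S_{OH}^\top)\leq H$, so the low-rank prior is enforced by construction rather than through a nuclear-norm penalty.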

The main drawback associated with the formulation in (18) is that the presence of the bilinear term $C_{OH}S_{OH}^\top$ and the logarithmic penalties renders the problem non-convex. To address this issue, we implement a Block Successive Upper bound Minimization (BSUM) algorithm [15], an iterative approach that merges the techniques from MM and alternating optimization. Then, we find a solution to (18) by iterating between the following three steps.

Step 1. Given the estimates $\hat{S}_{OH}^{(t)}$ and $\hat{C}_{OH}^{(t)}$, we substitute them into (18) and solve for $S_O$. This yields

$$\hat{S}_O^{(t+1)}:=\operatorname*{arg\,min}_{S_O\in\mathcal{S}}\ \sum_{i,j=1}^{O}[W_O^{(t)}]_{ij}\,|[S_O]_{ij}|+\rho\,\|\hat{C}_O S_O+\hat{C}_{OH}^{(t)}[\hat{S}_{OH}^{(t)}]^\top-S_O\hat{C}_O-\hat{S}_{OH}^{(t)}[\hat{C}_{OH}^{(t)}]^\top\|_F^2, \tag{19}$$

where the logarithmic penalty is approximated by the re-weighted $\ell_1$ norm as detailed after (17).

Step 2. Given the estimate $\hat{C}_{OH}^{(t)}$ from the previous iteration, and leveraging the estimate $\hat{S}_O^{(t+1)}$ from the last step, we estimate the matrix $S_{OH}$ by solving

$$\hat{S}_{OH}^{(t+1)}:=\operatorname*{arg\,min}_{S_{OH}\in\mathcal{S}_{OH}}\ \sum_{i,j=1}^{O,H}[W_{OH}^{(t)}]_{ij}\,|[S_{OH}]_{ij}|+\eta\|S_{OH}\|_F^2+\rho\,\|\hat{C}_O\hat{S}_O^{(t+1)}+\hat{C}_{OH}^{(t)}S_{OH}^\top-\hat{S}_O^{(t+1)}\hat{C}_O-S_{OH}[\hat{C}_{OH}^{(t)}]^\top\|_F^2. \tag{20}$$

Step 3. With the estimates from the previous steps in place, the last step involves estimating the matrix $C_{OH}$ by solving

$$\hat{C}_{OH}^{(t+1)}:=\operatorname*{arg\,min}_{C_{OH}}\ \eta\|C_{OH}\|_F^2+\rho\,\|\hat{C}_O\hat{S}_O^{(t+1)}+C_{OH}[\hat{S}_{OH}^{(t+1)}]^\top-\hat{S}_O^{(t+1)}\hat{C}_O-\hat{S}_{OH}^{(t+1)}C_{OH}^\top\|_F^2. \tag{21}$$

The alternating algorithm is initialized upon: i) solving (16) to obtain $\hat{S}_O^{(0)}$ and $\hat{K}^{(0)}$; and ii) setting $\hat{C}_{OH}^{(0)}$ and $\hat{S}_{OH}^{(0)}$ based on the top left and right singular vectors of $\hat{K}^{(0)}$. The three steps in (19)-(21) are iterated until convergence to a stationary point, a result that is formally stated next.

Proposition 1. Denote with $f$ the objective function in (18). Let $\mathcal{Y}^*$ be the set of stationary points of (18), and let $y^{(t)}=(\hat{S}_O^{(t)},\hat{S}_{OH}^{(t)},\hat{C}_{OH}^{(t)})$ be the solution generated after running the 3 steps in (19)-(21) $t$ times. Then, the solution generated by the iterative algorithm (19)-(21) converges to a stationary point of $f$ as $t$ goes to infinity, i.e.,

$$\lim_{t\to\infty}d\big(y^{(t)},\mathcal{Y}^*\big)=0,$$

with $d(y,\mathcal{Y}^*):=\min_{\bar{y}\in\mathcal{Y}^*}\|y-\bar{y}\|$ denoting the distance from $y$ to the set $\mathcal{Y}^*$.

Note that convergence was not obvious since at least one of the steps does not have a unique minimizer, and the first and second steps employ an approximation of the objective function in (V-B). The details of the proof, which relies on convergence results for BSUM schemes [15, Th. 1b], are provided in Appendix A.
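To illustrate the flavor of the three alternating steps, the following toy sketch replaces each exact subproblem solver with a single (re-weighted) proximal-gradient update and monitors the objective of the joint formulation; all dimensions, step sizes, and hyperparameters are placeholder choices, so this is an illustration of the BSUM-style iterations rather than the algorithm used in the experiments:

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder sizes and hyperparameters (illustrative only).
O, Hn = 8, 2
rho, eta, nu, delta, mu = 1.0, 0.1, 1.0, 1e-3, 1e-3
C_O = np.cov(rng.standard_normal((O, 200)))  # stand-in for the sample covariance

def soft(X, T):
    """Entrywise soft-thresholding: prox of a weighted l1 penalty."""
    return np.sign(X) * np.maximum(np.abs(X) - T, 0.0)

def res(S_O, S_OH, C_OH):
    """Commutativity residual shared by the three subproblems."""
    return C_O @ S_O + C_OH @ S_OH.T - S_O @ C_O - S_OH @ C_OH.T

def obj(S_O, S_OH, C_OH):
    """Log-penalized objective of the joint non-convex formulation."""
    return (np.sum(np.log(np.abs(S_O) + delta))
            + nu * np.sum(np.log(np.abs(S_OH) + delta))
            + eta * (np.linalg.norm(S_OH) ** 2 + np.linalg.norm(C_OH) ** 2)
            + rho * np.linalg.norm(res(S_O, S_OH, C_OH)) ** 2)

# Random initialization (the paper instead initializes from the convex solution).
S_O = rng.standard_normal((O, O)); S_O = (S_O + S_O.T) / 2
S_OH = rng.standard_normal((O, Hn))
C_OH = rng.standard_normal((O, Hn))
history = [obj(S_O, S_OH, C_OH)]

for t in range(50):
    # Step 1: re-weighted proximal-gradient update of S_O.
    R = res(S_O, S_OH, C_OH)
    g = 2 * rho * (C_O @ R - R @ C_O)
    S_O = soft(S_O - mu * g, mu / (np.abs(S_O) + delta))
    # Step 2: re-weighted proximal-gradient update of S_OH.
    R = res(S_O, S_OH, C_OH)
    g = 2 * rho * (R.T - R) @ C_OH + 2 * eta * S_OH
    S_OH = soft(S_OH - mu * g, mu * nu / (np.abs(S_OH) + delta))
    # Step 3: gradient update of the smooth subproblem in C_OH.
    R = res(S_O, S_OH, C_OH)
    g = 2 * rho * (R - R.T) @ S_OH + 2 * eta * C_OH
    C_OH = C_OH - mu * g
    history.append(obj(S_O, S_OH, C_OH))

print(history[0] > history[-1])  # the objective decreases across iterations
```

Each block update decreases its MM surrogate, which in turn upper bounds the log-penalized objective, so the monitored objective is non-increasing, mirroring the mechanism behind the BSUM convergence guarantee.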

Although more computationally expensive, the numerical tests in Section VII confirm that the additional structure incorporated by replacing $K$ with $C_{OH}S_{OH}^\top$, together with the re-weighted approach for encouraging sparsity, gives rise to a better network reconstruction, provided that the iterative optimization is initialized with the solution to the convex formulation in (16). Last but not least, notice that an additional benefit of the formulation in (18) is that, by analyzing $\hat{S}_{OH}$, information about the potential links between the observed nodes and the hidden nodes is obtained. While network-tomography schemes [19] go beyond the scope of this paper, the results in this section can be used as a first step towards that goal.

Graph stationarity vis-à-vis graph smoothness: Suppose that we are given two datasets, both with the same number of signals. Moreover, suppose that we also know that the signals in the first dataset are smooth on an unknown graph, that those in the second are stationary on an unknown graph, and that our goal is to identify the underlying graphs. Based on that information, we run the algorithms in Section IV for the first dataset and those in this section for the second. An interesting question is which one yields a better recovery result. While the exact answer depends on the particularities of each setup, from a general point of view stationarity-based schemes are expected to achieve better results. The reason is that stationarity strongly limits the degrees of freedom of the GSO, while smoothness is a more lenient assumption, an intuition that will be validated in Section VII. Equally relevant, there can be situations where the data is both stationary and smooth. That is the case, for example, if the covariance matrix shares its eigenvectors with the graph Laplacian and its power spectral density is low pass. In such a setup, one could combine both network-recovery approaches, leading to a better recovery performance. This is precisely the subject of the ensuing section.

## VI Topology inference from stationary and smooth graph signals with hidden variables

In this section, we address Problem 1 by assuming that the graph signals are both smooth and stationary on the unknown graph. These two assumptions can be jointly considered to design optimization problems with additional structure that enhances the estimation of the graph. To that end, we consider the smoothness-based inference problem described in (13) and incorporate the robust commutativity constraint accounting for stationarity [cf. (14)], resulting in

$$\min_{\tilde{L}_O,\,\tilde{K},\,r}\ \mathrm{tr}(\hat{C}_O\tilde{L}_O)+2\,\mathrm{tr}(\tilde{K})+r+\alpha\|\tilde{L}_O\|_{F,\mathrm{off}}^2-\beta\,\mathbf{1}^\top\log\big(\mathrm{diag}(\tilde{L}_O)\big)+\gamma_*\|\tilde{K}\|_*+\gamma_{2,1}\|\tilde{K}\|_{2,1} \tag{22}$$
$$\text{s.t.}\quad \mathrm{tr}(\hat{C}_O\tilde{L}_O)+2\,\mathrm{tr}(\tilde{K})+r\geq 0,\quad \tilde{L}_O\in\mathcal{L},\quad \|\hat{C}_O\tilde{L}_O+\tilde{K}-\tilde{L}_O\hat{C}_O-\tilde{K}^\top\|_F^2\leq\epsilon.$$

Since the smooth formulation involves the Laplacian matrix, note that we adopted the Laplacian associated with the observed nodes as the GSO. Regarding the stationarity constraint, as discussed for (16), the value of $\epsilon$ should be selected based on the number of available signals and the observation noise. It is also worth noting that the matrix $\tilde{K}$ is inconspicuously absorbing the error derived from the presence of the hidden variables and from using $\hat{C}_O$ instead of $C_O$ in both the smoothness penalty and the commutativity constraint. Regarding the matrix $\tilde{K}$, two different regularizers are considered: the nuclear norm (to promote solutions with a low rank) and the $\ell_{2,1}$ norm (to promote column sparsity). Since solutions with zero columns also have reduced rank, it is prudent to tune the values of the hyperparameters $\gamma_*$ and $\gamma_{2,1}$ jointly, so that the (joint) dependence between the rank and the column sparsity is kept under control.
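To see why an $\ell_{2,1}$ penalty such as the one in (22) zeroes out entire columns (thereby also lowering the rank), consider its proximal operator, sketched below with arbitrary toy values:

```python
import numpy as np

def prox_l21(K, tau):
    """Proximal operator of tau * ||K||_{2,1} (sum of column l2 norms).

    Each column is shrunk towards zero and dropped entirely once its norm
    falls below tau, which is how whole columns (and hence rank) are removed.
    """
    out = np.zeros_like(K)
    for j in range(K.shape[1]):
        nrm = np.linalg.norm(K[:, j])
        if nrm > tau:
            out[:, j] = (1 - tau / nrm) * K[:, j]
    return out

K = np.array([[3.0, 0.1],
              [4.0, 0.1]])          # column norms: 5.0 and about 0.14
K_shrunk = prox_l21(K, tau=1.0)
print(K_shrunk[:, 1])               # prints [0. 0.]: weak column removed
```

The strong column is merely rescaled (to $[2.4,\,3.2]^\top$) while the weak one vanishes, illustrating the group-sparse behavior that complements the nuclear-norm penalty.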

We close the section by noting that the formulation in (22) is convex, so that its globally optimal solution can be found efficiently. However, non-convex versions of (22) that leverage a re-weighted norm to promote column sparsity and factorization approaches for the low-rank penalty (similar to those used in Section V) could be developed here as well.

## VII Numerical experiments

This section runs numerical experiments to gain insights on the proposed schemes and evaluate their recovery performance. First, we test the smoothness-based approaches with synthetic data and compare the results with existing algorithms from the literature. Secondly, we assess the performance of the stationarity-based schemes proposed in Section V, comparing them with those in Section VI and the classical LVGL. Lastly, we apply the proposed algorithms to two real-world datasets and compare the obtained results with those of existing alternatives. The MATLAB scripts for running all the numerical experiments presented in this section, as well as additional related test cases, can be found at https://github.com/andreibuciulea/topoIDhidden.

### VII-A Synthetic experiments based on smooth signals

We start by defining the default setup for the experiments in this section. With $V\Lambda V^\top$ denoting the eigendecomposition of the graph Laplacian, the smooth signals are generated as $X=V\tilde{X}$, where the columns of $\tilde{X}$ are independent realizations of a multivariate Gaussian distribution $\mathcal{N}(0,\Lambda^\dagger)$. Note that this model, which is oftentimes referred to as factor analysis [4, 3, 11], assigns more energy to the low-frequency components, promoting smoothness of the generated graph signals. Unless otherwise stated, the number of signals is set to