I Introduction
Modeling vertex attributes as processes that take values over a graph enables data processing tasks, such as filtering, inference, and compression, while accounting for information captured by the network topology [34, 20]. However, if the topology is unavailable, inaccurate, or even unrelated to the process of interest, performance of the associated task may degrade severely. For example, consider a social graph where the goal is to predict the salaries of all individuals given the salaries of some. Graph-based inference approaches that assume smoothness of the salary over the given graph may fall short if the salary is dissimilar among friends.
Topology identification is possible when observations at all nodes are available by employing structural models; see e.g., [18]. However, in many real settings one can only afford to collect nodal observations from a subset of nodes due to application-specific restrictions. For example, sampling all nodes may be prohibitive in massive graphs; in social networks, individuals may be reluctant to share personal information due to privacy concerns; in sensor networks, devices may report measurements sporadically to save energy; and in gene regulatory networks, gene expression data may contain misses due to experimental errors. In this context, the present paper relies on structural equation models (SEMs) [18] and structural vector autoregressive models (SVARMs) [9], and aims at jointly inferring the network topology and estimating graph signals, given noisy observations at subsets of nodes.
SEMs provide a statistical framework for inference of causal relationships among nodes [18, 12]. Linear SEMs have been widely adopted in fields as diverse as sociometrics [14], psychometrics [24], recommender systems [26], and genetics [8]. Conditions for identifying the network topology under the SEM have also been provided [6, 32], but they require observations of the process at all nodes. Recently, nonlinear SEMs have been developed to also capture nonlinear interactions [33]. On the other hand, SVARMs postulate that nodes further exert time-lagged dependencies on one another, and are appropriate for modeling multivariate time series [9]. Nonlinear SVARMs have been employed to identify directed dependencies between regions of interest in the brain [31]. Other approaches identify undirected topologies provided that the graph signals are smooth over the graph [11], or that the observed process is graph-bandlimited [30]. All these contemporary approaches assume that samples of the graph process are available over all nodes. However, acquiring network-wide observations may incur prohibitive sampling costs, especially for massive networks.
Methods for inference of graph signals (or processes) typically assume that the network topology is known and undirected, and that the graph signal is smooth, in the sense that neighboring vertices have similar values [35]. Parametric approaches adopt the graph-bandlimited model [5, 25], which postulates that the signal lies in a low-dimensional subspace related to the graph; see [22] for time-varying signals. Nonparametric techniques employ kernels on graphs for inference [35, 28]; see also [15] for semiparametric alternatives. Online data-adaptive algorithms for reconstruction of dynamic processes over dynamic graphs have been proposed in [16], where kernel dictionaries are generated from the network topology. However, performance of the aforementioned techniques may degrade when the process of interest is not smooth over the adopted graph.
To recapitulate, existing approaches either infer the graph process given the known topology and nodal observations, or estimate the network topology given the process values over all the nodes. The present paper fills the gap between these two settings by introducing algorithms based on SEMs and SVARMs for joint inference of network topologies and graph processes over the underlying graph. The approach is semi-blind because it performs the joint estimation task with only partial observations over the network nodes. Specifically, the contribution is threefold.

A novel approach is proposed for joint inference of directed network topologies and signals over the underlying graph using SEMs. An efficient algorithm is developed with provable convergence at least to a stationary point.

To further accommodate temporal dynamics, we advocate an SVARM to infer dynamic processes and graphs. A batch solver is provided that alternates between topology estimation and signal inference with linear complexity across time. Furthermore, a novel online algorithm is developed that performs real-time joint estimation, and tracks time-evolving topologies.

Analysis of the partially observed noiseless SEM is provided that establishes sufficient conditions for identifiability of the unknown topology. These conditions suggest that the required number of observations for identification reduces significantly when the network exhibits edge sparsity.
The rest of the paper is organized as follows. Sec. II reviews the SEM and SVARM, and states the problem. Sec. III presents a novel estimator for joint inference based on SEMs. Sec. IV develops both batch and online algorithms for inferring dynamic processes and networks using SVARMs. Sec. V presents the identifiability results of the partially observed SEM. Finally, numerical experiments and conclusions are presented in Secs. VI and VII, respectively.
Notation: Scalars are denoted by lowercase, column vectors by bold lowercase, and matrices by bold uppercase letters. Superscripts $^\top$ and $^{-1}$ respectively denote transpose and inverse, while $\mathbf{1}$ stands for the all-one vector. Moreover, $\mathbf{0}$ denotes an all-zero block entry of appropriate size. Finally, if $\mathbf{X}$ is a matrix and $\mathbf{x}$ a vector, then $\|\mathbf{x}\|_2$ denotes the $\ell_2$-norm of $\mathbf{x}$, $\|\mathbf{X}\|_1$ the $\ell_1$-norm of the vectorized matrix, and $\|\mathbf{X}\|_F$ the Frobenius norm of $\mathbf{X}$.
II Structural models and problem formulation
Consider a network with $N$ nodes modeled by the graph $\mathcal{G} := (\mathcal{V}, \mathbf{A})$, where $\mathcal{V} := \{v_1, \ldots, v_N\}$ is the set of vertices and $\mathbf{A} \in \mathbb{R}^{N \times N}$ denotes the adjacency matrix, whose $(i,j)$th entry $a_{ij}$ represents the weight of the directed edge from $v_j$ to $v_i$. A real-valued process (or signal) on $\mathcal{G}$ is a map $y: \mathcal{V} \rightarrow \mathbb{R}$. In social networks (e.g., Twitter) over which information diffuses, $y(v_i)$ could represent the timestamp when subscriber $i$ tweeted about a viral story. Since real-world networks often exhibit edge sparsity, $\mathbf{A}$ has only a few nonzero entries.
II-A Structural models
The linear SEM [14] postulates that $y_{it}$, the value of the process at node $i$ in the $t$th sample, depends linearly on the values at the other nodes $\{y_{jt}\}_{j \neq i}$, which amounts to

(1) $y_{it} = \sum_{j \neq i} a_{ij} y_{jt} + e_{it}$
where the unknown $a_{ij}$ captures the causal influence of node $j$ upon node $i$, and $e_{it}$ accounts for unmodeled dynamics. Clearly, $a_{ij} \neq 0$ suggests that $y_{it}$ is influenced directly by nodes in its neighborhood $\mathcal{N}_i := \{j : a_{ij} \neq 0\}$. With the vectors $\mathbf{y}_t := [y_{1t}, \ldots, y_{Nt}]^\top$ and $\mathbf{e}_t := [e_{1t}, \ldots, e_{Nt}]^\top$, (1) can be written in matrix-vector form as

(2) $\mathbf{y}_t = \mathbf{A}\mathbf{y}_t + \mathbf{e}_t$
SEMs have been successful in a host of applications, including gene regulatory networks [8], and recommender systems [26]. Therefore, the index $t$ does not necessarily indicate time, but may represent different individuals (gene regulatory networks), or movies (recommender systems). An interesting consequence emerges if one considers $\mathbf{e}_t$ as a random process with covariance $\mathbb{E}[\mathbf{e}_t\mathbf{e}_t^\top] = \sigma^2\mathbf{I}$. Thus, (2) can be written as $\mathbf{y}_t = (\mathbf{I}-\mathbf{A})^{-1}\mathbf{e}_t$, with $\mathbf{y}_t$ having covariance matrix $\mathbf{C}_y = \sigma^2(\mathbf{I}-\mathbf{A})^{-1}(\mathbf{I}-\mathbf{A})^{-\top}$. Matrices $\mathbf{C}_y$ and $\mathbf{A}$ are simultaneously diagonalizable, and hence $\mathbf{y}_t$ is a graph stationary process [23].
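The generative view above lends itself to a quick numerical sanity check. The sketch below is illustrative only; matrix sizes, sparsity levels, and the noise variance are our arbitrary choices, not values from the paper. It draws SEM samples and compares the empirical covariance of $\mathbf{y}_t$ with its model-implied counterpart.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 30, 20000

# Sparse directed adjacency matrix with zero diagonal, rescaled so
# that (I - A) is safely invertible.
A = rng.normal(scale=0.3, size=(N, N)) * (rng.random((N, N)) < 0.1)
np.fill_diagonal(A, 0.0)
rho = np.max(np.abs(np.linalg.eigvals(A)))
if rho > 0.6:
    A *= 0.6 / rho

# Draw SEM samples: y_t = A y_t + e_t  <=>  y_t = (I - A)^{-1} e_t.
E = rng.normal(size=(N, T))                # white noise, sigma = 1
Y = np.linalg.solve(np.eye(N) - A, E)      # N x T matrix of signals

# Empirical covariance approaches sigma^2 (I - A)^{-1} (I - A)^{-T}.
Minv = np.linalg.inv(np.eye(N) - A)
C_model = Minv @ Minv.T
C_hat = Y @ Y.T / T
```

With $T$ in the tens of thousands, `C_hat` matches `C_model` to within a few percent in Frobenius norm, consistent with the stationarity discussion.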
In order to unveil the hidden causal network topology, SVARMs postulate that each $y_{it}$ can be represented as a linear combination of instantaneous measurements at other nodes $\{y_{jt}\}_{j \neq i}$, and their time-lagged versions [9]. Specifically, the following instantaneous plus time-lagged model is advocated

(3) $y_{it} = \sum_{j \neq i} a_{ij} y_{jt} + \sum_{j=1}^{N} b_{ij} y_{j(t-1)} + e_{it}$
where $a_{ij}$ captures the instantaneous causal influence of node $j$ upon node $i$, $b_{ij}$ encodes the time-lagged causal influence between them, and $e_{it}$ accounts for unmodeled dynamics. By defining $\mathbf{y}_t := [y_{1t}, \ldots, y_{Nt}]^\top$, $\mathbf{e}_t := [e_{1t}, \ldots, e_{Nt}]^\top$, and the matrices $\mathbf{A}$ and $\mathbf{B}$ with entries $a_{ij}$ and $b_{ij}$, respectively, the matrix-vector form of (3) becomes

(4) $\mathbf{y}_t = \mathbf{A}\mathbf{y}_t + \mathbf{B}\mathbf{y}_{t-1} + \mathbf{e}_t$
with $t = 1, \ldots, T$, and $\mathbf{y}_0$ considered known. The SVARM in (4) is a better fit for time series over graphs compared to the SEM in (2), because it further accounts for temporal dynamics of $\mathbf{y}_t$ through the time-lagged influence term $\mathbf{B}\mathbf{y}_{t-1}$. For this reason, SVARMs will be employed for dynamic setups, such as modeling ECoG time series in brain networks, and predicting Internet router delays. The SVARM is depicted in Fig. 1.
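Simulating the SVARM differs from the SEM case only in the recursion over $t$: each snapshot is obtained by solving a linear system involving the previous one. A minimal sketch follows, with all sizes and coefficient scales chosen arbitrarily for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 20, 200
I = np.eye(N)

def sparse_mat(scale, density):
    """Random sparse coefficient matrix (values are illustrative)."""
    return rng.normal(scale=scale, size=(N, N)) * (rng.random((N, N)) < density)

A = sparse_mat(0.2, 0.1)
np.fill_diagonal(A, 0.0)       # no self-loops in the instantaneous term
B = sparse_mat(0.2, 0.1)       # time-lagged influences (self-lags allowed)

# Rescale so that (I - A) is invertible and the recursion
# y_t = (I - A)^{-1} (B y_{t-1} + e_t) is stable.
rho_A = np.max(np.abs(np.linalg.eigvals(A)))
if rho_A > 0.6:
    A *= 0.6 / rho_A
G = np.linalg.solve(I - A, B)
rho_G = np.max(np.abs(np.linalg.eigvals(G)))
if rho_G > 0.9:
    B *= 0.9 / rho_G

Y = np.zeros((N, T + 1))       # Y[:, 0] holds the known initial condition y_0
for t in range(1, T + 1):
    e_t = 0.1 * rng.normal(size=N)
    Y[:, t] = np.linalg.solve(I - A, B @ Y[:, t - 1] + e_t)
```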
II-B Problem statement
Application-specific constraints allow only for a limited number of samples across nodes per slot $t$. Suppose that noisy samples of the $t$th observation vector

(5) $\{z_{it} = y_{it} + \epsilon_{it}\}_{i \in \mathcal{M}_t}$
are available, where $\mathcal{M}_t \subseteq \mathcal{V}$ contains the indices of the sampled vertices, and $\epsilon_{it}$ models the observation error. With $M_t := |\mathcal{M}_t|$, $\mathbf{z}_t \in \mathbb{R}^{M_t}$ collecting the samples $\{z_{it}\}_{i \in \mathcal{M}_t}$, and $\boldsymbol{\epsilon}_t$ defined accordingly, the observation model is

(6) $\mathbf{z}_t = \mathbf{S}_t\mathbf{y}_t + \boldsymbol{\epsilon}_t$
where $\mathbf{S}_t$ is an $M_t \times N$ binary sampling matrix whose $m$th row has its entry at the column indexing the $m$th sampled vertex set to one, and the rest set to zero.
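Concretely, each sampling matrix is a row-selection operator. A small helper (the function name is ours) that builds it and applies it to a signal:

```python
import numpy as np

def sampling_matrix(sampled, N):
    """M_t x N selection matrix: row m has a single one at column sampled[m]."""
    S = np.zeros((len(sampled), N))
    S[np.arange(len(sampled)), sampled] = 1.0
    return S

S = sampling_matrix([0, 3, 7], N=10)
y = np.arange(10.0)
z = S @ y    # picks entries 0, 3, and 7 of y (noise would be added on top)
```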
The broad goal of this paper is the joint inference of the hidden network topology and signals over graphs (JISG) from partial observations of the latter. Given the observations $\{\mathbf{z}_t\}_{t=1}^T$ collected in accordance with the sampling matrices $\{\mathbf{S}_t\}_{t=1}^T$, one aims at finding the underlying topology, namely $\mathbf{A}$ for the SEM, or $\mathbf{A}$ and $\mathbf{B}$ for the SVARM, as well as reconstructing the graph process $\{\mathbf{y}_t\}_{t=1}^T$ at all nodes. The complexity of the estimators should preferably scale linearly in $T$. As estimating the topology and $\{\mathbf{y}_t\}$ relies on partial observations, this is a semi-blind inference task.
III Jointly inferring topology and signals
Given $\{\mathbf{z}_t\}_{t=1}^T$ in (6), this section develops a novel approach to infer $\mathbf{A}$ and $\{\mathbf{y}_t\}_{t=1}^T$. To this end, we advocate the following regularized least-squares (LS) optimization problem
(7) $\{\hat{\mathbf{A}}, \{\hat{\mathbf{y}}_t\}\} = \arg\min_{\mathbf{A},\{\mathbf{y}_t\}} \frac{1}{2}\sum_{t=1}^{T}\|\mathbf{y}_t - \mathbf{A}\mathbf{y}_t\|_2^2 + \frac{\mu}{2}\sum_{t=1}^{T}\|\mathbf{z}_t - \mathbf{S}_t\mathbf{y}_t\|_2^2 + \lambda_1\|\mathbf{A}\|_1 + \frac{\lambda_2}{2}\|\mathbf{A}\|_F^2 \quad \text{s.t.}\; a_{ii}=0,\; i=1,\ldots,N$
where $\mu > 0$ tunes the relative importance of the fitting term; $\lambda_1, \lambda_2 \geq 0$ control the effect of the $\ell_1$-norm and the Frobenius norm, respectively; and the constraint $a_{ii}=0$ excludes self-loops. The weighted sum of $\|\mathbf{A}\|_1$ and $\|\mathbf{A}\|_F^2$ is the so-termed elastic net penalty, which promotes connections between highly correlated nodal measurements. The elastic net targets the “sweet spot” between the $\ell_1$ regularizer that effects sparsity, and the $\ell_2$ regularizer, which advocates fully connected networks [37].
Even though (7) is nonconvex in both $\mathbf{A}$ and $\{\mathbf{y}_t\}$ due to the bilinear product $\mathbf{A}\mathbf{y}_t$, it is convex with respect to (w.r.t.) each block variable separately. This motivates an iterative block coordinate descent (BCD) algorithm that alternates between estimating $\mathbf{A}$ and $\{\mathbf{y}_t\}$.
Given $\hat{\mathbf{A}}$, the estimates $\{\hat{\mathbf{y}}_t\}_{t=1}^T$ are found by solving the quadratic problem

(8) $\{\hat{\mathbf{y}}_t\} = \arg\min_{\{\mathbf{y}_t\}} \frac{1}{2}\sum_{t=1}^{T}\|\mathbf{y}_t - \hat{\mathbf{A}}\mathbf{y}_t\|_2^2 + \frac{\mu}{2}\sum_{t=1}^{T}\|\mathbf{z}_t - \mathbf{S}_t\mathbf{y}_t\|_2^2$
where the regularization terms in (7) do not appear, since they do not depend on $\{\mathbf{y}_t\}$. Clearly, (8) conveniently decouples across $t$ as

(9) $\hat{\mathbf{y}}_t = \arg\min_{\mathbf{y}_t} \frac{1}{2}\|(\mathbf{I} - \hat{\mathbf{A}})\mathbf{y}_t\|_2^2 + \frac{\mu}{2}\|\mathbf{z}_t - \mathbf{S}_t\mathbf{y}_t\|_2^2$
The first quadratic in (9) can be written as $\frac{1}{2}\mathbf{y}_t^\top(\mathbf{I}-\hat{\mathbf{A}})^\top(\mathbf{I}-\hat{\mathbf{A}})\mathbf{y}_t$, and it can be viewed as a regularizer for $\mathbf{y}_t$, promoting graph signals with similar values at neighboring nodes. Notice that (9) may not be strongly convex, since the matrix $(\mathbf{I}-\hat{\mathbf{A}})^\top(\mathbf{I}-\hat{\mathbf{A}}) + \mu\mathbf{S}_t^\top\mathbf{S}_t$ could be rank deficient. Nonetheless, since the objective in (9) is smooth, it can be readily solved via gradient descent (GD) iterations
(10) $\mathbf{y}_t^{(k+1)} = \mathbf{y}_t^{(k)} - s_k\left[(\mathbf{I}-\hat{\mathbf{A}})^\top(\mathbf{I}-\hat{\mathbf{A}})\mathbf{y}_t^{(k)} + \mu\,\mathbf{S}_t^\top(\mathbf{S}_t\mathbf{y}_t^{(k)} - \mathbf{z}_t)\right]$
where $k$ denotes the iteration index, and $s_k$ is the step size chosen e.g. by the Armijo rule [7]. The computational cost of (10) is dominated by the matrix-vector multiplications involving $\mathbf{I}-\hat{\mathbf{A}}$, which are proportional to $\mathrm{nnz}(\hat{\mathbf{A}})$, the number of nonzero entries of $\hat{\mathbf{A}}$. Moreover, the learned $\hat{\mathbf{A}}$ is expected to be sparse due to the $\ell_1$ regularizer in (7), which renders the first-order iterations (10) computationally attractive, especially when graphs are large. The GD iterations (10) are run in parallel across $t$ until convergence to a minimizer of (9).
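A minimal implementation of the per-slot signal update via these GD iterations might look as follows; for simplicity it uses a fixed step size equal to the inverse Lipschitz constant of the gradient instead of the Armijo rule, and the function and variable names are ours:

```python
import numpy as np

def estimate_signal(A_hat, S, z, mu=1.0, iters=500):
    """GD for the per-slot signal update (9):
    min_y 0.5 ||(I - A_hat) y||^2 + (mu/2) ||z - S y||^2.
    A fixed step 1/L (inverse Lipschitz constant) replaces the Armijo rule."""
    N = A_hat.shape[0]
    M = np.eye(N) - A_hat
    H = M.T @ M + mu * (S.T @ S)             # Hessian of the objective
    step = 1.0 / np.linalg.eigvalsh(H).max()
    y = np.zeros(N)
    for _ in range(iters):
        grad = M.T @ (M @ y) + mu * S.T @ (S @ y - z)
        y -= step * grad
    return y
```

In a large-graph setting one would of course store `A_hat` and `S` as sparse matrices so that each gradient evaluation costs time proportional to the number of nonzero entries, as discussed above.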
On the other hand, with $\{\hat{\mathbf{y}}_t\}$ available, $\hat{\mathbf{A}}$ is found via

(11) $\hat{\mathbf{A}} = \arg\min_{\mathbf{A}:\,a_{ii}=0} \frac{1}{2}\sum_{t=1}^{T}\|\hat{\mathbf{y}}_t - \mathbf{A}\hat{\mathbf{y}}_t\|_2^2 + \lambda_1\|\mathbf{A}\|_1 + \frac{\lambda_2}{2}\|\mathbf{A}\|_F^2$
where the LS observation error in (7) has been omitted, since it does not depend on $\mathbf{A}$. Note that (11) is strongly convex for $\lambda_2 > 0$, and as such it admits a unique minimizer. Hence, we adopt the alternating direction method of multipliers (ADMM), which guarantees convergence to the global minimum; see e.g. [13]. The derivation of the algorithm is omitted due to lack of space; instead, the detailed derivation of an ADMM solver for a more general setting will be presented in Sec. IV-A.
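Since the ADMM derivation is deferred, a simple alternative solver for the same convex subproblem is sketched below using proximal gradient (ISTA) iterations: the soft-thresholding step handles the $\ell_1$ term, and the zero-diagonal constraint is enforced by projection. This is our illustrative substitute for the paper's ADMM solver, with arbitrary parameter values:

```python
import numpy as np

def soft(X, tau):
    """Entry-wise soft-thresholding operator (prox of the l1 norm)."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def estimate_topology(Y, lam1=0.1, lam2=0.1, iters=1000):
    """ISTA for the elastic-net topology update (11):
    min_A 0.5 sum_t ||y_t - A y_t||^2 + lam1 ||A||_1 + (lam2/2) ||A||_F^2,
    subject to a zero diagonal. Y stacks the signals y_t as columns."""
    N = Y.shape[0]
    G = Y @ Y.T                              # Gram matrix, shared by all rows
    L = np.linalg.eigvalsh(G).max() + lam2   # Lipschitz constant of smooth part
    A = np.zeros((N, N))
    for _ in range(iters):
        grad = (A @ Y - Y) @ Y.T + lam2 * A
        A = soft(A - grad / L, lam1 / L)
        np.fill_diagonal(A, 0.0)             # project onto a_ii = 0
    return A
```

Because the $\ell_1$ prox and the diagonal constraint act on disjoint entries, thresholding followed by zeroing the diagonal is the exact proximal step, so the iterations inherit the standard ISTA convergence guarantee for this strongly convex problem.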
The BCD solver for JISG is summarized as Algorithm 1. JISG converges at least to a stationary point of (7), as asserted by the ensuing proposition.
Proposition 1. The sequence of iterates generated by the JISG solver (Algorithm 1) converges to a stationary point of (7).
Proof.
The basic convergence results of BCD have been established in [36]. First, notice that all the terms in (7) are differentiable over their open domain except the nondifferentiable $\ell_1$-norm, which is however separable. These observations establish, based on [36, Lemma 3.1], that the objective of (7) is regular at each coordinatewise minimum point, and therefore every such point is a stationary point of (7). Moreover, the objective is continuous and convex per block variable. Hence, by appealing to [36, Theorem 5.1], the sequence of iterates generated by JISG converges monotonically to a coordinatewise minimum point, and consequently to a stationary point of (7). ∎
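To make the overall alternation concrete, the following self-contained sketch implements a simplified variant of the BCD scheme in which the $\ell_1$ term is dropped (ridge penalty only), so that both block updates reduce to exact linear solves. It illustrates the alternation, not the paper's Algorithm 1; all names and parameter values are ours:

```python
import numpy as np

def jisg_ridge(Z, S_list, mu=1.0, lam2=0.1, n_iters=10):
    """Simplified BCD alternation for joint topology/signal inference:
    the l1 term of (7) is dropped, so the signal step is the per-slot
    quadratic (9) and the topology step is row-wise ridge regression
    with a zero-diagonal constraint; both are exact linear solves.
    Z: list of observation vectors z_t; S_list: sampling matrices S_t."""
    N = S_list[0].shape[1]
    T = len(Z)
    A = np.zeros((N, N))
    Y = np.zeros((N, T))
    for _ in range(n_iters):
        # Signal step: closed form of (9) per time slot
        # (tiny jitter keeps the solve well posed if H is rank deficient).
        M = np.eye(N) - A
        for t in range(T):
            H = M.T @ M + mu * S_list[t].T @ S_list[t] + 1e-9 * np.eye(N)
            Y[:, t] = np.linalg.solve(H, mu * S_list[t].T @ Z[t])
        # Topology step: each row i is a ridge regression of node i's
        # signal onto the signals of the remaining N-1 nodes.
        for i in range(N):
            idx = [j for j in range(N) if j != i]
            Yi = Y[idx, :]
            Gi = Yi @ Yi.T + lam2 * np.eye(N - 1)
            A[i, idx] = np.linalg.solve(Gi, Yi @ Y[i, :])
    return A, Y
```

Since each block update minimizes its subproblem exactly, the (ridge) objective is nonincreasing across iterations, mirroring the monotone convergence asserted by the proposition.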
A few remarks are now in order.
Remark 1.
A popular alternative to the elastic net regularizer is the nuclear norm, which promotes low rank of the learned adjacency matrix, a well-motivated attribute when the graph is expected to exhibit clustered structure [10].
Remark 2.
Oftentimes, prior information about $\mathbf{A}$ may be available, e.g., the support of $\mathbf{A}$; nonnegative edge weights $a_{ij} \geq 0$; or the values of $a_{ij}$ for certain pairs $(i,j)$. Such prior information can be easily incorporated in (7) by adjusting the constraints, and the ADMM solver accordingly.
Remark 3.
The estimator in (8) that relies on SEMs is capable of estimating functions over directed graphs as well as undirected ones, while kernel-based approaches [35] and estimators that rely on the graph Fourier transform [34] are usually confined to undirected graphs.
Remark 4.
In real-world networks, sets of nodes may depend upon each other via multiple types of relationships, which ordinary networks cannot capture [19]. Consequently, generalizing the traditional single-layer to multilayer networks that organize the nodes into different groups, called layers, is well motivated. Such layer structure can be incorporated in (7) via appropriate regularization; see e.g. [17]. Thus, the JISG estimator can also accommodate multilayer graphs.
IV Jointly inferring graphs and processes over time
Realworld networks often involve processes that vary over time, with dynamics not captured by SEMs. This section considers an alternative based on SVARMs that allows for joint inference of dynamic network processes and graphs.
IV-A Batch solver for JISG over time
Given $\{\mathbf{z}_t\}_{t=1}^T$, this section develops an efficient approach to infer $\mathbf{A}$, $\mathbf{B}$, and $\{\mathbf{y}_t\}_{t=1}^T$. Clearly, to cope with the underdetermined system of equations (4) and (6), one has to exploit the structure in $\mathbf{A}$, $\mathbf{B}$, and $\{\mathbf{y}_t\}$. This prompts the following regularized LS objective
(12) $\min_{\mathbf{A},\mathbf{B},\{\mathbf{y}_t\}} \frac{1}{2}\sum_{t=2}^{T}\|\mathbf{y}_t - \mathbf{A}\mathbf{y}_t - \mathbf{B}\mathbf{y}_{t-1}\|_2^2 + \frac{1}{2}\|\mathbf{y}_1 - \mathbf{A}\mathbf{y}_1 - \mathbf{B}\mathbf{y}_0\|_2^2 + \frac{\mu}{2}\sum_{t=1}^{T}\|\mathbf{z}_t - \mathbf{S}_t\mathbf{y}_t\|_2^2 + \lambda_1\|\mathbf{A}\|_1 + \frac{\lambda_2}{2}\|\mathbf{A}\|_F^2 + \lambda_1\|\mathbf{B}\|_1 + \frac{\lambda_2}{2}\|\mathbf{B}\|_F^2$

where $\mu > 0$ is a regularization scalar weighting the fit to the observations, and $\lambda_1\|\cdot\|_1 + \frac{\lambda_2}{2}\|\cdot\|_F^2$ is the elastic net regularizer for the connectivity matrices. The first sum accounts for the LS fitting error of the SVARM, and the second LS cost accounts for the initial conditions. The third term sums the measurement error over $t = 1, \ldots, T$. Finally, the elastic net penalty terms on $\mathbf{A}$ and $\mathbf{B}$ favor connections among highly correlated nodes; see also the discussion after (7).
The optimization problem in (12) is nonconvex due to the bilinear terms $\mathbf{A}\mathbf{y}_t$ and $\mathbf{B}\mathbf{y}_{t-1}$; nevertheless, it is convex w.r.t. each of the block variables separately. Next, an efficient algorithm based on BCD is put forth that provably attains a stationary point of (12). With $\hat{\mathbf{A}}$ and $\hat{\mathbf{B}}$ available, the following objective yields the estimates $\{\hat{\mathbf{y}}_t\}_{t=1}^T$
(13) $\{\hat{\mathbf{y}}_t\} = \arg\min_{\{\mathbf{y}_t\}} \frac{1}{2}\sum_{t=1}^{T}\|\mathbf{y}_t - \hat{\mathbf{A}}\mathbf{y}_t - \hat{\mathbf{B}}\mathbf{y}_{t-1}\|_2^2 + \frac{\mu}{2}\sum_{t=1}^{T}\|\mathbf{z}_t - \mathbf{S}_t\mathbf{y}_t\|_2^2$
where $\hat{\mathbf{y}}_t$ denotes the estimate of $\mathbf{y}_t$ given $\{\mathbf{z}_t\}_{t=1}^T$. Different from (8), the time-lagged dependencies couple the objective in (13) across $t$. Upon defining that is assumed invertible,