Semi-Blind Inference of Topologies and Dynamical Processes over Graphs

05/16/2018
by   Vassilis N. Ioannidis, et al.

Network science provides valuable insights across numerous disciplines including sociology, biology, neuroscience and engineering. A task of major practical importance in these application domains is inferring the network structure from noisy observations at a subset of nodes. Available methods for topology inference typically assume that the process over the network is observed at all nodes. However, application-specific constraints may prevent acquiring network-wide observations. Alleviating the limited flexibility of existing approaches, this work advocates structural models for graph processes and develops novel algorithms for joint inference of the network topology and processes from partial nodal observations. Structural equation models (SEMs) and structural vector autoregressive models (SVARMs) have well-documented merits in identifying even directed topologies of complex graphs; while SEMs capture contemporaneous causal dependencies among nodes, SVARMs further account for time-lagged influences. This paper develops algorithms that iterate between inferring directed graphs that "best" fit the data, and estimating the network processes at reduced computational complexity by leveraging tools related to Kalman smoothing. To further accommodate delay-sensitive applications, an online joint inference approach is put forth that even tracks time-evolving topologies. Furthermore, conditions for identifying the network topology given partial observations are specified. It is proved that the required number of observations for unique identification reduces significantly when the network structure is sparse. Numerical tests with synthetic as well as real datasets corroborate the effectiveness of the novel approach.


I Introduction

Modeling vertex attributes as processes that take values over a graph allows for data processing tasks, such as filtering, inference, and compression, while accounting for information captured by the network topology [34, 20]. However, if the topology is unavailable, inaccurate or even unrelated to the process of interest, performance of the associated task may degrade severely. For example, consider a social graph where the goal is to predict the salaries of all individuals given the salaries of some. Graph-based inference approaches that assume smoothness of the salary over the given graph, may fall short if the salary is dissimilar among friends.

Topology identification is possible when observations at all nodes are available by employing structural models, see e.g., [18]. However, in many real settings one can only afford to collect nodal observations from a subset of nodes due to application-specific restrictions. For example, sampling all nodes may be prohibitive in massive graphs; in social networks, individuals may be reluctant to share personal information due to privacy concerns; in sensor networks, devices may report measurements sporadically to save energy; and in gene regulatory networks, gene expression data may contain misses due to experimental errors. In this context, the present paper relies on SEMs [18] and SVARMs [9], and aims at jointly inferring the network topology and estimating graph signals, given noisy observations at a subset of nodes.

SEMs provide a statistical framework for inference of causal relationships among nodes [18, 12]. Linear SEMs have been widely adopted in fields as diverse as sociometrics [14], psychometrics [24], recommender systems [26], and genetics [8]. Conditions for identifying the network topology under the SEM have also been provided [6, 32], but they require observations of the process at all nodes. Recently, nonlinear SEMs have been developed to also capture nonlinear interactions [33]. On the other hand, SVARMs postulate that nodes further exert time-lagged dependencies on one another, and are appropriate for modeling multivariate time series [9]. Nonlinear SVARMs have been employed to identify directed dependencies between regions of interest in the brain [31]. Other approaches identify undirected topologies provided that the graph signals are smooth over the graph [11], or that the observed process is graph-bandlimited [30]. All these contemporary approaches assume that samples of the graph process are available at all nodes. However, acquiring network-wide observations may incur prohibitive sampling costs, especially for massive networks.

Methods for inference of graph signals (or processes) typically assume that the network topology is known and undirected, and that the graph signal is smooth, in the sense that neighboring vertices have similar values [35]. Parametric approaches adopt the graph-bandlimited model [5, 25], which postulates that the signal lies in a low-dimensional subspace spanned by graph-related basis vectors; see [22] for time-varying signals. Nonparametric techniques employ kernels on graphs for inference [35, 28]; see also [15] for semi-parametric alternatives. Online data-adaptive algorithms for reconstruction of dynamic processes over dynamic graphs have been proposed in [16], where kernel dictionaries are generated from the network topology. However, performance of the aforementioned techniques may degrade when the process of interest is not smooth over the adopted graph.

To recapitulate, existing approaches either infer the graph process given the known topology and nodal observations, or estimate the network topology given the process values at all nodes. The present paper fills the gap between these two settings by introducing algorithms based on SEMs and SVARMs for joint inference of network topologies and of graph processes over the underlying graph. The approach is semi-blind because it performs the joint estimation task with only partial observations over the network nodes. Specifically, the contribution is threefold.

  • A novel approach is proposed for joint inference of directed network topologies and signals over the underlying graph using SEMs. An efficient algorithm is developed with provable convergence at least to a stationary point.

  • To further accommodate temporal dynamics, we advocate a SVARM to infer dynamic processes and graphs. A batch solver is provided that alternates between topology estimation and signal inference with linear complexity across time. Furthermore, a novel online algorithm is developed that performs real-time joint estimation, and tracks time-evolving topologies.

  • Analysis of the partially observed noiseless SEM is provided that establishes sufficient conditions for identifiability of the unknown topology. These conditions suggest that the required number of observations for identification reduces significantly when the network exhibits edge sparsity.

The rest of the paper is organized as follows. Sec. II reviews the SEM and SVARM, and states the problem. Sec. III presents a novel estimator for joint inference based on SEMs. Sec. IV develops both batch and online algorithms for inferring dynamic processes and networks using SVARMs. Sec. V presents the identifiability results of the partially observed SEM. Finally, numerical experiments and conclusions are presented in Secs. VI and VII, respectively.

Notation: Scalars are denoted by lowercase, column vectors by bold lowercase, and matrices by bold uppercase letters. Superscripts $^\top$ and $^{-1}$ respectively denote transpose and inverse, while $\mathbf{1}$ stands for the all-one vector and $\mathbf{0}$ for an all-zero block of appropriate size. Finally, if $\mathbf{X}$ is a matrix and $\mathbf{x}$ a vector, then $\|\mathbf{X}\|_1 := \sum_{i,j} |x_{ij}|$ denotes the $\ell_1$-norm of the vectorized matrix, $\|\mathbf{x}\|_2$ is the $\ell_2$-norm of $\mathbf{x}$, and $\|\mathbf{X}\|_F$ is the Frobenius norm of $\mathbf{X}$.

II Structural models and problem formulation

Consider a network with $N$ nodes modeled by the graph $\mathcal{G} := (\mathcal{V}, \mathbf{A})$, where $\mathcal{V} := \{v_1, \ldots, v_N\}$ is the set of vertices and $\mathbf{A} \in \mathbb{R}^{N \times N}$ denotes the adjacency matrix, whose $(i,j)$-th entry $a_{ij}$ represents the weight of the directed edge from $v_j$ to $v_i$. A real-valued process (or signal) on $\mathcal{G}$ is a map $y: \mathcal{V} \rightarrow \mathbb{R}$. In social networks (e.g., Twitter) over which information diffuses, $y_t(v_i)$ could represent the timestamp when subscriber $v_i$ tweeted about a viral story $t$. Since real-world networks often exhibit edge sparsity, $\mathbf{A}$ has only a few nonzero entries.

II-A Structural models

The linear SEM [14] postulates that $y_{it}$, the value of the process at node $v_i$ in the $t$-th sample, depends linearly on $\{y_{jt}\}_{j \neq i}$, which amounts to

$$y_{it} = \sum_{j \neq i} a_{ij} y_{jt} + e_{it}, \quad i = 1, \ldots, N \qquad (1)$$

where the unknown coefficient $a_{ij}$ captures the causal influence of node $v_j$ upon node $v_i$, and $e_{it}$ accounts for unmodeled dynamics. Clearly, (1) suggests that $y_{it}$ is influenced directly by nodes in its neighborhood $\mathcal{N}_i := \{j : a_{ij} \neq 0\}$. With the vectors $\mathbf{y}_t := [y_{1t}, \ldots, y_{Nt}]^\top$ and $\mathbf{e}_t := [e_{1t}, \ldots, e_{Nt}]^\top$, (1) can be written in matrix-vector form as

$$\mathbf{y}_t = \mathbf{A} \mathbf{y}_t + \mathbf{e}_t. \qquad (2)$$

SEMs have been successful in a host of applications, including gene regulatory networks [8] and recommender systems [26]. Therefore, the index $t$ does not necessarily indicate time, but may represent different individuals (gene regulatory networks) or movies (recommender systems). An interesting consequence emerges if one considers $\mathbf{e}_t$ as a random process with $\mathbb{E}[\mathbf{e}_t \mathbf{e}_t^\top] = \sigma^2 \mathbf{I}$. Thus, (2) can be written as $\mathbf{y}_t = (\mathbf{I} - \mathbf{A})^{-1} \mathbf{e}_t$, with $\mathbf{y}_t$ having covariance matrix $\mathbf{C}_y := \sigma^2 (\mathbf{I} - \mathbf{A})^{-1} (\mathbf{I} - \mathbf{A})^{-\top}$. Matrices $\mathbf{A}$ and $\mathbf{C}_y$ are simultaneously diagonalizable, and hence $\mathbf{y}_t$ is a graph stationary process [23].
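To make the model concrete, the following minimal Python sketch simulates data from the linear SEM (2). All names are illustrative; the sparsity level and the rescaling of $\mathbf{A}$ (so that $\mathbf{I} - \mathbf{A}$ is invertible) are assumptions, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 50, 200                         # number of nodes and samples

# Sparse directed adjacency with zero diagonal, rescaled so that the
# spectral radius is below one and (I - A) is invertible.
A = rng.normal(size=(N, N)) * (rng.random((N, N)) < 0.05)
np.fill_diagonal(A, 0.0)
A *= 0.9 / max(np.abs(np.linalg.eigvals(A)).max(), 1e-12)

# Draw exogenous inputs e_t and solve y_t = A y_t + e_t per sample,
# i.e., y_t = (I - A)^{-1} e_t, as in (2).
E = rng.normal(size=(N, T))
Y = np.linalg.solve(np.eye(N) - A, E)  # column t holds y_t
```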

Fig. 1: The SVARM (4); instantaneous dependencies (blue arrows) and time-lagged influences (red arrows).

In order to unveil the hidden causal network topology, SVARMs postulate that each $y_{it}$ can be represented as a linear combination of the instantaneous measurements at other nodes $\{y_{jt}\}_{j \neq i}$, and their time-lagged versions $\{y_{j(t-1)}\}_{j=1}^N$ [9]. Specifically, the following instantaneous plus time-lagged model is advocated

$$y_{it} = \sum_{j \neq i} a_{ij} y_{jt} + \sum_{j=1}^{N} b_{ij} y_{j(t-1)} + e_{it} \qquad (3)$$

where $a_{ij}$ captures the instantaneous causal influence of node $v_j$ upon node $v_i$, $b_{ij}$ encodes the time-lagged causal influence between them, and $e_{it}$ accounts for unmodeled dynamics. By defining $\mathbf{y}_t := [y_{1t}, \ldots, y_{Nt}]^\top$, $\mathbf{e}_t := [e_{1t}, \ldots, e_{Nt}]^\top$, and the matrices $\mathbf{A}$ and $\mathbf{B}$ with entries $a_{ij}$ and $b_{ij}$, respectively, the matrix-vector form of (3) becomes

$$\mathbf{y}_t = \mathbf{A} \mathbf{y}_t + \mathbf{B} \mathbf{y}_{t-1} + \mathbf{e}_t, \quad t = 1, \ldots, T \qquad (4)$$

with $a_{ii} = 0$ for all $i$, and $\mathbf{y}_0$ considered known. The SVARM in (4) is a better fit for time series over graphs than the SEM in (2), because it further accounts for the temporal dynamics of $\mathbf{y}_t$ through the time-lagged influence term $\mathbf{B} \mathbf{y}_{t-1}$. For this reason, SVARMs will be employed for dynamic setups, such as modeling ECoG time series in brain networks, and predicting Internet router delays. The SVARM is depicted in Fig. 1.
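A companion sketch generates an SVARM trajectory per (4), reusing A from the SEM snippet above; the lag matrix B and its scaling are illustrative assumptions.

```python
# Hypothetical sparse time-lagged influence matrix B.
B = rng.normal(scale=0.3, size=(N, N)) * (rng.random((N, N)) < 0.05)

y = np.zeros(N)                        # known initial condition y_0
Y_svarm = np.empty((N, T))
inv_IA = np.linalg.inv(np.eye(N) - A)  # (I - A)^{-1}
for t in range(T):
    e = rng.normal(size=N)
    y = inv_IA @ (B @ y + e)           # solve y_t = A y_t + B y_{t-1} + e_t
    Y_svarm[:, t] = y
```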

II-B Problem statement

Application-specific constraints allow only for a limited number of samples $M_t \leq N$ across nodes per slot $t$. Suppose that noisy samples of the $t$-th observation vector $\mathbf{y}_t$

$$z_{it} = y_{it} + v_{it}, \quad i \in \mathcal{S}_t \qquad (5)$$

are available, where $\mathcal{S}_t := \{i_1, \ldots, i_{M_t}\}$ contains the indices of the sampled vertices, and $v_{it}$ models the observation error. With $\mathbf{z}_t := [z_{i_1 t}, \ldots, z_{i_{M_t} t}]^\top$ and $\mathbf{v}_t$ defined accordingly, the observation model is

$$\mathbf{z}_t = \mathbf{S}_t \mathbf{y}_t + \mathbf{v}_t \qquad (6)$$

where $\mathbf{S}_t$ is an $M_t \times N$ sampling matrix with entries $(m, i_m)$, $m = 1, \ldots, M_t$, set to one, and the rest set to zero.

The broad goal of this paper is the joint inference of the hidden network topology and signals over graphs (JISG) from partial observations of the latter. Given the observations $\{\mathbf{z}_t\}_{t=1}^T$ collected in accordance with the sampling matrices $\{\mathbf{S}_t\}_{t=1}^T$, one aims at finding the underlying topology, namely $\mathbf{A}$ for the SEM, or $\mathbf{A}$ and $\mathbf{B}$ for the SVARM, as well as reconstructing the graph process $\{\mathbf{y}_t\}_{t=1}^T$ at all nodes. The complexity of the estimators should preferably scale linearly in $T$. As estimating the topology and $\{\mathbf{y}_t\}_{t=1}^T$ relies on partial observations, this is a semi-blind inference task.
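The sampling model (5)-(6) translates directly into code; the subset size, noise level, and helper name below are assumptions for illustration.

```python
def sample(y, M_t=20, noise=0.1):
    """Observe M_t randomly chosen nodes of y with additive noise, per (5)-(6)."""
    idx = np.sort(rng.choice(len(y), size=M_t, replace=False))  # indices in S_t
    S = np.zeros((M_t, len(y)))
    S[np.arange(M_t), idx] = 1.0       # entry (m, i_m) set to one
    return S @ y + noise * rng.normal(size=M_t), S

# Partial, noisy observations of the SEM process simulated earlier.
Zs, Ss = zip(*[sample(Y[:, t]) for t in range(T)])
```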

III Jointly inferring topology and signals

Given $\{\mathbf{z}_t, \mathbf{S}_t\}_{t=1}^T$ in (6), this section develops a novel approach to infer $\mathbf{A}$ and $\{\mathbf{y}_t\}_{t=1}^T$. To this end, we advocate the following regularized least-squares (LS) optimization problem

$$\min_{\mathbf{A}, \{\mathbf{y}_t\}} \; \frac{1}{2} \sum_{t=1}^{T} \|\mathbf{y}_t - \mathbf{A} \mathbf{y}_t\|_2^2 + \frac{\mu}{2} \sum_{t=1}^{T} \|\mathbf{z}_t - \mathbf{S}_t \mathbf{y}_t\|_2^2 + \lambda_1 \|\mathbf{A}\|_1 + \frac{\lambda_2}{2} \|\mathbf{A}\|_F^2 \quad \text{s.t. } a_{ii} = 0, \; i = 1, \ldots, N \qquad (7)$$

where $\mu > 0$ tunes the relative importance of the fitting term, while $\lambda_1, \lambda_2 \geq 0$ control the effect of the $\ell_1$-norm and the Frobenius-norm, respectively. The weighted sum of $\|\mathbf{A}\|_1$ and $\|\mathbf{A}\|_F^2$ is the so-termed elastic net penalty, which promotes connections between highly correlated nodal measurements. The elastic net targets the "sweet spot" between the $\ell_1$ regularizer that effects sparsity, and the $\ell_2$ regularizer, which advocates fully connected networks [37].

Even though (7) is nonconvex in both $\mathbf{A}$ and $\{\mathbf{y}_t\}_{t=1}^T$ due to the bilinear product $\mathbf{A} \mathbf{y}_t$, it is convex with respect to (w.r.t.) each block variable separately. This motivates an iterative block coordinate descent (BCD) algorithm that alternates between estimating $\mathbf{A}$ and $\{\mathbf{y}_t\}_{t=1}^T$.
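As a reference point for the solvers below, the cost in (7) can be evaluated with a few lines of Python; the function and argument names are illustrative.

```python
def jisg_cost(A, Ys, Zs, Ss, mu, lam1, lam2):
    """Evaluate the regularized LS objective (7) at the given iterates."""
    fit = sum(0.5 * np.linalg.norm(y - A @ y) ** 2 for y in Ys)
    obs = sum(0.5 * mu * np.linalg.norm(z - S @ y) ** 2
              for z, S, y in zip(Zs, Ss, Ys))
    enet = lam1 * np.abs(A).sum() + 0.5 * lam2 * np.linalg.norm(A, 'fro') ** 2
    return fit + obs + enet
```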

Given $\mathbf{A}$, the estimates $\{\hat{\mathbf{y}}_t\}_{t=1}^T$ are found by solving the quadratic problem

$$\{\hat{\mathbf{y}}_t\}_{t=1}^{T} = \arg\min_{\{\mathbf{y}_t\}} \; \frac{1}{2} \sum_{t=1}^{T} \|\mathbf{y}_t - \mathbf{A} \mathbf{y}_t\|_2^2 + \frac{\mu}{2} \sum_{t=1}^{T} \|\mathbf{z}_t - \mathbf{S}_t \mathbf{y}_t\|_2^2 \qquad (8)$$

where the regularization terms in (7) do not appear. Clearly, (8) conveniently decouples across $t$ as

$$\hat{\mathbf{y}}_t = \arg\min_{\mathbf{y}} \; f(\mathbf{y}) := \frac{1}{2} \|(\mathbf{I} - \mathbf{A}) \mathbf{y}\|_2^2 + \frac{\mu}{2} \|\mathbf{z}_t - \mathbf{S}_t \mathbf{y}\|_2^2. \qquad (9)$$

The first quadratic in (9) can be written as $\frac{1}{2} \mathbf{y}^\top (\mathbf{I} - \mathbf{A})^\top (\mathbf{I} - \mathbf{A}) \mathbf{y}$, and it can be viewed as a regularizer for $\mathbf{y}$, promoting graph signals with similar values at neighboring nodes. Notice that (9) may not be strongly convex, since its Hessian $(\mathbf{I} - \mathbf{A})^\top (\mathbf{I} - \mathbf{A}) + \mu \mathbf{S}_t^\top \mathbf{S}_t$ could be rank deficient. Nonetheless, since $f$ is smooth, (9) can be readily solved via gradient descent (GD) iterations

$$\mathbf{y}^{(k+1)} = \mathbf{y}^{(k)} - \eta_k \left[ (\mathbf{I} - \mathbf{A})^\top (\mathbf{I} - \mathbf{A}) \mathbf{y}^{(k)} - \mu \mathbf{S}_t^\top (\mathbf{z}_t - \mathbf{S}_t \mathbf{y}^{(k)}) \right] \qquad (10)$$

where $k$ denotes the iteration index, and $\eta_k$ is the stepsize chosen e.g. by the Armijo rule [7]. The computational cost of (10) is dominated by the matrix-vector multiplication of $\mathbf{A}$ with $\mathbf{y}^{(k)}$, which is proportional to $\mathrm{nnz}(\mathbf{A})$, the number of nonzero entries of $\mathbf{A}$. Moreover, the learned $\hat{\mathbf{A}}$ is expected to be sparse due to the $\ell_1$ regularizer in (7), which renders the first-order iterations (10) computationally attractive, especially when graphs are large. The GD iterations (10) are run in parallel across $t$ until convergence to a minimizer of (9).
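A minimal GD sketch for the per-slot subproblem (9) follows; it uses a fixed stepsize rather than the Armijo rule of the text, and dense matrices for brevity (a sparse $\mathbf{A}$ would use scipy.sparse in practice).

```python
def estimate_signal(A, z_t, S_t, mu=1.0, eta=0.05, iters=500):
    """Gradient descent on (9) for one time slot."""
    I = np.eye(A.shape[0])
    y = S_t.T @ z_t                    # simple initialization from the samples
    for _ in range(iters):
        grad = (I - A).T @ ((I - A) @ y) - mu * S_t.T @ (z_t - S_t @ y)
        y -= eta * grad                # GD update (10)
    return y
```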

On the other hand, with $\{\hat{\mathbf{y}}_t\}_{t=1}^T$ available, $\hat{\mathbf{A}}$ is found via

$$\hat{\mathbf{A}} = \arg\min_{\mathbf{A}: \, a_{ii} = 0} \; \frac{1}{2} \sum_{t=1}^{T} \|\hat{\mathbf{y}}_t - \mathbf{A} \hat{\mathbf{y}}_t\|_2^2 + \lambda_1 \|\mathbf{A}\|_1 + \frac{\lambda_2}{2} \|\mathbf{A}\|_F^2 \qquad (11)$$

where the LS observation error in (7) has been omitted. Note that (11) is strongly convex, and as such it admits a unique minimizer. Hence, we adopt the alternating direction method of multipliers (ADMM), which guarantees convergence to the global minimum; see e.g. [13]. The derivation of the algorithm is omitted due to lack of space; instead, the detailed derivation of an ADMM solver for a more general setting will be presented in Sec. IV-A.
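The paper's solver for (11) is ADMM; as a simpler stand-in, the sketch below applies proximal gradient (ISTA) to the same objective, with soft-thresholding for the $\ell_1$ term and a projection enforcing the zero diagonal. All names are illustrative.

```python
def estimate_topology(Ys, lam1=0.1, lam2=0.1, iters=500):
    """Proximal-gradient surrogate for the ADMM step solving (11)."""
    Ymat = np.stack(Ys, axis=1)        # N x T matrix of signal estimates
    G = Ymat @ Ymat.T                  # Gram matrix; smooth part of (11) is
                                       # 0.5*||Y - A Y||_F^2 + 0.5*lam2*||A||_F^2
    N = G.shape[0]
    eta = 1.0 / (np.linalg.eigvalsh(G).max() + lam2)   # 1/Lipschitz stepsize
    A = np.zeros((N, N))
    for _ in range(iters):
        grad = A @ G - G + lam2 * A    # gradient of the smooth terms
        A -= eta * grad
        A = np.sign(A) * np.maximum(np.abs(A) - eta * lam1, 0.0)  # l1 prox
        np.fill_diagonal(A, 0.0)       # enforce a_ii = 0
    return A
```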

The BCD solver for JISG is summarized as Algorithm 1. JISG converges at least to a stationary point of (7), as asserted by the ensuing proposition.

Input: Observations $\{\mathbf{z}_t\}_{t=1}^T$; sampling matrices $\{\mathbf{S}_t\}_{t=1}^T$; and regularization parameters $\mu, \lambda_1, \lambda_2$
1:  Initialize: $\{\hat{\mathbf{y}}_t[0]\}_{t=1}^T$, $k = 0$
2:  while iterates have not converged do
3:      Estimate $\hat{\mathbf{A}}[k+1]$ from (11) using ADMM.
4:      Update $\{\hat{\mathbf{y}}_t[k+1]\}_{t=1}^T$ using (9) and (10).
5:      $k \leftarrow k + 1$
6:  end while
Output: $\hat{\mathbf{A}}$, $\{\hat{\mathbf{y}}_t\}_{t=1}^T$.
Algorithm 1 Joint Infer. of Signals and Graphs (JISG)
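Tying the two block updates together, a compact BCD loop mirroring Algorithm 1 could read as follows, reusing the hypothetical helpers sketched above (with proximal gradient standing in for the ADMM step of line 3).

```python
A_hat = np.zeros((N, N))               # initialize the topology estimate
for _ in range(20):                    # outer BCD iterations
    Ys_hat = [estimate_signal(A_hat, z, S) for z, S in zip(Zs, Ss)]
    A_hat = estimate_topology(Ys_hat)
```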
Proposition 1.

The sequence of iterates $\{\hat{\mathbf{A}}[k], \{\hat{\mathbf{y}}_t[k]\}_{t=1}^T\}_k$, resulting from obtaining the global minimizers of (8) and (11), is bounded and converges monotonically to a stationary point of (7).

Proof.

The basic convergence results of BCD have been established in [36]. First, notice that all the terms in (7) are differentiable over their open domain except the non-differentiable $\ell_1$-norm, which is however separable. These observations establish, based on [36, Lemma 3.1], that the cost in (7) is regular at each coordinatewise minimum point, and therefore every such point is a stationary point of (7). Moreover, the cost is continuous and convex per block variable. Hence, by appealing to [36, Theorem 5.1], the sequence of iterates generated by JISG converges monotonically to a coordinatewise minimum point of the cost, and consequently to a stationary point of (7). ∎

A few remarks are now in order.

Remark 1.

A popular alternative to the elastic net regularizer is the nuclear norm $\|\mathbf{A}\|_*$, which promotes low rank of the learned adjacency matrix, a well-motivated attribute when the graph is expected to exhibit clustered structure [10].

Remark 2.

Oftentimes, prior information about $\mathbf{A}$ may be available, e.g., the support of $\mathbf{A}$; nonnegative edge weights $a_{ij} \geq 0$; or the value of $a_{ij}$ for some pairs $(i,j)$. Such prior information can be easily incorporated in (7) by adjusting the constraint set and the ADMM solver accordingly.

Remark 3.

The estimator in (8) that relies on SEMs is capable of estimating functions over directed as well as undirected graphs, while kernel-based approaches [35] and estimators that rely on the graph Fourier transform [34] are usually confined to undirected graphs.

Remark 4.

In real-world networks, sets of nodes may depend upon each other via multiple types of relationships, which ordinary networks cannot capture [19]. Consequently, generalizing traditional single-layer networks to multilayer networks that organize the nodes into different groups, called layers, is well motivated. Such layer structure can be incorporated in (7) via appropriate regularization; see e.g. [17]. Thus, the JISG estimator can also accommodate multilayer graphs.

IV Jointly inferring graphs and processes over time

Real-world networks often involve processes that vary over time, with dynamics not captured by SEMs. This section considers an alternative based on SVARMs that allows for joint inference of dynamic network processes and graphs.

IV-A Batch solver for JISG over time

Given $\{\mathbf{z}_t, \mathbf{S}_t\}_{t=1}^T$, this section develops an efficient approach to infer $\mathbf{A}$, $\mathbf{B}$, and $\{\mathbf{y}_t\}_{t=1}^T$. Clearly, to cope with the underdetermined system of equations (4) and (6), one has to exploit the structure in $\mathbf{A}$, $\mathbf{B}$, and $\{\mathbf{y}_t\}_{t=1}^T$. This prompts the following regularized LS objective

$$\min_{\mathbf{A}, \mathbf{B}, \{\mathbf{y}_t\}} \; \frac{1}{2} \sum_{t=1}^{T} \|\mathbf{y}_t - \mathbf{A} \mathbf{y}_t - \mathbf{B} \mathbf{y}_{t-1}\|_2^2 + \frac{1}{2} \|\mathbf{y}_0 - \mathbf{A} \mathbf{y}_0\|_2^2 + \frac{\mu}{2} \sum_{t=0}^{T} \|\mathbf{z}_t - \mathbf{S}_t \mathbf{y}_t\|_2^2 + \rho(\mathbf{A}) + \rho(\mathbf{B}) \quad \text{s.t. } a_{ii} = 0, \; i = 1, \ldots, N \qquad (12)$$

where $\mu > 0$ is a regularization scalar weighting the fit to the observations, and $\rho(\mathbf{X}) := \lambda_1 \|\mathbf{X}\|_1 + \frac{\lambda_2}{2} \|\mathbf{X}\|_F^2$ is the elastic net regularizer for the connectivity matrices. The first sum accounts for the LS fitting error of the SVARM, and the second LS cost accounts for the initial conditions. The third term sums the measurement error over $t = 0, \ldots, T$. Finally, the elastic net penalty terms $\rho(\mathbf{A})$ and $\rho(\mathbf{B})$ favor connections among highly correlated nodes; see also the discussion after (7).
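Under the reconstruction of (12) above (the exact form of the initial-condition term is an assumption), the batch cost can be evaluated as follows; here Ys is a list $[\mathbf{y}_0, \ldots, \mathbf{y}_T]$, and Zs, Ss hold the matching observations and sampling matrices.

```python
def rho_en(X, lam1, lam2):
    """Elastic net penalty for a connectivity matrix."""
    return lam1 * np.abs(X).sum() + 0.5 * lam2 * np.linalg.norm(X, 'fro') ** 2

def svarm_cost(A, B, Ys, Zs, Ss, mu, lam1, lam2):
    """Evaluate the batch SVARM objective (12) at the given iterates."""
    fit = 0.5 * np.linalg.norm(Ys[0] - A @ Ys[0]) ** 2   # initial condition
    fit += sum(0.5 * np.linalg.norm(Ys[t] - A @ Ys[t] - B @ Ys[t - 1]) ** 2
               for t in range(1, len(Ys)))               # SVARM fitting error
    obs = sum(0.5 * mu * np.linalg.norm(z - S @ y) ** 2  # measurement error
              for z, S, y in zip(Zs, Ss, Ys))
    return fit + obs + rho_en(A, lam1, lam2) + rho_en(B, lam1, lam2)
```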

The optimization problem in (12) is nonconvex due to the bilinear terms $\mathbf{A} \mathbf{y}_t$ and $\mathbf{B} \mathbf{y}_{t-1}$; nevertheless, it is convex w.r.t. each of the block variables separately. Next, an efficient algorithm based on BCD is put forth that provably attains a stationary point of (12). With $\mathbf{A}$ and $\mathbf{B}$ available, the following objective yields the estimates $\{\hat{\mathbf{y}}_{t|T}\}_{t=0}^T$

$$\{\hat{\mathbf{y}}_{t|T}\}_{t=0}^{T} = \arg\min_{\{\mathbf{y}_t\}} \; \frac{1}{2} \sum_{t=1}^{T} \|\mathbf{y}_t - \mathbf{A} \mathbf{y}_t - \mathbf{B} \mathbf{y}_{t-1}\|_2^2 + \frac{1}{2} \|\mathbf{y}_0 - \mathbf{A} \mathbf{y}_0\|_2^2 + \frac{\mu}{2} \sum_{t=0}^{T} \|\mathbf{z}_t - \mathbf{S}_t \mathbf{y}_t\|_2^2 \qquad (13)$$

where $\hat{\mathbf{y}}_{t|T}$ denotes the estimate of $\mathbf{y}_t$ given $\{\mathbf{z}_\tau\}_{\tau=0}^T$. Different from (8), the time-lagged dependencies couple the objective in (13) across $t$. Upon defining $(\mathbf{I} - \mathbf{A})$, which is assumed invertible,