Accelerated Graph Learning from Smooth Signals

by   Seyed Saman Saboksayr, et al.
University of Rochester

We consider network topology identification subject to a signal smoothness prior on the nodal observations. A fast dual-based proximal gradient algorithm is developed to efficiently tackle a strongly convex, smoothness-regularized network inverse problem known to yield high-quality graph solutions. Unlike existing solvers, the novel iterations come with global convergence rate guarantees and do not require additional step-size tuning. Reproducible simulated tests demonstrate the effectiveness of the proposed method in accurately recovering random and real-world graphs, markedly faster than state-of-the-art alternatives and without incurring an extra computational burden.



There are no comments yet.


page 1

page 2

page 3

page 4


Adaptive Accelerated (Extra-)Gradient Methods with Variance Reduction

In this paper, we study the finite-sum convex optimization problem focus...

Online Graph Learning under Smoothness Priors

The growing success of graph signal processing (GSP) approaches relies h...

Efficient Superimposition Recovering Algorithm

In this article, we address the issue of recovering latent transparent l...

Accelerated Alternating Minimization

Alternating minimization (AM) optimization algorithms have been known fo...

A Simple Practical Accelerated Method for Finite Sums

We describe a novel optimization method for finite sums (such as empiric...

Learning over Multitask Graphs - Part I: Stability Analysis

This paper formulates a multitask optimization problem where agents in t...

Nonconvex Demixing From Bilinear Measurements

We consider the problem of demixing a sequence of source signals from th...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Network-aware signal and information processing is having a major impact in technology and the biobehavioral sciences; see e.g, [19, Ch. 1]. In this context, graph signal processing (GSP) builds on a graph-theoretic substrate to effectively model signals with complex relational structures [26, 37, 32]. However, the required connectivity information is oftentimes not explicitly available. This motivates the prerequisite step of using signals (e.g., brain activity traces, distributed sensor measurements) to unveil latent network structure, or, to construct discriminative graph representations to facilitate downstream learning tasks. As graph data grow in size and complexity, there is an increasing need to develop customized, fast and computationally-efficient graph learning algorithms.

Given nodal measurements (known as graph signals in the GSP parlance), the network topology inference problem is to search for a graph within a model class that is optimal in some application-specific sense, e.g., [19, Ch. 7]. The adopted criterion is naturally tied to the signal model relating the observations to the sought network, which can include constraints motivated by physical laws, statistical priors, or, explainability goals. Workhorse probabilistic graphical models include Gaussian Markov random fields, and topology identification arises with so-termed high-dimensional graphical model selection [8, 39, 12, 21, 11, 29, 20]. Other recent approaches embrace a signal representation perspective to reveal parsimonious data signatures with respect to the underlying graph. These include stationarity induced via linear network diffusion [34, 28, 36] and smoothness (i.e., bandlimitedness) [18, 9, 16, 17, 5, 30, 33, 4, 22]. The interested reader is referred to [24, 10, 13] for comprehensive tutorial treatments of network topology inference advances.

In this short letter, we develop a fast and scalable algorithm to estimate graphs subject to a smoothness prior (Section

II outlines the required background and formally states the problem). Adopting the well-appreciated graph learning framework of [18, 17], in Section III we bring to bear the fast proximal-gradient (PG) iterations in [2] to solve the resulting strongly convex, signal smoothness-regularized optimization problem in the dual domain. There are noteworthy recent scalable solvers for this problem that rely on the primal-dual (PD) method [18], PG [31], or, the linearized alternating-direction method of multipliers (ADMM) [38]. Unlike these algorithms, the novel iterations come with global convergence rate guarantees and do not require additional step-size tuning. Borrowing results from [2], we show that a (possibly infeasible) primal sequence generated from the accelerated graph learning algorithm converges to a globally optimal solution at a rate of . To the best of our knowledge, this is the first work that establishes the convergence rate of topology inference algorithms subject to smoothnes priors. Computer simulations in Section IV showcase the favorable convergence properties of the proposed approach when recovering a wide variety of graphs. In the interest of reproducible research, the code used to generate all figures in this letter is publicly available. Conclusions are in Section V. Due to page contraints, proofs are deferred to the accompanying Supplementary Material.

Ii Graph Learning from Smooth Signals

Let be an undirected graph, where are the nodes (or vertices) with , are the edges, and is the symmetric adjacency matrix collecting the edge weights. For we have . We exclude the possibility of self-loops, so is hollow meaning , . We acquire graph signal observations , where denotes the signal value at vertex . More general graphs capturing directionality are important [23], but beyond the scope of this letter.

Ii-a Graph signal smoothness

For undirected graphs one typically adopts the Laplacian as descriptor of graph structure, where collects the vertex degrees. As the central object in spectral graph theory, is instrumental in formalizing the notion of smooth (i.e., low-pass bandlimited) signals on graphs [40, 26]. Specifically, the total variation (TV) of the graph signal with respect to is given by the quadratic form


We can interpret as a smoothness measure for graph signals, which gauges the extent to which varies across local neighborhoods in . Accordingly, we say a signal is smooth if it has a small total variation. For reference, , where is the spectral radius of . The lower bound is attained by constant signals. The ubiquity of smooth network data has been well-documented, with examples spanning sensor measurements [6], protein function annotations [19], and product ratings [15]. These empirical findings motivate adopting smoothness as the criterion to search for graphs on which measurements exhibit desirable parsimony or regularity.

Ii-B Problem statement

We study the following graph learning problem.

Problem 1

Given a set of graph signal observations, the goal is to learn an undirected graph such that the observations in are smooth on .

We now briefly review the method proposed in [18, 17] to tackle Problem 1, from which we henceforth build on to develop a fast graph learning algorithm.

Consider the matrix , whose columns are the observations in . The rows, denoted by , collect all measurements at vertex . Define then the nodal Euclidean-distance matrix , where , . Using these notions, the signal smoothness measure over can be equivalently written as


where denotes element-wise product [18]. Smoothness minimization as criterion in Problem 1 has the following intuitive interpretation: when pairwise nodal distances in are sampled from a smooth manifold, the learnt topology tends to be sparse, preferentially choosing edges whose corresponding are smaller [cf. the weighted -norm in (2)].

Leveraging this neat link between signal smoothness and edge sparsity, a fairly general graph-learning framework was put forth in [18]. The idea therein is to solve the following convex inverse problem

s. to

where are tunable regularization parameters. Different from [9], the logarithmic barrier on the vertex degrees excludes the possibility of having (often undesirable) isolated vertices in the estimated graph. Through , the Frobenius-norm penalty offers a handle on the graphs’ edge sparsity level. Among the parameterized familiy of solutions to (3), the sparsest graph is obtained when .

Arguably, the most important upshot of identity (2) is computational. It facilitates formulating (3) as a search over adjacency matrices, and the resulting constraints (null diagonal, symmetry and non-negativity) are separable across the variables . This does not hold for the Laplacian . Exploting this favorable structure of (3), efficient solvers were developed based on PD iterations [18], the PG method [31], or the ADMM [38]. However, none of these graph learning methods come with convergence rate guarantees because the objective function of (3) lacks a Lipschitz continuous gradient. To close this gap, next we develop a markedly faster first-order algorithm using an accelerated dual-based PG method [2].

Iii Fast Dual Proximal Gradient Algorithm

Because is hollow and symmetric, the optimization variables in (3) are effectively the, say, upper-triangular elements ,

. Thus, it suffices to retain only those entries in the vector

, were we have adopted convenient Matlab notation. To impose that edge weights are non-negative, we penalize the cost with the indicator function if , else  [18]. This way, we equivalently reformulate (3) as the unconstrained, non-differentiable problem


where and maps edge weights to nodal degrees, i.e., . The non-smooth function is strongly convex with strong convexity parameter (details are in the Supplementary Material), while is a (strictly) convex function for all . Under the aforementioned properties of and , the composite problem (4) has a unique optimal solution ; see e.g., [2] and [38].

A fast dual-based PG algorithm was developed in [2] to solve the non-smooth, strictly convex optimization problem of which (4) is a particular instance. In the remainder of this section we will bring to bear this optimization framework to develop a novel graph learning algorithm with global rate of convergence guarantees.

Iii-a The dual problem

The structure of (4) lends itself naturally to variable-splitting via the equivalent linearly-constrained form


Attaching Lagrange multipliers to the equality constraints and minimizing the Lagrangian function w.r.t. the primal variables , one arrives at the (minimization form) dual problem [2]




Interestingly, the strong convexity of induces useful smoothness properties for (namely, the composition of with the Fenchel conjugate of ), that we summarize next. The result is adapted from [2, Lemma 3.1] and the additional proof arguments can be found in the Supplementary Material.

Lemma 1

Function in (7) is smooth, and the gradient is Lipschitz continuous with constant .

This additional structure of (6) makes it feasible to apply accelerated PG algorithms [3] (such as FISTA [1]), to solve the dual problem.

Iii-B Accelerated dual proximal gradient algorithm

The FISTA algorithm applied to the dual problem (6) yields the following iterations (initialized as and , henceforth denotes the iteration index)


where the proximal operator of a proper, lower semi-continuous convex function is (see e.g., [27])


An adaptation of the result in [2, Lemma 3.2] – stated as Proposition 1 below – yields the novel graph learning iterations tabulated under Algorithm 1. Again, due to page constraints the proof details are deferred to the Supplementary Material.

Proposition 1

The dual variable update iteration in (9) can be equivalently rewritten as , with


where in (13) as well as both and in (14) are element-wise operations on their vector arguments.

The updates in Proposition 1 are fully expressible in terms of parameters from the original graph learning problem, namely , , , and the data in . This is to be contrasted with (9), which necessitates the conjugate functions and .

Algorithm 1’s overall computational complexity is dominated by the update (13), which incurs a per iteration cost of . The remaining updates are also given in closed form, through simple operations of vectors living in the dual -dimensional domain of nodal degrees [cf. the -dimensional primal variables ]. The overall complexity of is in par with state-of-the-art PD and linearized ADMM algorithms [38], which have been shown to scale well to large networks with in the order of thousands. The computational cost can be further reduced by constraining a priori the space of possible edges; see [17] for examples where this approach is warranted. For a given problem instance, there are no step-size parameters to tune here (on top of and ) since we can explicitly compute the Lipschitz constant in Lemma 1. On the other hand, the linearized ADMM algorithm in [38] necessitates tuning two step-sizes and the penalty parameter defining the augmented Lagrangian.

The distinctive feature of the proposed accelerated dual PG algorithm is that it comes with global convergence rate guarantees. These results are outlined in the ensuing section.

Input parameters , data , set .
Initialize and at random.
for  do
end for
Output graph estimate
Algorithm 1 Topology inference via fast dual PG (FDPG)

Iii-C Convergence rate analysis

Moving on to convergence properties, when the iterates generated by Algorithm 1 provably approach a dual optimal solution that minimizes in (6); see e.g., [1]. The celebrated FISTA rate of convergence for the dual cost function is stated next.

Theorem 1

[1, Theorem 4.4] For all , dual iterates stemming from Algorithm 1 are such that


This well-documented global convergence rate of accelerated PG algorithms implies an iteration complexity to return an -optimal dual solution measured in terms of values.

We now consider a primal sequence generated from the iterates of Algorithm 1, and borrow the results from [2] to show the sequence is globally convergent to at a rate of . To this end, suppose that for all we are given dual updates generated from the accelerated dual PG algorithm. We can construct a primal sequence as , namely [cf. (7)]


As noted in [38], this primal sequence may be infeasible in the sense that resulting nodal degrees are not guaranteed to lie in the domain of . The promised rate of converge result for is stated next.

Theorem 2

[2, Theorem 4.1] For all , the primal sequence (16) defined in terms of dual iterates generated by Algorithm 1 satisfies


Iv Numerical Results



Fig. 1: Convergence performance in terms of primal variable error when recovering different synthetic and real graphs. (a) ER graphs with (top-left) and nodes (top-right); SBM graphs with (bottom-left), and nodes (bottom-right). (b) Four representative structural brain graphs with ROIs; Subject 1 (top-left), Subject 2 (top-right), Subject 4 (bottom-left), and Subject 6 (bottom-right). (c) Minnesota road network with intersections. In all cases, the proposed FDPG method converges faster to than state-of-the-art graph learning algorithms.

Here we test the proposed fast dual PG (FDPG) algorithm for learning random and real-world graphs from simulated signals. The merits of the formulation (3) in terms of recovering high-quality graphs have been well documented; see e.g., [18, 17, 24, 10] and references therein. For this reason, the numerical experiments that follow will exclusively focus on algorithmic performance, with no examination of the quality of the optimal solution that defines the learnt graph. In all ensuing test cases, we search for the best regularization parameters in terms of graph recovery performance, adopting the edge-detection F-measure as criterion. We compare Algorithm 1 to other state-of-the-art methods such as PD [18], PG [31], and linearized ADMM [38]. We also consider the non-accelerated dual PG (DPG) method that is obtained from Algorithm 1 when for all . For FDPG we implemented customary fixed-interval restarts of the momentum term in Algorithm 1; see also [25] for adaptive restart rules. Moreover, the ADMM parameters and PD step-size are tuned to yield the best possible convergence rate. Implementation details can be found in the publicly available code111, which can be used to generate all plots in Fig. 1.

Iv-a Random graphs

We generate ground-truth graphs as draws from the Erdős-Rényi (ER) model (edge formation probability

) with and nodes, as well as from the -block Stochastic Block Model (SBM) with the same number of nodes, and connection probability for nodes in the same community and for nodes in different blocks. We simulate i.i.d. graph signals , where represents the noise level and is the Laplacian of the ground-truth random graph. For a graph-based factor analysis model justifying this approach to smooth signal generation, see e.g., [9]. We compare the convergence performance of the aforementioned methods through the evolution of the primal variable error . To obtain for the chosen and , we ran the PD method for iterations. The results of these comparisons are illustrated in Fig. 1 (a). Apparently, the proposed FDPG algorithm markedly outperforms all other methods in terms of convergence rate, uniformly across graph model classes and number of nodes. Here, convergence to the largest graphs takes less iterations than for .

Iv-B Brain and road networks

We first focus on recovering the topology of unweighted structural brain graphs [14], all with regions of interest (ROIs) and whose edges connect ROIs with non-trivial density of axonal bundles; see also [35] for additional details. For a larger-scale experiment, we adopt the Minnesota road network which is an unweighted and undirected graph with intersections [7]. In both cases, we generated synthetic smooth signals over the real topologies using the generative model in Section IV-A. The high value of renders the ADMM’s 3-D parameter search a significantly time consuming operation. Hence, for the Minnesota road network experiment, we only focus on the proposed (F)DPG methods and the PD algorithm in [18].

Fig. 1 (b) depicts the convergence results for the structural brain networks of representative subjects. Once more, in all cases the FDPG method is faster, but for these smaller graphs the performance gap appears to narrow. The gains can also be quantified in terms of wall-clock time. For instance, for Subject the time in seconds for the algorithms to reach a suboptimality of are: s for FDPG, s for PD, s for ADMM and s for DPG. Results for the Minnesota road network are depicted in Fig. 1 (c), where the superiority of the proposed method is also apparent.

V Conclusion

We developed a fast and scalable algorithm to learn the graph structure of signals subject to a smoothness prior. Leveraging this cardinal property of network data is central to various statistical learning tasks, such as graph smoothing and semi-supervised node classification. We brought to bear a fast dual-based PG method to derive lightweight graph-learning iterations that come with global convergence rate guarantees. The merits of the proposed algorithm are showcased via experiments using several random and real-world graphs.


  • [1] A. Beck and M. Teboulle (2009-03) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Img. Sci. 2 (1), pp. 183–202. Cited by: §III-A, §III-C, Theorem 1.
  • [2] A. Beck and M. Teboulle (2014) A fast dual proximal gradient algorithm for convex minimization and applications. Operations Research Letters 42 (1), pp. 1–6. Cited by: §I, §II-B, §III-A, §III-B, §III-C, §III, §III, Proof of Lemma 1, Proof of Proposition 1, Theorem 2.
  • [3] A. Beck (2018) First-order methods in optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA. Cited by: §III-A.
  • [4] P. Berger, G. Hannak, and G. Matz (2020) Efficient graph learning from noisy and incomplete data. IEEE Trans. Signal Inf. Process. Netw. 6 (), pp. 105–119. Cited by: §I.
  • [5] S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero (2017-03) Learning sparse graphs under smoothness prior. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), pp. 6508–6512. Cited by: §I.
  • [6] S. P. Chepuri, S. Liu, G. Leus, and A. O. Hero (2017) Learning sparse graphs under smoothness prior. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), Vol. , pp. 6508–6512. Cited by: §II-A.
  • [7] T. A. Davis and Y. Hu (2011-12) The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38 (1). Cited by: §IV-B.
  • [8] A. P. Dempster (1972) Covariance selection. Biometrics 28 (1), pp. 157–175. Cited by: §I.
  • [9] X. Dong, D. Thanou, P. Frossard, and P. Vandergheynst (2016) Learning Laplacian matrix in smooth graph signal representations. IEEE Trans. Signal Process. 64 (23), pp. 6160–6173. Cited by: §I, §II-B, §IV-A.
  • [10] X. Dong, D. Thanou, M. Rabbat, and P. Frossard (2019) Learning graphs from data: a signal representation perspective. IEEE Signal Process. Mag. 36 (3), pp. 44–63. Cited by: §I, §IV.
  • [11] H. E. Egilmez, E. Pavez, and A. Ortega (2017-09) Graph learning from data under Laplacian and structural constraints. IEEE J. Sel. Topics Signal Process. 11 (6), pp. 825–841. Cited by: §I.
  • [12] J. Friedman, T. Hastie, and R. Tibshirani (2008) Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 (3), pp. 432–441. Cited by: §I.
  • [13] G. B. Giannakis, Y. Shen, and G. V. Karanikolas (2018) Topology identification and learning over graphs: accounting for nonlinearities and dynamics. Proc. IEEE 106 (5), pp. 787–807. Cited by: §I.
  • [14] P. Hagmann, L. Cammoun, X. Gigandet, R. Meuli, C. J. Honey, V. J. Wedeen, and O. Sporns (2008) Mapping the structural core of human cerebral cortex. PLoS Biol 6 (7), pp. e159. Cited by: §IV-B.
  • [15] W. Huang, A. G. Marques, and A. R. Ribeiro (2018) Rating prediction via graph signal processing. IEEE Trans. Signal Process. 66 (19), pp. 5066–5081. Cited by: §II-A.
  • [16] V. Kalofolias, A. Loukas, D. Thanou, and P. Frossard (2017) Learning time varying graphs. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), Vol. , pp. 2826–2830. Cited by: §I.
  • [17] V. Kalofolias and N. Perraudin (2019) Large scale graph learning from smooth signals. In Int. Conf. Learning Representations (ICLR), Cited by: §I, §I, §II-B, §III-B, §IV.
  • [18] V. Kalofolias (2016) How to learn a graph from smooth signals. In Artif. Intel. and Stat. (AISTATS), pp. 920–929. Cited by: §I, §I, §II-B, §II-B, §II-B, §II-B, §III, §IV-B, §IV, Proof of Proposition 1.
  • [19] E. D. Kolaczyk (2009) Statistical analysis of network data: methods and models. Springer-Verlag, New York, NY. Cited by: §I, §I, §II-A.
  • [20] S. Kumar, J. Ying, J. V. de M. Cardoso, and D. P. Palomar (2020) A unified framework for structured graph learning via spectral constraints. J. Mach. Learn. Res. 21 (22), pp. 1–60. Cited by: §I.
  • [21] B. M. Lake and J. B. Tenenbaum (2010) Discovering structure by learning sparse graphs. In Annual Cognitive Sc. Conf., pp. 778 – 783. Cited by: §I.
  • [22] B. Le Bars, P. Humbert, L. Oudre, and A. Kalogeratos (2019) Learning Laplacian matrix from bandlimited graph signals. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), pp. 2937–2941. Cited by: §I.
  • [23] A. G. Marques, S. Segarra, and G. Mateos (2020) Signal processing on directed graphs: the role of edge directionality when processing and learning from network data. IEEE Signal Process. Mag. 37 (6), pp. 99–116. Cited by: §II.
  • [24] G. Mateos, S. Segarra, A. G. Marques, and A. Ribeiro (2019) Connecting the dots: identifying network structure via graph signal processing. IEEE Signal Process. Mag. 36 (3), pp. 16–43. Cited by: §I, §IV.
  • [25] B. O’Donoghue and E. Candes (2015-04) Adaptive restart for accelerated gradient schemes. Foundations of Computational Mathematics 15, pp. 715–732. Cited by: §IV.
  • [26] A. Ortega, P. Frossard, J. Kovac̆ević, J. M. F. Moura, and P. Vandergheynst (2018) Graph signal processing: overview, challenges, and applications. Proc. IEEE 106 (5), pp. 808–828. Cited by: §I, §II-A.
  • [27] N. Parikh and S. Boyd (2014) Proximal algorithms. Foundations and Trends in optimization 1 (3), pp. 127–239. Cited by: §III-B.
  • [28] B. Pasdeloup, V. Gripon, G. Mercier, D. Pastor, and M. G. Rabbat (2017-09) Characterization and inference of graph diffusion processes from observations of stationary signals. IEEE Trans. Signal Inf. Process. Netw. PP (99), pp. . Cited by: §I.
  • [29] E. Pavez, H. E. Egilmez, and A. Ortega (2018-05) Learning graphs with monotone topology properties and multiple connected components. IEEE Trans. Signal Process. 66 (9), pp. 2399–2413. Cited by: §I.
  • [30] M. G. Rabbat (2017-03) Inferring sparse graphs from smooth signals with theoretical guarantees. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), pp. 6533–6537. Cited by: §I.
  • [31] S. S. Saboksayr, G. Mateos, and M. Cetin (2021 (accepted; see also arXiv:2103.03762 [cs.LG])) Online graph learning under smoothness priors. In European Signal Process. Conf. (EUSIPCO), Dublin, Ireland, pp. . Cited by: §I, §II-B, §IV.
  • [32] A. Sandryhaila and J. M. F. Moura (2013-04) Discrete signal processing on graphs. IEEE Trans. Signal Process. 61 (7), pp. 1644–1656. Cited by: §I.
  • [33] S. Sardellitti, S. Barbarossa, and P. Di Lorenzo (2019) Graph topology inference based on sparsifying transform learning. IEEE Trans. Signal Process. 67 (7), pp. 1712–1727. Cited by: §I.
  • [34] S. Segarra, A.G. Marques, G. Mateos, and A. Ribeiro (2017-08) Network topology inference from spectral templates. IEEE Trans. Signal Inf. Process. Netw. 3 (3), pp. 467–483. Cited by: §I.
  • [35] S. Segarra, A. G. Marques, G. Mateos, and A. Ribeiro (2016) Network topology identification from spectral templates. In IEEE Wrkshp. Statistical Signal Process. (SSP), Vol. , pp. 1–5. Cited by: §IV-B.
  • [36] R. Shafipour and G. Mateos (2020-09) Online topology inference from streaming stationary graph signals with partial connectivity information. Algorithms 13 (9), pp. 1–19. Cited by: §I.
  • [37] D. I. Shuman, S. K. Narang, P. Frossard, A. Ortega, and P. Vandergheynst (2013)

    The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains

    IEEE Signal Process. Mag. 30 (3), pp. 83–98. Cited by: §I.
  • [38] X. Wang, C. Yao, H. Lei, and A. M. So (2021) An efficient alternating direction method for graph learning from smooth signals. In IEEE Intl. Conf. Acoust., Speech and Signal Process. (ICASSP), Toronto, Canada, pp. 5380–5384. Cited by: §I, §II-B, §III-B, §III-C, §III, §IV, Proof of Proposition 1.
  • [39] M. Yuan and Y. Lin (2007) Model selection and estimation in the Gaussian graphical model. Biometrika 94 (1), pp. 19–35. Cited by: §I.
  • [40] D. Zhou and B. Schölkopf (2004) A regularization framework for learning from graph data. In Int. Conf. Mach. Learning (ICML), Cited by: §II-A.

Supplementary Material

Proof of Lemma 1

A couple preliminary calculations are required to derive an explicit expression for the Lipschitz constant of .

Lemma 2

The function is strongly convex with constant .

Proof : The strong convexity of with parameter follows because

is a convex function.

Lemma 3

Going back to (4), recall defined so that . Then, .

Proof : Because maps the upper-triangular adjacency matrix entries in to the degree sequence , then has ones in each row while all other entries are zero. Hence, the diagonal entries of are all and the off-diagonal entries are equal to

. The eigenvalues

of are the roots of the characteristic polynomial

To obtain the third equality we leveraged the Sherman-Morrison formula, where stands for the adjugate matrix of . From the final factorization of the polynomial, the eigenvalues are . Because , the result follows.

Since is strongly convex (with constant ), by virtue of [2, Lemma 3.1] the function is continuously differentiable and it has a Lipschitz continuous gradient with constant . From the expressions for and in Lemmata 2 and 3, the result follows.

Proof of Proposition 1

Leveraging the result in [2, Lemma 3.2], it follows that for all , the dual variable update iteration in (9) can be equivalently rewritten as , with


Starting with (18), we have from the definition of that

as desired [cf. (13)]. The last equality follows from the fact that the proximal operator of is the projection onto the non-negative orthant .

To arrive at the update of in (14), it suffices to start from (19) and recall that the proximal operator of is given by (see e.g., [18] and [38, Proposition 2])

where the square and square root are understood to be taken element-wise. Evaluating the proximal operator at , the result follows.

Dual suboptimality

Fig. 2: Convergence performance in terms of dual suboptimality , when recovering the SBM graph with nodes described in Section IV-A. As expected, the FDPG graph learning algorithm converges markedly faster than its non-accelerated counterpart.

Recall one of the test cases in Section IV-A, where the goal was to recover a -block Stochastic Block Model (SBM) with nodes from synthetically-generated smooth signals. In Fig. 2 we depict the evolution of the dual suboptimality for the FDPG (Algorithm 1) and DPG iterations. As expected, the proposed FDPG solver markedly outperforms its non-accelerated counterpart in terms of convergence rate. Similar behavior can be observed for random graphs drawn from the Erdős-Rényi (ER) model and for different values of and . The corresponding plots are not included due to lack of space.