1 Introduction
Graphs are fundamental mathematical structures consisting of sets of nodes and weighted edges connecting them. The weight associated with each edge represents the similarity between the two vertices it connects. Graphical models provide an effective abstraction for expressing dependence relationships among data variables arising in numerous applications (see Barabási et al., 2016; Wang et al., 2018; Friedman et al., 2008; Guo et al., 2011; Segarra et al., 2017; Banerjee et al., 2008). The aim of any graphical model is to encode the dependencies among the data in the form of a graph matrix, where a non-zero entry of the matrix implies a dependency between the corresponding pair of variables. Gaussian graphical modeling (GGM) encodes the conditional dependence relationships among a set of variables (Dempster, 1972; Lauritzen, 1996). GGM is a tool of increasing importance in a number of fields, including finance, biology, statistical learning, and computer vision (Friedman et al., 2008). In this framework, an undirected graph is matched to the variables, where each vertex corresponds to one variable, and an edge is present between two vertices if the corresponding random variables are conditionally dependent
(Lauritzen, 1996). Putting it more formally, consider a $p$-dimensional vector $\mathbf{x} = [x_1, x_2, \dots, x_p]^\top$; the GGM method aims to learn a graph through the following optimization problem:

$\underset{\Theta \in \mathcal{S}_{++}^{p}}{\text{maximize}} \ \ \log\det(\Theta) - \operatorname{tr}(\Theta S) - \alpha h(\Theta), \quad (1)$

where $\Theta$ denotes the desired graph matrix with $p$ the number of nodes in the graph, $\mathcal{S}_{++}^{p}$ denotes the set of positive definite matrices of size $p \times p$, $S$ is a similarity matrix, $h(\cdot)$ is the regularization term, and $\alpha > 0$ is the regularization parameter. When the observed data is distributed according to a zero-mean $p$-variate Gaussian distribution and the similarity matrix $S$ is the sample covariance matrix (SCM), the optimization in (1) corresponds to the maximum likelihood estimation (MLE) of the inverse covariance (precision) matrix of the Gaussian random variable, also known as a Gaussian Markov random field (GMRF). With the graph inferred from $\Theta$, the random vector $\mathbf{x}$ follows the Markov property: $\Theta_{ij} \neq 0$ implies that $x_i$ and $x_j$ are conditionally dependent given the rest of the variables (see Lauritzen, 1996; Dempster, 1972).

In many real-world applications, prior knowledge about the underlying graph structure is usually available. For example, in gene network analysis, genes can be grouped into pathways, and connections within a pathway might be more likely than connections between pathways, forming a cluster (Marlin and Murphy, 2009). For better interpretability and precise identification of the structure in the data, it is desirable to enforce structures on the learned graph matrix $\Theta$. Furthermore, a structured graph also enables more sophisticated tasks such as prediction, community detection, clustering, and causal inference.
It is known that if the ultimate goal is structured graph learning, structure inference and graph weight estimation should be done in a single step (Ambroise et al., 2009; Hao et al., 2018). Performing the structure inference (also known as model selection) prior to the weight estimation (also known as parameter estimation) in the selected model will, in fact, result in a non-robust procedure (Ambroise et al., 2009). Although GGM has been extended to incorporate structures on the learned graph, most of the existing methods perform graph structure learning and graph weight estimation separately. Essentially, the methods are either able to infer connectivity information (Ambroise et al., 2009) or, with known connectivity information, can perform graph weight estimation (see Lee and Liu, 2015; Wang, 2015; Cai et al., 2016; Danaher et al., 2014; Pavez et al., 2018; Egilmez et al., 2017). Furthermore, there are a few recent works considering the two tasks jointly, but those methods are limited to specific structures (e.g., the multi-component structure in Hao et al., 2018) and cannot be extended to other graph structures. In addition, these methods involve computationally demanding multi-stage steps, which makes them unsuitable for big data applications.
In general, structured graph learning is an NP-hard combinatorial problem (Anandkumar et al., 2012; Bogdanov et al., 2008), which makes it difficult to design a general, tractable optimization method. In this paper, we propose to integrate spectral graph theory with GGM graph learning and to convert combinatorial constraints of graph structure into analytical constraints on graph matrix eigenvalues. Realizing that the combinatorial structures of a family of graphs (e.g., multi-component graphs, bipartite graphs, etc.) are encoded in the eigenvalue properties of their graph matrices, we devise a general framework of Structured Graph (SG) learning by enforcing spectral constraints instead of combinatorial structure constraints directly. We develop computationally efficient and theoretically convergent algorithms that can learn graph structures and weights simultaneously.

1.1 Related work
The penalized likelihood approach with sparsity regularization has been widely studied in precision matrix estimation. An $\ell_1$-norm regularization, which promotes element-wise sparsity on the graph matrix, is a common choice of regularization function to enforce a sparse structure (Yuan and Lin, 2007; Shojaie and Michailidis, 2010a, b; Ravikumar et al., 2010; Mazumder and Hastie, 2012; Fattahi and Sojoudi, 2019). The authors in Friedman et al. (2008) came up with an efficient computational method to solve (1) and proposed the well-known GLasso algorithm. In addition, non-convex penalties have been proposed for sparse precision matrix estimation to reduce estimation bias (Shen et al., 2012; Lam and Fan, 2009). However, if a specific structure is required, then sparse graphical modeling alone is not sufficient, since it only enforces a uniform sparsity structure (Heinävaara et al., 2016; Tarzanagh and Michailidis, 2017). To this end, the sparse GGM model should be extended to incorporate more specific structures.
In this direction, the work in Ambroise et al. (2009)
has considered the problem of graph connectivity inference for the multi-component structure and developed a two-stage framework integrating expectation maximization (EM) with the graphical Lasso framework. The works in Lee and Liu (2015); Wang (2015); Cai et al. (2016); Danaher et al. (2014); Guo et al. (2011); Sun et al. (2015); Tan et al. (2015) have considered the problem of edge-weight estimation with known connectivity information. However, prior knowledge of connectivity information is not always available, in particular for massive data with complex and unknown population structures (Hao et al., 2018; Jeziorski and Segal, 2015). Furthermore, considering simultaneous connectivity inference and graph weight estimation, two-stage methods based on a Bayesian model (Marlin and Murphy, 2009) and expectation maximization (Hao et al., 2018) were proposed, but these methods are computationally prohibitive and limited to multi-component graph structures only.

Other important graph structures have also been considered, for example: factor models in Meng et al. (2014), scale-free structure in Liu and Ihler (2011), eigenvector centrality prior in Fiori et al. (2012), degree distribution in Huang and Jebara (2008), overlapping structure with multiple graphical models in Tarzanagh and Michailidis (2017); Mohan et al. (2014), and tree structure in Chow and Liu (1968); Anandkumar et al. (2012). Recently, there has been considerable interest in enforcing the Laplacian structure (see Lake and Tenenbaum, 2010; Slawski and Hein, 2015; Pavez and Ortega, 2016; Kalofolias, 2016; Egilmez et al., 2017; Pavez et al., 2018), but all these methods either learn a graph without specific structural constraints or just learn Laplacian weights for a graph with given connectivity information.

Owing to the combinatorial nature of the graph learning problem, existing methods are tailored to specific structures that cannot be generalized to other graph structures; require connectivity information for graph weight estimation; and often involve multi-stage frameworks that become computationally prohibitive. Furthermore, there does not exist any GGM framework to learn a graph with useful structures such as the bipartite structure, the regular structure, and the multi-component bipartite structure.
1.2 Summary of contributions
Enforcing a structure onto a graph is generally an NPhard combinatorial problem, which is difficult to solve via existing methods. In this paper, we propose a unified framework of structured graph learning. Our contributions are threefold:
First, we introduce new problem formulations that convert the combinatorial constraints into analytical spectral constraints on Laplacian and adjacency matrices, resulting in three main formulations:

Structured graph learning via Laplacian spectral constraints:
This formulation utilizes the spectral properties of the Laplacian matrix to learn multi-component graphs, regular graphs, multi-component regular graphs, sparse connected graphs, modular graphs, grid graphs, and other specific structured graphs. 
Structured graph learning via adjacency spectral constraints:
This formulation utilizes spectral properties of the adjacency matrix for bipartite graph learning. 
Structured graph learning via Laplacian and adjacency spectral constraints:
Under this formulation, we simultaneously utilize spectral properties of the Laplacian and adjacency matrices to enforce non-trivial structures, including bipartite-regular graphs, multi-component bipartite graphs, and multi-component bipartite-regular graph structures.
Second, we develop algorithms based on the block majorization-minimization (MM) framework, also known as block successive upper-bound minimization (BSUM), to solve the proposed formulations. The algorithms are theoretically convergent and computationally efficient, with a worst-case complexity of $\mathcal{O}(p^3)$, which is the same as that of GLasso.
Third, we verify the effectiveness of the proposed algorithms via extensive experiments on synthetic and real data sets. We believe that the work carried out in this paper will provide a starting point for structured graph learning based on Gaussian Markov random fields and spectral graph theory, which in turn may have a significant and long-standing impact. The code for all the simulations is made available as an open-source repository on the author's website: https://github.com/dppalomar/spectralGraphTopology.
1.3 Outline and Notation
This paper is organized as follows. The generalized problem formulation and related background are provided in Section 2. The detailed algorithm derivations and the associated convergence results are presented in Sections 3, 4, and 5. Then the simulation results with both real and synthetic data sets for the proposed algorithms are provided in Section 6. Finally, Section 7 concludes the paper with a list of possible extensions.
In terms of notation, lower case (bold) letters denote scalars (vectors) and upper case letters denote matrices, whose sizes are not stated if they are clear from the context. The $(i,j)$-th entry of a matrix $X$ is denoted by $X_{ij}$. $X^{\dagger}$ and $X^{\top}$ denote the pseudo-inverse and transpose of matrix $X$, respectively. The all-zero and all-one vectors or matrices of all sizes are denoted by $\mathbf{0}$ and $\mathbf{1}$, respectively. $\|X\|_1$ and $\|X\|_F$ denote the $\ell_1$-norm and the Frobenius norm of $X$, respectively. $\operatorname{gdet}(X)$ is defined as the generalized determinant of a positive semidefinite matrix $X$, i.e., the product of its non-zero eigenvalues. The inner product of two matrices is defined as $\langle X, Y \rangle = \operatorname{tr}(X^{\top}Y)$. $\operatorname{Diag}(X)$ is a diagonal matrix with the diagonal elements of $X$ filling its principal diagonal, and $\operatorname{diag}(X)$ is a vector with the diagonal elements of $X$ as its entries. Operators are defined using calligraphic letters.
2 Problem Formulation
A graph is denoted by $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V} = \{1, 2, \dots, p\}$ is the vertex set and $\mathcal{E} \subseteq \{(i,j) : i, j \in \mathcal{V}\}$ is the edge set. If there is an edge between vertices $i$ and $j$, we denote it by $(i,j) \in \mathcal{E}$. We consider a simple undirected graph with positive weights $w_{ij} > 0$, having no self-loops or multiple edges, and therefore its edge set consists of distinct pairs. Graphs are conveniently represented by some matrix (such as the Laplacian and adjacency graph matrices), whose non-zero entries correspond to edges in the graph. The choice of a matrix usually depends on modeling assumptions, properties of the desired graph, applications, and theoretical requirements.
A matrix $\Theta \in \mathbb{R}^{p \times p}$ is called a graph Laplacian matrix if its elements satisfy

$\Theta_{ij} = \Theta_{ji} \le 0 \ \text{for} \ i \neq j; \qquad \Theta_{ii} = -\textstyle\sum_{j \neq i} \Theta_{ij}. \quad (2)$

The properties of the elements of $\Theta$ in (2) imply that the Laplacian matrix is: i) diagonally dominant (i.e., $|\Theta_{ii}| \ge \sum_{j \neq i} |\Theta_{ij}|$); ii) positive semidefinite, implied by the diagonally dominant property (see den Hertog et al., 1993, Proposition 2.2.20); iii) an $M$-matrix, i.e., a positive semidefinite matrix with non-positive off-diagonal elements (Slawski and Hein, 2015); iv) of zero row sum and column sum (i.e., $\sum_{j} \Theta_{ij} = 0$), which means that the vector $\mathbf{1}$ satisfies $\Theta \mathbf{1} = \mathbf{0}$ (Chung, 1997).
We introduce the adjacency matrix $A \in \mathbb{R}^{p \times p}$ as

$A_{ij} = -\Theta_{ij} \ \text{for} \ i \neq j; \qquad A_{ii} = 0. \quad (3)$

The non-zero entries of the matrix $A$ encode the edge weights as $A_{ij} = w_{ij}$, and $A_{ij} = 0$ implies no connectivity between vertices $i$ and $j$.
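As a quick numerical illustration of the definitions in (2) and (3), the sketch below builds the Laplacian and adjacency matrices of a small hypothetical weighted graph (the weights are made up for illustration) and verifies the zero row-sum and positive-semidefiniteness properties; NumPy is used here only as a convenient rendering, not as part of the paper's framework:

```python
import numpy as np

# Hypothetical 3-node weighted graph with edges (1,2) and (2,3)
W = np.array([[0.0, 1.5, 0.0],
              [1.5, 0.0, 2.0],
              [0.0, 2.0, 0.0]])            # W[i, j] = w_ij, zero diagonal

A = W.copy()                               # adjacency matrix (3): A_ij = w_ij, A_ii = 0
Theta = np.diag(W.sum(axis=1)) - W         # Laplacian matrix satisfying (2)

assert np.allclose(Theta.sum(axis=1), 0)                 # zero row sums: Theta 1 = 0
assert np.all(np.linalg.eigvalsh(Theta) >= -1e-9)        # positive semidefinite
assert np.all(Theta - np.diag(np.diag(Theta)) <= 0)      # non-positive off-diagonals
```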
Definition 1.
Let $\Theta$ be a $p \times p$ symmetric positive semidefinite matrix with rank $p - k > 0$. Then $\mathbf{x} = [x_1, \dots, x_p]^\top$ is an improper GMRF (IGMRF) of rank $p - k$ with parameters $(\boldsymbol{\mu}, \Theta)$ (assuming $\boldsymbol{\mu} = \mathbf{0}$ without loss of generality), if its density is

$p(\mathbf{x}) = (2\pi)^{-\frac{p-k}{2}} \big(\operatorname{gdet}(\Theta)\big)^{\frac{1}{2}} \exp\left(-\tfrac{1}{2}\,\mathbf{x}^{\top}\Theta\,\mathbf{x}\right), \quad (4)$

where $\operatorname{gdet}(\Theta)$ denotes the generalized determinant (Rue and Held, 2005) defined as the product of the non-zero eigenvalues of $\Theta$. Furthermore, $\mathbf{x}$ is called an IGMRF w.r.t. a graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where

$\Theta_{ij} \neq 0 \iff (i,j) \in \mathcal{E} \ \text{for all} \ i \neq j, \quad (5)$

$\Theta_{ij} = 0 \iff x_i \perp x_j \mid \mathbf{x} \setminus \{x_i, x_j\}. \quad (6)$

It simply states that the non-zero pattern of $\Theta$ determines $\mathcal{G}$, so we can read off from $\Theta$ whether $x_i$ and $x_j$ are conditionally independent. If the rank of $\Theta$ is exactly $p$, then $\mathbf{x}$ is called a GMRF and the parameters $(\boldsymbol{\mu}, \Theta)$ represent the mean and precision matrix of a $p$-variate Gaussian distribution (Rue and Held, 2005). In addition, if the precision matrix $\Theta$ has non-positive off-diagonal entries (Slawski and Hein, 2015), then the random vector $\mathbf{x}$ is called an attractive improper GMRF.
2.1 A General Framework for Graph Learning under Spectral Constraints
A general scheme is to learn the matrix $\Theta$ as a Laplacian matrix under some eigenvalue constraints, which are motivated by the a priori information for enforcing structure on the learned graph. We now introduce a general optimization framework for structured graph learning via spectral constraints on the graph matrices:

$\underset{\Theta}{\text{maximize}} \ \ \log\operatorname{gdet}(\Theta) - \operatorname{tr}(\Theta S) - \alpha h(\Theta), \quad \text{subject to} \ \ \Theta \in \mathcal{S}_{\Theta}, \ \lambda(\mathcal{T}(\Theta)) \in \mathcal{S}_{\lambda}, \quad (7)$

where $S$ denotes the observed data statistics (e.g., the sample covariance matrix), $\Theta$ is the sought graph matrix to be optimized, $\mathcal{S}_{\Theta}$ is the Laplacian matrix structural constraint set (2), $h(\cdot)$ is a regularization term (e.g., for sparsity), and $\lambda(\mathcal{T}(\Theta))$ denotes the eigenvalues of $\mathcal{T}(\Theta)$, where $\mathcal{T}$ is a transformation of the matrix $\Theta$. More specifically, if $\mathcal{T}$ is the identity, then $\mathcal{T}(\Theta) = \Theta$, implying we impose constraints on the eigenvalues of the Laplacian matrix $\Theta$; if $\mathcal{T}(\Theta) = A$ as defined in (3), then we enforce constraints on the eigenvalues of the adjacency matrix $A$. $\mathcal{S}_{\lambda}$ is the set containing the spectral constraints on the eigenvalues.

Fundamentally, the formulation in (7) aims to learn a structured graph Laplacian matrix given data statistics $S$, where $\mathcal{S}_{\Theta}$ enforces the Laplacian matrix structure and $\mathcal{S}_{\lambda}$ allows including the structural constraints of the desired graph structure via spectral constraints on the eigenvalues. Observe that the formulation (7) has converted the complicated combinatorial structural constraints into simple analytical spectral constraints, due to which structured graph learning now becomes a matrix optimization problem under the proper choice of spectral constraints.
Remark 1.
Apart from the motivation of enforcing structure onto a graph, the Laplacian matrix is also desirable from numerous practical and theoretical considerations: i) the Laplacian matrix is widely used in spectral graph theory, machine learning, graph regularization, graph signal processing, and graph convolutional networks (Smola and Kondor, 2003; Defferrard et al., 2016; Egilmez et al., 2017; Chung, 1997); ii) in the high-dimensional setting where the number of data samples is less than the dimension of the data, learning $\Theta$ as an $M$-matrix greatly simplifies the optimization problem by avoiding the need for an explicit regularization term (Slawski and Hein, 2015); iii) the graph Laplacian is crucial for utilizing the GMRF framework, which requires the matrix to have the positive semidefinite property (Rue and Held, 2005); iv) the graph Laplacian allows flexibility in incorporating useful spectral properties of graph matrices (Chung, 1997; Spielman and Teng, 2011).

Remark 2.
From the probabilistic perspective, when the similarity matrix $S$ is the sample covariance matrix of Gaussian data, (7) can be viewed as a penalized maximum likelihood estimation problem of the structured precision matrix of an improper attractive GMRF model; see Definition 1. In a more general setting with arbitrarily distributed data, when the similarity matrix $S$ is a positive definite matrix, formulation (7) can be related to the log-determinant Bregman divergence regularized optimization problem (see Dhillon and Tropp, 2007; Duchi et al., 2012; Slawski and Hein, 2015), where the goal is to find the parameters of the multivariate Gaussian model that best approximates the data.
In the coming subsections, we will specialize the optimization framework in (7) under Laplacian eigenvalue constraints, adjacency eigenvalue constraints, and joint Laplacian and adjacency eigenvalue constraints.
2.2 Structured Graph Learning Via Laplacian Spectral Constraints
To enforce spectral constraints on the Laplacian matrix (i.e., $\mathcal{T}(\Theta) = \Theta$ in (7)), we consider the following optimization problem:

$\underset{\Theta, U, \boldsymbol{\lambda}}{\text{maximize}} \ \ \log\operatorname{gdet}(\operatorname{Diag}(\boldsymbol{\lambda})) - \operatorname{tr}(\Theta S) - \alpha h(\Theta), \quad \text{subject to} \ \ \Theta \in \mathcal{S}_{\Theta}, \ \Theta = U \operatorname{Diag}(\boldsymbol{\lambda}) U^{\top}, \ \boldsymbol{\lambda} \in \mathcal{S}_{\lambda}, \ U^{\top}U = I, \quad (8)$

where $\Theta$ is the desired Laplacian matrix and admits the decomposition $\Theta = U \operatorname{Diag}(\boldsymbol{\lambda}) U^{\top}$, $\operatorname{Diag}(\boldsymbol{\lambda})$ is a diagonal matrix containing $\boldsymbol{\lambda} = [\lambda_1, \dots, \lambda_p]^{\top}$ on its diagonal with $\lambda_1 \le \lambda_2 \le \dots \le \lambda_p$, and $U \in \mathbb{R}^{p \times p}$ is a matrix satisfying $U^{\top}U = I$. We enforce $\Theta$ to be a Laplacian matrix by the constraint $\Theta \in \mathcal{S}_{\Theta}$, while we incorporate specific spectral constraints on $\Theta$ by forcing $\boldsymbol{\lambda} \in \mathcal{S}_{\lambda}$, with $\mathcal{S}_{\lambda}$ containing a priori spectral information on the desired graph structure.

Next, we will introduce various choices of $\mathcal{S}_{\lambda}$ that will enable (8) to learn numerous popular graph structures.
2.2.1 $k$-component graph
A graph is said to be $k$-component connected if its vertex set can be partitioned into $k$ disjoint subsets $\mathcal{V}_1, \dots, \mathcal{V}_k$ such that any two nodes belonging to different subsets are not connected by an edge. Any edge in the edge set $\mathcal{E}_i$ has its end points in $\mathcal{V}_i$, and no edge connects two different components. The $k$-component structural property of a graph is directly encoded in the eigenvalues of its Laplacian matrix: the multiplicity of the zero eigenvalue of a Laplacian matrix gives the number of connected components of a graph.
Theorem 1.
A graph $\mathcal{G}$ is $k$-component connected if and only if the zero eigenvalue of its Laplacian matrix $\Theta$ has multiplicity $k$, i.e., $\lambda_1 = \dots = \lambda_k = 0$ and $\lambda_{k+1} > 0$. Accordingly, a $k$-component structure can be enforced via the spectral constraint set

$\mathcal{S}_{\lambda} = \{\lambda_1 = \dots = \lambda_k = 0, \ c_1 \le \lambda_{k+1} \le \dots \le \lambda_p \le c_2\}, \quad (9)$

with $c_1, c_2 > 0$.

Figure 1 depicts a $k$-component graph and its Laplacian eigenvalues with $k = 3$ connected components and 3 zero eigenvalues.
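This spectral characterization is easy to check numerically. The sketch below builds an arbitrary made-up 6-node graph with two components and counts the components as the multiplicity of the zero Laplacian eigenvalue:

```python
import numpy as np

# Hypothetical graph with two components: a triangle {0,1,2} and a path {3,4,5}
W = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
Theta = np.diag(W.sum(axis=1)) - W          # graph Laplacian

eigvals = np.linalg.eigvalsh(Theta)
k = int(np.sum(np.abs(eigvals) < 1e-9))     # multiplicity of the zero eigenvalue
print(k)                                    # number of connected components
```

For this example the graph has two connected components, so `k` comes out as 2.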
2.2.2 Connected sparse graph
A sparse graph is simply a graph with few connections among the nodes. Often, making a graph highly sparse can split it into several disconnected components, which is frequently undesirable (Sundin et al., 2017; HassanMoghaddam et al., 2016). Existing formulations cannot ensure both sparsity and connectedness, and there always exists a trade-off between the two properties. Within the formulation (8), we can achieve sparsity and connectedness by using the following spectral constraint:

$\mathcal{S}_{\lambda} = \{\lambda_1 = 0, \ c_1 \le \lambda_2 \le \dots \le \lambda_p \le c_2\}, \quad (10)$

with a proper choice of the regularization parameter $\alpha$ controlling sparsity and $c_1 > 0$ ensuring connectedness.
2.2.3 $d$-regular graph
All the nodes of a $d$-regular graph have the same weighted degree $d$, where the weighted degree of node $i$ is defined as $d_i = \sum_{j \neq i} w_{ij}$, which implies $\operatorname{diag}(\Theta) = d\mathbf{1}$ and $\Theta\mathbf{1} = \mathbf{0}$.
Within the above formulation (8), a $d$-regular structure on the matrix $\Theta$ can be enforced by including the following constraints:

$\operatorname{diag}(\Theta) = d\mathbf{1}, \qquad \mathcal{S}_{\lambda} = \{\lambda_1 = 0, \ c_1 \le \lambda_2 \le \dots \le \lambda_p \le c_2\}. \quad (11)$
2.2.4 $k$-component $d$-regular graph
A $k$-component $d$-regular structure can be enforced by combining the degree constraint with the $k$-component spectral constraint:

$\operatorname{diag}(\Theta) = d\mathbf{1}, \qquad \mathcal{S}_{\lambda} = \{\lambda_1 = \dots = \lambda_k = 0, \ c_1 \le \lambda_{k+1} \le \dots \le \lambda_p \le c_2\}. \quad (12)$
2.2.5 Cospectral graphs
In many applications, it is desirable to learn a graph with specific eigenvalues, which is also known as cospectral graph learning (Godsil and McKay, 1982). One example is spectral sparsification of graphs (see Spielman and Teng, 2011; Loukas and Vandergheynst, 2018), which aims to learn a sparse graph $\widehat{\Theta}$ to approximate a given graph $\Theta$, while the eigenvalues of $\widehat{\Theta}$ satisfy $\lambda_i(\widehat{\Theta}) = h(\lambda_i(\Theta))$, where $\lambda_i(\Theta)$ are the eigenvalues of the given graph and $h$ is some specific function. Therefore, for cospectral graph learning, we introduce the following constraint:

$\mathcal{S}_{\lambda} = \{\lambda_i = \eta_i, \ i = 1, \dots, p\}, \quad (13)$

where $\{\eta_i\}_{i=1}^{p}$ are the given target eigenvalues.
2.3 Structured Graph Learning Via Adjacency Spectral Constraints
To enforce spectral constraints on the adjacency matrix (i.e., $\mathcal{T}(\Theta) = A$ in (7)), we introduce the following optimization problem:

$\underset{\Theta, V, \boldsymbol{\psi}}{\text{maximize}} \ \ \log\operatorname{gdet}(\Theta) - \operatorname{tr}(\Theta S) - \alpha h(\Theta), \quad \text{subject to} \ \ \Theta \in \mathcal{S}_{\Theta}, \ A = V \operatorname{Diag}(\boldsymbol{\psi}) V^{\top}, \ \boldsymbol{\psi} \in \mathcal{S}_{\psi}, \ V^{\top}V = I, \quad (14)$

where $\Theta$ is the desired Laplacian matrix and $A$ is the corresponding adjacency matrix, which admits the decomposition $A = V \operatorname{Diag}(\boldsymbol{\psi}) V^{\top}$ with ordered eigenvalues $\boldsymbol{\psi} = [\psi_1, \dots, \psi_p]^{\top}$ and $V^{\top}V = I$. We enforce $\Theta$ to be a Laplacian matrix by the constraint $\Theta \in \mathcal{S}_{\Theta}$, while we incorporate specific spectral constraints on its adjacency matrix by forcing $\boldsymbol{\psi} \in \mathcal{S}_{\psi}$, with $\mathcal{S}_{\psi}$ containing a priori spectral information of the desired graph structure.

Next, we will introduce the choices of $\mathcal{S}_{\psi}$ that will enable (14) to learn bipartite graph structures.
2.3.1 General bipartite graph
A graph is said to be bipartite if its vertex set can be partitioned into two disjoint subsets $\mathcal{V}_1$ and $\mathcal{V}_2$ such that no two vertices belonging to the same subset are connected by an edge (Zha et al., 2001), i.e., for each $(i,j) \in \mathcal{E}$, either $i \in \mathcal{V}_1, j \in \mathcal{V}_2$ or $i \in \mathcal{V}_2, j \in \mathcal{V}_1$. Spectral graph theory states that a graph is bipartite if and only if the spectrum of the associated adjacency matrix is symmetric about the origin (Van Mieghem, 2010, Ch. 5) (Mohar, 1997).
Theorem 2.
(see Mohar, 1997) A graph is bipartite if and only if the spectrum of the associated adjacency matrix is symmetric about the origin:

$\mathcal{S}_{\psi} = \{\psi_i = -\psi_{p+1-i}, \ i = 1, 2, \dots, p\}. \quad (15)$
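The symmetry property is easy to observe numerically. The sketch below checks it on a made-up bipartite graph, and on a triangle (an odd cycle, hence not bipartite) as a counterexample:

```python
import numpy as np

# Hypothetical bipartite graph with parts {0, 1} and {2, 3, 4}
A = np.zeros((5, 5))
for i, j in [(0, 2), (0, 3), (1, 3), (1, 4)]:
    A[i, j] = A[j, i] = 1.0
psi = np.sort(np.linalg.eigvalsh(A))[::-1]     # adjacency eigenvalues, decreasing
assert np.allclose(psi, -psi[::-1])            # symmetric about the origin

# A triangle is not bipartite: its spectrum {2, -1, -1} is asymmetric
T = np.ones((3, 3)) - np.eye(3)
psi_t = np.sort(np.linalg.eigvalsh(T))[::-1]
assert not np.allclose(psi_t, -psi_t[::-1])
```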
2.3.2 Connected bipartite graph
The Perron-Frobenius theorem states that if a graph is connected, then the largest eigenvalue of its adjacency matrix has multiplicity 1 (Mohar, 1997). Thus, a connected bipartite graph can be learned by including the additional constraint that the largest and smallest eigenvalues have multiplicity one, i.e., $\psi_1$ and $\psi_p$ are not repeated. Figure 2 shows a connected bipartite graph and its symmetric adjacency eigenvalues.
Theorem 3.
(see Mohar, 1997) A graph is a connected bipartite graph if and only if the spectrum of the associated adjacency matrix is symmetric about the origin with non-repeated extreme eigenvalues:

$\mathcal{S}_{\psi} = \{\psi_1 > \psi_2 \ge \dots \ge \psi_{p-1} > \psi_p, \ \ \psi_i = -\psi_{p+1-i}, \ i = 1, \dots, p\}. \quad (16)$
2.4 Structured Graph Learning Via Joint Laplacian and Adjacency Spectral Constraints
To enforce spectral constraints on the Laplacian matrix $\Theta$ and the adjacency matrix $A$ jointly, we introduce the following optimization problem:

$\underset{\Theta, U, \boldsymbol{\lambda}, V, \boldsymbol{\psi}}{\text{maximize}} \ \ \log\operatorname{gdet}(\operatorname{Diag}(\boldsymbol{\lambda})) - \operatorname{tr}(\Theta S) - \alpha h(\Theta), \quad \text{subject to} \ \ \Theta \in \mathcal{S}_{\Theta}, \ \Theta = U \operatorname{Diag}(\boldsymbol{\lambda}) U^{\top}, \ \boldsymbol{\lambda} \in \mathcal{S}_{\lambda}, \ U^{\top}U = I, \ A = V \operatorname{Diag}(\boldsymbol{\psi}) V^{\top}, \ \boldsymbol{\psi} \in \mathcal{S}_{\psi}, \ V^{\top}V = I, \quad (17)$

where $\Theta$ is the desired Laplacian matrix, which admits the decomposition $\Theta = U \operatorname{Diag}(\boldsymbol{\lambda}) U^{\top}$ with $\boldsymbol{\lambda} \in \mathcal{S}_{\lambda}$ and $U^{\top}U = I$, and $A$ is the corresponding adjacency matrix, which admits the decomposition $A = V \operatorname{Diag}(\boldsymbol{\psi}) V^{\top}$ with $\boldsymbol{\psi} \in \mathcal{S}_{\psi}$ and $V^{\top}V = I$. Observe that the above formulation learns a graph Laplacian matrix with a specific structure by enforcing spectral constraints on the adjacency and Laplacian matrices simultaneously. Next, we will introduce the choices of $\mathcal{S}_{\lambda}$ and $\mathcal{S}_{\psi}$ that will enable (17) to learn non-trivial graph structures.
2.4.1 $k$-component bipartite graph
A $k$-component bipartite graph, also known as bipartite graph clustering, has significant relevance in many machine learning and financial applications (Zha et al., 2001). Recall that the bipartite structure can be enforced by utilizing the adjacency eigenvalue property (i.e., the constraints in (15)) and the $k$-component structure can be enforced by the Laplacian eigenvalues (i.e., the zero eigenvalue with multiplicity $k$). These two disparate requirements can be simultaneously imposed in the formulation (17) by choosing:

$\mathcal{S}_{\lambda} = \{\lambda_1 = \dots = \lambda_k = 0, \ c_1 \le \lambda_{k+1} \le \dots \le \lambda_p \le c_2\}, \qquad \mathcal{S}_{\psi} = \{\psi_i = -\psi_{p+1-i}, \ i = 1, \dots, p\}. \quad (18)$
2.4.2 $k$-component $d$-regular bipartite graph
The eigenvalue property of a $d$-regular graph relates the eigenvalues of its adjacency matrix and Laplacian matrix, which is summarized in the following theorem.
Theorem 4.
(Mohar, 1997) Collecting the Laplacian eigenvalues in increasing order ($\lambda_1 \le \lambda_2 \le \dots \le \lambda_p$) and the adjacency eigenvalues in decreasing order ($\psi_1 \ge \psi_2 \ge \dots \ge \psi_p$), the eigenvalue pairs for a $d$-regular graph are related as follows:

$\lambda_i = d - \psi_i, \quad i = 1, 2, \dots, p. \quad (19)$
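The relation (19) holds exactly for any regular graph; the sketch below verifies it on the 5-cycle, which is 2-regular (the graph choice is arbitrary, for illustration only):

```python
import numpy as np

# The 5-cycle is 2-regular: check lambda_i = d - psi_i from Theorem 4
p, d = 5, 2
A = np.zeros((p, p))
for i in range(p):
    A[i, (i + 1) % p] = A[(i + 1) % p, i] = 1.0
Theta = np.diag(A.sum(axis=1)) - A              # Laplacian of the cycle

lam = np.sort(np.linalg.eigvalsh(Theta))        # Laplacian eigenvalues, increasing
psi = np.sort(np.linalg.eigvalsh(A))[::-1]      # adjacency eigenvalues, decreasing
assert np.allclose(lam, d - psi)
```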
A $k$-component $d$-regular bipartite structure can be enforced by utilizing the adjacency eigenvalue property (for the bipartite structure) and the Laplacian eigenvalues (for the $k$-component structure), along with the joint spectral property (19) for the regular graph structure:

$\mathcal{S}_{\lambda} = \{\lambda_1 = \dots = \lambda_k = 0, \ c_1 \le \lambda_{k+1} \le \dots \le \lambda_p \le c_2\}, \qquad \mathcal{S}_{\psi} = \{\psi_i = -\psi_{p+1-i}, \ i = 1, \dots, p\}, \qquad \lambda_i = d - \psi_i, \qquad \operatorname{diag}(\Theta) = d\mathbf{1}. \quad (20)$
2.5 Block Successive Upper-bound Minimization algorithm
The resulting optimization formulations presented in (8), (14), and (17) are still complicated. The aim here is to develop efficient optimization methods with low computational complexity based on the BSUM and majorization-minimization framework (Razaviyayn et al., 2013; Sun et al., 2016). To begin with, we present a general schematic of the BSUM optimization framework:

$\underset{\mathbf{x}}{\text{minimize}} \ \ f(\mathbf{x}), \quad \text{subject to} \ \ \mathbf{x} \in \mathcal{X}, \quad (21)$

where the optimization variable $\mathbf{x}$ is partitioned into $m$ blocks as $\mathbf{x} = (\mathbf{x}_1, \dots, \mathbf{x}_m)$ with $\mathbf{x}_i \in \mathcal{X}_i$, $\mathcal{X} = \prod_{i=1}^{m}\mathcal{X}_i$ is a closed convex set, and $f$ is a continuous function. At the $t$-th iteration, each block $\mathbf{x}_i$ is updated in a cyclic order by solving the following problem:

$\mathbf{x}_i^{t+1} = \arg\min_{\mathbf{x}_i \in \mathcal{X}_i} \ g_i\big(\mathbf{x}_i \mid \mathbf{x}_1^{t+1}, \dots, \mathbf{x}_{i-1}^{t+1}, \mathbf{x}_i^{t}, \dots, \mathbf{x}_m^{t}\big), \quad (22)$

where $g_i(\cdot \mid \cdot)$ is a majorization function of $f$ at the current point, satisfying

$g_i(\mathbf{y}_i \mid \mathbf{y}) = f(\mathbf{y}), \ \forall \mathbf{y} \in \mathcal{X}, \quad (23a)$

$g_i(\mathbf{x}_i \mid \mathbf{y}) \ge f(\mathbf{y}_1, \dots, \mathbf{x}_i, \dots, \mathbf{y}_m), \ \forall \mathbf{x}_i \in \mathcal{X}_i, \ \forall \mathbf{y} \in \mathcal{X}, \quad (23b)$

$g_i'(\mathbf{x}_i; \mathbf{d}_i \mid \mathbf{y})\big|_{\mathbf{x}_i = \mathbf{y}_i} = f'(\mathbf{y}; \mathbf{d}), \ \forall \mathbf{d} = (\mathbf{0}, \dots, \mathbf{d}_i, \dots, \mathbf{0}), \quad (23c)$

$g_i(\mathbf{x}_i \mid \mathbf{y}) \ \text{is continuous in} \ (\mathbf{x}_i, \mathbf{y}), \quad (23d)$

where $f'(\mathbf{y}; \mathbf{d})$ stands for the directional derivative of $f$ at $\mathbf{y}$ along $\mathbf{d}$ (Razaviyayn et al., 2013). In summary, the framework is based on a sequential inexact block coordinate approach, which updates the variable in one block while keeping the other blocks fixed. If the surrogate functions $g_i$ are properly chosen, then the solution to (22) can be easier to obtain than solving (21) directly.
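To make the BSUM/MM mechanics concrete, the sketch below runs a minimal single-block instance of (21)-(23): it majorizes $f(x) = \tfrac{1}{2}\|Ax - b\|^2$ by the standard quadratic upper bound with Lipschitz constant $\|A\|_2^2$ and checks the key MM guarantee that solving the majorized problem never increases the objective (the matrix and vector here are arbitrary made-up data):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))
b = rng.normal(size=8)
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)

# Majorizer at x^t (cf. (23a)-(23b)):
#   g(x | x^t) = f(x^t) + grad^T (x - x^t) + (Lc/2) ||x - x^t||^2,
# valid since Lc = ||A||_2^2 upper-bounds the Hessian A^T A.
Lc = np.linalg.norm(A, 2) ** 2
x = np.zeros(5)
vals = [f(x)]
for _ in range(100):
    x = x - (A.T @ (A @ x - b)) / Lc      # closed-form minimizer of the majorizer
    vals.append(f(x))

# MM guarantee: the objective is monotonically non-increasing
assert all(b2 <= b1 + 1e-12 for b1, b2 in zip(vals, vals[1:]))
```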
3 Structured Graph Learning Via Laplacian Spectral Constraints (SGL)
In this section, we develop a BSUM-based algorithm for Structured Graph learning via Laplacian spectral constraints (SGL). In particular, we consider solving (8) under the $k$-component Laplacian spectral constraints (9). To enforce sparsity we use the $\ell_1$-norm regularization function (i.e., $h(\Theta) = \alpha\|\Theta\|_1$). Next, observing that the sign of the entries of $\Theta$ is fixed by the constraints $\Theta_{ij} \le 0$ for $i \neq j$ and $\Theta_{ii} \ge 0$, the regularization term can be written as $\alpha\|\Theta\|_1 = \operatorname{tr}(\Theta H)$, where $H = \alpha(2I - \mathbf{1}\mathbf{1}^{\top})$, and problem (8) becomes

$\underset{\Theta, U, \boldsymbol{\lambda}}{\text{maximize}} \ \ \log\operatorname{gdet}(\operatorname{Diag}(\boldsymbol{\lambda})) - \operatorname{tr}(\Theta K), \quad \text{subject to} \ \ \Theta \in \mathcal{S}_{\Theta}, \ \Theta = U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}, \ \boldsymbol{\lambda} \in \mathcal{S}_{\lambda}, \ U^{\top}U = I, \quad (24)$

where $K = S + H$. The resulting problem is complicated and intractable in its current form due to i) the Laplacian structural constraints $\mathcal{S}_{\Theta}$, ii) the coupled variables $(\Theta, U, \boldsymbol{\lambda})$, and iii) the generalized determinant on $\operatorname{Diag}(\boldsymbol{\lambda})$. In order to derive a more tractable formulation, we first introduce a linear operator that transforms the Laplacian structural constraints into simple algebraic constraints, and then relax the eigen-decomposition expression into the objective function.
3.1 Graph Laplacian operator
The Laplacian matrix $\Theta$ belonging to $\mathcal{S}_{\Theta}$ satisfies i) $\Theta_{ij} = \Theta_{ji} \le 0$ for $i \neq j$, and ii) $\Theta\mathbf{1} = \mathbf{0}$, implying that the target matrix is symmetric with degrees of freedom equal to $p(p-1)/2$. Therefore, we introduce a linear operator $\mathcal{L}$ that transforms a non-negative vector $\mathbf{w} \in \mathbb{R}_{+}^{p(p-1)/2}$ into a matrix $\mathcal{L}\mathbf{w}$ that satisfies the Laplacian constraints ($[\mathcal{L}\mathbf{w}]_{ij} = [\mathcal{L}\mathbf{w}]_{ji} \le 0$ for $i \neq j$ and $\mathcal{L}\mathbf{w}\,\mathbf{1} = \mathbf{0}$).

Definition 2.
The linear operator $\mathcal{L} : \mathbb{R}^{p(p-1)/2} \to \mathbb{R}^{p \times p}$, $\mathbf{w} \mapsto \mathcal{L}\mathbf{w}$, is defined as

$[\mathcal{L}\mathbf{w}]_{ij} = \begin{cases} -w_k, & i > j, \\ [\mathcal{L}\mathbf{w}]_{ji}, & i < j, \\ -\sum_{j \neq i}[\mathcal{L}\mathbf{w}]_{ij}, & i = j, \end{cases}$

where $k = i - j + \frac{j-1}{2}(2p - j)$.
We derive the adjoint operator $\mathcal{L}^{*}$ of $\mathcal{L}$ by making it satisfy $\langle \mathcal{L}\mathbf{w}, Y \rangle = \langle \mathbf{w}, \mathcal{L}^{*}Y \rangle$.
Lemma 1.
The adjoint operator $\mathcal{L}^{*} : \mathbb{R}^{p \times p} \to \mathbb{R}^{p(p-1)/2}$ is defined by

$[\mathcal{L}^{*}Y]_k = Y_{ii} - Y_{ij} - Y_{ji} + Y_{jj},$

where $k = i - j + \frac{j-1}{2}(2p - j)$ and the indices satisfy $i > j$.
A toy example is given to illustrate the operators $\mathcal{L}$ and $\mathcal{L}^{*}$ more clearly. Consider a weight vector $\mathbf{w} = [w_1, w_2, w_3, w_4, w_5, w_6]^{\top}$ (so $p = 4$). The Laplacian operator $\mathcal{L}$ on $\mathbf{w}$ gives

$\mathcal{L}\mathbf{w} = \begin{bmatrix} w_1 + w_2 + w_3 & -w_1 & -w_2 & -w_3 \\ -w_1 & w_1 + w_4 + w_5 & -w_4 & -w_5 \\ -w_2 & -w_4 & w_2 + w_4 + w_6 & -w_6 \\ -w_3 & -w_5 & -w_6 & w_3 + w_5 + w_6 \end{bmatrix}. \quad (25)$

The operation of $\mathcal{L}^{*}$ on a symmetric matrix $Y \in \mathbb{R}^{4 \times 4}$ returns a vector

$\mathcal{L}^{*}Y = \big[Y_{11} - Y_{21} - Y_{12} + Y_{22}, \ Y_{11} - Y_{31} - Y_{13} + Y_{33}, \ Y_{11} - Y_{41} - Y_{14} + Y_{44}, \ Y_{22} - Y_{32} - Y_{23} + Y_{33}, \ Y_{22} - Y_{42} - Y_{24} + Y_{44}, \ Y_{33} - Y_{43} - Y_{34} + Y_{44}\big]^{\top}. \quad (26)$
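The two operators are straightforward to implement. The sketch below is a minimal NumPy rendering (the function names `L_op` and `L_star` are ours, and the edge ordering follows the toy example); it verifies the adjoint identity $\langle \mathcal{L}\mathbf{w}, Y\rangle = \langle \mathbf{w}, \mathcal{L}^{*}Y\rangle$ and numerically evaluates the operator norm, which for this operator works out to $\sqrt{2p}$ (cf. Lemma 2):

```python
import numpy as np

def L_op(w, p):
    """Laplacian operator: weight vector w -> Laplacian matrix L(w)."""
    Theta = np.zeros((p, p))
    iu = np.triu_indices(p, 1)           # enumerates edges in the toy-example order
    Theta[iu] = -w
    Theta += Theta.T
    np.fill_diagonal(Theta, -Theta.sum(axis=1))
    return Theta

def L_star(Y):
    """Adjoint operator: [L*Y]_k = Y_ii - Y_ij - Y_ji + Y_jj."""
    i, j = np.triu_indices(Y.shape[0], 1)
    return Y[i, i] - Y[i, j] - Y[j, i] + Y[j, j]

p = 4
w = np.arange(1.0, 7.0)                  # w = (1, 2, ..., 6), as in the toy example
rng = np.random.default_rng(0)
Y = rng.normal(size=(p, p))

# adjoint identity <L(w), Y> = <w, L*(Y)>
assert np.isclose(np.sum(L_op(w, p) * Y), w @ L_star(Y))

# operator norm: largest singular value of the matrix form of L equals sqrt(2p)
M = np.column_stack([L_op(e, p).ravel() for e in np.eye(p * (p - 1) // 2)])
assert np.isclose(np.linalg.norm(M, 2), np.sqrt(2 * p))
```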
By the definitions of $\mathcal{L}$ and $\mathcal{L}^{*}$, we have Lemma 2.
Lemma 2.
The operator norm $\|\mathcal{L}\|$ is $\sqrt{2p}$, where $\|\mathcal{L}\| = \sup_{\|\mathbf{x}\| = 1}\|\mathcal{L}\mathbf{x}\|_F$ with $\mathbf{x} \in \mathbb{R}^{p(p-1)/2}$.
Proof.
Follows from the definitions of $\mathcal{L}$ and $\mathcal{L}^{*}$; see Appendix 8.1 for the detailed proof. ∎
We have introduced the operator $\mathcal{L}$, which helps to transform the complicated structural matrix variable $\Theta$ into a simple vector variable $\mathbf{w}$. The linear operator $\mathcal{L}$ is an important component of the SGL framework.
3.2 SGL algorithm
To solve (24), we represent the Laplacian matrix as $\Theta = \mathcal{L}\mathbf{w}$ and then develop an algorithm based on quadratic methods (Nikolova and Ng, 2005; Ying et al., 2018). We introduce the term $\frac{\beta}{2}\|\mathcal{L}\mathbf{w} - U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}\|_F^2$ to keep $\mathcal{L}\mathbf{w}$ close to $U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}$ instead of exactly solving the constraint $\mathcal{L}\mathbf{w} = U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}$, where $\beta > 0$. Note that this relaxation can be made tight by choosing a sufficiently large $\beta$ or by iteratively increasing $\beta$. Now the original problem can be formulated as

$\underset{\mathbf{w}, U, \boldsymbol{\lambda}}{\text{minimize}} \ \ -\log\operatorname{gdet}(\operatorname{Diag}(\boldsymbol{\lambda})) + \operatorname{tr}(K\mathcal{L}\mathbf{w}) + \frac{\beta}{2}\|\mathcal{L}\mathbf{w} - U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}\|_F^2, \quad \text{subject to} \ \ \mathbf{w} \ge 0, \ \boldsymbol{\lambda} \in \mathcal{S}_{\lambda}, \ U^{\top}U = I, \quad (27)$

where $\mathbf{w} \ge 0$ means each entry of $\mathbf{w}$ is non-negative. When solving (27) to learn a $k$-component graph structure with the constraints in (9), the first $k$ zero eigenvalues, as well as the corresponding eigenvectors, can be dropped from the optimization formulation. Now $\boldsymbol{\lambda}$ contains only the $p - k$ non-zero eigenvalues in increasing order, so we can replace the generalized determinant with the determinant on $\operatorname{Diag}(\boldsymbol{\lambda})$ in (27). $U \in \mathbb{R}^{p \times (p-k)}$ contains the eigenvectors corresponding to the non-zero eigenvalues in the same order, and the orthogonality constraint on $U$ becomes $U^{\top}U = I_{p-k}$. The non-zero eigenvalues are ordered and lie in the given set

$\mathcal{S}_{\lambda} = \{c_1 \le \lambda_{k+1} \le \dots \le \lambda_p \le c_2\}. \quad (28)$
Collecting the variables in three blocks as $(\mathbf{w}, U, \boldsymbol{\lambda})$, we develop a BSUM-based algorithm that updates one block at a time while keeping the other blocks fixed.
3.2.1 Update of $\mathbf{w}$
Treating $\mathbf{w}$ as a variable with $U$ and $\boldsymbol{\lambda}$ fixed, and ignoring the terms independent of $\mathbf{w}$, we have the following sub-problem:

$\underset{\mathbf{w} \ge 0}{\text{minimize}} \ \ \operatorname{tr}(K\mathcal{L}\mathbf{w}) + \frac{\beta}{2}\|\mathcal{L}\mathbf{w} - U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}\|_F^2. \quad (29)$

Problem (29) can be written as a non-negative quadratic problem,

$\underset{\mathbf{w} \ge 0}{\text{minimize}} \ \ f(\mathbf{w}) = \frac{1}{2}\|\mathcal{L}\mathbf{w}\|_F^2 - \mathbf{c}^{\top}\mathbf{w}, \quad (30)$

where $\mathbf{c} = \mathcal{L}^{*}\big(U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top} - \beta^{-1}K\big)$.
Lemma 3.
The subproblem (30) is a strictly convex optimization problem.
Proof.
From the definition of the operator $\mathcal{L}$ and the property of its adjoint $\mathcal{L}^{*}$, we have

$\|\mathcal{L}\mathbf{w}\|_F^2 = \mathbf{w}^{\top}\mathcal{L}^{*}\mathcal{L}\mathbf{w} \ge 2\|\mathbf{w}\|^2 > 0 \quad \text{for all} \ \mathbf{w} \neq \mathbf{0}. \quad (31)$

The above result implies that $f(\mathbf{w})$ is a strictly convex function. Together with the fact that the non-negativity set is convex, we conclude that sub-problem (30) is strictly convex. However, it is not possible here to derive a closed-form solution due to the non-negativity constraint ($\mathbf{w} \ge 0$), and thus we derive a majorization function. ∎
Lemma 4.
The function

$g(\mathbf{w} \mid \mathbf{w}^{t}) = f(\mathbf{w}^{t}) + (\mathbf{w} - \mathbf{w}^{t})^{\top}\nabla f(\mathbf{w}^{t}) + \frac{L_1}{2}\|\mathbf{w} - \mathbf{w}^{t}\|^2, \quad (32)$

with $L_1 = \|\mathcal{L}\|^2 = 2p$, majorizes $f(\mathbf{w})$ in (30) at $\mathbf{w}^{t}$.

It is easy to check the conditions (23) for the majorization function (see more details in Sun et al., 2016; Song et al., 2015), so we omit the proof here. Note that the majorization function in (32) is in accordance with the requirement of the majorization in (23b), because in problem (30) the block variable is $\mathbf{w}$ and the other coordinates ($U, \boldsymbol{\lambda}$) are fixed. For notational brevity, we present the majorization function as $g(\mathbf{w} \mid \mathbf{w}^{t})$ instead of $g(\mathbf{w} \mid \mathbf{w}^{t}, U, \boldsymbol{\lambda})$.
After ignoring the constant terms in (32), the majorized problem of (30) at $\mathbf{w}^{t}$ is given by

$\underset{\mathbf{w} \ge 0}{\text{minimize}} \ \ \frac{1}{2}\mathbf{w}^{\top}\mathbf{w} - \mathbf{a}^{\top}\mathbf{w}, \quad (33)$

where $\mathbf{a} = \mathbf{w}^{t} - \frac{1}{2p}\nabla f(\mathbf{w}^{t})$ and $\nabla f(\mathbf{w}^{t}) = \mathcal{L}^{*}(\mathcal{L}\mathbf{w}^{t}) - \mathbf{c}$.
Lemma 5.
From the KKT optimality conditions, we can easily obtain the optimal solution to (33) as

$\mathbf{w}^{t+1} = \left(\mathbf{w}^{t} - \frac{1}{2p}\nabla f(\mathbf{w}^{t})\right)^{+}, \quad (34)$

where $(x)^{+} := \max(x, 0)$ element-wise.
3.2.2 Update of $U$
Treating $U$ as a variable block, with $\mathbf{w}$ and $\boldsymbol{\lambda}$ fixed, we obtain the following sub-problem:

$\underset{U^{\top}U = I}{\text{minimize}} \ \ \frac{\beta}{2}\|\mathcal{L}\mathbf{w} - U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top}\|_F^2. \quad (35)$

The equivalent problem is reformulated as follows:

$\underset{U^{\top}U = I}{\text{maximize}} \ \ \operatorname{tr}\big(U^{\top}(\mathcal{L}\mathbf{w})\,U\operatorname{Diag}(\boldsymbol{\lambda})\big). \quad (36)$

Problem (36) is an optimization on the orthogonal Stiefel manifold $\{U \in \mathbb{R}^{p \times (p-k)} : U^{\top}U = I\}$. From (Absil et al., 2009; Benidis et al., 2016), the maximizer of (36) is given by the suitably ordered eigenvectors of $\mathcal{L}\mathbf{w}$:

$U^{t+1} = \text{eigenvectors of } \mathcal{L}\mathbf{w} \text{ corresponding to its } p - k \text{ largest eigenvalues, in increasing order}. \quad (37)$
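The $U$-update therefore reduces to a single eigen-decomposition. The snippet below (with a made-up symmetric matrix standing in for $\mathcal{L}\mathbf{w}$, and hypothetical target eigenvalues) picks the eigenvectors of the $p - k$ largest eigenvalues and checks that this choice is no worse than a random feasible point on the Stiefel manifold:

```python
import numpy as np

rng = np.random.default_rng(1)
p, k = 5, 1
B = rng.normal(size=(p, p))
Theta = B + B.T                                # stand-in for L(w)
lam = np.array([0.5, 1.0, 2.0, 3.0])           # fixed target eigenvalues, increasing

evals, evecs = np.linalg.eigh(Theta)           # eigh returns ascending eigenvalues
U = evecs[:, k:]                               # drop the k smallest, keep the order

obj = lambda X: np.trace(X.T @ Theta @ X @ np.diag(lam))
Q, _ = np.linalg.qr(rng.normal(size=(p, p - k)))   # a random feasible point
assert np.allclose(U.T @ U, np.eye(p - k))         # U is on the Stiefel manifold
assert obj(U) >= obj(Q) - 1e-9                     # eigenvector choice is no worse
```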
3.2.3 Update of $\boldsymbol{\lambda}$
With $\mathbf{w}$ and $U$ fixed, we obtain the following sub-problem for the $\boldsymbol{\lambda}$ update:

$\underset{\boldsymbol{\lambda} \in \mathcal{S}_{\lambda}}{\text{minimize}} \ \ -\sum_i \log\lambda_i + \frac{\beta}{2}\|U\operatorname{Diag}(\boldsymbol{\lambda})U^{\top} - \mathcal{L}\mathbf{w}\|_F^2. \quad (38)$

The optimization (38) can be rewritten as

$\underset{\boldsymbol{\lambda} \in \mathcal{S}_{\lambda}}{\text{minimize}} \ \ -\sum_i \log\lambda_i + \frac{\beta}{2}\|\boldsymbol{\lambda} - \mathbf{d}\|^2, \quad (39)$

where $\mathbf{d} = \operatorname{diag}(U^{\top}\mathcal{L}\mathbf{w}\,U)$. With a slight abuse of notation and for ease of exposition, we denote the indices of the non-zero eigenvalues in (28) from $1$ to $q$, with $q = p - k$, instead of from $k+1$ to $p$. Problem (39) can be further written as

$\underset{c_1 \le \lambda_1 \le \dots \le \lambda_q \le c_2}{\text{minimize}} \ \ -\sum_{i=1}^{q}\log\lambda_i + \frac{\beta}{2}\|\boldsymbol{\lambda} - \mathbf{d}\|^2, \quad (40)$

where $d_i = [U^{\top}\mathcal{L}\mathbf{w}\,U]_{ii}$ is the $i$-th diagonal element of $U^{\top}\mathcal{L}\mathbf{w}\,U$. We derive a computationally efficient method to solve (40) from the KKT optimality conditions. The update rule for $\boldsymbol{\lambda}$ follows an iterative procedure summarized in Algorithm 1. Sub-problem (40) is a convex optimization problem. One can solve the convex problem (40) with a solver (e.g., CVX), but our algorithm does it far more efficiently for large-scale problems.
Lemma 7.
Proof.
Please refer to the Appendix 8.2 for the detailed proof. ∎
To update the $\lambda_i$'s, Algorithm 1 iteratively checks the situations [cf. steps 6, 10 and 14] and updates the $\lambda_i$'s accordingly until the KKT conditions are satisfied. If some situation occurs, then the corresponding $\lambda_i$'s are updated accordingly. Note that the situations are independent from each other, i.e., no $\lambda_i$ is involved in two situations simultaneously. Furthermore, the $\lambda_i$'s are updated iteratively according to the above situations until all of them satisfy the KKT conditions; the procedure terminates within a bounded number of iterations.
Remark 3.
The problem of the form (40) is popularly known as a regularized isotonic regression problem. Isotonic regression is a well-researched problem that has found applications in numerous domains (see Best and Chakravarti, 1990; Lee et al., 1981; Barlow and Brunk, 1972; Luss and Rosset, 2014; Bartholomew, 2004). To the best of our knowledge, however, there does not exist any computationally efficient method comparable to Algorithm 1. The proposed algorithm obtains a globally optimal solution within a finite number of iterations for the regularized isotonic regression problem, and can potentially be adapted to solve other isotonic regression problems. The computationally efficient Algorithm 1 is thus also a contribution to the isotonic regression literature.
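Ignoring the ordering and box constraints for a moment, each coordinate of (40) has a closed-form stationary point: setting the derivative $-1/\lambda_i + \beta(\lambda_i - d_i)$ to zero gives $\lambda_i = \tfrac{1}{2}\big(d_i + \sqrt{d_i^2 + 4/\beta}\big)$. Since this map is increasing in $d_i$ and the $d_i$ are already sorted, the per-coordinate solution preserves the ordering; the full Algorithm 1 additionally handles the $[c_1, c_2]$ box and any ties, which the hedged sketch below omits:

```python
import numpy as np

def lambda_update(d, beta, c1, c2):
    """Per-coordinate minimizer of -log(lam_i) + (beta/2)(lam_i - d_i)^2,
    clipped to [c1, c2]. The full Algorithm 1 also resolves ordering ties."""
    lam = 0.5 * (d + np.sqrt(d ** 2 + 4.0 / beta))
    return np.clip(lam, c1, c2)

d = np.array([-0.5, 0.2, 1.0, 3.0])        # hypothetical diag(U^T L(w) U), sorted
lam = lambda_update(d, beta=2.0, c1=1e-3, c2=10.0)

assert np.all(lam > 0)                     # the log barrier keeps eigenvalues positive
assert np.all(np.diff(lam) >= 0)           # monotone in d, so the ordering is preserved
# stationarity at interior points: -1/lam + beta*(lam - d) = 0
assert np.allclose(-1.0 / lam + 2.0 * (lam - d), 0.0, atol=1e-9)
```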
3.2.4 SGL algorithm summary
SGL in Algorithm 2 summarizes the implementation of the structured graph learning via Laplacian spectral constraints.
In Algorithm 2, the computationally most demanding step is the eigen-decomposition required for the update of $U$, implying a worst-case computational complexity of $\mathcal{O}(p^3)$. This can be further improved by utilizing the sparse structure and the properties of the symmetric Laplacian matrix in the eigen-decomposition. The most widely used GLasso method (Friedman et al., 2008) has a similar worst-case complexity, although GLasso learns a graph without structural constraints. When specific structural requirements are considered, the SGL algorithm has a considerable advantage over other competing structured graph learning algorithms (Marlin and Murphy, 2009; Hao et al., 2018; Ambroise et al., 2009).
Theorem 5.
The sequence $(\mathbf{w}^{t}, U^{t}, \boldsymbol{\lambda}^{t})$ generated by Algorithm 2 converges to the set of KKT points of (27).
Proof.
The detailed proof is deferred to the Appendix 8.3. ∎
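Putting the three block updates together, a compact and much-simplified rendition of the SGL loop might look as follows. The initialization, the $\beta$ schedule, and the exact ordering logic of Algorithm 1 are replaced by simple stand-ins (and the helper names are ours), so this is a sketch of the iteration structure rather than the full algorithm:

```python
import numpy as np

def L_op(w, p):
    """Laplacian operator: weight vector -> Laplacian matrix."""
    Theta = np.zeros((p, p))
    iu = np.triu_indices(p, 1)
    Theta[iu] = -w
    Theta += Theta.T
    np.fill_diagonal(Theta, -Theta.sum(axis=1))
    return Theta

def L_star(Y):
    """Adjoint operator of L."""
    i, j = np.triu_indices(Y.shape[0], 1)
    return Y[i, i] - Y[i, j] - Y[j, i] + Y[j, j]

def sgl(S, k=1, alpha=0.0, beta=20.0, c1=1e-2, c2=1e2, iters=500):
    p = S.shape[0]
    K = S + alpha * (2 * np.eye(p) - np.ones((p, p)))   # K = S + H
    w = np.ones(p * (p - 1) // 2)                       # crude initialization
    for _ in range(iters):
        Theta = L_op(w, p)
        # U-update (36): eigenvectors of the p-k largest eigenvalues of L(w)
        evals, evecs = np.linalg.eigh(Theta)
        U, d = evecs[:, k:], evals[k:]
        # lambda-update (40): per-coordinate KKT point, clipped to [c1, c2]
        lam = np.clip(0.5 * (d + np.sqrt(d ** 2 + 4.0 / beta)), c1, c2)
        # w-update (34): one projected MM step on (30) with step size 1/(2p)
        M = U @ np.diag(lam) @ U.T
        grad = L_star(L_op(w, p) - M + K / beta)
        w = np.maximum(w - grad / (2 * p), 0.0)
    return L_op(w, p)

# Tiny smoke test on a made-up 2-component graph: edges (1,2) and (3,4)
L_true = L_op(np.array([1.0, 0, 0, 0, 0, 1.0]), 4)
Theta_hat = sgl(np.linalg.pinv(L_true), k=2)
```

By construction, the output always satisfies the Laplacian constraints (zero row sums, non-positive off-diagonals), whatever the input statistics.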
Remark 4.
Note that SGL is not limited to $k$-component graph learning: it can easily be adapted to learn other graph structures under the aforementioned spectral constraints in (10), (11), (12), and (13). Furthermore, SGL can also be utilized to learn popular connected graph structures (e.g., Erdos-Renyi graphs, modular graphs, grid graphs, etc.) even without specific spectral constraints, simply by choosing the eigenvalue constraints corresponding to a one-component graph (i.e., $k = 1$) and setting $c_1$ and $c_2$ to very small and very large values, respectively. Detailed experiments with important graph structures are carried out in the simulation section.
4 Structured Graph Learning Via Adjacency Spectral Constraints (SGA)
In this section, we develop a BSUM-based algorithm for Structured Graph learning via Adjacency spectral constraints (SGA). In particular, we consider solving (14) for the connected bipartite graph structure by introducing the spectral constraints on the adjacency eigenvalues (15). Since $\mathcal{G}$ is a connected graph, the term $\log\operatorname{gdet}(\Theta)$ can be simplified according to the following lemma.
Lemma 8.
If $\Theta$ is a Laplacian matrix for a connected graph, then

$\operatorname{gdet}(\Theta) = \det(\Theta + J), \quad (41)$

where $J = \frac{1}{p}\mathbf{1}\mathbf{1}^{\top}$.
Proof.
It is easy to establish (41) by the fact that $\Theta\mathbf{1} = \mathbf{0}$: adding $J$ shifts the zero eigenvalue of $\Theta$ (associated with the eigenvector $\frac{1}{\sqrt{p}}\mathbf{1}$) to one, while leaving the remaining eigenvalues and eigenvectors unchanged. ∎
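The identity (41) is easy to confirm numerically; the sketch below checks it on a made-up connected weighted graph:

```python
import numpy as np

# Check gdet(Theta) = det(Theta + J), J = (1/p) 1 1^T, on a connected Laplacian
W = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 2.0],
              [1.0, 2.0, 0.0]])
Theta = np.diag(W.sum(axis=1)) - W
p = Theta.shape[0]
J = np.ones((p, p)) / p

evals = np.linalg.eigvalsh(Theta)          # ascending; evals[0] is the zero eigenvalue
gdet = np.prod(evals[1:])                  # product of the non-zero eigenvalues
assert np.isclose(gdet, np.linalg.det(Theta + J))
```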
4.1 Graph adjacency operator
To guarantee the structure of the adjacency matrix, we introduce a linear operator $\mathcal{A}$.
Definition 3.
The linear operator $\mathcal{A} : \mathbb{R}^{p(p-1)/2} \to \mathbb{R}^{p \times p}$, $\mathbf{w} \mapsto \mathcal{A}\mathbf{w}$, is defined as

$[\mathcal{A}\mathbf{w}]_{ij} = \begin{cases} w_k, & i > j, \\ [\mathcal{A}\mathbf{w}]_{ji}, & i < j, \\ 0, & i = j, \end{cases}$

where $k = i - j + \frac{j-1}{2}(2p - j)$.