The study of random networks has been a topic of great interest in recent years; see, e.g., Kolaczyk (2009) and Newman (2010). A network is defined as a structure composed of nodes and edges connecting nodes in various relationships. The observed network can be represented by an adjacency matrix , where is the total number of nodes within the network. For a binary relation network, as considered here, if there is an edge from node to node and otherwise. In the following we identify an adjacency matrix with the network itself.
Most relational phenomena are dependent phenomena, and dependence is often of substantive interest. Frank and Strauss (1986) and Wasserman and Pattison (1996) introduced exponential random graph models, which allow the modelling of a wide range of dependences of substantive interest, including transitive closure. For such models, and the distribution of is assumed to follow the exponential family form where and are the sufficient statistics, e.g. the total number of edges. However, as mentioned in Schweinberger and Handcock (2014), exponential random graph models lack neighborhood structure, which makes modelling dependencies challenging for such networks. A neighborhood (or community, or block) is in general defined as a group of individuals (nodes) such that individuals within the group interact with each other more frequently than with those outside the group. Recently, Schweinberger and Handcock (2014) proposed the concept of local dependence in stochastic networks. This concept allows for dependence within neighborhoods, while different neighborhoods are independent.
In contrast, our work considers dependence between blocks, while the connections within blocks are assumed independent. We also assume the blocks to be known. We then propose to analyze dependencies between blocks by means of graphical models. To this end, we assume an undirected network so that
where indicate block memberships in one of blocks; govern the intensities of the connectivities within and between blocks, ; and is a symmetric matrix. We then put a Gaussian logistic model on the . More precisely, for the diagonal elements , assume that
where is a vector of given co-variables corresponding to block , and is the parameter vector. Furthermore, with
where is an nonsingular covariance matrix. Each off-diagonal element () is assumed to be independent of all the other elements of . The latter assumption is made to simplify the exposition. A similar model can be found in Xu and Hero (2014).
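To make the model concrete, the following is a minimal simulation sketch, not the authors' code. All names (`X`, `beta`, `Sigma`, `p_between`) are assumptions introduced for illustration, since the display formulas are not reproduced here. In particular, the between-block edge probability is taken to be a fixed constant, which is a simplification of the independence assumption stated above.

```python
import numpy as np

def simulate_network(block_sizes, X, beta, Sigma, p_between, rng):
    """Draw one undirected network from a Gaussian logistic block model (sketch)."""
    K = len(block_sizes)
    # Correlated Gaussian noise couples the within-block intensities across blocks.
    eps = rng.multivariate_normal(np.zeros(K), Sigma)
    logits = X @ beta + eps
    pi_diag = 1.0 / (1.0 + np.exp(-logits))      # within-block edge probabilities
    # Assumption: one constant probability for all between-block edges.
    P = np.full((K, K), p_between)
    np.fill_diagonal(P, pi_diag)
    z = np.repeat(np.arange(K), block_sizes)     # known block memberships
    n = z.size
    probs = P[z[:, None], z[None, :]]            # n x n matrix of edge probabilities
    # Edges are drawn independently given the intensities; keep the upper triangle.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)            # symmetric, zero diagonal
    return A, z, pi_diag

rng = np.random.default_rng(0)
K = 3
X = np.ones((K, 1))                              # intercept-only co-variables
beta = np.array([-1.0])
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.5],
                  [0.0, 0.5, 1.0]])
A, z, pi_diag = simulate_network([20, 30, 25], X, beta, Sigma, 0.05, rng)
```

Conditionally on the latent intensities all edges are independent, so any dependence between blocks enters only through the covariance matrix of the Gaussian noise.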
The dependence between the induces dependence between blocks. We can thus analyze this induced dependence in our network model by using methods from Gaussian graphical models, via selecting the zeros in the precision matrix . Adding dependencies between the with would increase the dimension of , and introduce ‘second order dependencies’ into the network structure, namely, dependencies of block connections between different pairs of blocks.
It is crucial to observe that this Gaussian graphical model is defined in terms of the (or, more precisely, in terms of their log-odds ratios), and that these quantities are not observed. Thus, they need to be estimated from our network data, and, to this end, we here assume the availability of iid observations of the network. This estimation, in turn, introduces additional randomness into our analysis of the graphical model. We are therefore facing similar challenges as in the analysis of Gaussian graphical models under uncertainty. However, our situation is more complex, as will become clear below.
The methods for neighborhood selection considered here are based on the column-wise methodology of Meinshausen and Bühlmann (2006). We apply this methodology (under uncertainty) to some known selection methods from the literature, thereby adjusting these methods for the additional uncertainty. The selection methods considered here are (i) the graphical Lasso of Meinshausen and Bühlmann (2006), (ii) a class of Dantzig-type selectors, which includes the Dantzig selector of Candes and Tao (2007), and (iii) the matrix uncertainty selector of Rosenbaum et al. (2010). This will lead to ‘graphical’ versions of the respective procedures. The graphical Dantzig selector has already been studied in Yuan (2010), but without the additional uncertainty we are facing here. This leads to novel selection methodologies for which we derive statistical guarantees. We also present numerical studies to illustrate their finite sample performance.
More details on our latent variable block model are discussed in Section 2, where we also introduce some basic notation. Section 3 introduces our neighborhood selection methodologies, and presents results on their large sample performance. Tuning parameter selection is also discussed there. Numerical studies are presented in Section 4, and the proofs of our main results are in Appendix 5.
2 Some important preliminary facts
Let with be the design matrix. Our latent variable block model (1.1) - (1.3) says that . The dependence among the encoded in is propagated to the . Let , then the following fact holds.
In other words, if
denotes the edge set of the graph corresponding to , then, under our latent variable block model, if and only if is conditionally independent of given the other variables . Identifying nonzero elements in will thus reveal the conditional dependence structure of the blocks in our underlying network.
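The link between zeros of the precision matrix and conditional independence is a standard property of Gaussian vectors. The small numerical sketch below illustrates it for a three-node chain; the matrix `Omega_true` and the helper `partial_correlations` are hypothetical and chosen only for illustration.

```python
import numpy as np

def partial_correlations(Sigma):
    """Return the precision matrix and the implied partial correlations."""
    Omega = np.linalg.inv(Sigma)
    d = np.sqrt(np.diag(Omega))
    R = -Omega / np.outer(d, d)       # rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)
    np.fill_diagonal(R, 1.0)
    return Omega, R

# Chain graph 0 - 1 - 2: the precision matrix is tridiagonal.
Omega_true = np.array([[2.0, -1.0, 0.0],
                       [-1.0, 2.0, -1.0],
                       [0.0, -1.0, 2.0]])
Sigma = np.linalg.inv(Omega_true)
Omega, R = partial_correlations(Sigma)

# Edges correspond to nonzero off-diagonal precision entries.
edges = {(i, j) for i in range(3) for j in range(i + 1, 3)
         if abs(Omega[i, j]) > 1e-10}
```

Here nodes 0 and 2 are marginally correlated (`Sigma[0, 2]` is nonzero) but conditionally independent given node 1, so the pair (0, 2) is not an edge.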
We will use the relative number of edges within each block, as estimates for the unobserved values . Let denote the total number of edges in the blocks.
3 Neighborhood selection
Here we discuss the identification of the nonzero elements in . We first assume that (1.1) - (1.3) holds with a known , and we write . We also assume that for all . Let denote iid observed networks with corresponding independent unobserved random vectors following our model. Let denote the blocks of the networks and be the node set. Assume and are mutually exclusive for so that . The number of possible edges within each block is for , and the number of possible edges between block and block is then for . We would like to point out again that the block membership variable is assumed to be known.
3.1 Controlling the estimation error
Given a network , let denote the number of edges within block in network . Natural estimates of and are
Let , and let be the minimum number of possible edges within a block, which of course measures the minimum blocksize.
This result tells us that, if we base our edge selection on , then, for large, we are close to a Gaussian model, and thus we can hope that our analysis is similar to that of a Gaussian graphical model. However, the approximation error has to be examined carefully. In order to do that, we first truncate the ’s, or, equivalently, the . For let
This truncation corresponds to
In what follows, we work with these truncated versions. Note that the dependence on is not indicated explicitly in this notation.
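As a rough sketch of the estimation step, the within-block edge density can be computed from an adjacency matrix and then mapped to the log-odds scale. Clipping to `[delta, 1 - delta]` is an assumed stand-in for the truncation above, whose exact form is not reproduced here; the function name and arguments are likewise hypothetical.

```python
import numpy as np

def within_block_estimates(A, z, K, delta):
    """Relative number of within-block edges and truncated log-odds (sketch)."""
    pi_hat = np.empty(K)
    for k in range(K):
        idx = np.flatnonzero(z == k)
        n_k = idx.size                              # assumes n_k >= 2
        possible = n_k * (n_k - 1) // 2             # possible edges, undirected block
        observed = np.triu(A[np.ix_(idx, idx)], k=1).sum()
        pi_hat[k] = observed / possible
    # Assumed truncation: keep the estimates away from 0 and 1 so the logit is finite.
    pi_trunc = np.clip(pi_hat, delta, 1.0 - delta)
    eta_hat = np.log(pi_trunc / (1.0 - pi_trunc))
    return pi_hat, eta_hat

# Toy network: block 0 = nodes 0..2 (complete), block 1 = nodes 3..6 (a path).
A = np.zeros((7, 7), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (4, 5), (5, 6), (2, 3)]:
    A[i, j] = A[j, i] = 1
z = np.array([0, 0, 0, 1, 1, 1, 1])
pi_hat, eta_hat = within_block_estimates(A, z, K=2, delta=0.05)
```

Without the truncation, the complete block would give an infinite log-odds value, which is the difficulty the truncation is meant to avoid.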
The magnitude of is important, as it reflects the accuracy of our estimates. This estimation error will crucially enter the performance of the graphical model based inverse covariance estimator. Under the latent variable block model, we have the following concentration result:
Note that the larger , the larger we need to choose both and . A large will cause problems, because the then might be too close to zero or one, causing challenges by definition of . A large makes our approximation less tight. Therefore we will have to control the size of (even if is known); see assumptions A1.6 and B1.5.
To better understand the bound in (3.2), suppose that the number of blocks, , grows with such that for some While is allowed to grow with , we assume that is bounded. If we further choose for some , then, there exists , such that as ,
The last term on the right-hand side of (3.2) can be controlled similarly, by choosing . With these choices, we obtain an approximation error of by choosing the minimum blocksize large enough.
3.2 Edge selection under uncertainty
In order to identify the nonzero elements in , we consider the graphical model in terms of the distribution of . Recall that , where each component of belongs to one of the blocks, thus
are not only the block labels, but also the node set in the underlying graph corresponding to the joint distribution of the. Using Gaussianity of , the set is the neighborhood of node of the associated graph. We follow the idea of Meinshausen and Bühlmann (2006)
to convert the problem into a series of linear regression problems: for each,
with the residual independent of . Let with , then the neighborhood can also be written as .
Meinshausen and Bühlmann (2006) consider the case of i.i.d. observations of . However, under the assumption of our model, we only have observations of . Under our assumptions, we have available independent realizations . Let be the -matrix with columns , . Similarly denote by the -matrix whose rows are independent copies of . Its columns are vectors of independent observations of . That is, we can also write and . With this notation, for all ,
Let . The new matrix model can be written as
Moreover, for each , let , and , . We can write the above model as
3.3 Edge selection under uncertainty using the Lasso
As in Meinshausen and Bühlmann (2006), we define our Lasso estimate of as
The corresponding neighborhood estimate is
and the full edge set can be estimated by
In order to formulate statistical guarantees for the behavior of these estimates, we need the following assumptions. On top of the assumptions from Meinshausen and Bühlmann (2006), which are assumptions A1.1 - A1.5, we need a further assumption on the underlying network.
Assumptions on the underlying Gaussian graph
High-dimensionality: There exists some so that for .
Nonsingularity: For all and , and there exists so that
There exists some so that for .
There exists some so that for all neighboring nodes and all ,
Magnitude of partial correlations: There exist a constant and some , so that for all ,
where is the partial correlation between and .
Neighborhood stability: There exists some so that for all with ,
Asymptotic upper bound on the mean: for .
Block size of networks: There exist constants and such that
The following theorem shows that, for proper choice of , our selection procedure finds the correct neighborhoods with high probability, provided is large enough.
Let assumptions A1 and A2 hold, and assume to be known. Let be such that
If, for some we have and , respectively (for two sequences , of real numbers, we write for for some ), then there exists a constant such that
Assumption A2 says that the rate of increase of the minimum block size, which behaves like depends on the neighborhood size in our graphical model, and on the magnitude of the partial correlations in the graphical model. Roughly speaking, large neighborhoods (large ), and small partial correlations (small ), both require a large minimum block size (large ), which appears reasonable. The choice of a proper penalty parameter also depends on these two parameters.
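The column-wise Lasso selection of this section can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: `Z` stands in for the matrix of centered (estimated) log-odds with one row per observed network, the Lasso objective is taken as (1/n)||y - Xb||^2 + lam ||b||_1 and solved by a plain coordinate descent, and the neighborhoods are combined by the AND or OR rule of Meinshausen and Bühlmann (2006).

```python
import numpy as np

def soft_threshold(x, t):
    return np.sign(x) * max(abs(x) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/n)||y - X b||^2 + lam * ||b||_1 by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()                # current residual y - X b
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]               # remove coordinate j from the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
            r -= X[:, j] * b[j]               # add the updated coordinate back
    return b

def neighborhood_selection(Z, lam, rule="and"):
    """Regress each column of Z on the others; combine the selected supports."""
    n, K = Z.shape
    S = np.zeros((K, K), dtype=bool)
    for a in range(K):
        others = np.delete(np.arange(K), a)
        S[a, others] = lasso_cd(Z[:, others], Z[:, a], lam) != 0.0
    E = (S & S.T) if rule == "and" else (S | S.T)
    return {(i, j) for i in range(K) for j in range(i + 1, K) if E[i, j]}

# Toy check on exactly Gaussian data from a chain graph 0 - 1 - 2.
Omega = np.array([[2.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 2.0]])
rng = np.random.default_rng(1)
Z = rng.multivariate_normal(np.zeros(3), np.linalg.inv(Omega), size=20000)
edges = neighborhood_selection(Z, lam=0.2, rule="and")
```

In the setting of this paper the columns of `Z` would be replaced by estimated quantities, which is precisely the extra source of error that the assumption on the minimum block size is meant to control.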
3.4 Edge selection with a class of Dantzig-type selectors under uncertainty
In this section, we propose a novel class of Dantzig-type selectors that are iterated over all . For a linear model as in (3.3), i.e. for fixed , Candes and Tao (2007) introduced the Dantzig selector as a solution to the convex problem
where is a tuning parameter, and for a matrix , Under our model, we define the Dantzig selector as a solution of the minimization problem
with . Moreover, when considering (3.6), the idea of the matrix uncertainty selector (MU-selector) comes to mind. In our setting, we define an MU-selector, a generalization of the Dantzig selector under matrix uncertainty, as a solution of the minimization problem
with tuning parameters and . Note that our MU-selector deals with matrix uncertainty directly, rather than replacing by in the optimization equations like the Lasso or the Dantzig selector. What we mean by this is that our MU-selector is based on the structural equation (3.6), while both Lasso-based estimator and Dantzig selector are based on the linear model (3.3) with the unknown ’s simply replaced by their estimators.
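For a single linear model, the classical Dantzig selector can be computed as a linear program. The sketch below does this with `scipy.optimize.linprog`; the 1/n scaling of the constraint and the function name are assumptions made for illustration, and the sketch covers neither the matrix-uncertainty term of the MU-selector nor the estimation error specific to our setting.

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_selector(X, y, lam):
    """Solve min ||b||_1 subject to ||X^T (y - X b)||_inf / n <= lam."""
    n, p = X.shape
    G = X.T @ X / n
    c = X.T @ y / n
    # Write b = u - v with u, v >= 0, so that ||b||_1 = sum(u) + sum(v) at the optimum.
    cost = np.ones(2 * p)
    # -lam <= c - G (u - v) <= lam, expressed as two blocks of inequalities.
    A_ub = np.vstack([np.hstack([G, -G]),
                      np.hstack([-G, G])])
    b_ub = np.concatenate([lam + c, lam - c])
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p] - res.x[p:]

# Orthogonal design: the solution is the soft-thresholded vector X^T y / n.
X = np.sqrt(3.0) * np.eye(3)
y = np.sqrt(3.0) * np.array([2.0, 0.05, -1.0])
b_hat = dantzig_selector(X, y, lam=0.1)
```

With an orthogonal design the constraint decouples by coordinate, which makes the result easy to verify by hand: each entry of `X.T @ y / n` is shrunk towards zero by `lam`.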
Now we consider a class of Dantzig-type selectors, which can be considered as generalizations of the Dantzig selector and the MU-selector. For each , let the Dantzig-type selector be a solution of the optimization problem
where for each , is a set of functions such that
For each and , is an increasing function.
For all , is lower bounded by some constant , i.e., for all , there exists some so that
, i.e., there exist and , so that, for all ,
The Dantzig-type selector always exists, because the LSE defined as and belongs to the feasible set , where
for any . It may not be unique, however. We will show that, similar to Candes and Tao (2007) and Rosenbaum et al. (2010), under certain conditions, for large , there exists a constant such that the -norm of the difference between the Dantzig-type selector and the population quantity can be bounded by for all with large probability, where can be a constant large enough or of order . However, in general, sparseness cannot be guaranteed. This already has been observed in Rosenbaum et al. (2010). Therefore, we consider a thresholded version of the Dantzig-type selector, which can also significantly improve the accuracy of the estimation of the sign. Let be defined as
where is the indicator function, and is a sequence that satisfies and . The corresponding neighborhood selector is, for all defined as and the corresponding full edge selector is
Similar to Section 3.3, in order to derive some consistency properties, we need assumptions about the underlying Gaussian graph (B1), and the minimum block size in the underlying network (B2).
Assumptions on the underlying Gaussian graph
Dimensionality: There exists such that as .
Nonsingularity: For all and , and there exists so that
There exists , so that , as .
Magnitude of partial correlations: There exist a constant and , so that, for all , .
Asymptotic upper bound on the mean: for .
Block size of networks: with some for .
Here, the assumption on (assumption B2) is weaker than that assumed for the Lasso-based estimator (assumption A2). Similar remarks as given for A2 also apply to B2 (see the remark right below Theorem 3.1).
Assumptions A1 and B1 are similar but not equivalent: A1.1 and B1.1, A1.2 and B1.2, A1.4 and B1.4 respectively, are exactly the same. B1.2(a) is stronger than A1.3.(a), indicating the underlying graph should be even sparser than the graph in Section 3.3; assumption B1 does not have an analog to A1.3.(b) and A1.5.
Let assumptions B1 and B2 hold, and assume is known. Let be such that . If with some , and , there exists , so that
The choice of proper depends on the three parameters and . However, even the best scenario does not allow for the order which often can be found in the literature. This stems from the fact that we have to deal with an additional estimation error (coming in through the estimation of ).
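The thresholding step of this section can be sketched as follows. The matrix `B` (row a holds the coefficient estimates from the regression for node a), the threshold `tau`, and the AND/OR combination of neighborhoods are assumptions made for illustration, since the corresponding display formulas are not reproduced here.

```python
import numpy as np

def threshold(beta, tau):
    """Keep entries with |beta_j| > tau and set the rest to exactly zero."""
    beta = np.asarray(beta, dtype=float)
    return np.where(np.abs(beta) > tau, beta, 0.0)

def edges_from_coefficients(B, tau, rule="and"):
    """Threshold the row-wise estimates, then combine the neighborhoods."""
    B = np.asarray(B, dtype=float)
    S = np.abs(B) > tau
    np.fill_diagonal(S, False)                # a node is not its own neighbor
    E = (S & S.T) if rule == "and" else (S | S.T)
    K = B.shape[0]
    return {(i, j) for i in range(K) for j in range(i + 1, K) if E[i, j]}

# Hypothetical coefficient estimates for K = 3 blocks.
B = np.array([[0.00, 0.40, 0.03],
              [0.35, 0.00, -0.30],
              [0.00, -0.02, 0.00]])
and_edges = edges_from_coefficients(B, tau=0.05, rule="and")
or_edges = edges_from_coefficients(B, tau=0.05, rule="or")
```

Small nonzero entries such as 0.03 are removed by the threshold, which is how sparsity is restored when the selector itself does not produce exact zeros.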
Here we consider the case of an unknown coefficient vector or unknown mean . Recall that are i.i.d. Given , a natural way to estimate is via the MLE . Recall, however, that we only have estimates , available. Using the estimates , we estimate the underlying mean by . Moreover, we can estimate via , where is the Moore-Penrose pseudoinverse of (when , ). In order to derive consistency properties for , assumptions on the design matrix are needed. Theorem 3.3 below states asymptotic properties of the estimators.
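The two plug-in estimators described above can be sketched in a few lines. The names `eta_hat` (one row of estimated log-odds per observed network) and `X` (the design matrix) are assumptions made for illustration.

```python
import numpy as np

def estimate_mean_and_beta(eta_hat, X):
    """Average the estimates over networks, then map back via the pseudoinverse."""
    mu_hat = eta_hat.mean(axis=0)             # estimated mean vector, length K
    beta_hat = np.linalg.pinv(X) @ mu_hat     # Moore-Penrose pseudoinverse of X
    return mu_hat, beta_hat

# Toy example in which the estimates scatter symmetrically around X @ beta.
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
beta = np.array([1.0, 2.0])
mu = X @ beta
eta_hat = np.array([mu + 0.1, mu - 0.1])
mu_hat, beta_hat = estimate_mean_and_beta(eta_hat, X)
```

When the design matrix has full column rank, the pseudoinverse solution coincides with the least squares fit of the estimated mean on the co-variables.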
Let assumptions A1.1 (or B1.1) and A1.6 (or B1.5) hold. If for some , then, for any , and fixed , there exists some so that
If, moreover, the design matrix is of full rank and the singular value of is asymptotically upper bounded, that is, and , then there exists so that
Next we consider the estimation of the edge set based on . We write and consider as the observations. We estimate the edge set in the same way as described in Section 3.3, but replace by and replace by in (3.7), where and is as above. The following consistency result parallels Theorem 3.1 and Theorem 3.2, but stronger assumptions are needed to control the additional estimation error.
Let assumptions A1 - A2 hold with , and let be such that
Suppose that , for and that the penalty parameter satisfies for some . Then, there exists so that
Let assumptions B1 - B2 hold with . Let be such that . If for some , and