# Neighborhood selection with application to social networks

The topic of this paper is modeling and analyzing dependence in stochastic social networks. Using a latent variable block model allows the analysis of dependence between blocks via the analysis of a latent graphical model. Our approach to analyzing the graphical model is then based on the idea underlying the neighborhood selection scheme put forward by Meinshausen and Bühlmann (2006). However, because of the latent nature of our model, estimates have to be used in lieu of the unobserved variables. This leads to a novel analysis of graphical models under uncertainty, in the spirit of Rosenbaum et al. (2010) or Belloni et al. (2017). Lasso-based selectors and a class of Dantzig-type selectors are studied.


## 1 Introduction

The study of random networks has been a topic of great interest in recent years; see, e.g., Kolaczyk (2009) and Newman (2010). A network is defined as a structure composed of nodes and edges connecting nodes in various relationships. The observed network can be represented by an adjacency matrix Y, where N is the total number of nodes within the network. For a binary relation network, as considered here, Yij = 1 if there is an edge from node i to node j and Yij = 0 otherwise. In the following we identify an adjacency matrix with the network itself.
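The adjacency-matrix representation can be sketched as follows (a minimal illustration; the node count and edge list are made up, not taken from the paper):

```python
import numpy as np

# Toy undirected binary network with N = 5 nodes.
N = 5
edges = [(0, 1), (0, 2), (1, 2), (3, 4)]  # hypothetical edge list

# Build the adjacency matrix Y: Y[i, j] = 1 iff nodes i and j are connected.
Y = np.zeros((N, N), dtype=int)
for i, j in edges:
    Y[i, j] = Y[j, i] = 1  # undirected network: Y is symmetric

print((Y == Y.T).all())  # True: symmetry
print(Y.sum() // 2)      # 4: each edge counted once
```

Symmetry of Y encodes that the relation is undirected, and a zero diagonal encodes the absence of self-loops.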

Most relational phenomena are dependent phenomena, and dependence is often of substantive interest. Frank and Strauss (1986) and Wasserman and Pattison (1996) introduced exponential random graph models, which allow the modelling of a wide range of dependencies of substantive interest, including transitive closure. For such models, the distribution of Y is assumed to follow an exponential family form with sufficient statistics such as the total number of edges. However, as mentioned in Schweinberger and Handcock (2014), exponential random graph models lack neighborhood structure, and that makes modelling dependencies challenging for such networks. Neighborhoods (or communities, blocks) are in general defined as groups of individuals (nodes) such that individuals within a group interact with each other more frequently than with those outside the group. Recently, Schweinberger and Handcock (2014) proposed the concept of local dependence in stochastic networks. This concept allows for dependence within neighborhoods, while different neighborhoods are independent.

In contrast to that, our work considers dependence between blocks, while the connections within blocks are assumed independent. We also assume the blocks to be known. We then propose to analyze dependencies between blocks by means of graphical models. To this end, we assume an undirected network so that

 Yij|(P,z)∼Bernoulli(pz[i],z[j]), (1.1)

where z[i] ∈ {1, …, K} indicates the block membership of node i in one of K blocks; the pk,l govern the intensities of the connectivities within and between blocks, 1 ≤ k, l ≤ K; and P = (pk,l) is a symmetric K × K matrix. We then put a Gaussian logistic model on the pk,k. More precisely, for the diagonal elements pk,k, assume that

 log( pk,k / (1 − pk,k) ) = xTk β + ϵk,  1 ≤ k ≤ K, (1.2)

where xk is a vector of given covariates corresponding to block k, and β is the parameter vector. Furthermore, ϵ = (ϵ1, …, ϵK)T with

 ϵ∼N(0,Σ), (1.3)

where Σ is a nonsingular K × K covariance matrix. Each off-diagonal element pk,l (k ≠ l) is assumed to be independent of all the other elements of P. The latter assumption is made to simplify the exposition. A similar model can be found in Xu and Hero (2014).
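A simulation sketch may clarify the generative mechanism of the model (1.1) - (1.3). All sizes, covariates, and the covariance below are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

K, block_size = 3, 30                    # assumption: K blocks of equal size
z = np.repeat(np.arange(K), block_size)  # known block membership z[i]
N = K * block_size

# (1.2)-(1.3): Gaussian logistic model for the within-block probabilities.
X = rng.normal(size=(K, 2))              # covariates x_k (assumed 2-dimensional)
beta = np.array([0.5, -0.3])             # assumed parameter vector
Sigma = 0.25 * np.eye(K) + 0.125         # some nonsingular covariance (assumed)
eps = rng.multivariate_normal(np.zeros(K), Sigma)
eta = X @ beta + eps                     # eta_k: log odds of p_{k,k}

# Symmetric connectivity matrix P: independent off-diagonal entries,
# inverse logit of eta on the diagonal.
P = rng.uniform(0.05, 0.2, size=(K, K))
P = (P + P.T) / 2
P[np.diag_indices(K)] = 1.0 / (1.0 + np.exp(-eta))

# (1.1): draw each undirected edge independently given (P, z).
Y = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(i + 1, N):
        Y[i, j] = Y[j, i] = rng.binomial(1, P[z[i], z[j]])
```

The dependence between blocks enters only through the Gaussian noise ϵ; given P and z, all edges are drawn independently.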

The dependence between the ϵk induces dependence between blocks. We can thus analyze this induced dependence in our network model by using methods from Gaussian graphical models, via selecting the zeros in the precision matrix D = Σ−1. Adding dependencies between the pk,l with k ≠ l would increase the dimension of Σ, and induce ‘second order dependencies’ in the network structure, namely, dependencies of block connections between different pairs of blocks.

It is crucial to observe that this Gaussian graphical model is defined in terms of the pk,k (or, more precisely, in terms of their log-odds ratios), and that these quantities obviously are not observed. Thus, they need to be estimated from our network data, and, to this end, we here assume the availability of n iid observations of the network. This estimation, in turn, introduces additional randomness into our analysis of the graphical model. We are therefore facing similar challenges as in the analysis of Gaussian graphical models under uncertainty. However, our situation is more complex, as will become clear below.

The methods for neighborhood selection considered here are based on the column-wise methodology of Meinshausen and Bühlmann (2006). We apply this methodology (under uncertainty) to some known selection methods from the literature, thereby adjusting these methods for the additional uncertainty. The selection methods considered here are (i) the Lasso-based neighborhood selection of Meinshausen and Bühlmann (2006), (ii) a class of Dantzig-type selectors that includes the Dantzig selector of Candes and Tao (2007), and (iii) the matrix uncertainty selector of Rosenbaum et al. (2010). This will lead to ‘graphical’ versions of the respective procedures. The graphical Dantzig selector has already been studied in Yuan (2010), but without the additional uncertainty we are facing here. This leads to novel selection methodologies for which we derive statistical guarantees. We also present numerical studies to illustrate their finite sample performance.

More details on our latent variable block model are discussed in Section 2, where we also introduce some basic notation. Section 3 introduces our neighborhood selection methodologies, and presents results on their large sample performance. Tuning parameter selection is also discussed there. Numerical studies are presented in Section 4, and the proofs of our main results are in Appendix 5.

## 2 Some important preliminary facts

Let η = (η1, …, ηK)T with ηk = log( pk,k / (1 − pk,k) )

be the vector of log odds of the within-block connection probabilities, and let X = (x1, …, xK)T

be the design matrix. Our latent variable block model (1.1) - (1.3) says that η ∼ N(Xβ, Σ). The dependence among the ϵk encoded in Σ is propagated to the ηk. Let D = Σ−1 = (dkl); then the following fact holds.

###### Fact 2.1.

Under (1.1) - (1.3), we have dkl = 0 if and only if ηk is independent of ηl given the other variables (ηj)j∉{k,l}, or just given the corresponding (ϵj)j∉{k,l}.

In other words, if

 E = {(k,l): dkl ≠ 0, k ≠ l}

denotes the edge set of the graph corresponding to D, then, under our latent variable block model, (k,l) ∉ E if and only if ηk is conditionally independent of ηl given the other variables (ηj)j∉{k,l}. Identifying nonzero elements in D thus will reveal the conditional dependence structure of the blocks in our underlying network.

We will use the relative number of edges within each block as estimates for the unobserved values pk,k. Let Sk denote the total number of edges within block k, 1 ≤ k ≤ K.

###### Fact 2.2.

Under (1.1) - (1.3), we have

 sign(σkl)=sign(Cov(Sk,Sl)).

For proofs of the two facts see Oliveira (2012) (page 13, Theorem 1.35) and Liu et al. (2009) (Section 3, Lemma 2), respectively.

## 3 Neighborhood selection

Here we discuss the identification of the nonzero elements in D. We first assume that (1.1) - (1.3) holds with a known β, and we write μ = Xβ. We also assume that 0 < pk,k < 1 for all k. Let Y(1), …, Y(n) denote n iid observed networks with corresponding independent unobserved random vectors η(1), …, η(n) following our model. Let N1, …, NK denote the blocks of the networks and V be the node set. Assume Nk and Nl are mutually exclusive for k ≠ l so that V = ∪k Nk. The number of possible edges within each block is mk = |Nk|(|Nk| − 1)/2 for 1 ≤ k ≤ K, and the number of possible edges between block k and block l is then |Nk||Nl| for k ≠ l. We would like to point out again that the block membership variable z is assumed to be known.

### 3.1 Controlling the estimation error

Given a network Y(t), let S(t)k denote the number of edges within block k in network Y(t). Natural estimates of p(t)k,k and η(t)k are

 ˜p(t)k,k = S(t)k / mk  and  ˜η(t)k = log( ˜p(t)k,k / (1 − ˜p(t)k,k) ) (3.1)

respectively.
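In code, the estimates in (3.1) amount to counting within-block edges for one observed network; a sketch (the helper name is ours):

```python
import numpy as np

def block_estimates(Y, z, K):
    """Compute ptilde_k = S_k / m_k and its log odds, as in (3.1),
    for one observed network Y with known block labels z."""
    p_tilde = np.empty(K)
    for k in range(K):
        idx = np.flatnonzero(z == k)
        m_k = len(idx) * (len(idx) - 1) // 2   # possible within-block edges
        S_k = Y[np.ix_(idx, idx)].sum() // 2   # observed within-block edges
        p_tilde[k] = S_k / m_k
    eta_tilde = np.log(p_tilde / (1.0 - p_tilde))  # log odds
    return p_tilde, eta_tilde
```

Note that the log odds are undefined when a block has no (or all possible) edges, which is one motivation for the truncation step below.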

Let mmin = min1≤k≤K mk be the minimum number of possible edges within a block, which of course measures the minimum blocksize.

###### Fact 3.1.

Assume that K is fixed. Then, under (1.1) - (1.3), we have, for each t,

 ˜η(t) → N(Xβ, Σ) in distribution as mmin → ∞.

This result tells us that, if we base our edge selection on the ˜η(t), then, for mmin large, we are close to a Gaussian model, and thus we can hope that our analysis is similar to that of a Gaussian graphical model. However, the approximation error has to be examined carefully. In order to do that, we first truncate the ˜η(t)k’s, or, equivalently, the ˜p(t)k,k. For T > 0 let

 ˆη(t)k = −T       if ˜η(t)k < −T,
 ˆη(t)k = ˜η(t)k   if |˜η(t)k| ≤ T,
 ˆη(t)k = T        if ˜η(t)k > T.

This truncation corresponds to

 ˆp(t)k,k = (1 + eT)−1    if ˜p(t)k,k < (1 + eT)−1,
 ˆp(t)k,k = ˜p(t)k,k      if (1 + eT)−1 ≤ ˜p(t)k,k ≤ (1 + e−T)−1,
 ˆp(t)k,k = (1 + e−T)−1   if ˜p(t)k,k > (1 + e−T)−1.

In what follows, we work with these truncated versions. Note that the dependence on T is not indicated explicitly in this notation.
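The truncation is a plain clipping operation; a sketch (function names are ours):

```python
import numpy as np

def truncate_eta(eta_tilde, T):
    """Clip estimated log odds to [-T, T], as in the piecewise definition."""
    return np.clip(eta_tilde, -T, T)

def truncate_p(p_tilde, T):
    """The same truncation expressed on the probability scale."""
    lo = 1.0 / (1.0 + np.exp(T))    # (1 + e^T)^(-1)
    hi = 1.0 / (1.0 + np.exp(-T))   # (1 + e^(-T))^(-1)
    return np.clip(p_tilde, lo, hi)
```

The two versions agree: clipping on the log-odds scale and then applying the inverse logit gives the same result as clipping the probabilities directly.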

The magnitude of the estimation error ˆη(t)k − η(t)k is important, as it reflects the accuracy of our estimates. This estimation error will crucially enter the performance of the graphical model based inverse covariance estimator. Under the latent variable block model, we have the following concentration result:

###### Lemma 3.1.

Let and . Then, under (1.1) - (1.3), we have, for , and , that

 P( max1≤k≤K, 1≤t≤n |ˆη(t)k − η(t)k| < 8MeL √(log(nK)/mmin) )
 ≥ 1 − ( √(2π) nKσ / (min{L,T} − μB) ) exp( −(min{L,T} − μB)2 / (2σ2) ) − (1/(nK))2M2−1. (3.2)
###### Remark.

Note that the larger μB, the larger we need to choose both L and T. A large T will cause problems, because the ˆp(t)k,k then might be too close to zero or one, causing challenges by definition of ˆη(t)k. A large L makes our approximation less tight. Therefore we will have to control the size of μB (even if β is known); see assumptions A1.6 and B1.5.

To better understand the bound in (3.2), suppose that the number of blocks, K, grows with n such that K = O(np) for some p > 0. While K is allowed to grow with n, we assume that σ is bounded. If we further choose min{L,T} − μB ≥ c′ log n for some c′ > 0, then there exists c > 0 such that, as n → ∞, the second term on the right-hand side of (3.2) is of order exp(−c(log n)2).

The last term on the right-hand side of (3.2) can be controlled similarly, by choosing M large enough. With these choices, we obtain a vanishing approximation error by choosing the minimum blocksize large enough:

 m−1min = O( n−2p (log n)−2 e−2L ).

### 3.2 Edge selection under uncertainty

In order to identify the nonzero elements in D, we consider the graphical model in terms of the distribution of η. Recall that η = (η1, …, ηK)T, where each component of η belongs to one of the K blocks; thus \widebarK = {1, …, K} are not only the block labels, but also the node set in the underlying graph corresponding to the joint distribution of the ηk. Using Gaussianity of η, the set nea = {b ∈ \widebarK ∖ {a}: dab ≠ 0} is the neighborhood of node a of the associated graph. We follow the idea of Meinshausen and Bühlmann (2006) to convert the problem into a series of linear regression problems: for each a ∈ \widebarK,

 ηa − μa = ∑b∈\widebarK∖{a} θab (ηb − μb) + va

with the residual va independent of (ηb)b≠a. Let θa = (θab)b∈\widebarK with θaa = 0; then the neighborhood can also be written as nea = {b ∈ \widebarK: θab ≠ 0}.

Meinshausen and Bühlmann (2006) consider the case of i.i.d. observations of η. However, under the assumption of our model, we only have observations of the networks. Under our assumptions, we have available n independent realizations ˆη(1), …, ˆη(n). Let ˆH be the (n × K)-matrix with columns ˆηb = (ˆη(1)b, …, ˆη(n)b)T, b ∈ \widebarK. Similarly denote by H the (n × K)-matrix whose rows are independent copies of ηT. Its columns ηb are vectors of n independent observations of ηb. That is, we can also write H = (η1, …, ηK) and ˆH = (ˆη1, …, ˆηK). With this notation, for all a ∈ \widebarK,

 ηa−μa1n=∑b∈\widebarKθab(ηb−μb1n)+va. (3.3)

Let R = ˆH − H. The new matrix model can be written as

 (ˆH − 1nμT) = (H − 1nμT) + R (3.4)
 (η(t) − μ) ∼ N(0, Σ) i.i.d. for t = 1, …, n. (3.5)

Moreover, for each a ∈ \widebarK, let ˆH−a, H−a and μ−a denote ˆH, H and μ with the a-th column (respectively entry) removed, and let θa,−a = (θab)b≠a. We can write the above model as

 ˆηa−μa1n=(H−a−1nμT−a)θa−a+ξa (3.6) (ˆH−a−1nμT−a)=(H−a−1nμT−a)+R−a,

where ξa = va + ra and ra = ˆηa − ηa is the a-th column of R. Note that (3.6) has a similar structure to the model considered by Rosenbaum et al. (2010). The important difference is that in our situation, we do not have independence of the matrix error R−a and the noise ξa.

### 3.3 Edge selection under uncertainty using the Lasso

As in Meinshausen and Bühlmann (2006), we define our Lasso estimate of θa as

 ˆθa,λ,lasso=argminθ:θa=0(n−1∥(ˆηa−μa1n)−(ˆH−1nμT)θ∥22+λ∥θ∥1). (3.7)

The corresponding neighborhood estimate is

 ˆneλ,lassoa={b∈\widebarK:ˆθa,λ,lassob≠0};

and the full edge set can be estimated by

 ˆEλ,∧,lasso={(a,b):a∈ˆneλ,lassob and b∈ˆneλ,lassoa}

or

 ˆEλ,∨,lasso={(a,b):a∈ˆneλ,lassob or b∈ˆneλ,lassoa}.
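The whole Lasso step can be sketched with scikit-learn (function and variable names are ours; note that sklearn's `Lasso` minimizes (1/2n)‖y − Xθ‖² + α‖θ‖₁, a constant rescaling of the criterion in (3.7)):

```python
import numpy as np
from sklearn.linear_model import Lasso

def lasso_edge_selection(H_hat, mu, lam):
    """Column-wise Lasso neighborhood selection on estimated log odds.

    H_hat : (n, K) matrix whose rows are the estimated vectors eta-hat^(t).
    mu    : (K,) known mean vector (mu = X beta).
    Returns the AND- and OR-combined estimated edge sets."""
    n, K = H_hat.shape
    Z = H_hat - mu                       # center each column
    nbrs = []
    for a in range(K):
        others = [b for b in range(K) if b != a]
        # regress column a on all other columns; nonzero coefficients
        # define the estimated neighborhood of node a
        fit = Lasso(alpha=lam, fit_intercept=False).fit(Z[:, others], Z[:, a])
        nbrs.append({others[j] for j in np.flatnonzero(fit.coef_)})
    E_and = {(a, b) for a in range(K) for b in nbrs[a] if a < b and a in nbrs[b]}
    E_or = {(min(a, b), max(a, b)) for a in range(K) for b in nbrs[a]}
    return E_and, E_or
```

The AND rule keeps an edge only if both endpoint regressions select it, while the OR rule keeps it if either does.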

In order to formulate statistical guarantees for the behavior of these estimates, we need the following assumptions. On top of the assumptions from Meinshausen and Bühlmann (2006), which are assumptions A1.1 - A1.5, we need further assumptions on the underlying network.

• Assumptions on the underlying Gaussian graph

1. High-dimensionality: There exists some γ > 0 so that K(n) = O(nγ) for n → ∞.

2. Nonsingularity: For all a ∈ \widebarK and n ∈ N, Var(ηa) < ∞, and there exists υ > 0 so that

 Var(ηa|η\widebarK∖{a})≥υ2.
3. Sparsity

1. There exists some 0 ≤ κ < 1 so that maxa∈\widebarK |nea| = O(nκ) for n → ∞.

2. There exists some ϑ < ∞ so that for all neighboring nodes a, b ∈ \widebarK and all n,

 ∥θa,neb∖{a}∥1≤ϑ.
4. Magnitude of partial correlations: There exist a constant c > 0 and some ξ ∈ (0, 1], so that for all (a,b) ∈ E,

 |πab| ≥ c n−(1−ξ)/2,

where πab is the partial correlation between ηa and ηb.

5. Neighborhood stability: There exists some ϱ < 1 so that for all a, b ∈ \widebarK with b ∉ nea,

 |Sa(b)| < ϱ,

where

 Sa(b) = ∑k∈nea sign(θa,neak) θb,neak.
6. Asymptotic upper bound on the mean: μB = max1≤k≤K |μk| = O(log n) for n → ∞.

• Block size of networks: There exist constants c > 0 and n0 such that

 mmin(n) ≥ c · nν for n ≥ n0,

where ν > 0 depends on the parameters κ and ξ above.

The following theorem shows that, for a proper choice of the penalty parameter λ, our selection procedure finds the correct neighborhoods with high probability, provided n is large enough.

###### Theorem 3.1.

Let assumptions A1 and A2 hold, and assume β to be known. Let ϵ be such that 0 < ϵ < ξ. If the penalty parameter satisfies λ ∼ n−(1−ϵ)/2 (for two sequences (an), (bn) of real numbers, we write an ∼ bn if an ≤ C bn and bn ≤ C an for some constant C > 0), then there exists a constant c > 0 such that

 P(ˆEλ,lasso=E)=1−O(exp(−c(logn)2))as n→∞.
###### Remark.

Assumption A2 says that the rate of increase of the minimum block size, which behaves like nν, depends on the neighborhood size in our graphical model, and on the magnitude of the partial correlations in the graphical model. Roughly speaking, large neighborhoods (large κ) and small partial correlations (small ξ) both require a large minimum block size (large ν), which appears reasonable. The choice of a proper penalty parameter λ also depends on these two parameters.

### 3.4 Edge selection with a class of Dantzig-type selectors under uncertainty

In this section, we propose a novel class of Dantzig-type selectors that are iterated over all a ∈ \widebarK. For a linear model as in (3.3), i.e. for fixed a, Candes and Tao (2007) introduced the Dantzig selector as a solution to the convex problem

 min{∥θ∥1:θ∈RK(n),θa=0 and ∣∣1n(H−a−1nμT−a)T((ηa−μa1n)−(H−1nμT)θ)∣∣∞≤λ},

where λ > 0 is a tuning parameter and, for a matrix A, |A|∞ denotes the maximum of the absolute values of its entries. Under our model, we define the Dantzig selector as a solution of the minimization problem

 min{∥θ∥1:θ∈RK(n),θa=0 and ∣∣1n(ˆH−a−1nμT−a)T((ˆηa−μa1n)−(ˆH−1nμT)θ)∣∣∞≤λ} (3.8)

with λ > 0. Moreover, when considering (3.6), the idea of the matrix uncertainty selector (MU-selector) suggests itself. In our setting, we define an MU-selector, a generalization of the Dantzig selector under matrix uncertainty, as a solution of the minimization problem

 min{∥θ∥1:θ∈RK(n),θa=0 and∣∣1n(ˆH−a−1nμT−a)T((ˆηa−μa1n)−(ˆH−1nμT)θ)∣∣∞≤μ∥θ∥1+λ} (3.9)

with tuning parameters μ ≥ 0 and λ > 0. Note that our MU-selector deals with the matrix uncertainty directly, rather than replacing the unobserved quantities by their estimates in the optimization equations like the Lasso or the Dantzig selector. What we mean by this is that our MU-selector is based on the structural equation (3.6), while both the Lasso-based estimator and the Dantzig selector are based on the linear model (3.3), with the unknown η’s simply replaced by their estimates.
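Both (3.8) and (3.9) are linear programs after the standard split θ = u − v with u, v ≥ 0 (so that ‖θ‖₁ = Σ(u + v) at the optimum). A sketch with SciPy (names are ours; A stands for the centered design with the a-th column omitted, y for the centered response):

```python
import numpy as np
from scipy.optimize import linprog

def dantzig_type(A, y, lam, mu=0.0):
    """Solve min ||theta||_1  s.t.  |(1/n) A^T (y - A theta)|_inf <= mu*||theta||_1 + lam.
    mu = 0 gives the Dantzig selector (3.8), mu > 0 the MU-selector (3.9)."""
    n, p = A.shape
    G = A.T @ A / n
    g = A.T @ y / n
    c = np.ones(2 * p)  # objective: sum(u) + sum(v) = ||theta||_1
    # |g - G(u - v)|_inf <= mu * sum(u + v) + lam, written as two blocks of
    # linear inequalities (mu is subtracted from every entry by broadcasting):
    A_ub = np.vstack([np.hstack([-G - mu, G - mu]),
                      np.hstack([G - mu, -G - mu])])
    b_ub = np.concatenate([lam - g, lam + g])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None), method="highs")
    u, v = res.x[:p], res.x[p:]
    return u - v
```

With an orthonormal design (G equal to the identity) and mu = 0, the solution shrinks each entry of g toward zero by lam, which gives a quick sanity check.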

Now we consider a class of Dantzig-type selectors which can be considered as generalizations of the Dantzig selector and the MU-selector. For each a ∈ \widebarK, let the Dantzig-type selector be a solution of the optimization problem

 min{∥θ∥1:θ∈RK(n):θa=0 and ∣∣1n(ˆH−a−1nμT−a)T((ˆηa−μa1n)−(ˆH−1nμT)θ)∣∣∞≤λa,n(∥θ∥1)}, (3.10)

where {λa,n : a ∈ \widebarK, n ∈ N} is a set of functions such that

• For each a ∈ \widebarK and n, λa,n(·) is an increasing function.

• For all n, the λa,n are lower bounded by some constant, i.e., for all n, there exists some λn > 0 so that

 mina∈\widebarKminθ∈RK:θa=0λa,n(∥θ∥1)≥λn.
• λa,n(∥θa∥1) = o( n−(1−ξ)/2 (log n)−1 ), i.e., there exist un → 0 and n0, so that, for all n ≥ n0,

 λa,n(∥θa∥1) ≤ un n−(1−ξ)/2 (log n)−1, for all a ∈ \widebarK.

The Dantzig-type selector always exists, because the least squares estimator (LSE) belongs to the feasible set Θa, where

 Θa={θ∈RK(n):θa=0 and ∣∣1n(ˆH−a−1nμT−a)T((ˆηa−μa1n)−(ˆH−1nμT)θ)∣∣∞≤λa,n(∥θ∥1)}

for any a ∈ \widebarK. It may not be unique, however. We will show that, similar to Candes and Tao (2007) and Rosenbaum et al. (2010), under certain conditions and for large n, the l1-norm of the difference between the Dantzig-type selector and the population quantity θa can be bounded for all a with large probability. However, in general, sparseness cannot be guaranteed. This has already been observed in Rosenbaum et al. (2010). Therefore, we consider a thresholded version of the Dantzig-type selector, which can also significantly improve the accuracy of the sign estimation. Let the thresholded selector be defined as

 (3.11)

where 1{·} is the indicator function, and the threshold sequence satisfies suitable rate conditions. The corresponding neighborhood selector is, for all a ∈ \widebarK, defined as ˆneλ,dsa = {b ∈ \widebarK: ˆθλ,dsa,b ≠ 0}, and the corresponding full edge selector is

 ˆEλ,∧,ds={(a,b):a∈ˆneλ,dsb and b∈ˆneλ,dsa}

or

 ˆEλ,∨,ds={(a,b):a∈ˆneλ,dsb or b∈ˆneλ,dsa}.
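The thresholding step and the resulting neighborhoods can be sketched as follows (function names and the threshold argument tau are ours):

```python
import numpy as np

def hard_threshold(theta_hat, tau):
    """Zero out entries of a Dantzig-type solution with |theta_j| <= tau,
    in the spirit of the thresholded selector (3.11)."""
    theta = np.asarray(theta_hat, dtype=float)
    return np.where(np.abs(theta) > tau, theta, 0.0)

def neighborhood_from(theta_full, a, tau):
    """Estimated neighborhood of node a: indices surviving the threshold."""
    theta = hard_threshold(theta_full, tau)
    return {b for b in range(len(theta)) if b != a and theta[b] != 0.0}
```

The AND/OR combination of these neighborhoods into an edge set then works exactly as for the Lasso-based selector.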

Similar to Section 3.3, in order to derive consistency properties, we need assumptions on the underlying Gaussian graph (B1) and on the minimum block size in the underlying network (B2).

• Assumptions on the underlying Gaussian graph

1. Dimensionality: There exists γ > 0 such that K(n) = O(nγ) as n → ∞.

2. Nonsingularity: For all a ∈ \widebarK and n ∈ N, Var(ηa) < ∞, and there exists υ > 0 so that

 Var(ηa|η\widebarK∖{a})≥υ2.
3. Sparsity

1. There exists , so that , as .

2. as .

4. Magnitude of partial correlations: There exist a constant c > 0 and ξ ∈ (0, 1], so that, for all (a,b) ∈ E, |πab| ≥ c n−(1−ξ)/2.

5. Asymptotic upper bound on the mean: μB = max1≤k≤K |μk| = O(log n) for n → ∞.

• Block size of networks: mmin(n) ≥ c · nν with some constants c, ν > 0 for n large enough.

Here, the assumption on mmin (assumption B2) is weaker than that assumed for the Lasso-based estimator (assumption A2). Similar remarks as given for A2 also apply to B2 (see the Remark right below Theorem 3.1).

Assumptions A1 and B1 are similar but not equivalent: A1.1 and B1.1, A1.2 and B1.2, and A1.4 and B1.4, respectively, are exactly the same. B1.3(a) is stronger than A1.3(a), indicating that the underlying graph should be even sparser than the graph in Section 3.3; assumption B1 does not have analogs of A1.3(b) and A1.5.

###### Theorem 3.2.

Let assumptions B1 and B2 hold, and assume β is known. Let ϵ be such that 0 < ϵ < ξ. If the penalty parameter satisfies λ ∼ n−(1−ϵ)/2, then there exists c > 0, so that

 P(ˆEλ,ds=E)=1−O(exp(−c(logn)2))as n→∞.
###### Remark.

The choice of a proper λ depends on the three parameters γ, ξ and ν. However, even the best scenario does not allow for the order λ ∼ √(log(K)/n), which often can be found in the literature. This stems from the fact that we have to deal with an additional estimation error (coming in through the estimation of the η(t)).

### 3.5 Extension

Here we consider the case of an unknown coefficient vector β or unknown mean μ = Xβ. Recall that the η(t) are i.i.d. N(Xβ, Σ). Given the η(t), a natural way to estimate μ is via the MLE \widebarη = n−1 ∑t η(t). Recall, however, that we only have the estimates ˆη(t) available. Using these, we estimate the underlying mean by \widebarˆη = n−1 ∑t ˆη(t). Moreover, we can estimate β via ˆβ = X+ \widebarˆη, where X+ is the Moore-Penrose pseudoinverse of X (when X has full column rank, X+ = (XTX)−1XT). In order to derive consistency properties for ˆβ, assumptions on the design matrix X are needed. Theorem 3.3 below states asymptotic properties of the estimators.
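A sketch of these plug-in estimators (function name ours; X is the K × q design matrix of block covariates):

```python
import numpy as np

def estimate_mu_beta(H_hat, X):
    """Estimate mu by averaging the eta-hat vectors and beta via the
    Moore-Penrose pseudoinverse of the design matrix."""
    mu_hat = H_hat.mean(axis=0)            # bar-eta-hat = n^-1 sum_t eta-hat^(t)
    beta_hat = np.linalg.pinv(X) @ mu_hat  # equals (X^T X)^-1 X^T mu_hat
                                           # when X has full column rank
    return mu_hat, beta_hat
```

When the averaged estimate lies in the column space of X, this recovers β exactly; otherwise it returns the least squares fit.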

###### Theorem 3.3.

Let assumptions A1.1 (or B1.1) and A1.6 (or B1.5) hold. If the minimum block size grows at a polynomial rate, then, for any δ > 0 and fixed b, there exists some c > 0 so that

 P( n(b−γ)/2 ∥\widebarˆη − μ∥2 > δ ) = O(exp(−c(log n)2)) as n → ∞.

If, moreover, the design matrix X is of full rank and the largest singular value of its pseudoinverse is asymptotically upper bounded, then there exists c > 0 so that

 P( n(b−γ)/2 ∥ˆβ − β∥2 > δ ) = O(exp(−c(log n)2)) as n → ∞.

Next we consider the estimation of the edge set based on ˆβ. We write ˆμ = Xˆβ and consider the ˆη(t) as the observations. We estimate the edge set in the same way as described in Section 3.3, but replace μa by ˆμa and μ by ˆμ in (3.7), where ˆβ is as above. The following consistency result parallels Theorem 3.1 and Theorem 3.2, but stronger assumptions are needed to control the additional estimation error.

###### Corollary 3.1.

Let assumptions A1 - A2 hold, and let ϵ be such that

 max{κ+1/2,3−ξ3,4−ξ−ν3,2+2κ−ν2}<ϵ<ξ.

Suppose that the penalty parameter satisfies λ ∼ d n−(1−ϵ)/2 for some d > 0. Then, there exists c > 0 so that

 P(ˆEλ,lasso=E)=1−O(exp(−c(logn)2))as n→∞.
###### Corollary 3.2.

Let assumptions B1 - B2 hold with . Let be such that . If for some , and