# A network Poisson model for weighted directed networks with covariates

The edges in networks are not only binary, either present or absent, but can also take weighted values in many scenarios (e.g., the number of emails between two users). The covariate-p_0 model has been proposed to model binary directed networks with degree heterogeneity and covariates. However, it may cause information loss when applied to weighted networks. In this paper, we propose to use the Poisson distribution to model weighted directed networks, which accommodates the sparsity of networks, the degree heterogeneity and the homophily caused by node covariates. We call it the network Poisson model. The model contains a density parameter μ, a 2n-dimensional node parameter θ and a fixed-dimensional regression coefficient γ of covariates. Since the number of parameters increases with n, asymptotic theory is nonstandard. When the number n of nodes goes to infinity, we establish the ℓ_∞-errors of the maximum likelihood estimators (MLEs), θ̂ and γ̂, which are O_p((log n/n)^{1/2}) for θ̂ and O_p(log n/n) for γ̂, up to an additional factor. We also obtain the asymptotic normality of the MLE. Numerical studies and a data analysis demonstrate our theoretical findings.


## 1 Introduction

### 1.1 Background

Many complex interactive behaviors can be conveniently represented as networks or graphs, where nodes denote entities depending on the context and edges denote interactions between entities. Examples include friendships between people in social networks, emails between users in email networks, hyperlinks between web pages in hyperlink networks, citations between papers and authors in citation networks, following behaviors between accounts in social media such as Twitter, and chemical reactions between proteins in biological networks, to name just a few. Many statistical methodologies have been developed to analyze network data; see Goldenberg et al. (2010), Fienberg (2012), Robins et al. (2007a), Robins et al. (2007b), Bhattacharyya and Bickel (2016) and Kim et al. (2018) for reviews and references therein. The monograph by Kolaczyk (2009) provides a comprehensive introduction to the statistical analysis of network data.

Networks can be undirected or directed, weighted or unweighted. Most realistic networks exhibit three typical features: sparsity, degree heterogeneity and homophily. Sparsity means that the density of a network is small, with many nodes having no direct connections. Degree heterogeneity describes the phenomenon that node degrees vary greatly: some nodes have many connections while others have relatively few. Homophily characterizes the tendency of individuals with shared attributes, such as age and sex, to form connections more easily. For example, the directed friendship network in Lazega (2001) displays a strong homophily effect, as shown in Yan et al. (2019).

One of the most popular models for degree heterogeneity in undirected networks is the β-model [Chatterjee et al. (2011)], which assigns one degree parameter to each node. It is an undirected version of the well-known p_1 model in Holland and Leinhardt (1981). Asymptotic theory is nonstandard because of the increasing dimension of the parameter space. Exploring theoretical properties of the β-model and its variants has received wide attention in recent years [Chatterjee et al. (2011); Perry and Wolfe (2012); Hillar and Wibisono (2013); Yan and Xu (2013); Rinaldo et al. (2013); Yan et al. (2016b); Graham (2017); Yan (2018); Chen et al. (2021)]. In particular, Chatterjee et al. (2011) proved the uniform consistency of the maximum likelihood estimator (MLE); Yan and Xu (2013) derived its asymptotic normality by approximating the inverse of the Fisher information matrix. In the directed case, Yan et al. (2016a) proved the consistency and asymptotic normality of the MLE in the p_0 model, an exponential random graph model with the out-degree and in-degree sequences as sufficient statistics. In the framework of non-exponential random graph models, Wang et al. (2020) proposed a Probit model for the degree heterogeneity of directed networks and proved the consistency and asymptotic normality of the moment estimator. Besides, Wang et al. (2021) used the Probit distribution to model the degree heterogeneity of affiliation networks and established the uniform consistency and asymptotic normality of the moment estimator.

Yan et al. (2019) proposed the covariate-p_0 model to accommodate the aforementioned three network features in unweighted directed networks. They established the consistency and asymptotic normality of the restricted MLE via the restricted maximum likelihood method, owing to the challenge of exploring the asymptotic theory [Fienberg (2012); Graham (2017)]. Wang (2021) further incorporated a sparsity parameter into the covariate-p_0 model and developed the unrestricted maximum likelihood theory, including the consistency and asymptotic normality of the MLE. However, the covariate-p_0 model is designed only for unweighted directed networks. When applying it to weighted networks, one must discard the weight information (i.e., treat all positive weights as "1" and others as "0"), which may cause information loss. As one example, no covariate is significant when we apply the covariate-p_0 model to fit the well-known Enron email network [Cohen (2004)]; see Table 5. This motivates the present paper. We extend the covariate-p_0 model to weighted networks by using the Poisson distribution to model weighted edges.

### 1.2 The Model

We now introduce our model. Consider a weighted directed graph G on n (n ≥ 2) nodes labeled by 1, …, n. Let A = (a_{ij})_{n×n} be the adjacency matrix, where a_{ij} is the weight of the directed edge from head node i to tail node j. We do not consider self-loops here, i.e., a_{ii} = 0. Let d_i = Σ_{j≠i} a_{ij} be the out-degree of vertex i and d = (d_1, …, d_n)^⊤ be the out-degree sequence of the graph G. Similarly, define b_j = Σ_{i≠j} a_{ij} as the in-degree of vertex j and b = (b_1, …, b_n)^⊤ as the in-degree sequence. The pair (b, d) or {(b_1, d_1), …, (b_n, d_n)} is the bi-degree sequence. We assume that all edges are independently distributed as Poisson random variables with the probability distributions:

    P(a_{ij} = k) = \frac{e^{k(\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma)}}{k!} \exp\big(-e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma}\big),  1 ≤ i ≠ j ≤ n.  (1)

The parameter μ quantifies the network sparsity. The parameter α_i quantifies the effect of establishing outbound edges from sender i, while β_j quantifies the effect of attracting inbound edges to receiver j. The vector parameter γ is a p-dimensional regression coefficient for the covariate Z_{ij}. We will call the above model the network Poisson model hereafter.

The covariate Z_{ij} is either a vector associated with the edge (i, j) or a function of node-specific covariates. If X_i denotes a q-dimensional vector of node-level attributes, then these node-level attributes can be used to construct a vector Z_{ij} = g(X_i, X_j), where g(·,·) is a function of its arguments. As one example, if we let g(X_i, X_j) be an indicator of whether X_i and X_j are equal, then it measures the similarity between nodes i and j.
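To make the data-generating process of model (1) concrete, the sketch below simulates a weighted directed network in NumPy. The parameter values, network size and the equality-indicator covariate are illustrative assumptions of ours, not settings taken from the paper.

```python
import numpy as np

def simulate_network(alpha, beta, gamma, Z, mu=0.0, rng=None):
    """Draw a weighted directed adjacency matrix from the network Poisson
    model: a_ij ~ Poisson(exp(mu + alpha_i + beta_j + Z_ij' gamma))."""
    rng = np.random.default_rng(rng)
    # log-rate for every ordered pair (i, j)
    log_lam = mu + alpha[:, None] + beta[None, :] + Z @ gamma
    A = rng.poisson(np.exp(log_lam))
    np.fill_diagonal(A, 0)  # no self-loops: a_ii = 0
    return A

# toy example with a single homophily covariate (illustrative values only)
rng = np.random.default_rng(0)
n = 30
alpha = rng.normal(0, 0.3, n)   # sender (out-degree) effects
beta = rng.normal(0, 0.3, n)    # receiver (in-degree) effects
x = rng.integers(0, 2, n)       # binary node attribute, e.g. sex
Z = (x[:, None] == x[None, :]).astype(float)[..., None]  # Z_ij = 1{x_i = x_j}
gamma = np.array([0.5])
A = simulate_network(alpha, beta, gamma, Z, mu=-1.0, rng=rng)
d, b = A.sum(axis=1), A.sum(axis=0)  # out- and in-degree sequences
```

A positive γ here raises the expected weight between nodes sharing the attribute, which is exactly the homophily effect the model encodes.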

Motivated by techniques for the analysis of unrestricted likelihood inference in the covariate-p_0 model for directed graphs in Wang (2021), we generalize those approaches to weighted directed graphs here. When the number n of nodes goes to infinity, we derive the ℓ_∞-error between the MLE (θ̂, γ̂) and the true value (θ*, γ*). This is done by using a two-stage Newton process that first finds the error bound between θ̂_γ and θ* with γ fixed and then derives the error bound between γ̂ and γ*. The bounds are O_p((log n/n)^{1/2}) for θ̂ and O_p(log n/n) for γ̂, up to an additional factor on the parameters. Further, we derive the asymptotic normality of the MLE. The asymptotic distribution of γ̂ has a bias term while that of θ̂ does not, which corroborates the findings in Yan et al. (2019). This is because of the different convergence rates of θ̂ and γ̂. Extensive simulations are carried out to demonstrate our theoretical findings. In our simulations, this bias is very small, even negligible, which differs from the significant bias effect in Yan et al. (2019) and Wang (2021). This may be because the weighted edge values attenuate the bias. The application to the Enron email data set illustrates the utility of the proposed model.

For the remainder of the paper, we proceed as follows. In Section 2, we present the maximum likelihood estimation. In Section 3, we present theoretical properties of the MLE. Numerical studies are presented in Section 4. We provide further discussion in Section 5. The proofs of the theorems are relegated to the Appendix. All supporting lemmas and detailed calculations are in the Supplementary Material.

## 2 Maximum Likelihood Estimation

Let α = (α_1, …, α_n)^⊤ and β = (β_1, …, β_n)^⊤. If one transforms (μ, α, β) to (μ − 2c, α + c1_n, β + c1_n) for a constant c, then the probability in (1) does not change. For the identifiability of the model, several possible restriction conditions immediately come to mind, including α_n = 0, β_n = 0; or Σ_i α_i = 0, Σ_i β_i = 0; or μ = 0, β_n = 0. When we set α_n = 0, β_n = 0, the density parameter μ is kept.

The logarithm of the likelihood function is

    ℓ(μ, α, β, γ) = \sum_{i \ne j} \Big( a_{ij}(\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma) - e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma} - \log(a_{ij}!) \Big).  (2)

The notation Σ_{i≠j} is shorthand for Σ_i Σ_{j≠i}. The score equations for the vector parameters (μ, γ) and (α, β) are easily seen to be

    \sum_{i \ne j} a_{ij} = \sum_{i \ne j} e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma},
    \sum_{i \ne j} a_{ij} Z_{ij} = \sum_{i \ne j} Z_{ij} e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma},
    d_i = \sum_{j=1, j \ne i}^{n} e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma},  i = 1, …, n,
    b_j = \sum_{i=1, i \ne j}^{n} e^{\mu + \alpha_i + \beta_j + Z_{ij}^\top \gamma},  j = 1, …, n.  (3)

Under the restriction μ = 0 and β_n = 0, the first equation and the last equation with j = n in the above system of equations are excluded. Although there are a total of 2n + 1 + p equations, the number of minimal equations is only 2n − 1 + p.

The MLE of the parameter vector is the solution of the above equations if it exists. The Newton-Raphson algorithm can be used to solve these equations. We can also simply use the function "glm" in the R language to compute the solution.
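Since the log-likelihood (2) is concave in (α, β, γ), the MLE can also be obtained by generic numerical optimization instead of a hand-rolled Newton-Raphson. The sketch below maximizes the log-likelihood with SciPy under the identification μ = 0, β_n = 0 used later in Section 3; the function name and the toy data are our own illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def fit_poisson_network(A, Z):
    """MLE for the network Poisson model by direct likelihood optimization.
    Identification: mu is absorbed into alpha and beta_n is fixed at 0."""
    n = A.shape[0]
    p = Z.shape[-1]
    off = ~np.eye(n, dtype=bool)  # exclude self-loops

    def unpack(par):
        alpha = par[:n]
        beta = np.append(par[n:2 * n - 1], 0.0)  # beta_n = 0
        gamma = par[2 * n - 1:]
        return alpha, beta, gamma

    def negloglik(par):
        alpha, beta, gamma = unpack(par)
        eta = alpha[:, None] + beta[None, :] + Z @ gamma
        # log L = sum_{i != j} (a_ij * eta_ij - exp(eta_ij)), up to constants
        return -(A[off] * eta[off] - np.exp(eta)[off]).sum()

    def grad(par):
        alpha, beta, gamma = unpack(par)
        lam = np.exp(alpha[:, None] + beta[None, :] + Z @ gamma)
        np.fill_diagonal(lam, 0.0)
        R = lam - np.where(off, A, 0)        # residuals lam_ij - a_ij
        g_alpha = R.sum(axis=1)              # matches out-degree equations
        g_beta = R.sum(axis=0)[:-1]          # matches in-degree equations
        g_gamma = np.tensordot(R, Z, axes=([0, 1], [0, 1]))
        return np.concatenate([g_alpha, g_beta, g_gamma])

    par0 = np.zeros(2 * n - 1 + p)
    res = minimize(negloglik, par0, jac=grad, method="L-BFGS-B")
    return unpack(res.x)

# usage on a small simulated network (illustrative data, true gamma = 0.3)
rng = np.random.default_rng(1)
n = 15
Z = rng.normal(size=(n, n, 1))
A = rng.poisson(np.exp(0.3 * Z[..., 0]))
np.fill_diagonal(A, 0)
alpha_hat, beta_hat, gamma_hat = fit_poisson_network(A, Z)
```

At the optimum the gradient components reproduce the score equations (3): the fitted expected out- and in-degrees match the observed ones.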

## 3 Theoretical Properties

Let α̃_i = α_i + μ/2 and β̃_j = β_j + μ/2; then μ + α_i + β_j = α̃_i + β̃_j. With this reparameterization technique, we can set μ = 0 for convenience. Further, we set μ = 0 and β_n = 0 jointly as the identification condition in this section. The asymptotic properties of the MLE under the restriction α_n = 0, β_n = 0 are the same as those under the restriction μ = 0, β_n = 0.

Notations. Let ℝ be the real domain. For a subset C ⊂ ℝ^n, let C° and C̄ denote the interior and closure of C, respectively. For a vector x = (x_1, …, x_n)^⊤ ∈ ℝ^n, denote by ‖x‖ a general norm on vectors, with the special cases ‖x‖_∞ = max_{1≤i≤n} |x_i| and ‖x‖_1 = Σ_i |x_i| for the ℓ_∞- and ℓ_1-norms of x, respectively. When the dimension is fixed, all norms on vectors are equivalent. Let B(x, ε) = {y : ‖x − y‖_∞ ≤ ε} be an ε-neighborhood of x. For an n × n matrix J = (J_{i,j}), let ‖J‖_∞ denote the matrix norm induced by the ℓ_∞-norm on vectors in ℝ^n, i.e.,

    ‖J‖_∞ = \max_{x \ne 0} \frac{‖Jx‖_∞}{‖x‖_∞} = \max_{1 \le i \le n} \sum_{j=1}^{n} |J_{i,j}|,

and let ‖J‖ be a general matrix norm. Define the matrix maximum norm ‖J‖_max := max_{i,j} |J_{i,j}|. The notation Σ_{j≠i} denotes summation over all j ≠ i, and Σ_{i≠j} is shorthand for Σ_i Σ_{j≠i}. The notation f(n) = O(g(n)) or f(n) ≲ g(n) means that there exists a constant C > 0 such that |f(n)| ≤ C|g(n)|.

For convenience of our theoretical analysis, define N = n(n − 1). We use the superscript "*" to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript "*". Further, define

    π_{ij} := α_i + β_j + Z_{ij}^\top γ,  λ(x) := e^x.

Write λ′(·), λ″(·) and λ‴(·) for the first, second and third derivatives of λ(x) with respect to x, respectively. Direct calculations give λ(x) = λ′(x) = λ″(x) = λ‴(x) = e^x. Let ε_{n1} and ε_{n2} be two small positive numbers. When θ ∈ B(θ*, ε_{n1}) and γ ∈ B(γ*, ε_{n2}), we have

    |α_i + β_j + Z_{ij}^\top γ| ≤ \max_{i \ne j} |α_i^* + β_j^*| + 2ε_{n1} + pq‖γ^*‖_∞ + pq ε_{n2} := ρ_n,

where q := max_{i≠j} ‖Z_{ij}‖_∞, and we assume that q is a constant. It is not difficult to verify that, when θ ∈ B(θ*, ε_{n1}) and γ ∈ B(γ*, ε_{n2}),

    e^{-ρ_n} ≤ λ(π_{ij}) = λ′(π_{ij}) = λ″(π_{ij}) = λ‴(π_{ij}) ≤ e^{ρ_n}.  (4)

When no confusion results, we simply write λ_{ij} instead of λ_{ij}(θ, γ) for shorthand, where

    λ_{ij}(θ, γ) = e^{α_i + β_j + Z_{ij}^\top γ} = λ(π_{ij}).

Write the partial derivative of a function vector F(θ, γ) with respect to θ, evaluated at (θ̂, γ̂), as

    \frac{∂F(θ̂, γ̂)}{∂θ^\top} = \frac{∂F(θ, γ)}{∂θ^\top} \Big|_{θ = θ̂, γ = γ̂}.

Throughout the remainder of this paper, we make the following assumption.

###### Assumption 1.

Assume that p, the dimension of Z_{ij}, is fixed and that the support of Z_{ij} is Z, where Z is a compact subset of ℝ^p.

The above assumption is also made in Graham (2017) and Yan et al. (2019). In many real applications, the attributes of nodes have a fixed dimension and Z_{ij} is bounded. As one example, if the nodal variables are indicators such as sex, then the assumption holds. If Z_{ij} is not bounded, we could apply a bounded transformation of Z_{ij}.

### 3.1 Consistency

In order to establish the asymptotic properties of (θ̂, γ̂), we define a system of functions

    F_i(θ, γ) = \sum_{j=1, j \ne i}^{n} λ_{ij}(θ, γ) − d_i,  i = 1, …, n,
    F_{n+j}(θ, γ) = \sum_{i=1, i \ne j}^{n} λ_{ij}(θ, γ) − b_j,  j = 1, …, n,
    F(θ, γ) = (F_1(θ, γ), …, F_n(θ, γ), F_{n+1}(θ, γ), …, F_{2n−1}(θ, γ))^\top,

which are based on the score equations for θ. Define F_γ(θ) as the value of F(θ, γ) when γ is fixed. Let θ̂_γ be a solution to F_γ(θ) = 0 if it exists. Correspondingly, we define two functions for exploring the asymptotic behavior of the estimator of γ:

    Q(θ, γ) = \sum_{i \ne j} Z_{ij}(λ_{ij}(θ, γ) − a_{ij}),  (5)
    Q_c(γ) = \sum_{i \ne j} Z_{ij}(λ_{ij}(θ̂_γ, γ) − a_{ij}).  (6)

Q_c(γ) can be viewed as the concentrated or profile function of Q(θ, γ), in which the degree parameter θ is profiled out. It is clear that

    F(θ̂, γ̂) = 0,  F_γ(θ̂_γ) = 0,  Q(θ̂, γ̂) = 0,  Q_c(γ̂) = 0.

We first present the error bound between θ̂_γ, with γ ∈ B(γ*, ε_{n2}), and θ*. This is proved by constructing the Newton iterative sequence θ^{(k+1)} = θ^{(k)} − [F′_γ(θ^{(k)})]^{−1} F_γ(θ^{(k)}) with initial value θ^{(0)} = θ* and obtaining a geometrically fast convergence rate for the iterative sequence. Details are given in the Supplementary Material. The error bound is stated below.

###### Lemma 1.

If γ ∈ B(γ*, ε_{n2}) and e^{7ρ_n}(\log n/n)^{1/2} = o(1), then as n goes to infinity, with probability at least 1 − O(n^{−1}), θ̂_γ exists and satisfies

    ‖θ̂_γ − θ^*‖_∞ = O_p\Big( e^{7ρ_n} \sqrt{\tfrac{\log n}{n}} \Big) = o_p(1).

Further, if θ̂_γ exists, then it is unique.
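The profiled estimate θ̂_γ solves a matrix-scaling problem: with W_ij = exp(Z_ij^⊤γ) held fixed, the equations F_γ(θ) = 0 match prescribed row and column sums of the mean matrix λ_ij = e^{α_i} e^{β_j} W_ij. The paper's analysis uses a Newton iteration; as a simple numerical sketch (our own choice of algorithm, not the paper's), an alternating-scaling (Sinkhorn-type) update also solves these equations:

```python
import numpy as np

def solve_theta_given_gamma(d, b, W, tol=1e-12, max_iter=10000):
    """Solve d_i = sum_{j != i} lam_ij and b_j = sum_{i != j} lam_ij with
    lam_ij = exp(alpha_i) * exp(beta_j) * W_ij, where W_ij = exp(Z_ij' gamma)
    and W_ii = 0, by alternating scaling. Identification: beta_n = 0."""
    a = np.ones_like(d, dtype=float)   # a_i = exp(alpha_i)
    c = np.ones_like(b, dtype=float)   # c_j = exp(beta_j)
    for _ in range(max_iter):
        a = d / (W @ c)                # match the out-degree equations
        c_new = b / (W.T @ a)          # match the in-degree equations
        if np.max(np.abs(c_new - c)) < tol:
            c = c_new
            break
        c = c_new
    alpha, beta = np.log(a), np.log(c)
    # shift so that beta_n = 0; the rates lam_ij are unchanged
    return alpha + beta[-1], beta - beta[-1]
```

On noise-free degrees generated from known parameters, the iteration recovers those parameters exactly (up to the identification shift), which is a convenient sanity check of the score equations.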

By the chain rule for compound functions, we have

    0 = \frac{∂F_γ(θ̂_γ)}{∂γ^\top} = \frac{∂F(θ̂_γ, γ)}{∂θ^\top} \frac{∂θ̂_γ}{∂γ^\top} + \frac{∂F(θ̂_γ, γ)}{∂γ^\top},  (7)

    \frac{∂Q_c(γ)}{∂γ^\top} = \frac{∂Q(θ̂_γ, γ)}{∂θ^\top} \frac{∂θ̂_γ}{∂γ^\top} + \frac{∂Q(θ̂_γ, γ)}{∂γ^\top}.  (8)

By solving for ∂θ̂_γ/∂γ^⊤ in (7) and substituting it into (8), we get the Jacobian matrix ∂Q_c(γ)/∂γ^⊤:

    \frac{∂Q_c(γ)}{∂γ^\top} = \frac{∂Q(θ̂_γ, γ)}{∂γ^\top} − \frac{∂Q(θ̂_γ, γ)}{∂θ^\top} \Big[\frac{∂F(θ̂_γ, γ)}{∂θ^\top}\Big]^{−1} \frac{∂F(θ̂_γ, γ)}{∂γ^\top}.  (9)

The asymptotic behavior of γ̂ crucially depends on this Jacobian matrix. Since θ̂_γ does not have a closed form, conditions imposed directly on ∂Q_c(γ)/∂γ^⊤ are not easily checked. To derive feasible conditions, we define

    H(θ, γ) = \frac{∂Q(θ, γ)}{∂γ^\top} − \frac{∂Q(θ, γ)}{∂θ^\top} \Big[\frac{∂F(θ, γ)}{∂θ^\top}\Big]^{−1} \frac{∂F(θ, γ)}{∂γ^\top},  (10)

which is a general form of ∂Q_c(γ)/∂γ^⊤. Because H(θ*, γ*) is the Fisher information matrix of the profiled log-likelihood ℓ(θ̂_γ, γ), we assume that it is positive definite; otherwise, the network Poisson model would be ill-conditioned. When θ ∈ B(θ*, ε_{n1}), we have the equation:

    \frac{1}{n^2} H(θ, γ^*) = \frac{1}{n^2} H(θ^*, γ^*) + o(1),  (11)

whose proof is omitted. Note that the dimension of H(θ, γ) is fixed and every entry of it is a sum of (n − 1)n terms. We assume that there exists a number κ_n such that

    \sup_{θ \in B(θ^*, ε_{n1})} ‖H^{−1}(θ, γ^*)‖_∞ ≤ \frac{κ_n}{n^2}.  (12)

If H(θ*, γ*)/n² converges to a constant matrix, then κ_n is bounded. Because H(θ, γ*) is positive definite,

    κ_n = \sqrt{p} \times \sup_{θ \in B(θ^*, ε_{n1})} 1/λ_{\min}(θ),

where λ_min(θ) is the smallest eigenvalue of H(θ, γ*)/n². Now we formally state the consistency result.

###### Theorem 1.

If κ_n e^{21ρ_n} (log n)/n = o(1), then the MLE (θ̂, γ̂) exists with probability at least 1 − O(n^{−1}), and it is consistent in the sense that

    ‖γ̂ − γ^*‖_∞ = O_p\Big( κ_n e^{21ρ_n} \frac{\log n}{n} \Big) = o_p(1),
    ‖θ̂ − θ^*‖_∞ = O_p\Big( e^{7ρ_n} \sqrt{\tfrac{\log n}{n}} \Big) = o_p(1).

Further, if the MLE exists, it is unique.

From the above theorem, we can see that γ̂ has a convergence rate of log n/n and θ̂ has a convergence rate of (log n/n)^{1/2}, up to additional factors. If max_{i≠j}|α_i* + β_j*| and ‖γ*‖_∞ are constants, then ρ_n and κ_n are constants such that the condition in Theorem 1 holds.

### 3.2 Asymptotic normality of ˆθ and ˆγ

The asymptotic distribution of θ̂ depends crucially on the inverse of the Fisher information matrix of θ, which generally does not have a closed form. In order to characterize this matrix, we introduce a general class of matrices that encompasses the Fisher matrix. Given two positive numbers m and M with m ≤ M, we say a (2n−1)×(2n−1) matrix V = (v_{i,j}) belongs to the class L_n(m, M) if the following holds:

    0 ≤ v_{i,i} − \sum_{j=n+1}^{2n−1} v_{i,j} ≤ M,  i = 1, …, 2n−1,
    v_{i,j} = 0,  i, j = 1, …, n, i ≠ j,
    v_{i,j} = 0,  i, j = n+1, …, 2n−1, i ≠ j,
    m ≤ v_{i,j} = v_{j,i} ≤ M,  i = 1, …, n, j = n+1, …, 2n−1, j ≠ n+i.  (13)

Clearly, if V ∈ L_n(m, M), then V is a diagonally dominant, symmetric nonnegative matrix. It must be positive definite. The definition of L_n(m, M) here is wider than that in Yan et al. (2016a), where the diagonal elements are equal to the row sums excluding themselves for some rows. One can easily show that the Fisher information matrix of θ belongs to this matrix class. With some abuse of notation, we use V to denote the Fisher information matrix of the vector parameter θ.

To describe the exact form of the elements of V, for i ≠ j, i, j = 1, …, n, we define

    u_{ij} = λ(π_{ij}),  u_{ii} = 0,  u_{i·} = \sum_{j=1}^{n} u_{ij},  u_{·j} = \sum_{i=1}^{n} u_{ij}.

Note that u_{ij} is the variance of a_{ij}. Then the elements of V are

    v_{i,j} = u_{i·},  i = j, 1 ≤ i ≤ n,
    v_{i,j} = u_{i, j−n},  i = 1, …, n, j = n+1, …, 2n−1, j ≠ i+n,
    v_{i,j} = u_{j, i−n},  i = n+1, …, 2n−1, j = 1, …, n, j ≠ i−n,
    v_{i,j} = u_{·(i−n)},  i = j, n+1 ≤ i ≤ 2n−1,
    v_{i,j} = 0,  otherwise.

Yan et al. (2016a) proposed to approximate the inverse V^{−1} by the matrix S = (s_{i,j}), which is defined as

    s_{i,j} = δ_{i,j}/u_{i·} + 1/u_{·n},  i, j = 1, …, n,
    s_{i,j} = −1/u_{·n},  i = 1, …, n, j = n+1, …, 2n−1,
    s_{i,j} = −1/u_{·n},  i = n+1, …, 2n−1, j = 1, …, n,
    s_{i,j} = δ_{i,j}/u_{·(j−n)} + 1/u_{·n},  i, j = n+1, …, 2n−1,  (14)

where δ_{i,j} = 1 when i = j and δ_{i,j} = 0 when i ≠ j.
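The quality of the approximation S ≈ V^{−1} can be checked numerically on a small synthetic rate matrix. The sketch below builds V from the element formulas above and S from (14); the function names and test rates are our own illustrative choices.

```python
import numpy as np

def fisher_V(U):
    """Fisher information matrix V of theta = (alpha_1..alpha_n, beta_1..beta_{n-1})
    from the Poisson variances u_ij = lambda_ij (with u_ii = 0)."""
    n = U.shape[0]
    V = np.zeros((2 * n - 1, 2 * n - 1))
    V[:n, :n] = np.diag(U.sum(axis=1))       # v_ii = u_{i.}
    V[n:, n:] = np.diag(U.sum(axis=0)[:-1])  # v_{n+j,n+j} = u_{.j}
    V[:n, n:] = U[:, :-1]                    # v_{i,n+j} = u_{ij}, j < n
    V[n:, :n] = U[:, :-1].T                  # symmetric lower block
    return V

def approx_inverse_S(U):
    """The approximate inverse S = (s_ij) of V, following (14)."""
    n = U.shape[0]
    ui, uj = U.sum(axis=1), U.sum(axis=0)
    inv_un = 1.0 / uj[-1]                    # the common 1/u_{.n} term
    S = np.empty((2 * n - 1, 2 * n - 1))
    S[:n, :n] = np.diag(1.0 / ui) + inv_un
    S[:n, n:] = -inv_un
    S[n:, :n] = -inv_un
    S[n:, n:] = np.diag(1.0 / uj[:-1]) + inv_un
    return S
```

For moderate n and bounded rates, the entrywise gap between the exact inverse V^{−1} and S is an order of magnitude smaller than the entries themselves, consistent with the role S plays in the proof of Theorem 2.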

We derive the asymptotic normality of θ̂ by representing θ̂ − θ* as a function of d − E(d) and b − E(b) with an explicit expression and a remainder term. This is done by applying a second-order Taylor expansion to F(θ̂, γ̂) and showing that the various remainder terms are asymptotically negligible.

###### Theorem 2.

Under the condition of Theorem 1, for any fixed k ≥ 1, as n → ∞, the vector consisting of the first k elements of θ̂ − θ* is asymptotically multivariate normal with mean 0 and covariance matrix given by the upper left k × k block of S defined in (14).

###### Remark 1.

By Theorem 2, for any fixed i, as n → ∞, the asymptotic variance of θ̂_i is 1/v_{i,i}, whose magnitude is between e^{−ρ_n}/(n−1) and e^{ρ_n}/(n−1).

###### Remark 2.

Under the identification restriction adopted in this section, the central limit theorem for the MLE θ̂_i is stated as: v_{i,i}^{1/2}(θ̂_i − θ_i*) converges in distribution to the standard normal distribution.

Now, we present the asymptotic normality of γ̂. Let

    V = \frac{∂F(θ^*, γ^*)}{∂θ^\top},  V_{γγ} = \frac{∂Q(θ^*, γ^*)}{∂γ^\top},  V_{θγ} = \frac{∂F(θ^*, γ^*)}{∂γ^\top}.

Following Amemiya (1985) (p. 126), the Hessian matrix of the concentrated log-likelihood function is V_{γγ} − V_{θγ}^⊤ V^{−1} V_{θγ}. To state the form of the limiting distribution of γ̂, define

    I_n(γ^*) = \frac{1}{(n−1)n} \big( V_{γγ} − V_{θγ}^\top V^{−1} V_{θγ} \big).  (15)

Assume that the limit of I_n(γ*) exists as n goes to infinity, denoted by I_*(γ*). Then, we have the following theorem.

###### Theorem 3.

Under the condition of Theorem 1, as n goes to infinity, the p-dimensional vector √N(γ̂ − γ*)

is asymptotically multivariate normally distributed with mean I_*^{−1}(γ*)B*

and covariance matrix I_*^{−1}(γ*), where N = n(n − 1) and B* is the bias term:

    B^* = \lim_{n \to \infty} \frac{1}{2\sqrt{N}} \Bigg[ \sum_{i=1}^{n} \frac{\sum_{j=1, j \ne i}^{n} λ''(π^*_{ij}) Z_{ij}}{\sum_{j=1, j \ne i}^{n} λ'(π^*_{ij})} + \sum_{j=1}^{n} \frac{\sum_{i=1, i \ne j}^{n} λ''(π^*_{ij}) Z_{ij}}{\sum_{i=1, i \ne j}^{n} λ'(π^*_{ij})} \Bigg].
###### Remark 3.

The limiting distribution of γ̂ involves a bias term

    μ^* = \frac{I_*^{−1}(γ^*) B^*}{\sqrt{n(n−1)}}.

Since the MLE γ̂ is not centered at the true parameter value, the confidence intervals and the p-values of hypothesis tests for γ may not achieve the nominal level without bias correction. This is the so-called incidental parameter problem in the econometric literature [Neyman and Scott (1948)]. The produced bias is due to the different convergence rates of θ̂ and γ̂. As discussed in Yan et al. (2019), we could use the analytical bias-correction formula γ̂_{bc} = γ̂ − μ̂*, where μ̂* is the estimate of μ* obtained by replacing θ* and γ* in its expression with their MLEs θ̂ and γ̂, respectively. In the simulations of the next section, we can see that there is little difference between the uncorrected and bias-corrected estimates, which differs from the covariate-p_0 model for binary directed networks in Yan et al. (2019). This may be because the unbounded weight values of the edges attenuate the bias effect.

## 4 Numerical Studies

In this section, we evaluate the asymptotic results of the MLEs for model (1) through simulation studies and a real data example.

### 4.1 Simulation studies

Similar to Yan et al. (2019), the parameter values take a linear form: the true values of α* and β* increase linearly over the nodes, and we considered four different values of the growth rate. By allowing the true values of α* and β* to grow with n, we intended to assess the asymptotic properties under different asymptotic regimes. Different from Yan et al. (2019), the covariate Z_ij here is three-dimensional. Its first element was generated from a standard normal distribution; its second element was formed from the first entry of a node-specific covariate, independently generated for each node; and its third element was formed from a node-specific attribute following a two-point discrete distribution. The density parameter μ was set to a fixed value.

Note that by Theorem 2, the standardized estimates (θ̂_i − θ_i*)/ŝ_{i,i}^{1/2} are all asymptotically distributed as standard normal random variables, where ŝ_{i,i} is the estimate of s_{i,i} obtained by replacing (θ*, γ*) with (θ̂, γ̂). We record the coverage probability of the 95% confidence interval, the length of the confidence interval, and the frequency with which the MLE does not exist. Since the results for the various choices of index i are similar, only a representative subset is reported. We simulated networks at two values of n. Each simulation setting was repeated a large number of times.

Table 1 reports the coverage frequencies for θ̂ and the lengths of the confidence intervals. As we can see, the length of the confidence interval increases with the magnitude of the true parameters and decreases with n, which qualitatively agrees with the theory. All simulated coverage frequencies are close to the nominal level. This indicates that the conditions in the theorems may admit considerable relaxation.

Table 2 reports simulation results for the estimate γ̂ and the bias-corrected estimate at the 95% nominal level. The coverage frequencies for the uncorrected estimate are all close to those for the corrected estimate, and all simulated coverage frequencies achieve the nominal level. As expected, the standard error increases with the magnitude of the true parameters and decreases with n.