1 Introduction
1.1 Background
Many complex interactive behaviors can be conveniently represented as networks or graphs, where nodes denote entities, depending on the context, and edges denote interactions between entities. Examples include friendships between people in social networks, emails between users in email networks, hyperlinks between web pages in hyperlink networks, citations between papers and authors in citation networks, following relationships between accounts on social media such as Twitter, and interactions between proteins in biological networks, to name just a few. Many statistical methodologies have been developed to analyze network data; see Goldenberg et al. (2010), Fienberg (2012), Robins et al. (2007a), Robins et al. (2007b), Bhattacharyya and Bickel (2016) and Kim et al. (2018) for reviews and the references therein. The monograph by Kolaczyk (2009) provides a comprehensive introduction to the statistical analysis of network data.
Networks can be undirected or directed, weighted or unweighted. Most realistic networks exhibit three typical features: sparsity, degree heterogeneity and homophily. Sparsity means that the density of the network is small, so that many pairs of nodes have no direct connection. Degree heterogeneity refers to the phenomenon that node degrees vary greatly: some nodes may have many connections while others have relatively few. Homophily characterizes the tendency of individuals with similar attributes, such as age or sex, to form connections more easily. For example, the directed friendship network in Lazega (2001) displays a strong homophily effect, as shown in Yan et al. (2019).
One of the most popular models for the degree heterogeneity in undirected networks is the model of Chatterjee et al. (2011) that assigns one degree parameter to each node. It is an undirected version of the well-known model in Holland and Leinhardt (1981). The asymptotic theory is nonstandard because the dimension of the parameter space increases with the number of nodes. Exploring theoretical properties of the model and its variants has received wide attention in recent years [Chatterjee et al. (2011); Perry and Wolfe (2012); Hillar and Wibisono (2013); Yan and Xu (2013); Rinaldo et al. (2013); Yan et al. (2016b); Graham (2017); Yan (2018); Chen et al. (2021)]. In particular, Chatterjee et al. (2011) proved the uniform consistency of the maximum likelihood estimator (MLE); Yan and Xu (2013) derived its asymptotic normality by approximating the inverse of the Fisher information matrix. In the directed case, Yan et al. (2016a) proved the consistency and asymptotic normality of the MLE in the model that is an exponential random graph model with the out-degree and in-degree sequences as sufficient statistics. In the framework of non-exponential random graph models, Wang et al. (2020) proposed a Probit model for the degree heterogeneity of directed networks and proved the consistency and asymptotic normality of the moment estimator. Besides, Wang et al. (2021) used the Probit distribution to model the degree heterogeneity of affiliation networks and established the uniform consistency and asymptotic normality of the moment estimator. Yan et al. (2019) proposed the covariate model to accommodate the aforementioned three network features in unweighted directed networks. They established the consistency and asymptotic normality of the restricted MLE, using the restricted maximum likelihood method because of the challenge of developing the asymptotic theory [Fienberg (2012); Graham (2017)]. Wang (2021) further incorporated a sparsity parameter into the covariate model and developed the unrestricted maximum likelihood theory, including the consistency and asymptotic normality of the MLE. However, the covariate model is designed only for unweighted directed networks. When we apply it to weighted networks, we need to discard the weight information (i.e., treat all positive weights as “1” and others as “0”). This may cause information loss. As one example, no covariates are significant when we apply the covariate model to fit the well-known Enron email network [Cohen (2004)]; see Table 5. This motivates the present paper. We extend the covariate model to weighted networks by using the Poisson distribution to model weighted edges.
1.2 The Model
We now introduce our model. Consider a weighted directed graph on () nodes labeled by . Let be the adjacency matrix , where is the weight of the directed edge from head node to tail node . We do not consider self-loops here, i.e., . Let be the out-degree of vertex and be the out-degree sequence of the graph G. Similarly, define as the in-degree of vertex and as the in-degree sequence. The pair or is the bi-degree sequence. We assume that all edges are independently distributed as Poisson random variables with the probability distributions:
(1) 
The parameter quantifies the network sparsity; quantifies the effect of establishing outbound edges from sender , while quantifies the effect of attracting inbound edges to receiver . The vector parameter is a -dimensional regression coefficient for the covariate . We will call the above model the network Poisson model hereafter. The covariate is either a vector associated with the edge or a function of node-specific covariates. If denotes a -dimensional vector of node-level attributes, then these node-level attributes can be used to construct a vector , where is a function of its arguments. As one example, if we let equal , then it measures the similarity between nodes and .
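Since the display equations above were not preserved in this copy, a minimal simulation sketch may help fix ideas: edge weights are drawn independently as Poisson variables whose log-mean is the sum of a sparsity parameter, a sender effect, a receiver effect, and a covariate term. All parameter values below, and the similarity-type edge covariate built from a node attribute, are hypothetical illustrations rather than the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
mu = -1.0                      # sparsity parameter (hypothetical value)
alpha = rng.normal(0, 0.3, n)  # sender (out-degree) effects
beta = rng.normal(0, 0.3, n)   # receiver (in-degree) effects
gamma = 0.5                    # covariate coefficient
x = rng.normal(size=n)         # node-level attribute

# edge covariate built from node attributes, e.g. a similarity measure
z = -np.abs(x[:, None] - x[None, :])

# independent Poisson edge weights with log-linear means
lam = np.exp(mu + alpha[:, None] + beta[None, :] + gamma * z)
A = rng.poisson(lam)
np.fill_diagonal(A, 0)         # no self-loops

d_out = A.sum(axis=1)          # out-degree sequence
d_in = A.sum(axis=0)           # in-degree sequence
```

Note that the total of the out-degrees always equals the total of the in-degrees, since both count every edge weight once.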
Motivated by the techniques for unrestricted likelihood inference in the covariate model for directed graphs in Wang (2021), we generalize these approaches to weighted directed graphs. When the number of nodes goes to infinity, we derive the error bound between the MLE and its true value . This is done by using a two-stage Newton process that first finds the error bound between and with fixed, and then derives the error bound between and . The bounds are for and for , up to an additional factor on the parameters. Further, we derive the asymptotic normality of the MLE. The asymptotic distribution of has a bias term while does not, which corroborates the findings in Yan et al. (2019). This is because of the different convergence rates of and . Extensive simulations are carried out to demonstrate our theoretical findings. In our simulations, this bias is very small and can even be neglected, in contrast to the significant bias effect in Yan et al. (2019) and Wang (2021). This may be because the weighted edge values attenuate the bias. An application to the Enron email data set illustrates the utility of the proposed model.
The remainder of the paper proceeds as follows. In Section 2, we present the maximum likelihood estimation. In Section 3, we present the theoretical properties of the MLE. Numerical studies are presented in Section 4. We provide further discussion in Section 5. The proofs of the theorems are relegated to the Appendix. All supporting lemmas and detailed calculations are in the Supplementary Material.
2 Maximum Likelihood Estimation
Let and . If one transforms to , then the probability in (1) does not change. For the identifiability of the model, several natural restrictions can be imposed, including , , or , , or , . Imposing , keeps the density parameter . The logarithm of the likelihood function is
(2) 
The notation is shorthand for . The score equations for the parameter vectors , are
(3) 
Under the restriction and , the first equation and the last equation with in the above system of equations will be excluded. Although there are a total of equations, the number of minimal equations is only .
The MLE of the parameter vector is the solution of the above equations if it exists. The Newton-Raphson algorithm can be used to solve these equations. Alternatively, one can simply use the function “glm” in the R language to compute the solution.
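To make the estimation concrete, the sketch below simulates a small network (all parameter values hypothetical) and solves the score equations by Newton-Raphson on a Poisson log-linear model, which is what a glm fit with sender and receiver dummy variables would also do. One sender effect and one receiver effect are pinned to zero for identifiability, mirroring the restriction discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
# simulate from the model (parameter values are hypothetical)
mu_t, g_t = 1.5, 0.5
a_t, b_t = rng.normal(0, 0.3, n), rng.normal(0, 0.3, n)
x = rng.normal(size=n)
z = -np.abs(x[:, None] - x[None, :])
lam = np.exp(mu_t + a_t[:, None] + b_t[None, :] + g_t * z)
A = rng.poisson(lam)
np.fill_diagonal(A, 0)

# one design row per off-diagonal dyad; identification: last alpha, beta = 0
dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
p = 1 + 2 * (n - 1) + 1                  # mu, alphas, betas, gamma
X = np.zeros((len(dyads), p))
y = np.empty(len(dyads))
for k, (i, j) in enumerate(dyads):
    X[k, 0] = 1.0                        # sparsity parameter
    if i < n - 1:
        X[k, 1 + i] = 1.0                # sender effect alpha_i
    if j < n - 1:
        X[k, n + j] = 1.0                # receiver effect beta_j
    X[k, -1] = z[i, j]                   # edge covariate
    y[k] = A[i, j]

# Newton-Raphson on the concave Poisson log-likelihood
theta = np.zeros(p)
for _ in range(50):
    w = np.exp(X @ theta)                # fitted means
    score = X.T @ (y - w)                # score equations
    fisher = X.T @ (X * w[:, None])      # Fisher information
    step = np.linalg.solve(fisher, score)
    theta += step
    if np.abs(step).max() < 1e-10:
        break
```

At convergence the score equations are satisfied, i.e., the fitted row and column sums of the mean matrix match the observed bi-degree sequence.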
3 Theoretical Properties
Let and ; then . With this reparameterization, we can set for convenience. Further, we take jointly as the identification condition in this section. The asymptotic properties of under the restriction are the same as those of under the restriction .
Notations. Let be the real domain. For a subset , let and denote the interior and closure of , respectively. For a vector , denote by a general norm on vectors, with the special cases and for the - and -norms of , respectively. When is fixed, all norms on vectors are equivalent. Let be an -neighborhood of . For an matrix , let denote the matrix norm induced by the -norm on vectors in , i.e.,
and let be a general matrix norm. Define the matrix maximum norm . The notation denotes summation over all , and is shorthand for . The notation or means that there exists a constant such that .
For convenience of our theoretical analysis, define . We use the superscript “*” to denote the true parameter under which the data are generated. When there is no ambiguity, we omit the superscript “*”. Further, define
Write , and for the first, second and third derivatives of with respect to , respectively. Direct calculations give . Let and be two small positive numbers. When , , we have
where , and we assume that is a constant. It is not difficult to verify that, when and ,
(4) 
When no confusion arises, we will simply write instead of for shorthand, where
Write the partial derivative of a function vector on as
Throughout the remainder of this paper, we make the following assumption.
Assumption 1.
Assume that , the dimension of , is fixed and that the support of is , where is a compact subset of .
The above assumption is made in Graham (2017) and Yan et al. (2019). In many real applications, the attributes of nodes have a fixed dimension and are bounded. As one example, if the nodal variables are indicators such as sex, then the assumption holds. If is not bounded, we can make the transform .
3.1 Consistency
In order to establish asymptotic properties of , we define a system of functions
which are based on the score equations for . Define as the value of when is fixed. Let be a solution to if it exists. Correspondingly, we define two functions for exploring the asymptotic behavior of the estimator of :
(5)  
(6) 
could be viewed as the concentrated or profile function of in which the degree parameter is profiled out. It is clear that
We first present the error bound between (with fixed) and . It is proved by constructing the Newton iterative sequence with initial value and obtaining a geometrically fast convergence rate for the iterative sequence, where and . Details are given in the Supplementary Material. The error bound is stated below.
Lemma 1.
If and , then as goes to infinity, with probability at least , exists and satisfies
Further, if exists, then it is unique.
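The inner stage of the two-stage argument — solving for the degree parameters with the covariate coefficient held fixed — can be mimicked numerically. Rather than the exact Newton sequence used in the proof, the sketch below uses an alternating fixed-point variant: each out-degree score equation has a closed-form solution in a sender effect given the receiver effects, and vice versa. All parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
mu, g = 1.5, 0.5                          # held fixed (hypothetical values)
a_t, b_t = rng.normal(0, 0.3, n), rng.normal(0, 0.3, n)
x = rng.normal(size=n)
z = -np.abs(x[:, None] - x[None, :])
off = ~np.eye(n, dtype=bool)              # mask excluding self-loops
lam = np.exp(mu + a_t[:, None] + b_t[None, :] + g * z)
A = rng.poisson(lam) * off
d_out, d_in = A.sum(1), A.sum(0)

# alternately solve the out-degree equations d_i = sum_{j != i} lambda_ij
# for the sender effects, then the analogous in-degree equations for the
# receiver effects; each update is closed-form on the log scale
base = mu + g * z
a, b = np.zeros(n), np.zeros(n)
for _ in range(200):
    a = np.log(d_out / (np.exp(base + b[None, :]) * off).sum(1))
    b = np.log(d_in / (np.exp(base + a[:, None]) * off).sum(0))
# resolve the translation invariance (alpha + c, beta - c) by pinning
# the last receiver effect to zero; fitted means are unchanged
a, b = a + b[-1], b - b[-1]
```

This coordinate-wise scheme is an ascent method on the concave log-likelihood in the degree parameters, so the fitted degree sequences converge to the observed ones.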
By the chain rule for differentiation, we have
(7)  
(8) 
By solving in (7) and substituting it into (8), we get the Jacobian matrix :
(9) 
The asymptotic behavior of crucially depends on the Jacobian matrix . Since does not have a closed form, conditions that are directly imposed on are not easily checked. To derive feasible conditions, we define
(10) 
which is a general form of . Because is the Fisher information matrix of the profiled log-likelihood , we assume that it is positive definite. Otherwise, the network Poisson model would be ill-conditioned. When , we have the equation:
(11) 
whose proof is omitted. Note that the dimension of is fixed and each of its entries is a sum of terms. We assume that there exists a number such that
(12) 
If converges to a constant matrix, then is bounded. Because is positive definite,
where
is the smallest eigenvalue of
. Now we formally state the consistency result.
Theorem 1.
If , then the MLE exists with probability at least , and is consistent in the sense that
Further, if exists, it is unique.
From the above theorem, we can see that has a convergence rate and has a convergence rate , up to an additional factor. If and are constants, then and are constants such that the condition in Theorem 1 holds.
3.2 Asymptotic normality of and
The asymptotic distribution of depends crucially on the inverse of the Jacobian matrix , which generally does not have a closed form. In order to characterize this matrix, we introduce a general class of matrices that encompasses the Fisher information matrix. Given two positive numbers and with , we say that a matrix belongs to the class if the following holds:
(13) 
Clearly, if , then is a diagonally dominant, symmetric nonnegative matrix, and it must be positive definite. The definition of here is wider than that in Yan et al. (2016a), where the diagonal elements are equal to the row sums excluding themselves for some rows. One can easily show that belongs to this matrix class. With some abuse of notation, we use to denote the Fisher information matrix for the vector parameter .
To describe the exact form of elements of , for , , we define
Note that
is the variance of
. Then the elements of are
Yan et al. (2016a) proposed to approximate the inverse by the matrix , which is defined as
(14) 
where when and when .
We derive the asymptotic normality of by representing as a function of and with an explicit expression plus a remainder term. This is done by applying a second-order Taylor expansion to and showing that the various remainder terms are asymptotically negligible.
Theorem 2.
If , then for any fixed , as , the vector consisting of the first elements of is asymptotically multivariate normal with mean and covariance matrix given by the upper left block of defined in (14).
Remark 1.
By Theorem 2, for any fixed , as , the asymptotic variance of is , whose magnitude is between and .
Remark 2.
Under the restriction
, the central limit theorem for the MLE
is stated as follows: , converges in distribution to the standard normal distribution. Now we present the asymptotic normality of . Let
Following Amemiya (1985) (p. 126), the Hessian matrix of the concentrated loglikelihood function is . To state the form of the limit distribution of , define
(15) 
Assume that the limit of exists as goes to infinity, denoted by . Then, we have the following theorem.
Theorem 3.
If , then as goes to infinity, the -dimensional vector
follows asymptotically a multivariate normal distribution with mean
and covariance matrix , where and is the bias term:
Remark 3.
The limiting distribution of is involved with a bias term
Since the MLE
is not centered at the true parameter value, the confidence intervals and the p-values of hypothesis tests for
may not achieve the nominal level without bias correction under the null: . This is the so-called incidental parameter problem in the econometrics literature [Neyman and Scott (1948)]. The bias arises from the different convergence rates of and . As discussed in Yan et al. (2019), we can use the analytical bias-correction formula: , where and are the estimates of and obtained by replacing and in their expressions with their MLEs and , respectively. In the simulations in the next section, we see little difference between the uncorrected and bias-corrected estimates, in contrast to the covariate model for binary directed networks in Yan et al. (2019). This may be because the weighted edge values attenuate the bias effect.
4 Numerical Studies
In this section, we evaluate the asymptotic results of the MLEs for model (1) through simulation studies and a real data example.
4.1 Simulation studies
Similar to Yan et al. (2019), the parameter values take a linear form. Specifically, we set for and let , for simplicity. Note that there are nodes in the simulations. We considered four different values of : . By allowing the true values of and to grow with , we assess the asymptotic properties under different asymptotic regimes. Unlike Yan et al. (2019), where , we set here. The first element of was generated from a standard normal distribution; the second element was formed by letting , where , the first entry of the -dimensional node-specific covariate , was independently generated from a distribution; and the third element , where follows a discrete distribution taking values and with probabilities and . The density parameter is .
Note that, by Theorem 2, , , and are all asymptotically distributed as standard normal random variables, where is the estimate of obtained by replacing with . We record the coverage probability of the 95% confidence interval, the length of the confidence interval, and the frequency with which the MLE does not exist. The results for , and are similar; thus only the results for are reported. We simulated networks with or . Each simulation is repeated times.
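To mirror a single replication of the coverage experiment, one can studentize the covariate estimate with a standard error from the inverse Fisher information and form a 95% Wald confidence interval. The sketch below (hypothetical parameter values, one replication rather than the paper's repeated simulations) refits the model by Newton-Raphson as in Section 2 and then constructs the interval.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
mu_t, g_t = 1.0, 0.5                       # hypothetical true values
a_t, b_t = rng.normal(0, 0.3, n), rng.normal(0, 0.3, n)
x = rng.normal(size=n)
z = -np.abs(x[:, None] - x[None, :])
lam = np.exp(mu_t + a_t[:, None] + b_t[None, :] + g_t * z)
A = rng.poisson(lam)
np.fill_diagonal(A, 0)

# design matrix over off-diagonal dyads; last alpha, beta pinned to zero
dyads = [(i, j) for i in range(n) for j in range(n) if i != j]
p = 2 * n                                  # 1 + (n-1) + (n-1) + 1 columns
X = np.zeros((len(dyads), p))
y = np.empty(len(dyads))
for k, (i, j) in enumerate(dyads):
    X[k, 0] = 1.0
    if i < n - 1:
        X[k, 1 + i] = 1.0
    if j < n - 1:
        X[k, n + j] = 1.0
    X[k, -1] = z[i, j]
    y[k] = A[i, j]

theta = np.zeros(p)
for _ in range(50):                        # Newton-Raphson iterations
    w = np.exp(X @ theta)
    fisher = X.T @ (X * w[:, None])
    step = np.linalg.solve(fisher, X.T @ (y - w))
    theta += step
    if np.abs(step).max() < 1e-10:
        break

# standard error of the covariate estimate from the inverse Fisher
# information, and the 95% Wald confidence interval
se = np.sqrt(np.linalg.inv(fisher)[-1, -1])
ci = (theta[-1] - 1.96 * se, theta[-1] + 1.96 * se)
```

Repeating this over many simulated networks and recording how often the interval covers the true coefficient yields the empirical coverage frequencies reported in the tables.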
Table 1 reports the coverage frequencies for and the lengths of the confidence intervals. As we can see, the length of the confidence interval increases with and decreases with , which qualitatively agrees with the theory. All simulated coverage frequencies are close to the nominal level. This indicates that the conditions in the theorems may be relaxed considerably.
[Table 1: coverage frequencies and lengths of confidence intervals for n = 100 and n = 200; indicates the simulated results for .]
Table 2 reports simulation results for the estimate and the bias-corrected estimate at the nominal level . The coverage frequencies for the uncorrected estimate are all close to those for the corrected estimate. All simulated coverage frequencies achieve the nominal level. As expected, the standard error increases with and decreases with .