A Sparse β-Model with Covariates for Networks
Network data are increasingly encountered in modern science and the humanities. This paper proposes a new generative model, suitable for the sparse networks commonly observed in practice, that captures degree heterogeneity and homophily, two stylized features of a typical network. The former is achieved by assigning parameters differentially to individual nodes, while the latter is captured by incorporating covariates. Similar models for heterogeneity in the literature often include as many nodal parameters as there are nodes, leading to over-parametrization and, as a result, strong requirements on the density of the network. For parameter estimation, we propose a penalized likelihood method with an ℓ_1 penalty on the nodal parameters, which gives rise to a convex optimization formulation and immediately connects our estimation procedure to the LASSO literature. We highlight the differences between our approach and the LASSO method for logistic regression, emphasizing the suitability of our model for inference on sparse networks. We also study finite-sample error bounds on the excess risk and the ℓ_1-error of the resulting estimator, and develop a central limit theorem for the parameter associated with the covariates. Simulations and data analysis corroborate the developed theory. As a by-product of our main theory, we study what we call the Erdős-Rényi model with covariates and develop the associated statistical inference for sparse networks, which may be of independent interest.
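To make the estimation idea concrete, the following is a minimal sketch of ℓ_1-penalized likelihood for a model of this kind. The abstract does not spell out the link function, so the sketch assumes a logistic form in which the log-odds of an edge between nodes i and j are β_i + β_j + μ + γ'z_ij, with nodal parameters β, a global sparsity parameter μ, and covariate parameter γ. All function names (`simulate_network`, `objective`, `fit_l1`), the proximal-gradient solver, and the tuning value `lam` are illustrative assumptions, not the paper's algorithm; any identifiability or sign constraints used in the full text are not reproduced here.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def linear_predictor(beta, mu, gamma, Z):
    # Assumed form: eta_ij = beta_i + beta_j + mu + gamma' z_ij, with Z of shape (n, n, p).
    return beta[:, None] + beta[None, :] + mu + Z @ gamma

def simulate_network(beta, mu, gamma, Z, rng):
    # Draw independent edges for i < j, then symmetrize (undirected, no self-loops).
    n = beta.size
    prob = sigmoid(linear_predictor(beta, mu, gamma, Z))
    upper = np.triu(rng.random((n, n)) < prob, k=1)
    return (upper | upper.T).astype(float)

def objective(A, Z, beta, mu, gamma, lam):
    # Average negative log-likelihood over pairs i < j, plus an l1 penalty on beta only.
    iu = np.triu_indices(A.shape[0], k=1)
    eta = linear_predictor(beta, mu, gamma, Z)[iu]
    nll = np.sum(np.logaddexp(0.0, eta) - A[iu] * eta) / iu[0].size
    return nll + lam * np.abs(beta).sum()

def fit_l1(A, Z, lam, n_iter=300):
    # Proximal gradient (ISTA): gradient step on the smooth part,
    # soft-thresholding on beta; mu and gamma are left unpenalized.
    # Z is assumed symmetric in its first two axes.
    n, p = A.shape[0], Z.shape[2]
    n_pairs = n * (n - 1) / 2
    off_diag = ~np.eye(n, dtype=bool)
    # Upper bound on the Lipschitz constant of the gradient: each design row
    # (e_i + e_j, 1, z_ij) has squared norm 3 + ||z_ij||^2, and sigmoid' <= 1/4.
    step = 1.0 / (0.25 * (3.0 + np.max(np.sum(Z ** 2, axis=2))))
    beta, mu, gamma = np.zeros(n), 0.0, np.zeros(p)
    for _ in range(n_iter):
        resid = (sigmoid(linear_predictor(beta, mu, gamma, Z)) - A) * off_diag
        grad_beta = resid.sum(axis=1) / n_pairs
        grad_mu = resid.sum() / (2 * n_pairs)
        grad_gamma = np.einsum("ij,ijk->k", resid, Z) / (2 * n_pairs)
        b = beta - step * grad_beta
        beta = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)
        mu = mu - step * grad_mu
        gamma = gamma - step * grad_gamma
    return beta, mu, gamma

# Toy example: a few high-degree nodes on a sparse background, with a
# same-group indicator as the homophily covariate (all values made up).
rng = np.random.default_rng(0)
n = 60
group = rng.integers(0, 2, size=n)
Z = (group[:, None] == group[None, :]).astype(float)[:, :, None]
beta_true = np.zeros(n)
beta_true[:5] = 2.0
A = simulate_network(beta_true, -3.0, np.array([1.0]), Z, rng)
beta_hat, mu_hat, gamma_hat = fit_l1(A, Z, lam=0.01)
```

Because only β is penalized, the soft-thresholding step can set many nodal parameters exactly to zero while μ and γ are estimated freely, which is the sense in which the formulation resembles, but differs from, a standard LASSO logistic regression. The fixed step size is a conservative choice that keeps the sketch short; a line search or a dedicated solver would likely converge faster.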