1 Introduction
Hard-to-reach populations are groups of people that are not easily sampled by commonly used surveys, either because of their stigmatized status (e.g. female sex workers) or because reaching them is infeasible (e.g. people who have died by suicide). There is a long history of developing methods to estimate the sizes of hard-to-reach populations, including direct survey estimates, capture-recapture, enumeration, and venue-based sampling, but no method has emerged as the gold standard. UNAIDS/WHO outlined strengths and weaknesses of many of these methods (UNAIDS and WHO, 2010). Direct estimates typically use random surveys of the general population and calculate what percentage of respondents belong to the hard-to-reach population, but are inefficient or lead to biased results for small and hard-to-reach populations. For reasonable sample sizes, many of these surveys do not even reach the hard-to-reach populations, making it impossible to estimate the population size. Other methods require working with members of the hard-to-reach populations directly, which can lead to more accurate and precise estimates, but it is often difficult to directly contact populations that desire to stay hidden due to poor treatment and negative social stigma.
Originally motivated by estimating the number of people who died in the Mexico City earthquake (Bernard et al., 1989), the network scale-up method (NSUM) avoids the need for samples from the hard-to-reach population entirely, making it more convenient to implement and shining a light on the scale of otherwise impossible-to-reach populations. NSUM uses aggregated relational data (ARD), which consist of answers to survey questions of the form "How many X's do you know?" The ARD is collected from the general population, rather than from the target population.
After being introduced to UNAIDS as a promising method to estimate the sizes of populations most at risk of HIV infection, many countries and cities have implemented ARD surveys in their general populations. One of the largest is the 2009 Ukraine survey, in which the Kiev International Institute of Sociology (KIIS) collected ARD from 10,866 respondents aged 14 and above between December 2008 and February 2009 to estimate the sizes of eight HIV-related subpopulations in Ukraine. The ultimate goal of the survey was to improve the efficiency of the HIV response in Ukraine with the help of accurate size estimates. The authors relied on the NSUM to estimate these population sizes since existing methods like the multiplier method and capture-recapture were too resource-intensive to obtain accurate estimates for all of Ukraine and would require studies in at least 60 settlements to obtain national size estimates (Paniotto et al., 2009).
Despite the attractive features of NSUM, many researchers found that size estimates from NSUM differed greatly from those of other methods and that the uncertainty intervals were too narrow (Salganik et al., 2011). Without a gold standard, the reliability of this relatively new method has been questioned, and assessing NSUM results has become one of the biggest stumbling blocks to its wide adoption.
The basic NSUM model assumes that the number of people that respondent i knows in subpopulation k, denoted by y_ik, follows

(1)    y_ik ~ Binomial(d_i, N_k / N),

where d_i is the degree (or total number of people that respondent i knows), N_k is the size of subpopulation k, and N is the total population size (Bernard et al., 1989, 1991). In words, this model assumes that the probability that a member of respondent i's social network belongs to subpopulation k is proportional to the prevalence of k in the general population. N_k can be estimated given the y_ik's, the d_i's, and N. However, d_i is typically unknown and difficult to estimate directly, requiring the models to estimate both d_i and N_k, either sequentially or simultaneously. The estimation of d_i relies on survey questions about "known populations" (e.g. people named John or postal workers, where the population sizes are known through census or other means), while the estimation of the unknown N_k relies on survey questions about "unknown populations" (e.g. female sex workers), where the population size is unknown.
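As a toy illustration of how model (1) is used in practice, the following sketch simulates ARD for one subpopulation and applies the classical ratio (scale-up) estimator under the assumption that the degrees are known; all population values are hypothetical and not taken from any survey.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not the Ukraine survey): total population N, n
# respondents, and one unknown subpopulation of true size N_k.
N, n, N_k = 1_000_000, 500, 5_000

d = rng.lognormal(mean=5.0, sigma=0.5, size=n).astype(int)  # degrees d_i
y = rng.binomial(d, N_k / N)                                # responses y_ik under model (1)

# Ratio (scale-up) estimator, assuming the degrees d_i are known:
# N_k_hat = N * sum(y_i) / sum(d_i).
N_k_hat = N * y.sum() / d.sum()
print(round(N_k_hat))   # close to the true size 5,000
```

In practice the d_i are themselves estimated from the known-population questions, which is exactly why the models discussed below estimate d_i and N_k jointly.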
Early frequentist models provided a solid foundation for quickly and easily estimating degrees and subpopulation sizes from ARD surveys (Killworth et al., 1998a,b). Recent Bayesian models have improved size estimates and answered important scientific questions about social networks. Zheng et al. (2006) included additional overdispersion in the model through a negative binomial overdispersion parameter, both better capturing the variability in the data than existing methods and providing an estimate of the variation in respondents' propensities to know someone in each group. For example, the authors found that localized groups like American Indians and homeless persons had the highest overdispersion estimates, indicating that people are likely to either know no members of those populations or know far more members than the average person. Later, Maltiel et al. (2015) modeled the NSUM biases (barrier effects, transmission error, and recall error) directly through the priors, further improving size estimates and estimating the strength of each bias within the subpopulations. Most recently, Teo et al. (2019) included covariates both about the respondent (e.g. age, gender) and about how the respondent felt about each unknown subpopulation (e.g. what level of respect do you feel towards female sex workers) to adjust size estimates and to study how these covariates influenced the number of people the respondents knew in each subpopulation. One finding of their study in Singapore was that younger respondents knew more male clients of female sex workers, men who have sex with men, and drug users than older respondents, and that no female respondents reported knowing any female sex workers. However, the authors needed to fit two separate models, one for male respondents and one for female respondents, to account for the behavioral differences.
More importantly, their model ignored the extra variability in the data, resulting in small uncertainty intervals similar to those of the Killworth et al. (1998b) estimates. We refer readers to Laga et al. (2021) for a more complete review of the existing NSUM models and ARD properties.
Until now, all models have assumed that, after controlling for the overdispersion, biases, and covariates, the responses of a single participant are independent across all subpopulations. However, we conjecture that this is not the case. Zheng et al. (2006) observed that the residuals from their model were correlated, and that respondents who knew individuals who had suffered from one negative experience (e.g. suicide or rape) were more likely to know individuals who had suffered from other negative experiences. We aim to properly model this correlation structure to further improve NSUM estimates and to answer the sociological question of how different subpopulations are related.
In the Ukraine survey, information about the respondents' demographics and their acquaintance with multiple known and unknown subpopulations was collected. To better utilize all of this information and learn more about the connections among subpopulations, we propose a new ARD model that accounts for overdispersion, decomposes the biases, and incorporates respondent characteristics, while also capturing the correlations between subpopulations. Our regression framework allows for more flexibility and ease of use than the existing approaches and provides quantitative measures of how covariates influence the number of people known in both known and unknown groups. The correlation estimates from our model provide insight into how social networks form and can hint at how different subpopulations overlap in society. In addition, we propose various measures to assess the reliability of the model estimates.
This paper is organized as follows. First, Section 2 describes the Ukraine dataset. We introduce our proposed NSUM models in Section 3. The benefits and limitations of the models are discussed and our modeling choices are explained. We also show how the overall bias term in our model can be deconstructed into the three NSUM biases. In Section 4, we review existing ARD diagnostics and introduce several diagnostics that are common in network and other literature but new to the NSUM literature. We fit our proposed model to the Ukraine data set in Section 5. We establish empirical properties of our model in Section 6 and discuss practical advice for future collection and analysis of ARD in Section 7. Final remarks and discussion are found in Section 8.
2 Ukraine Data
Of the Eastern European countries, Ukraine has the second highest rate of new HIV infections in the WHO European Region, motivating the study of key populations (European Centre for Disease Prevention and Control/WHO Regional Office for Europe, 2017; Paniotto et al., 2009). From December 2008 to February 2009, the Kiev International Institute of Sociology interviewed 10,866 respondents aged 14 and above, asking "How many X's do you know?" questions about 22 known groups and 8 unknown groups (Paniotto et al., 2009). The 8 unknown groups in the ARD survey are people injecting drugs over the last 12 months (IDUs), women providing sexual services for payment over the last 12 months (FSW), men providing sexual services for payment over the last 12 months (MSW), men who have sex with men (MSM), people living with HIV, IDUs' sexual partners, FSW clients, and MSMs' female sexual partners. We consider 4 of the 8 unknown subpopulations (FSW, MSW, MSM, and IDUs) since these subpopulations belong to the World Health Organization's list of main key population groups vulnerable to HIV (World Health Organization and others, 2016). Examples of the known groups include men aged 20-30, women who gave birth to a child in 2007, and men who served sentences in places of imprisonment in 2007. Supplementary Table 1 lists the known and unknown groups and the sizes of the known groups.
In addition to the ARD (y_ik), respondents were also asked demographic questions about their gender, age, education, nationality, profession, and whether they had access to the internet (x_i), as well as "what level of respect" the respondent believed there was in Ukraine for each group on a 1-5 Likert scale, where 1 represents a very low level of respect (z_ik). After removing respondents with missing responses, the remaining sample has 8,451 respondents, which is 77.77% of the original data set. Furthermore, based on the accuracy of leave-one-out size estimates for the known groups, Paniotto et al. (2009) recommended keeping only 11 of the 22 known groups, so our analysis uses n = 8,451 respondents and K = 15 subpopulations (11 known and 4 unknown).
There are significant differences between the distributions of responses across subpopulations. Figure 1 shows frequency barplots for three subpopulations: women aged 70+, kids, and IDUs. For women aged 70+ and kids, the average responses are very similar (9.99 and 10.17, respectively). However, compared to women aged 70+, respondents tend to know either 0 kids or many kids; there were nearly 1.7 times as many respondents who knew 0 kids as respondents who knew 0 women aged 70+. In this situation, the distribution of kids known by the respondents is more overdispersed than the distribution of women aged 70+ known by the respondents. For the hard-to-reach populations, the overdispersion is even more pronounced. Most respondents know 0 IDUs (92.6%), while some respondents report knowing 40, 50, 60, and even 130 IDUs. The distribution of responses indicates that there are likely significant barrier effects for certain populations like IDUs that violate the random mixing assumption.
3 Models
In this section, we introduce our correlated NSUM model, discuss its properties, and describe the computation techniques used to estimate the model parameters. We first introduce the ARD notation used in the remainder of the manuscript. Given an ARD survey of n respondents about K subpopulations, the number of people that respondent i reports knowing in group k is y_ik. For ease of notation, we will assume that there are K - 1 known subpopulations and only one unknown subpopulation, ordered such that the K-th subpopulation is unknown (although in practice, there may be any number of unknown subpopulations). Then, the ARD is an n × K matrix Y, where n is the number of respondents in the survey, and N_k is known for k = 1, ..., K - 1 while N_K is unknown. The respondent demographics (e.g. age, gender, occupation, etc.) are denoted by X, where x_ip is the information about respondent i for variable p, p = 1, ..., P. In some surveys, respondents are also asked how they feel about members of group k using questions of the form "what is your level of respect towards group k?" The exact phrasing of the question can vary, but the answers are denoted by Z, with entries z_ik for respondent i and group k. The key distinction between X and Z is that X is a respondent-only feature while Z contains information about the interaction between the respondents and the subpopulations. All columns of X and Z are centered to have mean zero.
3.1 The Correlated NSUM Model
Our correlated NSUM model is written as

(2)    y_ik ~ Poisson( exp( delta_i + rho_k + x_i' alpha + w_ik' beta_k + b_ik ) ),

where, after scaling, delta_i represents the degree of respondent i, rho_k represents the prevalence of group k, x_i' alpha and w_ik' beta_k correct for the demographic and population-specific information (with w_ik collecting the covariates with group-specific effects, including the level of respect z_ik), and b_ik represents the bias between respondent i and group k. We complete the formulation with the priors,

(3)    delta_i ~ N(mu_delta, sigma_delta^2),
       b_i = (b_i1, ..., b_iK)' ~ N( -diag(Sigma)/2, Sigma ),    Sigma = diag(tau) Omega diag(tau),

where Omega is a K × K correlation matrix and tau = (tau_1, ..., tau_K) are scale parameters. This parameterization is such that E[exp(b_ik)] = 1, a property shared by the Gamma prior in the Zheng et al. (2006) overdispersed model. The normal prior on delta_i comes from the evidence by Maltiel et al. (2015) that the distribution of degrees seems to follow a log-normal distribution. The half-Cauchy priors on sigma_delta and the tau_k are recommended by Gelman (2006) to restrict the parameters away from very large values. Note that Sigma and the b_ik are not sampled directly, but are only transformations of the sampled parameters. We separate the biases into the different terms (barrier, transmission, and recall) after all parameters have been estimated, and these details are shown in Supplementary Section 1.
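Centering the normal bias prior at minus half its variance is what makes the multiplicative bias exp(b_ik) mean-one. A quick Monte Carlo sanity check of this property, using an arbitrary illustrative scale tau:

```python
import numpy as np

rng = np.random.default_rng(1)

# Bias prior centered at minus half its variance, so the multiplicative
# bias exp(b) has mean one (the same mean-one property as the Gamma
# multiplier in the Zheng et al. (2006) overdispersed model).
tau = 0.8                                       # illustrative scale
b = rng.normal(-tau**2 / 2, tau, size=1_000_000)
print(np.exp(b).mean())                         # approximately 1
```

The identity behind the check is E[exp(b)] = exp(mu + tau^2/2), which equals 1 exactly when mu = -tau^2/2.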
3.2 Computation
Parameters are estimated using Markov chain Monte Carlo (MCMC). As presented above, the MCMC implementation would have trouble producing unbiased results without prohibitively long sampling chains. The hierarchical form of the model for the bias terms suffers from very inefficient MCMC sampling. Specifically, when the b_ik are all close to one another, tau_k will shrink towards 0. Then, since tau_k is small, the b_ik can only take very small MCMC steps, keeping both the b_ik close to one another and tau_k close to 0. To break this dependence between the b_ik and tau_k, the model can be reparameterized through parameter expansion, which allows the size of the residuals to move independently of the variance parameters (Van Dyk and Meng, 2001; Liu, 2003; Gelman, 2004; Gelman et al., 2008). Expanding our model, we reparameterize the bias as

(4)    b_i = -diag(Sigma)/2 + diag(tau) L xi_i,    xi_ik ~ N(0, 1) independently,

where L represents the lower-triangular Cholesky decomposition of the correlation matrix Omega. By sampling the xi_i instead of the b_i directly, the values of b_ik can comfortably jump around even when tau_k is small because of the additional error terms xi_ik.
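A minimal numeric sketch of this non-centered construction, assuming the bias covariance factors as diag(tau) Omega diag(tau); the values of tau and Omega below are illustrative, not estimates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative scales and correlation matrix for K = 3 groups.
tau = np.array([0.5, 0.8, 1.2])
Omega = np.array([[1.0, 0.4, 0.2],
                  [0.4, 1.0, 0.6],
                  [0.2, 0.6, 1.0]])
L = np.linalg.cholesky(Omega)          # lower-triangular Cholesky factor

# Non-centered draw: sample independent standard normals xi_i, then map
# them to a correlated bias vector b_i.
xi = rng.standard_normal(3)
b = -(tau**2) / 2 + tau * (L @ xi)

# The implied covariance of b_i is diag(tau) @ Omega @ diag(tau).
Sigma = np.diag(tau) @ Omega @ np.diag(tau)
print(b, Sigma)
```

Because the sampler moves the unconstrained xi_i rather than b_i, step sizes no longer collapse when the tau_k shrink, which is the point of the parameter expansion.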
4 Diagnostics
Diagnostics to evaluate how well the assumed model fits the data have been largely underdeveloped for models involving ARD. Leave-one-out (LOO) subpopulation size estimates have been considered in almost all NSUM literature. In addition to LOO, Zheng et al. (2006) used standardized residuals from their model (the observed responses minus their expected values, scaled by the expected standard deviations) to study the residual correlation between subpopulations and the relationship between the residuals and the individual-level predictors. McCormick and Zheng (2012) and Maltiel et al. (2015) also used the absolute relative error of the estimates to evaluate the performance of models under different assumptions, although this line of thinking has limited applicability to real data because the true N_k is unknown for the hard-to-reach populations. In this section, we review existing informative diagnostics and propose several new diagnostics to evaluate goodness of fit. We apply the diagnostics to the real data analysis in Section 5.
4.1 Leave-one-out
A leaveoneout procedure can be implemented to evaluate the subpopulation size estimates. For models where the known subpopulation sizes are fixed in the estimation procedure (e.g. the Killworth MLE estimator), the known sizes are assumed to be unknown one at a time and estimated using the remaining known subpopulations. Thus, the model needs to be refit for each known subpopulation. For models in which all subpopulation sizes are estimands (e.g. the overdispersed model and our correlated model), the LOO procedure does not require reestimating the parameters. Instead, since the subpopulation sizes are scaled using known sizes, the known size of the left out subpopulation is excluded when scaling the size estimates. The estimates are then compared to the truth, either visually or using some metric.
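The LOO rescaling step for models where all sizes are estimands can be sketched as follows; the prevalence values, known sizes, total population, and the particular rescaling rule (matching the summed prevalence of the retained known groups to their summed true proportion) are all illustrative assumptions, not the exact procedure of any specific model:

```python
import numpy as np

# Illustrative unscaled prevalence parameters and true known sizes.
prevalence = np.array([0.011, 0.019, 0.052, 0.008])    # unscaled size estimates
known = np.array([110_000, 200_000, 500_000, 80_000])  # true sizes of known groups
N = 46_000_000                                         # illustrative total population

def loo_estimate(k):
    """Size estimate for group k, scaled using every known group except k."""
    keep = np.arange(len(known)) != k
    # Match the summed prevalence of the retained known groups to their
    # summed true proportion; no refitting is required.
    scale = (known[keep].sum() / N) / prevalence[keep].sum()
    return prevalence[k] * scale * N

loo = [round(loo_estimate(k)) for k in range(len(known))]
print(loo)   # each estimate is close to the corresponding true known size
```

Comparing each loo_estimate(k) to known[k], visually or via a metric such as absolute relative error, is then the diagnostic itself.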
4.2 Posterior predictive checks
Posterior predictive checks are a staple of Bayesian model diagnostics, but have remained largely unexplored in models for ARD. Zheng et al. (2006) used posterior predictive checks to evaluate how accurately their model estimated the proportion of responses equal to each non-negative integer. They concluded that their model underestimated the proportion of respondents who know exactly one person in each group of the McCarty data set, and that the data were "heaped" around nice numbers, like 10. We develop the use of posterior predictive checks, present several more informative checks for ARD models, and recommend their use in evaluating model fit in future studies.
Probability mass function: First, we recommend replicating the probability mass function diagnostic from Zheng et al. (2006). We especially recommend looking at the proportions of responses equal to 0 and to 1, as these values don't typically suffer from rounding, but can inform the user about unexpected behavior in the data (e.g. fewer people know 0 doctors than expected from the number of doctors because only respondents who don't have a family doctor will report knowing 0 doctors).
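A sketch of this check for a single group, with a toy Poisson model standing in for a fitted NSUM model; the observed column and the posterior draws of the rate are simulated purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy stand-ins: an observed ARD column and posterior draws of a Poisson
# rate (neither comes from a real fitted model).
y_obs = rng.poisson(1.2, size=2_000)
lam_draws = rng.gamma(shape=120.0, scale=0.01, size=500)   # draws with mean 1.2

prop0, prop1 = [], []
for lam in lam_draws:
    y_rep = rng.poisson(lam, size=y_obs.size)   # one posterior predictive replicate
    prop0.append(np.mean(y_rep == 0))
    prop1.append(np.mean(y_rep == 1))

# Compare the observed proportions against 95% posterior predictive intervals.
print(np.mean(y_obs == 0), np.quantile(prop0, [0.025, 0.975]))
print(np.mean(y_obs == 1), np.quantile(prop1, [0.025, 0.975]))
```

An observed proportion falling well outside its predictive interval flags the kind of misfit (excess zeros, heaping) discussed above.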
Conditionally-positive mean and standard deviation: Second, we recommend plotting the mean of the positive responses against the standard deviation of the positive responses, grouped by each subpopulation (i.e. the columns of Y). It is natural to look at the mean and standard deviation of the responses for each group, but posterior predictive checks are most informative when they capture features of the data not directly addressed by the model. Thus, we instead choose the mean and standard deviation conditional on the data or the posterior replicate being positive, since NSUM models do not estimate these parameters directly. The mean and standard deviation for subpopulation k are calculated by

    m_k = (1 / n_k) * sum_{i : y_ik > 0} y_ik,
    s_k = sqrt( (1 / (n_k - 1)) * sum_{i : y_ik > 0} (y_ik - m_k)^2 ),

where n_k is the number of positive responses in group k, i.e. n_k = sum_i 1(y_ik > 0).
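In code, the conditionally-positive summaries for one column of Y are straightforward; the sample responses below are illustrative:

```python
import numpy as np

def positive_mean_sd(y_k):
    """Conditionally-positive mean and standard deviation for one column."""
    pos = y_k[y_k > 0]                   # keep only the n_k positive responses
    return pos.mean(), pos.std(ddof=1)   # ddof=1 matches the n_k - 1 divisor

y_k = np.array([0, 0, 3, 1, 0, 7, 2, 0, 1, 5])   # illustrative responses
m_k, s_k = positive_mean_sd(y_k)
print(m_k, s_k)
```

The same function is applied to each posterior predictive replicate, and the cloud of replicated (m_k, s_k) pairs is plotted against the observed pair.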
Correlation: Posterior predictive checks can also be used to measure the ability to capture the correlation between a respondent's responses across subpopulations. For each posterior sample, the sample correlation is calculated, producing K(K - 1)/2 pairwise correlations between the subpopulations. The distribution of pairwise correlations across the posterior samples is then compared to the pairwise correlations of the original Ukraine data. For two given subpopulations j and k, the pairwise correlation is calculated by

    r_jk = sum_i (y_ij - ybar_j)(y_ik - ybar_k) / sqrt( sum_i (y_ij - ybar_j)^2 * sum_i (y_ik - ybar_k)^2 ),

where ybar_j is the mean response for subpopulation j.
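A sketch of the resulting posterior predictive p-value for one pair of subpopulations, with toy data and toy replicates standing in for a fitted model:

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy observed data for one pair of subpopulations; a real application
# would draw the replicates from the fitted model instead.
y_obs = rng.poisson(2.0, size=(1_000, 2))
r_obs = np.corrcoef(y_obs[:, 0], y_obs[:, 1])[0, 1]

r_rep = [
    np.corrcoef(*rng.poisson(2.0, size=(1_000, 2)).T)[0, 1]
    for _ in range(400)                 # one replicate per posterior draw
]

# Posterior predictive p-value: share of replicated correlations above the
# observed one; values clustered near 0 or 1 across pairs indicate misfit.
p_val = np.mean(np.array(r_rep) > r_obs)
print(p_val)
```

Repeating this over all pairs gives the collection of p-values examined in Section 5.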
4.3 Surrogate residuals
Given their broader use, we consider surrogate residuals separately. Residuals for discrete data often have limited use because of the unusual visual appearance of the residuals and the lack of any theoretical distribution for the residuals under the correctly specified model. Liu and Zhang (2018) introduced surrogate residuals, which solve these problems by generating continuous residuals that follow a known distribution under the null, allowing the user to interpret the residuals in a familiar way. For ARD models, surrogate residuals are most useful because they can be used to detect anomalies between the observed data and the assumed distribution, inform variable selection, and diagnose the relationship between the response and the covariates. While our model is not in the class of cumulative link regression models that was the primary focus of Liu and Zhang (2018), the authors provided two methods to calculate surrogate residuals for general models. Specifically, the authors showed under their jittering strategy (B) that, given an outcome y, jittered surrogate variables S, and residuals R = S - E[S], the residuals satisfy E[R] = 0 and R ~ Uniform(-1/2, 1/2) under the null hypothesis that the model is correctly specified.

Calculating the surrogate residuals in our setting is not obvious. Liu and Zhang (2018) proposed the surrogate residuals under a frequentist paradigm, while our model is inherently Bayesian. To the best of our knowledge, surrogate residuals have only been applied to one Bayesian model (Park et al., 2021). In that case the model was ordinal, and the authors provided little detail on how to calculate the residuals. Thus, we introduce our approach for calculating surrogate residuals under the Bayesian paradigm. While the formulation and theorem are stated simply, actually calculating the residuals is difficult since E[S] is typically analytically intractable. However, it is straightforward to estimate using Monte Carlo approximations. In Algorithm 1 we outline our approach for calculating the residuals, which relies on samples from the posterior predictive distribution. Note that the algorithm is given for a single observation and the process should be repeated for all combinations of i and k.
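For a Poisson response with a given rate, the jittered surrogate can be computed directly from the model CDF via S = F(y - 1) + U * [F(y) - F(y - 1)] with U ~ Uniform(0, 1), a randomized-PIT form of the jittering. The sketch below uses a single fixed rate for illustration, whereas the Bayesian version repeats this for each posterior draw of the rate:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy single-column example with a known Poisson rate.
lam = 2.5
y = rng.poisson(lam, size=5_000)            # observed responses

# Poisson CDF table up to the largest observed count.
kmax = int(y.max())
pmf = np.exp(-lam) * np.cumprod(np.concatenate(([1.0], lam / np.arange(1, kmax + 1))))
cdf = np.cumsum(pmf)

def F(k):
    """Poisson CDF, with F(k) = 0 for k < 0."""
    k = np.asarray(k)
    return np.where(k < 0, 0.0, cdf[np.clip(k, 0, kmax)])

u = rng.uniform(size=y.size)                # jittering draws
s = F(y - 1) + u * (F(y) - F(y - 1))        # surrogate, Uniform(0, 1) under the model
r = s - 0.5                                 # surrogate residual, Uniform(-1/2, 1/2)
print(r.mean())                             # near 0 when the model is correct
```

Plotting r against a covariate (or a QQ plot against the uniform distribution) then works exactly as for continuous-response residuals.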
In our case, the relevant distribution function is the CDF of the Poisson distribution, evaluated at the simulated responses for the surrogate variables and at the observed responses for the residuals. Note that under the Bayesian paradigm, for each observation y_ik there is actually a vector of surrogate residuals, one per posterior sample. Thus, we can study the posterior distribution of the surrogate residuals, and we recommend looking at the behavior of the surrogate residuals across multiple posterior samples. Alternatively, we could use a point estimate of the parameters (e.g. the posterior mean) to calculate a single vector of surrogate residuals. This approach is more similar to the original frequentist approach, but doesn't capture the uncertainty of the parameter estimates.

5 Ukraine ARD Results
We fit our correlated NSUM model to the Ukraine data and compare it to the Zheng et al. (2006) overdispersed model. We used RStan to implement both models (Stan Development Team, 2019). Only one group was name-related, so we chose to include all known groups in the scaling term. The R code for the Maltiel et al. (2015) model provided in the Maltiel and Baraff (2015) R package and the JAGS code for the Teo et al. (2019) model were prohibitively slow on the large Ukraine ARD, so results from these two models are not reported here.
5.1 Model performance
Leave-one-out: First, we show the accuracy and validity of our subpopulation size estimates using LOO. The estimates with corresponding 95% credible intervals are shown in Figure 2. Overall, our correlated model is typically closer to the true sizes, and among the 7 cases in which our point estimates fall outside the intervals of the overdispersed model, 5 of them are closer to the truth. The correlated model produces larger subpopulation size credible intervals, consistent with the increased complexity of the model. This is desirable since NSUM models have historically underestimated the variability in the data, placing too much trust in the resulting size estimates. Additionally, both models produce smaller credible intervals for the larger subpopulations, where the responses contain more information.

Many authors have noted that models typically overestimate large subpopulations and underestimate small subpopulations (Zheng et al., 2006; McCormick et al., 2007; Maltiel et al., 2015). For the correlated and overdispersed models with the Ukraine data set, this does not seem to be the case.
Conditionally-positive mean and standard deviation: Next, we consider the joint distribution of the positive mean and positive standard deviation for in-sample data. The results are shown in Supplementary Figure 1. First, the overdispersed model typically has larger uncertainty. While the correlated model does not perfectly capture the conditional means and standard deviations for all subpopulations, it displays a significant improvement over the overdispersed model. The overdispersed model typically captures the mean of the data but underestimates the standard deviations, while the correlated model sometimes underestimates the mean but captures the standard deviation.
Probability mass function: For the probability mass function, we consider 9 different probabilities in Supplementary Figure 2, where the true proportions are plotted against the simulated proportions with 95% credible intervals included. We denote the overdispersed model using empty shapes, the correlated model using filled shapes, the known subpopulations as circles, and the unknown subpopulations as triangles. We find that for small responses, both models perform similarly across all subpopulations. The correlated model consistently estimates fewer zeros than observed, while the overdispersed model estimates more zeros than observed in the data set. We also observe the heaping pointed out by Zheng et al. (2006), especially at round numbers. Perhaps surprisingly, some of the heaped values are estimated fairly accurately by our correlated model, while others are estimated poorly by both models. Our correlated model also does a better job of capturing larger responses. Overall, the correlated model again outperforms the overdispersed model.
Correlation: While we do include correlated random effects in our model, we do not explicitly model the correlation of the responses. Thus, we still consider the posterior predictive p-values of the correlation of the responses, as shown in Supplementary Figure 3, with the p-values in the left plot corresponding to the overdispersed model and those in the right plot corresponding to the correlated model. Under the true model, we typically expect the posterior predictive p-values to be concentrated near 0.5. It is clear from the clustering of the overdispersed p-values around 0 and 1 that the model does not produce responses that share the same correlation as the observed data. On the other hand, the clustering around 0.5 for our correlated model indicates that the correlated random effects accurately capture the correlation between subpopulations in the data set.
Surrogate residuals: Finally, we review some results from the surrogate residuals. While the surrogate residuals can be used in many ways, in this section we show their ability to detect missing covariates, as is common in ordinary linear regression. Figure 3 shows one realization of the surrogate residuals, corresponding to a single randomly chosen posterior sample, against the standardized age covariate for the subpopulations men aged 20-30 and IDUs. Since there is a vector of surrogate residuals for each posterior sample, loess curves for 25 random samples are added to visualize the variability in the residuals across samples. In general, there is very little variability across the posterior samples, except in areas with very few observations (respondents aged older than 80). For men aged 20-30, the overdispersed model residuals are highly correlated with age, resulting in a quadratic loess curve. On the other hand, the residuals for the correlated model are mostly uncorrelated with age, indicating that the regression structure in the correlated model accurately captures the relationship between age and the responses. There is no indication that higher-order terms are needed.

For IDUs, the relationship between the residuals and age is much weaker, and even the loess curve for the overdispersed model is relatively flat with a slight downward trend. For the correlated model the loess curve is almost completely flat. In this case, age seems to play a larger role in the responses for men aged 20-30 than it does for IDUs, which is expected. Not shown here is a QQ plot of the surrogate residuals against the uniform distribution, where both the overdispersed and correlated models perform well, although the correlated model produces slightly better QQ plots.
5.2 Results
Our model performs well with respect to all considered diagnostics, demonstrating its reliability. We first show the size estimates for the hidden populations and corresponding 95% credible intervals in Table 1. We notice that the size estimates from our correlated model are considerably larger than the estimates from the overdispersed model, but are very close to the adjusted estimates from Paniotto et al. (2009), which weighted observations based on the level of respect responses. While there is no gold standard, we believe the estimates from our correlated model are much closer to the truth based on the inclusion of additional information and the performance of the models in the previous diagnostics.
The correlated model credible intervals are also much wider than the other two models. This is a desired outcome, since simpler models often lead organizations to put too much confidence in size estimates, while the uncertainty around hardtoreach population sizes is typically very large.
Subpopulation  Overdispersed  Correlated  Ukraine Adjusted
FSW
MSW
MSM
IDU
Next, we consider the covariate parameter estimates, alpha and beta. Table 2 includes the group parameters corresponding to age, age^2, and level of respect. Overall, the parameter estimates are consistent with the expected results. For example, older respondents are more likely to know people 70 or older, and younger people are more likely to know kids and people aged 20 to 30. Similarly, older respondents know more people who died in 2007. For the hard-to-reach populations, younger respondents are more likely to know people in all unknown groups.
Subpopulation  Age  Age^2  Level of Respect

Men 20-30  0.38*  0.19*  0.05*
Men 70+  0.22*  0.05*  0.04* 
Female 20-30  0.38*  0.16*  0.08*
Female 70+  0.25*  0.01  0.05* 
Kids  0.31*  0.00  0.12* 
Invalids  0.28*  0.08*  0.03 
Died 2007  0.47*  0.06*  NA 
Pavlo  0.18*  0.00  0.18* 
Prisoners 2007  0.27*  0.26*  0.21* 
Divorced Men 2007  0.24*  0.34*  0.02 
Birth 2007  0.17*  0.12*  0.16* 
FSW  0.74*  0.07  0.26* 
MSW  0.86*  0.12  0.02 
MSM  1.05*  0.28  0.62* 
IDU  0.60*  0.31*  0.09 
Regarding level of respect, all significant parameters are positive, consistent with the belief that respondents with a more positive perception of a subpopulation will tend to know more people from the subpopulation. The two largest significant parameters for known groups are for prisoners and kids, which are perhaps more likely to have significant barrier effects. The parameters for FSW and MSM are the largest across all groups, which is also unsurprising.
We also report the estimates for the non-group-level parameters (gender, education, nationality, profession, and access to the internet). The parameter estimate corresponding to male was less than 0.01, 0.20 for Ukrainian nationality, 0.13 for employed, 0.09 for access to the internet, 0.17 for secondary education, and 0.19 for vocational education, with a baseline of candidate of sciences or doctor of sciences. All parameters except for male were significant at the 0.05 level. While previous studies have found that men have larger network sizes than women, for the populations in this study there is not a significant difference between the number of people reported by male and female respondents. The other global covariate estimates are consistent with previous findings and are expected. The network literature has shown that employed and educated individuals typically have larger network sizes. While we are not aware of literature that studies how access to the internet affects network sizes, our finding is not surprising since internet access typically means the individual can reach a broader range of contacts, for example through email.
Finally, we can examine the estimated correlations, the novel feature of our model. The correlation matrix is shown in Figure 4 and is sorted using a hierarchical clustering algorithm. Our model produced many expected correlations: for example, respondents who knew more men aged 70+ also knew more women aged 70+, and respondents who know more men and women aged 20-30 are less likely to know people who died in 2007 than respondents who know more men and women aged 70+. The results also highlight some less obvious relationships. For instance, respondents who know more men named Pavlo are less likely to know young men, and even less likely to know young women. Without census information on the birth records of men named Pavlo in Ukraine, we conjecture that the name Pavlo is more common among older men.
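The hierarchical-clustering ordering used to sort the correlation matrix can be sketched as follows. This is a minimal illustration with scipy; the function name and the toy correlation matrix are ours, not from the survey analysis:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list

def cluster_order(corr):
    """Return an ordering of variables that places highly
    correlated variables next to each other."""
    # Convert correlation to a dissimilarity: identical variables -> 0.
    dist = 1.0 - corr
    # Condensed upper-triangular distance vector expected by scipy.
    iu = np.triu_indices_from(dist, k=1)
    condensed = dist[iu]
    # Average-linkage hierarchical clustering of the variables.
    Z = linkage(condensed, method="average")
    # Leaf order of the dendrogram gives the sorted display order.
    return leaves_list(Z)

# Toy 4x4 correlation matrix: variables 0 & 2 and 1 & 3 are similar.
corr = np.array([
    [1.0, 0.1, 0.9, 0.0],
    [0.1, 1.0, 0.2, 0.8],
    [0.9, 0.2, 1.0, 0.1],
    [0.0, 0.8, 0.1, 1.0],
])
order = cluster_order(corr)
sorted_corr = corr[np.ix_(order, order)]  # reordered matrix for plotting
```

In the reordered matrix, the two highly correlated pairs end up adjacent, which is what makes block structure visible in a heatmap like Figure 4.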
Consistent with the residual analysis in Zheng et al. (2006) of the McCarty et al. (2001) data, which found that residuals for groups associated with negative events were highly correlated, the key populations of the Ukraine survey were all highly correlated with one another, with prisoners, and, to a lesser extent, with divorced men. Also consistent with the finding in Teo et al. (2019) that younger participants knew more IDU and MSM than older participants, even after accounting for age in our model we still find that men and women aged 20-30 are more positively correlated with IDU and MSM than men and women aged 70+, who are negatively correlated with the hard-to-reach populations.
It is important to note that our correlation estimates correspond to the correlation of the reported numbers of connections. Therefore, some of the correlation between the hard-to-reach populations may reflect a respondent's willingness to answer questions truthfully. That is, if respondents are unwilling to divulge how many FSW they know, they may also be unwilling to answer honestly about IDU, MSW, and MSM, leading to two groups of respondents, those willing to report knowing members of hard-to-reach populations and those unwilling, potentially inflating the observed correlation.
6 Model Misspecification
It is important to understand how violations of the correlated model's assumptions affect parameter estimates, specifically the subpopulation size estimates. In practice, the number of respondents in ARD surveys varies widely, from around 200 to over 10,000, so we study the performance of the model at several sample sizes. The model is examined both through simulation at small and large sample sizes (below) and through asymptotic theory using numerical approaches (Supplementary Section 4). In the simulation study, we generate data from five models with varying degrees of misspecification, which allows us to quantify the effect of each model assumption, i.e. each set of covariates and the correlation structure.
For our simulations, we created data sets from the correlated model in Equation (2) using the covariates from the Ukraine data set, with the parameters and all hyperparameters fixed at the posterior means of the MCMC chains from the real data analysis. For each simulated data set, new network sizes and biases were sampled from their prior distributions given the estimated hyperparameters. The covariates X and Z were sampled with replacement from the observed Ukraine covariates. We performed the simulation at five respondent sample sizes: 100, 300, 1000, 3000, and 10,000. We fit five different models, summarized in Table 3, where Model 5 is the data-generating model. The results are shown in Figure 5 for four known groups and the four unknown groups. In all cases, Model 5 (the true model) converges to the true value as the sample size increases. For Model 4, the effect of omitting the covariates X on the size estimate is directly proportional to how large the corresponding covariate effects are: for men aged 20-30 the effects are large and thus influence the estimate more than for the unknown subpopulations, where they are small. The same trend holds for Model 3 and the covariates Z. In all cases, there is a much weaker relationship between the responses and Z than between the responses and X, suggesting that, for the Ukraine data set, excluding Z does not have a large effect on the subpopulation size estimates. It is thus important to phrase the level of respect questions well so that Z contains more information, a point we expand on in Section 7.
For Model 2, assuming no correlation when correlation is present also biases the results. The difference between Model 2 and Model 5 is clearest for the larger and highly correlated populations, e.g. men aged 20-30 and women aged 20-30. In both cases, assuming no correlation leads to underestimating the prevalence. The bias from the missing correlation structure diminishes, but does not disappear entirely, as the number of respondents increases to 10,000.
Table 3: Models fit in the simulation study.

Model  X?  Z?  Correlated?
Model 1  No  No  No
Model 2  Yes  Yes  No
Model 3  Yes  No  Yes
Model 4  No  Yes  Yes
Model 5  Yes  Yes  Yes
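The data-generating step of this simulation can be sketched as follows. This is a simplified stand-in for the correlated model of Equation (2): correlated respondent-by-group effects feed a Poisson count, and all parameter values here are illustrative, not those estimated from the Ukraine data:

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 1000, 4                   # respondents, subpopulations
prevalence = np.array([0.05, 0.02, 0.01, 0.001])   # N_k / N (illustrative)
log_degree = rng.normal(np.log(150), 0.5, size=n)  # latent network sizes d_i

# Correlated respondent-by-group effects delta_ik ~ MVN(0, Sigma),
# with a common pairwise correlation rho between groups.
rho = 0.5
Sigma = 0.3 * ((1 - rho) * np.eye(K) + rho * np.ones((K, K)))
delta = rng.multivariate_normal(np.zeros(K), Sigma, size=n)

# ARD counts: y_ik ~ Poisson(d_i * prevalence_k * exp(delta_ik)).
mu = np.exp(log_degree[:, None] + np.log(prevalence)[None, :] + delta)
y = rng.poisson(mu)

# Naive scale-up style prevalence estimate ignoring the random effects.
naive = y.sum(axis=0) / np.exp(log_degree).sum()
```

Fitting models with and without the correlation (and with or without covariates) to data generated this way is what allows each assumption's contribution to the bias to be isolated.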
7 Practical Advice
In this section, we offer guidance on how to better collect and analyze ARD.
Level of respect: The first suggestion is to phrase “level of respect” questions to maximize the correlation between the level of respect and the ARD. In Ukraine, the question was phrased as, “What level of respect do the following groups have in Ukraine…”. This phrasing focuses on how people in Ukraine feel about the groups rather than how the respondent feels, leading to a weaker relationship between the level of respect and the ARD responses. In contrast, Teo et al. (2019) measured each respondent's own social acceptability rating of the hard-to-reach populations, resulting in a closer connection between the ARD and the level of respect. For the three hard-to-reach groups shared across the studies (FSW, MSM, and IDU), the pairwise correlations in Ukraine between the number of people a respondent reports knowing and their level of respect for the group are 0.038, 0.031, and 0.020, respectively, while the corresponding correlations in Singapore are 0.040, 0.129, and 0.131. Thus, while level of respect does not appear strongly correlated with the number of FSW respondents know in either sample, it does seem to be a relatively strong predictor for MSM and IDU in the Singapore data. While there may not always be a connection between ARD and level of respect questions, questions should be phrased to detect as much correlation as possible.
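The pairwise correlations quoted here are ordinary Pearson correlations between each group's ARD count and the matching respect rating. A minimal sketch with synthetic data (the function name and variables are ours):

```python
import numpy as np

def ard_respect_correlations(counts, respect):
    """Pearson correlation between the number of members of each
    group a respondent reports knowing (counts, n x K) and the
    respondent's respect rating for that group (respect, n x K)."""
    K = counts.shape[1]
    return np.array([
        np.corrcoef(counts[:, k], respect[:, k])[0, 1] for k in range(K)
    ])

# Synthetic data for three groups (e.g. FSW, MSM, IDU): ratings on a
# 1-5 scale and counts whose mean increases with the rating.
rng = np.random.default_rng(1)
respect = rng.integers(1, 6, size=(500, 3)).astype(float)
counts = rng.poisson(1.0 + 0.3 * respect)
cors = ard_respect_correlations(counts, respect)
```

A survey phrasing that elicits the respondent's own attitude (rather than a perceived national attitude) should push these correlations upward, which is the point of the recommendation.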
Similarly, it is important to collect the level of respect questions for all subpopulations. While the Teo et al. (2019) level of respect responses are more correlated with the ARD than those in the Ukraine data set, the authors collected them only for the hard-to-reach populations, resulting in a loss of information. In our analysis, we were able to account for the level of respect for all subpopulations, further improving the results.
Covariates of respondents: Second, it is important to ask respondents about themselves. We have shown through both the simulation study and the numerical analysis that ignoring important covariates can bias subpopulation size estimates. It is impossible to account for the covariate effect for ARD surveys that do not collect such information. Additionally, it is difficult to tell prior to collection which covariates are important, so survey questions should capture multiple respondent properties, including age, social behavior, and wealth.
Inclusion of similar subpopulations: Finally, we recommend including more known groups that face stigma similar to that of the hard-to-reach populations. Zheng et al. (2006) found that other populations associated with negative experiences (e.g. prisoners, homicide victims, rape victims, people who have committed suicide, and people who were in auto accidents) were also highly correlated. While including correlated groups may improve the size estimates in our correlated model, understanding how connected different subpopulations are is important in its own right: if groups are identified as highly correlated with hard-to-reach populations, future researchers can better understand the social networks of their members, making future survey sampling easier and more efficient.
8 Discussion
Aggregated relational data (ARD) is an extremely useful tool not only for population size estimation but also for learning about properties of social networks. Many models have been developed to better capture the behavior of the data and account for the different sources of bias. One major limitation of these models is that their uncertainty estimates are far too small, making researchers overconfident in their estimates. We improve upon these models by incorporating covariate and subpopulation coefficients in an easy-to-understand manner, addressing the empirical correlation between subpopulations, and providing a better fit and more reasonable uncertainty estimates for model parameters. Another benefit of our model is that we make very few assumptions about the biases, allowing the data to drive the parameter estimates. The proposed model diagnostics are also useful in other settings where there is no ground truth for comparison. Satisfactory diagnostic results will increase the credibility of new NSUM estimates and make them more acceptable to decision makers.
From the Ukraine ARD, we have identified the relationship between several predictors and the number of people known by respondents in both known and unknown groups, and produced reliable size estimates and associated uncertainty intervals for the hard-to-reach populations associated with HIV prevalence. Our estimates align with the adjusted estimates of Paniotto et al. (2009) and can be used to inform government HIV prevention policy. Of the four key populations we considered, we estimated that there are nearly four times as many injection drug users as there are FSW, MSW, and MSM. This is consistent with other studies estimating that IDUs and their sexual partners make up 64% of people living with HIV in Ukraine (Des Jarlais et al., 2009). Combined with estimates of HIV prevalence among key populations, our size estimates illustrate the severity of the HIV epidemic in Ukraine.
Our analysis hints at, but does not explicitly model, the increased risk for individuals who belong to more than one key population. Our correlated model has shown that respondents who report knowing more people in one hidden population are more likely to know people in the other hidden populations. World Health Organization and others (2011) estimated that HIV prevalence among female sex workers who inject drugs is around 43%, compared with only around 8.5% among female sex workers who do not inject drugs. Given these relationships, it is not sufficient to understand only the behavior of these key populations as a whole; it is also necessary to better understand the relationships between populations in order to effectively lower new HIV infections.
We believe that future ARD models should head in two directions. The first is to better exploit the relationships between populations, as illustrated by both our estimated correlation matrix and the covariate effects. It is clear that some individuals are closer to the key populations than others, either because of their age, gender, or similar characteristics, or because of their existing social networks. We would be able to better understand the properties and behavior of the key populations if we could survey respondents who are more familiar with the populations of interest.
The second direction is to connect ARD to more traditional network analysis, an area a few researchers have already started exploring. Breza et al. (2017) showed when ARD is sufficient to estimate node- or graph-level statistics, removing the need for expensive complete network data, and Breza et al. (2019) extended the theory and provided justification for using ARD in place of complete networks. If future research can use ARD instead of complete networks, the cost of studies will be greatly reduced, freeing resources for other work.
References
 Estimating the size of an average personal network and of an event subpopulation: some empirical results. Social Science Research 20 (2), pp. 109–121. Cited by: §1.
 Estimating the size of an average personal network and of an event subpopulation. In The Small World, pp. 159–175. Cited by: §1, §1.
 Using aggregated relational data to feasibly identify network structure without network data. Technical report National Bureau of Economic Research. Cited by: §8.
 Consistently estimating graph statistics using aggregated relational data. arXiv preprint arXiv:1908.09881. Cited by: §8.
 HIV among injecting drug users: current epidemiology, biologic markers, respondent-driven sampling, and supervised-injection facilities. Current Opinion in HIV and AIDS 4 (4), pp. 308. Cited by: §8.
 HIV/AIDS surveillance in Europe 2017 – 2016 data. ECDC. Cited by: §2.
 Using redundant parameterizations to fit hierarchical models. Journal of Computational and Graphical Statistics 17 (1), pp. 95–122. Cited by: §3.2.
 Parameterization and Bayesian modeling. Journal of the American Statistical Association 99 (466), pp. 537–545. Cited by: §3.2.
 Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis 1 (3), pp. 515–534. Cited by: §3.1.
 A social network approach to estimating seroprevalence in the United States. Social Networks 20 (1), pp. 23–50. Cited by: §1.
 Estimation of seroprevalence, rape, and homelessness in the United States using a social network approach. Evaluation Review 22 (2), pp. 289–308. Cited by: §1.
 Thirty years of the network scale-up method. Journal of the American Statistical Association 116 (535), pp. 1548–1559. Cited by: §1.
 Alternating subspace-spanning resampling to accelerate Markov chain Monte Carlo simulation. Journal of the American Statistical Association 98 (461), pp. 110–117. Cited by: §3.2.
 Residuals and diagnostics for ordinal regression models: a surrogate approach. Journal of the American Statistical Association 113 (522), pp. 845–854. Cited by: §4.3, §4.3.
 NSUM: network scale up method. Note: R package version 1.0 External Links: Link Cited by: §5.
 Estimating population size using the network scale up method. The Annals of Applied Statistics 9 (3), pp. 1247. Cited by: §1, §3.1, §4, §5.1, §5.
 Comparing two methods for estimating network size. Human Organization 60 (1), pp. 28–39. Cited by: §5.2.
 Adjusting for recall bias in “How many X’s do you know?” surveys. In Proceedings of the Joint Statistical Meetings, Cited by: §5.1.
 Latent demographic profile estimation in hard-to-reach groups. The Annals of Applied Statistics 6 (4), pp. 1795. Cited by: §4.
 Estimating the size of populations with high risk for HIV using the network scale-up method. Ukraine: Kiev International Institute of Sociology. Cited by: §1, §2, §2, §5.2, §8.

 Bayesian approach to multivariate component-based logistic regression: analyzing correlated multivariate ordinal data. Multivariate Behavioral Research, pp. 1–18. Cited by: §4.3.
 Assessing network scale-up estimates for groups most at risk of HIV/AIDS: evidence from a multiple-method study of heavy drug users in Curitiba, Brazil. American Journal of Epidemiology 174 (10), pp. 1190–1196. Cited by: §1.
 RStan: the R interface to Stan. Note: R package version 2.19.2 External Links: Link Cited by: §5.
 Estimating the size of key populations for HIV in Singapore using the network scale-up method. Sexually Transmitted Infections 95 (8), pp. 602–607. Cited by: §1, §5.2, §5, §7, §7.
 Guidelines on estimating the size of populations most at risk to HIV. Geneva, Switzerland: World Health Organization, pp. 51. Cited by: §1.
 The art of data augmentation. Journal of Computational and Graphical Statistics 10 (1), pp. 1–50. Cited by: §3.2.
 Global HIV/AIDS response: epidemic update and health sector progress towards universal access: progress report 2011. Cited by: §8.
 Consolidated guidelines on HIV prevention, diagnosis, treatment and care for key populations. Cited by: §2.
 How many people do you know in prison? Using overdispersion in count data to estimate social structure in networks. Journal of the American Statistical Association 101 (474), pp. 409–423. Cited by: §1, §1, §3.1, §4.2, §4.2, §4, §5.1, §5.1, §5.2, §5, §7.
Supplementary Section 1 Ukraine Data
Subpopulation  Known Size 

Men aged 20-30  4,088,438
Men aged 15-17  935,153
Men above 70  1,328,606
Women aged 20-30  3,966,956
Women aged 15-17  899,443
Women above 70  2,993,846
Children (boys and girls) aged 10-13  1,869,098
Moldavians  258,619 
Romanians  150,989 
Poles  144,130 
Jews  104,186 
Romany  47,587 
1st group invalids  278,195 
Doctors of any speciality  222,884 
People who died in 2007  762,877 
Men named Pavlo  323,863 
Men who served sentences in places of imprisonment in 2007  108,511 
Men who officially divorced in 2007  178,364 
Women who gave birth to a child in 2007  472,657 
Doctors and Candidates of Science who received a scientific degree in Ukraine over the last 15 years  69,471 
Nurse women, nurse men, aidmen and aidwomen  487,148 
Militiamen  273,200 
Supplementary Section 2 Bias Decomposition
In this section, we show how to decompose our bias estimates into transmission, barrier, and recall effects. While our main focus is on subpopulation size estimates, the magnitude of each individual bias is also an important question. We have found that, without overly strict assumptions on the structure of the biases, estimating each individual bias separately is infeasible, yet imposing incorrect assumptions can negatively affect the other parameter estimates. Therefore, we estimate only an overall bias term that accounts for all sources of bias and decompose it into the individual biases after fitting the model, leaving the other model parameters unaffected by the bias assumptions. This form of the bias gives the model more flexibility in capturing the heterogeneity present in the data.
Our decomposition approach assumes the probability component of the binomial likelihood in the Maltiel et al. (2015) recall error model is equivalent to the mean parameter of our Poisson likelihood. The recall error model is given by:
(1) 
Then, we fit a Bayesian regression model with the posterior means of our probabilities as the response. Specifically, we first calculate the posterior means as

(2)   \bar{p}_{ik} = \frac{1}{S} \sum_{s=1}^{S} p_{ik}^{(s)},

where S is the number of posterior samples and p_{ik}^{(s)} is the s-th posterior draw of the probability for respondent i and group k. Then, using the prior assumptions from Maltiel et al. (2015), we assume the model
(3) 
where certain hyperparameters are fixed from prior information and flat noninformative priors are placed on the remaining hyperparameters. The above priors can easily be adapted to suit new prior information and different assumptions about the structure of the biases.
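The posterior-mean step above is a simple Monte Carlo average over MCMC draws; a sketch with synthetic draws (the array names and the Beta draw used to mimic MCMC output are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

S, n, K = 2000, 100, 4   # posterior draws, respondents, groups
# Synthetic posterior draws of the probabilities p_ik: one (n, K)
# array per draw, standing in for probabilities extracted from MCMC.
p_draws = rng.beta(2.0, 50.0, size=(S, n, K))

# Posterior means: average over the S draws for each (i, k) pair.
p_bar = p_draws.mean(axis=0)

# p_bar then serves as the response in the Bayesian regression that
# separates transmission, barrier, and recall effects.
```

Because the decomposition is run on these posterior means after the main model is fit, the bias assumptions never feed back into the size estimates themselves.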
Supplementary Section 3 Supplementary Figures
Supplementary Section 4 Asymptotic Misspecification
In this section, we implement a theoretical asymptotic analysis to study the influence of global covariates and correlation. We use the pioneering work of White (1982) to numerically calculate the asymptotic bias of the size estimate parameter under these two forms of misspecification. Specifically, given the vector of parameter values \theta_* that satisfies

(4)   \theta_* = \arg\max_{\theta} \, \mathbb{E}_{\theta_0}\!\left[\log f(y \mid \theta)\right],

where the expectation is taken with respect to the true parameter vector \theta_0, the maximum likelihood estimator of the misspecified model converges to \theta_*. Additionally, the estimator is asymptotically normal around \theta_* with a sandwich-form covariance matrix (White, 1982). Using this setup, we can calculate the theoretical asymptotic distribution of the parameters of a misspecified model. Unfortunately, the expectation in Equation (4) is analytically intractable in our case, necessitating a numerical solution. Heagerty and Kurland (2001) faced the same problem for a binomial generalized linear model, and we adopt a similar numerical study for our Poisson likelihood.
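The numerical approach can be illustrated on a deliberately simple case (our own toy example, not the model from the paper): the true model is Poisson with a binary covariate, the misspecified model drops the covariate, and Equation (4) is solved by maximizing the expected log-likelihood numerically:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# True model: y ~ Poisson(exp(b0 + b1 * x)), x = 0 or 1 each w.p. 1/2.
b0, b1 = 0.5, 1.0
means = np.exp(b0 + b1 * np.array([0.0, 1.0]))  # true mean for each x

def neg_expected_loglik(theta):
    """-E[log f(y; theta)] under the misspecified intercept-only
    Poisson model with mean mu = exp(theta). Averaging over x and
    then over y | x gives E[y] log(mu) - mu per x value (the log y!
    term does not depend on theta and is dropped)."""
    mu = np.exp(theta)
    return -np.mean(means * np.log(mu) - mu)

res = minimize_scalar(neg_expected_loglik, bounds=(-5, 5), method="bounded")
theta_star = res.x
# For the Poisson, the pseudo-true mean exp(theta_star) equals the
# average of the true means, so the asymptotic bias of the intercept
# is log(mean of means) - b0.
```

Replacing this scalar optimization with a multi-dimensional one over the Poisson likelihood of the size model, with the expectation evaluated by numerical integration, gives the asymptotic biases reported in the tables below.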
Supplementary Section 4.1 Missing Covariates
First, we study the effect of a missing covariate. In this case the distribution of network sizes and random effects is identical under the two models, but the likelihood for the true model includes a binary covariate effect that the misspecified model excludes. Several parameters were fixed while the remaining parameters, including the random effect variance, were varied together across settings; the covariate takes each of its two values with probability 0.5.
The results under different random effect variances are shown in Table 2. While there is some numerical imprecision in the asymptotic results, it is clear that both the prevalence and standard deviation estimates are asymptotically biased when fitting a misspecified model that ignores important covariates, since the relative errors are many times larger than for the true model. Interestingly, the asymptotic bias of the parameters does not seem to depend on the variance of the random effects.
Model  
True  1  0  1  3  1  3  
Misspecified  4  6  120  89  101  86  
True  0  0  1  0  0  0  
Misspecified  1  6  112  59  84  760  
True  0  0  1  9  1  2  
Misspecified  4  6  120  18  56  50  
True  0  0  1  0  1  0  
Misspecified  4  8  128  36  54  42 
Supplementary Section 4.2 Missing Correlation
Next, we study the effect of excluding the correlation structure. The likelihood and the distribution of network sizes are the same for the true and misspecified models; the difference is that the true model uses a correlated covariance matrix for the random effects, whereas the misspecified model assumes it is diagonal. The remaining parameters were fixed across all simulations, with the correlation structure specified below. The correlations were chosen to roughly mimic the observed correlation in the Ukraine data set, where some populations were moderately correlated while others were essentially uncorrelated with the rest.
Table 3 presents the asymptotic bias of the prevalence and standard deviation parameters when fitting an uncorrelated model while the true model is correlated. We find that ignoring the correlation does not seem to asymptotically bias the prevalence or standard deviation estimates: in all cases of the misspecified model, the relative error is less than 1%. In most cases this error is larger than the error found under the true model, owing to the numerical imprecision of the four-dimensional integral required to calculate the asymptotic bias. We stress that while inference is likely unaffected asymptotically, the simulation study showed that results are affected for a finite number of respondents, and estimating the correlation remains important because it answers the scientific question of how different social groups connect.
Model  
True  73  48  1  753  130  351  
Misspecified  6  32  27  120  50  221  
True  53  121  48  434  112  12  
Misspecified  1  10  8  410  10  84  
True  33  12  112  333  101  145  
Misspecified  1  32  28  27  70  90  
True  21  5  10  212  150  182  
Misspecified  9  42  2  16  145  190 