Network driven sampling; a critical threshold for design effects

05/20/2015
by Karl Rohe, et al.

Web crawling, snowball sampling, and respondent-driven sampling (RDS) are three types of network sampling techniques used to contact individuals in hard-to-reach populations. This paper studies these procedures as a Markov process on the social network that is indexed by a tree. Each node in this tree corresponds to an observation and each edge in the tree corresponds to a referral. Indexing with a tree (instead of a chain) allows each sampled unit to refer multiple future units into the sample. In survey sampling, the design effect characterizes the additional variance induced by a novel sampling strategy. If the design effect is some value DE, then constructing an estimator from the novel design makes the variance of the estimator DE times greater than it would be under a simple random sample with the same sample size n. Under certain assumptions on the referral tree, the design effect of network sampling has a critical threshold that is a function of the referral rate m and the clustering structure in the social network, represented by the second eigenvalue of the Markov transition matrix, λ_2. If m < 1/λ_2^2, then the design effect is finite (i.e., the standard estimator is √(n)-consistent). However, if m > 1/λ_2^2, then the design effect grows with n (i.e., the standard estimator is no longer √(n)-consistent). Past this critical threshold, the standard error of the estimator converges at the slower rate of n^(log_m λ_2). The Markov model allows for nodes to be resampled; computational results show that the findings also hold under without-replacement sampling. To estimate confidence intervals that adapt to the correct level of uncertainty, a novel resampling procedure is proposed. Computational experiments compare this procedure to previous techniques.
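The threshold in the abstract can be made concrete with a small calculation. The sketch below is illustrative only and is not code from the paper: the function names are hypothetical, the absolute value of λ_2 is used as a simplifying assumption so the logarithm is defined, and the boundary case m = 1/λ_2^2, which the abstract does not describe, is deliberately left unhandled.

```python
import math


def critical_referral_rate(lambda_2: float) -> float:
    """Threshold 1 / lambda_2^2 on the referral rate m (hypothetical helper).

    Returns infinity when lambda_2 == 0, since no finite m exceeds the threshold.
    """
    return math.inf if lambda_2 == 0 else 1.0 / lambda_2 ** 2


def se_rate_exponent(m: float, lambda_2: float) -> float:
    """Exponent a such that the standard error scales like n**a.

    Below the threshold the estimator is sqrt(n)-consistent, so a = -1/2.
    Above it, the abstract gives the slower rate n**(log_m lambda_2).
    Assumption: abs(lambda_2) is used so the logarithm is well defined.
    """
    threshold = critical_referral_rate(lambda_2)
    if m < threshold:
        return -0.5
    if m > threshold:
        # Above the threshold m > 1 necessarily (since |lambda_2| < 1),
        # so a logarithm in base m is well defined.
        return math.log(abs(lambda_2), m)
    raise ValueError("m equals the critical threshold; not covered by the abstract")


if __name__ == "__main__":
    # Toy example (an assumption, not from the paper): a symmetric two-block
    # chain that stays in its block with probability p has transition matrix
    # [[p, 1 - p], [1 - p, p]], whose second eigenvalue is 2p - 1.
    p = 0.9
    lambda_2 = 2 * p - 1
    print("critical m:", critical_referral_rate(lambda_2))
    for m in (1.5, 3.0):
        print("m =", m, "-> exponent", se_rate_exponent(m, lambda_2))
```

In this toy setting, a referral rate just below the critical value keeps the usual n^(-1/2) rate, while a larger referral rate gives an exponent between -1/2 and 0, meaning the standard error shrinks more slowly as n grows.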


