# Sampling on Social Networks from a Decision Theory Perspective

Some of the most widely used sampling mechanisms that propagate through a social network are defined in terms of tuning parameters; for instance, Respondent-Driven Sampling (RDS) is specified by the number of seeds and the maximum number of referrals. We are interested in the problem of optimising these tuning parameters with the purpose of improving the inference of a population quantity, where such a quantity is a function of the network and of measurements taken at the nodes. This is done by formulating the problem in terms of Decision Theory. The optimisation procedure for different sampling mechanisms is illustrated via simulations in the fashion of those used for Bayesian clinical trials.

## 1 Introduction

Respondent-driven sampling (RDS) is a widely used sampling mechanism that takes advantage of social network structure. It was proposed by Heckathorn (1997) and is implemented with the aim of sampling from hidden populations, as arise in problems from Epidemiology and Marketing. The distribution of RDS is specified by the sample size and two tuning parameters: the maximum number of referrals per individual and the number of starting points (also known as seeds). If we assume that prior information is available regarding the joint distribution of the social network structure and the responses (i.e., observations at the node level), then it is of interest, from both a methodological and a practical point of view, how such information could be used to calibrate the RDS tuning parameters. Note that, for this question to make sense, it is necessary to specify criteria for evaluating the performance of different sampling mechanisms.

We argue that Decision Theory, and more specifically Lindley’s formulation of Bayesian experimental design (Lindley (1972)), is the appropriate formalism for this task. Decision theory allows us to evaluate a design based on the average quality of the inference; here the ‘quality of the inference’ is encoded in the loss function and the average is taken with respect to the prior predictive distribution.

In this paper, we illustrate the use of Decision Theory in calibrating the tuning parameters of sampling mechanisms on networks. As a by-product, we provide intuition about the usefulness of more general versions of RDS mechanisms.

### 1.1 Related work

Our work is related to the work of Chaloner and Verdinelli (1995), since, like them, we discuss how to implement ideas from Bayesian experimental design to solve applied problems. We adopt the formulation of Bayesian experimental design proposed by Lindley (1972) and the framework for sequential decision making proposed by Bellman (1957). In the context of social network data and, more specifically, the performance of RDS, our work relates to the simulation studies performed by Blitzstein and Nesterko (2012). To incorporate the different sources of uncertainty, we rely on the approach developed by Lunagómez and Airoldi (2016) and on the Markov Chain Monte Carlo approaches developed by Møller et al. (2006) and Andrieu and Roberts (2009).

### 1.2 Contributions

The main contributions of this paper are the following. First, we cast the most common problems that involve comparisons between sampling mechanisms on social networks into a Bayesian decision theory framework, thus providing a principled approach, for example, to calibrate tuning parameters of existing mechanisms and to evaluate new mechanisms. Second, we discuss the process of finding optimal designs when decisions and data appear sequentially; we frame this discussion in terms of the theory of backwards induction. Third, for the proposed approach, the calibration takes into account the model specification (this encompasses the priors), the inference, and the loss function that encodes the criteria or metric needed to evaluate the quality of the inference. Fourth, we propose a new network sampling mechanism and carry out a comparative performance evaluation.

## 2 Problem Set-up

### 2.1 Respondent-Driven Sampling

Let $G$ be a social network with $N$ nodes and assume those nodes are labeled. Let $Y$ denote a vector that has as $i$-th component (denoted by $Y^{(i)}$) a measurement to be taken at the $i$-th node, here $i = 1, \dots, N$. The objective is to perform inference on a feature of the joint distribution of $G$ and $Y$ (for instance, when the responses $Y^{(i)}$ are binary, the average probability that a response takes a given value). The network $G$ and the responses $Y^{(i)}$, $i = 1, \dots, N$, are accessible only through sampling. Respondent-Driven Sampling (RDS) is a procedure, proposed by Heckathorn (1997), that deals with this problem. It is defined as a set of policies that allow the sampling to propagate through the network, conditional on a set of starting points or seeds.

Respondent-Driven Sampling can be understood as a discrete-time stochastic process that is conditional on the underlying network $G$. Its state at time $t$ has four components: the labels of the nodes recruited at time $t$; the subgraph of $G$ implied by the nodes recruited up to time $t$ and the edges that encode which node from time $t-1$ recruited which node at time $t$; the observed responses up to time $t$; and the reported degrees up to time $t$.

The way the sampling propagates through the network is defined by the following policies:

1. Sample the seeds uniformly from the nodes of $G$. This is known as the $0$-th wave. The selected nodes constitute the initial set of recruited labels.

2. For each node in Step 1, record its response and the corresponding reported degree.

3. For each node in the $(t-1)$-th wave, sample uniformly $m$ nodes among its neighbours relative to $G$, such that they have not been sampled before. This is known as the $t$-th wave. The indices for these nodes constitute the set of labels recruited at time $t$.

4. For each node sampled in Step 3, record its response, the corresponding reported degree, and the edge that connects it to its recruiter, in order to construct the recruitment subgraph.

5. Repeat Steps 3 and 4 until the pre-specified sample size has been attained. Interrupt the current wave if necessary.

Observe that Step 1 was set this way for the sake of simplicity, since our interest is in evaluating sampling procedures. The distribution of the starting points can be modified depending on the question at hand. Clearly, the notion of wave encodes the discrete time involved in the sampling process.

Given the sample size, RDS has two tuning parameters: the number of referrals, $m$, and the number of seeds. It is of interest to calibrate the vector of tuning parameters, $\eta$, by taking into account the type of inference that will be performed and any prior assumptions on the joint distribution of $G$ and $Y$.
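The recruitment policies of this section can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the function name `simulate_rds`, the adjacency-dictionary representation of the network, and the argument names are hypothetical and do not come from the paper. Recording the responses and reported degrees (Steps 2 and 4) is left out, since it amounts to a lookup on the sampled nodes.

```python
import random


def simulate_rds(adjacency, n_seeds, max_referrals, sample_size, rng=None):
    """Simulate RDS waves on a graph given as {node: set_of_neighbours}.

    Returns (waves, edges): waves[t] lists the nodes recruited at wave t,
    and edges holds the (recruiter, recruit) pairs.
    """
    rng = rng or random.Random()
    nodes = sorted(adjacency)
    # Step 1: draw the seeds uniformly without replacement (wave 0).
    seeds = rng.sample(nodes, min(n_seeds, sample_size, len(nodes)))
    waves = [seeds]
    sampled = set(seeds)
    edges = []
    # Steps 3-5: propagate until the sample size is attained.
    while len(sampled) < sample_size:
        next_wave = []
        for recruiter in waves[-1]:
            # Neighbours not sampled so far; their count is the adjusted degree.
            candidates = sorted(adjacency[recruiter] - sampled)
            k = min(max_referrals, len(candidates), sample_size - len(sampled))
            for recruit in rng.sample(candidates, k):
                sampled.add(recruit)
                next_wave.append(recruit)
                edges.append((recruiter, recruit))
            if len(sampled) >= sample_size:
                break  # interrupt the current wave
        if not next_wave:
            break  # recruitment died out before reaching the sample size
        waves.append(next_wave)
    return waves, edges
```

Passing a seeded `random.Random` instance makes a run reproducible, which is convenient when the same network is to be sampled under several candidate values of the tuning parameters.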

### 2.2 Notion of Non-Ignorability

Let $Z$ denote the full data and $Z_{\mathrm{INC}}$ represent the observed data. A sampling mechanism $I$ is ignorable if

$$p(I \mid Z, \eta) = p(I \mid Z_{\mathrm{INC}}, \eta),$$

and the parameters for the sampling mechanism ($\eta$) and those of the distribution for the full data are distinct. If a sampling mechanism is ignorable, then the term corresponding to the distribution of $I$ can be omitted from the likelihood.

In the case of RDS, the distribution of the sampling mechanism given the full data and the tuning parameters ($\eta$) is given by:

$$p(I \mid G, \eta) = \frac{1}{\binom{\tilde d_0}{m}} \left( \prod_{j_1 = 1}^{w_0} \frac{1}{\binom{\tilde d_{j_1}}{m}} \left[ \prod_{j_2, j_1 = 1}^{w_{j_1}} \frac{1}{\binom{\tilde d_{j_2, j_1}}{m}} \cdots \left[ \prod_{j_k, \dots, j_1 = 1}^{w_{j_{k-1}, \dots, j_1}} \frac{1}{\binom{\tilde d_{j_k, \dots, j_1}}{m}} \right] \cdots \right] \right).$$

Here $\tilde d$ denotes the number of nodes adjacent to a given vertex (with respect to $G$) that have not been sampled yet; following Lunagómez and Airoldi (2016), this is called the adjusted degree. $w$ denotes the number of individuals recruited by a given node during the previous wave, while $k$ represents the number of waves needed to attain the sample size; $m$ denotes the maximum number of referrals. Most of the modified versions of RDS discussed in this paper will imply a similar expression for $p(I \mid G, \eta)$.

Observe that RDS is ignorable when the vector of $\tilde d$’s is fully observed. Situations that prevent this from happening often arise in practice, for example:

1. The degrees are reported with noise. This is common in Epidemiology, more specifically in HIV studies. Populations such as men who have sex with men tend to round the number of sexual partners they had. In this context the rounding tends to be coarser as the true number of partners gets higher. This phenomenon is known as heaping.

2. The degrees are reported exactly, but the number of neighbours in the network that have not been sampled yet (i.e., the adjusted degree) is unknown.

The methodology we propose in this paper is able to calibrate $\eta$ even when $I$ is non-ignorable. This is possible since we adopt a model that takes into account the main sources of uncertainty for dealing with this issue.

### 2.3 A Realistic Model of Respondent-Driven Sampling

As in Lunagómez and Airoldi (2016), we assume a probabilistic model of the form:

$$p(G, Y, I, \alpha, \gamma) = p(G \mid \alpha)\, p(\alpha)\, p(I \mid G, \eta)\, p(Y \mid G, \gamma)\, p(\gamma). \tag{1}$$

Here $G$ denotes the social network, which is assumed to be a realization of a random graph (statistical network) model with parameter $\alpha$. $I$ denotes the sampling mechanism, which is understood as a probabilistic process that propagates through $G$ and is determined by a set of policies $\eta$ (the tuning parameters of the design). $Y$ denotes a response vector; the response $Y^{(i)}$ is associated to the $i$-th node of $G$. The joint distribution of the vector $Y$ is assumed to be specified in terms of $G$ (using a Markov random field formulation) and a parameter $\gamma$ which controls the ‘strength’ of the dependence among units. We define $G_{\mathrm{INC}}$ as the observed portion of $G$ conditional on $I$; denote by $G_{\mathrm{EXC}}$ the unobserved portion of the network. We define $Y_{\mathrm{INC}}$ and $Y_{\mathrm{EXC}}$ in an analogous manner. Here $p(\alpha)$ and $p(\gamma)$ are, respectively, the priors for $\alpha$ and $\gamma$.

To ease the exposition, we assume specific distributions for the factors in Expression 1. We adopt an Erdös-Rényi model (Erdös and Rényi (1960)) for the random graph and a Beta prior for $\alpha$. For the vector of responses $Y$, we assume the Markov Random Field (MRF) implied by the following Boltzmann distribution:

$$P(Y = y \mid G, \gamma) \propto \exp(\gamma_0 V_0 + \gamma_1 V_1), \tag{2}$$

where

$$V_0 = \sum_{i=1}^{N} y^{(i)}, \quad \text{and} \quad V_1 = \sum_{\{(i,j) \mid A(i,j) = 1\}} y^{(i)} y^{(j)}.$$

Here $A$ denotes the adjacency matrix for $G$ and $\gamma = (\gamma_0, \gamma_1)$. This implies that the conditional distribution of the response of node $i$ given the values of all the other responses, $G$ and $\gamma$ is given by:

$$P(Y^{(i)} = y^{(i)} \mid Y^{(-i)} = y^{(-i)}, G, \gamma) \propto \exp\left( \gamma_0 y^{(i)} + \gamma_1 \sum_{\{(i,j) \mid A(i,j) = 1\}} y^{(i)} y^{(j)} \right).$$
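The conditional distribution above is what a single-site Gibbs update needs. The sketch below assumes that responses are coded as 0 or 1 (the coding is an assumption made here for illustration), in which case the conditional probability of a 1 is a logistic function of $\gamma_0 + \gamma_1 \sum_j y^{(j)}$ over the neighbours $j$ of node $i$. The function names and the adjacency-dictionary representation are hypothetical.

```python
import math
import random


def conditional_prob_one(i, y, adjacency, gamma0, gamma1):
    """P(Y_i = 1 | all other responses) for 0/1-coded responses."""
    neighbour_sum = sum(y[j] for j in adjacency[i])
    return 1.0 / (1.0 + math.exp(-(gamma0 + gamma1 * neighbour_sum)))


def gibbs_sweep(y, adjacency, gamma0, gamma1, rng):
    """Update every response once, in place, from its full conditional."""
    for i in range(len(y)):
        p_one = conditional_prob_one(i, y, adjacency, gamma0, gamma1)
        y[i] = 1 if rng.random() < p_one else 0
    return y
```

Repeated sweeps give approximate draws of the response vector for a fixed network and a fixed $\gamma$; this is only a way of simulating responses and is not the posterior sampler used for $\gamma$ itself.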

As in Møller et al. (2006), we assume a uniform prior on

$$\gamma \in \Gamma = [\min \gamma_0, \max \gamma_0] \times [0, \max \gamma_1].$$

Let $\check D$ denote the vector of reported degrees. For this paper we consider only the case where degrees are reported exactly, i.e., the $i$-th component of $\check D$ is the degree of node $i$ with respect to $G$. Here $\check D$ is partitioned via $I$ into $\check D_{\mathrm{INC}}$ and $\check D_{\mathrm{EXC}}$, which are, respectively, the reported degrees and the degrees that would have been reported if we had access to the corresponding nodes via sampling. Observe that, when degrees are reported exactly, $\check D_{\mathrm{INC}}$ is a deterministic function of $G$ and $I$; therefore it can be included as part of the data without the need of adding an extra factor to Expression 1.

The computation of the posterior for the model given by Expression 1 is performed via Bayesian model averaging (BMA, see Raftery et al. (1996) and Robert (2001), Section 7.4), i.e., the posterior of the quantity of interest $Q$ is equal to

$$\sum_{w} p(G_{\mathrm{EXC},w}, \alpha_w \mid G_{\mathrm{INC}}, \check D_{\mathrm{INC}}, I) \int_{\Theta(G_{\mathrm{EXC},w})} p_w(Q \mid \theta_w, \varphi_w)\, p(\theta_w \mid Y_{\mathrm{INC}}, G_{\mathrm{INC}}, G_{\mathrm{EXC},w})\, d\theta_w. \tag{3}$$

Here $\theta_w$ comprises the parameters of the dependence structure and the missing response data. The reason we adopted this strategy for computing the posterior is the following: since RDS is non-ignorable, it is necessary to impute missing data in order to compute the likelihood. Typically, the number of nodes and edges of the unobserved part of the network, $G_{\mathrm{EXC}}$, is unknown, which turns this problem into one of variable dimension. BMA allows us to decompose this problem into stages. The mixing distribution of the BMA is used to determine the nodes and edges with which to augment $G_{\mathrm{INC}}$. Conditioning on the imputation for the unobserved part of the network, the problem becomes one of fixed dimension and standard MCMC techniques can be used (in particular, to deal with the updates for the MRF parameters, we used the approach proposed by Møller et al. (2006)). The MCMC algorithms used to compute the mixing distribution and the posterior conditional on the graph are described in detail in Lunagómez and Airoldi (2016). We consider two possible choices for $p_w$: the first one corresponds to the problem of estimation,

$$p_w(Q^{(i)} \mid \alpha^{(i)}, \gamma^{(i)}),$$

where $Q^{(i)}$ is the mean of a vector of responses simulated from the predictive distribution implied by Expression 1; the second one corresponds to the problem of prediction,

$$p_w(Q^{(i)} \mid Y^{(i)}_{\mathrm{EXC},w}, Y^{(i)}_{I}),$$

where $Q^{(i)}$ is the mean of the vector $(Y^{(i)}_{\mathrm{EXC},w}, Y^{(i)}_{I})$. Estimation and prediction will be discussed in more detail once the associated loss functions have been introduced.

## 3 Decision Theoretic Analysis of Network Sampling Designs

### 3.1 Old and New RDS-based Designs

In order to calibrate the vector of tuning parameters for RDS, which is denoted by $\eta$, it is necessary to establish criteria for evaluating and comparing the sampling designs implied by different specifications for $\eta$. The main objective of this paper is to provide principle-based tools for comparing sampling designs on networks. To motivate the discussion in the remaining sections, we discuss some examples. All of the examples described in this section can be understood as settings where there is a finite family of designs $H$ and where it is of interest to find an optimal $\eta \in H$ according to pre-specified criteria that take into account the type of inference to be performed and prior information on the parameters of the probabilistic model for the network and the responses.

###### Example 3.1

Let RDS be the sampling mechanism that propagates through the network. As in Section 2.1, regard the number of seeds as specified. A relevant question in this context is how to calibrate $m$, the number of referrals, for a fixed sample size. Let $H$ be the set of possible choices for the number of referrals $m$.

###### Example 3.2

Let RDS be the sampling mechanism. As in Section 2.1, let $m$ be the number of referrals. Consider the problem of calibrating the number of seeds for fixed $m$ and a pre-specified sample size, i.e., let $H$ be the set of possible choices for the number of seeds.

###### Example 3.3

Let us consider generalisations of the RDS setting. One could question the requirement of making $m$ constant across waves. Let $H$ be the set of policies that determine the number of referrals. More precisely, let the family of sampling mechanisms be defined by

$$f_\eta(x) = \Lambda\left[ C \cdot \left( \frac{1}{W} \cdot x \right)^{\eta} \right],$$

where $\eta$ is an element of a finite grid, $C$ is the maximum permissible value for $m$, $W$ is a cap for the number of waves, and

$$\Lambda(z) = \min\{k \in \mathbb{N} : z \le k\}.$$

The sample size is assumed to be fixed.

###### Example 3.4

Let us consider a different generalisation of RDS. Think of a design where $m$ could only take two values, a smaller one and a larger one. As in Example 3.3, one could adopt the convention that $m$ as a function of wave is non-decreasing. It seems reasonable to set the sample size as fixed and to treat the number of waves as a dependent variable. Let $H$ be the set of possible sampling mechanisms; more precisely, if the design $\eta \in H$ is adopted, then $m$ takes the smaller value for the first $\eta$ waves and the larger value for the following waves.
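The two wave-dependent referral schedules of Examples 3.3 and 3.4 can be written as small functions. This is a minimal sketch: the function and argument names are hypothetical, and indexing waves from 1 up to the cap is an assumption made here, since the text does not fix the indexing.

```python
import math


def referrals_power_schedule(wave, eta, cap, max_waves):
    """Example 3.3: ceil(cap * (wave / max_waves) ** eta) referrals at a wave."""
    if not 1 <= wave <= max_waves:
        raise ValueError("wave must lie between 1 and max_waves")
    return math.ceil(cap * (wave / max_waves) ** eta)


def referrals_two_level(wave, breakpoint, m_low, m_high):
    """Example 3.4: m_low up to the breakpoint wave, m_high afterwards."""
    return m_low if wave <= breakpoint else m_high
```

With `eta = 0` the first schedule reduces to a constant number of referrals equal to the cap, and larger values of `eta` delay the growth in referrals towards later waves.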

###### Example 3.5

For RDS, the number of seeds and the maximal number of referrals $m$ can be updated sequentially. First, the number of seeds is specified, based on prior knowledge. The second step consists in optimising $m$ based on the data collected on the wave of seeds; this is done while keeping the sample size fixed.

### 3.2 Decision Theoretic Formulation of the Optimal Design Problem

The framework of Bayesian experimental design (see Lindley (1972) and the review by Chaloner and Verdinelli (1995)) allows the statistician to phrase the problem of specifying features of the experiment (assuming that they are under the control of the practitioner) in terms of Decision Theory. For problems involving social networks, tuning the parameters of the sampling mechanism is a key aspect of the design; that is where the focus of our discussion will be. We first introduce some notation:

1. Let $\theta$ denote the parameters of the statistical model; $\theta$ is an element of a parameter space $\Theta$.

2. We denote by $z$ a potential data set, which belongs to a sample space $\mathcal{Z}$.

3. Let $a$ denote the decision or inference, which belongs to an action space $A$.

4. Let $\eta$ represent a specific sampling design, which belongs to a family of designs $H$.

5. The letter $L$ represents different loss functions.

The loss function is the component of this formalism that quantifies the quality of an inference $a$, given $\theta$ and data $z$. It is required for $L$ to be non-negative, and such that it takes the value zero when the inference is correct (e.g., when the estimate coincides with the true value of the quantity being estimated). Remember that we are assuming a sampling mechanism of the form $p(I \mid G, \eta)$. As in Section 2.3, $I$ partitions the full data $z$ into $z_{\mathrm{INC}}$ and $z_{\mathrm{EXC}}$, which denote, respectively, the observed and unobserved parts of the data. Let $\mathcal{Z}_{\mathrm{INC}}$ be the space of potential $z_{\mathrm{INC}}$’s allowed by a given sampling design. From a decision theoretic point of view, Bayesian inference is conditional on the data $z_{\mathrm{INC}}$ and it is given by the argument in the action space $A$ that minimises

$$E\big(L(a(z_{\mathrm{INC}}), Q(\theta)) \mid z_{\mathrm{INC}}\big) = \int_{\Theta} L(a(z_{\mathrm{INC}}), Q(\theta))\, p(\theta \mid z_{\mathrm{INC}})\, d\theta. \tag{4}$$

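We adopted this notation to emphasise that the inference is a function of the data and that the object of the inference is a function of either parameters or missing data. According to the formulation proposed by Lindley (1972), the loss associated to a design is given by the average expected loss of the optimal inference over all possible data sets. That is:

$$L(\eta) = \int_{\mathcal{Z}_{\mathrm{INC}}} \min_{a \in A} \int_{\Theta} L(a(z_{\mathrm{INC}}), Q(\theta))\, p(\theta \mid z_{\mathrm{INC}})\, d\theta\; p(z_{\mathrm{INC}} \mid \eta)\, dz_{\mathrm{INC}}. \tag{5}$$

Therefore, the optimal design is defined as the element in $H$ that minimises Expression 5:

$$L(\eta^\star) = \min_{\eta \in H} \int_{\mathcal{Z}_{\mathrm{INC}}} \min_{a \in A} \int_{\Theta} L(a(z_{\mathrm{INC}}), Q(\theta))\, p(\theta \mid z_{\mathrm{INC}})\, d\theta\; p(z_{\mathrm{INC}} \mid \eta)\, dz_{\mathrm{INC}}. \tag{6}$$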

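In the context given by the model discussed in Section 2.3:

$$z_{\mathrm{INC}} = (Y_{\mathrm{INC}}, G_{\mathrm{INC}}, \check D_{\mathrm{INC}}, I) \quad \text{and} \quad \theta = (\alpha, \gamma, Y_{\mathrm{EXC}}, G_{\mathrm{EXC}}). \tag{7}$$

Therefore, $p(\theta \mid z_{\mathrm{INC}})$ is the joint posterior for the model parameters and missing data implied by Expression 1. To compute Expression 4, only a slight modification of Expression 3 is needed, more precisely:

$$E\big(L(a(z_{\mathrm{INC}}), Q(\theta)) \mid z_{\mathrm{INC}}\big) = \sum_{w} p(M_w \mid z_{\mathrm{INC}}) \int_{\Theta(M_w)} L(a(z_{\mathrm{INC}}), Q(\theta_w))\, p(\theta_w \mid z_{\mathrm{INC}})\, d\theta_w. \tag{8}$$

Within this context, $p(z_{\mathrm{INC}} \mid \eta)$ is the prior predictive distribution of $z_{\mathrm{INC}}$ implied by the model given in Expression 1. It is computed by performing the following steps:

1. Generate a sample $(G, Y)$ from the prior predictive distribution implied by Expression 1.

2. Given $G$, simulate $I \mid (G, \eta)$ and let

$$z_{\mathrm{INC}} = (Y_{\mathrm{INC}}, G_{\mathrm{INC}}, \check D_{\mathrm{INC}}, I).$$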

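We will focus on the case where the inference of interest is estimation. This implies that the action $a(z_{\mathrm{INC}})$ is an estimate of $Q(\theta)$. For this problem two loss functions are particularly relevant: the quadratic loss

$$L(a(z_{\mathrm{INC}}), Q(\theta)) = \big(a(z_{\mathrm{INC}}) - Q(\theta)\big)^2,$$

for which the optimal decision is given by the posterior mean, and the multilinear loss

$$L(a(z_{\mathrm{INC}}), Q(\theta)) = \begin{cases} k_2 \big(Q(\theta) - a(z_{\mathrm{INC}})\big) & \text{if } Q(\theta) > a(z_{\mathrm{INC}}), \\ k_1 \big(a(z_{\mathrm{INC}}) - Q(\theta)\big) & \text{otherwise.} \end{cases} \tag{9}$$

For this loss, the optimal decision is the $k_2/(k_1 + k_2)$ fractile of the posterior (see Section 2.5 of Robert (2001)).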
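Given posterior draws of the quantity of interest, both optimal decisions are straightforward to approximate. The sketch below uses NumPy; the function names are hypothetical, and the fractile level $k_2/(k_1 + k_2)$ follows from setting the derivative of the posterior expected multilinear loss to zero.

```python
import numpy as np


def bayes_estimate_quadratic(draws):
    """Posterior mean: the Bayes estimate under quadratic loss."""
    return float(np.mean(draws))


def bayes_estimate_multilinear(draws, k1, k2):
    """Posterior k2 / (k1 + k2) fractile: the Bayes estimate under Expression 9."""
    return float(np.quantile(draws, k2 / (k1 + k2)))


def multilinear_loss(a, q, k1, k2):
    """Multilinear loss of the estimate a for true value(s) q."""
    q = np.asarray(q, dtype=float)
    return np.where(q > a, k2 * (q - a), k1 * (a - q))
```

Averaging `multilinear_loss(a, draws, k1, k2)` over the draws gives a Monte Carlo estimate of the posterior expected loss of a candidate estimate `a`, which can be used to check numerically that the fractile is the minimiser.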

Sometimes the object of the inference is prediction. In this paper we discuss two approaches for dealing with this problem from the Decision Theory perspective. The first approach is to adopt a loss function that encodes the gain of information for the predictive distribution of the quantity of interest due to sampling. Such gain of information is measured via the Kullback-Leibler divergence between the prior and posterior predictive distributions of $Q$, i.e.,

$$L(\eta) = \sum_{z_{\mathrm{INC}} \in \mathcal{Z}_{\mathrm{INC}}} - \sum_{q} \log\left( \frac{p(Q(z_{\mathrm{AUG}}, z_{\mathrm{INC}}) = q \mid z_{\mathrm{INC}})}{p(Q(z) = q)} \right) p(Q(z_{\mathrm{AUG}}, z_{\mathrm{INC}}) = q \mid z_{\mathrm{INC}})\, p(z_{\mathrm{INC}} \mid \eta),$$

where $z_{\mathrm{AUG}}$ denotes an imputed value for $z_{\mathrm{EXC}}$ obtained from the posterior predictive distribution. The second approach for evaluating a design in terms of prediction is to use what is called an intrinsic loss, which measures the distance between the predictive distribution of the quantity of interest given the parameter $\theta$ and the distribution implied by the inference $a$. For the sake of concreteness we will use the Hellinger distance:

$$L(a(z_{\mathrm{INC}}), Q(\theta)) = \frac{1}{2} \sum_{q} \left( \sqrt{\frac{p(Q(z_{\mathrm{AUG}}, z_{\mathrm{INC}}) = q \mid z_{\mathrm{INC}}, \theta)}{p(Q(z_{\mathrm{AUG}}, z_{\mathrm{INC}}) = q \mid z_{\mathrm{INC}}, a(z_{\mathrm{INC}}))}} - 1 \right)^2 p(Q(z_{\mathrm{AUG}}, z_{\mathrm{INC}}) = q \mid z_{\mathrm{INC}}, \theta).$$
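For a quantity of interest with finitely many values, the information-gain loss of the first approach can be approximated from simulated data sets. The sketch below is illustrative only: it assumes that the prior predictive and each posterior predictive distribution of the quantity of interest are available as probability vectors over the same support, and that the prior assigns positive mass wherever a posterior does. The function names and the use of weights for the simulated data sets are hypothetical.

```python
import numpy as np


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete pmfs."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    support = p > 0  # terms with p = 0 contribute nothing
    return float(np.sum(p[support] * np.log(p[support] / q[support])))


def information_gain_loss(prior_pmf, posterior_pmfs, weights):
    """Negative weighted average of KL(posterior || prior) over data sets."""
    weights = np.asarray(weights, dtype=float)
    gains = np.array([kl_divergence(post, prior_pmf) for post in posterior_pmfs])
    return float(-np.sum(weights * gains) / np.sum(weights))
```

A design whose data sets leave the predictive distribution unchanged receives loss zero, while more informative designs receive more negative values, so minimising this loss favours designs with larger expected information gain.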

### 3.3 Bayesian Experimental Design via Dynamic Programming

Lindley’s Formulation is a particular case of a two-stage finite decision problem (Bellman (1957), Chapter 3 and Parmigiani and Inoue (2009), Section 12.3), since two decisions are taken sequentially. First, a decision regarding the design is made before any data is observed. After the data has been observed, a decision regarding the specific inference is made. In this context, it is clear that the first decision imposes constraints on the potential data sets to be observed in the future, and also affects the value of the final Bayesian inference. Dynamic programming allows us to generalise Lindley’s Formulation to the case where multiple decisions can be taken during the data collection process. By this we mean that the practitioner will alternate between observing data and making a decision regarding the design, where such a decision will be based on the posterior obtained from the data observed up to that point in time.

The two-stage decision problem is often visualised as a layered tree. The layers encode the temporal sequence involving decisions and data collection; more precisely, if two events (i.e., the nodes of the tree) are connected by an edge, the one located at the layer on the left precedes the one located at the layer on the right. Decisions are usually represented by squares and data collection events (which include the final hypothetical disclosure of the true value of the parameters of the model, $\theta$) are represented by circles. The labels for the edges of the tree indicate the possible decisions and data sets involved in the decision problem. In this paper we adopt all of these conventions. An example of such a decision tree is displayed in Figure 1. Figure 1 also serves to illustrate the fact that Lindley’s Formulation can be understood as a two-stage finite decision problem. The event in Lindley’s Formulation with highest precedence is the decision regarding the tuning parameters of the design, $\eta$; this event is followed by the collection of a potential data set $z_{\mathrm{INC}}$; the next event in this process is the inference $a(z_{\mathrm{INC}})$; and the last step is given by the hypothetical disclosure of the quantity of interest $Q(\theta)$. Once all the decisions and information are available, the loss corresponding to each leaf of the tree can be computed; such loss is given by $L(a(z_{\mathrm{INC}}), Q(\theta))$.

The algorithm that solves the two-stage decision problem is called backwards induction; it was proposed by Bellman (1957), Chapter 3. To outline backwards induction is to rephrase the procedure proposed by Lindley (Expressions 4 - 6). The two-stage decision problem can be easily generalised (at least conceptually) to multi-stage decision problems. The backwards induction algorithm for multi-stage decision problems is described in the appendix.

## 4 Simulation Study

### 4.1 Design of Simulation Study

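As a first step, we outline how Lindley’s Formulation (Lindley (1972) and Chaloner and Verdinelli (1995)) can be implemented via Monte Carlo. Consider the model described in Section 2.3 and a finite family of designs $H$, then:

1. Generate $K$ samples $(G^{(k)}, Y^{(k)})$, $k = 1, \dots, K$, from the prior predictive distribution implied by Expression 1, and let $\theta^{(k)}$ be the realisation of $\theta$ associated to the $k$-th sample.

2. Given $G^{(k)}$, simulate $I^{(k)} \mid (G^{(k)}, \eta_j)$, for every $\eta_j \in H$, and let

$$z^{(k)}_{\mathrm{INC},\eta_j} = \big(G^{(k)}_{\mathrm{INC},\eta_j}, Y^{(k)}_{\mathrm{INC},\eta_j}\big)$$

be the observed data entailed by $\eta_j$.

3. Compute

$$L^{(k)}(\eta_j) = \int_{\Theta} L\big(\hat Q(z^{(k)}_{\mathrm{INC},\eta_j}), Q(\theta)\big)\, p(\theta \mid z^{(k)}_{\mathrm{INC},\eta_j})\, d\theta, \tag{10}$$

which is a Monte Carlo version of Equation 4. Assume that $\hat Q$ is the Bayes rule corresponding to $L$ (see Section 4.2.1 of Robert (2001)).

4. Compute

$$\hat L(\eta_j) = \frac{1}{K} \sum_{k=1}^{K} L^{(k)}(\eta_j), \tag{11}$$

which is the Monte Carlo version of Equation 5.

5. The optimal design is given by

$$\eta^\star = \arg\min_{\eta \in H} \hat L(\eta). \tag{12}$$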
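The five steps above translate into a short generic loop. The sketch below is not the network model of Section 2.3: to stay self-contained it uses a toy conjugate stand-in, in which $\theta$ has a uniform Beta(1, 1) prior, a design is simply a number of Bernoulli observations, and the posterior expected quadratic loss of the posterior mean is the Beta posterior variance. All function names are hypothetical.

```python
import numpy as np


def optimal_design(designs, draw_truth, draw_data, posterior_expected_loss, K, rng):
    """Monte Carlo search over a finite family of designs (Steps 1 to 5)."""
    losses = {eta: 0.0 for eta in designs}
    for _ in range(K):
        truth = draw_truth(rng)  # Step 1: draw from the prior predictive
        for eta in designs:
            z_inc = draw_data(truth, eta, rng)  # Step 2: data observed under eta
            # Steps 3 and 4: accumulate the posterior expected loss of the Bayes rule
            losses[eta] += posterior_expected_loss(z_inc, eta) / K
    best = min(losses, key=losses.get)  # Step 5: minimise the estimated loss
    return best, losses


# Toy stand-in model (an assumption for illustration, not the paper's model).
def draw_truth(rng):
    return rng.beta(1.0, 1.0)


def draw_data(theta, eta, rng):
    return rng.binomial(eta, theta)  # number of successes in eta trials


def posterior_expected_loss(successes, eta):
    a, b = 1.0 + successes, 1.0 + eta - successes
    # Variance of Beta(a, b): expected quadratic loss of the posterior mean
    return a * b / ((a + b) ** 2 * (a + b + 1.0))
```

A call such as `optimal_design((5, 20, 80), draw_truth, draw_data, posterior_expected_loss, K=500, rng=np.random.default_rng(0))` returns the selected design together with the estimated loss of every candidate. In the network setting, the three toy functions would be replaced by simulation from Expression 1, by the sampling mechanism, and by an MCMC approximation of Equation 10.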

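Let $\eta^\star$ be the optimal design and let $\eta^\circ$ be the design calibrated according to convention, i.e., the design such that the tuning parameters are set to the default values used by practitioners in publications. Our first series of simulation studies will be based on the following scheme:

1. Generate $(G^{(k)}, Y^{(k)})$ for $k = 1, \dots, K$.

2. Given $G^{(k)}$, simulate $I^{(k)} \mid (G^{(k)}, \eta^\circ)$ and $I^{(k)} \mid (G^{(k)}, \eta^\star)$, and let

$$z^{(k)}_{\mathrm{INC},\eta^\circ} = \big(G^{(k)}_{\mathrm{INC},\eta^\circ}, Y^{(k)}_{\mathrm{INC},\eta^\circ}\big) \quad \text{and} \quad z^{(k)}_{\mathrm{INC},\eta^\star} = \big(G^{(k)}_{\mathrm{INC},\eta^\star}, Y^{(k)}_{\mathrm{INC},\eta^\star}\big).$$

3. Compute $L^{(k)}(\eta^\circ)$ and $L^{(k)}(\eta^\star)$.

4. Compute

$$\hat L(\eta^\star) = \frac{1}{K} \sum_{k=1}^{K} L^{(k)}(\eta^\star) \quad \text{and} \quad \hat L(\eta^\circ) = \frac{1}{K} \sum_{k=1}^{K} L^{(k)}(\eta^\circ). \tag{13}$$

Different designs will be compared based on descriptive summaries of these quantities.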

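The objective of the previous simulation scheme is to provide means for illustrating the benefits of our approach in terms of posterior expected loss. Still, it is desirable to understand the average improvement gained by using a Bayesian experimental design framework when $\theta$ is specified. To achieve this, we performed simulations based on the idea of risk. We think that this is the best way to gain understanding of the frequentist properties of our procedure, while keeping the connection with decision theory.

1. For each $k$, sample

$$I^{(k)} \mid (G^{(k)}, \eta^\circ) \quad \text{and} \quad I^{(k)} \mid (G^{(k)}, \eta^\star).$$

2. Obtain

$$z^{(k)}_{\mathrm{INC},\eta^\circ} = \big(G^{(k)}_{\mathrm{INC},\eta^\circ}, Y^{(k)}_{\mathrm{INC},\eta^\circ}\big) \quad \text{and} \quad z^{(k)}_{\mathrm{INC},\eta^\star} = \big(G^{(k)}_{\mathrm{INC},\eta^\star}, Y^{(k)}_{\mathrm{INC},\eta^\star}\big).$$

3. Compute

$$L^{(k)}_{R}(\eta^\circ) = L\big(\hat Q(z^{(k)}_{\mathrm{INC},\eta^\circ}), Q(\theta)\big) \quad \text{and} \quad L^{(k)}_{R}(\eta^\star) = L\big(\hat Q(z^{(k)}_{\mathrm{INC},\eta^\star}), Q(\theta)\big).$$

4. Compute

$$\hat R(\eta^\circ) = \frac{1}{K} \sum_{k=1}^{K} L^{(k)}_{R}(\eta^\circ) \quad \text{and} \quad \hat R(\eta^\star) = \frac{1}{K} \sum_{k=1}^{K} L^{(k)}_{R}(\eta^\star), \tag{14}$$

which are Monte Carlo estimates of the frequentist risk.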
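The risk-based comparison differs from the previous scheme in that each estimate is scored against the simulated truth instead of being averaged over the posterior. A minimal sketch follows, again on a toy Beta-Bernoulli stand-in and not on the network model; the function names are hypothetical, and each design here is given its own simulated truths, whereas the scheme above reuses the same simulated networks for both designs.

```python
import numpy as np


def monte_carlo_risk(estimator, loss, draw_truth, draw_data, eta, K, rng):
    """Average realised loss of an estimator under design eta."""
    total = 0.0
    for _ in range(K):
        truth = draw_truth(rng)  # simulated state of nature
        z_inc = draw_data(truth, eta, rng)  # data observed under the design
        total += loss(estimator(z_inc, eta), truth)  # loss against the truth
    return total / K


# Toy stand-in model (an assumption for illustration, not the paper's model).
def draw_truth(rng):
    return rng.beta(1.0, 1.0)


def draw_data(theta, eta, rng):
    return rng.binomial(eta, theta)


def posterior_mean(successes, eta):
    return (1.0 + successes) / (2.0 + eta)  # Beta(1, 1) prior


def squared_error(a, q):
    return (a - q) ** 2
```

Comparing `monte_carlo_risk(..., eta=a, ...)` with `monte_carlo_risk(..., eta=b, ...)` for two candidate designs mirrors the comparison between the conventional and the optimal design in Expression 14.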

As in the previous simulation, comparisons between designs will be based on descriptive summaries of these quantities.

### 4.2 Empirical Results

###### Example 3.1 (continued)

In this example, the number of referrals (denoted by $m$) is optimised for RDS. We considered the mean squared error and the mean posterior loss as optimisation criteria. The comparison was performed for 3 different choices of prior (Table 1). The number of seeds was set to 5. In all simulations, we observed that the same value of $m$ was optimal for the 3 choices of the prior. The differences in terms of posterior loss were clearer for priors corresponding to higher density of the network.

###### Example 3.2 (continued)

In this example, the number of seeds was optimised for RDS. We considered the mean squared error and the mean posterior loss as optimisation criteria. The comparison was performed for 3 different choices of prior (Table 2). The number of referrals ($m$) was set to 3. In all simulations, we observed that more seeds led to a reduction in the posterior loss, with 15 being the optimal number in all scenarios.

###### Example 3.3 (continued)

In this example, the number of referrals ($m$) in RDS is allowed to vary in time. This is done by making this number a function of wave; this function is shaped by a parameter $\eta$. We optimised $\eta$ according to the mean squared error and the mean posterior loss. The comparison was performed for 3 different choices of prior (Table 3). The number of seeds was set to 5. In all simulations, we observed that the same value of $\eta$ was optimal for the 3 choices of the prior.

###### Example 3.4 (continued)

In this example, we allow the number of referrals ($m$) in RDS to take one of two values in each wave; once the bigger value is picked, that parameter of the design remains constant. In a sense this is similar to Example 3.3. We computed the same summaries as in Examples 3.1 - 3.2, that is, we first compared the expected loss (Table 4) for different priors. The number of seeds was set to 5. For two of the simulations we observed that setting the breakpoint equal to 3 was optimal; for the simulation corresponding to the denser graphs, we observed that a breakpoint of 2 was optimal.

###### Example 3.5 (continued)

In this example, we optimised the pair formed by the number of seeds and the number of referrals $m$. This was done by using the dynamic programming formulation: the first decision involves the number of seeds; then, based on the reported degrees for the seeds, a second decision is made regarding $m$. We computed the same summaries as in Examples 3.1 - 3.4, that is, we first compared the expected loss (Table 4) for different priors. For (?) scenarios, the pair (?) was optimal according to the squared (multilinear, intrinsic) loss.

## 5 Discussion

As far as we know, our methodology is the first one to apply tools from Bayesian experimental design to the problem of sampling on a social network. This allows a systematic calibration of the tuning parameters of different designs, given a full probability model for all sources of uncertainty and priors. The sources of uncertainty we considered were: the unobserved part of the network, the variability regarding the sampling mechanism and the uncertainty associated with the parameters of the model.

The challenges for this problem were, mainly: to incorporate all relevant sources of uncertainty, to deal with non-ignorable sampling designs and to write computer code that could be run in parallel. The first two challenges were addressed using the framework proposed in Lunagómez and Airoldi (2016).

As the results show, the choices of tuning parameters matter, in the sense that they can lead to substantial differences in the loss or MSE. It was interesting to observe that the relationship between the value of the tuning parameter and the MSE can be non-trivial (i.e., non-monotone).

Not surprisingly, the results are sensitive to the prior. In this case the prior encodes the density of the network, since an Erdös-Rényi model was assumed. For networks with lower density, choices of the tuning parameter tend to have less impact on the expected loss (or MSE) when compared with networks with higher density.

Future work includes: performing simulation studies that explore random graph models other than Erdös-Rényi; developing methodology for sequential problems, where learning from the network topology will assist the sampling; and proposing adaptive designs, where learning not only from the topology but also from the responses can inform future decisions regarding the sampling. We also plan to apply this methodology to the case where degrees are observed (for the sampled nodes) with noise. This would involve dealing with issues such as non-ignorable coarsening.

## Appendix 1

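We present the backward induction algorithm for multi-stage decision problems. The original formulation can be found in Chapter 3 of Bellman (1957). First, we establish some additional notation: let $\varphi[s,i]$ be a possible history, i.e., a sequence of decisions and observations that constitute a path from the root of the decision tree to a node associated to a decision at stage $s$. Let $z^{\varphi[s,j]}_{\varphi[s-1,i]}$ be the data required to augment $\varphi[s-1,i]$ to $\varphi[s,j]$.

I. For stage $S$

1. Compute the loss function for all the leaves of the decision tree. Each leaf is associated to an inference $a(z_{\varphi[S,i]})$ and a value for $Q(\theta)$; the corresponding loss is given by $L(a(z_{\varphi[S,i]}), Q(\theta))$.

2. Compute the expected loss for each $a(z_{\varphi[S,i]})$. The expectation is taken with respect to the posterior for $\theta$ given $z_{\varphi[S,i]}$; this is

$$E\big(L(a(z_{\varphi[S,i]}), Q(\theta)) \mid z_{\varphi[S,i]}\big) = \int_{\Theta} L(a(z_{\varphi[S,i]}), Q(\theta))\, p(\theta \mid z_{\varphi[S,i]})\, d\theta.$$

3. Compute the optimal decision associated to $z_{\varphi[S,i]}$; this is given by

$$a^\star(z_{\varphi[S,i]}) = \arg\min_{a \in A} E\big(L(a(z_{\varphi[S,i]}), Q(\theta)) \mid z_{\varphi[S,i]}\big).$$

II. For stage $s$

1. Compute the value of the loss function associated to each pair of the form $\big(a(z_{\varphi[s-1,i]}), z^{\varphi[s,j]}_{\varphi[s-1,i]}\big)$. This is given by

$$L\big(a(z_{\varphi[s-1,i]}), z^{\varphi[s,j]}_{\varphi[s-1,i]}\big) = E\big(L(a^\star(z_{\varphi[s,j]})) \mid z_{\varphi[s,j]}\big).$$

2. Compute the expected loss for each $a(z_{\varphi[s-1,i]})$. The expectation is taken with respect to the predictive distribution for $z^{\varphi[s,j]}_{\varphi[s-1,i]}$ given $z_{\varphi[s-1,i]}$; this is

$$E\big(L(a(z_{\varphi[s-1,i]}), z^{\varphi[s,j]}_{\varphi[s-1,i]}) \mid z_{\varphi[s-1,i]}\big) = \int_{\Theta} L\big(a(z_{\varphi[s-1,i]}), z^{\varphi[s,j]}_{\varphi[s-1,i]}\big)\, p\big(z^{\varphi[s,j]}_{\varphi[s-1,i]} \mid z_{\varphi[s-1,i]}\big)\, d\theta.$$

3. Compute the optimal decision associated to $z_{\varphi[s-1,i]}$; this is given by

$$a^\star(z_{\varphi[s-1,i]}) = \arg\min_{a \in A} E\big(L(a(z_{\varphi[s-1,i]}), z^{\varphi[s,j]}_{\varphi[s-1,i]}) \mid z_{\varphi[s-1,i]}\big).$$

4. Move to stage $s-1$, or stop if the root of the decision tree has been reached.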
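On a finite decision tree, the algorithm amounts to a recursion that averages at data-collection nodes and minimises at decision nodes. The sketch below is a generic illustration with a hypothetical dictionary encoding of the tree (`"kind"`, `"loss"`, `"branches"`, `"options"`); it assumes that all probabilities on the branches are known, whereas in the setting of this paper they would come from posterior and predictive distributions.

```python
def backward_induction(node):
    """Return (optimal expected loss, best action) for a finite decision tree.

    Leaves carry a loss; chance nodes carry (probability, child) branches;
    decision nodes carry a mapping from actions to children.
    """
    kind = node["kind"]
    if kind == "leaf":
        return node["loss"], None
    if kind == "chance":
        # Expectation over the possible observations at this node
        expected = sum(p * backward_induction(child)[0]
                       for p, child in node["branches"])
        return expected, None
    # Decision node: pick the action with the smallest expected loss
    values = {action: backward_induction(child)[0]
              for action, child in node["options"].items()}
    best = min(values, key=values.get)
    return values[best], best


def leaf(loss):
    """Convenience constructor for a leaf node."""
    return {"kind": "leaf", "loss": loss}
```

A two-stage tree in the sense of Section 3.3 has a decision node for the design at the root, a chance node for the data under each design, and a decision node for the inference under each data set. The final disclosure of the quantity of interest can be encoded either as a further chance node or, as in this sketch, by storing the posterior expected loss directly at the leaves.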

## References

• Andrieu and Roberts [2009] Christophe Andrieu and Gareth O. Roberts. The pseudo-marginal approach for efficient monte carlo computations. The Annals of Statistics, 37(2):697–725, 2009.
• Bellman [1957] R.E. Bellman. Dynamic Programming. Princeton University Press, 1957.
• Blitzstein and Nesterko [2012] Joseph Blitzstein and Sergiy Nesterko. Bias-variance and breadth-depth tradeoffs in respondent-driven sampling. 2012.
• Chaloner and Verdinelli [1995] Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review. Statistical Science, 10:273–304, 1995.
• Erdös and Rényi [1960] Paul Erdös and A. Rényi. The evolution of random graphs. Magyar Tud. Akad. Mat. Kutato Int. Kolz, 5:17–61, 1960.
• Heckathorn [1997] Douglas D. Heckathorn. Respondent-driven sampling: A new approach to the study of hidden populations. Social Problems, 44:174–199, 1997.
• Lindley [1972] D.B. Lindley. Bayesian Statistics: A Review. SIAM, 1972.
• Lunagómez and Airoldi [2016] S. Lunagómez and E.M. Airoldi. Valid inference from non-ignorable network sampling designs. 2016.
• Møller et al. [2006] J. Møller, A.N. Pettitt, R. Reeves, and K.K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006.
• Parmigiani and Inoue [2009] Giovanni Parmigiani and Lurdes Inoue. Decision Theory: Principles and Approaches. Wiley, 2009.
• Raftery et al. [1996] A. Raftery, D. Madigan, and C. Volinsky. Accounting for model uncertainty in survival analysis improves predictive performance (with discussion). volume 5 of Bayesian Statistics, pages 323–349. Oxford University Press, 1996.
• Robert [2001] Christian P. Robert. The Bayesian Choice, Second Edition. Springer-Verlag, 2001.