1 Introduction
Respondent-driven sampling (RDS) is a widely used sampling mechanism that takes advantage of social network structure. It was proposed by Heckathorn (1997) and is implemented with the aim of sampling from hidden populations, a situation that arises in problems from Epidemiology and Marketing. The distribution of RDS is specified by the sample size and two tuning parameters: the maximum number of referrals per individual and the number of starting points (also known as seeds). If we assume that prior information is available regarding the joint distribution of the social network structure and the distribution of the responses (i.e., observations at the node level), then it is of interest (from both a methodological and a practical point of view) how such information could be used to calibrate the RDS tuning parameters. Note that, for this question to make sense, it is necessary to specify criteria for evaluating the performance of different sampling mechanisms. We argue that Decision Theory, and more specifically Lindley’s formulation of Bayesian experimental design (Lindley (1972)), is the appropriate formalism for this task. Decision Theory allows us to evaluate a design based on the average quality of the inference: here the ‘quality of the inference’ is encoded in the loss function, and the average is taken with respect to the prior predictive distribution.
In this paper, we illustrate the use of Decision Theory in calibrating the tuning parameters of sampling mechanisms on networks. As a byproduct, we provide intuition about the usefulness of more general versions of RDS mechanisms.
1.1 Related work
Our work is related to the work of Chaloner and Verdinelli (1995), since, like them, we discuss how to implement ideas from Bayesian experimental design to solve applied problems. We adopt the formulation of Bayesian experimental design proposed by Lindley (1972) and the framework for sequential decision making proposed by Bellman (1957). In the context of social network data and, more specifically, the performance of RDS, our work relates to the simulation studies performed by Blitzstein and Nesterko (2012). To incorporate the different sources of uncertainty, we rely on the approach developed by Lunagómez and Airoldi (2016)
and on the Markov Chain Monte Carlo approaches developed by
Møller et al. (2006) and Andrieu and Roberts (2009).
1.2 Contributions
The main contributions of this paper are the following. First, we cast the most common problems that involve comparisons between sampling mechanisms on social networks into a Bayesian decision theory framework, thus providing a principled approach, for example, to calibrate the tuning parameters of existing mechanisms and to evaluate new mechanisms. Second, we discuss the process of finding optimal designs when decisions and data appear sequentially; we frame this discussion in terms of the theory of backwards induction. Third, in the proposed approach, the calibration takes into account the model specification (which encompasses the priors), the inference, and the loss function that encodes the criteria needed to evaluate the quality of the inference. Fourth, we propose a new network sampling mechanism and carry out a comparative performance evaluation.
2 Problem Setup
2.1 Respondent-Driven Sampling
Let be a social network with nodes and assume those nodes are labeled. Let
denote a vector that has as
th component (denoted by ) a measurement to be taken at the th node, here . The objective is to perform inference on a feature of the joint distribution of(for instance: the average probability that
for binary with possible values in , ). The network and the responses , , are accessible only through sampling. Respondent-Driven Sampling (RDS) is a procedure, proposed by Heckathorn (1997), that deals with this problem. It is defined as a set of policies that allow the sampling to propagate through the network, conditional on a set of starting points or seeds. Respondent-Driven Sampling can be understood as a stochastic process in discrete time that is conditional on the underlying network and has state space ; where denotes the labels of nodes recruited at time ; denotes the subgraph of implied by the nodes recruited up to time and the edges that encode the information about which node from time recruited which node at time , where ; and denote, respectively, the observed responses and the reported degrees up to time .
The way the sampling propagates through the network is defined by the following policies:

Sample nodes uniformly from . This is known as the th wave. The selected nodes constitute .

For each node in Step 1, record the response in and the corresponding reported degree in .

For each node in the th wave, sample uniformly nodes among its neighbours relative to that have not been sampled before. This is known as the th wave. The indices for these nodes constitute .

For each node sampled in Step 3, record the response in , the corresponding reported degree in , and the edge that connected to to construct .

Repeat Steps 3 and 4 until the prespecified sample size has been attained. Interrupt the current wave if necessary.
Observe that Step 1 was set this way for the sake of simplicity, since our interest is in evaluating sampling procedures. The distribution of the starting points can be modified depending on the question at hand. Clearly, the notion of wave encodes the discrete time involved in the sampling process.
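The recruitment process in Steps 1–5 can be sketched in code. The sketch below is ours, not the authors’ implementation: the network is represented as a plain adjacency dictionary, and all names (`rds_sample`, `max_referrals`, etc.) are illustrative.

```python
import random

def rds_sample(adj, num_seeds, max_referrals, sample_size, rng=None):
    """One realisation of the RDS process of Steps 1-5.

    adj: dict mapping each node to the set of its neighbours (the network).
    Returns the list of waves; waves[0] holds the seeds.
    """
    rng = rng or random.Random()
    seeds = rng.sample(list(adj), num_seeds)      # Step 1: seeds drawn uniformly
    sampled = set(seeds)
    waves = [list(seeds)]                         # Step 2: responses/degrees would
    while len(sampled) < sample_size:             #         be recorded per node here
        wave = []
        for v in waves[-1]:                       # Step 3: each recruiter refers
            eligible = [u for u in adj[v] if u not in sampled]
            recruits = rng.sample(eligible, min(max_referrals, len(eligible)))
            for u in recruits:                    # Step 4: record recruit and edge
                if len(sampled) >= sample_size:   # Step 5: interrupt the current
                    break                         #         wave if necessary
                sampled.add(u)
                wave.append(u)
        if not wave:                              # recruitment chains died out
            break
        waves.append(wave)
    return waves
```

Note that a recruiter’s eligible set is recomputed against everyone sampled so far, so two recruiters in the same wave cannot recruit the same node.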
Given the sample size , RDS has two tuning parameters: , which denotes the number of referrals; and , which denotes the number of seeds. It is of interest to calibrate by taking into account the type of inference that will be performed and any prior assumptions on the joint distribution of and .
2.2 Notion of Non-Ignorability
Let denote the full data and represent the observed data. Let denote the distribution for the full data. A sampling mechanism is ignorable if
and the parameters for the sampling mechanism () and the full data () are distinct. If a sampling mechanism is ignorable, then the term corresponding to the distribution of can be omitted from the likelihood.
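Since the display above lost its symbols in extraction, it may help to spell the condition out in standard missing-data notation; the symbols below — full data $D = (D_{\text{obs}}, D_{\text{mis}})$, recruitment record $I$, sampling parameter $\phi$, and data parameter $\theta$ — are introduced here purely for illustration.

```latex
% Ignorability: the sampling record depends on the data only through
% its observed part,
p(I \mid D_{\text{obs}}, D_{\text{mis}}, \phi) = p(I \mid D_{\text{obs}}, \phi),
% and (\phi, \theta) are distinct.  The likelihood then factorises as
p(D_{\text{obs}}, I \mid \theta, \phi)
  = p(I \mid D_{\text{obs}}, \phi)
    \int p(D_{\text{obs}}, D_{\text{mis}} \mid \theta)\, dD_{\text{mis}},
% so the sampling factor can be dropped when performing inference on \theta.
```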
In the case of RDS, the distribution of the sampling mechanism given the full data and the tuning parameters () is given by:
Here denotes the number of nodes adjacent to a given vertex (with respect to ) that have not been sampled yet; following Lunagómez and Airoldi (2016), this is called the adjusted degree. denotes the number of individuals recruited by a given node during the previous wave, while represents the number of waves needed to recruit individuals; denotes the maximum number of referrals. Here . Most of the modified versions of RDS discussed in this paper will imply a similar expression for .
Observe that RDS is ignorable when the vector of ’s is fully observed. There are situations, often arising in practice, that prevent this from happening; for example:

The degrees are reported with noise; this is common in Epidemiology, more specifically in HIV studies. Populations such as men who have sex with men tend to round the number of sexual partners they have had. In this context, the rounding tends to be coarser as the true number of partners gets higher. This phenomenon is known as heaping.

The degrees are reported exactly, but the number of neighbours in the network that have not been sampled yet (i.e., the adjusted degree) is unknown.
The methodology we propose in this paper is able to calibrate even when is non-ignorable. This is possible because we adopt a model that accounts for the main sources of uncertainty relevant to this issue.
2.3 A Realistic Model of Respondent-Driven Sampling
As in Lunagómez and Airoldi (2016), we assume a probabilistic model of the form:
(1) 
Here denotes the social network, which is assumed to be a realization of a random graph (statistical network) model with parameter . denotes the sampling mechanism, which is understood as a probabilistic process that propagates through and is determined by a set of policies (the tuning parameters of the design). denotes a response vector; the response is associated with the th node of . The joint distribution of the vector is assumed to be specified in terms of (using a Markov random field formulation) and a parameter which controls the ‘strength’ of the dependence among units. We define as the observed portion of conditional on ; denote by the unobserved portion of the network. We define and in an analogous manner. Here and are, respectively, the priors for and .
To ease the exposition, we assume specific distributions for the factors in Expression 1. We adopt an Erdös-Rényi model (Erdös and Rényi (1960)) for the random graph and a Beta prior for . For the vector of responses , we assume the Markov Random Field (MRF) implied by the following Boltzmann distribution:
(2) 
where
Here denotes the adjacency matrix for and . This implies that the conditional distribution of the response of node given the values of all the other responses, and is given by:
As in Møller et al. (2006), we assume a uniform prior on
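As a concrete illustration of simulating from such an MRF, a single-site Gibbs sweep can be written as follows. We use the common Ising parametrisation with responses in {-1, +1} as a stand-in for the paper’s Boltzmann form (whose exact expression is not reproduced here); the function and parameter names are ours.

```python
import math
import random

def gibbs_sweep(y, adj, theta, rng):
    """One Gibbs sweep over an Ising-style MRF on a graph.

    Assumes (as one concrete choice) responses y[i] in {-1, +1} with
    p(y) proportional to exp(theta * sum over edges (i,j) of y[i]*y[j]),
    so the full conditional of node i is
    p(y[i] = +1 | rest) = 1 / (1 + exp(-2 * theta * s_i)),
    where s_i is the sum of the neighbouring responses.
    """
    for i in y:
        s = sum(y[j] for j in adj[i])
        p_plus = 1.0 / (1.0 + math.exp(-2.0 * theta * s))
        y[i] = 1 if rng.random() < p_plus else -1
    return y
```

Larger values of the interaction parameter `theta` make neighbouring responses more likely to agree, which is the ‘strength of dependence’ role played by the corresponding parameter in the model above.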
Let denote the vector that has as th component the degree of node with respect to . For this paper we consider only the case where degrees are reported exactly, i.e., . Here is partitioned via into and , which are, respectively, the reported degrees and the degrees that would have been reported if we had access to the corresponding nodes via sampling. Observe that, for the case , is a deterministic function of and ; therefore it can be included as part of the data without the need to add an extra factor to Expression 1.
The computation of the posterior for the model given by Expression 1 is performed via Bayesian model averaging (BMA, see Raftery et al. (1996) and Robert (2001), Section 7.4), i.e., is equal to
(3) 
Here , i.e., the parameters of the dependence structure and the missing response data; let
. The reason we adopted this strategy for computing the posterior is the following: Since RDS is nonignorable, it is necessary to impute missing data in order to compute the likelihood. Typically, the number of nodes and edges of the unobserved part of the network
is unknown, which turns this problem into one of variable dimension. BMA allows us to decompose this problem into stages. The mixing distribution of the BMA is used to determine the nodes and edges to augment to . Conditioning on the imputation for the unobserved part of the network, the problem becomes one of fixed dimension, and standard MCMC techniques can be used (in particular, to deal with the updates for the MRF parameters, we used the approach proposed by Møller et al. (2006)). The MCMC algorithms used to compute the mixing distribution and the posterior conditional on the graph are described in detail in Lunagómez and Airoldi (2016). We consider two possible choices for : the first corresponds to the problem of estimation,
where is the mean of , a vector of responses simulated from the predictive distribution implied by Expression 1; the second one corresponds to the problem of prediction,
where is the mean of the vector . Estimation and prediction will be discussed in more detail once the associated loss functions have been introduced.
3 Decision Theoretic Analysis of Network Sampling Designs
3.1 Old and New RDS-based Designs
In order to calibrate the vector of tuning parameters for RDS, which is denoted by , it is necessary to establish criteria for evaluating and comparing the sampling designs implied by different specifications for . The main objective of this paper is to provide principle-based tools for comparing sampling designs on networks. To motivate the discussion in the remaining sections, we discuss some examples. All of the examples described in this section can be understood as settings where there is a finite family of designs and where it is of interest to find an optimal according to prespecified criteria that take into account the type of inference to be performed and prior information on the parameters of the probabilistic model for .
Example 3.1
Let RDS be the sampling mechanism that propagates through the network. As in Section 2.1, denote by the number of seeds and regard this quantity as specified. A relevant question in this context is how to calibrate , the number of referrals for a fixed sample size . Let be the set of possible choices for the number of referrals .
Example 3.2
Let RDS be the sampling mechanism. As in Section 2.1, denote by the number of seeds and let be the number of referrals. Consider the problem of calibrating for fixed and a prespecified sample size , i.e., let be the set of possible choices for the number of seeds .
Example 3.3
Let us consider generalisations of the RDS setting. One could question the requirement of making constant across waves. Let be the set of policies that determine the number of referrals. More precisely, let the family of sampling mechanisms be defined by
where is an element of a finite grid , is the maximum permissible value for , is a cap for the number of waves, and
It is assumed that and the sample size is fixed.
Example 3.4
Let us consider a different generalisation of RDS. Think of a design where could only take two values: and , where . As in Example 3.3, one could adopt the convention that as a function of the wave is non-decreasing. It seems reasonable to set the sample size as fixed and as a dependent variable. Let be the set of possible sampling mechanisms; more precisely: if the design is adopted, then for the first waves and for the following waves.
Example 3.5
For RDS, the number of seeds and the maximal number of referrals can be updated sequentially: First, is specified, based on prior knowledge. The second step consists in optimising based on the data collected on the th wave; this is done while keeping the sample size fixed. Here .
3.2 Decision Theoretic Formulation of the Optimal Design Problem
The framework of Bayesian experimental design (see Lindley (1972) and the review by Chaloner and Verdinelli (1995)) allows the statistician to phrase the problem of specifying features of the experiment (assuming that they are under the control of the practitioner) in terms of Decision Theory. For problems involving social networks, tuning the parameters of the sampling mechanism is a key aspect of the design; that is where the focus of our discussion will be. We first introduce some notation:

Let denote the parameters of the statistical model; is an element of a parameter space .

We denote by a potential data set, which belongs to a sample space .

Let denote the decision or inference, which belongs to an action space .

Let represent a specific sampling design, which belongs to a family of designs .

The letter represents different loss functions.
The loss function is the component of this formalism that quantifies the quality of an inference , given and data . It is required that be non-negative and that it take the value zero when the inference is correct (e.g., when the estimated value of is equal to ). Recall that we are assuming a sampling mechanism of the form . As in Section 2.3, partitions the full data into and , which denote, respectively, the observed and unobserved parts of the data. Let be the space of potential
’s allowed by a given sampling design. From a decision theoretic point of view, Bayesian inference is conditional on the data
and is given by the argument in the action space that minimises:
(4) 
We adopted this notation to emphasise that the inference is a function of the data and that the object of the inference is a function of either parameters or missing data. According to the formulation proposed by Lindley (1972), the loss associated with a design is given by the average expected loss of the optimal inference over all possible data sets. That is:
(5) 
Therefore, the optimal design is defined as the element in that minimises Expression 5:
(6) 
In the context given by the model discussed in Section 2.3:
(7) 
Therefore, is the joint posterior for the model parameters and missing data implied by Expression 1. To compute Expression 4, only a slight modification of Expression 3 is needed; more precisely:
(8) 
Within this context, is the prior predictive distribution of implied by the model given in Expression 1. It is computed by performing the following steps:

Generate a sample from the distribution .

Given , simulate and let
We will focus on the case where the inference of interest is estimation. This implies . For this problem two loss functions are particularly relevant: the quadratic loss
for which the optimal decision is given by the posterior mean, and the multilinear loss
(9) 
For this loss, the optimal decision is the fractile of the posterior (see Section 2.5 of Robert (2001)).
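Given samples from the posterior, both Bayes actions can be read off directly. The helper names below, and the particular convention used for the fractile of a discrete sample, are ours; the fractile rule itself (the k1/(k1+k2) quantile for the multilinear loss) follows Robert (2001), Section 2.5.

```python
import statistics

def bayes_estimate_quadratic(samples):
    """Quadratic loss L(a, t) = (a - t)^2: the Bayes action is the
    posterior mean, approximated here by the mean of posterior samples."""
    return statistics.fmean(samples)

def bayes_estimate_multilinear(samples, k1, k2):
    """Multilinear loss (k1 times the underestimation error, k2 times the
    overestimation error): the Bayes action is the k1/(k1+k2) fractile of
    the posterior.  The index rule below is one simple convention for a
    finite sample."""
    xs = sorted(samples)
    q = k1 / (k1 + k2)
    idx = min(int(q * len(xs)), len(xs) - 1)
    return xs[idx]
```

With k1 = k2 the multilinear rule returns (a version of) the posterior median; asymmetric weights shift the action toward the more costly side of the error.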
Sometimes the object of the inference is prediction. In this paper we discuss two approaches for dealing with this problem from the Decision Theory perspective: the first approach is to adopt a loss function that encodes the gain of information for the predictive distribution of the quantity of interest due to sampling. Such gain of information is measured via the Kullback-Leibler divergence between the prior and posterior predictive distributions of , i.e.,
where denotes an imputed value for
obtained from the posterior predictive distribution. The second approach for evaluating a design in terms of prediction is to use what is called an
intrinsic loss, which measures the distance between the predictive distribution of the quantity of interest given the parameter and the distribution implied by the inference . For the sake of concreteness we will use the Hellinger distance:
3.3 Bayesian Experimental Design via Dynamic Programming
Lindley’s Formulation is a particular case of a two-stage finite decision problem (Bellman (1957), Chapter 3, and Parmigiani and Inoue (2009), Section 12.3), since two decisions are taken sequentially: first, a decision regarding the design, before any data are observed; then, after the data have been observed, a decision regarding the specific inference. In this context, it is clear that the first decision imposes constraints on the potential data sets to be observed in the future and also affects the value of the final Bayesian inference. Dynamic programming allows us to generalise Lindley’s Formulation to the case where multiple decisions can be taken during the data collection process; by this we mean that the practitioner alternates between observing data and making a decision regarding the design, where such a decision is based on the posterior obtained from the data observed up to that point in time.
The two-stage decision problem is often visualised as a layered tree. The layers encode the temporal sequence of decisions and data collection; more precisely, if two events (i.e., nodes of the tree) are connected by an edge, the one located at the layer on the left precedes the one located at the layer on the right. Decisions are usually represented by squares, and data collection events (which include the final hypothetical disclosure of the true value of the parameters of the model ) are represented by circles. The labels on the edges of the tree indicate the possible decisions and data sets involved in the decision problem. In this paper we adopt all of these conventions. An example of such a decision tree is displayed in Figure 1. Figure 1 also serves to illustrate the fact that Lindley’s Formulation can be understood as a two-stage finite decision problem. The event in Lindley’s Formulation with highest precedence is the decision regarding the tuning parameters of the design ; this event is followed by the collection of a potential data set ; the next event in this process is the inference ; and the last step is the hypothetical disclosure of the quantity of interest . Once all the decisions and information are available, the loss corresponding to each leaf of the tree can be computed; such loss is given by .
The algorithm that solves the two-stage decision problem is called backwards induction; it was proposed by Bellman (1957), Chapter 3. To outline backwards induction is to rephrase the procedure proposed by Lindley (Expressions 4–6). The two-stage decision problem can be easily generalised (at least conceptually) to multi-stage decision problems. The backwards induction algorithm for multi-stage decision problems is described in the appendix.
4 Simulation Study
4.1 Design of Simulation Study
As a first step, we outline how Lindley’s Formulation (Lindley (1972) and Chaloner and Verdinelli (1995)) can be implemented via Monte Carlo. Consider the model described in Section 2.3 and a finite family of designs , then:

Generate samples from the distribution and let be the realisation of associated to .

Given , simulate , for every , and let
be the observed data entailed by .

The optimal design is given by
(12)
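The Monte Carlo scheme above can be sketched generically. The hooks `simulate_data` and `posterior_samples` are illustrative stand-ins for the model of Section 2.3 and its MCMC, and quadratic loss is assumed so that the Bayes action is the posterior mean; none of these names come from the paper.

```python
import random
import statistics

def expected_loss_of_design(design, simulate_data, posterior_samples,
                            loss, n_rep, rng):
    """Monte Carlo version of Lindley's criterion (Expressions 4-6).

    For each of n_rep prior-predictive replicates: draw (theta, data)
    under the design, approximate the posterior by samples, take the
    Bayes action (here the posterior mean, i.e., quadratic loss), and
    record the loss against the replicate's true theta.
    """
    total = 0.0
    for _ in range(n_rep):
        theta, data = simulate_data(design, rng)       # prior-predictive draw
        post = posterior_samples(design, data, rng)    # approximate posterior
        action = statistics.fmean(post)                # Bayes action
        total += loss(action, theta)
    return total / n_rep

def optimal_design(designs, **kwargs):
    """Pick the design minimising the Monte Carlo expected loss."""
    return min(designs, key=lambda d: expected_loss_of_design(d, **kwargs))
```

In the actual methodology the inner step is an MCMC run per replicate, which is why the computation benefits from running replicates in parallel.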
Let be the optimal design and let be the design calibrated according to convention, i.e., the design whose tuning parameters are set to the default values used by practitioners in publications. Our first series of simulation studies will be based on the following scheme:

Generate for .

Given , simulate and , and let

Compute and .

Compute
(13)
Different designs will be compared based on descriptive summaries of these quantities.
The objective of the previous simulation scheme is to provide means for illustrating the benefits of our approach in terms of posterior expected loss. Still, it is desirable to understand the average improvement gained by using a Bayesian experimental design framework when is specified. To achieve this, we performed simulations based on the idea of risk. We think that this is the best way to gain understanding of the frequentist properties of our procedure, while keeping the connection with decision theory.

For each , sample

Obtain

Compute

Compute
(14) which are Monte Carlo estimates of the frequentist risk.
As in the previous simulation, comparisons between designs will be based on descriptive summaries of these quantities.
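A minimal sketch of the risk computation, with hypothetical hooks for the data-generating process and the estimator (the names are ours):

```python
import random

def frequentist_risk(design, theta, simulate_data_given_theta,
                     estimator, loss, n_rep, rng):
    """Monte Carlo estimate of the frequentist risk of a design at a
    fixed parameter value theta: the average loss of the estimator over
    n_rep data sets generated under theta."""
    total = 0.0
    for _ in range(n_rep):
        data = simulate_data_given_theta(design, theta, rng)
        total += loss(estimator(design, data), theta)
    return total / n_rep
```

Unlike the previous scheme, theta is held fixed here rather than drawn from the prior, which is what makes this a frequentist summary of the design.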
4.2 Empirical Results
Example 3.1 (continued)
In this example, the number of referrals (denoted by ) is optimised for RDS. We considered the mean squared error and the mean posterior loss as optimisation criteria. The comparison was performed for 3 different choices of prior (Table 1). The number of seeds () was set to 5. In all simulations, we observed that was optimal for all 3 choices of the prior. The differences in terms of posterior loss were clearer for priors corresponding to higher network density.
Size  Density  

2  3.387 0.0317  1.382 0.0734  
3  3.391 0.0288  1.385 0.0858  
4  3.385 0.0285  1.381 0.0663  
5  3.376 0.0300  1.378 0.0727  
6  3.376 0.0295  1.378 0.0661  
2  3.306 0.0184  1.385 0.0790  
3  3.321 0.0213  1.415 0.0853  
4  3.318 0.0210  1.404 0.0852  
5  3.311 0.0197  1.392 0.0732  
6  3.304 0.0185  1.383 0.0624 
Example 3.2 (continued)
In this example, the number of seeds () was optimised for RDS. We considered the mean squared error and the mean posterior loss as optimisation criteria. The comparison was performed for 3 different choices of prior (Table 2). The number of referrals () was set to 3. In all simulations, we observed that more seeds led to a reduction in the posterior loss, with 15 being optimal in all scenarios.
Size  Density  Number of seeds  

5  3.351 0.0337  2.3279 0.0196  
7  3.384 0.0347  2.4472 0.0342  
10  3.396 0.0338  2.6265 0.0434  
12  3.386 0.0377  2.7616 0.0206  
15  4.005 0.0370  2.9208 0.0290  
5  3.775 0.0334  2.7958 0.0333  
7  3.832 0.0371  2.8538 0.0430  
10  3.911 0.0353  2.9586 0.0347  
12  3.951 0.0381  2.9586 0.0370  
15  4.096 0.0329  3.0457 0.0387  
5  3.865 0.0396  2.8860 0.0224  
7  3.942 0.0388  2.9208 0.0426  
10  4.385 0.0359  3.0457 0.0237  
12  4.592 0.0344  3.0457 0.0248  
15  4.783 0.0376  3.0969 0.0207 
Example 3.3 (continued)
In this example, the number of referrals () in RDS is allowed to vary in time. This is done by making this number a function of the wave; this function is shaped by a parameter . We optimised according to the mean squared error and the mean posterior loss. The comparison was performed for 3 different choices of prior (Table 3). The number of seeds was set to 5. In all simulations, we observed that was optimal for all 3 choices of the prior.
Size  Density  Exponent  

2  3.036 0.0281  3.148 0.0450  
1.5  3.187 0.0287  3.221 0.0382  
1  3.022 0.0275  2.920 0.0457  
0.5  3.070 0.0250  3.130 0.0362  
2.920 0.0283  3.154 0.0397  
2  3.187 0.0288  2.958 0.0364  
1.5  3.327 0.0283  3.096 0.0372  
1  3.251 0.0284  2.886 0.0473  
0.5  3.236 0.0284  2.795 0.0455  
3.214 0.0277  2.823 0.0402  
2  3.187 0.0282  2.920 0.0443  
1.5  3.327 0.0286  3.000 0.0465  
1  3.251 0.0268  2.853 0.0376  
0.5  3.236 0.0264  2.853 0.0432  
3.214 0.0251  2.823 0.0437 
Example 3.4 (continued)
In this example, we allow the number of referrals () in RDS to take one of two values in each wave; once the larger value is picked, that parameter of the design remains constant. In a sense this is similar to Example 3.3. We computed the same summaries as in Examples 3.1–3.2; that is, we first compared the expected loss (Table 4) for different priors. The number of seeds () was set to 5. For two of the simulations we observed that setting the breakpoint equal to 3 was optimal; for the simulation corresponding to the denser graphs, a breakpoint of 2 was optimal.
Size  Density  Change Point  

1  3.376 0.0400  3.3665 0.0441  
2  3.387 0.0360  3.4466 0.0370  
3  3.396 0.0377  3.8206 0.0472  
4  3.376 0.0375  3.5461 0.0409  
5  3.366 0.0392  3.4420 0.0401  
1  3.366 0.0393  2.7212 0.0223  
2  3.408 0.0366  2.8239 0.0239  
3  3.431 0.0380  2.9586 0.0192  
4  3.387 0.0378  2.8860 0.0224  
5  3.376 0.0386  2.7695 0.0195  
1  3.382 0.0387  2.8239 0.0212  
2  3.422 0.0396  2.9586 0.0218  
3  3.545 0.0369  2.9208 0.0201  
4  3.392 0.0389  2.8860 0.0198  
5  3.381 0.0382  2.8239 0.0199 
Example 3.5 (continued)
In this example, we optimised the pair . This was done by using the dynamic programming formulation: the first decision involves ; then, based on the reported degrees for the seeds, a second decision is made regarding . We computed the same summaries as in Examples 3.1–3.4; that is, we first compared the expected loss (Table 5) for different priors. For (?) scenarios, the pair (?) was optimal according to the squared (multilinear, intrinsic) loss.
Size  Density  

5 Discussion
As far as we know, our methodology is the first to apply tools from Bayesian experimental design to the problem of sampling on a social network. This allows a systematic calibration of the tuning parameters of different designs, given a full probability model for all sources of uncertainty and priors. The sources of uncertainty we considered were: the unobserved part of the network, the variability of the sampling mechanism, and the uncertainty associated with the parameters of the model.
The main challenges for this problem were: incorporating all relevant sources of uncertainty, dealing with non-ignorable sampling designs, and writing computer code that could be run in parallel. The first two challenges were addressed using the framework proposed in Lunagómez and Airoldi (2016).
As the results show, the choice of tuning parameters matters, in the sense that it can lead to substantial differences in the loss or MSE. It was interesting to observe that the relationship between the value of a tuning parameter and the MSE can be non-trivial (i.e., non-monotone).
Not surprisingly, the results are sensitive to the prior. In this case the prior encodes the density of the network, since an Erdös-Rényi model was assumed. For networks with lower density, choices of the tuning parameter tend to have less impact on the expected loss (or MSE) than for networks with higher density.
Future work includes: performing simulation studies that explore random graph models beyond Erdös-Rényi; developing methodology for sequential problems, where learning about the network topology will assist the sampling; and proposing adaptive designs, where not only the topology but also the responses can inform future decisions regarding the sampling. We also plan to apply this methodology to the case where degrees are observed (for the sampled nodes) with noise. This would involve dealing with issues such as non-ignorable coarsening.
Acknowledgement
Appendix 1
We present the backward induction algorithm for multi-stage decision problems. The original formulation can be found in Chapter 3 of Bellman (1957). First, we establish some additional notation: Let be a possible history, i.e., a sequence of decisions and observations that constitutes a path from the root of the decision tree to a node associated with a decision at stage . Let denote the data required to augment to .
I. For stage S

Compute the loss function for all the leaves of the decision tree. Each leaf is associated with an inference and a value for ; the corresponding loss is given by .

Compute the expected loss for each . The expectation is taken with respect to the posterior for given ; this is

Compute the optimal decision associated with ; this is given by
II. For stage

Compute the value of the loss function associated with each pair of the form . This is given by

Compute the expected loss for each . The expectation is taken with respect to the predictive distribution for given ; this is

Compute the optimal decision associated with ; this is given by

Move to stage , or stop if .
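The stages above generalise to any finite decision tree. The following sketch — our own encoding, with the tree as nested dictionaries, expectations replacing the predictive integrals, and losses attached to the leaves — returns the minimal expected loss at the root:

```python
def backward_induction(tree):
    """Backward induction on a finite decision tree.

    tree is a nested dict:
      decision node: {"type": "decision", "children": {action: subtree}}
      chance node:   {"type": "chance", "children": [(prob, subtree), ...]}
      leaf:          {"type": "leaf", "loss": value}
    Returns the minimal expected loss achievable from the root; the
    minimising action at each decision node could be recorded analogously.
    """
    kind = tree["type"]
    if kind == "leaf":
        return tree["loss"]                 # Step I.1: losses at the leaves
    if kind == "chance":
        # Step I.2 / II.2: expectation over data (or parameter disclosure)
        return sum(p * backward_induction(sub) for p, sub in tree["children"])
    # Steps I.3 / II.3: a decision node picks the action minimising the
    # expected loss of its subtree
    return min(backward_induction(sub) for sub in tree["children"].values())
```

In Lindley’s two-stage case, the root decision is the design, the first chance layer is the data, the second decision is the inference, and the final chance layer is the disclosure of the quantity of interest.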
References
 Andrieu and Roberts [2009] Christophe Andrieu and Gareth O. Roberts. The pseudo-marginal approach for efficient Monte Carlo computations. The Annals of Statistics, 37(2):697–725, 2009.
 Bellman [1957] R.E. Bellman. Dynamic Programming. Princeton University Press, 1957.

 Blitzstein and Nesterko [2012] Joseph Blitzstein and Sergiy Nesterko. Bias-variance and breadth-depth tradeoffs in respondent-driven sampling. 2012.
 Chaloner and Verdinelli [1995] Kathryn Chaloner and Isabella Verdinelli. Bayesian experimental design: A review. Statistical Science, 10:273–304, 1995.
 Erdös and Rényi [1960] Paul Erdös and A. Rényi. The evolution of random graphs. Magyar Tud. Akad. Mat. Kutató Int. Közl., 5:17–61, 1960.
 Heckathorn [1997] Douglas D. Heckathorn. Respondentdriven sampling: A new approach to the study of hidden populations. Social Problems, 44:174–199, 1997.
 Lindley [1972] D.V. Lindley. Bayesian Statistics: A Review. SIAM, 1972.
 Lunagómez and Airoldi [2016] S. Lunagómez and E.M. Airoldi. Valid inference from non-ignorable network sampling designs. 2016.
 Møller et al. [2006] J. Møller, A.N. Pettitt, R. Reeves, and K.K. Berthelsen. An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2):451–458, 2006.
 Parmigiani and Inoue [2009] Giovanni Parmigiani and Lurdes Inoue. Decision Theory: Principles and Approaches. Wiley, 2009.
 Raftery et al. [1996] A. Raftery, D. Madigan, and C. Volinsky. Accounting for model uncertainty in survival analysis improves predictive performance (with discussion). volume 5 of Bayesian Statistics, pages 323–349. Oxford University Press, 1996.
 Robert [2001] Christian P. Robert. The Bayesian Choice, Second Edition. SpringerVerlag, 2001.