1 Introduction
Inferring species distribution and demography are two key questions in ecology (Begon et al., 2006)
. However, addressing these questions is challenging when studying animals and plants in their natural environmental because of the inherent imperfection in the detection process of species or individuals. When ignored, this issue of imperfect detection can lead to biased estimates of species distribution or survival
(Gimenez et al., 2008; GuilleraArroita, 2017). To cope with imperfect detection, occupancy models (MacKenzie et al., 2018) and capturerecapture models (McCrea and Morgan, 2014)were developed to provide unbiased estimates of species range and individual survival, with numerous applications in all fields of ecology.
As with any statistical model, inferences made from occupancy and capturerecapture analyses rely on assumptions which must be satisfied at least to a reasonable degree. Common to occupancy and capturerecapture models is the assumption of homogeneity of the detection process, which asserts that there is no unmodelled heterogeneity in species and individual detection probabilities. If ignored, heterogeneity in detection leads to flawed inference in occupancy and capturerecapture models
(Royle, 2006; Gimenez et al., 2018). Detection heterogeneity can be due to heterogeneous sampling effort, variation in animal abundance or behavior, site characteristics or even on account of varying observer skills. Ideally, covariates could be measured and incorporated in ecological models to account for detection heterogeneity. However, unexplained variation may still remain, or measuring the relevant covariates may simply be impossible.When unmodelled heterogeneity exists, it can be accommodated using finite mixtures in which discrete latent variables are used to assign sites or individuals to mixture components (i.e., uniform groups) each characterized by groupspecific parameters (Royle, 2006; Pledger et al., 2010, 2003; Louvrier et al., 2018). In simulation studies, finite mixtures were successful in decreasing bias in occupancy and survival probability estimates that was introduced by heterogeneity in the detection process (Pledger, 2005; Louvrier et al., 2018). However, from a practical perspective, the issue remains of selecting the number of mixture components for reallife data analyses, which is not straightforward (Cubaynes et al., 2012; Pohle et al., 2017).
Here, we propose a Bayesian nonparametric (BNP) approach to modeling heterogeneity in occupancy and capturerecapture models. BNP models provide a flexible approach that relaxes typical standard modeling assumptions, such as the choice of a specific parametric kernel in density estimation, or here the choice of a fixed number of groups in finite mixtures. Previous uses of a BNP approach in ecological models include the modelling of wildlife migration patterns (Matechou and Caron, 2017; Diana et al., n/a), the estimation of population size (ManriqueVallier, 2016; Dorazio et al., 2008) and that of the probability of remaining in or vacating a given area (Ford et al., 2015).
Perhaps the most widely used BNP model is the Dirichlet process mixture (DPM) model (Ferguson, 1973, 1974; Lo, 1984; Escobar, 1994; Escobar and West, 1995)
, which is a mixture model with infinitely many components. DPM models are a suitable fit for addressing the inherent heterogeneity present in ecological models. We consider DPM models of Bernoulli distributions, with the distribution of the Bernoulli detection probability parameters arising from a Dirichlet process. A DPM model can be represented in different, yet equivalent manners. Two of them are the the Chinese Restaurant Process
(CRP; Blackwell and MacQueen, 1973; Pitman, 1995, 1996) and the stickbreaking (SB; Sethuraman, 1994) representations.The flexibility of BNP models is usually translated into a hierarchical model, which relies on Markov chain Monte Carlo algorithms to sample from the resulting posterior distribution. The implementation of these algorithms is usually computationally complex and demanding in standard Bayesian software such as WinBUGS or JAGS (e.g. Ohlssen et al., 2007) and often requires writing specific code to implement the BNP model and sampling algorithms (Ford et al., 2015). Recently, the nimble R package introduced nonparametric functionality to address these difficulties, by supporting the use of nonparametric DPM models. Specialized functions, distributions, and samplers are provided for both the CRP and SB representations. Wehrhahn et al. (2018) contains additional details and examples of using these BNP modelling approaches in nimble.
Here, we focus on the CRP representation to address detection heterogeneity in capturerecapture and occupancy models. The article is organized as follows. In the next section, we present homogeneous capturerecapture and occupancy models, finite mixtures and nonparametric models formulated in a Bayesian framework as hierarchical models. In Section 3, we give the MCMC sampling scheme used to fit nonparametric models and introduce the nimble and nimbleEcology R packages. Section 4 gives the results of two simulation studies, using capturerecapture and occupancy data, which validate the ability of our approach to capture detection heterogeneity in occupancy and survival probabilities. Section 5 illustrates our method using data from a study of gray wolf (canis lupus), in trying to estimate occupancy and survival while accounting for detection heterogeneity. The final section provides general conclusions and discusses the potential of our approach.
2 Models
We consider three different approaches to handling heterogeneity. The first approach uses a homogeneous model, which disregards any heterogeneity and considers all individuals and sites to be identical. The second approach uses finite mixtures of population subgroups, where individuals or sites within each subgroup are identical, distinct population subgroups may differ in one or more characteristics, and the number of subgroups is prespecified. The homogeneous model is a special case of the finite mixture model, where the population contains only a single subgroup.
The third approach uses a nonparametric representation for modeling individual and species heterogeneity, where the number of population subgroups, the assignment of individuals and sites into subgroups, and the characteristics of each subgroup are determined by the data at the time of model fitting. The nonparametric approach does not require prespecifying a fixed number of population subgroups, but rather, the number of population subgroups itself is a model parameter. We now describe the three model formulations in detail, and provide specifications of ecological capturerecapture and occupancy models using each.
2.1 Homogeneous Models
In the homogeneous model, we assume that all observed individuals of the population are identical in their characteristics. There is no heterogeneity between individuals or sites, and thus for any parameter of the population, all individuals or sites share a common value of .
Homogeneous CaptureRecapture Model
We consider a basic ecological capturerecapture model, for binaryvalued observation data of individuals or sites occurring over observational time periods. We condition on the first observation of each individual occurring in time period , although it is straightforward to relax this assumption to allow firstdetections to occur in other time periods. We parameterize the model in terms of survival probability between time periods , and probability of detection conditional on being alive .
In subsequent models we introduce heterogeneity in , but in the homogeneous model all individuals are characterized by the constant probability of detection . We use a statespace formulation of the model, where binary states give the alive/dead status of individual at time , and using binary observation data for site at time . The homogeneous capturerecapture model is written:
Homogeneous Occupancy Model
We consider a homogeneous static occupancy model for a total of sites, each observed at distinct sampling occasions. We parameterize the model in terms of the constant site occupancy probability , and probability of detection conditional on site occupancy . In subsequent models we introduce heterogeneity in , but in the homogeneous model all sites share the probability of detection . We use a statespace formulation of the model, where binary states give the true occupancy status of site , and using binary observation data for the observation of site at sampling occasion . The homogeneous occupancy model is written:
2.2 Finite Mixture Models
In a finite mixture model specified as having distinct population subgroups, each individual or site is considered to be a member of any of the subgroups with equal probability . We introduce a discrete indicator variable for each individual or site, for individual or site , where denotes the “group” of individual or site . We use independent discrete uniform prior distributions over the set for each , and indicates that individual or site is a member of population subgroup .
Furthermore, each of the distinct subgroups may differ in one or more demographic characteristics. Again considering the demographic parameter , we introduce model parameters , which are given independent and identical prior distributions. Then, all individuals or sites belonging to population subgroup display . To avoid issues of “label exchanging” between groups, and hence a lack of model identifiability, we also impose the constraint that , or that the ordered set of parameters is nondecreasing.
Finite Mixture CaptureRecapture Model
We generalize the homogeneous capturerecapture model given in section 2.1 to a group finite mixture model, to allow heterogeneity in the probability of detection. Specifically, we introduce new model parameters , where represents the probability of detection for individuals in subgroup . Each is given an independent distribution, and we impose the constraint on these parameters that for to maintain identifiability.
Since gives the subgroup number which contains individual , we use probability of detection for individual . Putting this together, the group finite mixture capturerecapture model is written as:
Finite Mixture Occupancy Model
We similarly generalize the homogeneous occupancy model from section 2.1 to a group finite mixture model using parameters , where represents the probability of detection for sites in subgroup . We use the same independent prior distribution for each , and impose the same constraint to ensure model identifiability. Thus, indicates the subgroup which contains site , which therefore has probability of detection . The group finite mixture occupancy model is written as:
2.3 NonParametric Models
Using a nonparametric approach, the prespecification of a fixed number of subgroups is no longer required. Furthermore, our previous assumption that individual or site assignments to subgroups must follow a discrete uniform distribution is also relaxed. In a nonparametric model the number of subgroups in the population is considered unknown and is inferred from the data. Theoretically, there could be an infinite number of population subgroups, although in practice there will never exceed
subgroups. This means there cannot exist more population subgroups than the total number of observations. But so as long as there are fewer than subgroups, then individuals or sites can probabilistically move out of an existing group and into to a newly created group, with its own distinct probability of detection.As with the finite mixture model, the subgroup assignment structure is encoded in indicator variables , where indicates that individual belongs to population subgroup . Consider the following conditional distribution for , where :
(1) 
where is an integer not in , is the concentration parameter, and is a discrete measure concentrated at , which translates to a point mass of probability located at . The discrete distribution (1) for the group assignment of individual or site (conditional on the group assignments of individuals or sites ) implies that each successive individual or site is assigned to an existing subgroup with probability proportional to the size of each subgroup, and is assigned to a new subgroup with probability proportional to . The product of the successive conditional distributions given by (1
) gives rise to the joint distribution of
, which is the Chinese restaurant process (CRP) prior distribution with concentration parameter (Blackwell and MacQueen, 1973; Pitman, 1995, 1996). See Li et al. (2019) for more details and interpretations of the CRP prior distribution.The strictly positive concentration parameter of the CRP distribution influences the number of subgroups, through its control over the probability that individuals or sites are assigned into new subgroups. The larger the value of , the more likely new subgroups are to be created. Figure 1 illustrates the effect of on the number of subgroups created from the CRP prior distribution, when . For generally only one or two subgroups are created, and infrequently three or more. When , the number of subgroups created generally falls between one and six. For , we seldom observe only one subgroup, and instead generally have between two and ten groups. In our nonparametric models, rather than fixing
to a specific value, we will use a hyperprior distribution for
to allow the degree of heterogeneity within the dataset itself to dictate the plausible range for .The possibility of the CRP prior using as many population subgroups as the total number of observations requires the inclusion of distinct demographic parameters in our hierarchical nonparametric models. These parameters are assumed to be independent and identically distributed, just as in the finite mixture models.
NonParametric CaptureRecapture Model
In the nonparametric capturerecapture model, detection heterogeneity is flexibly addressed using a CRP prior distribution to allow for an unknown number of population subgroups. The number of population subgroups will thereby be inferred from the data, and is not fixed to a prespecified value as in the finite mixture models described in section 2.2. We assign a prior distribution to
, the vector of subgroup indicator variables. We use a
hyperprior distribution for the CRP concentration parameter .We introduce probabilities of detection , each independent and identically following a prior distribution. In general, fewer than subgroups are actually used by the CRP prior distribution, and thus only a subset of the probabilities of detection are “active” in terms of their influence on the model likelihood calculation. As in the finite mixture models, indicates that individual is a member of subgroup , and therefore has probability of detection . The full nonparametric capturerecapture model is written as:
NonParametric Occupancy Model
Similarly, we generalize the finite mixture occupancy model to use a prior distribution for individual group assignments , and assign a hyperprior distribution for . We include the maximum possible necessary number of distinct probabilities of detection , each with independent prior distributions. Here, indicates that site is a member of subgroup , and therefore has probability of detection . The full nonparametric occupancy model is written as:
3 Model Fitting via Markov chain Monte Carlo
Mathematical descriptions of our approaches to modeling heterogeneity within a population were given in section 2. These models are formulated in a Bayesian framework as hierarchical models. Prior distributions are specified for model parameters, and the conditional distributions for discrete latent states (e.g., dead/alive status) and the data likelihoods are given. The general tool for fitting hierarchical models to data is Markov chain Monte Carlo (MCMC; Brooks et al., 2011), a stochastic sampling algorithm for generating posterior samples for model parameters, conditional on the data. Demographic inferences are subsequently performed using the empirical posterior distributions of model parameters.
Markov chain Monte Carlo is a powerful tool for fitting hierarchical models to data, but the inherent mathematical complexity of MCMC sampling often requires the use of software. We make use of the recently developed nimble R package (NIMBLE Development Team, 2019)
, which offers new degrees of freedom for algorithmic development and customization of MCMC sampling
(de Valpine et al., 2017). The ability to write custom distributions for use in hierarchical models, and the flexible nature of nimble’s MCMC engine has provided noteworthy gains in fitting ecological models (Turek et al., 2016), and more generally in the study of MCMC algorithms (Turek et al., 2017).We make use of nimble’s MCMC engine for fitting these hierarchical models, after expressing them in the BUGS language (Lunn et al., 2009). In addition, we make use of two aspects of nimble’s flexibility. For modeling of individual heterogeneity, we make use of the BNP distributions and corresponding sampling algorithms, which are a recent addition to the nimble package. Second, we enhance performance of the specific ecological models by using custom likelihood distributions provided by the nimbleEcology package (Goldstein et al., 2020) to remove latent states from the model structures.
3.1 NonParametric Distributions and MCMC Sampling
When a hierarchical model is formulated using the CRP distribution, the dCRP distribution (available in the nimble package) assigns the joint prior distribution arising from (1) to the labeling vector, . Correspondingly, a specialized sampler is assigned by nimble’s MCMC engine. Because the likelihood function of the ecological models presented in section 2 are not conjugate for the prior distribution of the cluster parameters, , the nonconjugate sampling algorithm described in Algorithm 8 of Neal (2000) is assigned to .
As described in section 2, under a nonparametric approach, the number of population subgroups is not fixed. In terms of the MCMC sampling scheme this means that the number of subgroups, and therefore the number of cluster parameters which are active, can vary with every MCMC iteration. As nimble does not support dynamic length allocation, the number of cluster parameters defined in the model must be fixed. A safe option would be to consider cluster parameters, however this is highly inefficient both in terms of computation and storage, especially for large values of . To reduce this inefficiency nimble allows the specification of cluster parameters. If upon any MCMC iteration more than groups are created, then a warning is issued. Additionally, to reduce the computational burden of the nonparametric sampling, only the active cluster parameters, , are updated.
As discussed in section 2, the concentration parameter has important implications in the clustering structure of the model. Therefore, efficient sampling of is an important matter. Although its posterior distribution does not belong to any known class of distributions, when a gamma prior distribution is considered for , a computationally efficient sampling scheme (Escobar and West, 1995, section 6) is assigned by nimble’s MCMC engine.
3.2 Likelihood Distributions Using nimbleEcology
To reduce computation time, we make use of the nimbleEcology R package (Goldstein et al., 2020) for specifying the ecological hierarchical models. The nimbleEcology package provides likelihood distributions specific for a variety of common ecological models, which are implemented as custom distributions using the nimble package. The likelihood distributions provided in nimbleEcology
include those for capturerecapture models, occupancy and dynamic occupancy models, and more generally for discrete hidden Markov models (HMMs) as appear in multistate or multievent capturerecapture
(e.g., Turek et al., 2016).For each type of ecological model, the likelihood distribution provided by nimbleEcology marginalizes over discrete latent states to directly calculate the unconditional likelihood of observed data. This allows removal of discrete latent state variables – alive/dead indicator variables in capturerecapture, and occupancy indicator variables in occupancy modelling – from the hierarchical model. This reduces model size and the necessary model computations, and increases the speed of MCMC mixing of toplevel model parameters ( in capturerecapture, and in occupancy models) to generate stronger posterior inferences in less computational time.
Using the distributions provided in nimbleEcology to remove latent states does not alter the posterior results generated from each model; it only serves to increase the speed of generating inferences. The only noteworthy difference is that posterior inference for the discrete latent states cannot be performed, since samples for these latent states are never generated. So, for example, we could not perform inference for the alive/dead status of specific individuals in the capturerecapture setting.
4 Simulations
We undertake two simulation studies to assess performance of the various approaches to modeling individual and site heterogeneity. The first study is in the context of capturerecapture models, and the second study in that of occupancy models.
Both simulations consider the effect of varying degrees of heterogeneity in detection probability between two population subgroups on the accuracy of inferences. In each, we fit the homogeneous model, 2group and 3group finite mixture models, and the nonparametric model as were presented in section 2. R code for all simulations, including the nimble specifications for each model using the likelihood distributions provided in the nimbleEcology package, are provided as supplemental material.
Next we describe the details of each simulation, and then present results.
4.1 CaptureRecapture Simulation
In the capturerecapture simulation, we consider two population subgroups each with 800 individuals, for a total population size of 1,600. We use eight observational periods, and condition on all individuals being sighted on the first observational period. Survival probability is fixed at for simulating data, and we focus on the ability of various modeling approaches to estimate survival.
One population subgroup is fixed as having individual probability of detection , where detection is conditional on being alive. Individuals in the other subgroup have a fixed detection probability , which varies between simulations. We consider values of between and in increments of , where the terminal case coincides with the detection probability of the first subgroup, and therefore represents a homogeneous population.
4.2 Occupancy Simulation
In the occupancy simulation, we consider two subgroups each with 2,000 sites, for a total of 4,000 sites. We use six independent observations of each site. The proportion of occupied sites is fixed at for simulating data, and we focus on the ability of various modeling approaches to estimate the true occupancy proportion.
One subgroup is fixed as having probability of detection on each independent site visit, where detection is conditional on a site being occupied. The other subgroup has a fixed detection probability , which varies between simulations. We consider values of between and in increments of , but with a finer resolution on the interval between and . Again, the terminal case coincides with the detection probability of the first subgroup, and therefore represents a homogeneous population.
4.3 Simulation Results
Posterior inferences were performed for individual survival probability in the capturerecapture simulation and for site occupancy proportion
in the occupancy simulation. We use the posterior median and equaltailed 95% Bayesian credible intervals for inferences under each model. Fitted models include a homogeneous model (Hom), 2group and 3group finite mixture models (FM 2 and FM 3, respectively), and a nonparametric model (NP). Each model was fit to simulated data using MCMC, as described in section
3.Results for the capturerecapture simulation appear in Figure 2. The homogeneous model consistently underestimates , more severely for larger discrepancies in detection probability between the two groups, with the discrepancy diminishing and disappearing as the detection probabilities of the two groups converge. For the lowest value of , and hence a large difference in detection probability between groups, the 3group mixture model has a slight tendency to inflate estimates of , while the opposite is true of the 2group mixture model; however this difference is minor and quickly disappears as the group difference deceases. Otherwise, the mixture models and nonparametric models are all similarly successful in generating accurate inferences of . We also note that all models exhibit a slight negative bias in their estimates of , which diminishes as and jointly approach one.
Results for the occupancy model simulation appear in Figure 3. Once again, for low values of the homogeneous model underestimates , more severely for larger discrepancies between the two groups. We also see a regime of values between (approximately) 0.167 and 0.267, in which the 3group mixture model vastly overestimates , with posterior median estimates in excess of 0.99. The precise location of this regime varied somewhat depending on simulation parameters (specifically the number of individuals in the population, and number of observation periods) but its existence persisted in all simulations. That is, there exists the potential for highly inaccurate inferences when the number of groups in a finite mixture model is chosen either too high, or too low.
The 2group mixture model (which is also the datagenerating model) generated reliable inferences for all values of . The nonparametric model admitted greater uncertainty for the upper limit of the 95% credible interval, especially for lower values of , but the posterior median estimates generated using the nonparametric model were consistently near to the true parameter value .
5 Examples
We present two realdata examples, one in the context of capturerecapture and the other in that of occupancy modeling. For each example, we fit a homogeneous model for detection probability, finite mixture models containing between two and ten population subgroups, and a nonparametric model. In addition to parameter inferences under each model, we also present the WAIC (Watanabe, 2010; Gelman et al., 2014) of each fitted model. The WAIC value is a measure of the goodnessoffit of a hierarchical model, calculated using chains of posterior MCMC samples. WAIC is measured on the scale of deviance, and therefore lower values of WAIC indicate a more parsimonious fit to the data. Finally, for each example we also present the posterior distribution for the number of population subgroups in the nonparametric model. This suggests at the degree of heterogeneity present in the data as inferred using the nonparametric model, the only model which does not predefine the number of population subgroups. The datasets used for each example are available on GitHub at: https://github.com/danielturek/bnpexamplesdata.
5.1 CaptureRecapture Example
We consider wolf (canis lupus) capturerecapture data collected in France between 1995 and 2003, as studied in Cubaynes et al. (2010). The original data contains binary detection data for a total of 87 wolves, over observation periods. Since we condition on the first sighting of each individual, we excluded the twenty individuals who were first observed on the final observation period. This leaves a total of unique individuals in the dataset. Models for detection heterogeneity described in section 2 were fit to this data using MCMC, and inference was performed for individual survival probability under each model.
Model  Median  95% BCI  WAIC 

Homogeneous  .80  (.70, .89)  237.5 
Finite Mixture 2  .91  (.79, .99)  209.1 
Finite Mixture 3  .91  (.80, .99)  201.9 
Finite Mixture 4  .92  (.80, .99)  198.3 
Finite Mixture 5  .92  (.80, .99)  198.3 
Finite Mixture 6  .91  (.80, .99)  199.9 
Finite Mixture 7  .91  (.79, .99)  200.6 
Finite Mixture 8  .90  (.79, .99)  201.4 
Finite Mixture 9  .90  (.79, .98)  202.1 
Finite Mixture 10  .90  (.78, .98)  202.8 
NonParametric  .92  (.81, .99)  199.8 
Posterior median and 95% credible intervals, as well as the WAIC value of each fitted model are presented in Table 1. The homogeneous model produces the largest WAIC value among those models considered (indicating the poorest fit to the data), and the lowest estimates of . Posterior inferences from all mixture models and the nonparametric model are nearly indistinguishable, with median posterior values for around 0.91, and 95% credible intervals of approximately . We note that the nonparametric model produces the third lowest WAIC value, being 1.5 higher than the lowest two WAIC values (equal from the 4group and 5group mixture models). Further, inferences for generated under the nonparametric model are nearly identical to those of the 4group and 5group mixture models. Given our uncertainty in the structure and degree of heterogeneity, the nonparametric model provides defensible inferences and goodnessoffit.
Figure 4
displays the posterior distribution for the number of population subgroups, as inferred by the nonparametric model. We observe a rightskewed distribution placing the bulk of the posterior mass roughly between three and ten subgroups, with a posterior median of seven, the mode appearing at six subgroups, and a 90% Bayesian credible interval for the number of subgroups as
. This distribution suggests that our consideration of finite mixture models containing between two and ten subgroups was a reasonable choice, even though this was an uninformed and somewhat arbitrary selection.5.2 Occupancy Example
For our occupancy example, we consider a second wolf (canis lupus) dataset (Louvrier et al., 2018) collected in France in 2013. Opportunistic observational data such as tracks, scat, and prey remains were collected from 3,211 grid cells, each being a 10km 10km square. Each site was surveyed on a total of independent observation occasions. The categorical data, in total, consisted of 250 “unambiguous detections”, 54 “ambiguous detections”, and 12,540 “nondetections”. We convert this to binary data, wherein both “unambiguous” and “ambiguous” detections are considered to be positive detections. Models for detection heterogeneity described in section 2 were fit to this data using MCMC, and inference was performed for site occupancy proportion under each model.
Model  Median  95% BCI  WAIC 

Homogeneous  .063  (.054, .073)  2237.0 
Finite Mixture 2  .079  (.063, .101)  2195.8 
Finite Mixture 3  .087  (.067, .122)  2178.3 
Finite Mixture 4  .092  (.068, .139)  2168.7 
Finite Mixture 5  .089  (.068, .132)  2172.4 
Finite Mixture 6  .090  (.069, .130)  2172.1 
Finite Mixture 7  .089  (.068, .125)  2173.7 
Finite Mixture 8  .087  (.068, .122)  2174.9 
Finite Mixture 9  .087  (.068, .119)  2175.9 
Finite Mixture 10  .086  (.069, .119)  2175.4 
NonParametric  .090  (.066, .359)  2166.7 
Posterior median and 95% credible intervals, as well as the WAIC value of each fitted model are presented in Table 2. The homogeneous model again produces the largest WAIC value (indicating the poorest fit to the data), and the lowest estimates of . The 2group mixture model yields the secondhighest WAIC value and also lower estimates of than the remaining models, and the 3group mixture model yields the third highest WAIC value. This suggests a nontrivial degree of heterogeneity in detection probability between sites.
The remaining mixture models ( groups) give WAIC values between 2168.7 and 2175.9, and exhibit small variations in the inferences for , in particular in the upper limit of the 95% credible interval. In contrast, the nonparametric model yields the lowest WAIC value among all models (2166.7) indicating the best fit to the data. The posterior median estimate from the nonparametric model is similar to that of the mixture models, but the credible interval is wider. In particular, the nonparametric model suggests a higher upperbound for the 95% credible interval for than any other model considered. This result may be reasonable, since WAIC suggests the nonparametric model provides the best fit to the data, and we are completely uncertain as to the degrees of detection heterogeneity which exists in this opportunistic sampling dataset.
We again present the posterior distribution for the number of population subgroups as inferred by the nonparametric model in Figure 5. This distribution is again rightskewed, this time spanning a wider range for plausible numbers of subgroups. This makes intuitive sense, as our occupancy example data considers a much larger number of observations (3,211 sites, rather than a total of 67 individuals in the capturerecapture example). The posterior distribution has a median of eight, once again the mode appears at six subgroups, and has a 90% Bayesian credible interval of .
6 Discussion
Here, we have employed a Bayesian nonparametric approach to modeling heterogeneity in the detection process of ecological models. Our approach used a Chinese restaurant process (CRP) prior distribution to model group memberships, which has the benefit of not requiring specification of the number of population subgroups a priori. Using the CRP prior, both the distribution of individuals or sites among distinct subgroups and the number of subgroups are inferred from the data. This strategy obviates any process of model selection used to choose a “best value” for the number of groups when using a finite mixture model.
The tendency of the CRP distribution towards using fewer (and hence more diverse) subgroups, or towards using a larger number of tightly specified subgroups is governed by a concentration parameter (). The effect of different values of is seen in Figure 1, although we avoid the need to choose a particular value of by specifying an uninformative Gamma prior distribution. This approach of using a CRP prior distribution to model heterogeneity is used in diverse areas of research including topic modeling (Blei et al., 2010), genomics (Qin, 2006), and evolutionary clustering (Ahmed and Xing, 2008), among others.
The examples and simulations used herein focus on the most basic forms of capturerecapture and occupancy models. Specifically, we have considered the CormackJollySeber capturerecapture model, and a singleseason static occupancy model. Although our examples focused on the basic forms of each model, the same technique are readily applied to more complex variations of each model. For example, we could apply the same nonparametric CRP prior distribution in multistate capturerecapture, or in dynamic, multiseason, or multispecies occupancy models. Similarly, one can readily extend the hierarchical models to incorporate individual or environmental covariates affecting population demographics, survey effort, or the detection processes. The approach we have used is quite general, and widely applicable to ecological models where population heterogeneity is a consideration.
We have used the nimble R package to specify the hierarchical models described herein, and to fit these models to data using MCMC. nimble provides degrees of flexibility which are not available in other software packages. Specifically, we leverage nimble’s ability to use customwritten likelihood distributions in a hierarchical model, specifically the ecological likelihood distributions provided in the nimbleEcology package. Further, nimble provides the ability to specify the sampling algorithms used by the MCMC, and even to oneself write customized sampling algorithms for use in the MCMC. Indeed, the MCMC sampling algorithms used for fitting our models – specifically those used for the CRP concentration parameter , and for the CRPdistributed group membership indicators – are themselves custom sampling algorithms written for precisely these nonparametric motifs, and added into nimble’s MCMC repertoire of algorithms. That said, nimble does not attempt to provide “canned” algorithms, nor any particular prewritten model structures, but rather an environment for writing custom functions, statistical algorithms, and distributions, and the application of these to generallyspecified hierarchical model structures. The goal of nimble is to provide a flexible model and algorithmic programming environment to facilitate highly efficient analysis of models and complex data.
It is common that heterogeneity will be present to some degree in the detection process of ecological models. In practice, this may be detected by goodnessoffit tests (Jeyam et al., 2019), or perhaps based on prior expert knowledge. When detection heterogeneity is known or suspected to be present, and suitable covariates are not available to accurately model this heterogeneity, we recommend using a BNP modelling approach. This approach alleviates the necessity of selecting the number of components used in a finite mixture model, which is an inherently difficult and oftentimes subjective process. No less, the exact number of mixture components is generally not the primary inferential focus. Use of a BNP modelling approach, as demonstrated herein, accounts for whatever degree of heterogeneity may be present while requiring no subjective choices or guesswork. This provides an effective approach to reducing bias in the resulting demographic inferences.
Acknowledgements
Support for DT was provided by Fulbright Research Scholarship Award 9183FR and also the Williams College Class of 1945 World Fellowship. Support for CW was partially provided by award NSFDMS 1622444. Support for OG was provided by a grant from CNRS and “Mission pour l’interdisciplinarité” through its “Osezl’interdisciplinarité call.” We warmly thank the French Office of Biodiversity (OFB) for sharing the wolf datasets.
References
 Dynamic nonparametric mixture models and the recurrent chinese restaurant process: with applications to evolutionary clustering. In Proceedings of the 2008 SIAM International Conference on Data Mining, pp. 219–230. Cited by: §6.
 Ecology: from individuals to ecosystems. 4 edition, WileyBlackwell. Cited by: §1.
 Ferguson distributions via Pólya urn schemes. The Annals of Statistics 1, pp. 353–355. Cited by: §1, §2.3.
 The nested chinese restaurant process and bayesian nonparametric inference of topic hierarchies. Journal of the ACM (JACM) 57 (2), pp. 1–30. Cited by: §6.
 Handbook of Markov Chain Monte Carlo. CRC Press (en). Note: bibtex: Brooks2011 Cited by: §3.
 Assessing individual heterogeneity using model selection criteria: how many mixture components in capturerecapture models?. Methods in Ecology and Evolution 3 (3), pp. 564–573. Cited by: §1.
 Importance of accounting for detection heterogeneity when estimating abundance: the case of French wolves.. Conservation Biology 24 (2), pp. 621–6. Cited by: §5.1.
 Programming with models: writing statistical algorithms for general model structures with NIMBLE. J. Comput. Graph. Stat. 26 (2), pp. 403–413. Cited by: §3.
 A hierarchical dependent dirichlet process prior for modelling birds migration patterns in the uk. The Annals of Applied Statistics n/a (n/a), pp. . Cited by: §1.
 Modeling unobserved sources of heterogeneity in animal abundance using a dirichlet process prior. Biometrics 64 (2), pp. 635–644. Cited by: §1.
 Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association 90, pp. 577–588. Cited by: §1, §3.1.
 Estimating normal means with a Dirichlet process prior. Journal of the American Statistical Association 89, pp. 268–277. Cited by: §1.
 A Bayesian analysis of some nonparametric problems. Annals of Statistics 1, pp. 209–230. Cited by: §1.
 Prior distribution on the spaces of probability measures. Annals of Statistics 2, pp. 615–629. Cited by: §1.
 Modelling latent individual heterogeneity in markrecapture data with dirichlet process priors. External Links: 1511.07103 Cited by: §1, §1.
 Understanding predictive information criteria for bayesian models. Statistics and computing 24 (6), pp. 997–1016. Cited by: §5.
 Individual heterogeneity and capture–recapture models: what, why and how?. Oikos 127 (5), pp. 664–686. Cited by: §1.
 The risk of flawed inference in evolutionary studies when detectability is less than one.. The American Naturalist 172 (3), pp. 441–448. Cited by: §1.
 nimbleEcology: distributions for ecological models in nimble. External Links: Link Cited by: §3.2, §3.
 Modelling of species distributions, range dynamics and communities under imperfect detection: advances, challenges and opportunities. Ecography 40 (2), pp. 281–295. External Links: Document Cited by: §1.
 Assessing heterogeneity in transition propensity in multistate capture–recapture data. Journal of the Royal Statistical Society: Series C (Applied Statistics). Cited by: §6.
 A tutorial on dirichlet process mixture modeling. Journal of Mathematical Psychology 91, pp. 128–144. Cited by: §2.3.
 On a class of Bayesian nonparametric estimates I: density estimates. The Annals of Statistics 12, pp. 351–357. Cited by: §1.
 Accounting for misidentification and heterogeneity in occupancy studies using hidden Markov models. Ecological Modelling 387, pp. 61–69. Cited by: §1, §5.2.
 The BUGS project: Evolution, critique and future directions. Statistics in Medicine 28 (25), pp. 3049–3067. Cited by: §3.
 Occupancy estimation and modeling: inferring patterns and dynamics of species occurrence. 2 edition, Elsevier. Cited by: §1.
 Bayesian population size estimation using dirichlet process mixtures. Biometrics 72 (4), pp. 1246–1254. Cited by: §1.
 Modelling individual migration patterns using a Bayesian nonparametric approach for capture–recapture data. The Annals of Applied Statistics 11 (1), pp. 21–40. Cited by: §1.
 Analysis of capturerecapture data. 1 edition, Chapman & Hall. Cited by: §1.
 Markov chain sampling methods for Dirichlet process mixture models. Journal of Computational and Graphical Statistics 9, pp. 249–265. Cited by: §3.1.
 NIMBLE: MCMC, particle filtering, and programmable hierarchical modeling. External Links: Document, Link Cited by: §3.
 Flexible randomeffects models using bayesian semiparametric models: applications to institutional comparisons. Statistics in Medicine 26 (9), pp. 2088–2112. Cited by: §1.
 Exchangeable and partially exchangeable random partitions. Probability Theory and Related Fields 102, pp. 145–158. Cited by: §1, §2.3.

Some developments of the BlackwellMacQueen urn scheme.
In
Statistics, Probability and Game Theory. Papers in Honor of David Blackwell
, T. S. Ferguson, L. S. Shapeley, and J. B. MacQueen (Eds.), pp. 245–268. Cited by: §1, §2.3.  The performance of mixture models in heterogeneous closed population capturerecapture. Biometrics 61 (3), pp. 868–873. Cited by: §1.
 Open capturerecapture models with heterogeneity: I. CormackJollySeber model. Biometrics 59 (4), pp. 786–794. Cited by: §1.
 Open capturerecapture models with heterogeneity: II. JollySeber model. Biometrics 66 (3), pp. 883–890. Cited by: §1.
 Selecting the Number of States in Hidden Markov Models: Pragmatic Solutions Illustrated Using Animal Movement. Journal of Agricultural, Biological and Environmental Statistics 22 (3), pp. 270–293. Cited by: §1.
 Clustering microarray gene expression data using weighted chinese restaurant process. Bioinformatics 22 (16), pp. 1988–1997. Cited by: §6.
 Site occupancy models with heterogeneous detection probabilities. Biometrics 62 (1), pp. 97–102. External Links: Document Cited by: §1, §1.
 A constructive definition of Dirichlet prior. Statistica Sinica 2, pp. 639–650. Cited by: §1.
 Automated parameter blocking for efficient markov chain monte carlo sampling. Bayesian Analysis 12 (2), pp. 465–490. Cited by: §3.
 Efficient markov chain monte carlo sampling for hierarchical hidden markov models. Environmental and ecological statistics 23 (4), pp. 549–564. Cited by: §3.2, §3.

Asymptotic equivalence of bayes cross validation and widely applicable information criterion in singular learning theory.
Journal of Machine Learning Research
11 (Dec), pp. 3571–3594. Cited by: §5.  Bayesian nonparametric mixture models using nimble. In NeurIPS Workshop on Nonparametric Bayesian Models, Cited by: §1.
Appendix A Simulation Code
library(nimble) library(nimbleEcology) ################################################## ############ CaptureRecapture Models ############ ################################################## ## number of individuals N < 800 ## number of observation periods T < 8 ## time period of first capture first < rep(1, N) ## length of observation history from first capture len < T  first + 1 ## survival probability phi < 0.7 ## detection probability pVec < rep(c(0.2, 0.8), each=N/2) ## simulate z (alive / dead status), ## and y (encounter histories) set.seed(0) z < matrix(NA, nrow=N, ncol=T) y < matrix(NA, nrow=N, ncol=T) for(i in 1:N) { z[i, first[i]] < y[i, first[i]] < 1 for(t in (first[i]+1):T) { z[i,t] < rbinom(1, 1, phi*z[i,t1]) y[i,t] < rbinom(1, 1, pVec[i]*z[i,t]) } } ## Homogeneous CaptureRecapture Model code < nimbleCode({ phi ~ dunif(0, 1) p ~ dunif(0, 1) for(i in 1:N) { y[i,first[i]:T] ~ dCJS_ss(phi, p, len=len[i]) } }) constants < list(N=N, T=T, first=first, len=len) data < list(y=y) inits < list(phi=0.5, p=0.5) Rmodel < nimbleModel(code, constants, data, inits) ## 2Group Finite Mixture CaptureRecapture Model code < nimbleCode({ phi ~ dunif(0, 1) for(k in 1:K) p[k] ~ dunif(0, 1) one ~ dconstraint(p[1] <= p[2]) for(i in 1:N) { g[i] ~ dcat(pi[1:K]) y[i,first[i]:T] ~ dCJS_ss(phi, p[g[i]], len=len[i]) } }) K < 2 ## fixed number of groups constants < list(N=N, T=T, first=first, len=len, K=K) data < list(y=y, one=rep(1,K1)) inits < list(phi=0.5, pi=rep(1/K,K), p=rep(0.5,K), g=rep(1,N)) Rmodel < nimbleModel(code, constants, data, inits) ## 3Group Finite Mixture CaptureRecapture Model code < nimbleCode({ phi ~ dunif(0, 1) for(k in 1:K) p[k] ~ dunif(0, 1) one[1] ~ dconstraint(p[1] <= p[2]) one[2] ~ dconstraint(p[2] <= p[3]) for(i in 1:N) { g[i] ~ dcat(pi[1:K]) y[i,first[i]:T] ~ dCJS_ss(phi, p[g[i]], len=len[i]) } }) K < 3 ## fixed number of groups constants < list(N=N, T=T, first=first, len=len, K=K) data < list(y=y, one=rep(1,K1)) inits < list(phi=0.5, pi=rep(1/K,K), p=rep(0.5,K), g=rep(1,N)) Rmodel < nimbleModel(code, constants, data, inits) ## NonParametric CaptureRecapture Model code < nimbleCode({ phi ~ dunif(0, 1) alpha ~ dgamma(1, 1) xi[1:N] ~ dCRP(conc=alpha, size=N) for(i in 1:M) p[i] ~ dunif(0, 1) for(i in 1:N) { y[i,first[i]:T] ~ dCJS_ss(phi, p[xi[i]], len=len[i]) } }) M < 100 ## maximum number of subgroups constants < list(N=N, T=T, first=first, len=len, M=M) data < list(y=y) inits < list(phi=0.5, alpha=1, xi=rep(1,N), p=rep(0.5,M)) Rmodel < nimbleModel(code, constants, data, inits) ################################################## ################ Occupancy Models ################ ################################################## ## number of sites N < 4000 ## number of observation periods T < 6 ## probability of occupancy pOcc < 0.7 ## detection probability pVec < rep(c(0.2, 0.8), each=N/2) ## simulate z (occupied status), ## and y (encounter histories) set.seed(0) z < rep(NA, N) y < matrix(NA, nrow=N, ncol=T) for(i in 1:N) { z[i] < rbinom(1, size=1, prob=pOcc) y[i, 1:T] < rbinom(T, size=1, prob=z[i]*pVec[i]) } ## Homogeneous Occupancy Model code < nimbleCode({ pOcc ~ dunif(0, 1) p ~ dunif(0, 1) for(i in 1:N) { y[i,1:T] ~ dOcc_s(pOcc, p, len=T) } }) constants < list(N=N, T=T) data < list(y=y) inits < list(pOcc=0.5, p=0.5) Rmodel < nimbleModel(code, constants, data, inits) ## 2Group Finite Mixture Occupancy Model code < nimbleCode({ pOcc ~ dunif(0, 1) for(k in 1:K) p[k] ~ dunif(0, 1) one ~ dconstraint(p[1] <= p[2]) for(i in 1:N) { g[i] ~ dcat(pi[1:K]) y[i,1:T] ~ dOcc_s(pOcc, p[g[i]], len=T) } }) K < 2 ## fixed number of groups constants < list(N=N, T=T, K=K) data < list(y=y, one=rep(1,K1)) inits < list(pOcc=0.5, pi=rep(1/K,K), p=rep(0.5,K), g=rep(1,N)) Rmodel < nimbleModel(code, constants, data, inits) ## 3Group Finite Mixture Occupancy Model code < nimbleCode({ pOcc ~ dunif(0, 1) for(k in 1:K) p[k] ~ dunif(0, 1) one[1] ~ dconstraint(p[1] <= p[2]) one[2] ~ dconstraint(p[2] <= p[3]) for(i in 1:N) { g[i] ~ dcat(pi[1:K]) y[i,1:T] ~ dOcc_s(pOcc, p[g[i]], len=T) } }) K < 3 ## fixed number of groups constants < list(N=N, T=T, K=K) data < list(y=y, one=rep(1,K1)) inits < list(pOcc=0.5, pi=rep(1/K,K), p=rep(0.5,K), g=rep(1,N)) Rmodel < nimbleModel(code, constants, data, inits) ## NonParametric Occupancy Model code < nimbleCode({ pOcc ~ dunif(0, 1) alpha ~ dgamma(1, 1) xi[1:N] ~ dCRP(conc=alpha, size=N) for(i in 1:M) p[i] ~ dunif(0, 1) for(i in 1:N) { y[i,1:T] ~ dOcc_s(pOcc, p[xi[i]], len=T) } }) M < 100 ## maximum number of subgroups constants < list(N=N, T=T, M=M) data < list(y=y) inits < list(pOcc=0.5, alpha=1, xi=rep(1,N), p=rep(0.5,M)) Rmodel < nimbleModel(code, constants, data, inits) ################################################## ############## Fit Model Using MCMC ############## ################################################## ## configure MCMC conf < configureMCMC(Rmodel) ## build MCMC Rmcmc < buildMCMC(conf) ## compile model and MCMC Cmodel < compileNimble(Rmodel) Cmcmc < compileNimble(Rmcmc, project=Rmodel) set.seed(0) samplesList < runMCMC(Cmcmc, niter=10000, nchains=3)
Comments
There are no comments yet.