When and where: estimating the date and location of introduction for exotic pests and pathogens

by   Trevor J. Hefley, et al.
Kansas State University

A fundamental question during the outbreak of a novel disease or invasion of an exotic pest is: At what location and date was it first introduced? With this information, future introductions can be anticipated and perhaps avoided. Point process models are commonly used for mapping species distribution and disease occurrence. If the time and location of introductions were known, then point process models could be used to map and understand the factors that influence introductions; however, rarely is the process of introduction directly observed. We propose embedding a point process within hierarchical Bayesian models commonly used to understand the spatio-temporal dynamics of invasion. Including a point process within a hierarchical Bayesian model enables inference regarding the location and date of introduction from indirect observation of the process such as species or disease occurrence records. We illustrate our approach using disease surveillance data collected to monitor white-nose syndrome, which is a fungal disease that threatens many North American species of bats. We use our model and surveillance data to estimate the location and date that the pathogen was introduced into the United States. Finally, we compare forecasts from our model to forecasts obtained from state-of-the-art regression-based statistical and machine learning methods. Our results show that the pathogen causing white-nose syndrome was most likely introduced into the United States 4 years prior to the first detection, but there is a moderate level of uncertainty in this estimate. The location of introduction could be up to 510 km east of the location of first discovery, but our results indicate that there is a relatively high probability the location of first detection could be the location of introduction.



page 16


A Bayesian spatio-temporal nowcasting model for public health decision-making and surveillance

As COVID-19 spread through the United States in 2020, states began to se...

Estimating the probability of accidental mark locations on a shoe sole

Footwear comparison is an important type of forensic evidence used to li...

Correspondence Analysis between the Location and the Leading Causes of Death in the United States

Correspondence Analysis analyzes two-way or multi-way tables withe each ...

A general framework for estimating the spatio-temporal distribution of a species using multiple data types

Species distribution models (SDMs) are useful tools to help ecologists q...

Fast Bayesian inference for large occupancy data sets, using the Polya-Gamma scheme

In recent years, the study of species' occurrence has benefited from the...

State estimation for aoristic models

Aoristic data can be described by a marked point process in time in whic...

Predicting Regional Locust Swarm Distribution with Recurrent Neural Networks

Locust infestation of some regions in the world, including Africa, Asia ...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

When a species or pathogen invades a novel environment, determining when and where it was first introduced is of interest to scientists, policymakers, and the public. Knowing the location and date of introduction would further our understanding of the ecology of the process, increase our capacity to build predictive models, and could inform policies that aim to prevent new introductions. Although the location and time of introduction is known in some cases, such as the introduction of myxomytosis to control rabbits in Australia (Ratcliffe et al. 1952), there are many examples where the location and time of introduction are unknown and must be inferred from indirect data.

The general problem of modeling the spatio-temporal dynamics of invasion is well-represented in the literature (e.g., Mollison 1986; Gibson and Austin 1996; Wikle 2003); however, when the introduction is not directly observed there is a small number of limited methods that might be used to determine the location and time of introduction from indirect data. For example, a method known as geographic profiling has been introduced to estimate the position of sources of invasive species and diseases using occurrence records (Le Comber et al. 2011; Stevenson et al. 2012; Verity et al. 2014), but this approach ignores the temporal dynamics and therefore cannot estimate the initial source (however see Mohler and Short 2012). Similarly, Bayesian models have been developed to estimate the spatio-temporal dynamics of colonization (e.g., Cook et al. 2007; Catterall et al. 2012; Broms et al. 2016), but these methods are limited. For example, the Cook et al. (2007) approach requires that the colonization time be known. Catterall et al. (2012) demonstrate how to infer the time of colonization; however, this method was developed to understand the dynamics after the initial introduction and not to obtain estimates of the time and location of the first introduction. Lastly, Caley et al. (2015) developed a Bayesian spatio-temporal model to infer the time of introduction and geographic distribution of an invasive species from occurrence records; however, this approach does not provide a framework for inferring the location of introduction.

If the time and location of introductions were known, then tools commonly used to map the distribution of species and diseases could be used to build predictive models and infer the factors that influence the occurrence of novel introductions. Novel introductions, however, are rarely directly observed. Instead, the outcome of a successful introduction is observed by the collection of indirect data such as species or disease occurrence records. In principal, indirect spatio-temporal data and models can be used infer historical events such as infection histories (Salje et al. 2018), changes in climate (Tingley et al. 2012; Tipton et al. 2016), and, as we demonstrate, introductions of pests and pathogens.

Our objective is to show how point process models, a commonly used statistical approach for modeling the distribution of species and diseases, can be placed within a hierarchical modeling framework to obtain statistically principled estimates of the date and location of introduction. Although we provide a general framework, our work is motivated by the need to understand when and where Pseudogymnoascus destructans, the pathogen that causes white-nose syndrome (WNS), was first introduced into the United States. White-nose syndrome was first detected in 2006 using photographic evidence of a bat from Howes Cave, near Albany, New York (Fig. 1; Blehert et al. 2009; Frick et al. 2010). The pathogen, P. destructans, has since spread throughout the eastern and midwestern United States resulting in high mortality rates among several species of cave-hibernating bats. Statistical methods capable of estimating the date and location where P. destructans was first introduced into the United States could further our understanding of the etiology of the disease and provide critical information on the process of introduction. For example, WNS was recently detected on a western subspecies of little brown bat (Myotis lucifugus) occurring in the state of Washington that is likely the result of a novel introduction originating from the eastern United States (Lorch et al. 2016). Our modeling approach could eventually be used to test the hypothesis that the Washington WNS case is the result of a novel introduction and, if confirmed, identify areas at high risk for future introductions.

Fig. 1. Location and infection status of individual bat samples from four species tested for white-nose syndrome. The first detection in 2006 was based on photographic evidence only. The locations of individual samples were jittered so that cases from the same location did not overlap. The four species of bats tested include: little brown bat (Myotis lucifugus), big brown bat (Eptesicus fuscus), northern long-eared bat (Myotis septentrionalis), and tri-colored bat (Perimyotis subflavus).

2 Dynamic Spatio-temporal Statistical Models

Dynamic spatio-temporal statistical models are used in fields such as ecology (Wikle 2003; Hooten and Wikle 2008; Williams et al. 2017; Hefley et al. 2017b), epidemiology (Smith et al. 2002), and meteorology (Wikle et al. 1998, 2001). The defining property of a dynamic statistical model is an explicit description of how future observations (spatial data) evolve from past observations (Wikle and Hooten 2010; Mohler and Short 2012; Cressie and Wikle 2011; Wikle et al. 2019). For example, statistical implementations of dynamic agent-based models are constructed using a set of deterministic or stochastic rules governing the actions of individual agents at each time step (Hooten et al. 2010; Hooten and Wikle 2010; Keith and Spring 2013; Heard et al. 2015; McDermott et al. 2017).

Consider the classic example of a dynamic agent-based model where an individual performs a random walk. At each time step, the individual moves a random distance in a random direction. An inherent characteristic of dynamic models is that the initial conditions must be specified. For example, when performing a random walk, the location and time the individual begins walking must be specified. Although a Eulerian (individual-based) frame of reference can be used to understand dynamic processes, a Lagrangian (population-level) perspective is also common (Hooten et al. 2017

). In ecology, for example, dynamic models of populations are expressed as partial differential equations (PDEs), which are used to describe the global behavior of a large number of individuals (

Holmes et al. 1994; Hooten et al. 2017).

In data-driven modeling applications, hierarchical (conditional) model specifications can be used to estimate unknown parameters by connecting data to the underlying dynamic process (Cressie and Wikle 2011; Wikle et al. 2019). Many applications of dynamic statistical models have focused on estimating parameters that govern the process (e.g., rate of spread), but rarely has formal statistical inference about the initial conditions been of interest (but see Hefley et al. 2017a). Instead, the initial conditions are modeled phenomenologically and treated as nuisance parameters (Wikle et al. 2003; Hooten and Wikle 2008; Williams et al. 2017; Hefley et al. 2017b).

We demonstrate how incorporating scientific knowledge about the initial conditions of a dynamic process provides novel insight into the outbreak of an infectious disease or invasion of a pest. By justifiably assuming that the pathogen or pest was introduced at points occurring at unknown locations and times, we specify a point process to model the initial conditions. Point process models are commonly used for species distribution and disease mapping (Renner et al. 2015), but can also be incorporated into hierarchical Bayesian models to learn about latent (unobserved) processes (e.g., Hooten and Hefley 2019). In the following sections, we provide a specific example using a PDE of ecological diffusion; however, our approach is applicable to commonly used dynamic models that require the specification of initial conditions such as agent-based models, contact process models, rule-based approaches (e.g., cellular automaton), integral equations, and integro-difference equations.

2.1 Ecological Diffusion

Diffusion is a dynamic process that can be used to describe the movement of particles, animals, pathogens, or other objects from a Lagrangian perspective (Hooten et al. 2017). A specific type of diffusion, called ecological diffusion (Hooten et al. 2017), has been used to explain the spread of disease (Garlick et al. 2011, 2014; Hefley et al. 2017a, b), the outbreak of pests (Powell and Bentz 2014; Hooten et al. 2013; Powell et al. 2018), the invasions of species (Neupane and Powell 2015), and the reintroduction of extirpated species (Williams et al. 2017). Ecological diffusion is expressed by the PDE


where is the intensity of the dispersing pathogen (or population), and

are the spatial coordinates contained in the vector

, and is the time. The intensity, , can be linked to the expected pathogen abundance by integrating over a spatial domain of interest (i.e., ). The diffusion coefficient, could depend on location-specific covariates that control the rate of spread and vary over time. Although equation (1) only describes ecological diffusion, PDEs can be modified to capture other important components such a population growth or long-range dispersal (Holmes et al. 1994). For example, in our WNS data example we modify equation (1) to include an exponential growth component to enable the pathogen P. destructans to increase in abundance.

Ecological diffusion describes how the spatial distribution of a pathogen or population evolves over time. To initiate this dynamic process, the conditions at the time of introduction, , must be known or estimated. In situations of disease outbreak neither or are known. A realistic assumption is to presume that the pathogen was introduced at points, which can be formulated mathematically as


where is the unknown initial number of pathogen particles (or animals) and is the unknown coordinate of the location of introduction (). In data-driven applications, the time, location, initial number of particles, and potentially the number of points are unknown, thus the parameters , , , and must be estimated.

Estimation of the initial conditions can be understood heuristically as follows. By observing data generated from a diffusion process at multiple time points (Fig. 2), it is feasible to estimate the rate at which the process is diffusing (spreading). Once the diffusion rate is estimated, equation (1) can predict the future (forecast) or past (backcast) spread of the pathogen. Unlike forecasting, backcasting ultimately results in a highly constrained state with a functional form that we have knowledge about. Specifically, if the diffusion processes started at points, then backcasting has a known endpoint that constrains the functional form of

to equation (2). This constraint, formulated as a prior or process-level distribution when using a hierarchical Bayesian statistical model, enables the posterior distribution of the time, location, and number of points to be calculated from data. Next we introduce the data set that motivates our work. Understanding the specifics of the data is required to specify an appropriate hierarchical Bayesian model that links the ecological diffusion PDE to observed data and enables us to estimate the date and location of introduction.

Fig. 2. Illustration showing an initial introduction of particles and how the one-dimensional spatial distribution of pathogen or population intensity () evolves over time according to the ecological diffusion partial differential equation. Greater values of the diffusion rate , shown in the inset plot, result in faster spread whereas spread is inhibited when . The diffusion rate depends on the environmental variables at the location (), which results in the spatial variability (roughness) visible in . The pathogen intensity, , can be linked to the expected pathogen abundance by integrating over an interval of interest (i.e., ). In this example, for all time points.

3 White-nose Syndrome Data Example

Surveillance for WNS in the United States began in 2007 using a combination of passive and active surveillance methods. During 2007–2012, samples were obtained from individual bats associated with morbidity or mortality investigations occurring year-round at underground hibernacula or on the above-ground landscape, and consisted of bat carcasses, biopsies of wing skin, or tape lifts of fungal growth on muzzles. A small number of samples were also obtained from target species (including Myotis spp., Perimyotis subflavus, Eptesicus fuscus) admitted to rehabilitation facilities or state diagnostic laboratories for rabies testing from approximately December to May. A positive or negative diagnosis of WNS in individual bats was determined by observing characteristic histopathologic lesions in skin tissues using light microscopy (Meteyer et al. 2009). A diagnosis of suspect WNS was assigned to individuals with clinical signs suggestive of the disease that either had ambiguous skin histopathology or were not evaluated in this manner but for which the causative agent, P. destructans, was detected by fungal culture or polymerase chain reaction (Lorch et al. 2010).

We illustrate our modeling approach using a subset of the WNS surveillance data collected during 2008–2011 that includes individual samples of little brown bats (Myotis lucifugus), big brown bats (Eptesicus fuscus), northern long-eared bats (Myotis septentrionalis), and tri-colored bats (Perimyotis subflavus). This resulted in individual samples for each species and a total of 397 samples with 184 positive or suspected positive cases of WNS (Fig. 1). We limited our illustration to the four most common species sampled because the remaining species were uncommon and included less than 20 individuals and no more than five positive cases for each species. In our analysis, we chose to exclude the first case detected in 2006 because it was based on photographic evidence as well as samples from 2007 because diagnostic case criteria for WNS were not established until 2008. Finally, we used data collected from 2012 for the same four species to test the ability of our model to forecast future spread of P. destructans and prevalence of WNS. This resulted in individuals for each species and a total of 54 individuals with 37 positive or suspected positive cases.

3.1 Hierarchical Bayesian Statistical Model

We take a hierarchical Bayesian approach and specify a statistical model that facilitates simultaneous parameter estimation of the location and date of introduction as well as the diffusion and growth rate (Hooten et al. 2013). Our model is tailored to the WNS data (Fig. 1), but the hierarchical Bayesian approach is flexible and can be adapted to a wide range of studies (e.g., Hobbs and Hooten 2015; Williams et al. 2017). For the WNS data, we specify the hierarchical model:


where is equal to if the tested bat is WNS-positive or suspected positive and if negative. The probability that , , depends on the pathogen intensity defined by the PDE in equation (5) and species susceptibility , where is a vector that contains a binary indicator variable identifying the species of the tested bat and is a vector of species-specific susceptibility coefficients. Differences among species susceptibility to WNS has been reported although the mechanism(s) for these differences remain unclear (Langwig et al. 2012; Johnson et al. 2015; Langwig et al. 2015). Regardless of the mechanism(s), our model accounts for heterogeneity among species susceptibility by inclusions of .

For the ecological diffusion PDE, the product of the pathogen intensity () and species susceptibility () in equation (4) is greater than zero and therefore the inverse link function maps the positive real line

to a probability between 0 and 1. For an inverse link function, we used the cumulative distribution function of a standard log-normal distribution (i.e.,

). Although any appropriate link function could be used, our choice enables efficient implementation because the full-conditional distribution for are available in closed-form (see Appendix S1 for details; Hooten and Hefley 2019). The diffusion rate ()) in equation (5) depends on the regression-type equation in (6) that has an intercept term (), coefficients (), and location-specific covariates . For spreading diseases, the diffusion rate is always positive, thus for all , which motivates the log link function in (6). In addition to ecological diffusion, we add an exponential growth component to the PDE to account for increasing (or decreasing) pathogen intensity. This addition results in a growth rate () that depends on a similar regression-type equation (7) that has an intercept term (), coefficients (), and location-specific covariates , which can be the same as or different from . To allow the growth rate to be positive or negative, we used the identity link function so that the support of is unconstrained and can take on values from to .

We assume Dirichlet boundary conditions and set at the boundary of the United States. Similar to initial conditions, boundary conditions could be an important component and future studies may need to estimate the boundary conditions. For computational efficiency, we prefer Dirichlet boundary conditions because existing techniques for estimation would require estimating a large number of parameters (Wikle et al. 2003).

To fully specify a hierarchical Bayesian model, prior distributions must be carefully chosen for all parameters. We describe and justify our choice of prior distributions in Appendix S2, with the exception of the prior distribution for the location of introduction, which we describe below. We specify the initial conditions as described in equation (2). The natural choice for the prior of the latent (unobserved) locations of introduction is a point process, which is a probability distribution that generates locations in geographic space as random variables (

Cressie and Wikle 2011). For the WNS data, we assume a single point source introduction (i.e., ), which results in a conditional point process (i.e., a point process with a known number of points, in this case just one; Cressie 1993

p. 651). For an introduction that occurs at a single location, the intensity function of the point process must be specified as a hyperparameter; we specify the intensity function of the point process to be constant across the United States which is equivalent to assuming that

P. destructans was equally likely to have been introduced at any location within the United States.

If multiple introductions occur, it may be feasible to estimate the intensity function of the point process (Appendix S1). For example, the intensity function may be modeled using a regression-type equation to estimate how location-specific risk factors influence the likelihood of novel introductions. This approach could then be used to predict the geographic areas and identify the environmental conditions where new introductions are likely to occur. Finally, our approach can be modified to handle an unknown number of point sources that are introduced on different dates by using a spatio-temporal Poisson point process (Appendix S1). This modification may eventually be used to explain the recent positive cases of WNS in the state of Washington (Lorch et al. 2016).

3.2 Model Implementation

The covariates and , which influence the diffusion and growth rate in equations (6) and (7), respectively, can include any location-specific environmental factor (e.g., temperature). We used karst terrain and deciduous forest cover calculated from digitized maps (Tobin and Weary 2004) and the 2011 National Land Cover Database (Homer et al. 2015), respectively (Fig. 3). We expect that the spatial availability of caves and deciduous forest habitat will influence the diffusion of the pathogen due to the ability of bats to disperse; however, we acknowledge many other factors could be included depending on the goals of the study.

Fig. 3. The Environmental covariates proportion of deciduous forest (top panel) and karst terrain (bottom panel).

For model implementation, we first use an analytical technique called homogenization that results in a computationally efficient approximation to the solution of the ecological diffusion PDE (Garlick et al. 2011; Hooten et al. 2013). To solve the homogenized PDE, we use a standard finite-difference scheme and chose 30 time steps per month, broad-scale grid cells of 100 km by 100 km, and the fine-scale grid cells of 10 km by 10 km (Farlow 1993; Garlick et al. 2011; Hooten et al. 2013

). Finer spatial discretization may be implemented, but the resolution we chose is a reasonable trade-off between the location accuracy of our data (mostly reported at the county-level) and what is computationally feasible using accessible high-performance computing. For implementation of our Bayesian model, we used a Markov chain Monte Carlo (MCMC) algorithm with three chains. For each chain, we obtained 320,000 samples from the posterior distribution and discarded the first 20,000 samples to ensure that the MCMC algorithm was sampling from the target (stationary) distribution. All computations were conducted using a program in R (Appendix S1 and S2) with optimized basic linear algebra subprograms and subroutines written in C++ and called from R (

R Core Team 2019).

3.3 Model and Software Validation

As noted by Cook et al. (2006), implementing non-standard models such as ours often requires developing the necessary software “from scratch,” which may increase the chance of making errors when programming. Furthermore, when using a complex statistical model it is important to check that the parameters are identifiable (Lele 2010). One way to check our software for errors and our proposed model for identifiability of parameters is to use a simulated data set that mimics the WNS surveillance data, but where the parameter values are known. If our software and model are able to recover the known values of the parameters from the simulated data that mimics the WNS surveillance data, then there is evidence that the software is error free and that the parameters in our model are identifiable. Appendix S1 contains a simulated data example to demonstrate that our model is capable of recovering known values of the parameters (see Figs. S1 & S2).

In addition to a single simulated data set that mimics the WNS surveillance data, we conducted a simulation experiment that involved simulating several thousand data sets under different settings and fitting our proposed model to each data set. A simulation experiment can be used to evaluate frequentist properties of the estimated parameters, such as bias and coverage probability, which ensures statistical inference from our model is valid (Little 2006

). We conducted a simulation experiment employing a simplified version of the model we used for WNS surveillance data, but included a setting for multiple introductions (see Table S3 and Figs. S6–S7 for simulation settings in Appendix S3). For each setting, we evaluated bias of the estimated time and location as well as the coverage probability of the 90% credible intervals (or regions) for each parameter. We expect, unless something is pathologically wrong with our model (e.g., unidentifiable parameters;

Gustafson 2015), that point estimates will be unbiased and the 90% credible intervals will cover the true value of the parameters with a probability of 0.90.

For all settings, our simulation experiment shows that point estimates of the time and location of introduction are apparently unbiased (Figs S8–S9 in Appendix S3) and the 90% credible intervals cover the true location and time with a probability of . Appendix S3 contains details of our simulation experiment, full results, and computer code needed to reproduce our experiment.

3.4 Forecast Comparison

Accurate forecasts during the early stages of disease outbreak or invasion of a novel species may be useful to inform management recommendations and surveillance sampling. In addition to estimating the location and date of introduction, an auxiliary benefit of Bayesian implementations of dynamic models is that probabilistic predictions and forecasts can be obtained as a derived quantity (Hobbs and Hooten 2015; Hefley et al. 2017b). To demonstrate the accuracy of forecasts from our model, we used the samples collected from 2012 that were reserved for this single purpose (see WNS data description for more detail). To facilitate comparison of our model with other non-Bayesian approaches, we calculated the zero-one score using the posterior mean of the probability of infection (i.e., ) and forecasted that an individual sample would be WNS “positive” if and “negative” if (Gneiting and Raftery 2007; Gneiting 2011

). Although any local and proper scoring rule could be used, we chose the zero-one score because it can be used to calculate the misclassification rate, which is easy to interpret and communicate to practitioners. The misclassification rate is calculated using the 2012 data by summing the total number of samples incorrectly classified and dividing by the total number of samples.

As a standard of comparison, we followed Hefley et al. (2017b) and compared the forecasting ability of our model to phenomenological regression-type models developed for classification of binary events (e.g., positive/negative) from the statistics and machine learning literature (James et al. 2013

). Specifically, we compared forecasts from our model to three approaches, which include: (1) a generalized linear model (GLM) that included deciduous forest cover and karst terrain at the sample location and bat species as predictor variables; (2) a generalized additive model (GAM) that included a linear effect of the same predictor variables as the GLM, but that accounted for spatial autocorrelation by using two-dimensional splines on a sphere (

Wood 2017); and (3) the same GAM specification as in 2, but with an additional smooth effect of time to account for increasing prevalence and temporal autocorrelation. Details associated with the exact model specifications and computations are reported in Appendix S2.

For each approach, we used the samples collected from 2012 and calculated the zero-one score using the maximum (or restricted maximum) likelihood estimate of the predicted probability of infection () and forecasted that an individual sample would be WNS “positive” if and “negative” if . To compare forecasts with our model, we report the misclassification rate for each approach. Although comparing forecasts among different models is useful for quantifying forecast accuracy, this comparison is does not explicitly evaluate the reliability of the inferred time and location of introduction.

4 Results

The year with the highest probability of introduction was 2002 (1992–2005; 90% credible interval), which is 4 years before the first photographic evidence documented the presence of P. destructans in the United States (Fig. 4). When compared to the location of first detection based on photographic evidence from 2006, the posterior distribution shows an area covering km that has an equal or higher probability of containing the point where P. destructans was introduced (Fig. 4). This km area represents a 74% credible region and intersects the location of first detection, but indicates that location of introduction that are equally or more probable could be up to 510 km east of the first detection. For comparison, the 90% posterior credible region covers an area of km. This km regions represents 2.3% of the area covered by the 90% prior credible region, which demonstrates a large reduction in uncertainty due to the WNS surveillance data.

Fig. 4. Posterior distributions of the location and date of introduction obtained from fitting our model to the data shown in Fig. 1. The top panel shows the joint posterior distribution of the location of introduction with red indicating a higher probability than yellow. The beige polygon outlines an area that is equally or more likely to contain the point of introduction when compared to the location where white-nose syndrome was first detected in 2006 (blue dot). The gray areas are karst formations with caves, which is likely winter bat habitat. The bottom panel shows the marginal posterior distribution of the date of introduction. The blue vertical line represents the earliest known photographic evidence of white-nose syndrome on bats in the United States. Years with low probability are not shown (i.e., before 1980).

In addition to estimating the location and date of introduction, our approach simultaneously estimates species-specific susceptibility (Fig. 5), diffusion and growth rates of the pathogen (Fig. 6), and can be used to forecast the spread of P. destructans (Fig. 7 and animation in Appendix S4). As it relates to species-specific susceptibility, our results show that the northern long-eared bat was the most susceptible to P. destructans whereas the big brown bat was the least susceptible (Fig. 5). The diffusion and growth rates are positive in many regions and show spatial variability that is attributable to deciduous forest cover and karst terrain that contains caves (cf., Fig. 3 and Fig. 6, Fig. S3). As a result of positive growth and diffusion rates, forecasts show that P. destructans spread rapidly across the contiguous United States (Fig. 7), causing an overall increase in the probability of infection for all four species of bats over time (see animations in Appendix S4).

Our forecast comparison revealed a misclassification rate of 26% for our hierarchical Bayesian model. For comparisons, the three alternative approaches we used resulted in misclassification rate of 43% for method one (GLM), 28% for method 2 (GAM that included karst, forest and species as predictors and that accounted for spatial autocorrelation) and 30% for method three (same as method 2, but accounted for temporal dynamics and autocorrelation). A visual comparison of the forecasts for all methods is given in Appendix S2 (Fig. S5).

Fig. 5. The estimated species-specific susceptibility depends on the intensity of P. destructans (). Shown is the expected value of the posterior distribution (black lines) and 95% credible intervals (gray shading).

Fig. 6. The estimated diffusion rate (top panel) determines how fast the pathogen P. destructans will spread, whereas the growth rate (bottom panel) determines how quickly the intensity of P. destructans will increase. Deciduous forest and karst terrain (Fig. 3) influence the posterior distributions of the diffusion and growth rate leading to the spatial variability appearing in the maps.

Fig. 7. Estimated intensity of P. destructans () for January 2002–2012. See Appendix S4 for animations showing the contiguous United States for all months over the period 2000–2012 as well as forecasts of the probability of infection for all four species of bats. Note that the pathogen intensity, , can be linked to the expected pathogen abundance by integrating over a spatial domain of interest (i.e., ).

5 Discussion

Our results reveal that it is possible to estimate the date and location of introduction shortly after the release of a pathogen using indirect data from 397 samples collected over 4 years. Our results also demonstrate the need for statistical thinking because the estimated date shows that the introduction likely occurred several years before the first detected case. Conversely our results show that the location of first detection is a plausible location of introduction, but there is a moderate level of uncertainty surrounding the exact location.

For WNS, the estimated location and date of introduction, along with the associated uncertainty, constitutes knowledge generated by harnessing the information within disease surveillance data using statistically valid techniques. Similar surveillance data is routinely collected for diseases of wildlife, plants, livestock, and humans, but filling knowledge gaps using these data requires statistical methods that capture dynamic processes and quantify uncertainty.

The introduction of P. destructans is suspected to have originated from a single point source due to the clonal nature of the fungal pathogen in North America (Drees et al. 2017). For other pathogens or pests, introductions may occur at multiple locations. When multiple introductions occur, we explain in Appendix S1 how our approach can be modified accordingly and potential issues that may arise when fitting the model to data. When multiple introduction occur, our approach could identify areas and environmental conditions where new pathogens or pest introductions are likely to occur–effectively providing a map that is similar to Fig. 4, but showing areas at high risk for new introductions rather than a single past introduction. With this information, new introductions could be anticipated and perhaps avoided or optimal monitoring and control programs could be established (Williams et al. 2018; Laber et al. 2018).

Although our study system was epidemiological, dynamic statistical models have been used to explain the outbreak of pests (Powell and Bentz 2014; Hooten et al. 2013; Powell et al. 2018), the invasions of a species (Neupane and Powell 2015), the reintroduction of an extirpated species (Williams et al. 2017), and to determine optimal monitoring of ecological processes (Williams et al. 2018). Such a range of applications indicate that dynamic statistical models can be used for a variety of situations including to determine the location of remnant populations that enable outbreaks of pests, estimate the date, location, and propagule size where invasive species were introduced, or determine the habitat characteristics where an extirpated species can re-establish.


We thank all state, federal and other partners for submitting samples and the U.S. Geological Survey National Wildlife Health Center for processing the samples. At the time of publication data were not available for release because of the sensitivity related to the locations of the bat species sampled. Data can be requested from the following agencies that submitted samples. Agencies that submitted samples include Acadia National Park, Aerospace Testing Alliance Conservation, Alabama Department of Conservation and Natural Resources, Alabama Division of Wildlife and Freshwater Fisheries, Alaska Department of Fish and Game, U.S. Fish and Wildlife Service (USFWS) Alligator River National Wildlife Refuge, Arizona Game and Fish Department, Arkansas Department of Health, Arkansas Game and Fish Commission, U.S. Air Force, Avon Park Air Force Range, Bat Conservation International, Bat World Sanctuary, U.S. Department of Agriculture (USDA) Forest Service, Black Hills National Forest, Bureau of Land Management, Boston University, Buffalo National River (National Park Service), California Department of Fish and Game, Canadian Cooperative Wildlife Health Centre, Channel Islands National Park, Clinton Wildlife Management Area, Colorado Department of Natural Resources, USDA Forest Service Colville National Forest, Connecticut Department of Environmental Protection, Delaware Department of Natural Resources and Environmental Control, Department of the Army, Mississippi Department of Wildlife, Fisheries and Parks, Devils Tower National Monument, East Stroudsburg University, Ecological Sciences, Inc., Florida Fish and Wildlife Conservation Commission, Frenso Chaffee Zoo, Furman University, Georgia Department of Natural Resources, USDA Forest Service Gifford Pinchot National Forest, Great Smoky Mountains National Park, USDA Forest Service Hoosier National Forest, Humboldt State University, Idaho Department of Fish and Game, Illinois Department of Natural Resources, Illinois Natural History Survey, Indiana State University, Iowa Department of Natural Resources, U.S. Air Force, Langley, Kansas Department of Parks, Wildlife, and Tourism, Lake of the Ozarks State Park, Lake Vermilion-Soudan Underground Mine State Park, Lava Beds National Monument, USDA Forest Service, Lincoln National Forest, Louisiana Department Baton Rouge, Maine Department of Inland Fisheries and Wildlife, Maryland Department of Agriculture, Maryland Department of Natural Resources, Massachusetts Division of Fish and Wildlife, Michigan Department of Natural Resources, Ministere des Ressources naturelles et de la Faune, Minnesota Department of Natural Resources, Missouri Department of Conservation, Missouri State University, Montana Department of Fish, Wildlife and Parks, Nature Conservancy, Nebraska Game and Parks Commission, Nevada Department of Wildlife, New Hampshire Fish and Game Commission, New York Department of Environmental Conservation, North Carolina Wildlife Resources Commission, North Dakota Game and Fish, North Mississippi Refuges Complex, USFWS Noxubee National Wildlife Refuge, EL Malpais National Monument, Ohio Department of Natural Resources, Ohio Wildlife Center, Oklahoma State University, Oklahoma Department of Wildlife Conservation, Olentangy Wildlife Research Station, Oregon Caves National Monument, Oregon Department of Fish and Wildlife, Oregon Parks and Recreation Department, Ozark National Scenic Riverway, Ozark Plateau National Wildlife Refuge, Pennsylvania Game Commission, Pennsylvania State Animal Diagnostics Laboratory, Pima County Natural Resources, Parks, and Recreation, Rhode Island Department of Environmental Management, Rhode Island Division of Fish and Wildlife, USFWS San Bernardino/Leslie Canyon National Wildlife Refuge, South Dakota Department of Game, Fish and Parks, Session Woods Wildlife Management Area, USDA Forest Service, Shawnee National Forest, Sleeping Bear Dunes National Lakeshore, South Carolina Department of Natural Resources, South Mountain Wildlife Rehabilitation Center, Tennessee Wildlife Resources Agency, Texas Parakeet Raisers, Texas Parks and Wildlife, University of Central Oklahoma, University of Montreal, University of Nebraska, USFWS Upper Mississippi River National Wildlife and Fish Refuge, USDA Forest Service National Forests and Grasslands in Texas, USDA Forest Service Hot Springs Arkansas, USDA Animal and Plant Health Inspection Service Wildlife Services, USDA Forest Service North Central Research Station, USDA Forest Service Pacific Southwest Research Station, USFWS Arkansas Field Office, USFWS Ecological Services, USFWS Environmental Contaminants, Utah Division of Wildlife Resources, Vermont Fish and Wildlife, Virginia Department of Conservation and Recreation, Virginia Division of Natural Heritage, Virginia Game and Inland Fisheries, Washington Department of Fish and Wildlife, West Virginia Department of Natural Resources, USFWS Wheeler National Wildlife Refuge, USDA Forest Service White River National Forest, Wisconsin Dept of Natural Resources, Wisconsin State Lab of Hygiene, Wyoming Fish and Game. We thank Cydney Alexis, Nora Bello, Hannah Birg, Jeffrey Lorch, Bob O’Hara, Daniel Walsh, and Perry Williams for insightful discussions and comments that improved this manuscript. The authors acknowledge support for this research from USGS G16AC00413. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the U.S. Government.

Supplementary Material

Appendix S1

Tutorial with R code to simulated data and fit the Bayesian ecological diffusion model.

Appendix S2

Tutorial and R code to reproduce the white-nose syndrome data analysis.

Appendix S3

Simulation experiment description, results, and R code.

Appendix S4

Supplementary animations of Fig. 7.


  • Blehert et al. (2009) Blehert, D. S., Hicks, A. C., Behr, M., Meteyer, C. U., Berlowski-Zier, B. M., Buckles, E. L., Coleman, J. T., Darling, S. R., Gargas, A., Niver, R., et al. (2009). Bat white-nose syndrome: an emerging fungal pathogen? Science, 323(5911):227–227.
  • Broms et al. (2016) Broms, K. M., Hooten, M. B., Johnson, D. S., Altwegg, R., and Conquest, L. L. (2016). Dynamic occupancy models for explicit colonization processes. Ecology, 97(1):194–204.
  • Caley et al. (2015) Caley, P., Ramsey, D. S., and Barry, S. C. (2015). Inferring the distribution and demography of an invasive species from sighting data: the red fox incursion into tasmania. PloS one, 10(1):e0116631.
  • Catterall et al. (2012) Catterall, S., Cook, A. R., Marion, G., Butler, A., and Hulme, P. E. (2012). Accounting for uncertainty in colonisation times: a novel approach to modelling the spatio-temporal dynamics of alien invasions using distribution data. Ecography, 35(10):901–911.
  • Cook et al. (2007) Cook, A., Marion, G., Butler, A., and Gibson, G. (2007). Bayesian inference for the spatio-temporal invasion of alien species. Bulletin of Mathematical Biology, 69(6):2005–2025.
  • Cook et al. (2006) Cook, S. R., Gelman, A., and Rubin, D. B. (2006).

    Validation of software for Bayesian models using posterior quantiles.

    Journal of Computational and Graphical Statistics, 15(3):675–692.
  • Cressie (1993) Cressie, N. (1993). Statistics for Spatial Data. John Wiley & Sons.
  • Cressie and Wikle (2011) Cressie, N. and Wikle, C. K. (2011). Statistics for Spatio-Temporal Data. John Wiley & Sons.
  • Drees et al. (2017) Drees, K. P., Lorch, J. M., Puechmaille, S. J., Parise, K. L., Wibbelt, G., Hoyt, J. R., Sun, K., Jargalsaikhan, A., Dalannast, M., Palmer, J. M., et al. (2017). Phylogenetics of a fungal invasion: Origins and widespread dispersal of white-nose syndrome. mBio, 8(6):e01941–17.
  • Farlow (1993) Farlow, S. (1993). Partial Differential Equations for Scientists and Engineers. Dover Publicaitons.
  • Frick et al. (2010) Frick, W. F., Pollock, J. F., Hicks, A. C., Langwig, K. E., Reynolds, D. S., Turner, G. G., Butchkoski, C. M., and Kunz, T. H. (2010). An emerging disease causes regional population collapse of a common North American bat species. Science, 329(5992):679–682.
  • Garlick et al. (2014) Garlick, M. J., Powell, J. A., Hooten, M. B., and MacFarlane, L. R. (2014). Homogenization, sex, and differential motility predict spread of chronic wasting disease in mule deer in southern Utah. Journal of Mathematical Biology, 69(2):369–399.
  • Garlick et al. (2011) Garlick, M. J., Powell, J. A., Hooten, M. B., and McFarlane, L. R. (2011). Homogenization of large-scale movement models in ecology. Bulletin of Mathematical Biology, 73(9):2088–2108.
  • Gibson and Austin (1996) Gibson, G. and Austin, E. (1996). Fitting and testing spatio-temporal stochastic models with application in plant epidemiology. Plant Pathology, 45(2):172–184.
  • Gneiting (2011) Gneiting, T. (2011). Making and evaluating point forecasts. Journal of the American Statistical Association, 106(494):746–762.
  • Gneiting and Raftery (2007) Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.
  • Gustafson (2015) Gustafson, P. (2015). Bayesian Inference for Partially Identified Models: Exploring the Limits of Limited Data. Chapman and Hall/CRC.
  • Heard et al. (2015) Heard, D., Dent, G., Schifeling, T., and Banks, D. (2015). Agent-based models and microsimulation. Annual Review of Statistics and Its Application, 2:259–272.
  • Hefley et al. (2017a) Hefley, T. J., Hooten, M. B., Hanks, E. M., Russell, R. E., and Walsh, D. P. (2017a). Dynamic spatio-temporal models for spatial data. Spatial Statistics, 20:206–220.
  • Hefley et al. (2017b) Hefley, T. J., Hooten, M. B., Russell, R. E., Walsh, D. P., and Powell, J. A. (2017b). When mechanism matters: Bayesian forecasting using models of ecological diffusion. Ecology Letters, 20(5):640–650.
  • Hobbs and Hooten (2015) Hobbs, N. T. and Hooten, M. B. (2015). Bayesian Models: A Statistical Primer for Ecologists. Princeton University Press.
  • Holmes et al. (1994) Holmes, E. E., Lewis, M. A., Banks, J., and Veit, R. (1994). Partial differential equations in ecology: spatial interactions and population dynamics. Ecology, 75(1):17–29.
  • Homer et al. (2015) Homer, C. G., Dewitz, J. A., Yang, L., Jin, S., Danielson, P., Xian, G., Coulston, J., Herold, N. D., Wickham, J. D., and Megown, K. (2015). Completion of the 2011 National Land Cover Database for the conterminous United States-representing a decade of land cover change information. Photogrammetric Engineering and Remote Sensing, 81(5):345–354.
  • Hooten et al. (2013) Hooten, M. B., Garlick, M. J., and Powell, J. A. (2013). Computationally efficient statistical differential equation modeling using homogenization. Journal of Agricultural, Biological, and Environmental Statistics, 18(3):405–428.
  • Hooten and Hefley (2019) Hooten, M. B. and Hefley, T. J. (2019). Bringing Bayesian models to life. CRC Press.
  • Hooten et al. (2010) Hooten, M. B., Johnson, D. S., Hanks, E. M., and Lowry, J. H. (2010). Agent-based inference for animal movement and selection. Journal of Agricultural, Biological, and Environmental Statistics, 15(4):523–538.
  • Hooten et al. (2017) Hooten, M. B., Johnson, D. S., McClintock, B. T., and Morales, J. M. (2017). Animal Movement: Statistical Models for Telemetry Data. CRC Press.
  • Hooten and Wikle (2008) Hooten, M. B. and Wikle, C. K. (2008). A hierarchical Bayesian non-linear spatio-temporal model for the spread of invasive species with application to the Eurasian Collared-Dove. Environmental and Ecological Statistics, 15(1):59–70.
  • Hooten and Wikle (2010) Hooten, M. B. and Wikle, C. K. (2010). Statistical agent-based models for discrete spatio-temporal systems. Journal of the American Statistical Association, 105(489):236–248.
  • James et al. (2013) James, G., Witten, D., Hastie, T., and Tibshirani, R. (2013). An Introduction to Statistical Learning. Springer.
  • Johnson et al. (2015) Johnson, J. S., Reeder, D. M., Lilley, T. M., Czirják, G. Á., Voigt, C. C., McMichael III, J. W., Meierhofer, M. B., Seery, C. W., Lumadue, S. S., Altmann, A. J., et al. (2015). Antibodies to Pseudogymnoascus destructans are not sufficient for protection against white-nose syndrome. Ecology and Evolution, 5(11):2203–2214.
  • Keith and Spring (2013) Keith, J. M. and Spring, D. (2013). Agent-based Bayesian approach to monitoring the progress of invasive species eradication programs. Proceedings of the National Academy of Sciences, 110(33):13428–13433.
  • Laber et al. (2018) Laber, E., Meyer, N. J., Reich, B. J., Pacifici, K., Collazo, J. A., and Drake, J. (2018). Optimal treatment allocations in space and time for on-line control of an emerging infectious disease. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(4):743–789.
  • Langwig et al. (2012) Langwig, K. E., Frick, W. F., Bried, J. T., Hicks, A. C., Kunz, T. H., and Marm Kilpatrick, A. (2012). Sociality, density-dependence and microclimates determine the persistence of populations suffering from a novel fungal disease, white-nose syndrome. Ecology Letters, 15(9):1050–1057.
  • Langwig et al. (2015) Langwig, K. E., Hoyt, J. R., Parise, K. L., Kath, J., Kirk, D., Frick, W. F., Foster, J. T., and Kilpatrick, A. M. (2015). Invasion dynamics of white-nose syndrome fungus, midwestern United States, 2012–2014. Emerging Infectious Diseases, 21(6):1023.
  • Le Comber et al. (2011) Le Comber, S. C., Rossmo, D., Hassan, A. N., Fuller, D. O., and Beier, J. C. (2011). Geographic profiling as a novel spatial tool for targeting infectious disease control. International Journal of Health Geographics, 10(1):35.
  • Lele (2010) Lele, S. R. (2010). Model complexity and information in the data: Could it be a house built on sand? Ecology, 91(12):3493–3496.
  • Little (2006) Little, R. J. (2006). Calibrated bayes: a Bayes/frequentist roadmap. The American Statistician, 60(3):213–223.
  • Lorch et al. (2010) Lorch, J. M., Gargas, A., Meteyer, C. U., Berlowski-Zier, B. M., Green, D. E., Shearn-Bochsler, V., Thomas, N. J., and Blehert, D. S. (2010). Rapid polymerase chain reaction diagnosis of white-nose syndrome in bats. Journal of Veterinary Diagnostic Investigation, 22(2):224–230.
  • Lorch et al. (2016) Lorch, J. M., Palmer, J. M., Lindner, D. L., Ballmann, A. E., George, K. G., Griffin, K., Knowles, S., Huckabee, J. R., Haman, K. H., Anderson, C. D., et al. (2016). First detection of bat white-nose syndrome in western North America. mSphere, 1(4):e00148–16.
  • McDermott et al. (2017) McDermott, P. L., Wikle, C. K., and Millspaugh, J. (2017). Hierarchical nonlinear spatio-temporal agent-based models for collective animal movement. Journal of Agricultural, Biological and Environmental Statistics, 22(3):294–312.
  • Meteyer et al. (2009) Meteyer, C. U., Buckles, E. L., Blehert, D. S., Hicks, A. C., Green, D. E., Shearn-Bochsler, V., Thomas, N. J., Gargas, A., and Behr, M. J. (2009). Histopathologic criteria to confirm white-nose syndrome in bats. Journal of Veterinary Diagnostic Investigation, 21(4):411–414.
  • Mohler and Short (2012) Mohler, G. O. and Short, M. B. (2012). Geographic profiling from kinetic models of criminal behavior. SIAM Journal on Applied Mathematics, 72(1):163–180.
  • Mollison (1986) Mollison, D. (1986). Modelling biological invasions: chance, explanation, prediction. Phil. Trans. R. Soc. Lond. B, 314(1167):675–693.
  • Neupane and Powell (2015) Neupane, R. C. and Powell, J. A. (2015). Invasion speeds with active dispersers in highly variable landscapes: multiple scales, homogenization, and the migration of trees. Journal of Theoretical Biology, 387:111–119.
  • Powell and Bentz (2014) Powell, J. A. and Bentz, B. J. (2014). Phenology and density-dependent dispersal predict patterns of mountain pine beetle (Dendroctonus ponderosae) impact. Ecological Modelling, 273:173–185.
  • Powell et al. (2018) Powell, J. A., Garlick, M. J., Bentz, B. J., and Friedenberg, N. (2018). Differential dispersal and the Allee effect create power-law behavior: distribution of spot infestations during mountain pine beetle outbreaks. Journal of Animal Ecology, 87(1):73–86.
  • R Core Team (2019) R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Ratcliffe et al. (1952) Ratcliffe, F., Myers, K., Fennessy, B., and Calaby, J. (1952). Myxomatosis in Australia: a step towards the biological control of the rabbit. Nature, 170(4314):7–11.
  • Renner et al. (2015) Renner, I. W., Elith, J., Baddeley, A., Fithian, W., Hastie, T., Phillips, S. J., Popovic, G., and Warton, D. I. (2015). Point process models for presence-only analysis. Methods in Ecology and Evolution, 6(4):366–379.
  • Salje et al. (2018) Salje, H., Cummings, D. A., Rodriguez-Barraquer, I., Katzelnick, L. C., Lessler, J., Klungthong, C., Thaisomboonsuk, B., Nisalak, A., Weg, A., Ellison, D., et al. (2018). Reconstruction of antibody dynamics and infection histories to evaluate dengue risk. Nature, 557(7707):719.
  • Smith et al. (2002) Smith, D. L., Lucey, B., Waller, L. A., Childs, J. E., and Real, L. A. (2002). Predicting the spatial dynamics of rabies epidemics on heterogeneous landscapes. Proceedings of the National Academy of Sciences, 99(6):3668–3672.
  • Stevenson et al. (2012) Stevenson, M. D., Rossmo, D. K., Knell, R. J., and Le Comber, S. C. (2012). Geographic profiling as a novel spatial tool for targeting the control of invasive species. Ecography, 35(8):704–715.
  • Tingley et al. (2012) Tingley, M. P., Craigmile, P. F., Haran, M., Li, B., Mannshardt, E., and Rajaratnam, B. (2012). Piecing together the past: statistical insights into paleoclimatic reconstructions. Quaternary Science Reviews, 35:1–22.
  • Tipton et al. (2016) Tipton, J., Hooten, M., Pederson, N., Tingley, M., and Bishop, D. (2016). Reconstruction of late holocene climate based on tree growth and mechanistic hierarchical models. Environmetrics, 27(1):42–54.
  • Tobin and Weary (2004) Tobin, B. D. and Weary, D. J. (2004). Digital engineering aspects of Karst map: a GIS version of Davies, WE, Simpson, JH, Ohlmacher, GC, Kirk, WS, and Newton, EG, 1984, Engineering aspects of Karst: US Geological Survey, National Atlas of the United States of America, Scale 1: 7,500,000. Technical report, US Geological Survey.
  • Verity et al. (2014) Verity, R., Stevenson, M. D., Rossmo, D. K., Nichols, R. A., and Le Comber, S. C. (2014). Spatial targeting of infectious disease control: identifying multiple, unknown sources. Methods in Ecology and Evolution, 5(7):647–655.
  • Wikle et al. (2019) Wikle, C., Zammit-Mangion, A., and Cressie, N. (2019). Spatio-temporal Statistics with R. CRC Press.
  • Wikle (2003) Wikle, C. K. (2003). Hierarchical Bayesian models for predicting the spread of ecological processes. Ecology, 84(6):1382–1394.
  • Wikle et al. (1998) Wikle, C. K., Berliner, L. M., and Cressie, N. (1998). Hierarchical Bayesian space-time models. Environmental and Ecological Statistics, 5(2):117–154.
  • Wikle et al. (2003) Wikle, C. K., Berliner, L. M., and Milliff, R. F. (2003). Hierarchical Bayesian approach to boundary value problems with stochastic boundary conditions. Monthly Weather Review, 131(6):1051–1062.
  • Wikle and Hooten (2010) Wikle, C. K. and Hooten, M. B. (2010). A general science-based framework for dynamical spatio-temporal models. Test, 19(3):417–451.
  • Wikle et al. (2001) Wikle, C. K., Milliff, R. F., Nychka, D., and Berliner, L. M. (2001). Spatiotemporal hierarchical Bayesian modeling tropical ocean surface winds. Journal of the American Statistical Association, 96(454):382–397.
  • Williams et al. (2018) Williams, P. J., Hooten, M. B., Womble, J. N., Esslinger, G. G., and Bower, M. R. (2018). Monitoring dynamic spatio-temporal ecological processes optimally. Ecology, 99(3):524–535.
  • Williams et al. (2017) Williams, P. J., Hooten, M. B., Womble, J. N., Esslinger, G. G., Bower, M. R., and Hefley, T. J. (2017). An integrated data model to estimate spatiotemporal occupancy, abundance, and colonization dynamics. Ecology, 98(2):328–336.
  • Wood (2017) Wood, S. N. (2017). Generalized Additive Models: An Introduction with R. Chapman and Hall/CRC.