1 Introduction
Studying the population evolution of arthropod pests, as well as of biological control agents, is of great importance for the crop primary production, agricultural infrastructures, spreading diseases and consequently the economy (bradshaw2016massive). Temperature and body size are two major determinants that influence the metabolic, survival, growth and reproduction rates which control the ecological processes at all levels of arthropods’ life (brown2004toward). Biological control is facilitated when the climate responses of biocontrol agents are understood, especially to temperature. The thermal thresholds for insect development can be estimated using several functional forms (kontodimas2004).
In all but the simplest cases, mathematical modelling is an indispensable tool for understanding the resulting developmental scheme (kontodimas2004). However, fitting ecological models for developmental rates is not straightforward, typically because the mathematical forms are not linear (papanikolaou2019elucidating) and the actual biochemical reactions of insects or environmental factors responsible for their growth may remain unobservable. Therefore, we adopt the Bayesian paradigm to population dynamics’ modeling and inference since it naturally accounts for latent parameters and their uncertainties. Nonetheless, there are significant challenges in designing statistical methods that work efficiently in a wide range of ecological applications. Stan (carpenter2017stan) provides a BUGSlike interface to model building and the ability to run it via different languages and operating systems.
1.1 Motivation for modeling developmental rate
Insects and mites, as ectotherms, regulate their body temperature according the environment they live in (norris2012effects). This affects the rate of metabolism, i.e., the biochemical reactions that allow the processes of production and release of energy, as well as the synthesis of necessary molecules that serve as structural or functional components (neven2000physiological). In fact, temperature affects the functionality of enzymes, which act as catalysts for these biochemical reactions (van2008slaves). Consequently, within a range of temperatures in which insects and mites develop and reproduce, various biological features are affected (broufas2001development; huey2001temperature; broufas2007development; nedvved2009temperature; papanikolaou2013temperature; papanikolaou2014life). Thus, their performance is indebted to several temporal fluctuations in terms of population size through time. The empirical finding of the initial increase in the growth rate of insects and mites in relation to temperature, followed by its sharp decline, formed the basis for the development of various mathematical models of its description (kontodimas2004). These models allow the estimation of the lower and upper thermal limits, i.e. the lowest and highest temperature, respectively, at which the growth rate is zero, as well as the temperature at which it receives its maximum value. Understanding populations’ growth rate is of importance, as their assessment can lead to decisions on their management (hare2011understanding), particularly under the pressure of climatic change (bradshaw2016massive).
1.2 Historical overview with models and techniques used in literature
This paper investigates some popular nonlinear ecological models that describe the rate of insects’ and mites’ development within a Bayesian context. We explore the computational and statistical efficiency of Hamiltonian Monte Carlo (HMC) (betancourt2017conceptual; neal1901mcmc) and Variational Bayes (VB) (blei2017variational)
, a challenging task in the present setting due to distinct features of the entertained models, including truncation. Both the widely used Gaussian distributional assumption and a newlydeveloped Inverse Gammabased version are explored in order to model the developmental rate distribution of insects and mites. We compare the models using information criteria, models marginal likelihood estimates and graphical tools. In addition, model averaging techniques are used to provide robust estimates of the parameters of interest. A distinct feature of these models is that zero count data make one parameter of interest indeterminable and model fit potentially misleading. We propose ways to overcome this indeterminacy by applying the Zero Inflated Inverse Gamma distribution while carefully connecting the probability of nonzero development to the predictor.
2 Ecological models
A variety of different linear and nonlinear equations have been used to describe the rate of development of insects and mites and to estimate their thermal limits. Such linear approximations enable the calculation of lower developmental threshold and thermal constant within a small temperature range, usually 1530^{o}C (campbell; wagner; jarovsik2002; kontodimas2004). However, the relationship between development and temperature becomes nonlinear outside that range. Thus, in order to accurately predict developmental rates across the spectrum, the use of nonlinear ecological models is required (wagner; kontodimas2004; damos2012temperature).
2.1 Nonlinear ecological models
In the present work, four commonly used nonlinear ecological models are considered. Specifically, the Bieri (bieri1983development), the Briere (briere1998comparison; briere1998modeling; briere1999novel), the Analytis (analytis1981relationship) and the Lactin (lactin1995improved) models are implemented. Developmental time is the duration between life stages of the insects or mites (wagner)
. The response variable
describes the developmental rate and it is defined as the reciprocal of the days until the completeness of a particular developmental event. Herein,denotes the predictor variable, the absolute temperature measured in Celsius degrees, while
denotes the parameter vector of the model. Typically, the expected developmental rate,
is modelled and the four aforementioned models are presented below.2.1.1 Bieri Model
In the Bieri model, the developmental rate is defined as
(1) 
where , , and are the model parameters. In particular, the values of and lie close to the real lower and upper thermal thresholds and , at which the development starts or ceases respectively. The exact values of the thermal thresholds are derived implicitly as the lower and upper roots respectively of the response function. The parameter corresponds approximately to the rate of increase in the linear model at vital temperatures (bieri1983development). The response variable in (1) is a concave function of the temperature. Parameter is defined in the interval while determines the decrease of the developmental rate at higher temperatures (bieri1983development) when exceeds unity. According to (1), we observe that:
and ,
which suggests that and leads to the following inequality
The temperature at which the maximum developmental rate occurs is called optimum and it is denoted by . In the Bieri model, it is given by
In Fig. 1 some curves are generated by the Bieri model in (1) when C and C while the other parameters vary.
2.1.2 Briere Model
The Briere model is the most popular and parsimonious model. The developmental rate is defined as
(2) 
where , and are model parameters. Particularly, and are exactly the lower and upper thermal thresholds at which the development starts or ceases respectively while parameter is an empirical constant (briere1998modeling). The response variable in (2) is again a concave function of the temperature. Parameter is defined in whereas the existence of the square root in (2) ensures that the developmental rate declines sharply at higher temperatures. The optimum temperature in the Briere model is given by
In Fig. 2 some curves are created by the Briere model in (2) when and while the parameter varies.
2.1.3 Analytis Model
The developmental rate in Analytis model is defined by
(3) 
where , , , and are parameters of this model. The exponents and in (3) are empirical constants that determine the rate of growth and decrease of the developmental rate respectively (analytis1981relationship). Both these parameter take values in but in order to reduce the computational burden we may restrict them in a subset of the form , for some constant . Finally, takes values in interval. The optimum temperature in this model is given by
The Analytis model has a multiplicative polynomial structure in which the exponents change as parameters to be estimated. Such a structure needs some empirical driven tuning when defining its thermal parameters space. Especially, in the case of one dataset, we assumed that is greater than . Some curves generated by the Analytis model in (3) are depicted in Fig. 3 when and , while parameters and vary.
2.1.4 Lactin Model
The Lactin model includes four parameters and the developmental rate is defined as
(4) 
where is associated with the upper thermal threshold since it tends to this value when tends to be zero. Parameter represents an asymptotic level of the developmental rate value in (4) that is approximated when predictor tends either to (extremely low temperatures) or to the threshold parameter . Thus, in the event that the is nonnegative, the Lactin ecological function does not have a lower thermal threshold () and at the same time as the developmental rate value in (4) is limited above zero level, at which the maximum thermal thresholds undergoes. In the case that is negative, and sample space is in the interval. Parameter is positive and it determines the descent steepness of the developmental rate. It expresses the temperature range between the value at which the response function begins to descend and the value of the parameter. When is less than one, the rate of descent is very high, although in the other case the rate of descent is lower and similar to the other ecological models. This feature triggers discontinuity and lack of fit problems. Hence, in this work we define in interval in order to avoid such problems. Parameter describes the acceleration of the function from low temperatures to the optimal temperature (lactin1995improved). The response function in (4
) has one inflection point, a maximum point at the optimum temperature and asymmetry about this point (left skewed). Also the function has a sharp drop after the optimum temperature, which is achieved by setting
in . The actual upper thermal threshold is evaluated as the higher root of response function in (4). The optimum temperature in the Lactin model is given by(5) 
In addition, the temperature at the inflection point of the Lactin curve is given by
(6) 
In Fig. 4 some curves are created by the Lactin model in (4) when and the parameters , and vary.
2.2 Ecological features of the models
There are some basic common features in all of the abovementioned ecological models.
There is no growth below the lower temperature threshold or above the upper temperature threshold . Specifically, in the case of the Briere and Analytis models, the developmental rate is positive and is defined only between the two parameters of the thermal thresholds. Also, the developmental rate in the Bieri and Lactin models can take negative values that cannot be interpreted. To accommodate these characteristics, we use initial values of the parameters that allow the ecological function to receive positive values.
The developmental rate is an asymmetric curve left skewed of its maximum point. It increases and reaches a maximum at optimal temperature while it declines rapidly down to zero at the higher temperature threshold that is considered as lethal temperature. It includes an inflection point, with the exception of the Bieri model which is a concave function of the temperature. The structure of the Lactin model makes it susceptible to a type of exponential pattern in datavalues (Fig. 4). The Briere model, on the contrary, has a specific structure, which has the first derivative of class and can hardly trace exponential changes in data values (Fig. 2
). Also, the model of Analytis has polyonimic structure as the model of Briere but it does contain more degrees of freedom because the exponents it includes are unknown parameters (Fig.
3). The Bieri model also adopts an exponential reduction after the thermal optimum threshold but can only follow the developmental increase of the dataset linearly (Fig. 1). All four ecological models have been used in the literature to provide reasonable estimates of the thermal thresholds of several anthropods’ developmental rates at various stages (bieri1983development; kontodimas2004; aghdam2011evaluation). However, the Bieri model is underutilized in the literature, so we include it in this study to gain a better understanding of ecological models that describe temperaturedependent development. Additionally, in (kontodimas2004; aghdam2011evaluation) the ecological models were compared based on the accuracy of the real data thermal threshold estimates, the adjusted coefficients of regression (), and the residual sum of squares values. In summary, the Briere and Analytis models, appear to overestimate and underestimate the upper and lower thermal thresholds, respectively, whereas the Bieri and Lactin models appear to meet the majority of the criteria used in the comparisons. Despite these minor differences, all of the above models appear to provide higher values than other models in realdata applications in the literature and we include all four in the current work.2.3 Measurement Error
Probabilistic random error due to chance and systematic error due to data with excessive zeros is added in order to include uncertainty in the ecological models already listed. In section 2.3.1, we present the notation adopted and the probability schemes implemented in this analysis.
2.3.1 The likelihood of the data
Let represent the observed developmental rates of the individual observed at temperature, where . We consider to be independent response variables counted as the reciprocal of the number of days until the development of the individual takes place and the range of its value lies in the interval.
Furthermore, let , and be the vectors of the response, the predictor and the parameters respectively then the conditional expectation is considered to vary according to the ecological function presented in (1), (2), (3) and (4) respectively.
The data distribution is denoted by
In this study we consider the Gaussian and the InverseGamma distributions as the distribution of the response data
. Specifically, the Gaussian distribution is used broadly in the literature as a good approximation to most unimodal distributions with finite variance due to the general form of the Central Limit Theorem. Thus, it can be used as to approximate a more complicated model likelihood of the data. The nonlinear model of the independent response rates has the Gaussian distribution given by
(7) 
where the mean of the Gaussian likelihood is driven by the respective ecological model, whereas its standard deviation
is considered as an unknown parameter. We consider a weakly informative prior distribution for like: the InverseGamma . The nonlinear model used in (7) is a constant variance model among different temperatures. In addition, it allows for zero observed rate values, which occur when the stage of the insect does not change in perpetuity. Such modeling, however, has the disadvantage of lack of interpretability in the case of estimated negative rate values.On the other hand, the InverseGamma distribution is a plausible alternative for modeling positive observed rates, as it handles positive values that describe ratios such as developmental rates whose inverse are positive counts (such as days passed until the expected development occurs) that can be described by the Gamma distribution. The nonlinear model of the independent response rates has the InverseGamma distribution given by
(8) 
where is the shape parameter of the InverseGamma distribution. The mean of the distribution equals , whereas the variance equals . The InverseGamma likelihood in (8) is a natural alternative to model observed positive rates. Herein, its mean is driven by the respective ecological function, whereas its variance depends both to the shape parameter and the ecological function as well, which allows the variance to be temperature dependent. For the prior distribution of the shape parameter we have chosen a weakly informative .
2.3.2 Zerorates case
There are situations in which any insect development does not occur throughout the cohort study. This is indicated by zero values in the response variable, which can theoretically be interpreted as the number of days required for the insect to move to its next stage never ending, implying that the developmental rate is the reciprocal of infinity. In such cases, the MCMC sampling procedure can be extended either to include prior information about the case of no development by adjusting the prior knowledge of the parameters concerned or to include a zeroinflation scheme. For this work, we suggest the use of a Zero Inflated Inverse Gamma distribution, which gives zero value with probability for the insect observed at
temperature. Particularly, the probability density function of the observation
is:(9) 
where is a constant positive parameter that has the opposite order of magnitude of the sample mean , while is the inflation point of logit link function where the probability of zero is equal to . The constant is associated with the constant and can be chosen so that the zero rate of development matches the probability at a predetermined level like 0.9. According to the real data example presented below in the results section, the proposed values that satisfy the above criteria for constants and are and , respectively.
2.3.3 Priors
In the Bieri model (1) the prior distribution of the parameters , , and are shown in Table 1 respectively.
In the Briere model (2), the transformation is considered instead of the original parameter . Weakly informative priors are considered for , and as shown in Table 1 respectively. In the Analytis model (3) the transformation is considered instead of with the weakly prior distribution . Additionally, the prior distributions utilized for parameters , , and are given in Table 1 respectively.
In the Lactin model (4) a new parametrization is used. Hence, the new transformed parameters are:
are considered instead of the original parameters , and , respectively. By their definition, the new parameters are taking values in , and . The latter is derived by the definition of the optimum temperature in the Lactin model (5) and the fact that . In addition, should the temperature at the inflection point be a positive number, then from (6) we can get that . The prior distributions considered are shown in Table 1 respectively for each ecological model.
Models  Parameters  Priors  Models  Parameters  Priors 
Bieri  U(0,1)  Briere  
Analytis  
Lactin  
U(0,1)  
U(0,1) 

transformed parameter used.
3 Application and results
We adopt the Bayesian paradigm to inference and model selection. For concreteness we give the discernible features of the main tools we utilize in section C in the appendix.
3.1 Data application
The reciprocals of the days counted express the observed rates of the insects and mites from egg to adult stage. The range of the observed rates are at [0, 1] interval. The zero value indicates that no development is observed. This situation implies the existence of a truncation point which gives explicitly an upper bound for the upper thermal threshold.
Two datasets are used in this analysis. They concern the study of the twospotted mite, Tetranychus urticae (barber2003biocontrol) and the fourteenspotted ladybird beetle, Propylea quatuordecimpunctata (papanikolaou2013temperature). The Tetranychus urticae data developmental rates have minimum 0.019 at and maximum 0.182 at . The Propylea quatuordecimpunctata dataset consist of 105 beetles and their developmental rate have minimum 0 at and maximum 0.111 at .
3.2 Tetranychus urticae
There are 247 mites that have inserted adult stage until the study ended. The four ecological models are used both assuming the Gaussian and the Inverse Gamma distributions for the data. The information criteria along with the estimates of the marginal likelihood for each model are provided in Table 2.
AIC  DIC  LooCV  WAIC  BIC  (se)  (se)  (se)  

Bieri  1663.2  1667.7  1672.0  1672.1  1638.6  796.1 (28.6)  777.2 (19.0)  792.6 (7.1)  
Briere  1592.2  1595.8  1595.8  1595.9  1574.7  759.6 (39.0)  728.2 (12.4)  758.0 (6.9)  
Analytis  1738.3  1736.9  1744.0  1744.0  1717.3  815.9 (40.5)  830.2 (11.0)  821.6 (12.3)  
Lactin  1673.9  1675.6  1689.1  1689.1  1638.8  797.8 (40.0)  821.8 (23.2)  793.8 (56.4)  

Bieri  1715.6  1716.0  1717.2  1717.2  1698.1  837.4 (42.6)  839.7 (9.0)  832.1 (10.0)  
Briere  1749.4  1749.4  1746.3  1746.3  1735.4  846.7 (46.1)  874.0 (12.8)  848.0 (3.5)  
Analytis  1889.0  1887.3  1889.0  1889.1  1867.6  904.2 (42.6)  928.2 (16.3)  903.9 (26.2)  
Lactin  1911.4  1911.0  1910.5  1910.6  1893.9  920.9 (30.3)  956.4 (17.0)  922.0 (6.5) 

denotes the logarithm of estimated marginal likelihood via Importance sampling,

denotes the logarithm of estimated marginal likelihood via Power posterior,

denotes the logarithm of estimated marginal likelihood via Bridge sampling.
Both Information criteria and marginal likelihood results clearly suggest that the Inverse Gamma distribution has better fit than the Gaussian distribution at Tetranychus urticae dataset across all the ecological models. Furthermore the Lactin model with the Inverse Gamma distribution stands out against all the other cases. In the Gaussian case, the Analytis model excels.The use of the Inverse Gamma distribution, on the other hand, not only increases the Analytis model’s efficiency but also adds flexibility to the Lactin model. The Briere model has the poorest criteria values (Table 2). Although the information criteria have fairly indicated the Briere model as the most suitable for the data, when we concentrate on the upper thermal threshold estimate, its performance is poor compared to the other ecological models shown in Fig. 5. Nevertheless, even though there is a clear picture concerning information criteria values between the ecological models, there is some variation between marginal likelihood estimates within ecological models in Table 2.
Posterior means, credible limits and the effective sample size (neff) of the thermal thresholds and the deviance of the four ecological models are summarised in Tables 5 and A.1 using the Inverse Gamma and the the Gaussian distribution, respectively. In addition, the HMC, ADVI meanfield and ADVI fullrank methods appear alternately in each column for each parameter of interest.
The credible limits estimates between the four ecological models do not overlap in Tables 5 and A.1. Bieri model has greater limits and Lactin model gives negative value estimates in the Inverse Gamma case Table 5. The credible limits for overlap between Bieri and Analytis whereas the estimates are lower for Lactin model and greater for Briere model. The credible limits overlap for Bieri and Lactin in Table A.1, has higher values for Briere and lower for Analytis model.
The credible limits using ADVI meanfield and fullrank methods overlap with the HMC credible limits in most cases in Tables 5 and A.1. Also the ADVI fullrank method seems to be closer to the HMC estimates as in Briere and Lactin models in Table 5. However in general it has worst fit than the corresponding fitted model using HMC and also gives wider credible intervals.
Bayesian model averaging provides alternative estimates for the parameters of interest, combining the predictive efficiency of all four ecological models. The derived weights are shown in Table A.1 whereas the BMA estimates and their credible intervals are given on Table A.1 and are divided into the data case of the Gaussian distribution and the data case of the Inverse Gamma distribution. The predictive bias of the Analytis and Lactin models appear to affect the model averaging estimates of , and the deviance in the Gaussian and the Inverse Gamma case respectively. Also different weights based on (20) give almost identical credible limits. However using ELBO based BMA weights do not give robust estimates of the parameters of interest which is not unexpected since the ELBO is a lower bound estimate of the marginal likelihood. Furthermore, the time elapsed until the completion of the algorithm for ADVI methods is up to 52 seconds, while for the HMC method is at least 476 seconds for Tetranychus dataset as shown in Table A.1. The average difference between the working times of HMC and each ADVI method is around 3500 seconds (58 minutes) whereas between the meanfiled and fullrank is around 18 seconds. These findings illuminate the VBI fullrank method’s timeefficacy.
angle=0,scale=0.6
neff
neff
Mean
&
Cr. I.
Bieri
9.4
(9.3, 9.6)
9.3
(9.1, 9.4)
9.3
(9.0, 9.5)
19741
33.7
(33.2, 34.1)
145.5
(134.8, 149.8)
203.2
(114.5, 326.6)
8888
Briere
6.6
(6, 7)
6.6
(6.4, 6.7)
6.5
(6.1, 7)
7933
36.7
(35.2, 38.5)
36.6
(36.1, 37)
36.6
(35, 38.3)
6978
Analytis
4.2
(4, 4.6)
7.6
(7.5, 7.8)
6.0
(4.4, 9.4)
9977
33.6
(33.3, 34)
99.9
(96.6, 103)
79.9
(16.8, 239.3)
7876
Lactin
18.7
(18.7, 18.7)
7.8
(7.6, 8.1)
18.7
(18.8, 18.7)
18393
32.0
(31.8, 32.2)
32.1
(31.8, 32.3)
33.9
(33.8, 34.1)
12228
neff
dev
neff
Mean
&
Cr. I.
Bieri
35.8
(35.4, 36.5)
150.8
(149.2, 152.5)
209.0
(114.5, 326.6)
7738
1720.9
(1724.9, 1713)
1652.9
(1658.6, 1637.6)
1652.4
(1658.5, 16)
10228
Briere
45.0
(43.1, 47.3)
44.8
(44.2, 45.4)
44.9
(42.8, 47)
6935
1753.3
(1757, 1745.9)
1752.9
(1757.1, 1741.4)
1751.9
(1756.7, 1740.4)
8638
Analytis
35.0
(35, 35.1)
99.9
(96.6, 103.1)
80.9
(17.5, 240.6)
9038
1894.0
(1899.2, 1885)
1670.9
(1691.8, 1641.4)
1495.1
(1687.4, 149.5)
12370
Lactin
38.4
(38.1, 38.9)
42.6
(42.4, 42.8)
38.6
(38.2, 39)
8284
1916.3
(1920.6, 1908.3)
1705.8
(1747.8, 1650.5)
1890.0
(1918.5, 1791.8)
9938

deviance of the model given the data.

effective sample size.
In Fig. 5 the predicted posteriors versus Tetranychus urticae data using Inverse Gamma distribution are shown for the ecological models.
The adaptivity in data across predictor values is clear in all the ecological models, where the credible limits are adjusted to data variance in each temperature level. On the other hand, in the Gaussian case (Fig. 7) the variance of the posterior remains constant across predictor values.
3.3 Propylea quatuordecimpunctata dataset
There are 17 out of 105 insects that have not altered their egg status until the study ended. These cases are observed at and are indicated by zeros in the response variable y. The Inverse Gamma distribution is not defined for zero response values. On the other hand, the Bieri and Lactin models can also generate negative values. Hence, in order to account for the presence of zeros, we use a Zero Inflated Inverse Gamma model and choose parameter initial values so as the scale of the Inverse Gamma to remain positive. Both the Gaussian and the Zero Inflated Inverse Gamma distributions are used for the data along with the four ecological models. The various model selection tools are summarised in Table A1.
Both Information criteria and marginal likelihood results clearly suggest that the Zero Inflated Inverse Gamma distribution has better fit than the Gaussian distribution at Propylea quatordicempuncata only at the Analytis model. Furthermore Bieri and Analytis models stand out in the Gaussian and the Inverse Gamma case respectively, whilst the Briere model has the poorest criteria values (Table A1). Nevertheless, even though there is a clear picture of the goodness of fit between the ecological models, there is some variation between marginal likelihood estimates within ecological models in Table A1.
In the Gaussian model, the Bieri and Lactin models stand out according to the information criteria and marginal likelihood estimates, the Analytis model follows, whilst the Briere model has the lower criteria values. On the contrary, in the Zero Inflated Inverse Gaussian model, the Analytis model is a better choice according to the marginal likelihood and the information criteria values, while Bieri, Lactin and Briere models follow respectively. Lactin and Bieri can be interpreted by their ability to track an exponential datavalue decrease. Additionally, the Bieri model manages to capture the linear increase in the Propylea quatuordecimpunctata dataset. The Briere model, on the other hand, lacks performance due to its unique structure, which requires that the decline of the response variable as the temperature rises be of the form . The Analytis model has similar multiplicative structure to the Briere model, but it is more complex model since the exponents of its model are unknown variables that make it adaptive and liable to data change especially in the Inverse Gamma case. Nevertheless, the Analytis model needs some tuning when defining its thermal parameters space. This is necessary to avoid allowing values close to zero, which would result in parameter underestimation and poor fit in the Analytis model. Especially, in the case of the Propylea dataset, we aprior assumed that is greater than . Posterior means, credible limits and the effective sample size of the thermal thresholds and the deviance of the four ecological models are summarised in Tables 6 and A.1 using the Zero inflated Inverse Gamma and the Gaussian distribution respectively. Specifically, the HMC, ADVI meanfield and ADVI fullrank methods appear alternatively in each column of Tables (6 and A.1) for each parameter of interest. Briere model has greater 95% credible limits for while Lactin model gives negative value estimates. The results for overlap for Bieri and Analytis in the Zero Inflated Inverse Gamma case whereas the estimates are lower for Lactin model. The credible limits overlap between Bieri and the other ecological models both in the Gaussian and the Zero Inflated Inverse Gamma case. The zerorate values can be naturally modelled in the case of the Gaussian distribution. Though only the Bieri and Analytis models give wider credible intervals. For the rest of the models, the credible limits difference is lower than three degrees . In addition, in Zero Inflated Inverse Gamma case, the Analytis model has higher information criteria values, while the Bieri and Lactin models follow as shown in Table A1, respectively. All four ecologocal models incorporate terms into their structure in the form of products which directly affect the scale of the Inverse Gamma distribution. Thus, not only the mean but also the variance of the statistical model change along and adapt the flunctuations for each temperature level as shown in Fig. 9. This adaptivity is evident across all the four ecological models in use.
The outcomes for 95% credible limits do not overlap, while Briere model has larger values. The Lactin model is not suitable to provide reasonable estimates as it allows for negative values (Table 6). The results for are sorted in ascending order from Bieri, Lactin, Analytis, and Briere. We observe that the credible limits for the overlap between models and do not exceed the maximum observed temperature . Concerning the Zero Inflated Inverse Gamma case, the posterior predictive probabilities of nonzero entries, which are the probabilities of anthropods’ development, are shown in Figure 6. We observe that the probabilities tend to unity near estimates whereas become negligible out of the thermal threshold limits except for the Lactin model case which does not give robust estimates for . In addition, the 95% credible limits estimates using ADVI meanfield and fullrank methods overlap with the HMC estimates with exceptions at the Lactin model in which the fullarank method seems to be closer to the HMC estimates (Tables A.1 and 6). In many cases, the ADVI fullrank method agrees with HMC. It seems to give more robust estimates for mean in Gaussian case and Bieri model. However it has worst fit than HMC and also gives wider credible intervals.
angle=0,scale=0.6 neff neff Mean & 95% Cr. I. Bieri 9.9 (9.4, 10.3) 9.4 (9.3, 9.6) 9.2 (8.7, 9.7) 9138 32.7 (32.2, 33.8) 73.0 32.6, 146.6 109.4 (28.7, 320.1) 16552 Briere 11.1 (10.3, 11.9) 11.2 (10.8, 11.5) 11.2 (10.4, 12.0) 14192 29.3 (29.2, 29.5) 29.3 (29.2, 29.4) 29.3 (29.2, 29.5) 17300 Analytis 5.0 (4.0, 7.0) 5.1 (4.1, 8.7) 4.4 (4.2, 4.9) 352 33.5 (32.3, 34.9) 33.0 (29.71, 37.0) 32.2 (30.5, 33.9) 1240 Lactin 133.51 (157.5, 116.8) 11.6 (1.2, 59.2) 12.5 (409.8, 8.5) 11511 30.9 (30.8, 31.1) 39.7 (1.1, 211.2) 25.9 (5.5, 77.7) 9847 neff dev neff Mean & 95% Cr. I. Bieri 34.4 (33.7, 34.9) 18.1 (9.3, 52.6) 16.0 (8.8, 52.5) 12972 722.3 (725.7, 714.8) 318.5 (360.4, 318.8) 323.8 (572.1, 309.3) 9962 Briere 35.0 (34.8, 35.0) 34.9 (34.8, 35.0) 34.8 (34.8, 35.0) 23972 563.8 (568.0, 555.9) 563.1 (567.7, 552.0) 561.9 (567.4, 549.7) 14077 Analytis 33.6 (32.5, 34.9) 33.6 (30.3, 37.6) 32.5 (30.7, 34.4) 1409 736.2 (742.1, 729.1) 411.6 (482.3, 216.9) 684.5 (742.5, 618.8) 1117 Lactin 34.9 (34.7, 35.0) 25.3 (1.1, 76.3) 45.1 (935.4, 52.5) 16361 703.6 (708.1, 695.0) 247.8 (0.0, 294.8) 247.8 (0.0, 294.8) 9179

deviance of the model given the data,

effective sample size.
Furthermore, the time elapsed until the completion of the algorithm for ADVI methods is up to 48 seconds, while for the HMC method it is at least 321 seconds for Propylea dataset as shown in Table A.1. The average difference between the working times of HMS and each ADVI method is around 6300 seconds (105 minutes) whereas between the meanfield and fullrank is around 12 seconds. These findings illuminate both the VBI fullrank method’s computational timeefficacy.
4 Conclusions
Conclusions of the current work can be summarised into five groups.
4.1 Comparison of Computational Methods
Although ADVI methods agree with HMC in many of cases and are distinguished in timeefficacy, robust estimates do not appear to be available for all parameters of interest, with the ADVI fullrank method providing better estimates than the ADVI meanfield mehod. The two datasets application show that they are sensitive to the initialization of the parameters and the rootfinding algorithm used, in particular for more complicated models such as Analytis and Lactin. For example, if the root initial suggestion is positive, the root generated is also positive, even if the HMC results are negative, like in Tables(5, 6, A.1). However, in the same tables, the VBI fullrank method’s credible limits overlap with those of the HMC method more frequently than the VBI meanfield method, indicating that the ADVI fullrank method has little better computational efficiency.
4.2 Distribution of the data
Inverse Gamma distribution is a promising alternative to the classical Gaussian distribution in the analysis of the rate of development of anthropods in the absence of zero data and also provides adaptive temperaturelevel variance predictions across all ecological models. It also reinforces the flexibility of the Lactin model. In addition Inverse Gamma distribution naturally models the developmental rates that are defined as the reciprocal of positive real numbers.
4.3 Zeroinflation performance
Zero Inflated Inverse Gamma distribution can be used in the presence of nondevelopment (zero data) after defining the probability of generating zeros. In this way the developmental rates are naturally modeled and also the predictions are variance adaptive to the temperature levels. Finally, it performs better than the classical Gaussian model in the Analytis case as far as information criteria scores and marginal likelihood estimates are concerned.
4.4 Model comparison
For the model comparison, both the information criteria and the marginal probability values agree on the prioritization of performance across ecological models at both datasets. Nevertheless, there is some variation between marginal likelihood estimates within ecological models and their is no clear pattern across the marginal likelihood approaches.
There is no standard ecological model that has better fit measured by information criteria and the estimated marginal likelihood than the others in all the situations. However, even in the presence of zero response values, the Bieri ecological model appears to have robust results using the Gaussian distribution and stands out when there is a linear increase of the developmental rate like in the Propylea quatuordecimpunctata dataset case. In addition, it provides interpretable estimates of interest parameters with credible intervals, often overlapping with that of the Analytis or the Lactin model, using weakly informative priors. By contrast, the Bieri model can not follow the nonlinear increase in the rate of development of the Tetranychus data, as do the more sophisticated Analytis or / and Lactin models.
Lactin model has better fit in the case of exponentially fluctuations of the developmental rate, but is inadequate for estimation. Analytis model needs tuning for sample space so as not to underestimate it and provides robust estimates for the other thermal parameters involved in the model.
Briere and Analytis models by definition includes the true threshold parameters and along with a functional truncation between these threshold parameters, which do not affect their performance. Moreover, the Briere model appears to have the worst marginal likelihood values, as well as the worst predictive performance for both Gaussian and Inverse Gamma distributions, despite having the least parameters and being widely used. It also seems to generate estimates with small 95% credible intervals and higher values. Although statistical inference based on the information criteria have reasonable values for Briere model, when we focus on a particular parameter of interest such as the , its performance may be poor compared to the other ecological models as shown in Fig. 5. In conclusion, the Briere model can be used to obtain quick estimates of thermal parameters involved in arthropod evolution without the need for any preadjustment (parameters tuning or / and transformation).
The Analytis model needs some initial boost by either tuning the thermal parameters or using informative priors in order to achieve good predictions. In particular, a lower bound of for should be imposed in order not to generate negligible or negative values. An upper bound for should also be imposed if there are zero responses at a higher temperature. It is only after these parameters have been calibrated that the Analytis model stands out in its performance by comparing the other models and generates wide 95% credible intervals for parameters of interest. Additionally, due to its functional form, Analytis model can fit any polynomial shaped increase and decrease of the developmental rate. In comparison to the simple Gaussian case, Analytis explains the zeros generated scheme better. However, it provides smaller estimates than the other three models, for the parameter. In summary, the Analytis model performs well to predict the upper thermal parameters only after adjusting the and sample space interval limits to prior information from the data.
In addition, the Lactin model does not include an analytical lower thermal threshold in its form. The lower thermal threshold estimates are implicitly derived as the lower root of the Lactin model applying the appropriate model parameters values. Thus, the Lactin model in some cases generates negative estimates of that can not be interpreted in ecology studies. Also parameters and fluctuations trigger statistical model discontinuity situations that need to be addressed taking into account the dataset structure. Moreover, it has smaller effective sample size especially in the case of Gaussian distribution. On the contrary, it generates a wider 95% credible interval for compared to other ecological models and also performs better in the case of zero responses at high temperatures. In summary, the Lactin model is applicable when research focus on robust estimates of the optimum and the upper thermal threshold parameters without the need for prior tuning.
4.5 BMA performance
BMA technique is used to address the difficulty that the true causal structure in ecological data is often not known and often several models are used to describe the development of anthropods. The mean square error in modelaveraged predictions depends on each model’s predictive bias and variance, the covariance in predictions between models, and the uncertainty about model weights (dormann2018model). In this work, we adopt weights in (20). The BMA weights summarised in Tables(A.1, A.1) clearly support the Bieri and Analytis models for the Propylea data and the Analytis and Lactin models for the Tetranychus data. As a result, the BMA estimates are close to the estimates of these models respectively for the two datasets.
5 Discussion
There are several issues that we deal with in terms of arthropod developmental rates, and there are some concerns for future research. To begin with, comparing nonlinear nonnested models with varying numbers of parameters and truncated mean structures, as well as excessivezeros in data, is not a trivial task.
5.1 Comparison of Computational Methods
We use the Bayesian paradigm, as well as some contemporary computational approaches such as the HMC, ADVImeanfield, and ADVIfullrank, to address not only irregular and truncated mean structures, but also the uncertainty of zero generation in the data. Although ADVI techniques are gaining popularity in the scientific community due to their fast and computationally inexpensive approximations to the posterior distributions, they do not provide robust estimates of all the parameters of the models we study. The HMC method, on the other hand, gives robust estimates even under these specific model and data structure conditions.
5.2 Distribution of the data
Furthermore, for the data generation scheme, we suggest the Gaussian and the Inverse Gamma distributions. The Gaussian option provides sensible estimates that can be used even though the data has a lot of zeros. Inverse Gamma, on the other hand, not only naturally models developmental rates, which are characterized as the reciprocal of positive real values, but also provides variance adaptivity across temperature fluctuations. Allso for each ecological model we define the Zero Inflated Inverse Gamma density so as to model data with an excessive number of zeros. When comparing models involving Gaussian and Inverse Gamma or Zero Inflated gamma distributions, we find that the second performs better in nonzero data cases, while the first performs better in all but the Analytis model.
5.3 Model comparison
In addition, we address the model comparison challenge by employing the Information criteria not only to assess model goodness of fit to data while accounting for model complexity, but also to assess ability of models to make predictions on new data using the leave one out cross validation technique. Additionally, we use marginal likelihood approximations of the various models to determine which one is best supported by the data. Finally, we plot the predicted posteriors alongside the observed data points to visualize the prediction ability of the suggested models.
5.4 BMA performance
The predictive bias of the weighted models, as well as the uncertainty about the weights, affect BMA results using Information criteria as weights according to (20) weights as outlined in section C.2.3 in the appendix. As a result, the BMA approach does not provide estimates that differ from the bestperforming models.
5.5 Future research
Among things for future research is to select consistently the most robust candidate between models given sufficiently many data samples, in a sensitivity analysis perspective. Moreover, the ADVI methods, can be extended so as to capture more sophisticated mean structures, like the ones we present in the current work. In addition, probability density that generates zeros in the Zero Inflated Inverse Gamma distribution can be modeled in more complex ways, such as using hyperparameters and hierarchical effects across temperature levels. Finally, Rpackages that include the suggested models and perform the analysis presented in this paper are to be created.
References
Appendix A Tables with summaries and posterior estimates of the statistical methods used
a.1 Propylea quatuordecimpunctata dataset
Tetranychus urticae
Gaussian
Bieri
Briere
Analytis
Lactin
HMC
2011
476
9852
5672
ADVImeanfield
8
1
52
1
ADVIfullrank
45
6
38
44
Inverse Gamma
HMC
1375
761
3231
4917
ADVImeanfield
2
4
15
27
ADVIfullrank
2
34
14
44
Propylea quatuordecimpunctata
Gaussian
Bieri
Briere
Analytis
Lactin
HMC
4335
20218
4913
1426
ADVImeanfield
13
7
3
16
ADVIfullrank
48
11
12
16
Zero Inflated Inverse Gamma
HMC
1078
321
15291
3073
ADVImeanfield
41
5
9
10
ADVIfullrank
52
5
39
18
Gaussian distribution aic_w dic_w loocv_w waic_w bic_w elbo_mf elbo_fr Bieri 4.9E17 9.4E16 2.3E16 2.4E16 8.1E18 3.1E53 7.3E45 Briere 1.9E32 2.3E31 6.6E33 6.9E33 1.1E31 2.9E27 1.5E18 Analytis 1 1 1 1 1 0.001 1 Lactin 1.0E14 4.9E14 1.2E12 1.2E12 9.0E18 0.999 3.5E39 Inverse Gamma distribution Bieri 3.0E43 4.5E43 1.1E42 1.0E42 3.0E43 1.6E20 1.0E200 Briere 6.6E36 8.1E36 2.2E36 2.1E36 3.8E35 8.9E19 1.0E171 Analytis 1.4E5 7.1E6 2.1E5 2.1E5 1.9E6 1 1 Lactin 0.999 0.999 0.999 0.999 1 1.0E36 2.7E58
Gaussian distribution aic_w dic_w loocv_w waic_w bic_w elbo_mf elbo_fr Bieri 0.999 0.968 0.991 0.991 1 1 1 Briere 3.3E34 7.3E30 5.5E35 5.5E35 4.8E33 5E203 3E218 Analytis 1.1E31 1.5E32 3.8E32 3.8E32 4.0E31 9.0E269 1.0E210 Lactin 5.9E04 0.032 0.009 0.009 1.1E05 5.0E159 3.0E162 Zero Inflated Inverse Gamma distribution Bieri 9.3E5 0.001 0.001 0.001 3.5E4 5.1E53 3.4E72 Briere 1.7E38 3.9E38 7.3E38 7.2E38 2.4E37 1 1.1E23 Analytis 1 0.999 0.999 0.999 1 3.3E35 1 Lactin 1.7E8 2.1E8 5.8E8 5.8E8 6.4E8 3.0E114 2.0E140
angle=0,scale=0.6 Gaussian Model aic_w dic_w loocv_w waic_w bic_w elbo_mf elbo_fr Mean 4.4 4.4 4.4 4.4 4.4 4.4 4.4 Cr.I. (4.0, 5.3) (4.0, 5.3) (4.0, 5.3) (4.0, 5.3) (4.0, 5.3) (4.0, 5.3( (4.0, 5.3) Mean 32.9 32.9 32.9 32.9 32.9 32.6 32.9 Cr.I. (32.7, 33.3) (32.7, 33.3) (32.7, 33.3) (32.7, 33.3) (32.7, 33.3) (32.4, 32.8) (32.7, 33.3) Mean 35.3 35.3 35.3 35.3 35.3 36.9 35.3 Cr.I. (35.1, 35.5) (35.1, 35.5) (35.1, 35.5) (35.1, 35.5) (35.1, 35.5) (36.5, 37.3) (35.1, 35.5) dev Mean 1750.3 1750.3 1750.3 1750.3 1750.3 1693.9 1750.3 Cr.I. (1758.0,1738.3) (1758.0,1738.3) (1758.0,1738.3) (1758.0,1738.3) (1758.0,1738.3) (1703.1,1679.7) (1758.0,1738.3) Inverse Gamma Model Mean 4.2 4.2 4.2 4.2 4.2 4.2 4.2 Cr.I. (4.0, 4.6) (4.0, 4.6) (4.0, 4.6) (4.0, 4.6) (4.0, 4.6) (4.0, 4.6) (4.0, 4.6) Mean T_opt 32.0 32.0 32.0 32.0 32.0 33.6 33.6 Cr.I. (31.8, 32.2) (31.8, 32.2) (31.8, 32.2) (31.8, 32.2) (31.8, 32.2) (33.3, 33.9) (33.3, 33.9) Mean 38.4 38.4 38.4 38.4 38.4 35.0 35.0 Cr.I. (38.1, 38.9) (38.1, 38.9) (38.1, 38.9) (38.1, 38.9) (38.1, 38.9) (35.0, 35.1) (35.0, 35.1) dev Mean 1916.3 1916.3 1916.3 1916.3 1916.3 1894.0 1894.0 Cr.I. (1920.6,1908.3) (1920.6,1908.3) (1920.6,1908.3) (1920.6,1908.3) (1920.6,1908.3) (1899.2,1885.2) (1899.2,1885.2)

deviance of the model given the data.
angle=0,scale=0.65 Gaussian distribution aic_w dic_w loocv_w waic_w bic_w elbo_mf elbo_fr Mean 10.6 10.6 10.6 10.6 10.6 10.6 10.6 Cr.I. (9.7, 11.4) (9.7, 11.4) (9.7, 11.4) (9.7, 11.4) (9.7, 11.4) (9.7, 11.4) (9.7, 11.4) Mean 32.6 32.5 32.6 32.6 32.6 32.6 32.6 Cr.I. (32.0, 33.5) (31.9, 33.4) (32.0, 33.5) (32.0, 33.5) (32.0, 33.5) (32.0, 33.5) (32.0, 33.5) Mean 35.0 35.0 35.0 35.0 35.0 35.0 35.0 Cr.I. (34.98, 35.0) (34.98, 35.0) (34.98, 35.0) (34.98, 35.0) (34.98, 35.0) (34.98, 35.0) (34.98, 35.0) dev Mean 818.8 818.5 818.7 818.7 818.8 818.8 818.8 Cr.I. (833.1, 800.8) (832.4, 801.1) (832.9, 800.9) (832.9, 800.9) (833.1, 800.8) (833.1, 800.8) (833.1, 800.8) Inverse Gamma distribution Mean 5.0 5.0 5.0 5.0 5.0 11.1 5.0 Cr.I. (4.0, 7.0) (4.0, 7.0) (4.0, 7.0) (4.0, 7.0) (4.0, 7.0) (10.3, 11.9) (4.0, 7.0) Mean 33.5 33.5 33.5 33.5 33.5 29.3 33.5 Cr.I. (32.3, 34.9) (32.3, 34.9) (32.3, 34.9) (32.3, 34.9) (32.3, 34.9) (29.2, 29.5) (32.3, 34.9) Mean 33.6 33.6 33.6 33.6 33.6 35.0 33.6 Cr.I. (32.5, 34.9) (32.5, 34.9) (32.5, 34.9) (32.5, 34.9) (32.5, 34.9) (34.8, 35.0) (32.5, 34.9) dev Mean 736.2 736.2 736.2 736.2 736.2 563.8 736.2 Cr.I. (742.1, 729.1) (742.1, 729.1) (742.0, 729.1) (742.0, 729.1) (742.1, 729.1) (568.0, 555.9) (742.1, 729.1)

deviance of the model given the data.
angle=0,scale=0.60 neff neff Mean 95% Cr. I. Bieri 10.6 (9.7, 11.4) 10.7 (10.6, 10.8) 10.7 (10, 11.4) 7906 32.6 (32, 33.5) 32.2 (32, 32.5) 32.2 (31.8, 32.5) 7620 Briere 13.1 (12.4, 13.8) 13.3 (12.9, 13.8) 8.2 (1.8, 15.6) 4 29.7 (29.6, 29.8) 29.7 (29.5, 29.8) 27.3 (11.7, 30.1) 4 Analytis 6.3 (4.1, 10.6) 21.5 (7.7, 55.2) 6.9 (5.4, 9.3) 3742 33.2 (32.1, 34.8) 39.2 (18.1, 89.1) 33.2 (31.7, 37.7) 4435 Lactin 154.4 (234.7, 119.3) 146.2 (154.3, 138.1) 159.2 (243.1, 119.0) 7560 31.2 (30.9, 31.3) 31.1 (31.1, 31.2) 31.1 (30.9, 31.3) 8024 neff dev neff Mean 95% Cr. I. Bieri 35.0 (34.98, 35.02) 33.8 (33.6, 33.9) 33.7 (33.5, 33.9) 7194 818.7 (833.1, 800.6) 672.7 (685.1, 657.4) 672.5 (686.2, 657.1) 11673 Briere 35.0 (35.0, 35.01) 35.0 (34.8, 35) 32.9 (14.3, 35) 113 660.5 (667.0, 649.7) 535.8 (543.2, 520.5) 429.7 (535.0, 402.1) 14478 Analytis 33.6 (32.5, 34.9) 42.5 (32.7, 94.5) 36.1 (4.6, 133.2) 9370 674.1 (690.2, 666.3) 160.0 (261.3, 98.2) 491.9 (595.9, 293.6) 5164 Lactin 35.0 (35, 35.04) 35.0 (35.0, 35.1) 35.0 (35.0, 35.1) 21622 809.8 (823.7, 792.2) 804.7 (820.5, 784.8) 778.5 (807.5, 702.6) 12147

deviance of the model given the data,

effective sample size.
angle=0,scale=0.65 neff neff Bieri 10.5 (10.2, 10.8) 9.8 (9.5, 10.2) 9.8 (9.2, 10.3) 15089 33.0 (32.7, 33.4) 158.6 (144.4, 165.4) 183.8 (76.4, 368.5) 8164 Briere 9.3 (8.6, 9.9) 9.3 (9.1, 9.5) 9.3 (8.6, 9.9) 8690 33.1 (32.5, 33.7) 33.0 (32.8, 33.2) 33.1 (32.4, 33.7) 8028 Analytis 4.4 (4.0, 5.3) 4.3 (4.2, 4.4) 5.3 (4.8, 6.1) 8757 32.9 (32.7, 33.3) 33.0 (32.9, 33.1) 33.2 (32.6, 33.7) 41 Lactin 10.4 (10.1, 10.7) 3.9 (5.8, 1.9) 0.7 (4.0, 3.0) 11526 32.6 (32.4, 32.8) 32.2 (32.0, 32.3) 32.0 (31.8, 32.2) 9884 neff dev neff Bieri 36.3 (35.9, 36.8) 164.9 (161, 169) 190.6 (80, 376.1) 7561 1677.1 (1683.2, 1666.6) 1461.0 (1467.7, 1443.6) 1461.7 (1467.7, 1456.3) 10087 Briere 40.0 (39.3, 40.9) 39.9 (39.6, 40.1) 40.0 (39.1, 40.9) 7562 1602.2 (1606.8, 1593.4) 1593.2 (1605.7, 1569.1) 1598.5 (1606.3, 1583.6) 9618 Analytis 35.3 (35.1, 35.5) 35.2 (35.2, 35.3) 35.2 (35.1, 35.4) 8246 1750.3 (1758.0, 1738.3) 1744.7 (1755.6, 1724) 1709.1 (1741.9, 1606.9) 10954 Lactin 36.9 (36.6, 37.3) 38.3 (38.1, 38.5) 38.2 (37.9, 38.5) 7797 1693.9 (1703, 1679.6) 1764.0 (1780.3, 1738.1) 1776.2 (1784.9, 1762) 8227

deviance of the model given the data,

effective sample size.
AIC  DIC  LooCV  WAIC  BIC  (se)  (se)  § (se)  

Bieri  804.7  783.6  816.1  816.1  786.1  419.9 (19.6)  433.4 (18.0)  462.4 (9.2)  
Briere  650.5  649.5  658.3  658.3  637.3  299.2 (24.6)  235.4 (6.7)  231.4 (1.2)  
Analytis  662.1  637.1  671.4  671.4  646.1  283.7 (24.0)  312.2 (10.0)  274.2 (33.7)  
Lactin  789.8  776.8  806.7  806.7  763.3  390.1 (33.5)  328.1 (23.0)  334.7 (1.7)  

Bieri  716.1  719.3  719.3  719.3  702.8  334.5 (25.9)  342.9 (31.5)  333.2 (5.0)  
Briere  560.8  561.4  561.4  561.4  550.2  268.2 (23.2)  265.7 (4.6)  251.7 (1.9)  
Analytis  734.7  732.4  732.4  732.4  718.8  340.8 (26.7)  371.3 (14.5)  335 (31.2)  
Lactin  698.9  699.1  699.1  699.1  685.6  329.2 (19.5)  329.3 (11.8)  313.7 (18.8) 

denotes the logarithm of estimated marginal likelihood via Importance sampling,

denotes the logarithm of estimated marginal likelihood via Power posterior,

§denotes the logarithm of estimated marginal likelihood via Bridge sampling.
Appendix B Posterior prediction graphs for both datasets
Appendix C Bayesian inference
In the current section, we provide details of statistical model specifications in the Bayesian framework. Specifically, we provide a brief description of the most important features of Stan’s implementation of HMC and VB so the reader can get familiar with the tools that Stan is based on. We then provide model selection and model averaging techniques in order to compare the different ecological models and to explore and interpret the parameters of interest combining predictions from all the four of them.
c.1 HMC and VB techniques
The HMC method is a Monte Carlo technique that uses Hamiltonian dynamics in order not only to explore efficiently the target distribution but also to propose distant samples in the parameter space that do not exclusively depend on the current state of the Markov chain like considered in previous MCMC methodology
(neal1901mcmc). In this way, many performance challenges are tackled like either the low convergence due to the fact that the parameter space with high posterior support is not reached or the poor exploration of the target distribution due to its multimodality or its shape irregularities. The existence of Hamiltonian dynamics in the system of the joint density mass function allows the preservation of volume and hence adequate trajectories can be used to define complex mappings of the parameter state space without the need to account for cumbersome Jacobian calculations (barber2003biocontrol). Thus, by carefully designing automated trajectory realizations in the Hamiltonian dynamics system, the Stan team managed to create an augmented software called STAN (carpenter2017stan), which materializes HMC sampling for the parameters of interest.Moreover, independently, Automatic Differentiation Variational Inference (ADVI) technique is referred to the machine learning field
(blei2017variational). The latter is a VB method and posterior target distributions are approximated by choosing the closest distribution to a parametric family of tractable distributions like the exponential family via optimization. In order to achieve this pointwise estimations of the parameters of the family distribution are estimated so that the Kullback–Leibler ‘KL’ divergence function is minimized. Specifically, since the KL divergence is intractable the Evidence Lower Bound is maximized instead (blei2017variational).c.2 Model selection and model averaging
In case there are models under consideration, the posterior probability of the suitability of the model given the data , is given by
(10) 
where expresses the prior belief for the model, while the is the model evidence also called ‘marginal likelihood’ and it can be interpreted as the likelihood over the space of models, marginalizing out the parameters of the model. The ratio of the marginal likelihoods between two models
is called Bayes factor and is the posterior odds of the null hypothesis that the
model fits better the data than themodel does when the prior probability of the null is onehalf
(kass1995bayes). The Bayes factor is used in order to give evidence for the most probable model given the data, when comparing two alternative models (kass1995bayes).In the case of Bayesian model averaging, the model selection uncertainty is taking into account in statistical inference. The joint posterior of the model with vector of parameters , using the Baye’s rule, is proportional to the product of the likelihood of the model times the prior distribution of the parameters times the prior distribution (that expresses our uncertainty of the model)
(11) 
The uncertainty of the model given the data can then be reexpressed via the posterior probability defined by the ratio in (10), in case of existence of multiple models. The posteriors of the models can be thought as weights that are critical to the Bayesian model averaging as they can be used to extract useful weighted statistics from the data distribution while at the same time taking into account model uncertainties. Estimation of model parameters and model uncertainties can be achieved either by directly sampling from the joint posterior (11) or by approximating the marginal likelihood of each model independently and, accordingly, by controlling the outcomes with a view to formulating proper weights and proceeding with the calculation of the averaged statistics. For the former case, techniques like the reversible jump MCMC (green1995reversible; george1997approaches) and variable selection samples (carlin1995bayesian; kuo1998variable; dellaportas2000bayesian) are used. On the other hand, for the later case, techniques of marginal likelihood approximations via thermodynamic integration (friel2008marginal), bridge sampling (meng1996simulating), importance sampling (perrakis2014use) or via information criteria perspective like in (kass1995bayes) are used.
c.2.1 Information criteria
The criteria used in the current work are the Akaike information criterion ‘AIC’, the Bayesian information criterion ‘BIC’, the Deviance information criterion ‘DIC’, the Watanabe–Akaike information criterion ‘WAIC’ and the Leaveoneout crossvalidation criterion ‘Loocv’. Briefly, these criteria provide an approximation of the expected log predictive density for newcoming data while correcting bias from data usage. In particular AIC (Akaike1100705) is defined as the difference
where is the Maximum Likelihood estimate ‘MLE’ of the parameters of the model. Similarly, BIC (schwarz1978estimating) is defined as the difference
where n is the sample size. In addition, DIC (spiegelhalter2002bayesian) is defined as the following difference:
where is the posterior mean of the parameters of the model, whereas is the effective number of parameters and it is evaluated following (spiegelhalter2002bayesian; gelman2013bayesian) by either
or
where is an expectation over the posterior density of , whereas is the variance of the log posterior density of the observed data y, over the posterior density of . Furthermore, WAIC (watanabe2010asymptotic) is defined as the following difference:
where is the expectation of the probability at data point over the posterior distribution of the parameters of the model, whereas is the effective number of parameters and it is evaluated following (gelman2013bayesian) by either
or
where is the expectation over the logarithm of the posterior density of at data point, whereas is the variance of the log posterior density of the observed data , over the posterior density of .
Furthermore, LooCV (gelman2013bayesian) is defined as the following difference:
where is the expectation of the probability at data point over the posterior distribution of the parameters of the model. The posterior distribution is sampled considering a partition of the data, leaving one data value () out of the original sample. The is a bias correction of the measure and it is evaluated following (gelman2013bayesian) by
where is the expectation of the probability at data point over the posterior distribution of the parameters of the model leaving out the observation.
c.2.2 Marginal likelihood estimation techniques
The marginal likelihood can be viewed as a normalizing constant of the density within the ecological model that includes parameters . In the general scheme of comparing the two densities and of interest, as in the case of the Bayes factor of two models or in the case model’s prior and posterior, a general path from to can be created according to (gelman1998simulating) using a class of densities on the same space indexed by the continuous auxiliary variable say . A key formula that links the corresponding normalizing constant and the unormalized density that correspond to the sampling distribution is given by:
(12) 
where the expectation is with respect to the sampling distribution .
In addition, another key formula of estimating a ratio of normalizing constants has been of great interest such as in computing likelihood ratios in hypothesis testing or in computational physics in estimating free energy differences, or in computing the Bayes factor in Bayesian framework (meng1996simulating). The general formula is as follows:
(13) 
where and expectations are with respect to posterior distribution densities and respectively, whereas the bridge function is defined and overlapped by the common support of the former densities.
Using general formulas (12) and (13), several marginal probability evaluation schemes of the model are derived (gelman1998simulating). The power posterior sampling (friel2008marginal), the importance sampling (perrakis2014use) and the bridge sampling (meng1996simulating; overstall2010default) techniques are used for the current work.
In the power posterior case, formula (12) is integrated with respect to variable t and is substituted with density . The marginal likelihood is derived from logarithmic scale by the equation:
(14) 
where expectation is taken with respect to the density which is defined as the power posterior at temperature t (friel2008marginal).
Additionally, the standard error
for the ith model estimator (14), as shown in section E of the appendix is approximated by:where is the time after discretization and is the standard error of the corresponding estimation given in (14).
In the case of importance sampling, the marginal likelihood is assessed by introducing the proper density function g. After sampling from the proposed density function g, the marginal likelihood is calculated as with respect to g as:
Following (perrakis2014use), we use the density equal to and the auxiliary importance function g used is as follows:
(15) 
where are the parameters of the model divided into two blocks and which may or may not be independent. The right hand side of (15) is the product of the marginal posterior distributions of the block. Thus, the marginal probability which gives the target value is given as follows:
Comments
There are no comments yet.