1 Introduction
Several issues arise when analysing public health data with participants nested within spatial areas, three of which we discuss here. The first is that risk factors and covariates related to individual level health outcome, may be given on different spatial resolutions. It is better to use all data on their finest level given in order to avoid the ecological and atomic fallacy (see, e.g. [ecological_multilevel], [lost_aggre], [aggr_level]). The second issue concerns the spatial effect. Two main types should be considered: spatial autocorrelation and spatial heterogeneity. Spatial autocorrelation is closely related to the first law of geography ([geography_law], [spatialwall]) and can arise for a number of reasons, for instance, due to unmeasured/unavailable confounders. The second type is the grouping effect or spatial heterogeneity ([context_heterogeneity]). Sometimes, the difference between the two types of spatial effects is not obvious. The third issue concerns the dynamic effect for longitudinal data. Not only do individuals change over time, but (characteristics of) geographical units as well. These changes are the consequences of unavailable/unmeasured area level covariates that may be changing over time and should, therefore, be considered ([long_multilevel], [repeated_dynamic_group]). Modelling and inference should exploit the dynamic and the spatial effect.
Multilevel modeling (mostly 2level) offers a framework to take advantage of the hierarchical structure of the data ([multi_public], [group_diff]) and is widely used in many applications in the medical, educational and social science ([lit_multilevel_ed], [social_multilevel], [med_multi], [psycho_research]
). The classical multilevel model (MLM CL2) has multiple advantages, but also some limitations. The inclusion of random effects at the highest hierarchical level helps to adjust for fixed effect estimates, for missing or unavailable covariates with spatial structure. A central assumption of interest in the MLM CL2 is that these random effects are mutually independent across spatial units, modelling spatial heterogeneity only. In the Bayesian framework, this consists of assuming exchangeable unstructured priors on area level random effects. This is equivalent to a global smoothing towards the mean effect. However, for positive covariation between adjacent units, positions of the spatial units are important; local smoothing using adjacency may be more appropriate. Figure
1 in the supplementary material shows the importance of accounting for spatial autocorrelation instead of the spatial heterogeneity only.Several works in the literature of health geography and spatial epidemiology, mostly for crosssectional analysis have recognised both the spatial heterogeneity and the spatial autocorrelation and modified the MLM CL2 accordingly: ([browne_MLM], [hongwei_MLM], [autolog]). However, most of these methods are neither useful when substantial autocorrelation is expected nor do they exploit spatial adjacency. Therefore, structured priors on the area level random effect recognizing adjacency are of interest. Markov Random Field models, particularly the Conditional Autoregressive (CAR) models ([CARspatial], [hoef_CAR]
), are suitable for this task. CAR priors have the advantage of facilitating random effects analysis under a Markov Chain Monte Carlo (MCMC) sampling approach. Recent works that combined the advantages of multilevel models and Markov Random Field for crosssectional data include
[multilevel_interat1], [multilevel_interat2], [multilevel_interat3] and the S.CARmultilevel() function of the CARBayes package ([carspatialonly]). However, spatial confounding ([fixed_eff_you_love1], [fixed_eff_you_love2]) was not explicitly examined in the previous papers. We consider the multilevel model with restricted CAR model (MLM RCAR) to account for this spatial confounding.To examine the change over time of the health outcome in longitudinal studies, classical level growth models (also MLM CL3) are widely used when participants are nested within geographical units. This consists of including arealevel random effects that are independent across spatial units in the model equation, to account for the spatial effect. In addition to aforementioned limitations of MLM CL2 for crosssectional analyses, the geographical units are conceived here as entities that exert an effect that changes systematically with time. It is more realistic to assume that areas undergo structural and functional changes over time that are more stochastic in nature. These changes are not well captured by the classical model MLM CL3 ([repeated_dynamic_group]).
The contribution of this paper is twofold: The first and main objective is to further develop Multilevel Conditional Autoregressive models for longitudinal data (MLM tCARs) by combining some already existing models: The classical 3level growth model and some CAR models to account for spatiotemporal random effects. We compare the developed MLM tCARs in the longitudinal setting to the classical (3level) multilevel growth model (MLM CL3) in terms of accuracy and stability in the coefficient estimates under the presence and absence of spatial effects in data via a simulation study. The second goal is to provide epidemiologists with a practical decision tree to help choose the appropriate models that produce more stable coefficient estimates in case of nested data, for crosssectional as well as longitudinal studies. To achieve this, we also compare the MLM CARs for crosssectional data to the MLM CL2 in a simulation study. In contrast to existing studies ([multilevel_interat3]),we additionally analyse whether adding the spatially correlated error term (CARprior term) to a linear model shrinks or enlarges/inflates the true regression coefficients. The Restricted Multilevel Conditional Autoregressive Model (MLM RCAR) ([restricted_CAR]) accounts for spatial confounding. Thus, we compare the MLM RCAR, the MLM CAR and the MLM CL2 for crosssectional data.
We organized our article as follows: In the first part of section 2 we shortly present the MLM CL2 and MLM CARs for crosssectional studies. In the second part of section 2, we introduce the developed MLM tCARs for longitudinal data and then present selected comparison criterion for epidemiological and public health applications. The simulation strategy follows. The method section ends with a computation subsection, which summarizes the techniques used for simulation and model fitting. In section 3 we outline the results of the simulation studies. The decision tree for the most appropriate methods, when spatial effects are involved, is provided in section 4. In section 5 we apply the MLM CL3 and MLM tCARs for longitudinal studies comparatively to the analysis of the association between depression and greenness in the longitudinal Heinz Nixdorf Recall Study (HNRS). Section 6 is dedicated to the discussion of the results of the simulation studies and the application.
2 Methods
2.1 Multilevel Conditional Autoregressive models for crosssectional data
For multilevel models for crosssectional data, the study region is partitioned into nonoverlapping areal units. Data are available on individuals, with individuals within area , . The following model equation accounts for both spatial heterogeneity and spatial autocorrelation:
(1) 
where , follow an inverse Gamma distribution,
follows a uniform distribution
. Here, is a vector of intercept and covariates for individual in area . includes individual level as well as area level covariates. is the binary adjacent matrix. The area level random effect vector has the Leroux structure given in equation (1) ([leroux], [congdon_hier], p. ). corresponds to a lack of spatial interdependence, i. e. the classical multilevel model (2 levels, MLM CL2). By contrast, leads to the intrinsic CAR ([congdon_hier], p. ) model.Spatial confounding can be interpreted in linearmodel terms as a collinearity problem. When spatial confounding is detected, spatial smoothing is restricted to the orthogonal complement of the fixed effect of area level variables, called Restricted Conditional Autoregressive model (RCAR). This is recommended and described in [fixed_eff_you_love1], [fixed_eff_you_love2]. Choosing the restricted CAR models for the area level random effect instead of the CAR models leads to the MLM RCAR.
2.2 Multilevel Conditional Autoregressive models for longitudinal data
Our model combines the advantages of multilevel models and the properties of the Markov Random Field to accurately model the random effects in a multilevel growth model. It is defined as follows:
(2) 
is a continuous outcome for individual in the spatial unit at the measurement occasion . is a vector of intercept and covariates for individual in area at the measurement occasion . includes individual level and timevarying variables, individual level and timeinvariant variables, area level and timevarying variables, and area level and timeinvariant variables. It also includes a deterministic function of time , defining the individual growth. Note that g could be defined differently for each individual. with prior is the vector of regression coefficients. There are three levels represented by sources of random variation at the area level, the individual level and the observation level in the random effect part. We assume the outcome for individuals in spatial areas and time
are conditionally independent and normally distributed.
and , independent of , are random effects for the individuals to account for persontoperson differences in the repeated measures and how they change over time. is the vector of random effects for time period , which evolves over time and makes use of temporal and spatiotemporal dynamics. Independent of , and , it decomposes the spatial effects into spatial heterogeneity and spatial autocorrelation. There are existing models for in the literature with this structure, mostly used in disease mapping models so far, including ([Knorr_phi], [Lee_phi], [Bernardinelli_phi]). Such models also help to understand the dynamic of the spatial effect. We select three models for for our model in equation (2) and for our simulation study: the CAR ANOVA model, the convolution model, and the classical model. This lead to the MLM CAR ANOVA, MLM CONV und MLM CL3, respectively. Together, the MLM CAR ANOVA and MLM CONV are called MLM tCARs.The CAR ANOVA model for is given by
(3) 
where , . , as well as , see [Knorr_phi]. The model decomposes the spatiotemporal variation in the data into components: an overall spatial effect common to all time periods, an overall temporal trend common to all spatial units, and a set of independent spacetime interactions. This is an ANOVAtype decomposition. This model is appropriate if the goal is to estimate overall time trends and spatial patterns. Here, the spatiotemporal autocorrelation is modelled by a common set of spatial random effects and a common set of temporal random effects . Both are modelled by the CAR prior proposed by [leroux]. is the same as in subsection 2.1 while is the binary temporal adjacency matrix defined by if and otherwise, . Additionally, the model can incorporate an optional set of independent spacetime interactions
. For the inverse Gamma prior for the variance components
, values for and could be , .The convolution model for is defined by
(4) 
where , . . It is an extension of the model originally by [CARspatial]. The spatial autocorrelation parameter as well as the spatial heterogeneity parameter are allowed to vary over time. i.e., the model produces a separate effect for each area and each time point.
The Classical linear growth model is the (3level) model for which , reducing to , depending on the goal of the analysis. Here, and capture the area level random effect or unexplained spatial variation for the intercept and slope respectively.
2.3 Model comparison criterion for epidemiological studies
Since we are mostly interested in associationbased (regression) models applied to observational data, we concentrate on comparison methods for which uncertainty in coefficient estimates are examined.
Let and
be the estimated regression coefficients and standard deviations respectively, using regression model
to fit the generated data. The bias for the coefficient , using model is defined by , where are the true regression coefficients. The Root Mean Square Error assesses the quality of the estimators of the true regression coefficients using the underlying model .We use the Deviance Information Criterion (DIC) ([DIC_selection]) as an insample prediction criterion, which is a reasonable choice for our nested models. When comparing candidate models, smaller values of DIC indicate better models. The DIC should be used together with the posterior loglikelihood before recommendation. The model maximizing the posterior loglikelihood is preferable.
2.4 Simulation strategy
The simulation study is motivated by data situations typically observed in spatial epidemiology or health geography, where data are collected on different spatial levels. The main goal of the simulation study is to examine, how well the true regression coefficients for the candidate models are retrieved for simulated spatial effects. We use the geography of the HNRS.
We start with the simulation study for longitudinal data, based on equation (2). We simulate the spatiotemporal random effect . Then, we simulate two covariates at the individual level, one of which is timevarying, from normal distributions. One timevarying variable is simulated at the area level, from a normal distribution. We consider a linear individual time trend
. We hold predictor variables fixed as well as the individual level error term. After simulating the area level spatiotemporal random effect
for each scenario, we generate the dependent variable. More details on the model equations used for simulating the spatiotemporal random effect are given in section 7, equation (5). and are spatial and temporal autocorrelation parameters respectively, with values in the unit interval . and are overall spatial and temporal variation parameters, respectively. The values for these parameters define the strength of the simulated spatiotemporal effect. For and we consider low, medium and high values, corresponding to , and , respectively. For and , we also consider low, medium and high values, corresponding to , and (large enough for this problem), respectively. Overall, there are possible scenarios. For presentation, we consider just a few selected ones for , and , as described in Table 7.1 in section 7. For sensitivity analyse about the structure of the spatiotemporal random effect, we vary the values of , , and for each scenario. We also simulated different types of spatial models, all of which include spatial heterogeneity and spatial autocorrelation, whose strength change over time. Additionally, we have also simulated several independent variables instead of just for sensitivity analysis (data not shown). For each generated data, candidate regression models are fitted and compared with respect to the comparison methods listed in subsection 2.3: MLM CL3, MLM CAR ANOVA and MLM CONV.For crosssectional analyses, we base our simulation on equation (1). We simulate two predictor variables: one variable defined at individual level and the other defined at area level, both described by normal distributions. The individual level error term is also simulated from a normal distribution. More details on the model equations used for simulating the spatial random effect are given in section 7, equation (6). is an autocorrelation parameter and lies between and . If is closer to , then the simulated spatial effect is more similar to spatial heterogeneity. If is closer to , then this is more similar to a CAR structure. A value of between and corresponds to both medium spatial heterogeneity and autocorrelation. The different scenarios are described in Table 7.1 in section 7. Here we just include scenarios for common data situations. For sensitivity analysis, for each scenario, different values of and are given. For each generated data, candidate regression models are fitted and compared with respect to the comparison methods listed in subsection 2.3: MLM CL2, MLM CAR and MLM RCAR.
2.5 Computation
All models are fitted in the Bayesian setting with Markov Chain Monte Carlo (MCMC) simulation methods. All parameters whose full conditional distributions have a closed form distribution, i. e. the regression parameters and all area level, individual and observational level variance parameters, are updated using a Gibbs sampler ([Gelfand]). The spatial and temporal parameter and for the MLM CAR ANOVA are updated using the slice sampler ([Neal]). Full conditional distributions for parameters of interest are available upon request. For a better comparability, we relied on the expert system of BUGS ([BUGS]), particularly run from within the R software using the Rpackage R2WinBUGS ([r2winbugs]). We used devices such as centering of the covariates as well as hierarchical centering of the random effects ([hier_centering]) to reduce correlation in the joint posterior and increase Markov Chain Monte Carlo (MCMC) effective sample sizes. To access convergence and consistency of the chains, single as well as two parallel chains initialized at different points were used, and the Geweke diagnostic ([Geweke]), Brook Gelman diagnostic ([brook]) and Heidelberger Welch’s diagnostic ([Heidelberger]) were applied. We ran the chains and chose the number of iteration until at least , the measure of mixing chains, is less than for all parameters and quantities of interest. For the final estimation, we used a single long run after discarding a part: The length of the burnin period was determined for each model separately.
For the longitudinal analyses, the MLM CL3 (3 level) as well as the MLM CONV stabilized earlier at about iterations. For the MLM CAR ANOVA, we needed up to simulations for the chains for some parameters to stabilize. We thinned the chain by storing only every th draw.
For the crosssectional analyses, we also thinned the chains by storing only every th draw for the MLM CL2, MLM CAR and MLM RCAR in order to decrease autocorrelation and speed up ’mixing’. For the Markov chains for the MLM CAR and the MLM RCAR, we could not detect departure from convergence after the th iterations. The MLM CL2 stabilized earlier.
The models were run in parallel using the Rpackage ’batchtools’ ([batchpack]), which provides a parallel implementation. The complete Rcode for the simulation study is available upon request.
3 Results for the simulation studies
3.1 Results for longitudinal data
In summary, MLM CONV model and the MLM CAR ANOVA model perform much better than the MLM CL3 model. This is particularly pronounced in case of a strong spatial variation and a changing spatial structure over time. Otherwise, the MLM tCARs should be used cautiously to avoid overfitting, for instance.
figureComparison of the Root Mean Square Error (RMSE) for the area level variable coefficient, for the set of selected scenarios of the simulated spatiotemporal effect. The true value for the area level coefficient is . and , and are overall variance and autocorrelation parameters from equation (5) in section 7, for space and time respectively.
In more details, the individual level regression coefficients are very well retrieved by the candidate models in any of the scenarios as indicated in Figure 1. However, the RMSE is more pronounced for the MLM CL3 compared to the MLM tCARs in general. The coverage of the Credible Interval (CI) is for all scenarios and for all candidate models (data not shown). For the area level variable coefficient in Figure 3.1, the RMSE is not negligible for the three models. The bias and, therefore the RMSE is larger in case of larger values of . The RMSE is still larger for the MLM CL3. The time coefficient is almost equally retrieved for the three methods, and the RMSE is larger when the value of larger is (see Figure 1 in section 7).
Observing Figures 3.1 and 1 (see section 7), the DIC and loglikelihood, respectively, for the MLM CAR ANOVA and MLM CONV are larger than that of the MLM CL3 in general. The DIC is smaller for the MLM CONV compared to the MLM CAR ANOVA, and smaller for the MLM CAR ANOVA compared to the MLM CL3. This indicates a better fit for the MLM tCARs in general. Note from the results of the sensitivity analysis for the model parameters and model goodness of fit that very small simulated observational level variances result in negative DIC values (results not shown) even though fitting a correct model.
figureComparison of the DIC, for a set of selected scenarios of the simulated spatiotemporal effect, longitudinal. and , and are overall variance and autocorrelation parameters from equation 5, for space and time respectively.
In any case, the DIC values for the MLM tCARs are smaller in general. Moreover, small observational level (simulated) variances lead to very small bias in the individual level regression coefficients. The coverage of the CI is very good for the three models in case of small values for and .
For very small values of spatial and temporal parameters, the MLM CL3 has smaller RMSE values for the regression coefficients in general compared to the MLM CAR ANOVA. This is understandable as the MLM CAR ANOVA may be too complicated to fit such a simple structure. The MLM CAR ANOVA in comparison to the MLM CONV models seems to be more complex. The MLM CONV model shows better results for the RMSE for coefficient estimates and model fit in almost all scenarios.
For sensitivity analyses, different values for the fixed effect parameters for simulation were investigated. The results were similar regarding the behaviors of regression coefficients.
3.2 Results for crosssectional data
The summary result is that the RMSE for the area level regression coefficients is larger for the MLM CL2 and the MLM CAR, compared to the MLM RCAR. This depends on the strength of the overall variance and spatial autocorrelation (CARstructure) in the simulated data. The difference in the RMSE is mainly due to the standard error, since there is little bias in the regression coefficients. The individual level regression coefficients are not influenced very much. For a very small value of the overall spatial variance parameter
, the bias is in general negligible. In more details, the individual level regression coefficients are very well retrieved by the three candidate models in any of the scenarios, though the classical model performs worse; compare the RMSE for the individual level variable in Figure 3.2. The coverage of the CI is for all scenarios and for all candidate models (data not shown). For the area level variable coefficient in Figure 3.2, there is negligible bias and RMSE for the three models in case of very small value of . For moderate and higher values of , the RMSE is larger for the three models. Again, the RMSE is larger for the MLM CL2, smaller for the MLM RCAR and larger the larger is. The individual level error variance is better retrieved by the MLM CARs (data not shown). The spatial autocorrelation parameter (data not shown) is also well retrieved by the MLM CAR and MLM RCAR; the MLM CL2 tends to overestimate the corresponding random effects variances.Observing Figure 1 (section 7), the DIC is smaller for the MLM RCAR compared to the MLM CAR and MLM CL2. The loglikelihoods are about the same for the three models.
With additional explanatory variables (data not shown), we obtain similar results. For a stronger correlation between the area level variable and the CAR random effect term, the MLM RCAR clearly has better performances.
figureComparison of the Root Mean Square Error (RMSE) for the area level and individual level variable coefficients, for a set of selected scenarios of the simulated spatial effect. The true value for the individual level coefficient is while the true value for the area level coefficient is . and are the overall spatial variance and autocorrelation parameters from equation (6) in section 7, respectively.
After centering of the random effects, the computation time for the MLM RCAR is slightly larger than that of the MLM CAR model. This, however, is not considerable compared to the risk of making weak inference with biased regression coefficients.
4 Decision tree
Although there exist many advanced methods available for analysing health data within a geographical context, simpler methods are often called upon to obtain a ’quick’ solution. However, the choice of simplicity sometimes may lead to erroneous conclusions. Based on the results of our simulation studies, we suggest the decision tree in Figure 4
to help the users analyse epidemiological data for which participants are nested within geographical areas. We use a tradeoff between bias/RMSE, model fit and computational time to select the appropriate models. Overfitting in regression analysis can also produce misleading regression coefficients and pvalues (frequentist). We recommend the MLM tCARs for longitudinal data instead of the MLM CL3 only when a strong spatial effect is expected in the data. Some of the MLM tCARs, like The MLM CONV could be used routinely, no matter how strong the spatial effect is. For crosssectional data, we recommend the use of the MLM RCAR and MLM CAR instead of the MLM CL2 except when a very weak spatial effect is expected in the data.
For software implementation, the classical linear model, the MLM CL2, the MLM CL3 are available in software packages. The implementation for MLM tCARs and MLM CARs is not always given in software packages. See Table 1 in the supporting material in section 7 for detailed information on the implementation of the methods described in this decision tree in software packages.
figureDecision tree for users, to analyse data for which participants are nested within geographical areas that might be of public health interest. The method suggestion is based on a tradeoff between accuracy to retrieve regression coefficients, model’s goodness of fit, and time needed to complete the fit.
5 Application to the analysis of the association between depression and greenness in the Heinz Nixdorf Recall Study
5.1 Data
The Heinz Nixdorf Recall Study (HNRS) is an ongoing population based cohort study from the cities Essen, Mülheim and Bochum of the metropolitan Ruhr Area in Western Germany. The baseline was performed between and including randomly selected men and women aged to years old ([Ester]). Participants were invited to the study center three times each after 5 years (, ; , ; , ). In addition, a yearly postal followup between the examinations was provided, which allows for more than 18 years of observational data. The HNRS was approved by the local ethics committees, and all participants gave informed consent prior to participation.
Depressive symptoms during the previous week were assessed using the 15item shortform questionnaire of the Center for Epidemiologic Studies Depression Scale (CESD) ([Hautzing], [Radloff]). For our analysis we used CESD data of eight measurements, assessed between and . Scores for the 15item version range from 0 to 45, with a higher score indicating more frequent depressive symptoms.
Recent studies suggest evidence of the importance of green spaces for mental health. Green spaces improve wellbeing or reduce physiological stress indicators ([Ester]). Exposure to green space is commonly measured either as surrounding greenness or access to green space. Here, we employ the Normalized Difference Vegetation Index (NDVI) derived from satellite imagery ([ndvi_scale]). Values of NDVI range from to . Those around 1 generally correspond to water, while values near represent bare surfaces, e.g. rock, rooftops and roads or very scarce or dry vegetation. Values approaching represent dense vegetation, e.g. rainforest. We base our analysis on eight time points of satellite imagery data.
Further covariates are included. Some are directly measured at the district level like unemployment rate in district, obtained from the local census authorities of the respective cities of Bochum, Essen, and Mülheim. Unemployment is a strong indicator for material deprivation in a neighborhood and was used as an indicator of neighborhoodlevel socioeconomic status (SES). Other covariates such as socioeconomic (e.g., income), demographic (e.g., age), gender, medical history and Body Mass Index (BMI) are measured at the individual level. In contrast to the NDVI, further covariates are timeinvariant.
5.2 Analysis and results
We apply the MLM CL3, the MLM CAR ANOVA and the MLM CONV comparatively. The square root of the depression scores leads to models that better fit the assumptions and requirements of linear models. Square rooted depression is continuous with values between and , has mean , and a standard deviation of .
Individual profiles of depression data with average trend line (Figure 1, section 7) are used exploratorily to detect a slightly linear decreasing trend. We also analysed the spatial confounding at baseline. In order to decrease autocorrelation and speed up ’mixing’ of the MCMC, we thinned the chains by storing only every 10th draw. The length of the burnin period was determined for each model separately. For the Markov chains, for the MLM CL3, we could not detect any departure from convergence after the th iteration. For the MLM CONV, the chain stabilized also very quickly, at about 9.000 iterations. For the MLM CAR ANOVA, more iterations are needed, e.g., about to observe no departure from convergence for the spatial autocorrelation parameter.
Figure 1 in the supporting materials shows the mixing of the chains for some fixed effect parameters of the MLM CONV. The trace plots look similar for other methods. Trace plots, density plots and further diagnostics for all parameter estimates as well as variance components are available upon request.
MLM CL3  MLM CAR ANOVA  MLM CONV  

intercept  
greenness  
female vs male  
baseline Age  
BMI  
time  
unempl. in district  


  
median  
median  
loglikelihood  
DIC 
tablePosterior quantiles from equation (
2) for the MLM CL3 , MLM CAR ANOVA and MLM CONV for the analysis of the association between greenness and depressive symptoms of the Heinz Nixdorf Recall Study.The results suggest a negative association between greenness and depressive symptoms for all three methods ( is not included in the CI) as indicated in Table 5.2. For MLM CL3, MLM CONV and MLM CAR ANOVA respectively, a unit increase in the value of NDVI leads to a decrease of the root of depression score, on average by (, , ). The time slope is on average () for all three methods ( not included in the CI). This indicates a slightly linear decreasing trend, which is in perfect accordance with the exploratory individual profile plot of the data in Figure 1 of section 7. We also notice that the coefficient estimates for the area level variables are the same for the three models. The intercept is slightly different for the MLM CAR ANOVA. This is also in line with the results of the simulation study, where in case of a very small overall spatial effect, not changing much over time, and a medium spatial autocorrelation, the three compared methods give approximately the same results for coefficient estimates and credible intervals. In fact, the spatial heterogeneity parameter for the MLM CONV is very small (median value ), and the spatial autocorrelation parameter for the MLM CONV as well (median value ). Both are not changing much over time. This is also in accordance with the MLM CAR ANOVA, where the overall spatial and temporal parameters and are and . A medium value of the spatial autocorrelation parameter is indicative of a medium spatial dependence. A large value of the temporal autocorrelation parameter is indicative of a spatial effect that is not changing much over time. The spatiotemporal interaction parameter is also very weak. Both the posterior loglikelihood and DIC are better for the MLM CAR ANOVA and MLM CONV, particularly the MLM CONV, compared to the MLM CL3. The total iterations took minutes for the MLM CL3, minutes for the MLM CONV and minutes for the MLM CAR ANOVA (just for comparison) on an ordinary personal computer.
The similar results for fixed effect estimates were also expected from the decision tree after having a look at the spatial structure of the data. For a weak spatial effect, not changing much over time, the MLM tCARs were suggested, with the MLM CL3 as alternative.
6 Discussion
Our aim was to explore the effect of considering, or neglecting spatiotemporal random effects, on regression coefficients in epidemiological data, with participant nested within geographical units.
To achieve this, we considered the classical multilevel models and the multilevel conditional autoregressive models. The multilevel conditional autoregressive models are obtained by combining multilevel models and the properties of Markov Random Fields in order to borrow strength in adjacent areas. We considered separately regression models for crosssectional studies as well as for longitudinal studies in simulation studies in several scenarios of the spatial and spatiotemporal effects. For crosssectional analyses, we compared the MLM CL2 , the MLM CAR and the MLM RCAR. For longitudinal studies we introduced new models named MLM tCARs (MLM CAR ANOVA and MLM CONV). Such models are not common in the literature, at least not for epidemiological analyses. These models help to explicitly borrow strength and simultaneously account for individual dynamics as well as area dynamics. After introducing the MLM tCARs, we compared them to the MLM CL3. In summary, the results indicated that neglecting either the spatial or spatiotemporal effects leads to larger RMSEs in coefficient estimates and worse model fit in general. Furthermore, we provided a decision tree for model selection in epidemiological studies for spatially nested data, based on simulation studies. At the end, we applied the methods to the analysis of the association between depression and greenness in the HNRS. In the following, we discuss our results, pointing out the strengths and limitations.
To achieve our goals, we had to simulate the spatial and spatiotemporal random effects. The simulation of random effects from CAR models is not common and should be taken cautiously ([Old_new]). However, CAR type random effects are successfully used in many works and applications to simulate the spatial structure in order to show the correctness and advantages of some CARlike spatial models ([CarBayes]). We used the geography of the HNRS, but the results can be generalized to other geographical structures.
For the application, the three models showed the negative association between greenness and depressive symptoms as well as a linear decreasing individual trend. This analysis of the association at the individual level is also perfectly in line with the analysis by [djeudeu], which was performed on the HNRS at an aggregated level. Moreover, the current analysis has the added value that all variables are used at their finest spatial level. This avoids the risk of the ecological and the atomic fallacy by aggregating or disaggregating them. Though the analysis of the association and growth showed the same result for all three models for the coefficient estimates of interest, the MLM CAR ANOVA as well as the MLM CONV showed an overall better fit compared to the MLM CL3. Moreover, using the MLM CAR ANOVA or the MLM CONV, we were able to separate out the individual time trend and the spatial time trend. The spatial effect was not changing very much over time. The overall spatial effect was medium and the spatial autocorrelation was rather strong. This explains why, the models showed almost the same behaviours for coefficient estimates as expected from the scenarios of the simulation study and as indicated in the decision tree in Figure 4.
The MLM CARs and MLM tCARs have the limitation that using them unnecessarily for noncomplex data structures may lead to overfitting and slightly timeconsuming fits. However, the damage of neglecting the spatial structure is more important than the unnecessary computational burden to fit the model. The bias for individual level coefficients are generally negligible for all methods. However, not accounting for the spatiotemporal dependence and dynamic could lead to large standard deviations for regression coefficients and therefore larger RMSEs for classical models. The estimates of the standard errors determine the ’significance’ of the fixed effect parameters (frequentist). Not choosing the right model to account for the spatiotemporal effect will lead to (falsely) small pvalues (frequentist) and, therefore, false ’significant’ associations between health outcome and exposure in some epidemiological studies. Therefore, it is recommended to consider a model that accounts for the spatial effect as well as the dynamic of the residual variation rather than using the MLM CL2 or MLM CL3.
The spatiotemporal CARprior should, however, be used cautiously. In the situation of spatial confounding, we recommend to use the restricted CAR models. This will avoid the fixed and random arealevel effects to compete to explain common variation, which could distort estimates of the fixed effects area level regression coefficients and unduly inflate their posterior variances. This was considered for our crosssectional analyses. The MLM CAR and MLM RCAR were just a bit more time consuming compared to the MLM CL2. Although this seems to be a huge disadvantage, in absolute practical terms in a multitasking computing environment, it makes little impact. Relative to the time taken to collect the data (sometimes more than years) it is irrelevant. Restricted MLM tCAR models still need to be fully developed and were not part of our simulation study.
Using MLM CARs and MLM tCARs, spurious inferences regarding fixed effects parameters can be avoided. This is particularly important when the primary inferential focus is on fixed effect estimates as in several epidemiological analyses. If the goal of the analysis was model comparison in terms of prediction, we would have to use some sort of holdout of data. See [white_smooth] for analysing spatial data with the goal of spatial prediction. Methods suited for inference may not always be appropriate for comparing models in terms of prediction ([bias_reg_compare]).
We noticed that the MLM CONV has a less complex spatiotemporal structure and performs a bit better than the MLM CAR ANOVA in most of the scenarios, also in the application. MLM tCARs with different spatiotemporal CAR prior structures for the area level random effects can also be compared in future works to further improve the decision tree. The MLM CAR ANOVA did not retrieve the intercept properly for certain scenarios. This could be a result from spatial and temporal confounding that should jointly be addressed in future studies. The overall results in this paper are developed for Gaussian likelihood models and could easily be extended to other likelihood models (generalized linear models). A preliminary simulation study with the logitlink function for crosssectional data showed similar results for coefficient estimates that we described here.
References
7 Supplementary material
7.1 Tables and figures
Scenario No  Meaning/Interpretation  
0.8  0.5  weak spatial effect, medium spatial heterogeneity and autocorrelation, medium temporal effect  
3  0.9  weak spatial effect, mainly spatial autocorrelation, strong temporal effect  
0.8  3  0.09  medium spatial effect, medium spatial heterogeneity and medium spatial autocorrelation, strong temporal effect  
0.8  0.5  medium spatial effect, medium spatial heterogeneity, medium temporal effect  
moderate spatial effect, mainly spatial autocorrelation, medium temporal effect  
strong spatial effect, medium spatial autocorrelation, strong temporal effect  
0.9  strong spatial effect, mainly spatial autocorrelation, strong temporal effect  
0.5  strong spatial effect, medium spatial autocorrelation, medium temporal effect  
0.9  strong spatial effect, mainly spatial medium spatial autocorrelation, strong temporal effect 
tableThe selected scenarios of the simulated spatiotemporal effect. Here, all the selected scenarios are presented in the paper.
Scenario number  Value of  Value of  Meaning/Interpretation 

moderate spatial effect, mainly spatial autocorrelation  
1  moderate spatial effect, mainly spatial heterogeneity  
moderate spatial effect, medium spatial heterogeneity and medium spatial autocorrelation  
strong spatial effect, mainly spatial spatial heterogeneity  
strong spatial effect, mainly spatial autocorrelation  
10  strong spatial effect, medium spatial heterogeneity and medium spatial autocorrelation  
weak spatial effect, mainly spatial autocorrelation  
8  weak spatial effect, mainly spatial heterogeneity  
weak spatial effect, medium spatial heterogeneity and medium spatial autocorrelation 
tableThe different scenarios of the simulated spatial effect. The scenarios with gray color are the one presented in the paper.
Statistical Model  Usual R packages, implementation in R  Usual implementation in SAS  

Crosssectional design, classical non spatial methods  Classical multivariate linear, Poisson, Binomial, negative Binomial, log normal regression. There are nonparametric and semiparametric methods as well.  Frequentist approach include Rpackages stats (lm function for linear and glm for generalized linear model), MASS (glm.nb), gamm4 (gamm, gamm4), nlme (lme) lme4 (lmer, lmer2, glmer), glmmADMB (glmmadmb) for mixed effect, gee (gee) for marginal models. Bayesian approach includes R package MCMCglmm, R2WinBUGS, RINLA, rstan. The list is not exhaustive. 
Frequentist approach includes proc reg, proc glm for linear regression, and proc logistic, proc genmod, proc glimmix, proc nlmixed for generalised linear models. Bayesian approach include proc mcmc and builtin capabilities in the genmod procedure. 
Crosssectional design, classical spatial methods  Classical MLM (MLM CL2) Fixed Effect Methods (FEM) Genaralized Estimating Equations (GEEs)  Frequentist approach include gamm4 (gamm, gamm4) lme4 (function lme), mgcv (gam, semiparametric). Bayesian approach includes R packages MCMCglmm, R2WinBUGS, RINLA, rstan. 
Frequentist approach include proc mixed for linear regression, proc logistic, proc genmod and proc GEE for fixed or marginal effect, proc glimmix and proc nlmixed for generalized ME,
Proc GAM for semiparametric models Bayesian approach includes proc MCMC. 
Crosssectional design, spatial heterogeneity and spatial autocorrelation  Multilevel CAR (MLM CAR), Multilevel resricted CAR (MLM RCAR), Semiparametric regression (generalized additive models).  Bayesian approach include R package MCMCglmm, R2WinBUGS, RINLA, rstan, CARBayes, HSAR.  Bayesian approach includes proc mcmc 
Longitudinal design, classical methods  Univariate and multivariate ANOVA, Growth curve models, multilevel regression models (MLM CL3), SEM (Structural Equation Modeling) for longitudinal data, Fixed Effect Models (FEM), Generalized Estimating Equations (GEEs), Semiparametric models.  Frequentist approach include R packages gamm4 (gamm, gamm4) lme4 (lme) geepack and gee for Generalized Estimating Equation, multgee for multinomial response, CRTgeeDR. Semiparametric methods includes mgcv (gam). Bayesian approach includes R package MCMCglmm, R2WinBUGS, RINLA, rstan.  Frequentist approach include proc mixed for linear regression, proc logistic, proc genmod and proc gee for fixed or marginal effect, proc glimmix and proc nlmixed for generalized MLM. Bayesian approach includes proc mcmc 
Longitudinal design, changing spatial effect, spatial autocorrelation and spatial heterogeneity  MLM tCARs for longitudnal data  Implementation using WinBUGS run from within R via the Rpackage R2WinBUGS 
figureDescription of spatial effects for the example of the Heinz Nixdorf Recall Study: The red/green points represent participants’ positions with high/no high value of depression score, the yellow arrows indicate the spatial proximity for participants in the same geographical unit, while the blue ones are for participants in adjacent units. Two participants in adjacent units may be closer and have more similar outcome than participants in the same spatial unit (yellow arrows). This implies that both spatial heterogeneity and spatial autocorrelation should be accounted for.
figureComparison of the Root Mean Square Error (RMSE) for the time variable coefficient (growth), for the set of selected scenarios of the simulated spatial effect. The true value for the time coefficient is . and , and are overall variance and autocorrelation parameters from equation (5), for space and time respectively.
figureComparison of the Root Mean Square Error (RMSE) for the individual level variable coefficient, for the set of selected scenarios of the simulated spatial effect. The true value for the individual level coefficient is . and , and are overall variance and autocorrelation parameters from equation (5), for space and time respectively.
figureComparison of the posterior loglikelihoods, for a set of selected scenarios of the simulated spatiotemporal effect for longitudinal studies. and , and are overall variance and autocorrelation parameters from equation (5), for space and time respectively.
figureComparison of the goodness of fit, for a set of selected scenarios of the simulated spatial effect for crosssectional studies. and are the overall spatial variance and autocorrelation parameters from equation (6), respectively.
figureExploratory plot of the individual trajectories and mean trajectory for randomly selected 50 participants, to identify average individual trends within subjects. The dense linear blue line represents the means trajectory. It is indicative of a linear decreasing trend.
figurePost diagnostic plots (trace and density plots) of the fixed effect parameters of interest.
7.2 Simulation of the spatiotemporal effect
We use the following equation, step by step, to generate the random effect and then the dependent variable:
(5) 
where
, , , , , , for , , . and are and dimensional unit matrices, respectively. denotes the Kronecker product.
7.3 Simulation of the Spatial effect
We use the following equation:
(6) 
where denotes a unit matrix. is a variance parameter that controls the strength of the overall spatial structure.