1 Introduction
In environmental epidemiology, we are often interested in making inference about the relationships between spatiallyreferenced exposures and spatiallyreferenced health outcomes. Spatial structure in exposures can arise from underlying environmental processes and human activities, while health outcomes can have spatial structure derived from both the exposure of interest and other spatiallyvarying factors. For example, coronary heart disease is associated with socioeconomic status (Diez Roux et al., 2001), which can have geographic structure and is associated with air pollution (Brochu et al., 2011; Jerrett et al., 2004; Mensah et al., 2005). When not measured (or measured incompletely), these factors can cause unmeasured spatial confounding
, which is the inability to distinguish the effect of a spatiallyvarying exposure from residual spatial variation in a health outcome, resulting in biased point estimates and standard errors
(Paciorek, 2010).Early discussion of spatial confounding in the literature includes Clayton et al. (1993), who described the ‘confounding effect due to location’ in regression models for ecological studies, which they attributed to unmeasured confounding factors. Clayton et al. (1993) advocated for the inclusion of a spatiallycorrelated error term in hierarchical models for spatial data (Clayton and Kaldor, 1987; Besag et al., 1991) and claimed this would account for confounding bias but might result in conservative inference. Since then, the approach of adding spatially structured error terms has frequently been used in spatial models for areal data (Reich et al., 2006; Wakefield, 2007; Hodges and Reich, 2010; Hughes and Haran, 2013; Hanks et al., 2015; Page et al., 2017). In a causal inference framework, confounding due to geography has been recently addressed using spatial propensity scores (Papadogeorgou et al., 2019; Davis et al., 2019), however in those settings the exposure of interest was not explicitly spatial.
For pointreferenced data, Paciorek (2010) provided a rigorous discussion of spatial confounding and the importance of the spatial scales of variability in the exposure and outcome. Paciorek (2010) demonstrated that reductions in bias could be obtained when the scale of unconfounded variability in the exposure, which he quantified by the spatial range parameter in a Matern covariance function, is smaller than the scale of confounded variability.
One approach in the literature for adjusting for spatial confounding at broad scales with pointreferenced data is to estimate the coefficient of interest from a semiparametric model that includes spatial splines
(Paciorek, 2010; Chan et al., 2015). This approach is a natural extension of time series studies, where flexible, onedimensional basis functions can account for confounding due to unmeasured temporal variation (Burnett and Krewski, 1991; Schwartz, 1994; Dominici et al., 2003; Dominici et al., 2004; Szpiro et al., 2014). Thinplate regression splines (TPRS) (Wood, 2003) are a commonly used basis and the amount of adjustment can be tuned using the degrees of freedom () parameter (Paciorek, 2010). However, particular values of do not have clear spatial scale or interpretation. Generalizing inference about associations between exposures and health outcomes is difficult when the extent of spatial confounding adjustment is not easily quantified.1.1 Air Pollution in the NIEHS Sister Study
This work is motivated by an analysis of systolic blood pressure and fine particulate matter (PM) in the NIEHS Sister Study. Chan et al. (2015) found that a 10 g/m difference in longterm exposure to ambient fine particulate matter (PM
) was associated with 1.4 mmHg (95% Confidence Interval [CI]: 0.6, 2.3) higher systolic blood pressure (SBP) at baseline. To account for spatial confounding from unmeasured regional differences in socioeconomic and health patterns,
Chan et al. (2015) included TPRS with 10 in their model.Here we consider a reanalysis of this cohort using PM exposures at grid locations, rather than at subject residences, to accommodate the methods we describe below. We use predictions of the 2006 annual average ambient concentration from the universal kriging model of Sampson et al. (2013), made on a 25km by 25km grid across the contiguous United States. For each of the 47,206 Sister Study subjects, we assign exposure based upon the closest grid cell center. Using the same measured confounders as Chan et al. (2015), but no spatial adjustment, we find that a difference of 10 g/m in PM is associated with 0.26 mmHg higher SBP (95% CI: 0.14, 0.66). However, when TPRS with 10 are added to the model then the estimated difference in SBP is 0.77 mmHg (95% CI: 0.14, 1.41). Figure 1 shows the estimates for other amounts of adjustment. The change in the estimates as a function of suggests that some form of spatial confounding is present, but it is not clear from Figure 1 on what scales the confounding is occurring, how much adjustment should be done, and which estimate should be reported.
1.2 Manuscript Outline
In the remainder of this paper, we address the question of how to interpret and select the spatial scale of adjustment. In Section 2 we introduce the statistical framework and procedures for spatial confounding adjustment. In Section 3 we describe three choices of spatial basis (TPRS, a Fourier basis, and a wavelet basis) and present a method for assigning variation in these bases to a spatial distance. In Section 4 we describe approaches for selecting the amount of adjustment and compare the approaches in simulations in Section 5. In Section 6 we apply the adjustment approaches to the motivating Sister Study cohort. Section 7 provides a concluding discussion.
2 Methods for Spatial Confounding Adjustment
2.1 Statistical Framework
We consider a cohort study of subjects with measured health outcomes , exposures , and observed confounder covariates . We partition the spatial domain into a rectangular grid with distinct locations . Each subject is assigned to a location . We assume that the subjects are distributed evenly across the locations; we discuss nonuniform sampling of subject locations in Section 2.4.
Extending the framework of Szpiro et al. (2014) to a spatial setting, we assume that the exposure for each individual is measured without error and takes the form
(1) 
where is a smooth spatial surface and is (fixed) residual spatial variation. We decompose the observed confounders in a similar manner as , except that is not a function of space and is modelled as . The stochasticity of arises from the sampling of subjects.
We assume that the observed health outcomes arise from the model
(2) 
where is an unknown, fixed spatial surface and . The target of inference is the parameter , which represents the association between exposure and the outcome conditional on and . The term represents unmeasured spatial variability in , which we assume to be fixed and unknown.
In contrast to the approach taken by “restricted spatial regression”, which targets the unconditional effect of exposure on outcome (Hodges and Reich, 2010; Hughes and Haran, 2013), our parameter of interest here is the exposureoutcome association conditional on the unmeasured confounder . In observational air pollution epidemiology studies, the conditional association is targeted in order to provide information that is generalizable beyond the specific time and location of the study cohort (e.g. Peng et al., 2006).
To model the spatiallyvarying terms in (2), we employ a set of hierarchical spatial basis functions that are in order of nonincreasing resolution. For compactness of notation, let denote the matrix of the first basis functions evaluated at the subject locations. Following Dominici et al. (2004) and Szpiro et al. (2014), we decompose , , and as:
(3)  
where
is the unit vector with 1 as the
th element. We assume . This assumption means that there is finer scale variability in the exposure surface than in the unobserved confounding surfaces, which Paciorek (2010) identified as a necessary condition to achieve reductions in bias via adjustment. In Section 3, we consider three different choices of : a regression spline basis, a Fourier basis, and a wavelet basis.The ordinary least squares estimator from the model
will be biased due to the correlation of the observed and with the unobserved . In the literature, this correlation is sometimes referred to as concurvity (Hastie and Tibshirani, 1990; Ramsay et al., 2003a, b). We describe two strategies for adjusting for this correlation in order to eliminate the confounding bias.2.2 Adjustment in Outcome Model
A straightforward way to adjust for the unmeasured spatial structure is to fit the semiparametric regression model
(4) 
for a chosen value of . This is a direct extension of confoundingadjustment approaches used in time series methods (Dominici et al., 2004; Peng et al., 2006) and was the approach taken by Chan et al. (2015) and in Figure 1. For basis functions such as TPRS that are easily computed, fitting model (4) in standard statistical software is straightforward.
If , then the ordinary least squares estimator from model (4) will be unbiased, because the inclusion of fully adjusts for variation in its column space, which includes . If , then will remain biased because of correlation between and the terms of (from the decomposition in (3)) that are not within the column space of .
2.3 Preadjusting Exposure
A second approach to confounding adjustment is to first ‘preadjust’ the exposure, to remove variation in the exposure that is correlated with the confounding surface. In comparison to the semiparametric adjustment approach described above, the preadjustment approach does not necessarily require the explicit construction of , which allows for additional choices of spatial basis. Szpiro et al. (2014) outlined how preadjustment can be done for cohort studies with time series exposures, and here we present an extension of that approach to spatial settings. This approach can be considered a form of spatial filtering (e.g. Haining, 1991; Tiefelsdorf and Griffith, 2007)
We decompose the vector of observed exposures into two components, and , such that is orthogonal to for some (we discuss methods for selecting in Section 4). In settings when can be explicitly computed, this decomposition can be accomplished by taking to be the projection of onto () and to be its complement (. When it is not practical to explicitly construct , as in the Fourier and wavelet approaches we discuss in Section 3, we use a filtering procedure in the frequency domain to achieve this decomposition.
Once has been partitioned into and , we fit the model
(5) 
where and . Because , the bias of is proportional to
(6) 
where the summations are over . If , then is by construction orthogonal to and so . If , then is orthogonal to and so . Under these two conditions, the bias of given in (6) is zero. If either or , then will be biased. Forcing into the error term creates correlation between the , but this can be accounted for using ‘sandwich’ standard error estimates (White, 1980).
2.4 NonUniform Spatial Distributions of People
So far we have assumed that all locations have an observed subject and that the number of subjects was that same at each location. This assumption was necessary to achieve the orthogonality of the preadjusted exposure () and the basis functions matrix (and thus , , and ). Adjustment in the outcome model or exposure preadjustment in which can be explicitly constructed do not require that the locations be spatially uniform. However, preadjustment using the filtering approaches does require that all locations be observed the same number of times.
To implement preadjustment in the presence of duplicate locations, first the duplicated locations are removed (and unobserved locations added) to create the vector . The filtering preadjustment is done on to generate and . By construction , where is the th basis function evaluated at all locations in once. The duplicated locations are then added back in by repeating the relevant entries in to form (and similarly for ). However, the duplicated locations means that in general . To recover the orthogonality necessary to eliminate the bias term (6), the outcome model can use reweighting.
Suppose that subject locations are distributed according to a spatial distribution , which has positive support across all of . To recover orthogonality of and (), we reweight the study population by its spatial distribution. Specifically, we fit a weighted least squares (WLS) version of (5) where the observation weights are proportional to the inverse of . This assures orthogonality of and as , since
This WLS approach essentially downweights the contribution of people living in highlypopulated areas and gives added weight to those who live in more sparsely populated areas. The parameter being estimated remains the same. For applications in air pollution epidemiology, the correlation between high exposures and population density means that this has the effect of weighting subjects with different exposure levels differently. This is similar in spirit to propensity score weighting, in which groups are reweighted by their probability of exposure in order to remove bias due to confounding
(Stuart, 2010).3 Interpreting Spatial Scales of Adjustment
Here we present specific details for three different choices of spatial basis functions: thinplate regression splines, Fourier basis, and wavelet basis. We describe how to implement adjustment for each basis and how to interpret the amount of adjustment in terms of physical distance. We will describe methods for selecting the amount of adjustment in Section 4.
3.1 Effective Bandwidth
For each basis, we describe how to relate the amount of confounding adjustment (i.e. ) to a physical spatial scale. We call this scale, denoted by , the effective bandwidth
of the adjustment. Heuristically, the effective bandwidth is the width of the smoothing kernel induced by the choice of basis function. This allows for adjustment using
basis functions to be interpreted as adjusting for confounding by smoothing at distance. In the context of particular application, may represent a national, regional, or metropolitan scales of variation.3.2 Thinplate regression splines
Thinplate regression splines (TPRS) are lowrank approximations to thinplate smoothing splines (Wood, 2003)
. The fullrank thinplate splines are derived from a set of radial basis functions, and TPRS achieve computational benefits by using a truncated eigenbasis to approximate the radial basis. TPRS are based upon the spatial locations of observed points, eliminating the need for knotselection.
For problems of inference, unpenalized TPRS are indexed by the degrees of freedom (), which controls the dimension of the basis and can be fixed to yield a set of hierarchical basis functions where higher values of correspond to adjustment at a finer scale. The mgcv package in R (Wood, 2003) makes explicit construction of the TPRS basis straightforward, so both the semiparametric adjustment and preadjustment approaches can be used with this basis.
We propose the following procedure for computing the effective bandwidth for a chosen number of TPRS basis functions. The smoothing induced by TPRS can be represented in the smoothing matrix . Unlike the frequencybased methods discussed below, the amount of smoothing, which is given by the columns of , varies at each location due to the location of neighboring points, which is also impacted by the geometry of the domain. We compute by finding the distance at which the “average” value of the smoothing matrix values first cross zero. For each observation , we first compute the vector of distances to all other points. We then fit a loess smooth to , the th column of , as a function of . The procedure is repeated across all locations, and the pointwise median of the loess smooths computed. The distance at which this median smooth first crosses zero is . A detailed description of this procedure is provided in Supplementary Material Section A. The publiclyavailable R package spconf implements this procedure for any provided spline or smoothing matrix values.
3.3 Fourier decomposition and Frequencybased filtering
A second choice for basis is the set of sinusoidal Fourier basis functions. Without loss of generality, let and denote coordinates of . The value of the function
can be written as a linear combination of sine and cosine functions, whose coefficients are determined by the Discrete Fourier Transform (DFT) of
(Gonzalez and Woods, 2008). Spectral methods have been widely used in spatial statistics, often as a means to approximate covariance functions (e.g. Guinness and Fuentes, 2017; Royle and Wikle, 2005; Fuentes et al., 2007).Allowing for ties as necessary, we can use the spectral coordinates to order the spatial basis functions by effective frequency: (Burger and Burge, 2009). Because each basis function is oriented to a particular angle, there may be multiple basis functions with the same frequency but different orientation.
To avoid explicit construction of all basis functions, we apply the preadjustment approach described in Section 2.3 for a selected frequency . We decompose into its projection onto and corresponding complement by applying a highpass filter in the frequency domain. The values of the filter are
(7) 
To implement preadjustment, we first compute the DFT of , which is the gridded exposure surface with duplicated locations removed. We then define to be the elementwise product of and . The inverse DFT of provides . Duplicated locations are assigned the same value from , yielding .
One interpretation of removing variation described by frequencies and lower is that it removes periodic variation with periods and higher. The analogue of for the Fourier basis is the effective bandwidth of the kernel smoother in the spatial domain that corresponds to the frequency filter . This smoother is given by the inverse Fourier transform of , which is analytically a ’sinc’ function. However, edge effects from finite grids mean the inverse DFT of does not exactly follow the expected analytic form. Therefore, we compute the effective bandwidth of empirically in a manner similar to the approach for TPRS: Let denote the inverse DFT of . We take to be the minimum value of , scaled by grid size, for which .
3.4 Wavelet Basis and Thresholding
A third choice of spatial basis is a set of wavelets. Like Fourier basis functions, wavelets are localized in frequency, but they are also localized in space (Nason, 2008). This means that wavelets can compactly describe variation at different frequencies in different areas of a spatial domain. There are many different sets of wavelets, and here we use the smooth Daubechies wavelets (Daubechies, 1988). Wavelet basis functions are indexed by level, based upon successive halving of the spatial domain. Within each level (and thus, within each frequency ), there are multiple wavelet basis functions with different orientations and spatial positions. The effective bandwidth that corresponds to each level is .
Multiresolution approaches such as wavelets have recently been used to quantify spatial variation of air pollution exposures (Antonelli et al., 2017). However, Antonelli et al. (2017) only described exposures with variation at “low” and “high” frequencies, and did not attempt to use wavelets for explicit confounding adjustment nor did they provide a specific distance to these labels. Decorrelation of the exposure and outcome via wavelets was presented in an ecological context by Carl and Kühn (2008). However, they applied the thresholding to the exposure and the outcome jointly and performed regression on the wavelet coefficients, which is not practical for the large grids and when we have other confounders in the model.
For finite data on a discrete grid, the Discrete Wavelet Transform (DWT) maps any surface to a set of wavelet coefficients. Unlike the DFT, the DWT requires that the grid have length equal to a power of two. Data on a nonsquare grid can be embedded within a larger grid with dyadic dimension to apply the DWT. We use an implementation of the DWT available in the R package wavethresh (Nason, 2008).
To preadjust exposure using wavelets, we apply a filtering approach similar to that used for a Fourier basis. We first compute the DWT of . We then threshold all coefficients at levels to get a modified wavelet transform . We then apply the inverse wavelet transform to to get the preadjusted exposure . Although we use a global threshold of all wavelet coefficients up to a particular level, the extent of thresholding could be varied across the exposure domain if desired.
4 Selecting the amount of adjustment
A critical question for data analysis is how much adjustment to do or, equivalently, what value of or should be chosen. In both approaches presented in Section 2, the estimates of are unbiased only if (with an additional condition of for the preadjustment approach) and if the basis for adjustment is correctly chosen. But in most practical settings, the true values of , , and and the correct choice of basis are unknown. In this section, we describe different approaches to choosing the amount of adjustment ( or ).
If there is specific knowledge of assumed unmeasured confounders or other external, contentarea knowledge, then could be selected a priori. Exact knowledge of is unlikely to be known, but a choice of or might be based upon a combination of known scale of variation in the exposure (e.g. if it is predicted from a spatial model with known parameter values) and the approximate scale of the possible confounders (e.g. regional variation in socioeconomic status). This choice might also be influenced by estimates of and , which could be estimated from the data. While an a priori choice of
may not necessarily lead to an unbiased estimate, it does provide a way to prespecify the extent and interpretation of adjustment, without risking overfitting the model to the outcome data
.In the absence of external knowledge, the amount of adjustment can be selected in a datadriven manner. Estimating directly is challenging, since it requires knowing the amount of variation in that is due to and –exactly what we are trying to estimate. However, measures of model fit (for either (4) or (5)) can be used to calculate the scale of spatial variation in . Specifically, after fitting the outcome model with a range of choices for , the model that minimizes the Akaike Information Criterion (AIC) (Akaike, 1973) or Bayesian Information Criterion (BIC) (Schwarz, 1978) can provide a selection of the model that best fits the data. However, selecting the model that best fits is not guaranteed to reduce error in the estimate of .
A variation of this procedure for estimating the scale of relevant spatial variation in is to fit outcome models that include the adjustment basis but do not include the exposure. The amount of adjustment that leads to the smallest AIC or BIC can be chosen as for the primary model. Observed confounders can either be included in the model, or first projected out from . Here we consider the former approach and refer to is as the “AICNE” approach for choosing , respectively, where “NE” stands for “No Exposure”. The BIC alternative is named in the analogous manner. The rationale of this approach is that if the unmeasured confounders have a strong impact on the values of and the association of interest () is relatively weak (a relatively common occurrence in environmental epidemiology), this approach can yield a choice of without overfitting the relationship between and . Because it involves fitting a model with the outcome , this approach is mostly limited to adjustment bases such as TPRS that can be explicitly computed. If all locations on the grid had a single outcome value, then this approach could be used for Fourier and wavelet filtering, but that setting is unlikely to occur in practice.
An approach that targets estimation error directly is to choose to minimize the estimated MSE of . Let be some large value for which it is assumed the estimator is unbiased. Then estimate the bias of as and estimate via the ‘sandwich’ estimator (White, 1980) for the fitted model. Together these provide a method for selecting :
(8) 
This approach, however, has the clear drawback that may not be an unbiased estimator (see Section 4.1).
Approaches that are more ad hoc are also possible. One such option is to choose by selecting a point right after a “knee” in the plot of against (for example, Figure 1). While visually intuitive, this approach requires a cumbersome formal definition. We define to be:
(9) 
where is a order difference operator and . The set limits to values of that are beyond the largest secondorder difference, an approximation of curvature, in the sequence. Within this set, the value of that gives the first minimum in the firstorder differences is selected. Importantly, this approach does not account for differences in the resolution of and can be greatly impacted by the noise in the finite differences.
4.1 Impact of overadjustment
If the choice of basis functions exactly matches the underlying confounding surface , then sufficiently rich adjustment will remove bias. However, if the choice of basis functions does not match the underlying confounding surface , then residual bias may remain regardless of the choice of . This could lead to increases in bias, since the adjustment basis () then becomes a nearinstrumental variable (IV) that is highly correlated with the exposure but only weakly related to the outcome . Adjustment for a nearIV in contexts with residual confounding can amplify bias from the residual confounders at a rate greater than any bias reduction due to confounding by the nearIV (Pearl, 2011). This suggests that the MSE approach presented above may perform quite poorly if the choice of basis functions is incorrect.
5 Simulations
5.1 Primary Setup
We conducted a set of simulations to compare the different adjustment approaches, demonstrating both different choices of spatial basis and the different methods for selecting the scale of adjustment. The data for the simulations are created at points lying on a grid over the unit square . For each simulation, we constructed a fixed exposure surface and a fixed unmeasured confounder surface on this grid.
We considered six different unmeasured confounder surfaces, , constructed to have “large” and “fine” scale variation for each of three choices of basis: TPRS, sinusoidal functions, and a spatial Gaussian process (GP). Table 1 summarizes these surfaces, and mathematical detail on their definitions is provided in the Supplemental Material, Section B.1.
We constructed the exposure surface as , where is a fixed realization of a Gaussian process with exponential covariance structure. This structure for allows for the amount of correlation between and to be controlled, without requiring that they are generated in the same manner (e.g. correlated Gaussian process as used by Page et al. (2017)). For Simulations 1 and 2, we considered two values for the range parameter in the covariance function used to generate : 0.05 and 0.50, respectively. The value of 0.05 results in smaller scale spatial variation than the candidate confounders, which is needed to eliminate bias. The value of 0.50 results in spatial variation at a coarser scale than some of the confounder surfaces, which may lead to the persistence of bias but is also a situation that could reasonably arise in practice. The parameter controls the correlation between and and thus the amount of bias due to unmeasured confounding. Because the correlations between and differ for each choice of , we calculated so that the amount of bias was equal for all confounder surfaces, which makes comparison of the results more straightforward. Plots of the surfaces and the formula for are provided in Supplemental Material Section B.2. The values of and
were standardized to have mean zero and unit variance.
For each simulation replication, the outcome was constructed as where . This setup is designed to reflect a feature of the Sister Study application in that the residual variation in the outcome, even after accounting for unmeasured confounding, is large relative to the exposure variation. We redrew a sample of observation locations (selected from the grid of points) for 1,000 replications of each simulation. For Simulations 1 and 2, we set and chose such that the uncorrected bias would be 0.2. For each simulation we also conducted a variation with no effect () to estimate Type 1 error at the nominal level.
We compare estimates from four sets of models: (i) a model with semiparametric adjustment via TPRS in the outcome model and models with preadjustment of exposure (ii) by TPRS (at observed locations), (iii) via a highpass Fourier filter, and (iv) via wavelet thresholding. We present estimates for a sequence of adjustment amounts and using the selection methods from Section 4.
5.2 Effective bandwidth
The effective bandwidths for TPRS and a highpass filter on the unit square are shown in Figure 2. The difference in ranges of between the two bases is clear, with the highpass frequency filter spanning a much larger range of spatial scales than TPRS. The slight “hiccup” in Figure (a)a at is an artifact of the shape of that particular basis function (and the constraints of the gridded locations). For the highpass filter, the spatial smoother corresponding to highpass filters and never cross zero, which is why points are only shown for . For both adjustment bases, there is an approximately linear relationship between tuning parameter ( or ) and on the loglog scale.
5.3 Simulation Results
The point estimates for the different choices of adjustment basis and unobserved confounder surfaces are provided in Figure 3 for Simulation 1 (exposure with fine scale variation) and Figure 4 for Simulation 2 (exposure with larger scale variation). In each figure, each panel shows the mean estimate for the estimators that automatically select and across a range of fixed values of . For both simulations, the bias of the unadjusted estimator that ignores the unmeasured confounding is 1.2, by construction.
5.3.1 Results for fixed
When the adjustment approach matches the basis used to generate the confounder, we see that the bias can be removed after sufficient adjustment. For Simulation 1 when confounding is due to and , this occurs when there is TPRS adjustment of at least and , respectively. In addition to leading to low bias (topleft panels of Figures 3), this leads to low MSE that matches or beats all other estimators (Supplemental Materials Table 1), and nearly correct coverage rates (94% for nominal 95% confidence interval [CI]; see Supplemental Materials Table 2). For and , complete adjustment occurs when there is a high pass filter with cutoff at a frequency of 7 and 23, respectively. However the variance of the estimates from the Fourier filtering approach increases greatly with the amount of adjustment, so the MSE remains relatively large despite the lack of bias (Supplemental Materials Table 1).
When the adjustment approach does not match the basis generating the confounder, we see that in some cases most bias can still be eliminated but in other cases bias can persist. When the unmeasured confounder has large scale variation (, , and ), we see that all approaches are able to eliminate most bias after sufficient adjustment. However, for the confounder surfaces that have small scale variation, we see the persistence of bias regardless of the amount of adjustment () and bias amplification with increasing adjustment (). This bias amplification occurs even when the correct adjustment basis is chosen, which can seen by the increase in bias for confounder surface when adjusting with Fourier filtering up to .
In general, we see that TPRS can successfully adjust for confounding even when the unobserved confounder is generated from a different basis. For example, when , adjustment with TPRS is able to remove the confounding bias for . However, when , the bias is not eliminated even for very high choices of relative to the sample size of . This difference is due to the smaller range of effective bandwidths of TPRS compared to those of the highpass filter approach (Figure 2). By construction, variation in and extends up to and , respectively. Using the relationship in Figure (b)b, this corresponds to values of of approximately 0.12 and 0.03, respectively. A value of corresponds to approximately , but there is not value of for which TPRS can achieve .
For Simulation 2, in which the exposure has larger scale spatial variation, we see similar trends to most of the Simulation 1 results. However, we do see greater effects of bias amplification, which now appears when the confounder is as well as when the confounder is (Figure 4). Unlike in Simulation 1, the bias induced by is not reduced at all.
In both sets of simulations, empirical Type I error when
followed the same patterns as the coverage rates when . Specifically, the error rates were at or close to 0.05 when the correct amount of adjustment was done using the correct basis (Supplemental Materials Table 3).5.3.2 Results when selecting automatically
The estimators that automatically select using information criteria perform generally well. For adjustment using TPRS in the outcome model in Simulation 1, the model selected by minimizing AIC from the outcome model without exposure (AICNE) performs the best in almost all settings. For confounders other than , a large amount of adjustment is selected, which leads to reduced bias (Figure 3). The MSE of the estimator with adjustment selected by AICNE is smaller than the MSE for estimators using fixed values of (Supplemental Materials Table 1). For some surfaces, the amount of adjustment selected by BICNE yielded slightly smaller MSE, but coverage rates were always better for selection by AICNE (Supplemental Materials Table 2). In Simulation 2, selection using BICNE performs best in bias (Figure 4) and MSE (Supplemental Materials Table 4) for all settings except when the confounder is . This reflects the benefit of greater penalization for model complexity (which leads to less adjustment) in settings with extensive bias amplification. When the confounder is , selection using BIC from the full outcome model performs the best. However, coverage rates are slightly better using AICNE for selection in most cases.
When preadjusting with TPRS at observed locations in Simulation 1, the amount of adjustment selected by minimizing AICNE was also better in most settings. Again while MSE was slightly lower for the estimator with amount of adjustment selected by BICNE, coverage rates were always better for selection by AICNE. Performance in all settings was similar to using adjustment for TPRS in the outcome model. In Simulation 2, minimizing BICNE does better in most settings, having lower variance since it tends to select a smaller model.
For adjustment using Fourier or wavelet filtering, the minimallyadjusted model is almost always selected when minimizing AIC and BIC since the number of parameters estimated for filtering increases exponentially. This results in lower MSE and better coverage than the unadjusted models, but poorer performance than adjusting with TPRS (Supplemental Materials Tables 1 through 5). The exception is in Simulation 2 for confounders and , when the models selected by AIC and BIC do much worse than the unadjusted model.
Selecting the amount of adjustment by the estimatedMSE criterion (equation (8)) reduced most of the bias in Simulation 1. However, it let to standard errors that were much larger than the other approaches, as can be seen in Figure 3. For Fourier and wavelet filtering, the standard errors were so large that a single estimate provided little information. The corresponding MSE was much larger than all other estimators (Supplemental Materials Table 1). In Simulation 2, the bias amplification led to estimates that were as bad or worse. The ad hoc “knee” based approach (equation (9)) performed similarly poorly to the estimatedMSE approach, although had better coverage.
5.4 Additional Simulations
We conducted additional simulations that included measured confounders , and the results were not substantively different from Simulation 1 and 2 so are not shown here. We also conducted simulations in which the number of distinct locations was smaller than the number of subjects, leading to repeated locations. However, little difference was observed between those results compared to the primary results presented here. Approaches to reweighting had some benefit in terms of reducing variance in settings with extreme bias (small variance in and chosen for large relative bias), but no practical impact on the amount of bias.
5.5 Simulation Conclusions
Based on the results of Simulation 1 and 2, the best overall approach to reducing confounding bias is adjustment with TPRS at observed locations, either in the outcome model or using exposure preadjustment. The best approach for selecting the amount of adjustment is to minimize AICNE or BICNE (AIC or BIC from an outcome model without the exposure). In cases where bias amplification may be a concern, due to either only large scale variation in the exposure surface or very fine scale confounding, preference should be given to BICNE over AICNE.
6 Sister Study Application
We now compare these different spatial confounding adjustment approaches in the Sister Study example of the association between SBP and PM. The estimates from performing exposure preadjustment using TPRS across all observed grid locations, with each location repeated for the number of subjects at the same location, are provided in Figure (a)a. These results are similar to those in Figure 1, as the adjustment basis is the same and the preadjustment that leads to Figure (a)a repeated locations as needed. Based upon the simulations (Section 5.5), the preferred method for selecting the appropriate amount of adjustment is to minimize AICNE or BICNE. Because the exposure surface in this application is quite coarse (25km grid), we use BICNE for selecting our primary result to reduce the impact of bias amplification from overadjustment. This choice results in adjustment using TPRS basis functions. When adjustment is done in the health model, the estimated association is a difference of 0.96 mmHg (95% CI: 0.50, 1.42) in SBP for each 10 difference in PM (Table 2). When the exposure is preadjusted, the estimated association is a difference of 0.92 mmHg (95% CI: 0.46, 1.41) in SBP for each 10 difference in PM.
For comparison, Table 2 also includes the results for the other approaches to selecting . Using AICNE or AIC results in a large amount of adjustment being selected (). This appears to be extensive overfitting of the model; the increases in the value of the loglikelihood as increases are small relative to the large sample size (). The MSE approach also chooses a large amount of adjustment: for outcome model adjustment and for exposure preadjustment. All of these approaches with large adjustment result in negative point estimates due to the downward trend observed in Figures 1 and (a)a. This downward trend is likely due to bias amplification from residual confounding, since PM only explains a small fraction of the variation in SBP and the measured confounders, while numerous, likely cannot capture all of the confounding relationships. Furthermore, the coarse spatial scale of the exposure means that variation due to unmeasured confounders is likely finerscale than the exposure.
Results from two alternative approaches to exposure preadjustment are presented in Figures (b)b and (c)c. These figures correspond, respectively, to preadjustment at observed locations, with duplicated locations removed, and preadjustment over all locations in the domain, with each location included once. The point estimates corresponding to the different selection methods for these adjustment approaches are provided in the bottom half of Table 2. The amount of adjustment selected by AIC and BIC (using the full outcome model) is small, ranging from to . These choices of adjustment yield point estimates ( to ) similar to the main results from the models that adjust at all observed locations ( and ). The difference in estimates between Figures (b)b and (c)c is relatively small, suggesting that the restriction of the preadjustment to the observed locations does not have a large impact on the results. However, there is a notable qualitative difference between these results and those that included duplicate locations for the adjustment (Figures 1 and (a)a). While all four approaches show a negative trend in the point estimates for large values of , the approaches that adjust using observed locations including duplicates decrease at smaller values of and yield negative point estimates. We explored using reweighting to correct for the impact of the duplicated locations in the preadjustment approaches and obtained results that were either qualitatively similar to the results without reweighting or were highly unstable (see Supplemental Materials Section C.2). The magnitude of the difference in the point estimates from these alternative approaches using TPRS (including or excluding duplicate locations) suggests that residual confounding is likely occurring in this setting.
Estimates for a fixed choice of 1,000 km are also provided in Table 2. This corresponds to adjustment using degrees of freedom (Supplemental Materials Section C.1). A bandwidth of this size smooths large scale variation across the contiguous United States (which is approximately 4,500 km by 2,900 km). Because there are wellestablished largescale trends in health outcomes across the United States (Mensah et al., 2005), this value was chosen so that the analysis accounts for longscale trends in systolic blood pressure.
Based on the simulation results and duplicated observed locations, we recommend adjustment using TPRS as described above instead of using the Fourier or wavelet approaches. However, we present results for those approaches here for illustration. To apply the preadjustment approaches with Fourier and wavelet basis functions, we embed the gridded locations within a larger square grid. Because the predictions of 2006 annual average PM of Sampson et al. (2013) are only defined over the contiguous United States, the added points are assigned exposure concentration of zero. This results in a grid of size 184 184 for the Fourier approach and 256 256 for the wavelet approach. We see from Figures (a)a and (b)b that the point estimates are relatively stable for all amounts of adjustment. This is probably due in part to the sinusoidal and wavelet basis functions not being representative of the spatial structure of the realworld unmeasured confounders, so increased adjustment removes little variation from the outcome or exposure. Additionally, because these adjustment methods rely on each location being present only a single time, we expect trends similar to Figures (b)b and (c)c.
7 Discussion
We have examined different approaches to adjusting for unmeasured spatial confounding and quantifying and selecting the scale of adjustment. These approaches are motivated as extensions of time series methods for temporal confounding (Peng et al., 2006; Szpiro et al., 2014). We presented a method for comparing the spatial scales across choices of basis using the effective bandwidth . We showed examples using TPRS, a Fourier basis, and wavelets, each of which indexes variability in different forms and on different scales. We showed that adjustment with TPRS is limited to a smaller range of spatial scales than the Fourier approach. However, the smaller bandwidth range of TPRS can have more flexibility at these scales. But when the true variation is beyond the scales TPRS can reach, it fails to remove confounding bias.
For a particular application, a choice of the amount of adjustment must be made. We identified selecting the amount of adjustment using AICNE and BICNE as the preferred approaches for most settings. When there is substantive knowledge available about the scales of variation involved, can be chosen a priori. Ad hoc approaches such as the “knee” and estimated MSE methods performed poorly.
Preadjusting the exposure by Fourier filtering or wavelet thresholding is attractive due to the orthogonality of those bases. However, they are both severely limited by the requirement of a square grid, assumptions of periodicity, and the need to reweight the population. For wavelets, the thresholding at dyadic levels also leads to limited available intervals for adjustment. Although in settings where there is a priori knowledge about confounding relationships, nonuniform thresholding of the wavelet coefficients could add flexibility. For these reasons, the more flexible TPRS are likely to be preferred over the Fourier or wavelet basis for settings where TPRS can represent variation at the scale of interest, such as the example study of PM in the Sister Study. The Fourier basis ccould be chosen when veryfine scale adjustment is needed or there is external evidence to support periodic variation in the confounders.
One of the key results from our simulations is that the addition of spatial basis functions, or an equivalent preadjustment procedure, does not necessarily reduce bias in the point estimate. In fact, it can lead to substantial bias amplification. This is an important contrast to the time series settings, in which more aggressive adjustment is recommended to reduce bias from unmeasured confounding (Peng et al., 2006). The exposure in a time series settings are typically measured directly, which means that there is substantial residual variation in the exposure at each time point, whereas the use of an exposure prediction model in a spatial context means that spatial exposures are often quite smooth. The potential for bias amplification means that using changes in the point estimates is often not a good approach to assessing the extent of confounding bias.
We have presented results here in the context of a linear health model. However, these approaches to spatial confounding adjustment can be applied in generalized linear model settings. The patterns of bias reduction (or increase) for a sequence of adjustment values will differ by context and choice of link function, but the connection between adjustment basis, spatial scale, and overall interpretation remains the same as the linear case. For example, Keet et al. (2018) used TPRS to adjust for largescale confounding across the contiguous United States in an analysis of particulate matter and asthmarelated outcomes. Automated selection could be done using extensions of information criteria such as QIC (Pan, 2001).
In summary, we have presented methods for describing and selecting the spatial scale of spatial confounding adjustment using different choices of basis. These methods can be used to more accurately describe the extent of spatial confounding adjustment in future studies with spatial exposures.
Acknowledgements
This work was supported by grants T32ES015459 and R21ES024894 from the the National Institute for Environmental Health Sciences (NIEHS). The Sister Study supported in part by the Intramural Research Program of the NIH, NIEHS (Z01ES044005). This work was also supported in part by the US Environmental Protection Agency (EPA) through awards RD83479601 and RD835871. This was work has not been formally reviewed by the EPA. The views expressed in this document are solely those of the authors and do not necessarily reflect those of the Agency. EPA does not endorse any products or commercial services mentioned in this publication.
References
 Akaike (1973) Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. Petrov and F. Csaki (Eds.), Second International Symposium on Information Theory, pp. 267–281. Budapest: Akademiai Kiado.
 Antonelli et al. (2017) Antonelli, J., J. Schwartz, I. Kloog, and B. A. Coull (2017). Spatial Multiresolution Analysis of the Effect of PM2.5 on Birth Weights. Annals of Applied Statistics 11, 792–807.
 Besag et al. (1991) Besag, J., J. York, and A. Mollié (1991). Bayesian image restoration with two applications in spatial statistics. Annals of the Institute of Statistical Mathematics 43(1), 1–20.
 Brochu et al. (2011) Brochu, P. J., J. D. Yanosky, C. J. Paciorek, J. Schwartz, J. T. Chen, R. F. Herrick, and H. H. Suh (2011). Particulate air pollution and socioeconomic position in rural and urban areas of the Northeastern United States. American Journal of Public Health 101(SUPPL. 1), 224–230.
 Burger and Burge (2009) Burger, W. and J. M. Burge (2009). Principles of Digital Image Processing: Core Algorithms. London: Springer.
 Burnett and Krewski (1991) Burnett, R. and D. Krewski (1991). Air pollution effects on hospital admission rates : A random effects modeling approach. The Canadian Journal of Statistics 22(4), 441–458.

Carl and
Kühn (2008)
Carl, G. and I. Kühn (2008).
Analyzing spatial ecological data using linear regression and wavelet analysis.
Stochastic Environmental Research and Risk Assessment 22(3), 315–324.  Chan et al. (2015) Chan, S. H., V. C. van Hee, S. Bergen, A. A. Szpiro, L. A. DeRoo, S. J. London, J. D. Marshall, J. D. Kaufman, and D. P. Sandler (2015). Longterm air pollution exposure and blood pressure in the Sister Study. Environmental Health Perspectives 123(10), 951–958.
 Clayton and Kaldor (1987) Clayton, D. and J. Kaldor (1987). Empirical Bayes estimates of agestandardized relative risks for use in disease mapping. Biometrics 43(3), 671–681.
 Clayton et al. (1993) Clayton, D. G., L. Bernardinelli, and C. Montomoli (1993). Spatial correlation in ecological analysis. International journal of epidemiology 22(6), 1193–1202.
 Daubechies (1988) Daubechies, I. (1988). Orthonormal Bases of Compactly Supported Wavelets II. Variations on a Theme. SIAM Journal on Mathematical Analysis 24(2), 499–519.
 Davis et al. (2019) Davis, M. L., B. Neelon, P. J. Nietert, K. J. Hunt, L. F. Burgette, A. B. Lawson, and L. E. Egede (2019). Addressing geographic confounding through spatial propensity scores: a study of racial disparities in diabetes. Statistical Methods in Medical Research 28(3), 734–748.
 Diez Roux et al. (2001) Diez Roux, A. V., S. S. Merkin, D. Arnett, L. Chambless, M. Massing, F. J. Nieto, P. Sorlie, M. Szklo, H. A. Tyroler, and R. L. Watson (2001). Neighborhood of residence and incidence of coronary heart disease. N Engl J Med 345(2), 99–106.
 Dominici et al. (2004) Dominici, F., A. McDermott, and T. J. Hastie (2004). Improved semiparametric time series models of air pollution and mortality. Journal of the American Statistical Association 99(468), 938–948.
 Dominici et al. (2003) Dominici, F., A. McDermott, S. L. Zeger, and J. M. Samet (2003). Airborne particulate matter and mortality: Timescale effects in four US cities. American Journal of Epidemiology 157(12), 1055–65.
 Fuentes et al. (2007) Fuentes, M., P. Guttorp, and P. D. Sampson (2007). Using transforms to analyze spacetime processes. In L. Held, B. Finkenstadt, and V. Isham (Eds.), Statistical Methods for SpatioTemporal Systems, pp. 77–150. Boca Raton: Chapman Hall/CRC.
 Gonzalez and Woods (2008) Gonzalez, R. and R. Woods (2008). Digital Image Processing (Third ed.). Upper Saddle River: Pearson Prentice Hall.
 Guinness and Fuentes (2017) Guinness, J. and M. Fuentes (2017). Circulant Embedding of Approximate Covariances for Inference From Gaussian Data on Large Lattices. Journal of Computational and Graphical Statistics 26(1), 88–97.
 Haining (1991) Haining, R. (1991). Bivariate Correlation with Spatial Data. Geographical Analysis 23(3), 210–227.
 Hanks et al. (2015) Hanks, E. M., E. M. Schliep, M. B. Hooten, and J. a. Hoeting (2015). Restricted spatial regression in practice: geostatistical models, confounding, and robustness under model misspecification. Environmetrics 26, 243–254.
 Hastie and Tibshirani (1990) Hastie, T. J. and R. Tibshirani (1990). Generalized Additive Models. New York: Chapman & Hall.
 Hodges and Reich (2010) Hodges, J. S. and B. J. Reich (2010). Adding spatiallycorrelated errors can mess up the fixed effect you love. The American Statistician 64(4), 325–334.

Hughes and
Haran (2013)
Hughes, J. and M. Haran (2013).
Dimension reduction and alleviation of confounding for spatial generalized linear mixed models.
Journal of the Royal Statistical Society. Series B: Statistical Methodology 75, 139–159.  Jerrett et al. (2004) Jerrett, M., R. T. Burnett, J. Brook, P. Kanaroglou, C. Giovis, N. Finkelstein, and B. Hutchison (2004). Do socioeconomic characteristics modify the short term association between air pollution and mortality? Evidence from a zonal time series in Hamilton, Canada. Journal of epidemiology and community health 58(1), 31–40.
 Keet et al. (2018) Keet, C. A., J. P. Keller, and R. D. Peng (2018). Longterm coarse particulate matter exposure is associated with asthma among children in medicaid. American Journal of Respiratory and Critical Care Medicine 197(6), 737–746.
 Mensah et al. (2005) Mensah, G. A., A. H. Mokdad, E. S. Ford, K. J. Greenlund, and J. B. Croft (2005). State of disparities in cardiovascular health in the United States. Circulation 111(10), 1233–1241.
 Nason (2008) Nason, G. P. (2008). Wavelet Methods in Statistics with R. New York: Springer.
 Paciorek (2010) Paciorek, C. J. (2010). The importance of scale for spatialconfounding bias and precision of spatial regression estimators. Statistical Science 25(1), 107–125.
 Page et al. (2017) Page, G. L., Y. Liu, Z. He, and D. Sun (2017). Estimation and Prediction in the Presence of Spatial Confounding for Spatial Linear Models. Scandinavian Journal of Statistics 44(3), 780–797.
 Pan (2001) Pan, W. (2001). Akaike’s information criterion in generalizad estimating equations. Biometrics 57(March), 120–125.
 Papadogeorgou et al. (2019) Papadogeorgou, G., C. Choirat, and C. Zigler (2019). Adjusting for Unmeasured Spatial Confounding with Distance Adjusted Propensity Score Matching. Biostatistics 20(2), 256–272.
 Pearl (2011) Pearl, J. (2011). Invited commentary: Understanding bias amplification. American Journal of Epidemiology 174(11), 1223–1227.
 Peng et al. (2006) Peng, R. D., F. Dominici, and T. A. Louis (2006, mar). Model choice in time series studies of air pollution and mortality. Journal of the Royal Statistical Society: Series A (Statistics in Society) 169(2), 179–203.
 Ramsay et al. (2003a) Ramsay, T., R. Burnett, and D. Krewski (2003a). Exploring bias in a generalized additive model for spatial air pollution data. Environmental Health Perspectives 111(10), 1283–1288.
 Ramsay et al. (2003b) Ramsay, T. O., R. T. Burnett, and D. Krewski (2003b). The effect of concurvity in generalized additive models linking mortality to ambient particulate matter. Epidemiology 14(1), 18–23.
 Reich et al. (2006) Reich, B. J., J. S. Hodges, and V. Zadnik (2006). Effects of residual smoothing on the posterior of the fixed effects in diseasemapping models. Biometrics 62(December), 1197–1206.
 Royle and Wikle (2005) Royle, J. A. and C. K. Wikle (2005). Efficient statistical mapping of avian count data. Environmental and Ecological Statistics 12(2), 225–243.
 Sampson et al. (2013) Sampson, P. D., M. Richards, A. A. Szpiro, S. Bergen, L. Sheppard, T. V. Larson, and J. D. Kaufman (2013, aug). A regionalized national universal kriging model using Partial Least Squares regression for estimating annual PM2.5 concentrations in epidemiology. Atmospheric Environment 75, 383–392.
 Schwartz (1994) Schwartz, J. (1994). Nonparametric smoothing in the analysis of air pollution and respiratory illness. Canadian journal of statistics 22(4), 471–487.
 Schwarz (1978) Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics 6(2), 461–464.
 Stuart (2010) Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical science : a review journal of the Institute of Mathematical Statistics 25(1), 1–21.
 Szpiro et al. (2014) Szpiro, A. A., L. Sheppard, S. D. Adar, and J. D. Kaufman (2014). Estimating acute air pollution health effects from cohort study data. Biometrics 70(1), 164–174.

Tiefelsdorf and
Griffith (2007)
Tiefelsdorf, M. and D. a. Griffith (2007).
Semiparametric filtering of spatial autocorrelation: The eigenvector approach.
Environment and Planning A 39(5), 1193–1221.  Wakefield (2007) Wakefield, J. (2007). Disease mapping and spatial regression with count data. Biostatistics 8(2), 158–183.
 White (1980) White, H. (1980). A HeteroskedasticityConsistent Covariance Matrix Estimator and a Direct Test for Heteroskedasticity. Econometrica 48(4), 817–838.
 Wood (2003) Wood, S. N. (2003). Thin plate regression splines. Journal of the Royal Statistical Society: Series B 65(1), 95–114.
Comments
There are no comments yet.