Glaucoma is one of the leading causes of blindness in the world. While its etiology is not fully understood, it is known to be caused by damage to the optic nerve head (ONH) that can be induced by intraocular pressure (IOP). Researchers hypothesized that the biomechanics of the peripapillary (PP) scleral region close to the ONH, shown to be an important determinant of ONH biomechanics, may play a major role in glaucoma pathogenesis and progression (Sigal et al., 2005; Burgoyne and Downs, 2008). Recently, novel custom instrumentation was developed that can induce a fixed level of IOP and precisely measure the mechanical strain in the posterior human sclera (Fazio, Bruno, Reynaud, Poggialini and Downs, 2012; Fazio, Grytz, Bruno, Girard, Gardiner, Girkin and Downs, 2012)
. A commercial laser speckle interferometer (ESPI) is used to measure the IOP-induced scleral displacement continuously around the posterior eye. The scleral displacement is processed to estimate the mechanical strain tensor on a grid of points on the scleral surface, and the largest eigenvalue of this strain tensor computed to yield the maximum principal strain (MPS), a scalar measurement for each location on the scleral surface that summarizes the magnitude of tensile strain at that location. Intuitively, scleral regions with higher MPS for a given level of IOP are more pliable, which in principle could either relieve IOP and reduce the potential for ONH damage or focus strain at the ONH and increase glaucoma damage. Age is a primary glaucoma risk factor and age-related stiffening occurs in other load-bearing soft tissues. Thus, the researchers hypothesized that age-related changes in scleral stiffness might contribute to the known age-related increase in glaucoma risk.
Utilizing this custom instrumentation, Fazio et al. (2014) conducted a study to test the hypothesis that scleral stiffness increases with age, with the expectation that future work will follow to elucidate the specific role of scleral stiffness in glaucoma. They obtained twenty pairs of eyes from normal human donors in the Lions Eye Bank of Oregon in Portland, Oregon, and the Alabama Eye Bank in Birmingham, Alabama. From each subject, the MPS was measured in the posterior globes of both left and right eyes on a partial spherical domain with 120 circumferential locations and 120 meridional locations , where corresponds to the ONH. For each eye, MPS was measured under nine different IOP levels (7, 10, 15, 20, 25, 30, 35, 40, and 45 mmHg). Further scientific and technical details can be found in Section 5. Figure 6 plots a polar azimuthal projection of MPS functions for one subject under 45 mmHg IOP level. The center of each panel corresponds to the ONH of each eye. The hypothesis is that MPS will decrease with age, especially in scleral regions closer to the ONH.
The resulting data set is complex and high dimensional with many layers of structure. First, for each eye at each IOP level, the MPS measurements constitute a function on a two-dimensional partial spherical domain for which one needs to account for within-function (intrafunctional) correlations. Second, there are multiple sources of between-function (interfunctional) correlation. The measurements from the left and right eye from the same subject are expected to be correlated, and measurements from the same eye across the IOP levels should be serially correlated. The strength of these nested and serial correlations may potentially vary around the scleral surface. Based on preliminary looks at the data, it appears that the age effect on MPS may not be linear, and also appears to vary around the scleral surface. This data set is also enormous, with over 4.5 million measurements, which poses practical problems for model building and fitting.
Many researchers, when faced with such complexities, would use one of several strategies to simplify the data so they can analyze them. Some researchers would get rid of the complexities of the functional data by computing summaries and modeling only those, while discarding the original functional data. In this application area, some researchers create summaries in the peripapillary (PP) and mid-peripheral (MP) scleral regions by integrating over the region closest to the ONH () and further out (), respectively, or over several circumferential sections. This practice would miss any scientific insights not captured by these summaries. Alternatively, some researchers would only model a subset of the data (e.g. only a single IOP or single eye per subject) to avoid having to deal with potentially complex interfunctional correlation, or ignore this correlation entirely by modeling IOP within eyes or eyes within subject as independent. Some would ignore intrafunctional correlation by modeling individual pixels on the image independently. Many researchers would also only consider parametric or linear effects for covariates such as age without considering whether more complex nonparametric covariate effects might be necessary to capture the true relationship. All of these simplification strategies have potential statistical downsides.
In our opinion, it would be far preferable to model this complex function data set in its entirety using a statistical model that is flexible enough to capture all of its potentially complex intrafunctional and interfunctional structure. However, to the best of our knowledge, no statistical model has been presented in existing literature that simultaneously handles all of this structure. In this paper, we use the Bayesian functional mixed model (BayesFMM) framework, introduced in Morris and Carroll (2006) and further developed in subsequent papers, to model these data, which has sufficient flexibility to capture nonparametric covariate effects, serial and nested interfunctional correlation, functions on a fine 2d partial spherical domain, and is computationally efficient enough to scale up to this enormous size and produce Bayesian inference for functional parameters as well as any desired summaries. This requires a careful description of how to utilize the BayesFMM framework to model smooth nonparametric effects and serial interfunctional correlation, which has not been done in any existing literature to date.
While motivated by and applied to the glaucoma scleral strain data, the BayesFMM framework we present here is more generally applicable to many types of complex, high dimensional functional data of modern interest including wearable computing data, genome-wide data, proteomics data, geospatial time series data and neuroimaging data. Our hope is that besides revealing scientific insights into glaucoma, this paper can serve as a template for performing a state-of-the-art functional response regression analysis for other complex functional data sets.
The rest of the paper is organized as follows. Section 2 contains a brief review of some relevant literature in functional regression, including existing methods for modeling nested or serial interfunctional correlation and nonparametric smooth covariate functional effects. Section 3 contains methods, overviewing the BayesFMM framework and providing methodological details for how to fit serially correlated functions and nonparametric smooth functional effects in this framework and discussing its properties. Section 4 describes a model selection heuristic that can be used to assess which of the various potential complex modeling structures are needed for the current data before having to run a full MCMC. Section 5 presents the details of our analysis of the glaucoma scleral strain data, including details on basis function and modeling components chosen, a summary of results, and various sensitivity analyses. Section 6 contains a discussion of general scientific conclusions from our analysis and an assessment of the strengths and weaknesses of the BayesFMM framework for this setting of complex functional regression models. Supplementary materials provide additional computational details and graphical results of the analysis, as well as code to fit the models contained in the paper.
2 Literature Review
There is a rich and rapidly expanding literature on methods to perform functional regression, which includes functional predictor regression (scalar-on-function), functional response regression (function-on-scalar), and function-on-function regression. For example, see Morris (2015) for an extensive review of work in this area, which has exploded in the past decade. In this paper, our goal is to regress the scleral strain MPS functions on predictors age and IOP, so we are interested in functional response regression. Here, we will summarize some key existing literature relevant to the structures present in the glaucoma scleral strain data set, including methods that account for nested and serial interfunctional correlation and smooth nonparametric covariate effects.
As summarized in Morris (2015), much of the work on functional response regression assumes independently sampled functions, but a number of published models can handle interfunctional correlation. Many are focused on nested or crossed sampling designs that induce compound symmetry covariance structures among functions sampled within the same cluster, including Brumback and Rice (1998), Morris et al. (2003), Berhane and Molitor (2008), Aston et al. (2010), and Goldsmith and Kitago (2016). There are relatively few that model serially correlated functions, for which functions are observed at multiple levels of some continuous variable for the same subject. Most commonly, the functions occur along a grid of time points, which could be called longitudinally correlated functional data, but can also occur over other continuous variables such as IOP for our glaucoma data. There are a number of papers that focus on functional predictor regression (Goldsmith et al., 2012; Gertheiss, Goldsmith, Crainiceanu and Greven, 2013; Kundu et al., 2016; Islam et al., 2016) or estimate multi-level principal components (Greven et al., 2010; Zipunnikov et al., 2011; Chen and M’́uller, 2012; Li and Guan, 2014; Zipunnikov et al., 2014; Park and Staicu, 2015; Shou et al., 2015; Hasenstab et al., 2017) for serially correlated functions, but these papers do not deal directly with functional response regression, i.e. do not regress the functions on covariates while accounting for this serial correlation in the error structure. Any methods built for a general functional mixed modeling framework containing multiple levels of general random effect functions, including the BayesFMM framework first introduced by Morris and Carroll (2006) and the functional additive mixed model (FAMM) framework first introduced by Scheipl et al. (2015), can be used for functional response regression while accounting for serial interfunctional correlation if an acceptable parametric form for the serial variable can be found. However, no specific models or examples in existing papers have presented such a case. In this paper, we will demonstrate in detail how to account for serial correlation within the BayesFMM framework.
Most functional response regression work, while modeling coefficients as nonparametric in the functional domain , has focused on models linear in covariates , e.g. with terms such as . A number of methods allow functional coefficients that are smooth and nonparametric in covariate as well, e.g. . The last chapter of Wood (2006) describes additive mixed models (AMMs) that extend Laird and Ware (1982) to include nonparametric fixed effects and parametric random effects. After specifying a parametric form for in fixed and random effects, this framework can fit terms that are nonparametric in but parametric in using mgcv in R. Scheipl et al. (2015) describe a generalization of AMMs to the functional regression settting, yielding functional additive mixed models (FAMM) that can fit functional response, functional predictor, or function-on-function regression models, and allowing terms that are nonparametric in both and , using mgcv to fit the underlying models. Initially designed for splines, this work was extended to allow the use of functional principal components (fPC) for the random effect functions to handle sparse, irregularly sampled outcomes in Cederbaum et al. (2015), to model generalized outcomes in Scheipl et al. (2016), and to utilize a new boosting-based fitting procedure for some models that leads to other computational and modeling benefits in Brockhaus et al. (2015). This series of work is summarized in a review article Greven and Scheipl (2017a). These works utilize the fact, going way back to Wahba (1978)
, that penalized splines can be represented using a linear mixed model framework. Using similar representations, the BayesFMM framework first introduced inMorris and Carroll (2006) can also be used to fit nonparametric terms with appropriate specification of the design matrices and in the model, but this has not been done in any existing paper. In this paper, we will describe in detail how to accommodate smooth nonparametric terms in the BayesFMM framework.
To our knowledge, the only methods in existing literature with the flexibility to both fit smooth nonparametric functional covariate terms and simultaneously account for nested and serial correlation are the FAMM framework (Scheipl et al., 2015; Greven and Scheipl, 2017a) and the BayesFMM framework (Morris and Carroll, 2006). Both of these frameworks are extremely flexible, based on functional extensions of linear mixed models, but have important differences, which are discussed in detail in Morris (2017) and Greven and Scheipl (2017b). The FAMM framework has several limitations that prevent its application to our glaucoma scleral strain data, including the requirement of functions to be on 1d Euclidean domains, use of spline bases for fixed effects with L2 penalties, and use of a computational approach that may not scale up well to enormous data sets like this. Thus, to fit these data, we adapt the BayesFMM framework to include smooth nonparametric age effects and serial correlation across IOP, and demonstrate how we can use it to reveal insights into the biomechanical etiology of glaucoma.
We first overview the BayesFMM framework in Section 3.1, and then for ease of exposition we build up the necessary components of our model separately before presenting our final general model. In Section 3.2, we demonstrate how to capture serial interfunctional correlation through functional growth curve models, and then demonstrate how to accommodate smooth nonparametric covariate functional effects in Section 3.3. Besides presenting the models, we also describe their properties and contrast with existing alternative methods. In Section 3.4 we present a general alternative form of the BayesFMM that includes smooth nonparametric terms that we use to fit our glaucoma scleral strain data.
3.1 Bayesian Functional Mixed Models (BayesFMM)
Suppose we have a sample of functions observed on a common fine grid of size on a domain , which is potentially multi-dimensional and/or non-Euclidean (Morris et al., 2011). The FMM introduced by Morris and Carroll (2006) is a functional response regression model given by
where are fixed effect functions that model the effect of covariate on the response at position for each covariate , and the are random effect functions at level corresponding to design matrix , with being the number of random effects at the respective levels. As in linear mixed models for scalar data, the fixed and random effect predictors can be discrete or continuous, and can involve individual covariates or interactions of multiple covariates. Although not explicitly portrayed in (1), this modeling framework also accommodates functional predictors to perform function-on-function regression (Meyer et al., 2015).
Distributional and Covariance Assumptions: Here, for simplicity, we first describe the Gaussian FMM with conditionally independent random effect and residual error functions, and then mention some other alternatives available in this framework. In this case, the random effect functions are iid mean zero Gaussian Processes with intrafunctional covariance cov and the residual error functions are iid mean zero Gaussian Processes with intrafunctional covariance cov. Other extensions of this framework allow the option of conditional autoregressive (CAR) (Zhang et al., 2014) or Matern spatial covariance or AR() temporal interfunctional correlation structures in the residual errors (Zhu et al., 2014). Although focusing on Gaussian regression here, a robust version of this framework assuming heavier tailed distributions on the random effects or residuals is available (Zhu et al., 2011)
if robustness to outliers is desired, and can also be utilized with any other features or modeling components in the BayesFMM framework.
Basis Transform Modeling Approach: A basis transform modeling approach is used to fit model (1). This first involves representing the observed functions with a basis expansion with a set of basis functions :
While initially developed for wavelets (Morris and Carroll, 2006), as first discussed in Morris et al. (2011), the modeling approach can be used with any basis. It is meant to be used with lossless transforms with for all observed , so that the basis coefficients contain all information within the observed functional data , or at least near-lossless with
for some small value and measure . This assures that the chosen basis is sufficiently rich such that for practical purposes it can recapitulate the observed functional data, and visual inspection of the raw functions and basis transformation should reveal virtually no difference. Any basis functions can be used, including commonly used choices splines, wavelets, Fourier bases, PCs or creatively constructed custom bases, and can be defined on multi-dimensional or non-Euclidean domains .
Rather than including the bases in a design matrix and using scalar regression methods to fit the model, our approach is to transform the observed functions into the basis space to obtain the basis coefficients , a matrix for which element contains the basis coefficient for observed function , fit a basis-space version of the FMM to these coefficients, and then transform results back to the data space model (1) for estimation and inference. With basis representation written in matrix form with a matrix of basis functions evaluated on the observational grid with , the coefficients can be computed by with as long as rank, or for certain basis functions including wavelets and Fourier bases their special structure enables fast algorithms for computing these coefficients.
Basis Space Model: Thus, rather than fitting model (1) directly, the basis-space version of the model is fit for each basis coefficient :
where , , and are basis coefficients for the functional fixed effects , functional random effects , and functional residuals , respectively. While in principle correlation across basis coefficients can be accommodated, for complex, high-dimensional functions, it may be beneficial to model these basis coefficients independently. For the Gaussian FMM with conditionally independent random effect and residual error functions, it is assumed and with and
scalar variance components. Although modeling independently in the basis space, this structure induces intrafunctional correlation according to the chosen basis functions. For example, for the residual error functions, the inducedintrafunctional correlation S with is given by:
where , and defined likewise. For suitably chosen basis functions that effectively capture the characteristic structure of the observed functions , this can allow a flexible class of covariance structures indexed by covariance parameters. One can assess the suitability of these assumptions by taking the basis-space variance components, computing (5), and plotting this covariance matrix to see if it appears to capture the salient structure. Figure 3 panels (e) and (f) plot the intrafunctional correlation structure induced by the chosen tensor wavelet basis for the scleral strain MPS data set for a particular scleral location for the eye-to-eye random intercepts and residual error functions, and the file intrafunctional_correlation.mp4 in the supplement shows a more extensive summary of all random levels across scleral locations.
Shrinkage Priors for Regularization of Fixed Effects: If fitting this model using a frequentist approach, L1 or L2 penalties could be imposed on the basis space fixed effects to induce regularization/smoothing of the fixed effect functions , as described in Morris (2015). However, this framework was designed to use a Bayesian modeling approach, in which case regularization of the fixed effect functions is accomplished through specification of shrinkage priors for the corresponding basis coefficients:
for some mean zero distribution with corresponding regularization parameters indexed by that define a partitioning of the basis coefficients into regularization sets, which are subsets of basis coefficients sharing the same regularization parameters. Morris and Carroll (2006) used the spike-slab prior (George and McCulloch, 1993) for and we also use that here, but other alternatives include Gaussian, Laplace (Park and Casella, 2008), Horseshoe (Carvahlo et al., 2010), Normal-Gamma (Griffin and Brown, 2010), and Dirichlet-Laplace (Bhattachary et al., 2015). The regularization parameters
can be given hyperpriors or be estimated by empirical Bayes, which is described in detail for the spike-slab inMorris and Carroll (2006).
Model Fitting Approach:
A fully Bayesian modeling approach is used to fit a Markov Chain Monte Carlo (MCMC) to the basis space model (4) for each , and then transforming back to the data space, e.g. using to yield posterior samples for each parameter in the data-space FMM (1). This requires specification of priors for the variance components; a vague empirical Bayes approach is used that centers the prior on REML starting values of these parameters, minimally informative to be equivalent to two data points of information, as detailed in Section 5. BayesFMM fits a marginalized version of model (4) with all random effects integrated out, which integrates over the interfunctional correlation induced by the random effect levels when updating the fixed effects, and speeds convergence of the chain and calculations since the random effect functions themselves need not be sampled. As detailed in the supplement, the fixed effect updates involve conjugate (spike-slab) Gibbs steps and variance components are updated via Metropolis-Hastings with proposal variances automatically computed based on the corresponding Fisher information. If desired, posterior samples for random effects can be obtained by sampling from conjugate Gaussians.
Given these posterior samples, one can compute any desired posterior probabilities and pointwise or joint credible bands that can be used for Bayesian inference. Joint bands can be constructed using the approach described inRuppert et al. (2003). These inferential summaries can be constructed in the data or basis space for any functional of model parameters, including contrasts (e.g. ), nonlinear transformations (e.g. exp), derivatives (e.g. , or integrals (e.g. for some ) aggregating information across regions of . We make use of these to produce inference on numerous scientifically interesting summaries for our scleral strain MPS data in Section 5.
Example FMM for Scleral Strain MPS Data:. To illustrate this framework, suppose that we model MPS functions from both left and right eyes for each subject, but only for a specific IOP level, and suppose we are willing to assume a linear age effect. Let and be MPS functions for the left and right eyes respectively from the -th subject, with indexing the scleral domain, with being spherical coordinates on the scleral surface. We could represent these data with the following FMM:
with and . The induce a generalized compound symmetry covariance structure between the functions from the left and right eyes from the same subject, with cov. To simultaneously model data for all IOP, we would need to add a random effects level to capture the serial correlation across MPS functions for different IOP for the same eye. We next describe how this can be done.
3.2 Accommodating Serial Interfunctional Correlation via Functional Growth Curves
Recall that in our scleral strain MPS data set, for each eye we have MPS functions from each of a series of IOP levels ranging from 7 mmHg to 45 mmHg, which induces a serial correlation across scleral strain functions for the same eye according to IOP. It is important to account for this serial interfunctional correlation in some appropriate fashion in order to obtain efficient estimates and accurate inference for any fixed effect functions in the model. Here we demonstrate how functional random effects in a functional mixed modeling framework can be used to capture this serial correlation using a type of functional growth curve model, and discuss the properties of this strategy in the context of the BayesFMM framework using a basis transform modeling approach.
Basic Growth Curve Model for Scleral Strain MPS Data: For illustration, here we only consider the serial effect of IOP and omit any age effect and consider only the left eye for each subject. Suppose is the MPS function for the subject, , after exposure to an IOP level of . Note that is a function of that varies across serial variable . To capture the serial effect of , we consider the following functional growth curve model:
where is the mean MPS for IOP level and scleral location , is a mean zero random effect for subject that represents a subject-specific growth curve in that is allowed to vary across scleral location , and are residual error functions assumed to be independent and identically distributed mean zero Gaussians with covariance . If we can find a suitable parametric form for the serial effect of with basis functions , we assume and with cov. In practice, we recommend using basis functions that are orthogonal across , i.e. for to obviate the need to have cross-covariance terms between and to avoid additional computational complexity in the model.
The introduction of this eye-level random growth curve induces serial covariance across functions for the same eye at different levels of IOP, with cov Indexed by , the strength and shape of this serial covariance can vary across the scleral surface . Figure 3 panel (d) contains the induced serial correlation across IOP for the marked scleral location, and the file IOP_corr.mp4 in the supplementary materials is a movie file that demonstrates how this correlation varies over the scleral surface. Note also that cov meaning that this structure enables “borrowing of strength” from nearby in determining the strength and shape of the serial covariance according to the intrafunctional covariance indicated by the off-diagonal elements of . If model (8) is marginalized with respect to the random effect functions, the resulting error terms can be seen to contain the induced serial correlation structure, which is subsequently accounted for in any estimation or inference of the fixed effect functions.
Incorporation into BayesFMM framework: It can be seen that model (8) can be written as a functional mixed model with fixed effect predictors and random effect levels, each with subject-specific random effect functions. Thus, any FMM framework allowing multiple levels of random effect functions could be used to fit this model. In the BayesFMM framework, after transforming the observed functions into the basis space through as described in Section 3.1, the model for coefficient would be given by:
with and . This would induce serial covariance across in the model for each basis coefficient with cov, and the marginalized model that is fit with the integrated out will contain this serial covariance in the error structure and explicitly account for it when updating the fixed effect coefficients. This basis space model induces the serial covariance structure described in the data space model (8), with =
. The heteroscedasticity of the variance components across basis functionsallows the strength and shape of the serial covariance to vary across the scleral surface , and effectively borrows strength from nearby through the chosen basis functions as determined by the induced off-diagonal elements of . Figure 3 panel (e) plots the intrafunctional correlation structure corresponding the random intercept induced by the tensor wavelet basis chosen for the scleral strain MPS data set at the marked scleral location, and file intrafunctional_correlation.mp4 in the supplement presents the induced intrafunctional correlation for each of the across all scleral locations.
This strategy can be used with any parametric model indicated by the, preferably orthogonalized. Section 5 demonstrates that the IOP effect in the scleral strain MPS data is hyperbolic, so we devise an orthogonalized hyperbolic model for these data. Also, note that while the serial variable IOP is sampled on a common grid across subjects for our data, this strategy can allow each the grid points for the serial variable to vary across subjects.
3.3 Smooth Nonparametric Covariate Functional Effects
One of the primary scientific goals in the scleral strain MPS data is to study the effect of age on MPS and assess how it varies around the scleral surface. Preliminary investigations of the data suggest that the age effect might not follow a simple parametric form, and a nonparametric representation might be appropriate. While the fixed effect functions in the BayesFMM framework are linear in the covariates, using the mixed model representation of penalized splines shown by Wahba (1978), it is possible to fit a semiparametric functional mixed model with a smooth nonparametric age effect using this framework, as we will demonstrate in this section.
Smooth Nonparametric Age Effect for Scleral Strain MPS data: For ease of exposition, in this section we just consider a single smooth nonparametric term with no other covariates or random effects. Thus, suppose for each subject we only model a single scleral strain function , say for left eye and IOP=45mmHg, with the model:
where is a functional intercept and represents a nonparametric effect of age on MPS at scleral location , with and penalizing to induce smoothness across for each . As we will demonstrate, it is possible to represent this smooth nonparametric term as a sum of a linear fixed effect function for age and spline random effect functions,
for some suitably constructed random effect design matrix based on Demmler-Reinsch basis functions (Demmler and Reinsch, 1975), with spline random effects following a mean zero Gaussian with cov. These model components can be incorporated within a FMM framework like BayesFMM, as we now describe.
Incorporation into BayesFMM framework: To fit model (10) using the BayesFMM framework, we fit separate penalized splines for each basis coefficient , which induces correlated penalized spline fits for each scleral location . Specifically, after transforming the observed functions into the basis space according to the basis representation as described in Section 3.1, we specify the following model for each basis coefficient :
with and a smooth nonparametric function of for basis function . We pull out the intercept and constrain to ensure identifiability in additive models that contain multiple smooth nonparametric terms. We represent using B-spline basis functions,
where are B-spline coefficients and are the cubic B-spline basis functions defined by the knots such that
where , is the B-spline design matrix with the -th entry being , , and . Following Wand and Ormerod (2008), we assume the following prior distribution on the B-spline coefficients:
where is a matrix with . The resulting posterior mean of the spline random effects is , where . It can be shown that corresponds to the O’Sullivan penalized spline estimator of with penalty term (Wand and Ormerod, 2008), and if the knots are placed at each observed , then this corresponds to the cubic smoothing spline estimator.
The spectral decomposition of allows us to reformulate this prior specification as a mixed model with independent random effects as follows. It is known that . Therefore, the spectral decomposition of has the form of , where and . Let , where is a sub-matrix of corresponding to the first two columns of and is a sub-matrix of corresponding to the other columns. Let
be a two-dimensional vector, andbe an -dimensional random vector. It can be shown that with a fixed effect and .
Therefore, we have the following mixed model representation of (12),
where and . The are called the Demmler-Reinsch spline bases (Demmler and Reinsch, 1975). It can be shown that is a basis for the space of the straight line, so model can equivalently be rewritten as
where is an -dimensional vector consisting of 1’s and . We see that this is the form of a linear mixed model with a level of spline random effects with design matrix and random effects . With the intercept term pulled out as implied by the assumption, the term in (12) is given by , and thus in the BayesFMM framework we can incorporate a nonparametric term by simply including a linear fixed effect plus a level of random effects with the corresponding Demmler-Reinsch design matrix . These penalized splines for each basis , when projected back to the function space, induce a smooth nonparametric functional effect given by (11), with . Based on these derivations, we can add any additional smooth nonparametric term to the FMM framework by simply adding a linear fixed effect function and an additional level of spline random effects with a mean zero Gaussian with covariance cov
A similar procedure could be followed to utilize other spline modeling approaches within this framework, e.g. P-splines (Eilers and Marx, 1986) with differencing penalties or truncated polynomial splines (Ruppert et al., 2003), but we prefer the O’Sullivan splines (Wand and Ormerod, 2008) given their natural second derivative penalty and formal connection to smoothing splines.
Intrafunctional Correlation of Across : This framework allows the nonparametric smooth effect of to vary over , but is not the same as modeling independent splines for each . Because the splines are fit in the basis space, the nonparametric fits are correlated intrafunctionally by:
This means that the spline fit for borrows strength from other functional locations according to the effective intrafunctional covariance structure that induces smoothing across in the spline fits of .
Smoothing Parameter of , , is Nonstationary and Smooth Across : In our BayesFMM implementation of the smooth nonparametric term , we allow the penalized spline for each basis function to have its own smoothing parameter . The basis space model induces a residual error covariance matrix cov back in the data space, with diagonal elements , and a spline random effect covariance matrix cov back in the data space, with diagonal elements . Thus, the effective smoothing parameter for the induced spline fit at location is given by
meaning that the smoothness in is allowed to vary across , enabling some parts of the function to be linear with large and others to be nonlinear with small . Also, this smoothing parameter is not estimated independently for each , but the off-diagonal elements of S and imply a dependency across in , meaning that the model “borrows strength” across leading to smoothness in across .
We believe this to be the first presentation of a model with such flexibility in the literature, i.e. with varying smoothly across with the smoothing parameter in , , also varying smoothly across . The FAMM models of Scheipl et al. (2015) and Greven and Scheipl (2017a) estimate terms like that are smooth across both and , but utilize an additive penalty term involving marginal smoothing parameters in the and directions, and . This structure does not allow the type of nonstationarities enabled here, which in Section 5 of the supplement we demonstrate are necessary to accurately model the scleral strain MPS data. It may be possible in the FAMM framework to accommodate this type of flexibility by putting a spline on that varies smoothly across , but this has not been done in any published paper to date, and it is not clear whether such an approach would be computationally feasible for large functional data sets.
Degrees of Freedom Function : In the penalized spline literature with penalized spline estimator given by , a standard summary of the nonlinearity of the fit is given by the dimensionality of the projection space given by , called the degrees of freedom of the fit. A indicates a linear model and indicates significant nonlinearity. To assess how the degree of nonlinearity of the spline fit varies over , we can compute the degrees of freedom function marginally across by
with defined as in (19). In general semiparametric functional mixed models with other levels of random effects to account for interfunctional covariance according to model (21) below, as necessary for modeling our scleral strain MPS data, the derivation for is more complex, and outlined in Section 2 of the supplementary materials. Panel (c) of Figure 3 presents for the MPS data.
3.4 General Bayesian Semiparametric Functional Mixed Model
In order to model the highly structured scleral strain MPS data set, we need include all of the modeling structures described in the preceding sections, including random effects to capture nested and serial interfunctional correlation and smooth nonparametric smooth covariate effect functions, together in a common BayesFMM model. To highlight its ability to model smooth nonparametric structures as described in Section 3.3, it is useful to adapt the notation of the core BayesFMM model to explicitly include these terms. We term this version of the FMM a semiparametric functional mixed model since it includes both linear and smooth covariate effects.
Given a sample of functions , with covariates for fixed linear effects , smooth nonparametric effects , and levels of random effect covariates , we have the following semiparametric FMM:
with and being mean zero Gaussian processes with covariance surfaces and defined on .
with being the number of interior knots for the spline for , the corresponding Demmler-Reinsch design matrix, the corresponding spline random effect functions, and and modeling the interfunction covariance structure. As described above, the model would be fit in the transformed basis space. We first fit the basis space model with all random effects integrated out, and then sample the spline random effects from their complete conditional distribution while integrating out the other levels of random effects that capture any interfunctional covariance, and then project back to the function space in order to construct posterior samples of on any desired grid of .
While omitted from (21) for ease of presentation, this model can also be easily made to include any desired parametric-nonparametric interaction terms, with interaction of parametrically modeled covariate and nonparametrically modeled covariate being represented by the term . For
that are categorical dummy variables, this allows separate nonparametric fits offor different levels of the dummy variable. For continuous , this allows the corresponding slope to vary smoothly and nonparametrically with both and . For example, in our scleral strain data, one may wish to include an interaction term to allow the nonparametric age effect to vary across IOP levels. If dummy variables were specified for each IOP level, this would allow separate independent nonparametric age effects for each IOP level. If IOP is modeled continuously via a parametric model like the hyperbolic model described in Section 5, this would allow the hyperbolic coefficients to vary smoothly by age and scleral position, which would be equivalent to nonparametric age effects that vary across IOP but borrow strength from nearby IOP according to the structure induced by the hyperbolic model. In either case, the fixed effect and random spline design matrices corresponding to the would be given by and , respectively, which are straightforwardly included in the FMM. As described in Section 5, we considered these interaction structures, but found they did not appear necessary for representing the scleral strain MPS so were not included in the final model.
4 Model Selection Heuristic for Semiparametric BayesFMM
Given the extensive flexibility of the semiparametric BayesFMM framework, there are a large number of modeling decisions to make. For example, in our sclera strain MPS data set, should the age effect be linear or nonparametric? If nonparametric, should the smoothing parameter in age be allowed to vary across the scleral surface , or is a common smoothing parameter across all scleral locations sufficient? Should the fixed IOP effect be linear, hyperbolic, or nonparametric? Should there be an interaction of age and IOP? Should there be a fixed left vs. right eye effect? For the random effect levels, is the subject-specific random effect necessary to account for correlation between right and left eyes for the same subject? Is the correlation across functions from multiple IOP for the same eye sufficiently handled by a compound symmetry structure assuming equal correlations, or is a structure allowing serial correlation necessary? Should this serial correlation be based on a linear, parabolic, or hyperbolic model? These decisions are challenging to make in a simple generalized additive mixed model framework with scalar responses, and become even more challenging in the current setting with complex, high-dimensional functional responses.
There are a some papers in the frequentist literature for performing variable selection in functional regression contexts (Scheipl et al., 2013; Gertheiss, Maity and Staicu, 2013; Brockhaus et al., 2015). However, there is a lack of functional regression model selection methods for MCMC-based fully Bayesian models such as the semiparametric BayesFMM, which present special challenges. One could split the data into training and validation data sets, fit separate MCMC for each prospective model in the training data, and then compute ratios of predicted marginal densities, integrating over MCMC posterior samples, for the validation data, as done in Zhu et al. (2014)
, for example. These predictive Bayes Factors would provide a rigorous model selection measure, or alternatively parallel MCMC could be run for each prospective model, and a multinomial random variable with Dirichlet prior used to select and perform Bayesian model averaging across models as the chains progress. These strategies might work fine for simple, low dimensional data sets or settings with only a few prospective models, but for the current setting with complex, high-dimensional data, they are impractical.
In this section, we present a model selection heuristic that we have developed that can explore a number of potential model structures to find which seem to be most appropriate for the given data without running any MCMC, and also provides ML and REML estimates that can be used as starting values for the parameters in the BayesFMM. This heuristic is admittedly ad hoc, but is based on standard methods and appears to perform well in simulations, and so we believe can be a useful tool for modelers to assess which structures to included in their semiparametric FMM.
Our overall approach is to fit linear mixed models (LMM) to each basis coefficient using the lme function in R (Pinheiro et al., 2017) for each prospective model, and then use a weighted voting scheme based on importance weights for each basis and an adapted Bayesian Information Criterion (aBIC) to obtain probability scores for each prospective model. Here we outline the steps in detail.
Basis transform and importance weights: Transform the raw functions to the basis space , and compute a series of weights that measure the relative importance of each basis for representing the data set. These weights can be computed by , with . For orthogonal , the represent the relative percent energy captured by basis coefficient .
Fit basis-specific LMM and compute aBIC scores: For each prospective model , use lme in R (Pinheiro et al., 2017) to fit the corresponding LMM to the data for each basis coefficient , and compute an adapted version of the BIC (), which we define as:
where the log-likelihood is the marginal likelihood of the fixed effect and variance components of the model with the non-spline random effects integrated out conditional on the data for basis , is the total number of observations in the dataset for basis , and is the total number of parameters of model . As discussed by Vaida and Blanchard (2005) and Spiegelhalter et al. (2002), selection of the effective number of parameters for LMM or Bayesian hierarchical models is tricky and context dependent. If inference is desired on the random effects themselves, then counting only fixed effects and variance components as parameters is not appropriate. In our setting, we are not interested in the random effects at levels capturing interfunctional correlation as we work with the marginalized model, but for nonparametric terms we are clearly interested in the “random effects” corresponding to the spline coefficients. Thus, we count the number of parameters to be the sum of the number of fixed effects, the number of variance components and the estimated degrees of freedom for each nonparametric term. This last term adjusts appropriately for the extra parameters of the spline fits even thought they are captured as random effects in the LMM.
Use weighted voting scheme to rank models: We compute a probability weight for each model :
This procedure is applied in two steps: first assessing different fixed effect models (including parametric and/or nonparametric effects), and second assessing various random effect structures for capturing interfunctional variability while conditioning on the best fixed effect model.
In principle, indicates whether the model is the best in terms of for the data set on the basis. Therefore, the is computed via a weighted voting scheme, an aggregated measure of proportion of times is the best model across the all basis coefficients, with basis coefficients weighted by . In this way, the model fit for basis coefficients that account for a larger proportion of the total variability in the data count more towards the overall model selection. Empirically, we have found this weighted voting scheme seems to work well, as it is robust in the sense of not allowing any one basis function, especially one explaining a relatively low proportion of total energy for the given data set, to dominate the model selection because of an extreme score. This can also be applied using alternative measures (e.g. ). We acknowledge that this strategy is ad hoc and more rigorous model selection methods for settings like this are needed, but we believe it can provide useful guidance for modelers and performs well in simulations, as described below.
A word of caution:. This heuristic is meant for selecting among various different modeling structures for specified covariates as done for our case study, or perhaps could be used to select among a few covariates, but it is not intended for high-dimensional variable selection across many potential predictors
. In such settings, consideration of a large number of models and only fitting the “best” one can dramatically inflate type I error rates, and post-selection inference as described inBerk et al. (2013) would need to be considered.
Simulation Study on Model Selection: We conducted a simple simulation study to investigate the performance of this model selection heuristic. We considered four different models:
Model 1 (null model): ,
Model 2 (linear age effect): ,
Model 3 (nonparametric age effect): , and
Model 4 (linear age effect, random effect): .
We fit each of these models to the scleral strain data and used the fitted model as the truth, and simulated 100 replicate data sets for each model. For each simulated data set, we fit each of these four models and performed the model selection procedure using to select the best model. In all four scenarios, this procedure selected the correct model 100/100 times. Average values of for each model can be found in Section 4 of the supplementary materials, and Section 6 of the supplement investigates issues that can arise in variable selection of GAMMs when considering nonparametric smooth terms of subject-specific covariates in models including subject-level random effects.
5 Glaucoma Scleral Strain MPS Case Study
5.1 Overview of Glaucoma Scleral Strain MPS Data
As described in Section 1, glaucoma is characterized by ONH damage related to IOP but its etiology is not fully known. Researchers have hypothesized that biomechanics of the scleral region close to the ONH may modulate the effect of IOP on the ONH, and thus may play an important role in glaucoma. In particular, the scleral surface is elastic so deforms under pressure, which can partially relieve IOP-induced forces on the eye, including the ONH. Thus, studies of these properties could reveal insights into the etiology of glaucoma.
Recently, novel custom instrumentation was developed that can precisely measure the mechanical strain in the posterior human sclera at a fixed level of IOP (Fazio, Bruno, Reynaud, Poggialini and Downs, 2012; Fazio, Grytz, Bruno, Girard, Gardiner, Girkin and Downs, 2012). Briefly, the posterior 1/3 of the eye is clamped, sealed, and pressurized. Next, the eye is preconditioned, and then pressurized from 7 mmHg to 45 mmHg using an automated system with computer feedback control, while scleral surface displacements are measured by a laser speckle interferometer. This device measures a light interference distribution that is used to reconstruct the surface displacement field in three dimensions with nanometer-scale precision. These displacements were processed as described in Fazio, Bruno, Reynaud, Poggialini and Downs (2012) to compute the 3D strain tensor, a matrix summarizing the displacement in the meridional, circumferential, and radial directions, continuously around the outer scleral surface. The leading eigenvalue of the strain tensor, called the maximum principal strain (MPS), was computed on a grid of scleral locations for 120 circumferential locations and 120 meridional locations , where corresponds to the ONH. This yields MPS functions defined on a grid of 14,400 points on the scleral surface that comprises a partial spherical domain.
Using this custom instrumentation, Fazio et al. (2014) conducted a study to investigate age-related changes in the scleral surface strain. They obtained twenty pairs of eyes from normal human donors in the Lions Eye Bank of Oregon in Portland, OR and the Alabama Eye Bank in Birmingham, AL. For each subject, the MPS measurements were obtained as described above at nine different levels of IOP (7, 10, 15, 20, 25, 30, 35, 40, and 45 mmHg) for both left and right eyes. The data for both eyes from one subject failed a quality control check, so was excluded from analysis, as did one of the eyes from four other subjects. Thus, the data we analyzed consisted of 34 eyes from 19 subjects. With 14,400 measurements for each of 9 IOP levels 34 eyes, this data set contained over 4.5 million measurements. Let be the MPS for eye for subject under IOP level at scleral location indexed by , which on the sampling grid can be written as a vector of length . The primary goals are to study MPS, assessing how it varies around the scleral surface, across IOP, and with age. The hypothesis is that MPS is greater near the ONH, which could confer a protective effect, and that MPS tends to decrease with age, which could contribute to increased stress on the ONH thus conferring increased glaucoma risk.
5.2 Model Specification
Basis Transform: Various criteria can be considered when choosing which basis to use within the BayesFMM framework, including sparse representation, fast calculation, richness for representing the functional parameters at the various levels of the models, ability to capture the key visual features of the observed functions, and flexibility for representing the intrafunctional correlation in the data. Multiresolution bases like wavelets have advantages for many of these considerations, so we constructed a custom rectangular wavelet basis defined on the cylindrical spherical projection of the partial scleral space , which is a tensor transform computed by successively applying 1D wavelet transforms to the meridional and circumferential directions.
Tensor Wavelets for Scleral Space:. Specifically, we constructed as a tensor wavelet, , with meriodonal wavelet
being a db3 wavelet basis with three vanishing moments, reflection boundary condition, 5 levels of decomposition and circumferential waveletbeing a db3 wavelet with three vanishing moments, 5 levels of decomposition and periodic boundary conditions since its domain is circular, covering the entire circumferential space. This transform yielded a basis . While single-indexed here for simplicity of presentation, these basis coefficients can be written as multi-indexed by circumferential scale , meriodonal scale , circumferential locations and meriodonal locations with . The levels and correspond to the father wavelet coefficients at the lowest level of decomposition, and the other and index the corresponding mother wavelets at increasing levels of scale. With being the observed function for subject , eye , and IOP on the scleral surface sampling grid of size written in vector form, this basis representation can be written as , where is a basis matrix with elements and is a vector of cor