Gut motility describes the coordinated pattern of contractions and relaxations of the longitudinal and circular smooth muscle layers of the gut wall. This motility plays a central role in normal digestive health, mixing and propelling gut content in a controlled fashion. Motility disorders in children and adults cause significant personal, societal and financial burdens, costing healthcare systems many billions of dollars per year. In order to improve treatment options, we need to gain insight into the changes in gut motility associated with a disorder. For example, in the human colon, large propulsive propagating contractions have been shown to be absent or diminished in patients with severe constipation. In all regions of the gut, muscle contractions can be detected using intraluminal manometry. This procedure involves placement, through the nose or anus, of a flexible, small-diameter catheter containing multiple sensors. In recent years the number of sensors on these catheters has increased dramatically. For colonic manometry recordings, these high-resolution manometry catheters have allowed far greater insight into the nature of normal and abnormal contractile motor patterns. Analysis of these data is difficult. Manometric recordings consist of peaks of pressure which reflect contractile activity of the intestinal muscle. While visual analysis can easily identify patterns of motor activity, the process is very time consuming, unstandardised, and subject to personal bias, and in some instances the sheer volume of pressure peaks, occurring at multiple frequencies across many sensors, can be overwhelming.
An automated approach to overcome these issues is needed. In this paper, a two-stage process to achieve this is presented. The first stage involves computing a frequency spectrum of the spatiotemporal pressure recordings with the wavelet transform. The second stage compares the spectra via a functional mixed-effects model to reveal where differences over frequencies may lie in response to various known predictors (e.g. response to a meal, age, colonic region). Although we focus on colonic manometry, the two stages are applicable to other kinds of recorded phenomena of a similar quasiperiodic nature. The second stage is also widely applicable to data where each observation can be represented as a function over a shared gridded domain.
The continuous wavelet transform converts a signal from the time domain to the time-frequency domain (strictly, the transformation is to the time-scale domain, from which an appropriate conversion from scale to frequency can be chosen), such that for any point in time we get a distribution of frequencies (wavelet coefficients) occurring in the signal at that time (Fig 1b).
The wavelet transform has been applied to the analysis of acquired data in many disciplines, from the recording of pressure signals in the human rectum to the analysis of black hole gravitational waves. Generally applicable expositions of the wavelet transform can be found in the geophysical sciences [8, 9, 10, 11, 12]. In this paper, we focus on the time-averaged wavelet coefficients, including 2D histograms over frequencies and phase-differences between pairs of signals, to obtain an estimate of the propagation of quasiperiodic pressure signals. This is similar to work done on the solar atmosphere using the FFT and wavelets.
For statistical analysis, we use function-on-scalar rather than the more conventional scalar-on-scalar regression. Scalar-on-scalar regression covers commonly encountered models such as the two-sample t-test, ANOVA, and the generalised linear model. Observations in this framework consist of a scalar (single number) response and a set of scalar predictors encoding the experimental conditions (e.g. response to stimulus) and individual characteristics of each subject being recorded (e.g. age or sex). Function-on-scalar regression extends this so that responses are not scalars but functions, such as frequency spectra. See  for a review of functional regression.
There are many previous publications on functional data analysis, with one of the first textbooks on the topic published in 1997. Functions in mixed-effects models are often represented by sums of basis functions, such as splines [17, 18, 19], sines and cosines [20, 21], or discrete wavelet bases [22, 23, 24, 25, 26]. Non-parametric forms such as Gaussian processes (GPs) are more expressive but computationally expensive. GPs have seen slower adoption in functional mixed-effects models, appearing in simpler designs such as a single function, fixed-effects models, or simple hierarchical models.
When considering a single-function model, authors have explored more complex GP structures such as heteroscedasticity and other non-stationarities. Goldberg et al.  and Titsias and Lázaro-Gredilla presented GP models with a GP-based input-dependent noise variance. Tolvanen et al.  presented a GP with a GP-based input-dependent signal and noise variance. They noted that input-dependent lengthscale and signal variance are underidentifiable, and opted for an input-dependent signal variance. Despite this underidentifiability, Heinonen et al.  successfully fit a GP with a squared-exponential kernel function with GP-based input-dependent signal variance, noise variance, and lengthscale.
Here we present a heteroscedastic Gaussian process mixed-effects model with Gaussian process residuals over a common grid. A substantial performance improvement is achieved on a 2D grid by exploiting separable kernels via Kronecker factorisation, inspired by . The model is conveniently written in the Stan probabilistic programming language.
We also present a novel amplitude-weighted coherence summary of the cross-wavelet transform useful for focusing on evident activity by effectively down-weighting periods of quiescence, when little to no contractile activity is present.
This paper is organised as follows. The wavelet transform is summarised in section II, including the calculation of the global wavelet power spectrum used for inference. Section III describes how the wavelet spectra from two signals can be combined to calculate the wavelet cross-spectrum and the associated coherence, also outlining the calculation of histograms over phase-differences from the power cross-spectra and coherence. Section IV presents a method for statistically modelling the wavelet results as a heteroscedastic functional mixed-effects model on the basis of Gaussian processes, with inference conducted by Hamiltonian Monte Carlo (HMC) using the Stan software ecosystem. We apply our method to signals recorded with colonic manometry in 11 healthy volunteers, presenting the results in section V, and conclude with a discussion in section VI.
II Wavelet Transform
where ψ is an admissible wavelet function, and the superscript ∗ represents the complex conjugate. An admissible wavelet function is one which has zero mean and whose Fourier transform is continuously differentiable, with the extra desirable property that it be localised in both time and frequency. Intuitively, the wavelet coefficient measures the variation of the signal within a neighbourhood of the time point, of size proportional to the scale.
In practice, we choose from a finite set of logarithmically-spaced scales, specify the wavelet basis function in the frequency domain, and perform the convolution in (1) via the FFT, utilising the convolution theorem, with
where ψ̂ is the frequency-domain wavelet function, x̂ is the frequency-domain signal, F and F⁻¹ are the Fourier and inverse Fourier transforms, and ω represents the frequency-domain locations in radians per second.
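As an illustrative sketch (not the authors' implementation), the FFT-based convolution can be written with a frequency-domain Morlet wavelet. The normalisation and the intrinsic frequency `w0 = 6` (which gives roughly 4 oscillations within ~95% of the Gaussian envelope, matching the description in section III) are assumptions in the style of Torrence and Compo:

```python
import numpy as np

def morlet_cwt(x, fs, scales, w0=6.0):
    """Continuous wavelet transform via FFT convolution.

    x      : 1D real signal
    fs     : sampling frequency (Hz)
    scales : array of wavelet scales (seconds)
    w0     : Morlet intrinsic frequency (radians); w0 = 6 gives ~4
             oscillations within ~95% of the Gaussian envelope
    Returns a (len(scales), len(x)) complex array of coefficients.
    """
    n = len(x)
    xf = np.fft.fft(x)
    # angular frequencies (rad/s) matching numpy's FFT conventions
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)
    W = np.empty((len(scales), n), dtype=complex)
    for i, s in enumerate(scales):
        # frequency-domain Morlet, nonzero for positive frequencies only
        psi_hat = (np.pi ** -0.25) * np.exp(-0.5 * (s * omega - w0) ** 2)
        psi_hat *= (omega > 0) * np.sqrt(2 * np.pi * s * fs)  # assumed normalisation
        W[i] = np.fft.ifft(xf * psi_hat)
    return W
```

A pure sinusoid then produces peak power at the scale corresponding to its frequency (approximately s = w0/(2πf) for the Morlet).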
To map from scales (seconds) to frequencies (Hz) we use “Synchrosqueezing” . Synchrosqueezing redistributes the wavelet coefficients based on the first time-derivative of the phase (also known as “instantaneous frequency”). For a given set of equally-and-logarithmically spaced positive frequency bins with ordered edges , where is the set of half-open intervals characterising each bin’s domain, synchrosqueezing can be described as
where represents the time-differentiable “unwrapped-in-time” phase in radians with the complex argument (or angle) denoted by the parentheses-less function with domain-codomain . The function returns the interval from that contains , and is the indicator function. The equally-and-logarithmically spaced condition on the bins can be satisfied with a constant , where for all . An example is shown in Fig. 1b.
Let be an element of . Since and the equality clearly holds, then we only need to consider a single arbitrary to represent bin . We will use the logarithmic-centre of each bin, and define the set of bin centres as
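The equally-and-logarithmically spaced bins and their logarithmic (geometric-mean) centres can be constructed with a short helper; the specific frequency range and bin count below are illustrative:

```python
import numpy as np

def log_freq_bins(f_min, f_max, n_bins):
    """Equally-and-logarithmically spaced frequency bins.

    Returns (edges, centres): n_bins+1 ordered edges with a constant
    ratio between consecutive edges, and the logarithmic centre
    (geometric mean of the edges) of each bin.
    """
    edges = np.geomspace(f_min, f_max, n_bins + 1)
    centres = np.sqrt(edges[:-1] * edges[1:])  # geometric mean = log-centre
    return edges, centres
```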
Switching to a discrete-time representation with samples recorded at discrete times, we can view the wavelet spectrum as an array over frequency bins and time samples. The time-average of the squared amplitudes produces the global wavelet power spectrum
An example is shown in Fig. 1c.
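As a minimal sketch (array shapes assumed, not the authors' code), the global wavelet power spectrum is simply the time-average of the squared coefficient amplitudes:

```python
import numpy as np

def global_power(T):
    """Global wavelet power spectrum: time-average of squared amplitudes.

    T : (n_freq, n_time) complex array of (synchrosqueezed) coefficients.
    Returns a length-n_freq real array.
    """
    return (np.abs(T) ** 2).mean(axis=1)
```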
III Cross-Wavelet Transform
The cross-wavelet transform combines two wavelet spectra with the complex-conjugated product
where the two factors are the synchrosqueezed wavelet transforms of the two signals, and the combined subscript denotes the cross-wavelet transform between them.
A global wavelet power cross-spectrum could be computed in the same way as the global wavelet power spectrum in equation (8). However, this discards the useful phase information contained in the cross-wavelet transform. The effect of the complex-conjugated product is that the resulting phase represents the difference in phase between the two signals. For each frequency, computing a squared-amplitude-weighted histogram of the phase-differences yields a 2D histogram in the frequency-phase domain, analogous to the global wavelet power spectrum but stratified by phase-differences.
Given a set of equally-spaced bins with ordered edges representing phase-differences, where is the set of half-open intervals characterising each bin’s domain, we define the 2D histogram of frequencies and phase-differences by
where the binning function returns the interval that contains a given phase-difference, and the sum runs over all time samples whose phase-difference falls in the corresponding bin. The first and last edges are chosen to ensure all phases are covered. As with the frequency bins, a single arbitrary representative suffices for each bin; we will use the centre of each bin, defining the set of bin centres as
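The squared-amplitude-weighted histogram over phase-differences can be sketched as follows; the bin count and the use of `numpy.histogram` are illustrative choices, not the authors' code:

```python
import numpy as np

def phase_histogram(Txy, n_phase_bins=16):
    """Squared-amplitude-weighted 2D histogram over frequency and
    phase-difference, from a cross-wavelet array Txy of shape
    (n_freq, n_time). Phase bins are equally spaced over [-pi, pi].
    Returns (hist, phase_centres), hist of shape (n_freq, n_phase_bins).
    """
    edges = np.linspace(-np.pi, np.pi, n_phase_bins + 1)
    centres = (edges[:-1] + edges[1:]) / 2
    phases = np.angle(Txy)          # phase-differences in (-pi, pi]
    weights = np.abs(Txy) ** 2      # squared amplitudes
    hist = np.empty((Txy.shape[0], n_phase_bins))
    for k in range(Txy.shape[0]):
        hist[k], _ = np.histogram(phases[k], bins=edges, weights=weights[k])
        hist[k] /= Txy.shape[1]     # time-average, as in the global spectrum
    return hist, centres
```

Summing the histogram over phase bins recovers the global power cross-spectrum at each frequency.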
If pairs of sensors are spaced sufficiently close together in the environment being recorded, then the cross-wavelet transform between sensors in such a pair allows us to measure propagating quasiperiodic activity. The sign of the phase-difference determines the direction of propagation. An example is shown in Fig. 1d, where the dark feature in the upper half of the panel and slightly to the left of the dotted centre line indicates a strong tendency of the pressure waves in Fig. 1a to move bottom-to-top, i.e. in an oral direction. The value of the phase-difference (rad) at the frequency of interest (Hz) and the separation between the pair of sensors (cm) can be used to determine the apparent velocity of propagation (cm/s) with the simple formula
For quasiperiodic pressure signals in these data, a more appropriate measure of propagation may be “pace”, the inverse velocity (s/cm), where synchronous events (or phase-locking) between the two signals have a more robust-for-modelling pace of 0, rather than a velocity that diverges to infinity.
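Since the velocity formula itself is not reproduced here, the sketch below assumes the standard relation between phase lag, frequency, and sensor separation (time lag = phase / (2πf)); it also illustrates why pace is better behaved than velocity for synchronous events:

```python
import numpy as np

def pace(phase_diff, freq_hz, separation_cm):
    """Inverse velocity (s/cm) from a phase-difference (rad) at a given
    frequency (Hz), for a sensor pair separated by `separation_cm`.

    Assumed relation: time lag = phase / (2*pi*f), so
    pace = phase / (2*pi*f*d). Phase-locked (synchronous) activity
    gives a pace of 0 rather than an infinite velocity.
    """
    return phase_diff / (2 * np.pi * freq_hz * separation_cm)
```

For example, a half-cycle lag (π rad) at 3 cpm (0.05 Hz) across a 1 cm separation corresponds to a pace of 10 s/cm.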
If the distance between pairs of sensors is too large in relation to the propagation speed and frequency of the phenomena being recorded, then aliasing occurs whenever events are out of step with each other by more than half a cycle. This causes the phase-difference to misrepresent the true direction of propagation and its apparent velocity. In this case coherence can be used as a measure of coordination between pairs of sensors, even if we are forfeiting a measure of velocity. Coherence is a value between 0 and 1 that measures the coefficient-of-determination between two time series as a function of time and frequency .
To calculate the coherence, the method in  divides the power cross-spectrum by each of the individual power spectra. However, this ratio is always identical to 1, and so  suggest smoothing the cross-spectrum and the individual power spectra in time and scale before calculating the coherence. Since we are using the synchrosqueezed coefficients, we smooth only in time and not scale, because the frequency-based redistribution of coefficients makes smoothing in scale no longer relevant. Here, we measure coherence as
where the angle brackets denote smoothing-in-time, allowing for a frequency-dependent smoothing width. The smoothing width is somewhat arbitrary, but one can still tailor it to the wavelet function and perhaps the phenomena being recorded. For the examples presented in this paper we use Gaussian smoothing with a width parameter chosen to match the Morlet wavelet function's intrinsic frequency, which corresponds to approximately 4 oscillations occurring within about 95% of the wavelet's Gaussian envelope.
It is clear that (15) is no greater than 1 due to Jensen's inequality, and no less than 0 due to the squared modulus. A shorter smoothing width will tend to produce higher coherence, since the region over which the signal influences the coherence measurement is shorter. In the limit of no smoothing, substituting (9) into (15) yields 1. In the limit of total smoothing, or averaging over the entire time range, and assuming the signals are zero-mean random processes, (15) will tend to 0.
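A sketch of the time-smoothed coherence of equation (15), assuming a Gaussian smoothing window whose width spans roughly four oscillation periods at each frequency (mirroring the Morlet envelope described above); the implementation details here are ours, not the authors':

```python
import numpy as np

def _gauss_smooth(a, sigma):
    """Gaussian smoothing in time via direct convolution (mode 'same')."""
    m = max(int(4 * sigma), 1)
    t = np.arange(-m, m + 1)
    w = np.exp(-0.5 * (t / sigma) ** 2)
    w /= w.sum()
    return np.convolve(a, w, mode="same")

def wavelet_coherence(Tx, Ty, freqs, fs, n_cycles=4.0):
    """Time-smoothed wavelet coherence between synchrosqueezed spectra
    Tx, Ty of shape (n_freq, n_time). Smoothing is in time only, with a
    Gaussian whose ~95% mass (4 sigma) spans `n_cycles` oscillation
    periods at each frequency. Returns values in [0, 1].
    """
    C = np.empty(Tx.shape)
    for k, f in enumerate(freqs):
        sigma = n_cycles * fs / (4.0 * f)  # width in samples
        num = np.abs(_gauss_smooth(Tx[k] * np.conj(Ty[k]), sigma)) ** 2
        den = (_gauss_smooth(np.abs(Tx[k]) ** 2, sigma)
               * _gauss_smooth(np.abs(Ty[k]) ** 2, sigma))
        C[k] = num / np.maximum(den, 1e-300)
    return C
```

Because the smoothing weights are nonnegative and shared between numerator and denominator, the Cauchy-Schwarz inequality keeps the result in [0, 1]; a signal paired with itself gives coherence 1 everywhere.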
To obtain a global measure of coherence, we use the logit-transformed squared-amplitude-weighted average over time
where calculates the -transformed average of weighted by , over the domain . A centre dot in a function denotes the variable of a curried function, that is, for some function we have . Utilising , equation (10) could be written
where is the identity function.
Similarly to in equation (10) we can calculate the histogram of over phases, after a logit transformation and weighting by the power cross-spectrum , with
which is only meaningful in the absence of aliasing where sensors are sufficiently close together, such that the phase domain correctly reflects the direction of propagation.
When numerically calculating the coherence, we find that some values may be equal to 1 or 0, which map to ±∞ under the logit transformation, contaminating the sum in (17) for all other locations. To circumvent this problem we clip the coherence to a range whose maximum is practically equivalent to 1 and whose minimum is practically equivalent to 0. For the examples presented in this paper we clipped to a range of .
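A minimal sketch of the clip-then-logit step; the clipping range used here is illustrative, since the paper's exact range is not reproduced:

```python
import numpy as np

def logit_clipped(c, eps=1e-4):
    """Logit transform of coherence values clipped to [eps, 1-eps] so
    that numerical 0s and 1s do not map to +/- infinity.
    The eps value is an illustrative choice."""
    c = np.clip(c, eps, 1.0 - eps)
    return np.log(c / (1.0 - c))
```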
For each unit of statistical data, we obtain from the wavelet analysis a 1D curve or , or a 2D surface or . Such a curve or surface is considered to be a response under the influence of a set of predictors which can be any number of categorical or numerical variables specified by the experimental design. We want to measure and compare the effects of the given predictors.
An independent regression model could be fit for each location in either the frequency or frequency-phase domains. However, performing an independent fit at each location would require a multiple-comparison adjustment, and would fail to account for correlations between locations, effectively weakening the power of the analysis.
Instead, we capture correlations between locations by treating the response curves and surfaces as individual functions rather than simply collections of independent points. We model these functions as samples from Gaussian processes, which allow us to specify a formula for correlation between locations, without needing to specify a formula for the shape of the functions themselves.
A Gaussian process (GP) is a probability distribution over an infinite number of random variables, such that any finite set of the variables forms a multivariate Gaussian distribution. This is achieved by specifying a covariance kernel function, which, when given a finite set of locations, allows us to build a covariance matrix with elements given by the kernel evaluated at each pair of locations. We have only finite data, and so the kernel function is evaluated only at the available data locations when fitting the GP. However, we can inspect the GP at any number of arbitrary locations in the kernel's domain, hence the infinite nature of the model as a step beyond a multivariate Gaussian. An analogy is fitting a simple regression line: the line is fit only to a finite set of data, but once we have an intercept and slope we can define a function where y-locations can be calculated for any arbitrary set of x-locations, not just those for which we have data. See  for a textbook introduction to GPs.
The latent GP function-on-scalar mixed-effects model we use can be written in the form
where represents the Gaussian process distribution, is a kernel function describing the structured -standardized noise covariance, represents unstructured -standardized noise variance, and is the response function for observation . The responses are based on the transformed amplitude, , or coherence, . The intuition behind the -standardized noise co/variance can be seen by rearranging the terms in (19) and (20) to
which facilitates efficient inference by not requiring the structured residuals on the right-hand-side of (23) to be sampled, nor requiring a matrix inversion per observation. When evaluating the likelihood specified by (23), for the 1D case a simple Cholesky decomposition is sufficient, but for the 2D case an eigendecomposition is needed to separate the kernel functions from the unstructured noise (see https://github.com/lwiklendt/gp_kron_stan for a Stan model source code example).
In the mean specified by (21), is a design matrix of population-level predictors (a.k.a. fixed-effects) with
representing the row vector of predictors pertaining to each observation. is a vector of iid latent GPs representing the population-level effects. is a design matrix of group-level predictors (a.k.a. random-effects). is a vector of potentially correlated latent GPs representing the group-level effects. Depending on the experimental design, an optional offset term is included in (21), which may be either set to the mean of all responses as a way of centring the data, inferred to include a measure of variability in the centring, or given a different value per observation if some measure of exposure needs to be incorporated that would not otherwise fit as its own predictor.
Analogous to the predictors and for the mean, the matrices and
are respectively the population-level and group-level predictors for the log standard deviation (22), with corresponding effects and . An explicit offset term is missing here since such an offset is implicitly handled by the scale of .
Each GP function in each vector of population-effects is given an iid prior
However, for the vectors of group-effects functions we include correlations between functions via multivariate or multi-output GPs
where and are covariance matrices dependent on the structure of the and design matrices. These matrices will generally be block-sparse, facilitating efficient computation.
The response functions , and the design matrices , , , and are the supplied “input” data. The vectors of functions , , , , and hyperparameters, are to be estimated and correspond to “outputs” of the inference. The structure of the design matrices depends on the experimental design, and we find it easiest to derive the design matrices (also known as “model matrices”) based on formula notation as specified in section 2 of . We provide an example application in section V using the formulae (37) and (38).
We are interested in modelling amplitudes or coherences that were calculated using the wavelet transform as described in sections II and III. To fit amplitudes or coherences over frequencies, is a scalar that represents frequencies. To fit over frequencies and phase-differences, is a 2D point that represents frequencies in one dimension and phase-differences in the other.
IV-B Kernel Functions
The form and parameters of the kernel functions depend on whether the response functions are 1D or 2D. There are many potential kernels to choose from, and they can even be built up from smaller kernels , but for the sake of brevity we will limit our exposition to one concrete kernel function for each type of domain. For the case of 1D curves over frequencies we use a log-space squared-exponential kernel
with the lengthscale parameter specifying the correlation based on the distance between any two frequencies. At a distance of 0 we have equal frequencies, where the correlation is 1 and the covariance is maximal. As the distance approaches infinity, the correlation and covariance approach 0. For the case of 2D surfaces over frequencies and phase-differences we use a product of the log-space squared-exponential kernel and a periodic kernel
where the two lengthscales specify the log-frequency and phase-difference correlations. When the difference between two phase-differences is 0 or any integer multiple of 2π, the phase-difference component of the kernel will be 1, identifying the two locations.
For equations (24) and (25), represents the kernel function used in constructing a covariance matrix, but for equations (26) and (27) is a kernel function used in constructing a correlation matrix by setting , since including a free parameter for variance in would make the model non-identifiable due to the variance parameters already defined in and .
The kernel in (29) is separable, such that we can write it as
where with abuse of notation we are identifying kernel functions based on their argument symbols, such that and are different functions, with defined in (28) and defined in (31). Since we can factorise the 2D kernel (30), we can create a covariance matrix using the Kronecker product of the individual covariance matrices built from kernels (28) and (31)
The Kronecker factorisation of the kernel matrices also allows for a substantial speed up in the numerical calculation of the Cholesky and eigen decompositions of the covariance matrices  used in inference.
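The factorisation can be illustrated numerically: building the frequency and phase kernel matrices separately, eigendecomposing each small factor, and combining via Kronecker products recovers the decomposition of the full 2D covariance without ever decomposing the large matrix. The grid sizes, lengthscales, and exact kernel forms below are illustrative assumptions:

```python
import numpy as np

def se_log_kernel(f, ell):
    """Squared-exponential correlation kernel in log-frequency space."""
    d = np.log(f)[:, None] - np.log(f)[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def periodic_kernel(p, ell):
    """Periodic correlation kernel over phase-differences, period 2*pi."""
    d = p[:, None] - p[None, :]
    return np.exp(-2.0 * (np.sin(d / 2.0) / ell) ** 2)

# Separable 2D covariance: K = Kf (x) Kp, so its eigendecomposition is
# the Kronecker product of the factors' eigendecompositions.
f = np.geomspace(0.5, 16.0, 15)
p = np.linspace(-np.pi, np.pi, 16, endpoint=False)
Kf = se_log_kernel(f, 0.5)
Kp = periodic_kernel(p, 0.7)
wf, Vf = np.linalg.eigh(Kf)          # 15x15 decomposition
wp, Vp = np.linalg.eigh(Kp)          # 16x16 decomposition
K = np.kron(Kf, Kp)                  # full 240x240 covariance
w = np.kron(wf, wp)                  # eigenvalues of K
V = np.kron(Vf, Vp)                  # eigenvectors of K
```

Decomposing the 15×15 and 16×16 factors is far cheaper than decomposing the 240×240 product directly, which is the source of the speed-up mentioned above.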
Prior distributions for hyperparameters are experiment dependent, and will in general depend on the scale of the data. For the application presented in section V, each subscript (omitted for brevity) is treated independently, unless otherwise specified. We used  while ensuring , for the correlation between  and  we used , and for  a half-Student-t distribution with 3 degrees of freedom.
A coarse grid was chosen for the functional domain so that posterior sampling could complete within a reasonable time. The grid can be refined relatively quickly after the expensive sampling step. Rather than the naïve linear or cubic interpolation, we can use GP prediction such that the covariance between locations is faithfully preserved in the refinement.
Given a vector of grid coordinates , a vector of refined coordinates , a vector of function values corresponding to , and the kernel function , then we can produce a vector of refined function values at with
where is the covariance matrix obtained by applying to the coordinates in , and is the matrix given by the covariances obtained by applying to and .
Note, for the 2D case we can take advantage of
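A sketch of GP-based grid refinement via the posterior mean, with an assumed squared-exponential kernel for illustration (for noise-free function values this reduces to exact interpolation at the coarse grid points):

```python
import numpy as np

def se_kernel(a, b, ell=0.2, sigma2=1.0):
    """Illustrative squared-exponential kernel (assumed, not the fitted one)."""
    d = a[:, None] - b[None, :]
    return sigma2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_refine(x, y, x_star, kernel, jitter=1e-8):
    """Posterior-mean GP interpolation of function values y at coarse
    grid x onto a refined grid x_star: f* = K(x*, x) K(x, x)^{-1} y.
    `kernel(a, b)` returns the covariance matrix between coordinate sets;
    `jitter` stabilises the solve."""
    K = kernel(x, x) + jitter * np.eye(len(x))
    K_star = kernel(x_star, x)
    return K_star @ np.linalg.solve(K, y)
```

Unlike linear or cubic interpolation, the refined values inherit the smoothness implied by the kernel; in the 2D case the cross-covariance matrices factorise over the two axes in the same Kronecker fashion as the covariance itself.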
We use the Hamiltonian Monte Carlo sampler from the Stan package to obtain a posterior distribution of GPs which can be inspected to detect where and how locations may differ between various categorical predictors such as anatomical regions, treatment periods, patient types, and between various numerical predictors such as age, weight, caloric content of a meal, etc.
We applied the method to colonic manometry data recorded from the descending and sigmoid colon of 11 healthy volunteers during 1hr preprandial and postprandial periods. Data from these studies have been published previously. Using formula notation
predictor is a categorical variable with two levels, one for each of the recorded regions of the colon: descending and sigmoid. The predictor is a binary variable indicating whether a recording was obtained during the preprandial (“0”) or postprandial (“1”) state, corresponding to a meal effect. The categorical variable identifies the individual subject of a recording.
The predictor in (38) is a real-valued standardised count of the number of sensors (or channels) in the recording, which varies per subject per region. When computing weighted averages over time as specified in sections II and III, we average not only over time but over both time and channels, by effectively flattening the wavelet results into a single channel whose length is the number of channels times the number of time samples. Fewer channels are expected to result in greater variation in the global averages, which is why we included the channel count as a confounding factor of the signal variance. We chose formula (38) since we do not have repeated measurements, and so within-subject variation is poorly identified.
Four types of responses were analysed, given by the 1D and 2D amplitudes from equations (8) and (10), and the 1D and 2D coherences from equations (16) and (18). The amplitudes were log-transformed and coherences logit-transformed to obtain the ’s in model (19-22). For the 1D responses 29 frequency-bins were used, and for the 2D responses 15 frequency-bins and 16 phase-bins were used. After sampling from the posterior, 200 frequency bins and 201 phase bins were used in refinement via GP interpolation (35), with results plotted in Fig. 2 for 1D responses and Fig. 3 for 2D responses.
For each response type, the Hamiltonian Monte Carlo run consisted of 200 warm-up iterations and 500 sampling iterations over 4 randomly-initialised chains, resulting in 2000 samples from the posterior distribution. We used an adapt-delta of 0.95. Diagnostics showed no divergent transitions and a top tree depth of 9, and visual inspection of trace plots showed good convergence, validated by the R̂ convergence diagnostic in Stan (not to be confused with our coherence measure from equations (16) and (18)). Data appeared consistent with the posterior predictive distribution. On a 2013 MacBook Pro with a 2.7GHz Core i7 running macOS Mojave with 16GB 1600MHz RAM, using PyStan (Stan Development Team, 2018, PyStan: the Python interface to Stan, Version 2.18, http://mc-stan.org) with 4 parallel CPU cores (1 per chain), each of the 1D response types completed sampling in minutes, and each of the 2D response types completed in hours. The computation from raw pressure recording to time-averaged wavelet spectra (Fig. 1) for each of the 42 individual observations took seconds.
Due to the log and logit transforms of the responses, the effects are logarithmically additive and are represented as ratios. We plot all 2000 posterior samples as overlaid thin white lines, with 95% “credible intervals” outlined by white dotted lines. A substantial meal effect is evident in the amplitude responses (i), where the posterior curves in cells (c) and (f) are clearly above a ratio of 1, with the greatest effect around the 3-4 cycles-per-minute (cpm) frequency. In (ii), although there is no such clear meal effect, we can see that in the sigmoid region the meal effect (f) around 3-4 cpm is increasing compared to the decreasing effect adjacent at around 1 cpm and above 8 cpm. We also notice the difference in meal effect between the descending and sigmoid regions (j) as an increase in coherence between 3-10 cpm. An uptick near 16 cpm in (a, b, d, e) is likely due to respiration artefact.
We have presented a method for analysing spatiotemporal manometry data by computing various time-averaged spectra and using them as responses in a functional mixed-effects model, inferred via HMC. This has allowed us to identify and contrast features in the spectra between various predictors of each spectrum.
Although the statistical model presented was developed to analyse quasiperiodic spatiotemporal manometry data, the model can be applied (with an appropriate choice of kernel function) to practically any kind of functional data that can be represented on a shared grid.
For the statistical analysis, we are not limited to testing for differences across predictors within the same locations. An example of this is already shown in Fig. 3, where the black-and-white hatched lines encircle regions where differences are inspected not between predictors but between phase locations within the same set of predictors. Since HMC produces samples from a distribution over all parameters given the data, we are free to interrogate this distribution by applying any well-posed question of interest to the samples. (Well-posed questions are in the form of an expectation over a function applied to the random variables drawn from the posterior distribution, answered by averaging that function over the posterior samples. This is not as strong a limitation as may first appear. For example, the function could act over a subset of parameters, allowing us to obtain the peak amplitude or coherence as a distribution over frequencies, and to potentially test how this peak frequency depends on predictors.)
The method presented in  was a first step in automated analysis of colonic propagation, but reduced the analysis to a single indicator that could discern neither speed of propagation nor frequency of activity. The results herein are confirmed by our previous manual analysis in . In that publication we demonstrated a significant increase in the postprandial colonic cyclic motor pattern. That motor pattern consisted primarily of propagating pressure waves at a 3-4 cpm frequency that travelled mostly in a retrograde direction. We also showed that the majority (76%) of the cyclic motor patterns were identified in the sigmoid colon. All of these physiological features are captured by the technique presented in this paper. The manual analysis in our previous paper  took several weeks to perform; obtaining similar results with the analysis developed here took under 7 hours. Thus our technique goes well beyond currently available methods in detail, speed of analysis, and manual labour saved. A postprandial increase in colonic activity was also shown in  using the FFT and in  using wavelets, but without the rigorous statistical analysis the current paper provides.
Caveats and Limitations
When applying synchrosqueezing, we sometimes notice frequency-edge artefacts for activity close to the highest or lowest frequencies. This may be due to an insufficient source of instantaneous frequency to be redistributed near the edge frequencies with equation (3). We suggest choosing sufficiently high and low scales to capture activity at least one octave above and below the highest and lowest frequencies to be examined, provided the data are of sufficient sampling frequency and duration to permit such a margin.
In the application presented here, our data was recorded from a catheter with equally-spaced sensors. For unequally-spaced sensors, sensor pairs at different separations showing the same phase-difference will not represent the same propagation speed. Either a per-sensor-pair adjustment to the phase-difference should be made, or the inverse-velocity (pace) should be considered as a substitute for the phase-difference. This will require choosing a more appropriate phase-domain kernel since the phase-dependent result will no longer neatly fall into a periodic kernel with a constant period over all frequencies.
There is a clear use of amplitude-weighted coherence, equations (16) and (17), for cases where sensors are too far apart to obtain a meaningful measure of phase-difference. However, the amplitude-weighted coherence is also useful even when sensors are sufficiently close, if the experiment requires periods of quiescence to be ignored. We may want to focus only on periods of higher activity (higher amplitudes) if quiescent periods (lower amplitudes) occur at random periods in the signal and/or have coherences that are unrepresentative or independent of any predictors.
If the amplitude-weighted coherence, (16) and (18), is applied to a weak signal containing a momentary but large amplitude, then that momentary event will dominate the rest of the recording. To account for the fact that substantially fewer time samples influence the calculation of the coherence in such cases, a predictor for the signal variance (38) can be added, which quantifies the proportion of the signal from which coherence is effectively retained under amplitude-weighting.
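One generic form of amplitude-weighted phase coherence, not a reproduction of equations (16)–(18), weights each time sample by the product of the two amplitudes, and can also report the effective fraction of samples retained by the weighting, in the spirit of the variance predictor (38):

```python
import numpy as np

def amp_weighted_coherence(W1, W2):
    """Amplitude-weighted phase coherence between two complex wavelet
    coefficient series W1, W2 (same frequency, over time). Each time
    sample is weighted by the product of the two amplitudes, so
    quiescent (low-amplitude) periods contribute little. Also returns
    the effective fraction of samples retained by the weighting."""
    a = np.abs(W1) * np.abs(W2)                    # amplitude weights
    phase = np.exp(1j * (np.angle(W1) - np.angle(W2)))
    coh = np.abs(np.sum(a * phase)) / np.sum(a)    # in [0, 1]
    eff = np.sum(a) ** 2 / (a.size * np.sum(a ** 2))
    return coh, eff

# Identical rhythms with a constant lag give coherence 1.
t = np.linspace(0, 10, 200)
coh, eff = amp_weighted_coherence(np.exp(1j * t), np.exp(1j * (t - 0.4)))
```

A single large-amplitude transient drives `eff` toward `1/N`, flagging exactly the situation described above where few samples dominate the coherence.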
Formula (23) allows for GP residuals. It is unclear how one could extend the responses to other distributions, such as Poisson or Bernoulli, while retaining residual correlation. One would perhaps need either to sample latent structured residuals or to abandon the residual correlation altogether.
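The first option, sampling latent structured residuals, would amount to a generative model in which the correlation lives in a latent Gaussian field and the non-Gaussian observations are conditionally independent given it. A minimal generative sketch with an illustrative kernel and log link, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Latent Gaussian field with an illustrative squared-exponential kernel.
x = np.linspace(0, 1, 50)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1 ** 2)
f = rng.multivariate_normal(np.zeros(50), K + 1e-6 * np.eye(50))

# Counts are conditionally independent Poisson draws given the latent
# field; the residual correlation survives in f, not in y directly.
y = rng.poisson(np.exp(f))
```

Inference for such a model loses the conjugacy of the Gaussian case, which is the practical obstacle alluded to above.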
Data with missing values at some grid points are not handled. One solution would be to impute the missing values by sampling from the residual GP at the missing locations, which requires computing the mean and covariance of the residual GP at those locations per observation (Algorithm 2.1 in ). Had we not used structured residuals, such imputation would be trivial to implement and computationally inexpensive.
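Such imputation is standard Gaussian-process conditioning. A minimal sketch with hypothetical helper names, using an assumed squared-exponential kernel in place of the model's residual kernel:

```python
import numpy as np

def gp_impute(x_obs, r_obs, x_mis, kernel, jitter=1e-6, seed=0):
    """Sample residual values at missing grid points from the GP
    conditional on the observed residuals (standard GP prediction).
    `kernel(a, b)` must return the covariance matrix between two
    sets of locations."""
    K = kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    Ks = kernel(x_mis, x_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, r_obs))
    mean = Ks @ alpha                          # conditional mean
    v = np.linalg.solve(L, Ks.T)
    cov = kernel(x_mis, x_mis) - v.T @ v       # conditional covariance
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov + jitter * np.eye(len(x_mis)))

# Toy example: impute two dropped grid points of a smooth residual.
se = lambda a, b: np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / 0.2 ** 2)
x = np.linspace(0, 1, 50)
resid = np.sin(2 * np.pi * x)
keep = np.delete(np.arange(50), [10, 11])
imputed = gp_impute(x[keep], resid[keep], x[[10, 11]], se)
```

The per-observation Cholesky factorisation is what makes this costly with structured residuals, as noted above.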
-  A. F. Peery, S. D. Crockett, C. C. Murphy, J. L. Lund, E. S. Dellon, J. L. Williams, E. T. Jensen, N. J. Shaheen, A. S. Barritt, S. R. Lieber et al., “Burden and cost of gastrointestinal, liver, and pancreatic diseases in the United States: update 2018,” Gastroenterology, 2018.
-  G. Bassotti, M. Gaburri, B. Imbimbo, L. Rossi, F. Farroni, M. Pelli, and A. Morelli, “Colonic mass movements in idiopathic chronic constipation.” Gut, vol. 29, no. 9, pp. 1173–1179, 1988.
-  P. G. Dinning, L. Wiklendt, L. Maslen, I. Gibbins, V. Patton, J. W. Arkwright, D. Z. Lubowski, G. O’Grady, P. A. Bampton, S. J. Brookes, and M. Costa, “Quantification of in vivo colonic motor patterns in healthy humans before and after a meal revealed by high-resolution fiber-optic manometry,” Neurogastroenterology & Motility, pp. n/a–n/a, 2014. [Online]. Available: http://dx.doi.org/10.1111/nmo.12408
-  P. Dinning, L. Wiklendt, L. Maslen, V. Patton, H. Lewis, J. Arkwright, D. Wattchow, D. Lubowski, M. Costa, and P. Bampton, “Colonic motor abnormalities in slow transit constipation defined by high resolution, fibre-optic manometry,” Neurogastroenterology & Motility, vol. 27, no. 3, pp. 379–388, 2015.
-  S. Mallat, A wavelet tour of signal processing: the sparse way. Academic press, 2008.
-  E. Jiang, P. Zan, S. Zhang, X. Zhu, and Y. Shao, “Optimal wavelet packet decomposition for rectal pressure signal feature extraction,” in Advanced Computational Intelligence (ICACI), 2012 IEEE Fifth International Conference on. IEEE, 2012, pp. 1204–1208.
-  B. P. Abbott, R. Abbott, T. Abbott, M. Abernathy, F. Acernese, K. Ackley, C. Adams, T. Adams, P. Addesso, R. Adhikari et al., “Observation of gravitational waves from a binary black hole merger,” Physical review letters, vol. 116, no. 6, p. 061102, 2016.
-  P. C. Liu, “Wavelet spectrum analysis and ocean wind waves,” in Wavelet Analysis and Its Applications. Elsevier, 1994, vol. 4, pp. 151–166.
-  K.-M. Lau and H. Weng, “Climate signal detection using wavelet transform: How to make a time series sing,” Bulletin of the American meteorological society, vol. 76, no. 12, pp. 2391–2402, 1995.
-  C. Torrence and G. P. Compo, “A practical guide to wavelet analysis,” Bulletin of the American Meteorological society, vol. 79, no. 1, pp. 61–78, 1998.
-  C. Torrence and P. J. Webster, “Interdecadal changes in the ENSO-monsoon system,” Journal of Climate, vol. 12, no. 8, pp. 2679–2690, 1999.
-  A. Grinsted, J. C. Moore, and S. Jevrejeva, “Application of the cross wavelet transform and wavelet coherence to geophysical time series,” Nonlinear processes in geophysics, vol. 11, no. 5/6, pp. 561–566, 2004.
-  B. Lites and E. Chipman, “The vertical propagation of waves in the solar atmosphere. I-Observations of phase delays,” The Astrophysical Journal, vol. 231, pp. 570–588, 1979.
-  A. Andić, “Propagation of high frequency waves in the quiet solar atmosphere,” Serbian Astronomical Journal, no. 177, pp. 1–14, 2008.
-  J. S. Morris, “Functional regression,” Annual Review of Statistics and Its Application, vol. 2, pp. 321–359, 2015.
-  J. O. Ramsay and B. W. Silverman, Functional data analysis. Springer New York, 1997.
-  W. Guo, “Functional mixed effects models,” Biometrics, vol. 58, no. 1, pp. 121–128, 2002.
-  J. Shi, B. Wang, E. Will, and R. West, “Mixed-effects Gaussian process functional regression models with application to dose–response curve prediction,” Statistics in medicine, vol. 31, no. 26, pp. 3165–3177, 2012.
-  J. Yang, D. D. Cox, J. S. Lee, P. Ren, and T. Choi, “Efficient Bayesian hierarchical functional data analysis with basis function approximations using Gaussian–Wishart processes,” Biometrics, vol. 73, no. 4, pp. 1082–1091, 2017.
-  P. J. Diggle and I. Al Wasel, “Spectral analysis of replicated biomedical time series,” Journal of the Royal Statistical Society: Series C (Applied Statistics), vol. 46, no. 1, pp. 31–71, 1997.
-  M. Bose, J. S. Hodges, and S. Banerjee, “Toward a diagnostic toolkit for linear models with Gaussian-process distributed random effects,” Biometrics, 2018.
-  J. S. Morris, M. Vannucci, P. J. Brown, and R. J. Carroll, “Wavelet-based nonparametric modeling of hierarchical functions in colon carcinogenesis,” Journal of the American Statistical Association, vol. 98, no. 463, pp. 573–583, 2003.
-  J. S. Morris and R. J. Carroll, “Wavelet-based functional mixed models,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 68, no. 2, pp. 179–199, 2006.
-  J. S. Morris, P. J. Brown, R. C. Herrick, K. A. Baggerly, and K. R. Coombes, “Bayesian Analysis of Mass Spectrometry Proteomic Data Using Wavelet-Based Functional Mixed Models,” Biometrics, vol. 64, no. 2, pp. 479–489, 2008.
-  J. S. Morris, V. Baladandayuthapani, R. C. Herrick, P. Sanna, and H. Gutstein, “Automated analysis of quantitative image data using isomorphic functional mixed models, with application to proteomics data,” The annals of applied statistics, vol. 5, no. 2A, p. 894, 2011.
-  J. G. Martinez, K. M. Bohn, R. J. Carroll, and J. S. Morris, “A study of Mexican free-tailed bat chirp syllables: Bayesian functional mixed models for nonstationary acoustic time series,” Journal of the American Statistical Association, vol. 108, no. 502, pp. 514–526, 2013.
-  P. W. Goldberg, C. K. Williams, and C. M. Bishop, “Regression with input-dependent noise: A Gaussian process treatment,” Advances in neural information processing systems, pp. 493–499, 1998.
-  J. Q. Shi, R. Murray-Smith, and D. Titterington, “Hierarchical Gaussian process mixtures for regression,” Statistics and Computing, vol. 15, no. 1, pp. 31–41, 2005.
-  J. Yang, H. Zhu, T. Choi, D. D. Cox et al., “Smoothing and Mean–Covariance Estimation of Functional Data with a Bayesian Hierarchical Model,” Bayesian Analysis, vol. 11, no. 3, pp. 649–670, 2016.
-  M. K. Titsias and M. Lázaro-Gredilla, “Variational heteroscedastic Gaussian process regression,” in Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 841–848.
-  V. Tolvanen, P. Jylänki, and A. Vehtari, “Expectation propagation for nonstationary heteroscedastic Gaussian process regression,” in Machine Learning for Signal Processing (MLSP), 2014 IEEE International Workshop on. IEEE, 2014, pp. 1–6.
-  M. Heinonen, H. Mannerström, J. Rousu, S. Kaski, and H. Lähdesmäki, “Non-Stationary Gaussian Process Regression with Hamiltonian Monte Carlo,” in Artificial Intelligence and Statistics, 2016, pp. 732–740.
-  S. R. Flaxman, A. Gelman, D. B. Neill, A. J. Smola, A. Vehtari, and A. G. Wilson, “Fast hierarchical Gaussian processes,” 2015. [Online]. Available: http://sethrf.com/files/fast-hierarchical-GPs.pdf
-  B. Carpenter, A. Gelman, M. Hoffman, D. Lee, B. Goodrich, M. Betancourt, M. Brubaker, J. Guo, P. Li, and A. Riddell, “Stan: A Probabilistic Programming Language,” Journal of Statistical Software, Articles, vol. 76, no. 1, pp. 1–32, 2017. [Online]. Available: https://www.jstatsoft.org/v076/i01
-  M. Farge, “Wavelet transforms and their applications to turbulence,” Annual review of fluid mechanics, vol. 24, no. 1, pp. 395–458, 1992.
-  I. Daubechies, J. Lu, and H.-T. Wu, “Synchrosqueezed wavelet transforms: an empirical mode decomposition-like tool,” Applied and computational harmonic analysis, vol. 30, no. 2, pp. 243–261, 2011.
-  C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning. The MIT Press, 2006.
-  D. Bates, M. Mächler, B. Bolker, and S. Walker, “Fitting Linear Mixed-Effects Models Using lme4,” Journal of Statistical Software, vol. 67, no. i01, 2015.
-  D. Duvenaud, “Automatic model construction with Gaussian processes,” Ph.D. dissertation, University of Cambridge, 2014.
-  Y. Saatçi, “Scalable inference for structured Gaussian process models,” Ph.D. dissertation, University of Cambridge, 2012.
-  L. Wiklendt, S. D. Mohammed, S. M. Scott, and P. G. Dinning, “Classification of normal and abnormal colonic motility based on cross-correlations of pancolonic manometry data,” Neurogastroenterology & Motility, vol. 25, no. 3, pp. e215–e223, March 2013.
-  P. Dinning, T. Sia, R. Kumar, R. Mohd Rosli, M. Kyloh, D. Wattchow, L. Wiklendt, S. Brookes, M. Costa, and N. Spencer, “High-resolution colonic motility recordings in vivo compared with ex vivo recordings after colectomy, in patients with slow transit constipation,” Neurogastroenterology & Motility, vol. 28, no. 12, pp. 1824–1835, 2016.