1 Introduction
Gaussian processes (GPs) offer a flexible nonparametric way of modeling unknown functions. While Gaussian process regression and classification are commonly used in problems where the domain of the unknown function is continuous, recent work has also applied GP models in mixed domains, where some of the input variables are categorical or discrete and some are continuous. Applications of mixed-domain GPs are found e.g. in Bayesian optimization (Garrido-Merchán and Hernández-Lobato, 2020), computer experiments (Zhang and Notz, 2015; Deng et al., 2017; Roustant et al., 2020; Wang et al., 2021) and longitudinal data analysis (Cheng et al., 2019; Timonen et al., 2021). For example, in biomedical applications the modeled function often depends on categorical covariates, such as treatment vs. no treatment, and accounting for such time-varying effects is essential. Since all commonly used kernel functions (i.e. covariance functions) are defined for either purely continuous or purely categorical input variables, kernels for mixed-domain GPs are typically obtained by combining continuous and categorical kernels through multiplication. Additional modeling flexibility can be obtained by summing such product kernels, as has been done in the context of GP modeling for longitudinal data (Cheng et al., 2019; Timonen et al., 2021).
It is well known that exact GP regression has a theoretical complexity of $\mathcal{O}(n^3)$ and requires $\mathcal{O}(n^2)$ memory, where $n$ is the number of observations. This poses a computational problem which in practice renders exact GP regression infeasible for large data. Various scalable approximation approaches for GPs have been proposed (see e.g. Liu et al. (2020) for a review). However, many popular approaches, such as the inducing point (Snelson and Ghahramani, 2006; Titsias, 2009)
and kernel interpolation (Wilson and Nickisch, 2015) methods, can only be applied directly if the kernel (i.e. covariance) function is continuous and differentiable. In addition, they typically require a Gaussian observation model, which is not appropriate for modeling for example discrete, categorical or ordinal response variables. See Section
3 for a review of previous methods. In this work, we present a scalable approximation scheme for mixed-domain GPs and additive mixed-domain GPs, where the covariance structure depends on both continuous and categorical variables. We extend the Hilbert space reduced-rank approximation (Solin and Särkkä, 2019) to said additive mixed-domain GPs, making it applicable to e.g. the analysis of large longitudinal data. The approach
- scales linearly with respect to the data set size,
- allows a wide variety of different categorical kernels,
- allows product kernels that consist of any number of continuous and categorical kernels, as well as sums of such products, and
- allows an arbitrary observation model and full Bayesian inference for the model hyperparameters.
To our knowledge, there are no existing approaches that satisfy these conditions.
2 Gaussian Processes
2.1 Definition
A Gaussian process (GP) is a collection of random variables, any finite number of which have a multivariate normal distribution (Rasmussen and Williams, 2006). A function $f$ is a GP,

\[ f \sim \mathcal{GP}\left(m(x),\, k(x, x')\right), \tag{1} \]

with mean function $m$ and kernel (or covariance) function $k$, if for any finite number of inputs $\{x_1, \ldots, x_n\}$, the vector of function values $f = [f(x_1), \ldots, f(x_n)]^\top$ follows a multivariate normal distribution with mean vector $[m(x_1), \ldots, m(x_n)]^\top$ and covariance matrix $K$ with entries $K_{ij} = k(x_i, x_j)$. The mean function is commonly the constant zero function $m(x) = 0$, and we follow this convention throughout the paper. The kernel function encodes information about the covariance of function values at different points, and therefore crucially affects the properties of the model.

2.2 Bayesian GP regression
In GP regression, the conditional distribution of the response variable $y$ given covariates $x$ is modeled as some parametric distribution $p(y \mid f(x), \theta_{\text{obs}})$, where $\theta_{\text{obs}}$ represents possible parameters of the observation model. The function $f$ has a zero-mean GP prior with covariance function $k(x, x' \mid \theta_{\text{GP}})$ that has hyperparameters $\theta_{\text{GP}}$. We focus on Bayesian GP modeling, where in addition to the GP prior for $f$, we have a parameter prior distribution $p(\theta)$ for $\theta = \{\theta_{\text{GP}}, \theta_{\text{obs}}\}$. Given $n$ observations $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^n$, our goal is to infer the posterior

\[ p(f, \theta \mid \mathcal{D}) = \frac{p(f, \theta)\, p(y \mid f, \theta)}{p(y)}, \tag{2} \]

where $f = [f(x_1), \ldots, f(x_n)]^\top$ and $y = [y_1, \ldots, y_n]^\top$. The part

\[ p(f, \theta) = p(f \mid \theta)\, p(\theta) \tag{3} \]

is the prior and

\[ p(y \mid f, \theta) = \prod_{i=1}^{n} p\left(y_i \mid f(x_i), \theta_{\text{obs}}\right) \tag{4} \]

is the likelihood. This task often has to be done by sampling from $p(f, \theta \mid \mathcal{D})$ using MCMC methods, which requires evaluating the right-hand side of Eq. 2 (and possibly its gradient) thousands of times. As the likelihood and parameter prior usually factorize over parameters and data points, they scale linearly and are not a bottleneck. Instead, computing the GP prior density

\[ p(f \mid \theta) = \mathcal{N}(f \mid 0, K), \tag{5} \]

where the matrix $K \in \mathbb{R}^{n \times n}$ has entries $K_{ij} = k(x_i, x_j \mid \theta_{\text{GP}})$, is a costly operation, as evaluating the (log) density of an $n$-dimensional multivariate normal distribution generally has $\mathcal{O}(n^3)$ complexity (see Suppl. Section 4). Furthermore, the matrix $K$ takes $\mathcal{O}(n^2)$ memory.
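To make the cost of evaluating Eq. 5 concrete, the following minimal numpy sketch computes $\log \mathcal{N}(f \mid 0, K)$ through a Cholesky factorization, which is the $\mathcal{O}(n^3)$ step; the kernel choice and data here are purely illustrative.

```python
import numpy as np

def mvn_logpdf_zero_mean(f, K):
    """Log density of N(f | 0, K) computed via a Cholesky factorization."""
    n = len(f)
    L = np.linalg.cholesky(K)               # the O(n^3) step
    alpha = np.linalg.solve(L, f)           # solve L alpha = f
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return -0.5 * (alpha @ alpha + log_det + n * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
# EQ kernel matrix plus jitter for numerical stability
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * 0.2**2)) + 1e-6 * np.eye(n)
f = rng.multivariate_normal(np.zeros(n), K)
lp = mvn_logpdf_zero_mean(f, K)
```

In practice the Cholesky factor is also reused for triangular solves; the point here is only that the factorization itself dominates the cost as $n$ grows.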
An often exploited fact is that if the observation model (and therefore the likelihood) is Gaussian, $f$ can be analytically marginalized and only the marginal posterior $p(\theta \mid \mathcal{D})$ needs to be sampled. This reduces the MCMC dimension by $n$ and likely improves sampling efficiency, but one still needs to evaluate the marginal likelihood $p(y \mid \theta)$, which is again a density of an $n$-dimensional multivariate Gaussian. The $\mathcal{O}(n^3)$ complexity and $\mathcal{O}(n^2)$ memory requirements therefore remain. In this paper, we generally assume an arbitrary observation model, and defer the details of the Gaussian observation model until Suppl. Section 5.
2.3 Additive GP regression
In additive GP regression, the modeled function consists of $J$ additive components so that $f = f^{(1)} + \ldots + f^{(J)}$, and each component has a GP prior

\[ f^{(j)} \sim \mathcal{GP}\left(0, k^{(j)}\right), \tag{6} \]

independently from the other components. This means that the total GP prior is $f \sim \mathcal{GP}(0, k)$ with

\[ k(x, x') = \sum_{j=1}^{J} k^{(j)}(x, x'). \tag{7} \]

Furthermore, for each $j = 1, \ldots, J$, the vector $f^{(j)} = [f^{(j)}(x_1), \ldots, f^{(j)}(x_n)]^\top$ satisfies

\[ f^{(j)} \sim \mathcal{N}\left(0, K^{(j)}\right), \tag{8} \]

where the matrix $K^{(j)}$ is defined so that its elements are $K^{(j)}_{ii'} = k^{(j)}(x_i, x_{i'})$. This means that the prior for $f = f^{(1)} + \ldots + f^{(J)}$ is $\mathcal{N}(0, K)$, where $K = \sum_{j=1}^{J} K^{(j)}$.
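The additive construction can be checked numerically: summing independent component draws yields a draw whose covariance is the sum of the component kernel matrices. A small sketch, where the component kernels (EQ kernels with different lengthscales) are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30
x = np.linspace(0.0, 1.0, n)

# three illustrative component kernel matrices K^(j)
Ks = [np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell**2)) for ell in (0.1, 0.3, 1.0)]
K_total = sum(Ks)   # K = sum_j K^(j): the covariance of f = sum_j f^(j)

jitter = 1e-8 * np.eye(n)
# draw each component independently and sum them
f_components = [rng.multivariate_normal(np.zeros(n), K + jitter) for K in Ks]
f = sum(f_components)
print(f.shape)  # (30,)
```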
Bayesian inference with MCMC for additive GP models requires sampling all $f^{(1)}, \ldots, f^{(J)}$, meaning that adding one component increases the number of parameters by $n$ (plus the possible additional kernel hyperparameters). Moreover, the multivariate normal prior (Eq. 5) needs to be evaluated for each component, adding to the computational burden. In the case of a Gaussian likelihood, adding more components does not add any multivariate normal density evaluations, as the density evaluation is still needed only for the marginal likelihood $p(y \mid \theta)$. Also, the marginal posteriors of each $f^{(j)}$ are analytically available (see Suppl. Section 5).
2.4 Mixed-domain kernels for longitudinal data
Longitudinal data is common in biomedical, psychological, social and other studies and consists of multiple measurements of several subjects at multiple time points. In addition to time (often expressed as subject age), other continuous covariates can be measured. Moreover, in addition to subject id, other categorical covariates, such as treatment, sex or country can be available. In the statistical methods literature, such data is commonly modeled using generalized linear mixed effect models (Verbeke and Molenberghs, 2000). In recent work (Quintana et al., 2016; Cheng et al., 2019; Timonen et al., 2021), longitudinal data has been modeled using additive GPs, where, similar to commonly used linear models, each component is a function of at most one categorical and one continuous variable. Each variable is assigned a onedimensional base kernel and for components that contain both a continuous and categorical kernel, the kernel is their product. As the total kernel is composed of the simpler categorical and continuous kernels through multiplication and addition, it has a mixed domain.
These models have the very beneficial property that the effects of individual covariates are interpretable. The marginal posterior distributions of each component can be studied to infer the marginal effect of different covariates. As an example, if $k^{(j)}$ is just the exponentiated quadratic (EQ) kernel

\[ k_{\text{EQ}}(x, x') = \alpha^2 \exp\left( -\frac{(x - x')^2}{2\ell^2} \right), \tag{9} \]

with magnitude $\alpha$ and lengthscale $\ell$, and $x$ is age, the component $f^{(j)}$ can be interpreted as the shared effect of age. On the other hand, if $k^{(j)}$ is the product kernel $k_{\text{ZS}}(z, z') \cdot k_{\text{EQ}}(x, x')$, where

\[ k_{\text{ZS}}(z, z') = \begin{cases} 1 & \text{if } z = z' \\ -\frac{1}{C-1} & \text{otherwise} \end{cases} \tag{10} \]

is the zero-sum (ZS) kernel (Kaufman and Sain, 2010) for a categorical variable $z$ that has $C$ categories, $f^{(j)}$ can be interpreted as the category-specific effect of the continuous covariate $x$. This component also has the property that the effect sums to zero over the categories at all values of $x$ (see Timonen et al. (2021) for proof), which helps in separating the category effect from the shared effect, if a model has both.
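The zero-sum property is easy to verify numerically: every row of the ZS kernel matrix sums to zero, so any function drawn with this covariance sums to zero over the categories. A minimal sketch (sampling via the eigendecomposition is just one convenient route, since the ZS matrix is singular):

```python
import numpy as np

def zs_kernel_matrix(C):
    """ZS kernel matrix: 1 on the diagonal, -1/(C-1) off the diagonal."""
    return np.full((C, C), -1.0 / (C - 1)) + np.eye(C) * (C / (C - 1.0))

C = 4
K = zs_kernel_matrix(C)
print(K @ np.ones(C))       # each row sums to 1 + (C-1) * (-1/(C-1)) = 0

rng = np.random.default_rng(0)
lam, V = np.linalg.eigh(K)  # K is singular, so sample through its eigendecomposition
f = V @ (np.sqrt(np.clip(lam, 0.0, None)) * rng.standard_normal(C))
print(f.sum())              # ~ 0: the draw sums to zero over categories
```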
Sometimes it is required to mask effects that are present for only a subset of the individuals, for example the case individuals when the data also contains a control group. In kernel language, the effect of component $j$ can be masked by multiplying $k^{(j)}$ by a binary kernel which returns 0 if either $z$ or $z'$ takes a value in any of the masked categories, and 1 otherwise.
3 Related Research
GPs and categorical inputs
A suggested approach to handle GPs with categorical covariates is to use a one-hot encoding, which turns a variable with $C$ categories into $C$ binary variables, of which only one is on at a time, and to then apply a continuous kernel to them. Garrido-Merchán and Hernández-Lobato (2020) highlight that the resulting covariance structure is problematic because it does not take into account that only one of the binary variables can be one at a time. This poorly motivated approach might have originated merely from the fact that common GP software has lacked support for categorical kernels. We find it more sensible to define kernels directly on categorical covariates, as that way we can always impose the desired covariance structure.

Category-specific effects of a continuous covariate can also be achieved by assigning independent GPs to the different categories. This way we have only continuous kernel functions, and can possibly use scalable approaches that are designed for them. This limited approach, however, cannot define any additional covariance structure between the categories, such as the zero-sum constraint (Eq. 10). The ZS kernel is a special case of compound symmetry (CS), and for example Roustant et al. (2020) concluded that a CS covariance structure was more justified than using only independent GPs in their nuclear engineering application.
Chung et al. (2020) developed a deep mixed-effect GP model that facilitates individual-specific effects and scales with the number of individuals and the number of time points per individual. Zhang et al. (2020) handled categorical inputs by mapping them to a continuous latent space and then using a continuous kernel. While this approach can detect interesting covariance structures, it does not remove the need to perform statistical modeling with a predefined covariance structure as in Section 2.4. Another related nonparametric way to model group effects is to use hierarchical generalized additive models (Pedersen et al., 2019), as smoothing splines can be seen as a special case of GP regression (Kimeldorf and Wahba, 1970).
Scalable GP approximations
A number of approximation methods exist that reduce the complexity of GP regression to $\mathcal{O}(nm^2)$, where $m \ll n$ controls the accuracy of the approximation. Popular approaches rely on global sparse approximations (Quiñonero-Candela and Rasmussen, 2005) of the covariance matrix between all pairs of data points, using $m$ inducing points. The locations of these inducing points are generally optimized using gradient-based continuous optimization simultaneously with the model hyperparameters, which cannot be applied when the domain is not continuous. Fortuin et al. (2021) studied the inducing-point approach in purely discrete domains, and Cao et al. (2015) presented an optimization algorithm that alternates between discrete optimization of the inducing points and continuous optimization of the hyperparameters. Disadvantages of this method are that it cannot find inducing points outside of the training data, does not perform full Bayesian inference for the hyperparameters, and assumes a Gaussian observation model.
4 Mixed-Domain Covariance Function Approximation
4.1 Basic idea
We continue with the notation established in Section 2, and note that $x$ denotes a general input that can consist of both continuous and categorical dimensions. We consider approximations that decompose the GP kernel function as

\[ k(x, x') \approx \sum_{m=1}^{M} \phi_m(x)\, \phi_m(x'), \tag{11} \]

where the functions $\phi_m$ have to be designed so that the approximation is accurate but easy to compute. This is useful in GP regression, because we get a low-rank approximate decomposition $K \approx \Phi \Phi^\top$ for the kernel matrix $K$, where $\Phi \in \mathbb{R}^{n \times M}$ is the matrix with elements $\Phi_{im} = \phi_m(x_i)$. Using this approximation, we can write the approximate GP prior using parameters $z \in \mathbb{R}^M$ with independent standard normal priors, connected to $f$ through the reparametrization $f = \Phi z$, where $z \sim \mathcal{N}(0, I_M)$. Evaluating the prior density now has only $\mathcal{O}(M)$ cost. After obtaining posterior draws of $z$, we can obtain posterior draws of $f$ with $\mathcal{O}(nM)$ cost, which comes from computing the matrix-vector product $\Phi z$. The likelihood (Eq. 4) can then be evaluated one data point at a time, and the total complexity of the approach is only $\mathcal{O}(nM)$. Furthermore, the memory requirement is reduced from $\mathcal{O}(n^2)$ to $\mathcal{O}(nM)$, since we only need to store $\Phi$ and never compute $K$. This is the approach used throughout this paper, and the focus is on how to design the functions $\phi_m$ for different kernel functions so that the approximation is accurate with $M \ll n$.
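The reparametrization and its cost structure can be sketched as follows; the feature matrix here is random and purely illustrative, standing in for an actual kernel decomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 2000, 16
# hypothetical feature matrix with elements phi_m(x_i); random features are used
# here only to illustrate the cost structure, not any particular kernel
Phi = rng.standard_normal((n, M)) / np.sqrt(M)

z = rng.standard_normal(M)   # M parameters with independent standard normal priors
f = Phi @ z                  # draw of f: an O(nM) matrix-vector product

# the implied prior covariance of f is Phi Phi^T, which is never formed explicitly;
# its rank is at most M
print(np.linalg.matrix_rank(Phi))  # 16
```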
4.2 Continuous isotropic covariance functions
A continuous stationary covariance function depends only on the difference $r = x - x'$ and can therefore be written as $k(x, x') = k_r(r)$. Such covariance functions can be approximated by methods that utilize the spectral density

\[ S(\omega) = \int_{\mathbb{R}^D} k_r(r)\, e^{-i\, \omega^\top r}\, \mathrm{d}r. \tag{12} \]

If the covariance function is isotropic, meaning that it depends only on the Euclidean norm $\lVert r \rVert$, then $S$ is also isotropic and can be written as $S(\lVert \omega \rVert)$, i.e. as a function of one variable. As shown in (Solin and Särkkä, 2019), an isotropic covariance function can be approximated as

\[ k(x, x') \approx \sum_{m=1}^{M} S\!\left(\sqrt{\lambda_m}\right) \phi_m(x)\, \phi_m(x'), \tag{13} \]

where $\{\phi_m\}_{m=1}^M$ and $\{\lambda_m\}_{m=1}^M$ are the $M$ first eigenfunctions and eigenvalues of the Dirichlet boundary value problem

\[ \begin{cases} -\nabla^2 \phi_m(x) = \lambda_m \phi_m(x), & x \in \Omega \\ \phi_m(x) = 0, & x \in \partial\Omega \end{cases} \tag{14} \]

for a compact set $\Omega \subset \mathbb{R}^D$. We see that this approximation has the same form as Eq. 11, with the basis functions $\phi_m$ scaled by $\sqrt{S(\sqrt{\lambda_m})}$. The spectral density $S$ has a closed form for many kernels, and the domain $\Omega$ can be selected so that the eigenvalues and eigenfunctions have one too. The scaled basis functions are therefore easy to evaluate, and the computation strategy described in Section 4.1 can then be used. As an example, when $D = 1$ and $\Omega = [-L, L]$ with $L > 0$, we have

\[ \phi_m(x) = \sqrt{\frac{1}{L}} \sin\!\left( \frac{\pi m (x + L)}{2L} \right), \qquad \lambda_m = \left( \frac{\pi m}{2L} \right)^2, \tag{15} \]

and it was proven in (Solin and Särkkä, 2019) that in this case the approximation converges uniformly to $k$ as $M \to \infty$ for any stationary $k$ that has a regular enough spectral density. For example, for the EQ kernel (Eq. 9), the spectral density is $S(\omega) = \alpha^2 \sqrt{2\pi}\, \ell \exp\!\left(-\tfrac{1}{2} \ell^2 \omega^2\right)$.
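As a concrete check, the following sketch builds the approximation of Eqs. 13–15 for a one-dimensional EQ kernel and compares it to the exact kernel matrix; the domain half-width, lengthscale and number of basis functions are arbitrary illustrative choices:

```python
import numpy as np

alpha, ell = 1.0, 0.4          # EQ kernel magnitude and lengthscale
L, M = 3.0, 30                 # domain Omega = [-L, L] and number of basis functions
x = np.linspace(-1.0, 1.0, 40)
ms = np.arange(1, M + 1)

# Eq. 15: Dirichlet Laplacian eigenfunctions and sqrt-eigenvalues on [-L, L]
Phi = np.sqrt(1.0 / L) * np.sin(np.pi * ms * (x[:, None] + L) / (2 * L))
sqrt_lam = np.pi * ms / (2 * L)

# spectral density of the 1D EQ kernel evaluated at sqrt(lambda_m)
S = alpha**2 * np.sqrt(2 * np.pi) * ell * np.exp(-0.5 * ell**2 * sqrt_lam**2)

K_approx = (Phi * S) @ Phi.T                                    # Eq. 13
K_exact = alpha**2 * np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * ell**2))
print(np.max(np.abs(K_approx - K_exact)))                       # small error
```

Enlarging $L$ relative to the data range and increasing $M$ both improve the accuracy, at the cost of more terms.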
4.3 Kernels for categorical variables
Let us study a kernel $k : \mathcal{Z} \times \mathcal{Z} \to \mathbb{R}$, where $\mathcal{Z}$ is a finite set of possible values (categories). We can encode these categories numerically as integers $\mathcal{Z} = \{1, \ldots, C\}$. Because there are only $C^2$ possible input combinations for $k$, we can list all its values in the matrix $K_{\mathcal{Z}} \in \mathbb{R}^{C \times C}$, which has elements $[K_{\mathcal{Z}}]_{z z'} = k(z, z')$. If $k$ is symmetric, the symmetric square matrix $K_{\mathcal{Z}}$ has the orthogonal eigendecomposition

\[ K_{\mathcal{Z}} = V \Lambda V^\top, \tag{16} \]

where $\Lambda$ is the diagonal matrix containing the eigenvalues $\lambda_1, \ldots, \lambda_C$ on its diagonal, and $V$ has the corresponding eigenvectors $v_1, \ldots, v_C$ as its columns. Since $k$ is positive semidefinite, the eigenvalues are nonnegative, and for each column $c = 1, \ldots, C$ we can define the function $\phi_c : \mathcal{Z} \to \mathbb{R}$ so that $\phi_c(z) = \sqrt{\lambda_c}\, [v_c]_z$. We see that

\[ k(z, z') = \sum_{c=1}^{C} \lambda_c\, [v_c]_z\, [v_c]_{z'} \tag{17} \]
\[ = \sum_{c=1}^{C} \phi_c(z)\, \phi_c(z'), \tag{18} \]

meaning that we have written $k$ in the form of Eq. 11 with $M = C$. Note that this is an exact decomposition of $k$ and not an approximation. The complexity of computing the eigendecomposition is $\mathcal{O}(C^3)$, but in typical applications $C \ll n$ and this is not a bottleneck. In fact, for example for the ZS kernel and other CS kernels, the eigenvalues have a closed form and the corresponding eigenbasis is known (see Suppl. Section 2). Furthermore, if $k$ does not depend on any hyperparameters, the eigendecomposition only needs to be computed once before parameter inference. If the kernel is of the type $\alpha^2 k_0$, where the magnitude $\alpha$ is the only parameter, the decomposition can obviously be done just for $k_0$, which again has no parameters. Evaluating the functions $\phi_c$ is easy, as it corresponds to just looking up a value from the matrix $V \Lambda^{1/2}$.
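The exact categorical decomposition can be verified numerically; here the ZS kernel with $C = 5$ serves as the example matrix:

```python
import numpy as np

C = 5
# example symmetric categorical kernel matrix: the zero-sum (ZS) kernel
K_cat = np.full((C, C), -1.0 / (C - 1)) + np.eye(C) * (C / (C - 1.0))

lam, V = np.linalg.eigh(K_cat)          # K_cat = V diag(lam) V^T, an O(C^3) operation
lam = np.clip(lam, 0.0, None)           # guard against tiny negative round-off
Phi = V * np.sqrt(lam)                  # column c holds phi_c(z) = sqrt(lam_c) [v_c]_z
K_rebuilt = Phi @ Phi.T                 # Eq. 18: exact reconstruction, not an approximation
print(np.max(np.abs(K_rebuilt - K_cat)))  # ~ 0
```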
4.4 Mixed-domain product kernels
We now consider approximating a product kernel $k = k_1 \cdot k_2 \cdots k_D$, where for each factor $k_d$, $d = 1, \ldots, D$, we have an available decomposition

\[ k_d(x, x') \approx \sum_{m=1}^{M_d} \phi^{(d)}_m(x)\, \phi^{(d)}_m(x'), \tag{19} \]

which might be an approximation or an exact decomposition of $k_d$. The total approximation is

\[ k(x, x') \approx \prod_{d=1}^{D} \sum_{m=1}^{M_d} \phi^{(d)}_m(x)\, \phi^{(d)}_m(x') \tag{20} \]
\[ = \sum_{m_1=1}^{M_1} \cdots \sum_{m_D=1}^{M_D} \left( \prod_{d=1}^{D} \phi^{(d)}_{m_d}(x) \right) \left( \prod_{d=1}^{D} \phi^{(d)}_{m_d}(x') \right), \tag{21} \]

where the total number of terms is $M = \prod_{d=1}^{D} M_d$. We now have a representation of the product kernel in the form of Eq. 11 with $M$ sum terms. Note that since the individual kernels in the product can be both categorical and continuous, Eq. 21 provides a kernel representation for mixed-domain GPs with product kernels. Also note that $M$ grows exponentially with $D$.
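The following sketch illustrates Eq. 21 for a product of one continuous and one categorical kernel: taking all pairwise products of the two feature sets reproduces the elementwise product of the two kernel matrices exactly (all parameter values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 60
x = rng.uniform(-1.0, 1.0, n)        # continuous covariate
z = rng.integers(0, 3, n)            # categorical covariate with C = 3 categories

# continuous factor: Hilbert-space features of a 1D EQ kernel (alpha = 1, ell = 0.5)
L, B = 2.5, 8
ms = np.arange(1, B + 1)
sqrt_lam = np.pi * ms / (2 * L)
S = np.sqrt(2 * np.pi) * 0.5 * np.exp(-0.5 * 0.5**2 * sqrt_lam**2)
Phi_cont = np.sqrt(S / L) * np.sin(np.pi * ms * (x[:, None] + L) / (2 * L))

# categorical factor: exact eigen-features of the ZS kernel
C = 3
K_zs = np.full((C, C), -0.5) + np.eye(C) * 1.5
lam, V = np.linalg.eigh(K_zs)
Phi_cat = (V * np.sqrt(np.clip(lam, 0.0, None)))[z]   # row lookup per observation

# Eq. 21: all pairwise products of the factor features give the product-kernel features
Phi_prod = (Phi_cont[:, :, None] * Phi_cat[:, None, :]).reshape(n, B * C)

K_prod = Phi_prod @ Phi_prod.T
K_check = (Phi_cont @ Phi_cont.T) * (Phi_cat @ Phi_cat.T)   # elementwise kernel product
print(np.max(np.abs(K_prod - K_check)))  # ~ 0: the identity is exact
```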
4.5 Mixed-domain sum kernels
The most general kernels that we consider are of the form $k = \sum_{q=1}^{Q} k_q$, where each $k_q = \prod_{d=1}^{D_q} k_{q,d}$ and $D_q$ is the number of product factors in the sum term $q$. If each $k_q$ has a (possibly approximate) decomposition

\[ k_q(x, x') \approx \sum_{m=1}^{M_q} \phi^{(q)}_m(x)\, \phi^{(q)}_m(x') \tag{22} \]

with $M_q$ sum terms, we can approximate $k$ with

\[ k(x, x') \approx \sum_{q=1}^{Q} \sum_{m=1}^{M_q} \phi^{(q)}_m(x)\, \phi^{(q)}_m(x'), \tag{23} \]

where the functions $\phi^{(q)}_m$ are obtained from Eq. 21 for each $q$. Now we have a sum representation (Eq. 11) of the kernel with $M = \sum_{q=1}^{Q} M_q$ terms.
4.6 Mixed kernels for longitudinal data
In our framework, we consider mixed kernels $k : \mathcal{X} \times \mathcal{X} \to \mathbb{R}$, where $\mathcal{X}$ is a mixed space of both continuous and categorical dimensions, composed through multiplication and addition so that

\[ k(x, x') = \sum_{q=1}^{Q} k^{(q)}_{\text{cont}}(x, x') \cdot k^{(q)}_{\text{cat}}(x, x'), \tag{24} \]

where each $k^{(q)}_{\text{cont}}$ is isotropic and depends on only one continuous dimension of $x$, and each $k^{(q)}_{\text{cat}}$ depends on only one categorical dimension of $x$, which has $C_q$ different categories. For each $k^{(q)}_{\text{cont}}$, we use the basis function approximation (Eq. 13) with $B_q$ basis functions and domain $\Omega_q = [-L_q, L_q]$, and for each $k^{(q)}_{\text{cat}}$ the exact decomposition (Eq. 18). Using Eq. 23, we can write $k$ in the format of Eq. 11 with

\[ M = \sum_{q=1}^{Q} B_q C_q \tag{25} \]

terms. In each term, the function $\phi_m$ is a product of a continuous factor $\sqrt{S(\sqrt{\lambda_b})}\, \phi_b(x)$ and a categorical factor $\sqrt{\lambda_c}\, [v_c]_z$.

As an example, if $C_q = C$ and $B_q = B$ for all components, then the scalability is $\mathcal{O}(nM)$ with $M = QBC$. Further, if a categorical variable has many different categories, the number of terms contributed by the corresponding component grows linearly in $C_q$.
5 Results
We demonstrate the scalability and accuracy of the presented approach using experiments with simulated and real data. In all experiments, we use the dynamic HMC algorithm of Stan (version 2.27) (Carpenter et al., 2017) for MCMC sampling of the parameters of our approximate models (code will be made available at https://github.com/jtimonen/scalablemixeddomainGPs). All models are fitted by running four independent MCMC chains for 2000 iterations each, discarding the first half of each chain as warmup. In all experiments, we use Student-t priors for the kernel magnitude parameters and log-normal priors with mean 0 and scale 1 for the kernel lengthscale parameters. In Experiments 1 and 2, the noise variance parameter of the Gaussian observation model has an inverse-gamma prior. Priors are on the normalized data scale, meaning that continuous variables are standardized to zero mean and unit variance during inference.

In all experiments, we use the same number of basis functions $B$ for each approximate continuous kernel. We also use the same domain scaling factor $c$ for all approximate components, defined so that $\Omega = [-L, L]$ with $L$ being $c$ times the half-range of the continuous covariate of the approximated kernel (Riutort-Mayol et al., 2020). Experiments 1-2 are run on a modern CentOS 7 computing cluster and Experiment 3 on a laptop computer.
5.1 Experiment 1: Simulation study
In the first experiment, we create simulated longitudinal data consisting of two categorical variables (individual id and group) and one continuous variable (age). We create data with 9 individuals divided into three groups. For each of the individuals 1-6, we create observations at time points drawn uniformly from a fixed interval, varying the number of observations per individual. For individuals 7-9, observations are created similarly.
We consider an additive GP model with kernel

\[ k = k^{(1)} + k^{(2)} + k^{(3)}, \tag{26} \]

where the components are built from the EQ and ZS kernels as in Section 2.4, using the age, group and id variables, and we simulate a realization of data from this model. We then generate response variable measurements $y_i = f(x_i) + \varepsilon_i$, where $\varepsilon_i \sim \mathcal{N}(0, \sigma^2)$ and the realization $f$ represents the ground-truth signal.
Data from individuals 1-6 is used in training, while data from individuals 7-9 is left for testing. Using the training data, we fit an exact and an approximate model with the correct covariance structure from Eq. 26, using a Gaussian likelihood. The exact model is fitted with lgpr (Timonen et al., 2021), which also uses Stan for MCMC. The exact model utilizes the marginalization approach for GPs, as a Gaussian observation model is specified.
Figure 1 shows the posterior predictive mean of the exact model and of approximate models with different numbers of basis functions $B$. We see that as $B$ grows, the mean predictions become indistinguishable from those of the exact model. We fit the approximate model using different values of $B$ and $n$, and repeat the experiment using different values of the domain scaling factor $c$. Results in Figure 3 validate empirically that the runtime scales linearly as a function of both $n$ and $B$.
We compute the mean log predictive density (MLPD) at the test points (see Suppl. Section 3 for details about out-of-sample prediction and MLPD). Results in Figure 2 show that the MLPD of the approximate model approaches that of the exact model as $B$ grows. It is seen that with small data sizes and small $B$, the predictive performance can actually be better than that of the exact model, possibly because the coarser approximation is a simpler model that generalizes better in this case.
5.2 Experiment 2: Canadian weather data
We analyse data that consist of yearly average temperature measurements at 35 Canadian weather stations (Ramsay and Silverman, 2005). There is a total of $n = 35 \cdot 365 = 12{,}775$ data points, which are daily temperatures at the 35 locations, averaged over the years 1960-1994. We fit an additive GP model with a Gaussian likelihood, using the EQ kernel for the shared effect of the day of year and the product EQ $\cdot$ ZS kernel for the station-specific effect of the day of year.
We used the same domain scaling factor for all components and ran the 4 MCMC chains in parallel using 4 CPU cores. This was repeated with different values of $B$, the number of basis functions for each component. Total runtimes for fitting the models were on the order of hours. The posterior distributions of each model component are shown in Figure 4. The posterior predictive distribution for each station separately is visualized in Suppl. Figure 1.
5.3 Experiment 3: US presidential election prediction
In the last example, we demonstrate a beta-binomial observation model, and model the two-party vote share of the Republican Party in each state in US presidential elections. By two-party vote share we mean the number of votes cast for the Republican candidate divided by the total number of votes cast for the Republican and Democratic candidates (data is from MIT Election Data and Science Lab, 2017). Following Trangucci (2017), Washington DC is excluded from the analysis. We use data from the 1976-2016 elections as training data, meaning that $n = 50 \cdot 11 = 550$.
We fit an additive GP model with the beta-binomial observation model, using the EQ kernel for the shared effect of the election year and the product EQ $\cdot$ ZS kernel for the state-specific effect of the election year. The observation model is

\[ y^{\text{R}}_i \sim \text{Beta-Binomial}\left(N_i, \rho_i, \gamma\right), \tag{27} \]

where $y^{\text{R}}_i$ and $y^{\text{D}}_i$ are the numbers of votes for the Republican and Democratic parties, respectively, $N_i = y^{\text{R}}_i + y^{\text{D}}_i$, $\rho_i$ is the expected vote share, $\gamma$ is an overdispersion parameter, and

\[ \rho_i = \text{invlogit}\left(f(x_i) + w_0\right). \]

We place a prior also on the intercept $w_0$ and give the $\gamma$ parameter a Log-Normal(1, 1) prior. Fitting the model on a 2018 MacBook Pro computer (2.3 GHz quad-core Intel i5 CPU), running the 4 chains in parallel, took approximately 18 minutes. The posterior distributions of each model component are shown in Figure 5. See also Supplementary Figure 2, where we have also visualized the data from the 2020 election to validate that the model predicts well into the future.
6 Conclusion
Gaussian processes offer an attractive framework for specifying flexible models using a kernel language. The computational cost of exact inference, however, limits their applications to small data sets. We have presented a scalable approximation scheme for mixed-domain covariance functions and demonstrated its use in the context of Bayesian GP regression. As the computational complexity is linear with respect to data size, our framework opens up a rich class of GP models for large-scale applications in various fields of science. The approach can also be applied in GP applications where the kernel hyperparameters are optimized using a marginal likelihood criterion.
We recall that we have assumed that the categorical kernels are symmetric and the continuous kernels are stationary. Non-stationary effects can still be modeled by applying a warping to the input first and then using a stationary kernel (see for example Cheng et al. (2019)). Another limitation of the approach is that when the number of product terms in a kernel grows, the total number of basis functions required for that component grows exponentially and can become too large. This still leaves us with a large class of mixed-domain GP models that are scalable.
Acknowledgements
We thank Aki Vehtari and Gleb Tikhonov for useful comments on draft versions of this manuscript, and acknowledge the computational resources provided by Aalto ScienceIT, Finland. This work was supported by the Academy of Finland and Bayer Oy.
References
Cao et al. (2015). Efficient optimization for sparse Gaussian process regression. IEEE Transactions on Pattern Analysis and Machine Intelligence 37(12), pp. 2415-2427.
Carpenter et al. (2017). Stan: a probabilistic programming language. Journal of Statistical Software 76(1), pp. 1-32.
Cheng et al. (2019). An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nature Communications 10.
Chung et al. (2020). Deep mixed effect model using Gaussian processes: a personalized and reliable prediction for healthcare. In The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20).
Deng et al. (2017). Additive Gaussian process for computer models with qualitative and quantitative factors. Technometrics 59(3), pp. 283-292.
Fortuin et al. (2021). Sparse Gaussian processes on discrete domains. IEEE Access 9, pp. 76750-76758.
Garrido-Merchán and Hernández-Lobato (2020). Dealing with categorical and integer-valued variables in Bayesian optimization with Gaussian processes. Neurocomputing 380, pp. 20-35.
Kaufman and Sain (2010). Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Analysis 5(1), pp. 123-149.
Kimeldorf and Wahba (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. The Annals of Mathematical Statistics 41(2), pp. 495-502.
Liu et al. (2020). When Gaussian process meets big data: a review of scalable GPs. IEEE Transactions on Neural Networks and Learning Systems 31(11), pp. 4405-4423.
MIT Election Data and Science Lab (2017). U.S. President 1976-2020, V6.
Pedersen et al. (2019). Hierarchical generalized additive models in ecology: an introduction with mgcv. PeerJ (5).
Quiñonero-Candela and Rasmussen (2005). A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research 6, pp. 1939-1959.
Quintana et al. (2016). Bayesian nonparametric longitudinal data analysis. Journal of the American Statistical Association 111(515), pp. 1168-1181.
Ramsay and Silverman (2005). Functional data analysis. 2nd edition, Springer, New York, NY.
Rasmussen and Williams (2006). Gaussian Processes for Machine Learning. MIT Press, Cambridge, Massachusetts.
Riutort-Mayol et al. (2020). Practical Hilbert space approximate Bayesian Gaussian processes for probabilistic programming. arXiv:2004.11408.
Roustant et al. (2020). Group kernels for Gaussian process metamodels with categorical inputs. SIAM/ASA Journal on Uncertainty Quantification 8(2), pp. 775-806.
Snelson and Ghahramani (2006). Sparse Gaussian processes using pseudo-inputs. In Advances in Neural Information Processing Systems, Vol. 18.
Solin and Särkkä (2019). Hilbert space methods for reduced-rank Gaussian process regression. Statistics and Computing.
Timonen et al. (2021). lgpr: an interpretable non-parametric method for inferring covariate effects from longitudinal data. Bioinformatics 37(13), pp. 1860-1867.
Titsias (2009). Variational learning of inducing variables in sparse Gaussian processes. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics, PMLR Vol. 5, pp. 567-574.
Trangucci (2017). Hierarchical Gaussian processes in Stan.
Verbeke and Molenberghs (2000). Linear mixed models for longitudinal data. Springer, New York, NY.
Wang et al. (2021). Scalable Gaussian processes for data-driven design using big data with categorical factors. arXiv:2106.15356.
Wilson and Nickisch (2015). Kernel interpolation for scalable structured Gaussian processes (KISS-GP). arXiv:1503.01057.
Zhang et al. (2020). A latent variable approach to Gaussian process modeling with qualitative and quantitative factors. Technometrics 62(3), pp. 291-302.
Zhang and Notz (2015). Computer experiments with qualitative and quantitative variables: a review and reexamination. Quality Engineering 27, pp. 2-13.