Gaussian Processes (GPs) are an elegant Bayesian approach to modeling an unknown function. They provide regression models in which a posterior distribution over the unknown function is maintained as evidence is accumulated. This allows Gaussian processes to learn complex functions when a large amount of evidence is available and makes them robust against overfitting when evidence is scarce Rasmussen2010 ; Rasmussen2006 . A GP can model a large class of phenomena through the choice of its kernel, which characterizes one’s assumption on how the unknown function autocovaries. The choice of kernel is therefore a core aspect of GP design, since the posterior distribution can vary significantly for different kernels. As a consequence, various kernels (e.g. Squared Exponential, Periodic, Matérn) and kernel design methods have been proposed Rasmussen2006 .
Notably, Wilson and Adams Wilson2013 introduced a flexible kernel called Spectral Mixture (SM) by modeling its power spectral density with a sum of Gaussians. SM kernels can be represented by a sum of spectral mixtures and can be derived from Bochner’s theorem as the inverse Fourier transform of their corresponding spectral density function. SM kernels for GPs have been shown to be effective at discovering latent patterns in data and at extrapolating Remes2017 ; Wilson2013 ; Wilson2014a . SM kernels have been successfully employed in various applications, such as medical time series prediction Cheng2017 , arctic coastal erosion forecasting Kupilik2017 , and urban environmental monitoring in sensor networks Liu2016 .
SM kernels cannot capture time- and phase-related cross correlations between spectral mixtures because they only use the auto-convolution of simple base spectral mixtures obtained from a standard Gaussian density function Wilson2013 . Therefore, although elegant and often successful, this approach is not fully representative of real-life phenomena in which time- and phase-related correlations and dependencies between spectral mixtures occur. For instance, the monthly river flow in estuaries shows time- and phase-related patterns that are affected by the gravity and resonances of the moon and sun Hipel1994 , such as short term monthly variations, medium term seasonal patterns and non-strictly periodic long term trends related to the position of the moon and sun. Naturally, these patterns are mutually influenced and correlated, and hence cannot be faithfully modeled using SM.
In this paper, we extend SM kernels to include time and phase delayed mutual dependencies. First, we design a complex valued Gaussian spectral density that incorporates time delay and phase delay in the frequency domain and transform it to the time domain through the Fourier transform. Second, using cross-convolution between a base mixture and the complex conjugate of another base mixture, we construct a complex-valued, positive definite kernel representing more involved correlations between spectral mixtures. Finally, we construct the time and phase dependent Generalized Convolution Spectral Mixture (GCSM) kernel, which expresses richer dependencies and offers stronger interpretability.
Specifically, we address the following questions. (1) How can we design a complex valued spectral density that incorporates time and phase delay? (2) How can we decompose the complex valued spectral density? (3) How can we construct cross spectral mixtures with time and phase delay? (4) How can we build a valid time and phase dependent generalized spectral mixture kernel which satisfies the positive definite condition? (5) What is the relation between the extended GCSM and SM kernels, and how do they perform on real-life data with time and phase delay? In our setting, SM becomes a special case of time and phase dependent GCSM without time delay, phase delay and cross spectral mixtures (that is, by only considering auto-convolution of base mixtures). As a result, GCSM has more base components than SM, since it adds cross components between base mixtures to the auto-convolution components that SM uses.
We comparatively assess the performance of time and phase dependent GCSM kernels through extensive experiments on synthetic and real-life data. Results show the beneficial contribution of the proposed approach. This paper is a substantial extension of a paper under review for a conference. That submission presents a convolution-based way to model the correlation of spectral mixtures which does not consider time and phase delay, so that GCSM and SM have the same hyper-parameter space.
The remainder of this paper is organized as follows. Background on GPs and related work is given in Section 2. Section 3 introduces time and phase dependent GCSM. Section 4 then presents the differences between GCSM and other kernels. Sections 5 and 6 describe hyper-parameter initialization and experiments on synthetic and real world datasets, respectively. Concluding remarks and future work are given in Section 7.
In this section, we describe Gaussian processes, spectral mixture kernels, and related work.
2.1 Gaussian Process
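A Gaussian process defines a distribution over functions, specified by its mean and covariance function Rasmussen2006 . The mean function $m(\mathbf{x})$ and covariance function $k(\mathbf{x}, \mathbf{x}')$ can be written as

$$m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})] \tag{1}$$

$$k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\big[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))\big] \tag{2}$$

where $\mathbf{x}$ is an arbitrary input variable in $\mathbb{R}^P$. The covariance function $k(\mathbf{x}, \mathbf{x}')$, mapping a pair of inputs into $\mathbb{R}$, is applied to construct a positive definite covariance matrix, here denoted by $K$. Given $m(\mathbf{x})$ and $k(\mathbf{x}, \mathbf{x}')$, we can define a GP as

$$f(\mathbf{x}) \sim \mathcal{GP}\big(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')\big) \tag{3}$$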
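Without loss of generality we assume the mean of a GP to be zero. By placing a GP prior over functions through the choice of kernel and parameter initialization, and given the training data, we can predict the unknown value $\bar{f}_*$ and its variance $\mathbb{V}[f_*]$ (that is, its uncertainty) for a test point $\mathbf{x}_*$ using the following key predictive equations for GP regression Rasmussen2006 :

$$\bar{f}_* = \mathbf{k}_*^{\top} (K + \sigma_n^2 I)^{-1} \mathbf{y} \tag{4}$$

$$\mathbb{V}[f_*] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^{\top} (K + \sigma_n^2 I)^{-1} \mathbf{k}_* \tag{5}$$

where $\mathbf{k}_*$ is the vector of covariances between the test point $\mathbf{x}_*$ and the $n$ training inputs, $\mathbf{x}$ is a training vector and $\mathbf{y}$ contains the observed values corresponding to the training vectors. Typically, GPs contain free parameters, called hyper-parameters, which can be optimized by minimizing the Negative Log Marginal Likelihood (NLML). The NLML is defined as follows:

$$\mathrm{NLML} = -\log p(\mathbf{y} \mid \mathbf{x}, \Theta) = \frac{1}{2} \mathbf{y}^{\top} (K + \sigma_n^2 I)^{-1} \mathbf{y} + \frac{1}{2} \log \left| K + \sigma_n^2 I \right| + \frac{n}{2} \log 2\pi \tag{6}$$

where $\Theta$ are the hyper-parameters of the kernel function and $\sigma_n^2$ is the noise level. The NLML above directly follows from the observation that $\mathbf{y} \sim \mathcal{N}(\mathbf{0}, K + \sigma_n^2 I)$.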
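The predictive equations and the NLML from Rasmussen2006 can be sketched numerically as follows. This is a minimal NumPy illustration rather than the implementation used in the experiments: the function names (`se_kernel`, `gp_predict`, `nlml`) and the squared exponential kernel used as a stand-in are assumptions made for the example.

```python
import numpy as np


def se_kernel(a, b, variance=1.0, lengthscale=1.0):
    """Squared exponential kernel matrix between 1-D input arrays a and b."""
    diff = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_predict(x_train, y_train, x_test, kernel, noise_var):
    """Posterior mean and variance of a zero-mean GP at the test inputs."""
    K = kernel(x_train, x_train) + noise_var * np.eye(len(x_train))
    K_star = kernel(x_train, x_test)
    # A Cholesky factorisation avoids forming the explicit inverse of K.
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star.T @ alpha
    v = np.linalg.solve(L, K_star)
    var = np.diag(kernel(x_test, x_test)) - np.sum(v ** 2, axis=0)
    return mean, var


def nlml(x_train, y_train, kernel, noise_var):
    """Negative log marginal likelihood of a zero-mean GP."""
    n = len(x_train)
    K = kernel(x_train, x_train) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    data_fit = 0.5 * y_train @ alpha
    complexity = np.sum(np.log(np.diag(L)))  # equals 0.5 * log det(K)
    return data_fit + complexity + 0.5 * n * np.log(2.0 * np.pi)
```

Any stationary kernel with the same `(a, b)` call signature, including a spectral mixture kernel, could be passed in place of `se_kernel`.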
2.2 Spectral Mixture Kernels
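Usually, the smoothness and generalization properties of GPs depend on the kernel function and its hyper-parameters $\Theta$. Choosing an appropriate kernel function and its initial hyper-parameters based on prior knowledge of the data is a core step in GP modeling. Various kernel functions have been proposed Rasmussen2006 , such as Squared Exponential (SE), Periodic (PER), and general Matérn (MA):

$$k_{\mathrm{SE}}(\tau) = \theta_f^2 \exp\left(-\frac{\tau^2}{2\ell^2}\right) \tag{7}$$

$$k_{\mathrm{PER}}(\tau) = \theta_f^2 \exp\left(-\frac{2\sin^2(\pi\tau/p)}{\ell^2}\right) \tag{8}$$

$$k_{\mathrm{MA}}(\tau) = \theta_f^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,|\tau|}{\ell}\right)^{\nu} K_{\nu}\!\left(\frac{\sqrt{2\nu}\,|\tau|}{\ell}\right) \tag{9}$$

where $\tau = x - x'$, and $p$, $\theta_f$ and $\ell$ are the period, y-scaling and x-scaling hyper-parameters, respectively; $\nu$ is the Matérn smoothness parameter and $K_{\nu}$ is the modified Bessel function of the second kind.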
Recently, new covariance kernels called Spectral Mixture (SM) kernels have been proposed in Wilson2014 ; Wilson2013 . An SM kernel, here denoted by $k_{\mathrm{SM}}$, is derived by modeling a spectral density (the Fourier transform of a kernel) with a mixture of Gaussians. A desirable property of SM kernels is that they can be used to reconstruct other popular standard covariance kernels. According to Bochner’s Theorem Bochner2016 , the properties of a stationary kernel depend entirely on its spectral density. With enough components, $k_{\mathrm{SM}}$ can approximate any stationary covariance kernel Wilson2013 .

$$k_{\mathrm{SM}}(\tau) = \sum_{q=1}^{Q} w_q \prod_{p=1}^{P} \exp\left(-2\pi^2 \tau_p^2 v_q^{(p)}\right) \cos\left(2\pi \tau_p \mu_q^{(p)}\right) \tag{10}$$

where $Q$ is the number of components, $P$ is the dimension of the dataset, and $w_q$, $\boldsymbol{\mu}_q = (\mu_q^{(1)}, \dots, \mu_q^{(P)})$ and $\mathbf{v}_q = (v_q^{(1)}, \dots, v_q^{(P)})$ are the weight, mean, and variance of the $q$-th mixture component in the frequency domain, respectively. The variance $\mathbf{v}_q$ can be thought of as an inverse length-scale, $\boldsymbol{\mu}_q$ as a frequency, and $w_q$ as a contribution.
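As an illustration of how the weight, mean and variance of each component enter the SM kernel, the sketch below evaluates the one-dimensional form given in Wilson2013, $k(\tau) = \sum_q w_q \exp(-2\pi^2 \tau^2 v_q) \cos(2\pi \tau \mu_q)$. The function name `sm_kernel` and the parameter values in the usage note are illustrative assumptions.

```python
import numpy as np


def sm_kernel(tau, weights, means, variances):
    """One-dimensional spectral mixture kernel evaluated at lags tau.

    weights, means and variances hold one entry per mixture component;
    means are frequencies and variances are frequency-domain variances.
    """
    tau = np.asarray(tau, dtype=float)[..., None]  # trailing axis for components
    w = np.asarray(weights, dtype=float)
    mu = np.asarray(means, dtype=float)
    v = np.asarray(variances, dtype=float)
    envelope = np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v)  # decay set by the variance
    oscillation = np.cos(2.0 * np.pi * tau * mu)         # period set by the mean
    return np.sum(w * envelope * oscillation, axis=-1)
```

For example, `sm_kernel(x[:, None] - x[None, :], [1.0, 2.0], [0.5, 1.0], [0.1, 0.2])` builds the Gram matrix for a 1-D input array `x`; its diagonal equals the sum of the weights.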
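Bochner’s Theorem Bochner2016 ; Stein indicates how to construct a valid kernel from the frequency domain. It implies that such kernels can be transformed between the time domain and the frequency domain. Using the following definition, the spectral density $\hat{k}(\mathbf{s})$ of a kernel function $k(\tau)$ is given by its Fourier transform:

$$\hat{k}(\mathbf{s}) = \int k(\tau)\, e^{-2\pi i\, \mathbf{s}^{\top} \tau}\, d\tau \tag{11}$$

Furthermore, the inverse Fourier transform of the spectral density $\hat{k}(\mathbf{s})$ is the original kernel function $k(\tau)$:

$$k(\tau) = \int \hat{k}(\mathbf{s})\, e^{2\pi i\, \mathbf{s}^{\top} \tau}\, d\mathbf{s} \tag{12}$$

where $i$ is the imaginary unit. We use a hat to denote the spectral density of a covariance function in the frequency domain. By Bochner’s theorem Bochner2016 ; Stein , $k(\tau)$ and $\hat{k}(\mathbf{s})$ are Fourier duals of each other. For the SM kernel Wilson2014 , applying the inverse Fourier transform to the spectral density $\hat{k}_{\mathrm{SM}}(\mathbf{s}) = \sum_{q=1}^{Q} w_q\, \tfrac{1}{2}\big[\varphi_q(\mathbf{s}) + \varphi_q(-\mathbf{s})\big]$ with $\varphi_q(\mathbf{s}) = \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_q, \mathrm{diag}(\mathbf{v}_q))$, which is a symmetrized scale-location mixture of Gaussians in the frequency domain, we have

$$k_{\mathrm{SM}}(\tau) = \int \hat{k}_{\mathrm{SM}}(\mathbf{s})\, e^{2\pi i\, \mathbf{s}^{\top} \tau}\, d\mathbf{s} = \sum_{q=1}^{Q} w_q \prod_{p=1}^{P} \exp\left(-2\pi^2 \tau_p^2 v_q^{(p)}\right) \cos\left(2\pi \tau_p \mu_q^{(p)}\right) \tag{13}$$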
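The Fourier duality between a symmetrized Gaussian spectral density and a single SM component can be checked numerically. The sketch below is an illustration in one dimension under the $e^{2\pi i s \tau}$ convention; the function names, grid width and grid size are assumptions chosen for the example, and the integral is approximated by a plain Riemann sum.

```python
import numpy as np


def gaussian_pdf(s, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (s - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def symmetrized_density(s, mean, var):
    """Spectral density that is symmetric around zero frequency."""
    return 0.5 * (gaussian_pdf(s, mean, var) + gaussian_pdf(s, -mean, var))


def kernel_from_density(tau, mean, var, half_width=20.0, n_grid=20001):
    """Inverse Fourier transform of the density, by a Riemann sum over s."""
    s = np.linspace(-half_width, half_width, n_grid)
    ds = s[1] - s[0]
    integrand = symmetrized_density(s, mean, var) * np.exp(2j * np.pi * s * tau)
    return np.sum(integrand) * ds


def sm_component(tau, mean, var):
    """Closed-form SM component with unit weight."""
    return np.exp(-2.0 * np.pi ** 2 * tau ** 2 * var) * np.cos(2.0 * np.pi * tau * mean)
```

Because the density is symmetric, the imaginary part of the numerical integral vanishes up to round-off and the real part matches `sm_component`.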
2.3 Related Work
Since the introduction of SM kernels Wilson2013 ; Wilson2014a , various useful variations have been introduced Wilson2014a ; Duvenaud2013 ; Flaxman2015 ; Oliva2016 ; Jang2017 . For instance, the spectral mixture product kernel (SMP) Wilson2014a , which uses multi-dimensional SM kernels, extends the application scope of SM kernels to image data and spatio-temporal data, and is able to discover patterns in large multidimensional datasets. More recently, non-stationary spectral kernels modeling input-dependent Gaussian process frequency density surfaces have been introduced in Remes2017 . Ulrich2015 adds a channel level dependency related to phase in the context of multi-output problems. A limited time and phase shift for multi-output GPs is also proposed in Parra2017 . These approaches do not capture dependencies when used in a single task setting. This is a main difference from GCSM, which models dependencies and time-phase delay in a single task setting.
3 Time and Phase Dependent Generalized Convolution SM Kernels for GPs
We can now address the first four questions mentioned in the introduction.
3.1 Designing Complex Valued Spectral Density with Time Delay and Phase Delay
In an ordinary SM kernel we have and
where is a -dimensional spectral density vector. The spectral density of the SM kernel in the frequency domain is just a standard multivariate Gaussian function with amplitude , which ignores time and phase delay. In order to increase the flexibility and expressiveness of SM kernels, we propose to incorporate time and phase delay. First, from a signal processing perspective Bateman1954 , the Fourier transform of ( is the time delay) in the frequency domain is
( is the imaginary unit). Second, the Fourier transform of phase delay ( is the phase delay) in the frequency domain Bateman1954 is
As is well known, Fourier transforms and products of Gaussian functions are again Gaussian functions. Based on the Fourier transforms of time and phase delay between the time domain and the frequency domain, we can extend the SM spectral density function to include time and phase delay simultaneously as follows:
We call this complex valued density function the time and phase delayed GCSM spectral density function.
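The effect of a time delay in the frequency domain (the Fourier shift theorem) can be illustrated with the discrete Fourier transform: delaying a signal multiplies its spectrum by a complex exponential that is linear in frequency, while a phase delay multiplies it by a constant complex factor. The sketch below is a minimal check under these standard properties; the function names and the sign convention $e^{-i\phi}$ for the phase factor are assumptions for illustration, not the exact parameterization of the GCSM density.

```python
import numpy as np


def delayed_spectrum(signal, delay_samples):
    """Spectrum of a circularly delayed signal via the shift theorem."""
    n = len(signal)
    freqs = np.fft.fftfreq(n)  # frequencies in cycles per sample
    return np.fft.fft(signal) * np.exp(-2j * np.pi * freqs * delay_samples)


def apply_phase(spectrum, phase):
    """Apply a constant phase delay to a spectrum (assumed sign: exp(-i*phase))."""
    return spectrum * np.exp(-1j * phase)
```

Both operations leave the magnitude of the spectrum unchanged, which is why a density carrying time and phase delay has to be complex valued.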
3.2 Convolution of Time Delayed and Phase Delayed Spectral Mixtures
Here we present a convolution-based way to decompose the complex valued spectral density into base spectral densities. Generally, any covariance function can be represented in convolution form on Gaspari1999 :
From the above equation, we obtain the symmetry of the covariance function , where denotes the convolution operator and is the basis kernel function. Applying a Fourier transform to the general convolution form of the covariance function, we obtain:
Let , we have
for our GCSM kernel, which can be seen as the basis function of each spectral density component . On the other hand, convolution of covariance functions is also a valid covariance function. Inspired by this, we introduce the cross-spectral density of GCSM mixtures to model time-phase correlated and mixture-dependent components. Since the cross spectral density function should satisfy the positive definite condition, we construct the positive definite cross spectral density function as .
where is a symmetric, positive definite -by- covariance matrix in the frequency domain, is a -dimensional time delay vector, and is a -dimensional phase delay vector. Unlike the multidimensional ordinary SM, here is not necessarily diagonal. is positive definite because for any non-zero vector ,
where the overline denotes complex conjugate.
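The mechanism behind this argument, pairing one base density with the complex conjugate of another, can be illustrated at a single frequency. If the complex values of the base densities at that frequency are collected in a vector $g$, the matrix with entries $g_i \overline{g_j}$ is Hermitian and its quadratic form is a squared modulus, hence non-negative. The sketch below only demonstrates this general property (positive semi-definiteness of the rank-one matrix); the function names are assumptions and the exact expression used in the paper is not reproduced here.

```python
import numpy as np


def cross_density_matrix(base_values):
    """Matrix of products g_i * conj(g_j) for complex base density values."""
    g = np.asarray(base_values, dtype=complex)
    return np.outer(g, np.conj(g))


def quadratic_form(coeffs, matrix):
    """Hermitian quadratic form c^H M c."""
    c = np.asarray(coeffs, dtype=complex)
    return np.vdot(c, matrix @ c)  # vdot conjugates its first argument
```

For any coefficient vector `c`, `quadratic_form(c, cross_density_matrix(g))` equals `abs(np.vdot(c, g)) ** 2`, so it is real and never negative.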
3.3 Time and Phase Dependent Spectral Mixtures
By using Fourier transforms on convolution of time delayed and phase delayed spectral mixtures and by using inverse Fourier transforms on its cross spectral densities, the time and phase dependent generalized convolution spectral mixture can be defined as follows:
As mentioned before, like the symmetric properties of SM, GCSM’s spectral density must also be symmetric. So let GCSM’s spectral density . Similarly,
Here and are -dimensional cross time and phase delay vectors between spectral mixture and . When no time delay and phase delay are considered, we have
If we go further and make (where ), the GCSM mixture reduces to the ordinary SM mixture, which assumes that spectral mixtures are independent Wilson2013 .
3.4 Time and Phase Dependent Generalized Convolution Spectral Mixture Kernel
Motivated by SM kernel and its spectral density formulation, if there are components and each component has weight in the original kernel spectral density, then for each base component , and . According to the distributivity of the convolution operator and symmetrical properties of GCSM, we have:
where denotes the number of auto-convolution spectral mixtures in the GCSM kernel. If there is no time and phase delay between different spectral mixtures, the cross components are only based on the convolution of cross base spectral mixtures. In this case GCSM becomes
Furthermore, if we just consider the auto-convolution of each base spectral mixture then the GCSM kernel reduces to Equation (10), that is, the ordinary SM kernel.
4 Comparisons Between Time and Phase Dependent GCSM and Other Kernels
In this section we aim to answer the last question mentioned in the introduction. Figure 1 illustrates the difference between SM and GCSM, where each connection represents a convolution component of the kernel: SM is an auto-convolution spectral mixture kernel that ignores dependencies between spectral mixtures, so it can be considered a special case of GCSM. Table 1 illustrates the difference between GCSM and popular kernels in terms of hyper-parameter space, degrees of freedom, number of components and characteristics. GCSM is more flexible than the other kernels. Even without time and phase delay, GCSM also includes correlations between spectral mixtures (see Equation (28)). The price to pay is that the gradient computation for GCSM is more involved, because the dependencies between base spectral mixture components are considered.
| Kernel | Parameters | Degrees of freedom | Number of components |
Figure 2 shows the auto-convolution spectral mixtures for SM in Equation (10). Figure 3 presents the dependencies between base components for GCSM in Equation (28), without time and phase dependent cross-convolution, together with the time and phase dependent cross-convolution spectral mixtures in the time and frequency domains for GCSM as given in Equation (27). GCSM allows for correlated and mixture-dependent components. Therefore, in order to give a clear illustration of the cross components, we set the parameters of the components in SM relatively close to each other. The plots show the clear presence of cross convolution components and their contribution to the final kernel even without time and phase delay. When the time or phase delay is non-zero, the cross convolution components are shifted and centered at a different position. From a frequency domain perspective, this kind of shift is also reflected in the corresponding spectral densities (see the two rows in Figure 3). From this analysis one can observe that the closer the frequency, scale and weight of the mixtures in SM are, the higher the contribution of the cross convolution components in GCSM.
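The observation that closer components yield larger cross contributions can be made concrete in one dimension, under an assumption that is not spelled out in this text: each base spectral density is taken to be the square root of a Gaussian component, with no time or phase delay and unit weights. The product of two such base densities is then an unnormalized Gaussian whose total mass is the Bhattacharyya coefficient of the two components. The function names below are hypothetical and the sketch ignores the effect of the weights.

```python
import numpy as np


def cross_overlap(mu_i, var_i, mu_j, var_j):
    """Integral over s of sqrt(N(s; mu_i, var_i) * N(s; mu_j, var_j))."""
    total = var_i + var_j
    scale = np.sqrt(2.0 * np.sqrt(var_i * var_j) / total)
    return scale * np.exp(-0.25 * (mu_i - mu_j) ** 2 / total)


def cross_mean_and_variance(mu_i, var_i, mu_j, var_j):
    """Mean and variance of the Gaussian-shaped product of the two square roots."""
    total = var_i + var_j
    mean = (mu_i * var_j + mu_j * var_i) / total
    var = 2.0 * var_i * var_j / total
    return mean, var
```

Identical components give an overlap of 1, matching the auto-convolution case, and the overlap decays as the frequencies or scales move apart.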
The diagonal values of the kernel matrix in SM do not depend on the frequency and scale hyper-parameters. For GCSM without time and phase dependency, however, the diagonal values are affected by these hyper-parameters and must be positive. In time and phase dependent GCSM, the diagonal values are affected by all hyper-parameters, including time and phase delay. In particular, the cosine term in the kernel carries much of the information about the diagonal values and can determine their sign, which means that the diagonal values of cross components can be negative, depending on the time and phase delay. Compared with the assumption of independent spectral mixtures in ordinary SM Wilson2013 and the positive dependencies of cross spectral mixtures in GCSM without time and phase delay, the negative dependency between spectral mixtures modeled by time and phase dependent GCSM is a substantial improvement and extends the application range of spectral kernels. Experimentally, we verify the negative dependencies on the monthly river flow dataset.
5 Hyperparameters Initialization
Both SM and GCSM are sensitive to the initial values of their hyper-parameters, which may affect the capability of GP kernels to discover and extrapolate patterns. An initialization strategy using the empirical spectral density can be used to find a good initialization Wilson2013 . However, the empirical spectral density is often noisy, so it cannot be used directly. Past research indicated that the sharp peaks of the empirical spectral density are near the true frequencies Wilson2013 . Inspired by this observation, we apply Gaussian mixture analysis to the empirical spectral density in order to identify the cluster centers of the Gaussian spectral density. Based on this Gaussian mixture analysis of the spectral density, the initial hyper-parameters can be configured. We use this initialization strategy in the experiment described in Section 6.2.
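The first step of this strategy, computing an empirical spectral density and locating its sharp peaks, can be sketched as follows. Note that the example swaps in simple local-maximum picking where the text uses Gaussian mixture analysis, so it only illustrates why the peaks are useful starting values for the component frequencies. The function names and the assumption of evenly spaced samples are illustrative.

```python
import numpy as np


def empirical_spectral_density(y, dt):
    """Periodogram of an evenly sampled signal with sampling interval dt."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()  # remove the constant offset so it does not dominate
    freqs = np.fft.rfftfreq(len(y), d=dt)
    power = np.abs(np.fft.rfft(y)) ** 2 / len(y)
    return freqs, power


def top_peak_frequencies(freqs, power, n_peaks):
    """Frequencies of the n_peaks strongest local maxima of the periodogram."""
    interior = np.arange(1, len(power) - 1)
    is_peak = (power[interior] > power[interior - 1]) & (power[interior] > power[interior + 1])
    peaks = interior[is_peak]
    strongest = peaks[np.argsort(power[peaks])[::-1][:n_peaks]]
    return np.sort(freqs[strongest])
```

The returned frequencies could serve as initial means of the mixture components; a Gaussian mixture fit to the periodogram would additionally provide initial weights and variances.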
Recently, Bayesian parameter optimization has been shown to be highly beneficial for automatic parameter tuning Knudde2017 ; Snoek2012 . In our setting we use Bayesian optimization to find the minimum of an objective function on a bounded hyper-parameter domain. In this context, the objective function of GCSM accepts hyper-parameters from this domain and returns the negative log likelihood obtained by training. During the optimization, all of the information from previous evaluations is used to choose the next evaluation, rather than just the gradient. Performing Bayesian optimization requires two ingredients: a prior over functions and an acquisition function. In this case we choose a Gaussian process prior to express assumptions about the function being optimized, and the acquisition function determines which hyper-parameters in the domain should be evaluated next. Here we apply Expected Improvement (EI) as the acquisition function Snoek2012 , which reflects the expected improvement over the current best hyper-parameters with regard to the predictive distribution.
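For a Gaussian predictive distribution, Expected Improvement has a closed form. The sketch below shows the minimization variant, which matches an objective that returns a negative log likelihood; the function name and argument names are assumptions for illustration, and this is not the GPflowOpt implementation.

```python
import math


def expected_improvement(mean, std, best):
    """Expected improvement over the current best value, for minimisation.

    mean and std describe the Gaussian predictive distribution at a candidate
    point; best is the lowest objective value observed so far.
    """
    if std <= 0.0:
        # Without predictive uncertainty the improvement is deterministic.
        return max(best - mean, 0.0)
    z = (best - mean) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mean) * cdf + std * pdf
```

The first term rewards candidates whose predicted value is below the current best (exploitation), and the second rewards candidates with large predictive uncertainty (exploration).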
As for the covariance function used in Bayesian optimization, we consider the Matérn 5/2 kernel with ARD disabled, as suggested in Snoek2012 . Hyper-parameter initialization by Bayesian optimization is in general better than using Gaussian mixture analysis of the empirical spectral density.
However, Bayesian optimization is much more computationally expensive. Therefore in our setting we only apply Bayesian optimization to initialize hyper-parameters and of SM and of GCSM, and the hyper-parameters are initialized as , and the hyper-parameters and in GCSM are just randomly initialized. This is done in the first and last of our three experiments, in Subsection 6.1 and 6.3.
We comparatively assess the performance of GCSM on artificial and real world experiments. The artificial experiment is designed to illustrate the ability of GCSM to model the normal, integral, derivative and spectral mixture level time delayed versions of a sampled signal. In the other two experiments we use real-life data. In the second experiment we want to show the capability of GCSM to capture dependencies between base components even when using hyper-parameter values optimized on SM. Finally, in the last experiment we explore the full power of GCSM on a real-life problem. In this case we use Bayesian optimization of hyper-parameters. We use the Mean Absolute Error ($\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$) as the performance metric for all tasks. We implemented our models in TensorFlow Abadi2016c and GPflow Matthews2017 in order to enhance scalability and to facilitate gradient computations.
6.1 Experiment on Synthetic Dataset
The artificial experiment is designed to extrapolate integral, derivative and spectral mixture level time delay of signal sampled from (). The data are generated as follows: 1) generate a normal time series of length 500 in the interval [-10, 10]; 2) numerically compute the first integration and differentiation of the generative signal; 3) add time delay into each mixture to form the final spectral mixture level time delayed signals. Both GCSM and SM are configured with the same for all experiments and with the same initial values for the hyper-parameters . The other parameters of , and , are initialized randomly.
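The first two generation steps can be sketched as follows. Since the generating distribution is not recoverable from this text, the example samples from a zero-mean GP with an SM kernel whose parameters are hypothetical; the trapezoidal rule and `np.gradient` stand in for the numerical integration and differentiation. Step 3 (adding a time delay to each mixture) is not covered.

```python
import numpy as np


def sm_kernel_matrix(x, weights, means, variances):
    """Gram matrix of a 1-D spectral mixture kernel on the inputs x."""
    tau = x[:, None] - x[None, :]
    K = np.zeros_like(tau)
    for w, mu, v in zip(weights, means, variances):
        K += w * np.exp(-2.0 * np.pi ** 2 * tau ** 2 * v) * np.cos(2.0 * np.pi * tau * mu)
    return K


def make_synthetic_signals(n=500, lo=-10.0, hi=10.0, seed=0):
    """Sample a signal and compute its running integral and derivative."""
    rng = np.random.default_rng(seed)
    x = np.linspace(lo, hi, n)
    # Hypothetical kernel parameters, chosen only for illustration.
    K = sm_kernel_matrix(x, weights=[1.0, 0.5], means=[0.2, 0.8], variances=[0.01, 0.02])
    # Sample via an eigendecomposition, clipping round-off negative eigenvalues.
    vals, vecs = np.linalg.eigh(K)
    signal = vecs @ (np.sqrt(np.clip(vals, 0.0, None)) * rng.standard_normal(n))
    # Step 2: cumulative trapezoidal integral and finite-difference derivative.
    steps = 0.5 * (signal[1:] + signal[:-1]) * np.diff(x)
    integral = np.concatenate([[0.0], np.cumsum(steps)])
    derivative = np.gradient(signal, x)
    return x, signal, integral, derivative
```

Training and test sets for each setting can then be selected by masking `x` with the relevant intervals.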
For the normal signal, we randomly choose half of the data as training data and the rest as test data. The integration signals in the interval [-10, 0] are used for training (in cyan) and the remaining signals in the interval [0, 10] are used for testing (in yellow). Analogously, the differentiation signals in the interval [0, 10] are used for training and the rest for testing. Finally, for the spectral mixture level time delayed signal, the interval [-5, 5] is selected as test data and the rest as training data. We consider the four settings described in Figure 4.
Results can be summarized as follows. For the normal signal (a), the difference in performance is negligible: both GCSM and SM learned the covariance well. For the integration of the signal shown in Figure 3 (b), whose inherent pattern is more difficult to recognize and extrapolate, GCSM performs better than SM with respect to both MAE and confidence interval. Similarly, on the differentiation of the signal shown in Figure 3 (c), GCSM exhibits better pattern learning and extrapolation ability. For the spectral mixture level time delayed signal shown in Figure 3 (d), GCSM yields better predictions and smoother confidence intervals.
Overall, this experiment indicates that GCSM can correctly capture the integration, differentiation and spectral mixture level time delay patterns of the generative signal without any prior information, and that it achieves the lowest MAE (see Table 2).
6.2 Airline Passengers Experiment
We compare the performance of GCSM and other GP kernels on a real-life extrapolation task: the airline passenger numbers recorded monthly from 1949 to 1961 Hyndman2010 . The airline passenger numbers dataset is a popular experiment which shows the advantage and flexibility of GPs because there are multiple patterns in the data, such as long term, medium term, seasonal and short term trends.
For GCSM, we fix the time and phase delay as 0, which means that only convolution is used, so GCSM and SM have the same parameter space (see Equations (28) and (10)). In all experiments we used . Furthermore, Gaussian mixtures of empirical spectral density are considered to initialize the hyper-parameters. Hyperparameters values are optimized on SM by randomly initializing 10 times and then optimizing them 1000 times using SM. These values are also used as initialization for GCSM. Since in this experimental setting GCSM and SM use the same hyper-parameters values, optimized on SM, we can directly compare the trained kernels and assess whether GCSM captures dependencies between base components.
Figure 4 (a) shows that in extrapolating the number of airline passengers GCSM (blue dashed line) is better than ordinary SM (red solid line), although the improvement is relatively small. Figures (b) and (c) show the trained kernel matrices of the two kernels, and Figure (d) shows their relative difference. The difference is clearly visible, indicating that the role of the cross components is strengthened.
6.3 Monthly River Flow Experiment
In order to show the full capability of GCSM in a real-life scenario for modeling correlation, time and phase delay between spectral mixtures, we consider the so-called monthly river flow dataset, which contains involved inherent time and phase patterns. The mean monthly river flow in Piper’s Hole River is the average flow from 1953 to 1981 Hipel1994 . The river drains into the head of Placentia Bay. Researchers wish to forecast the long range trend in order to monitor the evolution of the environment, which can guide future human activities. Interestingly, these flow recordings show time- and phase-related patterns and their variability over the period of recording. Especially in estuaries, the moon and sun are primarily responsible for the rising and falling of river tidal flows, which are delayed and augmented by their gravity and resonances. Empirical analysis shows various characteristics of this flow data: short term monthly variations, medium term seasonal patterns, non-strictly periodic long term trends related to the position of the moon and sun, and some white noise.
Hyper-parameters are initialized through Bayesian optimization for both GCSM and SM, while the time and phase delays are randomly initialized. As seen in Figure 5 (a), patterns in monthly river flow are more complicated than those of the previously considered benchmark datasets. Results indicate that both GCSM and SM can extrapolate the future monthly river flow well, with GCSM achieving better performance. There are multiple short term, medium term and long term trends containing time and phase delay in the monthly river flow time series, because the timing of flow peaks is not periodic and their amplitude is irregular. With the same number of components, GCSM is clearly more effective in modeling complex patterns in the data.
Another interesting characteristic of this experiment is that the max values (diagonal elements) of the trained kernel matrix are bigger than , so is negative. This shows that the negative cross components in (see Equation (31)) also contribute to the model. Their contribution is substantial, since the difference is clearly visible. Overall, this experiment clearly shows that the spectral mixtures are mutually dependent.
We introduced time and phase dependent generalized convolution spectral mixture (GCSM) kernels, an extension of SM kernels capable of extrapolating complicated correlations across base spectral mixtures via cross convolution of complex valued spectral densities incorporating time and phase delay in the frequency domain. GCSM generalizes ordinary SM by removing the assumption that spectral mixture components are independent. The cross spectral density constructed from a basis mixture and the conjugate of another basis mixture guarantees that the proposed kernel is positive definite. The time and phase dependent GCSM kernel decomposition of each SM component in the frequency domain provides a way to discover the mutually dependent correlations between spectral mixtures.
Experiments on artificial and real-life datasets indicated that the proposed kernels for GPs can discover time and phase delay between spectral mixtures through convolution, can identify and model complex structure in the data, and can forecast long-term trends.
There are two main issues which remain to be addressed in future work. The first is the initialization of the time and phase delay parameters. Here we just used random initialization; more tailored, effective methods remain to be investigated. The second issue, common to all GP methods, is the problem of sparse or efficient inference Quinonero-Candela2005 ; Snelson2006 ; Wilson2015 ; Gardner2018 , which also needs to be improved for GPs with GCSM kernels. Lévy process priors as proposed in Jang2017 present a promising approach for tackling this problem, by regularizing spectral mixtures for automatic selection of the number of components and pruning of unnecessary components.
This work was partly supported by China Scholarship Council (CSC).
-  Carl Edward Rasmussen and Hannes Nickisch. Gaussian processes for machine learning (GPML) toolbox. Journal of Machine Learning Research, 11(Nov):3011–3015, 2010.
-  Carl Edward Rasmussen and Christopher KI Williams. Gaussian processes for machine learning. MIT Press, 2006.
-  Andrew Wilson and Ryan Adams. Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the 30th International Conference on Machine Learning (ICML-13), pages 1067–1075, 2013.
-  Sami Remes, Markus Heinonen, and Samuel Kaski. Non-stationary spectral kernels. In Advances in Neural Information Processing Systems, pages 4645–4654, 2017.
-  Andrew G Wilson, Elad Gilboa, Arye Nehorai, and John P Cunningham. Fast kernel learning for multidimensional pattern extrapolation. In Advances in Neural Information Processing Systems, pages 3626–3634, 2014.
-  Li-Fang Cheng, Gregory Darnell, Corey Chivers, Michael E Draugelis, Kai Li, and Barbara E Engelhardt. Sparse multi-output Gaussian processes for medical time series prediction. arXiv preprint arXiv:1703.09112, 2017.
-  Matthew Kupilik, Frank Witmer, Euan-Angus MacLeod, Caixia Wang, and Tom Ravens. Gaussian process regression for arctic coastal erosion forecasting. arXiv preprint arXiv:1712.00867, 2017.
-  Xiuming Liu, Teng Xi, and Edith Ngai. Data modelling with Gaussian process in sensor networks for urban environmental monitoring. In Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), 2016 IEEE 24th International Symposium on, pages 457–462. IEEE, 2016.
-  Keith W Hipel and A Ian McLeod. Time series modelling of water resources and environmental systems, volume 45. Elsevier, 1994.
-  Andrew Gordon Wilson. Covariance kernels for fast automatic pattern discovery and extrapolation with Gaussian processes. University of Cambridge, 2014.
-  Salomon Bochner. Lectures on Fourier Integrals.(AM-42), volume 42. Princeton University Press, 2016.
-  ML Stein. Interpolation of spatial data: some theory for kriging. 1999.
-  David Duvenaud, James Robert Lloyd, Roger Grosse, Joshua B Tenenbaum, and Zoubin Ghahramani. Structure discovery in nonparametric regression through compositional kernel search. arXiv preprint arXiv:1302.4922, 2013.
-  Seth Flaxman, Andrew Wilson, Daniel Neill, Hannes Nickisch, and Alex Smola. Fast kronecker inference in Gaussian processes with non-Gaussian likelihoods. In International Conference on Machine Learning, pages 607–616, 2015.
-  Junier B Oliva, Avinava Dubey, Andrew G Wilson, Barnabás Póczos, Jeff Schneider, and Eric P Xing. Bayesian nonparametric kernel-learning. In Artificial Intelligence and Statistics, pages 1078–1086, 2016.
-  Phillip A Jang, Andrew Loeb, Matthew Davidow, and Andrew G Wilson. Scalable Levy process priors for spectral kernel learning. In Advances in Neural Information Processing Systems, pages 3943–3952, 2017.
-  Kyle R Ulrich, David E Carlson, Kafui Dzirasa, and Lawrence Carin. GP kernels for cross-spectrum analysis. In Advances in neural information processing systems, pages 1999–2007, 2015.
-  Gabriel Parra and Felipe Tobar. Spectral mixture kernels for multi-output Gaussian processes. In Advances in Neural Information Processing Systems, pages 6684–6693, 2017.
-  Harry Bateman. Tables of integral transforms [volumes I & II], volume 1. McGraw-Hill Book Company, 1954.
-  Gregory Gaspari and Stephen E Cohn. Construction of correlation functions in two and three dimensions. Quarterly Journal of the Royal Meteorological Society, 125(554):723–757, 1999.
-  Nicolas Knudde, Joachim van der Herten, Tom Dhaene, and Ivo Couckuyt. GPflowOpt: A Bayesian optimization library using tensorflow. arXiv preprint arXiv:1711.03845, 2017.
-  Jasper Snoek, Hugo Larochelle, and Ryan P Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in neural information processing systems, pages 2951–2959, 2012.
-  Martín Abadi, Paul Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, Jeffrey Dean, Matthieu Devin, Sanjay Ghemawat, Geoffrey Irving, Michael Isard, et al. TensorFlow: A system for large-scale machine learning. arXiv preprint arXiv:1605.08695, 2016.
-  Alexander G de G Matthews, Mark van der Wilk, Tom Nickson, Keisuke Fujii, Alexis Boukouvalas, Pablo León-Villagrá, Zoubin Ghahramani, and James Hensman. GPflow: A Gaussian process library using tensorflow. Journal of Machine Learning Research, 18(40):1–6, 2017.
-  Rob J Hyndman and M Akram. Time series data library. Available from Internet: http://robjhyndman.com/TSDL, 2010.
-  Joaquin Quiñonero-Candela and Carl Edward Rasmussen. A unifying view of sparse approximate Gaussian process regression. Journal of Machine Learning Research, 6(Dec):1939–1959, 2005.
-  Edward Snelson and Zoubin Ghahramani. Sparse Gaussian processes using pseudo-inputs. In Advances in neural information processing systems, pages 1257–1264, 2006.
-  Andrew Gordon Wilson and Hannes Nickisch. Kernel interpolation for scalable structured Gaussian processes (KISS-GP). In Francis R. Bach and David M. Blei, editors, Proceedings of the 32nd International Conference on Machine Learning, ICML 2015, Lille, France, 6-11 July 2015, volume 37 of JMLR Workshop and Conference Proceedings, pages 1775–1784. JMLR.org, 2015.
-  Jacob R Gardner, Geoff Pleiss, Ruihan Wu, Kilian Q Weinberger, and Andrew Gordon Wilson. Product kernel interpolation for scalable Gaussian processes. arXiv preprint arXiv:1802.08903, 2018.