Spectral Mixture Kernels with Time and Phase Delay Dependencies

08/01/2018
by   Kai Chen, et al.
Radboud Universiteit

Spectral Mixture (SM) kernels form a powerful class of kernels for Gaussian processes, capable of discovering patterns, extrapolating, and modeling negative co-variances. In SM kernels, spectral mixture components are linearly combined to construct a final flexible kernel. As a consequence, SM kernels do not explicitly model correlations between components or dependencies related to time and phase delays between components, because only the auto-convolution of base components is used. To address these drawbacks we introduce Generalized Convolution Spectral Mixture (GCSM) kernels. We incorporate time and phase delay into the base spectral mixture and use cross-convolution between a base component and the complex conjugate of another base component to construct a complex-valued and positive definite kernel representing correlations between base components. In this way the total number of components in GCSM becomes quadratic in the number of base components. We perform a thorough comparative experimental analysis of GCSM on synthetic and real-life datasets. Results indicate the beneficial effect of the extra features of GCSM. This is illustrated in the problem of forecasting the long range trend of a river flow to monitor environment evolution, where GCSM is capable of discovering correlated patterns that SM cannot, improving the pattern recognition ability of SM.


1 Introduction

Gaussian Processes (GPs) are an elegant Bayesian approach to model an unknown function. They provide regression models where a posterior distribution over the unknown function is maintained as evidence is accumulated. This allows Gaussian processes to learn complex functions when a large amount of evidence is available, and makes them robust against overfitting in the presence of little evidence Rasmussen2010; Rasmussen2006. A GP can model a large class of phenomena through the choice of its kernel, which characterizes one's assumption on how the unknown function autocovaries. However, the choice of kernel is a core aspect of GP design, since the posterior distribution can vary significantly for different kernels. As a consequence, various kernels (e.g., Squared Exponential, Periodic, Matérn) and kernel design methods have been proposed Rasmussen2006.

Notably, Wilson Wilson2013 introduced a flexible kernel called Spectral Mixture (SM), obtained by modeling its power spectral density as a sum of Gaussians. SM kernels can be represented as a sum of spectral mixtures and can be derived from Bochner's theorem as the inverse Fourier transform of their corresponding spectral density functions. SM kernels for GPs have been shown to be effective at discovering latent patterns in data and at extrapolating Remes2017; Wilson2013; Wilson2014a. SM kernels have been successfully employed in various applications, such as medical time series prediction Cheng2017, arctic coastal erosion forecasting Kupilik2017, and urban environmental monitoring in sensor networks Liu2016.

SM kernels cannot capture cross-correlations between spectral mixtures that involve time and phase, because they only use the auto-convolution of simple base spectral mixtures derived from the standard Gaussian density function Wilson2013. Therefore, although elegant and often successful, this approach is not fully representative of real-life phenomena where time and phase related correlations and dependencies between spectral mixtures occur. For instance, the monthly river flow in estuaries shows time and phase related patterns that are impacted by the gravity and resonances of the moon and sun Hipel1994, such as short term monthly variations, medium term seasonal patterns, and non-strictly-periodic long term trends related to the positions of the moon and sun. Naturally, these patterns are mutually influenced and correlated, and hence cannot be faithfully modeled using SM.

In this paper, we extend SM kernels to include time and phase delayed mutual dependencies. First, we design a complex-valued Gaussian spectral density that incorporates time delay and phase delay in the frequency domain, and transform it to the time domain through the Fourier transform. Second, using cross-convolution between a base mixture and the complex conjugate of another base mixture, we construct a complex-valued and positive definite kernel representing more involved correlations between spectral mixtures. Finally, we construct the time and phase dependent Generalized Convolution Spectral Mixture (GCSM) kernel, which models more expressive dependencies and admits a stronger interpretation.

Specifically, we address the following questions. (1) How can we design a complex-valued spectral density that incorporates time and phase delay? (2) How can we decompose this complex-valued spectral density? (3) How can we construct cross spectral mixtures with time and phase delay? (4) How can we build a valid time and phase dependent generalized spectral mixture kernel that satisfies the positive definiteness condition? (5) What is the relation between the extended GCSM and SM kernels, and how do they perform on real-life data with time and phase delay? In our setting, SM becomes a special case of time and phase dependent GCSM without time delay, phase delay, and cross spectral mixtures (that is, by only considering the auto-convolution of base mixtures). The resulting number of components in GCSM is $Q^2$, while SM has just $Q$ components.

We comparatively assess the performance of time and phase dependent GCSM kernels through extensive experiments on synthetic and real-life data. Results show the beneficial contribution of the proposed approach. This paper is a substantial extension of a paper under review for a conference; in that submission we presented a convolution approach that models the correlations of spectral mixtures without considering time and phase delay, so that GCSM and SM have the same hyper-parameter space.

The remainder of this paper is organized as follows. Background on GPs and related work is given in Section 2. Section 3 introduces time and phase dependent GCSM. Section 4 then discusses the differences between GCSM and other kernels. Sections 5 and 6 describe hyper-parameter initialization and the experiments on synthetic and real-world datasets, respectively. Concluding remarks and future work are given in Section 7.

2 Background

In this section, we briefly describe Gaussian processes, Spectral Mixture kernels, and related work.

2.1 Gaussian Process

A Gaussian process defines a distribution over functions, specified by its mean and covariance function Rasmussen2006. The mean function $m(\mathbf{x})$ and covariance function $k(\mathbf{x}, \mathbf{x}')$ can be written as

m(\mathbf{x}) = \mathbb{E}[f(\mathbf{x})],    (1)

k(\mathbf{x}, \mathbf{x}') = \mathbb{E}\left[(f(\mathbf{x}) - m(\mathbf{x}))(f(\mathbf{x}') - m(\mathbf{x}'))\right],    (2)

where $\mathbf{x}$ is an arbitrary input variable in $\mathbb{R}^P$. The covariance function $k$, mapping two random variables into $\mathbb{R}$, is applied to construct a positive definite covariance matrix, here denoted by $K$. Given $m(\mathbf{x})$ and $k(\mathbf{x}, \mathbf{x}')$, we can define a GP as

f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), k(\mathbf{x}, \mathbf{x}')).    (3)

Without loss of generality we assume the mean of a GP to be zero. By placing a GP prior over functions, through the choice of kernel and parameter initialization, and given the training data, we can predict the unknown value $f_*$ and its variance $\mathbb{V}[f_*]$ (that is, its uncertainty) for a test point $\mathbf{x}_*$ using the following key predictive equations for GP regression Rasmussen2006:

\bar{f}_* = \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{y},    (4)

\mathbb{V}[f_*] = k(\mathbf{x}_*, \mathbf{x}_*) - \mathbf{k}_*^\top (K + \sigma_n^2 I)^{-1} \mathbf{k}_*,    (5)

where $X = [\mathbf{x}_1, \ldots, \mathbf{x}_n]$ is an $n$-dimensional training vector, $\mathbf{y}$ is the vector of ground-truth values corresponding to $X$, and $\mathbf{k}_* = k(X, \mathbf{x}_*)$. Typically, GPs contain free parameters, called hyper-parameters, which can be optimized by minimizing the Negative Log Marginal Likelihood (NLML). The NLML is defined as follows:

\mathrm{NLML} = -\log p(\mathbf{y} \mid X, \Theta) = \tfrac{1}{2}\, \mathbf{y}^\top (K + \sigma_n^2 I)^{-1} \mathbf{y} + \tfrac{1}{2} \log \lvert K + \sigma_n^2 I \rvert + \tfrac{n}{2} \log 2\pi,    (6)

where $\Theta$ collects the hyper-parameters of the kernel function and the noise level $\sigma_n^2$. The NLML above directly follows from the observation that $\mathbf{y} \sim \mathcal{N}(\mathbf{0}, K + \sigma_n^2 I)$.
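For concreteness, Equations (4)-(6) can be implemented in a few lines of NumPy. The following is a minimal sketch (not our TensorFlow/GPflow implementation); `kernel` stands for any covariance function, and the Cholesky-based solves are the standard numerically stable route:

```python
import numpy as np

def gp_predict(X_train, y_train, X_test, kernel, noise_var=1e-2):
    """GP predictive mean and variance, Equations (4)-(5)."""
    K = kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_star = kernel(X_test, X_train)                  # cross-covariances k_*
    L = np.linalg.cholesky(K)                         # stable inversion via Cholesky
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_star @ alpha                             # Eq. (4)
    v = np.linalg.solve(L, K_star.T)
    var = np.diag(kernel(X_test, X_test)) - np.sum(v**2, axis=0)  # Eq. (5)
    return mean, var

def nlml(X_train, y_train, kernel, noise_var=1e-2):
    """Negative log marginal likelihood, Equation (6)."""
    n = len(X_train)
    K = kernel(X_train, X_train) + noise_var * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    return 0.5 * y_train @ alpha + np.log(np.diag(L)).sum() + 0.5 * n * np.log(2 * np.pi)
```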

2.2 Spectral Mixture Kernels

Usually, the smoothness and generalization properties of a GP depend on the kernel function and its hyper-parameters $\Theta$. Choosing an appropriate kernel function and its initial hyper-parameters based on prior knowledge of the data is a core step in GP design. Various kernel functions have been proposed Rasmussen2006, such as the Squared Exponential (SE), Periodic (PER), and general Matérn (MA) kernels:

k_{\mathrm{SE}}(\tau) = \sigma^2 \exp\left(-\frac{\tau^2}{2\ell^2}\right),    (7)

k_{\mathrm{PER}}(\tau) = \sigma^2 \exp\left(-\frac{2\sin^2(\pi \tau / p)}{\ell^2}\right),    (8)

k_{\mathrm{MA}}(\tau) = \sigma^2\, \frac{2^{1-\nu}}{\Gamma(\nu)} \left(\frac{\sqrt{2\nu}\,\tau}{\ell}\right)^{\nu} K_\nu\!\left(\frac{\sqrt{2\nu}\,\tau}{\ell}\right),    (9)

where $\tau = |x - x'|$, and $p$, $\sigma^2$, and $\ell$ are the period, $y$-scaling, and $x$-scaling hyper-parameters, respectively.

Recently, new covariance kernels called Spectral Mixture (SM) kernels have been proposed in Wilson2014; Wilson2013. An SM kernel, here denoted by $k_{\mathrm{SM}}$, is derived by modeling a spectral density (the Fourier transform of a kernel) with a Gaussian mixture. A desirable property of SM kernels is that they can be used to reconstruct other popular standard covariance kernels. According to Bochner's Theorem Bochner2016, the properties of a stationary kernel entirely depend on its spectral density, and with enough components an SM kernel can approximate any stationary covariance kernel Wilson2013:

k_{\mathrm{SM}}(\boldsymbol{\tau}) = \sum_{i=1}^{Q} w_i \prod_{p=1}^{P} \exp\left(-2\pi^2 \tau_p^2 \sigma_i^{(p)2}\right) \cos\left(2\pi \tau_p \mu_i^{(p)}\right),    (10)

where $Q$ is the number of components, $P$ is the dimension of the dataset, and $w_i$, $\boldsymbol{\mu}_i = (\mu_i^{(1)}, \ldots, \mu_i^{(P)})$, and $\boldsymbol{\Sigma}_i = \mathrm{diag}(\sigma_i^{(1)2}, \ldots, \sigma_i^{(P)2})$ are the weight, mean, and variance of the $i$-th mixture component in the frequency domain, respectively. The variance $\sigma_i^2$ can be thought of as an inverse length-scale, $\mu_i$ as a frequency, and $w_i$ as a contribution.
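As an illustration, the one-dimensional case of Equation (10) is straightforward to evaluate; the following NumPy sketch (with parameter names of our choosing) can be plugged directly into the gp_predict and nlml sketch above:

```python
import numpy as np

def sm_kernel(X1, X2, weights, means, scales):
    """SM kernel of Equation (10) for P = 1:
    k(tau) = sum_i w_i exp(-2 pi^2 tau^2 sigma_i^2) cos(2 pi tau mu_i)."""
    tau = np.asarray(X1)[:, None] - np.asarray(X2)[None, :]   # pairwise lags
    K = np.zeros_like(tau, dtype=float)
    for w, mu, sigma in zip(weights, means, scales):
        K += w * np.exp(-2 * np.pi**2 * tau**2 * sigma**2) * np.cos(2 * np.pi * tau * mu)
    return K
```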

Bochner's Theorem Bochner2016; Stein indicates how to construct a valid kernel from the frequency domain. It implies that such kernels can be transformed between the time domain and the frequency domain. Using the following definition, the spectral density $\hat{k}$ of a kernel function $k$ is given by its Fourier transform:

\hat{k}(\mathbf{s}) = \int_{\mathbb{R}^P} k(\boldsymbol{\tau})\, e^{-2\pi i\, \boldsymbol{\tau}^\top \mathbf{s}}\, d\boldsymbol{\tau}.    (11)

Furthermore, the inverse Fourier transform of the spectral density $\hat{k}$ recovers the original kernel function $k$:

k(\boldsymbol{\tau}) = \int_{\mathbb{R}^P} \hat{k}(\mathbf{s})\, e^{2\pi i\, \boldsymbol{\tau}^\top \mathbf{s}}\, d\mathbf{s},    (12)

where $i$ is the imaginary unit. We will use a hat to denote the spectral density of a covariance function in the frequency domain. From Bochner's theorem Bochner2016; Stein, $k$ and $\hat{k}$ are Fourier duals of each other. For the SM kernel Wilson2014, applying the inverse Fourier transform to a spectral density $\hat{k}_{\mathrm{SM}}$ that is a symmetrized scale-location mixture of Gaussians in the frequency domain, $\hat{k}_{\mathrm{SM}}(s) = \sum_{i=1}^{Q} \frac{w_i}{2}\left[\mathcal{N}(s; \mu_i, \sigma_i^2) + \mathcal{N}(-s; \mu_i, \sigma_i^2)\right]$ in one dimension, we have

k_{\mathrm{SM}}(\tau) = \sum_{i=1}^{Q} w_i \exp\left(-2\pi^2 \tau^2 \sigma_i^2\right) \cos\left(2\pi \tau \mu_i\right).    (13)
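This Fourier duality can be checked numerically: the FFT of an SM component evaluated on a dense grid of lags recovers its pair of Gaussian spectral bumps. A small sketch (grid sizes and parameter values are purely illustrative):

```python
import numpy as np

tau = np.linspace(-50, 50, 4001)                    # dense, symmetric grid of lags
dt = tau[1] - tau[0]
k = np.exp(-2 * np.pi**2 * tau**2 * 0.05**2) * np.cos(2 * np.pi * tau * 0.2)
s = np.fft.fftshift(np.fft.fftfreq(tau.size, d=dt)) # frequency grid
k_hat = np.fft.fftshift(np.abs(np.fft.fft(k))) * dt # empirical spectral density
# k_hat shows two Gaussian bumps centred at s = +/- 0.2: the symmetrized
# scale-location Gaussian of Equation (13) with mu = 0.2, sigma = 0.05.
```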

2.3 Related Work

Since the introduction of SM kernels Wilson2013; Wilson2014a, various useful variants have been proposed Wilson2014a; Duvenaud2013; Flaxman2015; Oliva2016; Jang2017. For instance, the Spectral Mixture Product kernel (SMP) Wilson2014a, which uses multi-dimensional SM kernels, extends the application scope of SM kernels to image data and spatio-temporal data, and is able to discover patterns on large multidimensional datasets. More recently, non-stationary spectral kernels modeling input-dependent Gaussian process frequency density surfaces were introduced in Remes2017. Ulrich2015 adds a channel-level dependency related to phase in the context of multi-output problems, and a limited form of time and phase shift for multi-output GPs is also proposed in Parra2017. These approaches do not capture dependencies when used in a single-task setting. This is the main difference with GCSM, which models dependencies and time-phase delays in a single-task setting.

3 Time and Phase Dependent Generalized Convolution SM Kernels for GPs

We can now address the first four questions mentioned in the introduction.

3.1 Designing Complex Valued Spectral Density with Time Delay and Phase Delay

In an ordinary SM kernel we have $\hat{k}_{\mathrm{SM}}(\mathbf{s}) = \sum_{i=1}^{Q} \hat{k}_{\mathrm{SM}_i}(\mathbf{s})$ and

\hat{k}_{\mathrm{SM}_i}(\mathbf{s}) = w_i\, \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i),    (14)

where $\mathbf{s}$ is a $P$-dimensional frequency vector. The spectral density of an SM kernel in the frequency domain is thus just a standard multivariate Gaussian function with amplitude $w_i$, which ignores time and phase delay. In order to increase the flexibility and expressiveness of SM kernels, we propose to incorporate time and phase delay. First, from a signal processing perspective Bateman1954, the Fourier transform of $k(t - \theta)$ ($\theta$ is the time delay) in the frequency domain is

\mathcal{F}\left[k(t - \theta)\right](s) = e^{-2\pi i \theta s}\, \hat{k}(s)    (15)

($i$ is the imaginary unit). Second, the Fourier transform of a phase delayed kernel ($\varphi$ is the phase delay) in the frequency domain Bateman1954 is the multiplication of $\hat{k}$ by a constant phase factor:

e^{-2\pi i \varphi}\, \hat{k}(s).    (16)

As is well known, Fourier transforms and products of Gaussian functions are again Gaussian functions. Based on the Fourier transforms of time and phase delay between the time domain and the frequency domain, we can extend the SM spectral density function to include time and phase delay simultaneously as follows:

\hat{k}_{\mathrm{GCSM}_i}(\mathbf{s}) = w_i\, \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)\, e^{-2\pi i\, (\boldsymbol{\theta}_i^\top \mathbf{s} + \varphi_i)}.    (17)

We call this complex-valued density function the time and phase delayed GCSM spectral density function.
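A minimal sketch of Equation (17) in one dimension (the $2\pi$ convention for the phase factor follows our reconstruction above and should be read as an assumption):

```python
import numpy as np

def gcsm_spectral_density(s, w, mu, sigma, theta, phi):
    """Complex-valued GCSM component spectral density, Equation (17), P = 1:
    a Gaussian in frequency modulated by time delay theta and phase delay phi."""
    gauss = w * np.exp(-0.5 * (s - mu)**2 / sigma**2) / (sigma * np.sqrt(2 * np.pi))
    return gauss * np.exp(-2j * np.pi * (theta * s + phi))
```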

3.2 Convolution of Time Delayed and Phase Delayed Spectral Mixtures

Here we present a convolution approach to decompose the complex-valued spectral density into base spectral densities. Generally, any covariance function $k$ can be represented in convolution form on $\mathbb{R}^P$ Gaspari1999:

k(\mathbf{x}, \mathbf{x}') = \int_{\mathbb{R}^P} g(\mathbf{u}, \mathbf{x})\, g(\mathbf{u}, \mathbf{x}')\, d\mathbf{u}.    (18)

From the above equation, for a stationary covariance function we obtain the symmetric form $k(\boldsymbol{\tau}) = (g * g)(\boldsymbol{\tau})$, where $*$ denotes the convolution operator and $g$ is the basis kernel function. When we apply a Fourier transform to this general convolution form of the covariance function, we obtain

\mathcal{F}[k](\mathbf{s}) = \mathcal{F}[g](\mathbf{s})\, \overline{\mathcal{F}[g](\mathbf{s})} = \lvert \hat{g}(\mathbf{s}) \rvert^2.    (19)

Letting $\hat{g}_i(\mathbf{s}) = \sqrt{\hat{k}_{\mathrm{GCSM}_i}(\mathbf{s})}$ (the principal square root), we have

\hat{g}_i(\mathbf{s}) = \sqrt{w_i\, \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)}\; e^{-\pi i\, (\boldsymbol{\theta}_i^\top \mathbf{s} + \varphi_i)}    (20)

for our GCSM kernel, which can be seen as the basis function of each spectral density component $\hat{k}_{\mathrm{GCSM}_i}$. On the other hand, the convolution of covariance functions is also a valid covariance function. Inspired by this, we introduce the cross spectral density of GCSM mixtures to model time-phase correlated and mutually dependent components. Since the cross spectral density function should satisfy the positive definiteness condition, we construct it as $\hat{k}_{\mathrm{GCSM}}^{i \times j}(\mathbf{s}) = \hat{g}_i(\mathbf{s})\, \overline{\hat{g}_j(\mathbf{s})}$:

\hat{k}_{\mathrm{GCSM}}^{i \times j}(\mathbf{s}) = \sqrt{w_i w_j\, \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)\, \mathcal{N}(\mathbf{s}; \boldsymbol{\mu}_j, \boldsymbol{\Sigma}_j)}\; e^{-\pi i \left((\boldsymbol{\theta}_i - \boldsymbol{\theta}_j)^\top \mathbf{s} + (\varphi_i - \varphi_j)\right)},    (21)

where $\boldsymbol{\Sigma}_i$ is a symmetric, positive definite $P$-by-$P$ covariance matrix in the frequency domain, $\boldsymbol{\theta}_i$ is a $P$-dimensional time delay vector, and $\varphi_i$ is the corresponding phase delay. Unlike in the multidimensional ordinary SM, here $\boldsymbol{\Sigma}_i$ is not necessarily diagonal. The construction is positive definite because, for any non-zero complex vector $\mathbf{z} = (z_1, \ldots, z_Q)$ and any frequency $\mathbf{s}$,

\sum_{i=1}^{Q} \sum_{j=1}^{Q} z_i\, \hat{g}_i(\mathbf{s})\, \overline{z_j\, \hat{g}_j(\mathbf{s})} = \Big\lvert \sum_{i=1}^{Q} z_i\, \hat{g}_i(\mathbf{s}) \Big\rvert^2 \geq 0,    (22)

where the overline denotes the complex conjugate.
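Equation (22) says that, at any fixed frequency, the $Q \times Q$ matrix of cross spectral densities is an outer product and hence positive semi-definite, which is easy to verify numerically (the parameters below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
Q, s = 3, 0.15                                        # Q components, one fixed frequency
w = rng.uniform(0.5, 2.0, Q)
mu, sigma = rng.uniform(0.0, 1.0, Q), rng.uniform(0.05, 0.2, Q)
theta, phi = rng.uniform(-1.0, 1.0, Q), rng.uniform(-np.pi, np.pi, Q)
# basis functions g_i(s) of Equation (20) (half-phase convention assumed)
g = (np.sqrt(w * np.exp(-0.5 * (s - mu)**2 / sigma**2) / (sigma * np.sqrt(2 * np.pi)))
     * np.exp(-1j * np.pi * (theta * s + phi)))
S = np.outer(g, g.conj())                             # cross spectral density matrix, Eq. (21)
z = rng.normal(size=Q) + 1j * rng.normal(size=Q)
quad = (z.conj() @ S @ z).real                        # equals |sum_i conj(z_i) g_i|^2
assert quad >= -1e-12                                 # non-negative up to rounding
```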

3.3 Time and Phase Dependent Spectral Mixtures

By applying Fourier transforms to the convolution of time delayed and phase delayed spectral mixtures, and inverse Fourier transforms to the resulting cross spectral densities, the time and phase dependent generalized convolution spectral mixture can be defined as follows:

k_{\mathrm{GCSM}}^{i \times j}(\boldsymbol{\tau}) = \mathcal{F}^{-1}\left[\hat{g}_i\, \overline{\hat{g}_j}\right](\boldsymbol{\tau}).    (23)

As mentioned before, like the spectral density of SM, GCSM's spectral density must also be symmetric, so we let GCSM's spectral density satisfy $\hat{k}(-\mathbf{s}) = \overline{\hat{k}(\mathbf{s})}$, which guarantees a real-valued kernel. Similarly,

\hat{k}_{\mathrm{GCSM}}^{i \times j}(\mathbf{s}) = \tfrac{1}{2}\left[\hat{g}_i(\mathbf{s})\, \overline{\hat{g}_j(\mathbf{s})} + \overline{\hat{g}_i(-\mathbf{s})}\, \hat{g}_j(-\mathbf{s})\right].    (24)

Here $\boldsymbol{\theta}_{ij} = \boldsymbol{\theta}_i - \boldsymbol{\theta}_j$ and $\varphi_{ij} = \varphi_i - \varphi_j$ are the cross time and phase delays between spectral mixtures $i$ and $j$. When no time delay and phase delay are considered, we have

k_{\mathrm{GCSM}}^{i \times j}(\boldsymbol{\tau}) = \mathcal{F}^{-1}\left[\sqrt{\hat{k}_{\mathrm{SM}_i}\, \hat{k}_{\mathrm{SM}_j}}\right](\boldsymbol{\tau}).    (25)

If we go further and take $i = j$ (so that $\boldsymbol{\theta}_{ij} = \mathbf{0}$ and $\varphi_{ij} = 0$), the GCSM mixture reduces to the ordinary SM mixture, which assumes that the spectral mixtures are independent Wilson2013:

k_{\mathrm{GCSM}}^{i \times i}(\boldsymbol{\tau}) = w_i \prod_{p=1}^{P} \exp\left(-2\pi^2 \tau_p^2 \sigma_i^{(p)2}\right) \cos\left(2\pi \tau_p \mu_i^{(p)}\right).    (26)

3.4 Time and Phase Dependent Generalized Convolution Spectral Mixture Kernel

Motivated by the SM kernel and its spectral density formulation, if there are $Q$ components and each component has weight $w_i$ in the original kernel spectral density, then each basis function $\hat{g}_i$ carries amplitude $\sqrt{w_i}$, so that the auto-convolution term $i \times i$ recovers weight $w_i$ and the cross term $i \times j$ has weight $\sqrt{w_i w_j}$. According to the distributivity of the convolution operator and the symmetry properties of GCSM, we have:

k_{\mathrm{GCSM}}(\boldsymbol{\tau}) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} k_{\mathrm{GCSM}}^{i \times j}(\boldsymbol{\tau}),    (27)

where $Q$ denotes the number of auto-convolution spectral mixtures in the GCSM kernel. If there is no time and phase delay between different spectral mixtures, the cross components are based only on the convolution of the cross base spectral mixtures. In this case GCSM becomes

k_{\mathrm{GCSM}}(\boldsymbol{\tau}) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} \mathcal{F}^{-1}\left[\sqrt{\hat{k}_{\mathrm{SM}_i}\, \hat{k}_{\mathrm{SM}_j}}\right](\boldsymbol{\tau}).    (28)

Furthermore, if we consider only the auto-convolution of each base spectral mixture, then the GCSM kernel reduces to Equation (10), that is, the ordinary SM kernel.
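Because the cross terms in Equation (27) do not need a closed form for evaluation, the kernel can be approximated by discretizing the frequency domain: build each cross spectral density from the basis functions of Equation (20), symmetrize it, inverse-transform, and sum over all pairs. A hedged NumPy sketch (the grid limits, the Hermitian symmetrization step of Equation (24), and the half-phase convention are our assumptions):

```python
import numpy as np

def gcsm_kernel_numeric(tau, w, mu, sigma, theta, phi):
    """Numerical sketch of Eq. (27): sum over all Q^2 cross spectral
    densities, symmetrized and inverse-Fourier-transformed on a grid."""
    s = np.linspace(-2.0, 2.0, 4097)               # frequency grid (assumed wide enough)
    ds = s[1] - s[0]
    # basis functions g_i(s) = sqrt(k_i(s)) with time/phase delay factors (Eq. 20)
    g = [np.sqrt(w[i] * np.exp(-0.5 * (s - mu[i])**2 / sigma[i]**2)
                 / (sigma[i] * np.sqrt(2 * np.pi)))
         * np.exp(-1j * np.pi * (theta[i] * s + phi[i])) for i in range(len(w))]
    E = np.exp(2j * np.pi * np.outer(tau, s))      # inverse-transform basis
    k = np.zeros(len(tau))
    for gi in g:
        for gj in g:
            dens = gi * np.conj(gj)                # cross spectral density (Eq. 21)
            dens = 0.5 * (dens + np.conj(dens[::-1]))  # Hermitian symmetry (Eq. 24)
            k += np.real(E @ dens) * ds
    return k
```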

4 Comparisons Between Time and Phase Dependent GCSM and Other Kernels

In this section we aim to answer the last question mentioned in the introduction. Figure 1 illustrates the difference between SM and GCSM, where each connection represents a convolution component of the kernel: SM is an auto-convolution spectral mixture kernel that ignores dependencies between spectral mixtures, so it can be considered a special case of GCSM. Table 1 compares GCSM with popular kernels in terms of hyper-parameter space, degrees of freedom, number of components, and characteristics. GCSM is more flexible than the other kernels. Even without time ($\theta_{ij} = 0$) and phase ($\varphi_{ij} = 0$) delay, GCSM still includes correlations between spectral mixtures (see Equation (28)). The price to pay is that the gradient computation for GCSM is more involved, because the dependencies between base spectral mixture components are taken into account.

Figure 1: Difference in the convolution relationships between GCSM and SM with the same number of base components. (a) The auto-convolution correlations of the base spectral mixtures (SM). (b) The cross- and auto-convolution correlations between spectral mixtures (GCSM).
Kernel      Parameters                                                 Degrees of freedom   Number of components
SE          $\sigma, \ell$                                             2                    1
Periodic    $\sigma, \ell, p$                                          3                    1
Matérn      $\sigma, \ell$                                             2                    1
SM          $\{w_i, \mu_i, \sigma_i\}_{i=1}^{Q}$                       $3Q$                 $Q$
GCSM        $\{w_i, \mu_i, \sigma_i, \theta_i, \varphi_i\}_{i=1}^{Q}$  $5Q$                 $Q^2$

Table 1: Comparison between GCSM and other GP kernels with respect to hyper-parameters, degrees of freedom (in one dimension), and number of components.
Figure 2: (a) Auto-convolution spectral mixtures and (b) their corresponding spectral densities.
Figure 3: Covariance functions and corresponding spectral densities of the cross-convolution spectral mixtures (the figure shows half of the 6 cross-convolution terms). First row: GCSM with zero time and phase delay; non-zero time delay and zero phase delay; zero time delay and non-zero phase delay; and non-zero time and phase delay. Second row: the corresponding spectral densities of the spectral mixtures shown in the first row. For the cross-convolution spectral densities, the real part is shown as a solid line and the imaginary part as a dashed line.

Figure 2 shows the auto-convolution spectral mixtures of SM in Equation (10); Figure 3 presents the dependencies between base components for GCSM in Equation (28), i.e., without time and phase dependent cross-convolution, as well as the time and phase dependent cross-convolution spectral mixtures of Equation (27), in both the time and frequency domains. GCSM allows for correlated and mixture-dependent components. In order to give a clear illustration of the cross components, we set the parameters of the SM components relatively close to each other. The plots show the clear presence of cross-convolution components and their contribution to the final kernel even without time and phase delay. When $\theta_{ij} \neq 0$ or $\varphi_{ij} \neq 0$, the cross-convolution components are shifted and centered at a different position. From a frequency domain perspective, this shift is also reflected in the corresponding spectral densities (see the two rows in Figure 3). From this analysis one can observe that the closer the frequencies $\mu$, scales $\sigma$, and weights $w$ of the mixtures in SM are, the higher the contribution of the cross-convolution components in GCSM.

According to Equations (27), (28), and (10), for the diagonal elements of the trained kernel matrix ($\boldsymbol{\tau} = \mathbf{0}$) we have

k_{\mathrm{SM}}(\mathbf{0}) = \sum_{i=1}^{Q} w_i,    (29)

k_{\mathrm{GCSM}}(\mathbf{0}) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} \int \sqrt{\hat{k}_{\mathrm{SM}_i}(\mathbf{s})\, \hat{k}_{\mathrm{SM}_j}(\mathbf{s})}\, d\mathbf{s} > 0 \quad \text{(no delays)},    (30)

k_{\mathrm{GCSM}}(\mathbf{0}) = \sum_{i=1}^{Q} \sum_{j=1}^{Q} \int \sqrt{\hat{k}_{\mathrm{SM}_i}(\mathbf{s})\, \hat{k}_{\mathrm{SM}_j}(\mathbf{s})}\, \cos\!\left(\pi (\boldsymbol{\theta}_{ij}^\top \mathbf{s} + \varphi_{ij})\right) d\mathbf{s} \quad \text{(with delays)}.    (31)

Obviously, the diagonal values of the kernel matrix in SM do not depend on the hyper-parameters $\mu_i$ and $\sigma_i$. For GCSM without time and phase dependency, the diagonal values are affected by $\mu_i$ and $\sigma_i$ and must be positive. The diagonal values of time and phase dependent GCSM, however, are affected by all hyper-parameters, including the time and phase delays. In particular, the cosine term in Equation (31) carries much of the information about the diagonal values and can determine their sign, which means the diagonal values of the cross components can be negative, depending on the time and phase delays. Beyond the assumption of independent spectral mixtures in ordinary SM Wilson2013 and the positive dependencies of the cross spectral mixtures in GCSM without time and phase delay, the negative dependencies between spectral mixtures modeled by time and phase dependent GCSM are a substantial improvement that extends the application range of spectral kernels. Experimentally, we verify the negative dependencies on the monthly river flow dataset.
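Under our reconstruction of Equations (30)-(31), this sign behavior can be checked with the gcsm_kernel_numeric sketch from Section 3.4 (illustrative parameters):

```python
import numpy as np

# With zero delays the value at tau = 0 exceeds the SM diagonal sum(w);
# a phase-delay difference of 1 flips the sign of the cross terms.
w, mu, sigma = [1.0, 1.0], [0.20, 0.25], [0.05, 0.05]
k0_plain = gcsm_kernel_numeric(np.array([0.0]), w, mu, sigma, [0, 0], [0.0, 0.0])[0]
k0_delay = gcsm_kernel_numeric(np.array([0.0]), w, mu, sigma, [0, 0], [0.0, 1.0])[0]
print(k0_plain > sum(w), k0_delay < k0_plain)   # True True (under this convention)
```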

5 Hyperparameters Initialization

Both SM and GCSM are sensitive to the initial values of their hyper-parameters, which may affect the capability of the GP kernels to discover and extrapolate patterns. An initialization strategy based on the empirical spectral density can be used to find a good initialization Wilson2013. However, the empirical spectral density is often noisy and cannot be used directly. Past research indicates that the sharp peaks of the empirical spectral density are near the true frequencies Wilson2013. Inspired by this observation, we apply Gaussian mixture analysis to the empirical spectral density in order to identify the cluster centers of the Gaussian spectral densities. Based on this Gaussian mixture analysis of the spectral density, the initial hyper-parameters can be configured. We use this initialization strategy in the experiment described in Section 6.2.
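One possible realization of this strategy (the function below, the resampling trick, and the variance-based weight scaling are our own illustrative choices, not the paper's code) resamples frequencies in proportion to the periodogram and fits a scikit-learn Gaussian mixture:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def init_sm_params(y, dt, Q, n_samples=10_000, seed=0):
    """Sketch: initialize (w_i, mu_i, sigma_i) from a GMM fit to the
    empirical spectral density of an evenly sampled signal y with spacing dt."""
    freqs = np.fft.rfftfreq(len(y), d=dt)
    pdgm = np.abs(np.fft.rfft(y - y.mean()))**2    # noisy empirical spectral density
    rng = np.random.default_rng(seed)
    samples = rng.choice(freqs, size=n_samples, p=pdgm / pdgm.sum())
    gmm = GaussianMixture(n_components=Q, random_state=seed).fit(samples[:, None])
    mu = gmm.means_.ravel()                        # cluster centers ~ peak frequencies
    sigma = np.sqrt(gmm.covariances_.ravel())
    w = gmm.weights_ * y.var()                     # heuristic: scale weights by variance
    return w, mu, sigma
```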

Recently, Bayesian parameter optimization has been shown to be highly beneficial for automatic parameter tuning Knudde2017; Snoek2012. In our setting we use Bayesian optimization to find the minimum of an objective function over a bounded hyper-parameter domain $\mathcal{X}$. In this context, the objective function of GCSM accepts hyper-parameters from $\mathcal{X}$ and returns the negative log marginal likelihood obtained by training. In particular, during the optimization all of the information from previous evaluations is used for the next evaluation, rather than just the local gradient. A prior over functions and an acquisition function are the two ingredients needed to perform Bayesian optimization. Here we choose a Gaussian process prior to express our assumptions about the function being optimized, while the acquisition function $a$ selects the next hyper-parameter point in $\mathcal{X}$ to be evaluated. We apply Expected Improvement (EI) as the acquisition function Snoek2012, which reflects the expected improvement over the current best hyper-parameters $\mathbf{x}_{\mathrm{best}}$ under the predictive distribution:

a_{\mathrm{EI}}(\mathbf{x}) = \mathbb{E}\left[\max\left(f(\mathbf{x}_{\mathrm{best}}) - f(\mathbf{x}),\, 0\right)\right],    (32)

a_{\mathrm{EI}}(\mathbf{x}) = \sigma(\mathbf{x}) \left(\gamma(\mathbf{x})\, \Phi(\gamma(\mathbf{x})) + \phi(\gamma(\mathbf{x}))\right), \qquad \gamma(\mathbf{x}) = \frac{f(\mathbf{x}_{\mathrm{best}}) - \mu(\mathbf{x})}{\sigma(\mathbf{x})},    (33)

where $\Phi$ and $\phi$ denote the CDF and PDF of the standard normal distribution, and $\mu(\mathbf{x})$ and $\sigma(\mathbf{x})$ are the posterior mean and standard deviation of the surrogate GP.
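Equation (33) has a direct implementation; a sketch for a minimization objective:

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    """Closed-form EI of Equations (32)-(33); mu and sigma are the GP
    posterior mean and standard deviation at candidate points."""
    sigma = np.maximum(sigma, 1e-12)               # guard against zero variance
    gamma = (f_best - mu) / sigma
    return sigma * (gamma * norm.cdf(gamma) + norm.pdf(gamma))
```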

As for the covariance function used in Bayesian optimization, we use the Matérn 5/2 kernel with ARD disabled, as suggested in Snoek2012. Hyper-parameter initialization by Bayesian optimization is in general better than Gaussian mixture analysis of the empirical spectral density. However, Bayesian optimization is much more computationally expensive. Therefore, in our setting we only apply Bayesian optimization to initialize the hyper-parameters $\mu_i$ and $\sigma_i$ of SM and GCSM; the weights $w_i$ are initialized from the data, and the hyper-parameters $\theta_i$ and $\varphi_i$ in GCSM are simply initialized at random. This is done in the first and last of our three experiments, in Subsections 6.1 and 6.3.

6 Experiment

We comparatively assess the performance of GCSM in experiments on synthetic and real-world data. The synthetic experiment is designed to illustrate the ability of GCSM to model a signal sampled from a GP with an SM kernel, as well as its integral, its derivative, and a spectral-mixture-level time delayed version of it. In the other two experiments we use real-life data. In the second experiment we show the capability of GCSM to capture dependencies between base components even when using hyper-parameter values optimized on SM. In the last experiment we explore the full power of GCSM on a real-life problem, using Bayesian optimization of the hyper-parameters. We use the Mean Absolute Error (MAE) as the performance metric for all tasks. We implemented our models in TensorFlow Abadi2016c and GPflow Matthews2017 in order to enhance scalability and to facilitate gradient computations.

6.1 Experiment on Synthetic Dataset

The synthetic experiment is designed to extrapolate the integral, the derivative, and a spectral-mixture-level time delayed version of a signal sampled from a GP with an SM kernel. The data are generated as follows: 1) generate a normal time series of length 500 on the interval [-10, 10]; 2) numerically compute the first integral and the derivative of the generated signal; 3) add a time delay to each mixture component to form the final spectral-mixture-level time delayed signal. Both GCSM and SM are configured with the same number of components $Q$ for all experiments and with the same initial values of the hyper-parameters $w_i$, $\mu_i$, $\sigma_i$. The other parameters of GCSM, $\theta_i$ and $\varphi_i$, are initialized randomly.
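A sketch of steps 1)-3) (assuming the sm_kernel helper from Section 2.2 is in scope; the kernel parameters and the per-component delay in step 3 are illustrative, not the values used in our experiments):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-10, 10, 500)
# step 1: draw a signal from GP(0, k_SM) using the sm_kernel sketch of Section 2.2
K = sm_kernel(x, x, weights=[1.0, 0.5], means=[0.2, 0.6], scales=[0.05, 0.08])
y = rng.multivariate_normal(np.zeros_like(x), K + 1e-8 * np.eye(x.size))
# step 2: numerical integral and derivative of the generated signal
y_int = np.cumsum(y) * (x[1] - x[0])
y_der = np.gradient(y, x)
# step 3: delay each mixture component separately before summing, e.g. by
# sampling the two components individually and shifting one of them in x.
```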

For the sampled signal itself we randomly choose half of the data as training data and the rest as test data. The integral signal on the interval [-10, 0] is used for training (in cyan) and the remaining signal on the interval [0, 10] is used for testing (in yellow). Analogously, the derivative signal on the interval [0, 10] is used for training and the rest for testing. Finally, for the spectral-mixture-level time delayed signal, the interval [-5, 5] is selected as test data and the rest as training data. We consider the four settings described in Figure 4.

Results can be summarized as follows. (a) For the sampled signal itself, the difference in performance is negligible: both GCSM and SM learn the covariance well. (b) For the integral of the signal, shown in Figure 4 (b), whose inherent pattern is more difficult to recognize and extrapolate, GCSM performs better than SM with respect to both MAE and confidence interval. (c) Similarly, on the derivative of the signal, shown in Figure 4 (c), GCSM exhibits better pattern learning and extrapolation ability. (d) In particular, as shown in Figure 4 (d), GCSM yields better predictions and smoother confidence intervals.

Overall, this experiment indicates the capability of GCSM to correctly capture the integral, derivative, and spectral-mixture-level time delay patterns of the generated signal without any prior information, achieving the lowest MAE (see Table 2).

Figure 4: Comparison between GCSM and SM on the synthetic experiment. (a) Signal randomly sampled from a GP with an SM kernel; the training data are randomly selected from the signal and the rest is used for testing. (b) Integral of the signal, computed numerically; the first half of the data is used for training and the rest for testing. (c) Derivative of the signal, computed numerically; the last half of the data is used for training and the rest for testing. (d) Spectral-mixture-level time delayed signal, in which each component is given a different time delay; the middle part is used for testing and the rest for training.

6.2 Airline Passengers Experiment

We compare the performance of GCSM and other GP kernels on a real-life extrapolation task: the airline passenger numbers recorded monthly from 1949 to 1961 Hyndman2010. The airline passenger dataset is a popular benchmark for showing the advantages and flexibility of GPs, because the data contain multiple patterns, such as long term, medium term, seasonal, and short term trends.

For GCSM, we fix the time and phase delays to 0, which means that only cross-convolution is used, so GCSM and SM have the same hyper-parameter space (see Equations (28) and (10)). We used the same number of components $Q$ for both kernels. Furthermore, Gaussian mixtures fitted to the empirical spectral density are used to initialize the hyper-parameters: hyper-parameter values are obtained on SM by randomly initializing 10 times and then running 1000 optimization steps, and these values are also used as the initialization for GCSM. Since in this experimental setting GCSM and SM use the same hyper-parameter values, optimized on SM, we can directly compare the trained kernels and assess whether GCSM captures dependencies between base components.

Figure 5 (a) shows that in extrapolating the number of airline passengers GCSM (blue dashed line) performs better than ordinary SM (red solid line), although the improvement is relatively small. Figures 5 (b) and (c) show the trained kernel matrices of SM and GCSM, and Figure 5 (d) shows the relative difference between them. The difference is clearly visible, indicating that the role of the cross components is strengthened.

Figure 5: Performance of SM and GCSM (with no time and phase delay) on the airline passenger dataset. The first 96 monthly recordings are used for training (in black) and the following 48 months for testing (in green). (a) Airline passenger number predictions without considering time and phase delay; GCSM shown as a dashed blue line. (b) Trained kernel matrix of SM. (c) Trained kernel matrix of GCSM. (d) The relative difference between the two kernels. The 95% confidence interval is based on GCSM's predictions.

6.3 Monthly River Flow Experiment

In order to show the full capability of GCSM in a real-life scenario of modeling correlations and time and phase delays between spectral mixtures, we consider the monthly river flow dataset, which contains involved inherent time and phase patterns. The data record the mean monthly flow of Piper's Hole River from 1953 to 1981 Hipel1994. The river drains into the head of Placentia Bay. Researchers wish to forecast the long range trend in order to monitor environmental evolution, which can inform future human activities. Interestingly, these flow recordings show time and phase related patterns and variability over the recording period. Especially in estuaries, the moon and sun are primarily responsible for the rising and falling of river tidal flows, which are delayed and amplified by their gravity and resonances. Empirical analysis shows various characteristics in this flow data: short term monthly variations, medium term seasonal patterns, a non-strictly-periodic long term trend related to the positions of the moon and sun, and some white noise.

We use $Q = 10$, and the hyper-parameters $\mu_i$ and $\sigma_i$ are initialized through Bayesian optimization for both GCSM and SM, while $\theta_i$ and $\varphi_i$ are initialized randomly. As seen in Figure 6 (a), the patterns in the monthly river flow are more complicated than those of the previously considered benchmark datasets. Results indicate that both GCSM and SM can extrapolate the future monthly river flow well, with GCSM achieving better performance. There are multiple short term, medium term, and long term trends containing time and phase delays in the monthly river flow time series, because the appearance time of the flow peaks is not periodic and their amplitude is irregular. With the same number of components $Q$, GCSM is clearly more effective at modeling the complex patterns in the data.

Another interesting characteristic of this experiment is that the maximum values (diagonal elements) of the trained SM kernel matrix are bigger than those of GCSM, so the overall contribution of the cross components is negative. This shows that the negative cross components in GCSM (see Equation (31)) also contribute to the model, and their contribution is substantial, since the difference is clearly visible. Overall, this experiment shows that the spectral mixtures are mutually dependent.

Figure 6: Predicting the long term mean monthly flow of Piper's Hole River. There are 348 monthly recordings (Jan 1953 - Dec 1981); the first half is used for training (in cyan) and the rest for testing (in yellow). (a) Performance of SM and GCSM ($Q = 10$). (b) Trained kernel matrix of SM. (c) Trained kernel matrix of GCSM. (d) The cross components and the absolute difference between the two trained kernel matrices.
Kernel        Arti-1   Arti-2   Arti-3   Arti-4   Airline   River flow
SE            0.361    0.427    0.238    1.253    51.259    13.720
Periodic      1.096    0.368    0.351    1.468    93.662    13.720
Matérn 5/2    0.374    0.419    0.237    1.251    352.987   13.720
SM            0.094    0.685    0.152    1.413    17.427    13.859
GCSM          0.087    0.232    0.098    0.847    16.830    12.739

Table 2: Summary of MAE comparisons between GCSM and the other kernels on the synthetic tasks and the real-world datasets. The GCSM kernel consistently achieves the lowest MAE. Predictions using the SE, Periodic, and Matérn 5/2 kernels are in fact very poor, especially on the extrapolation tasks (integral, derivative, time delayed signal, airline passengers, and monthly river flow), even where the MAE does not look too bad: these kernels hardly find valid patterns in the data. Arti-1 through Arti-4 correspond to the first, second, third, and fourth tasks of the synthetic experiment, respectively.

7 Conclusions

We introduced the time and phase dependent Generalized Convolution Spectral Mixture (GCSM) kernel, an extension of SM kernels capable of modeling complicated correlations across base spectral mixtures via the cross-convolution of complex-valued spectral densities incorporating time and phase delay in the frequency domain. GCSM generalizes ordinary SM by removing the assumption that spectral mixture components are independent. The cross spectral density, constructed from a basis mixture and the conjugate of another basis mixture, guarantees that the proposed kernel is positive definite. The time and phase dependent GCSM decomposition of each SM component in the frequency domain provides a way to discover the mutual correlations between spectral mixtures.

Experiments on synthetic and real-life datasets indicate that the proposed kernels for GPs can discover time and phase delays between spectral mixtures through convolution, can identify and model complex structure in the data, and can forecast long-term trends.

Two main issues remain to be addressed in future work. The first is the initialization of the time and phase delay parameters: here we simply used random initialization, and more tailored, effective methods remain to be investigated. The second, common to all GP methods, is the problem of sparse or efficient inference Quinonero-Candela2005; Snelson2006; Wilson2015; Gardner2018, which also needs to be improved for GPs with GCSM kernels. Lévy process priors, as proposed in Jang2017, are a promising approach for tackling this problem, by regularizing spectral mixtures so as to automatically select the number of components and prune unnecessary ones.

Acknowledgment

This work was partly supported by China Scholarship Council (CSC).

References