Financial applications of artificial intelligence have become an area of rapid development, in which researchers strive to forecast financial indicators and performance metrics using machine learning (ML), with promising results [34, 17, 28, 39]. Traditionally, financial indicators have been modelled using ARCH or GARCH models [9, 3, 14]. More recently, flexible ML-based models have been constructed with applications to financial data [12, 27, 32].
The movement of two (or more) financial indicators such as stocks and commodities in a similar fashion (also called co-movement) can be caused (i) by their mutual dependence, (ii) through changes in the same external factors that influence their prices (e.g. political announcements, natural events, etc.), or (iii) through influences from a more complex economic system [21, 25, 38]. In any case, discovering these relationships is crucial for investors and financial experts, and for a better understanding of the market. The underlying economic principles can be hard to model as these systems are rather complex. Therefore, automatic discovery of those relationships through ML may be instrumental in providing previously unknown market insights, as well as in confirming present conjectures. This improved understanding of the interdependence among financial indicators can greatly aid financial planning for companies and policy makers alike.
By jointly modelling financial time series as multi-output Gaussian processes (MOGPs) with rich kernel functions [12, 27, 32], we aim to discover features that are inherent to the data such as quarterly or yearly patterns or business cycles. In particular, by parametrising the positive/negative correlations between two or more time-series, the interdependence among multiple financial indicators can be trained so that a variation in one time-series can predict the movement in another time-series. Given sufficient data and the availability of recurring effects (i.e. patterns), we expect to construct sound predictions of one time-series channel given the others.
In the rest of the paper we first review classical and multi-output frameworks for Gaussian process regression in Section 2. Then in Section 3 we specify the multi-output spectral mixture kernel (MOSM) and related models. In Section 4 we show the application of MOGPs to two finance experiments, namely the gold, oil, NASDAQ, and USD index dataset and the currency exchanges of ten countries with respect to the USD. Finally, we discuss the results in Section 5.
2 Background: Multi-output GP
A Gaussian process (GP) [22] defines a non-parametric prior distribution over functions $f \sim \mathcal{GP}(m, k)$, where $m$ is the mean function (usually assumed to be zero) and $k$ is the covariance (kernel) function. GPs can be used as a generative model for functions within Bayesian inference; data can therefore be used to compute a predictive posterior distribution over unseen values of $f$. The kernel function $k$ dictates the behaviour of the modelled function, such as its periodicity and smoothness, and encodes knowledge of the time series of interest via its functional form and parameters. The choice of kernel is central in the GP framework, with the radial basis function (RBF) being the de facto choice due to its smoothness properties [22]. However, more expressive yet more complex kernels have recently been considered that model, for example, periodicities [18, 36, 35, 20].
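To make the role of the kernel in the predictive posterior concrete, the following is a minimal sketch of zero-mean GP regression with an RBF kernel. It is an illustration only and not the implementation used in this paper; the function names, the noise variance, and the synthetic sine data are assumptions introduced for the example.

```python
import numpy as np

def rbf_kernel(xa, xb, lengthscale=1.0, variance=1.0):
    """Squared-exponential (RBF) kernel between two 1-D input arrays."""
    diff = xa[:, None] - xb[None, :]
    return variance * np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_train, y_train, x_test, noise_var=1e-2, **kernel_args):
    """Posterior mean and covariance of a zero-mean GP at x_test."""
    k_tt = rbf_kernel(x_train, x_train, **kernel_args)
    k_st = rbf_kernel(x_test, x_train, **kernel_args)
    k_ss = rbf_kernel(x_test, x_test, **kernel_args)
    # Cholesky factor of the noisy training covariance for a stable solve.
    chol = np.linalg.cholesky(k_tt + noise_var * np.eye(len(x_train)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    v = np.linalg.solve(chol, k_st.T)
    mean = k_st @ alpha
    cov = k_ss - v.T @ v
    return mean, cov

# Synthetic example (assumed data): noiseless samples of a sine wave.
x_train = np.linspace(0.0, 6.0, 25)
y_train = np.sin(x_train)
x_test = np.array([1.6, 3.1, 4.4])
mean, cov = gp_posterior(x_train, y_train, x_test, noise_var=1e-4)
```

The diagonal of `cov` gives the predictive variance at each test input, from which confidence intervals can be drawn.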
Although the GP literature on both methods and applications is broad, most works address the single-output scenario in which only one time series is considered, that is, a function $f: \mathbb{R} \to \mathbb{R}$. The extension of the GP approach to multiple signals allows for jointly modelling output channels as coupled GPs, where the covariance function is a function $K: \mathbb{R} \times \mathbb{R} \to \mathbb{R}^{M \times M}$, with $M$ the number of channels, defined element-wise as $[K(t, t')]_{ij} = k_{ij}(t, t') = \operatorname{cov}(f_i(t), f_j(t'))$ between channels $i$ and $j$. The key feature of a multi-output GP is to model covariation across channels in addition to the standard temporal covariation handled by single-output GPs. One of the main challenges of MOGP models is designing flexible covariance functions while requiring the covariance function to be positive definite for all values of $t$ and $t'$. Additionally, since an MOGP model requires the parametrisation of a larger number of correlations, its larger number of hyperparameters results in more local minima and thus makes training more difficult.
A recent approach to designing general and meaningful cross-channel covariances for MOGPs is to construct them in the spectral domain, that is, to parametrise their (cross) power spectral densities. An alternative is to consider a mixture of Gaussians, as was originally proposed by Wilson (2013) [36] for the single-channel case. Developments in the field of multi-output and spectral mixture kernels have led to a range of new covariance functions such as the SM-LMC [11, 37], the CSM [35], and the MOSM [20]. The SM-LMC kernel introduces multi-output interpretations by linearly combining the channels and thus learning cross-channel covariances. These covariances are, however, restricted to have similar behaviour across the channels. A more flexible kernel is the CSM kernel, which additionally models the phase differences across channels, allowing for non-symmetric covariance functions but still requiring strong correlation between channels. The MOSM kernel adds even more flexibility by introducing a time delay factor across channels, which allows delayed influences across channels to be modelled effectively.
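The restriction of linearly combined channels to "similar behaviour" can be illustrated with a simplified coregionalisation construction, in which the joint covariance is the Kronecker product of a channel matrix and a single shared temporal kernel. This is a sketch under stated assumptions: it uses one latent RBF kernel rather than a spectral mixture, and the mixing weights are hypothetical.

```python
import numpy as np

def rbf_kernel(t, lengthscale=1.0):
    """Shared temporal kernel evaluated on all pairs of time points."""
    diff = t[:, None] - t[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def lmc_covariance(mixing, t, lengthscale=1.0):
    """Joint covariance of M channels that share one latent kernel.

    mixing: (M, R) array; B = mixing @ mixing.T is the channel covariance.
    Returns an (M*N, M*N) matrix ordered channel by channel.
    """
    coreg = mixing @ mixing.T
    return np.kron(coreg, rbf_kernel(t, lengthscale))

t = np.linspace(0.0, 5.0, 20)
mixing = np.array([[1.0], [-0.8], [0.3]])   # hypothetical weights, M = 3, R = 1
K = lmc_covariance(mixing, t)
min_eig = np.linalg.eigvalsh(K).min()       # non-negative up to round-off
```

Every cross-channel block of `K` is a scalar multiple of the same temporal kernel, so channels can differ in scale and sign but not in delay, phase, or spectral content.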
3 Model specification
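Let us establish the required notation. We define (single-output) GPs operating on input $x \in \mathbb{R}^n$ as
$$f(x) \sim \mathcal{GP}\big(m(x), k(x, x')\big),$$
where the mean function and covariance function between inputs $x$ and $x'$ are respectively defined by
$$m(x) = \mathbb{E}[f(x)], \qquad k(x, x') = \mathbb{E}\big[(f(x) - m(x))(f(x') - m(x'))\big].$$
We say that a kernel is stationary if it can be expressed as
$$k(x, x') = k(x - x') = k(\tau),$$
where, for convenience of notation, we denote the input lag by $\tau = x - x'$ and will usually refer to stationary kernels simply as $k(\tau)$.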
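For a stationary kernel, Bochner's theorem [2, 30] relates the covariance function $k(\tau)$ and its power spectral density $S(\omega)$ as a Fourier pair. Using this Fourier pair, we can specify (or parametrise) a kernel in frequency space by only requiring its spectral density to be positive, since Bochner's theorem guarantees that the corresponding covariance function is always positive definite. We can subsequently use a mixture of Gaussian radial basis functions (RBF) in frequency space with positive weights to yield the spectral mixture kernel [36]
$$k(\tau) = \sum_{q=1}^{Q} w_q \exp\Big(-\tfrac{1}{2} \tau^\top \Sigma_q \tau\Big) \cos\big(\mu_q^\top \tau\big), \tag{1}$$
with weights $w_q \in \mathbb{R}_+$, spectral means $\mu_q \in \mathbb{R}^n$, and diagonal spectral covariances $\Sigma_q = \operatorname{diag}(\sigma_{q,1}, \ldots, \sigma_{q,n})$. We refer to the kernel defined in Eq. (1) as the spectral mixture independent Gaussian process kernel (SM-IGP), as we will use it to model the outputs independently.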
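As an illustration of how a spectral mixture kernel is evaluated, the sketch below computes it for one-dimensional inputs. It assumes the angular-frequency parametrisation in which each component is a Gaussian envelope times a cosine; the component values are hypothetical and chosen to mimic yearly and quarterly cycles on monthly data.

```python
import numpy as np

def spectral_mixture_kernel(tau, weights, means, variances):
    """Spectral mixture kernel for scalar lags tau (1-D input).

    k(tau) = sum_q w_q * exp(-0.5 * var_q * tau^2) * cos(mu_q * tau)
    """
    tau = np.asarray(tau, dtype=float)[..., None]      # broadcast against (Q,)
    envelope = np.exp(-0.5 * variances * tau ** 2)
    return np.sum(weights * envelope * np.cos(means * tau), axis=-1)

# Hypothetical components: periods of 12 and 3 time steps.
weights = np.array([1.0, 0.5])
means = np.array([2 * np.pi / 12.0, 2 * np.pi / 3.0])
variances = np.array([1e-3, 1e-2])

t = np.arange(0.0, 48.0)
K = spectral_mixture_kernel(t[:, None] - t[None, :], weights, means, variances)
```

The spectral means set the periodicities the kernel can express, while the spectral variances control how quickly each periodic pattern decorrelates with lag.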
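In order to extend the spectral mixture kernel into a multi-output kernel, we use Cramér's theorem [6], which is the multivariate extension of Bochner's theorem, to obtain the multi-output spectral mixture kernel (MOSM) as proposed by [20]. The MOSM kernel between channels $i$ and $j$ at input lag $\tau$ is defined as
$$k_{ij}(\tau) = \sum_{q=1}^{Q} \alpha_{ij}^{(q)} \exp\Big(-\tfrac{1}{2} \big(\tau + \theta_{ij}^{(q)}\big)^\top \Sigma_{ij}^{(q)} \big(\tau + \theta_{ij}^{(q)}\big)\Big) \cos\Big(\big(\tau + \theta_{ij}^{(q)}\big)^\top \mu_{ij}^{(q)} + \phi_{ij}^{(q)}\Big), \tag{2}$$
with the cross-spectral parameters defined by the magnitude $\alpha_{ij}^{(q)}$, the mean $\mu_{ij}^{(q)}$, the covariance $\Sigma_{ij}^{(q)}$, the delay $\theta_{ij}^{(q)}$, and the phase $\phi_{ij}^{(q)}$. The channels are defined by indices $i$ and $j$. For a detailed derivation of the MOSM kernel see [20].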
|Model||Parametric relation with MOSM|
|SM-LMC [11, 37]|
MOSM can be understood as a more general kernel when compared to the SM-IGP, SM-LMC, and CSM kernels, which can be obtained by constraining some of the parameters in Eq. (2). This is illustrated in Table 1, where both the mean and covariance become channel independent, where either the delay $\theta_{ij}$, the phase $\phi_{ij}$, or both are set to zero, and where the magnitude parameter $\alpha_{ij}$ is scaled. Some MOGP kernels explicitly state a rank $R_q$ for every component $q$, which uses a weighted average of covariance functions per component. In this paper we use $R_q = 1$.
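The following sketch evaluates a single-component MOSM covariance for one-dimensional inputs and checks numerically that constraining the parameters recovers a simpler kernel. It is an illustration under assumptions, not the toolkit implementation: the way cross-channel parameters are combined from per-channel ones follows our reading of the construction in [20], and all function names and parameter values are hypothetical.

```python
import numpy as np

def mosm_cross_params(w, mu, sigma, theta, phi, i, j):
    """Cross-channel parameters for one spectral component with 1-D input.

    w, mu, sigma, theta, phi are per-channel arrays (sigma: spectral variances).
    """
    s = sigma[i] + sigma[j]
    sigma_ij = 2.0 * sigma[i] * sigma[j] / s
    mu_ij = (sigma[i] * mu[j] + sigma[j] * mu[i]) / s
    w_ij = w[i] * w[j] * np.exp(-0.25 * (mu[i] - mu[j]) ** 2 / s)
    alpha_ij = w_ij * np.sqrt(2.0 * np.pi * sigma_ij)
    return alpha_ij, mu_ij, sigma_ij, theta[i] - theta[j], phi[i] - phi[j]

def mosm_kernel(tau, alpha, mu, sigma, theta, phi):
    """Delayed, phase-shifted Gaussian envelope times cosine."""
    shifted = tau + theta
    return alpha * np.exp(-0.5 * sigma * shifted ** 2) * np.cos(shifted * mu + phi)

def mosm_covariance(t, w, mu, sigma, theta, phi):
    """Joint covariance over all channels, ordered channel by channel."""
    m, n = len(w), len(t)
    tau = t[:, None] - t[None, :]
    K = np.empty((m * n, m * n))
    for i in range(m):
        for j in range(m):
            params = mosm_cross_params(w, mu, sigma, theta, phi, i, j)
            K[i * n:(i + 1) * n, j * n:(j + 1) * n] = mosm_kernel(tau, *params)
    return K

t = np.linspace(0.0, 20.0, 30)
# Two channels with different spectra, a relative delay and a phase shift.
K = mosm_covariance(t, np.array([1.0, 0.7]), np.array([0.5, 0.6]),
                    np.array([0.05, 0.08]), np.array([0.0, 1.5]),
                    np.array([0.0, 0.3]))
# Constrained case: shared mean and variance, zero delay and phase.
K_same = mosm_covariance(t, np.ones(2), np.full(2, 0.5), np.full(2, 0.05),
                         np.zeros(2), np.zeros(2))
```

In the constrained case the cross-channel block of `K_same` coincides with the within-channel block, which is the single-kernel behaviour that the less flexible models impose.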
4 MOGP for financial time series
This section implements and validates the above-mentioned kernels on multi-channel financial time series. The experiments were performed using the multi-output Gaussian process toolkit (MOGPTK, https://github.com/GAMES-UChile/mogptk) [8], which contains a number of MOGP kernels and pre-training procedures. MOGPTK builds on GPflow [7], which is in turn backed by TensorFlow [1] and thus allows for automatic differentiation and the use of GPUs for computations (we used an 8GB Nvidia GeForce GTX 1080). Each trained model was run for 5 trials. Parameter initialisation for all MOGP kernels was achieved by estimating the power spectral density (PSD) of each channel using Bayesian non-parametric spectral estimation (BNSE) [31] and taking its peaks as the means of the spectral representation. The optimisation relied on L-BFGS-B with a fixed maximum number of iterations.
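The idea of initialising spectral means from PSD peaks can be sketched as follows. Note that this example swaps in a plain FFT periodogram for BNSE and assumes evenly spaced samples, so it only illustrates the peak-picking step; the function name and the synthetic signal are assumptions.

```python
import numpy as np

def periodogram_peaks(t, y, num_peaks):
    """Return the num_peaks strongest periodogram peaks as angular frequencies.

    Assumes evenly spaced samples; the periodogram stands in for BNSE here.
    """
    dt = t[1] - t[0]
    y = y - y.mean()
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=dt)            # cycles per unit time
    # Local maxima: bins larger than both neighbours (end bins excluded).
    interior = np.arange(1, len(power) - 1)
    is_peak = (power[interior] > power[interior - 1]) & \
              (power[interior] > power[interior + 1])
    peak_idx = interior[is_peak]
    strongest = peak_idx[np.argsort(power[peak_idx])[::-1][:num_peaks]]
    return 2.0 * np.pi * freqs[strongest]

# Synthetic channel with periods of 50 and 8 time steps.
t = np.arange(0.0, 200.0)
y = np.sin(2 * np.pi * t / 50.0) + 0.5 * np.sin(2 * np.pi * t / 8.0)
init_means = periodogram_peaks(t, y, num_peaks=2)
```

The returned frequencies, ordered by peak strength, would serve as starting values for the spectral means before gradient-based optimisation.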
The aims of the experiments are as follows: the first experiment models the correlation among gold, oil, NASDAQ, and the USD. The second experiment correlates ten currency exchanges with the USD.
4.1 Gold, Oil, NASDAQ, and USD index
We considered the co-movement and interdependence among gold, oil, stock markets, and the USD. It is known that gold can be used to offset losses in other assets such as declining currencies, especially against USD depreciation [23], and the two are therefore expected to correlate in some fashion. On the other hand, oil and the value of the USD are linked, as the price of a barrel of oil is globally expressed in USD. The value of the USD has been shown to correlate (albeit weakly) with oil, especially after the global financial crisis of 2008 [4, 24]. Additionally, any fluctuation in the price of crude oil will affect economies and supply chains that are energy dependent [13, 10]. We represent these market effects through the NASDAQ Composite index, as it covers a broad range of (mostly information technology) companies. Using these four financial series, which have been observed to influence one another, we can model the global underlying economic tendencies that affect these commodities and indicators.
We considered a dataset comprising series of gold and oil prices, the NASDAQ and the USD index (henceforth referred to as GONU) [15, 5, 19, 33], between January 2017 and December 2018 with a weekly granularity. We detrended and log-transformed the data signals and removed regions in each channel to mimic missing data. For oil we removed observations between 2018-10-05 and 2018-12-31, as well as a fraction of all observations at random. For gold we removed observations between 2018-07-01 and 2018-10-01. Finally, for the gold, NASDAQ and USD channels we removed a fraction of the observations at random. Overall, the data were split into training points and test points, with roughly five minutes of training time for the MOSM. We also set a Gaussian prior on the covariance magnitudes, with the standard deviation of the hyperparameter set to the maximum value of each channel.
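The preprocessing described above can be sketched on a synthetic weekly series. This is an assumed pipeline for illustration: the linear detrending on the log scale, the removed interval, and the 20% drop fraction are all hypothetical choices, and the helper names are not taken from any toolkit.

```python
import numpy as np

def log_detrend(t, prices):
    """Log-transform a price series and subtract a fitted linear trend."""
    log_p = np.log(prices)
    slope, intercept = np.polyfit(t, log_p, deg=1)
    return log_p - (slope * t + intercept)

def split_missing(t, start, end, drop_fraction, rng):
    """Boolean training mask: drop [start, end] and a random fraction elsewhere."""
    train = ~((t >= start) & (t <= end))
    candidates = np.flatnonzero(train)
    n_drop = int(round(drop_fraction * len(t)))
    dropped = rng.choice(candidates, size=n_drop, replace=False)
    train[dropped] = False
    return train

rng = np.random.default_rng(0)
t = np.arange(104.0)                                  # two years of weekly samples
# Synthetic price series with growth and a yearly cycle (not real data).
prices = 50.0 * np.exp(0.002 * t + 0.05 * np.sin(2 * np.pi * t / 52.0))
y = log_detrend(t, prices)
train_mask = split_missing(t, start=80.0, end=92.0, drop_fraction=0.2, rng=rng)
t_train, y_train = t[train_mask], y[train_mask]
t_test, y_test = t[~train_mask], y[~train_mask]
```

The removed points form the test set, so imputation quality is measured both inside a contiguous gap and at scattered locations.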
Fig. 1 shows a fit of the MOSM kernel. The MOSM model is able to capture the structure of the channels, with almost all data lying within the 95% confidence interval, even in regions with missing data, although the imputation for NASDAQ deviates. The related cross-correlation matrix is plotted in Fig. 2. Notice that the empirical cross-correlation matrix shows correlation between gold, oil, and NASDAQ, with an especially strong dependency between oil and NASDAQ, thus confirming our hypothesis. The hedging quality of gold can also be seen (albeit faintly) in the negative cross-correlation between gold and the USD index.
Our trained MOSM kernel recovers the more significant dependencies, such as the oil and gold correlation and the oil and NASDAQ correlation. In Fig. 1 these curves follow similar behaviour, which is especially apparent for oil and the NASDAQ. The USD is found to correlate more negatively with the other channels, as well as gold and the NASDAQ. It should be noted that the MOSM finds correlations by minimising the negative log-likelihood (NLL): if three channels correlate, the model could find correlation between the first and second, and between the second and third channels, but not necessarily between the first and third, which explains the discrepancies between kernel and empirical cross-correlations. Furthermore, the MOSM only uses part of the data, and depending on the number of parameters and the training it may not find all correlations. Table 2 (left) shows error values on the test set, comparing different models against the MOSM.
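That pairwise correlations need not chain across three channels can be checked with a small synthetic example. The three series below are artificial and only serve to illustrate the statistical point; they are unrelated to the GONU data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
first = rng.standard_normal(n)
third = rng.standard_normal(n)          # independent of `first`
second = first + third                  # shares variation with both

# Rows are channels; corr[i, j] is the empirical correlation of channels i and j.
corr = np.corrcoef(np.vstack([first, second, third]))
```

Here the first and second channels, and the second and third, correlate at roughly 0.7, while the first and third are close to uncorrelated.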
|Gold, Oil, NASDAQ, USD index||Currency exchange rates|
|Model||nMAE ()||nRMSE ()||nMAE ()||nRMSE ()|
|SM-LMC [11, 37]|
4.2 Exchange Rates
Much like the GONU data set, the movement of exchange rates among large currencies is due to international market changes and national macroeconomic factors. Exchange rates are heavily influenced by inflation and interest rates, trade, and economic performance. We chose ten exchange rates against the USD, namely the AUD, CAD, CHF, EUR, GBP, HKD, JPY, KRW, MXN, and NZD, using a daily granularity with data ranging from 2017-01-01 to 2017-12-31. For all channels, a percentage of the data points was removed at random. All channels have the last 40 days removed except for EUR, JPY, and AUD. The EUR, JPY, and AUD thus act as reference channels to predict the other currency exchanges. For some channels an additional range has been removed to simulate missing data. Overall, the data were split into training points and test points, and each trial took on the order of minutes for the MOSM.
Fig. 3 shows the currency exchange data set with a fit of the MOSM kernel. We see that the predictive posterior means at the removed tails follow the data quite closely. A possible reason why one channel recovers missing data better than others is that a strongly correlating channel is needed to impute the data. Notice that since the MOSM is a covariance-driven model, the EUR, JPY, and AUD channels can be used to reconstruct the other channels.
Fig. 4 shows how much the channels correlate with each other under the trained MOSM kernel. Among the EUR, GBP, and CHF channels we see a strong positive correlation, which is plausible as the EU is the major trading partner of the United Kingdom and Switzerland. Furthermore, we see that the HKD correlates negatively with the EUR, JPY, and AUD, while the AUD and JPY correlate positively. The correlation between AUD and NZD is hardly surprising, as these markets usually move quite similarly due to the geographic constraints of New Zealand.
We have presented and implemented the MOGP approach through the analysis of real-world financial time series. In particular, we have compared the performance over five trials of the MOSM, CSM, SM-IGP, and SM-LMC multi-output GP kernels, and we find that we are able to use the added flexibility of the MOSM to our advantage. A summary of kernel performance with respect to the normalised mean absolute error (nMAE) and normalised root mean square error (nRMSE) on the test points is given in Table 2, where we observe a general decrease in error for models that are more flexible. The MOSM shows lower error values, although it is also the most difficult model to train due to the number of extra parameters. With an appropriate choice of initialisation parameters it is, however, able to find better fits between the channels than the other models in terms of nMAE and nRMSE.
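For concreteness, the two error measures can be computed as in the sketch below. The normalisation is an assumption made for this example (division by the mean absolute value of the observations); other conventions, such as dividing by the range or the standard deviation, are equally common.

```python
import numpy as np

def nmae(y_true, y_pred):
    """Mean absolute error divided by the mean absolute observation (assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs(y_true - y_pred)) / np.mean(np.abs(y_true))

def nrmse(y_true, y_pred):
    """Root mean square error divided by the mean absolute observation (assumed)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / np.mean(np.abs(y_true))

# Toy values: errors of 0.5, 0.5 and 0.0 on observations with mean 4.
y_true = [2.0, 4.0, 6.0]
y_pred = [2.5, 3.5, 6.0]
err_mae = nmae(y_true, y_pred)     # (1/3) / 4
err_rmse = nrmse(y_true, y_pred)   # sqrt(1/6) / 4
```

Because both measures are scale-free, they can be averaged across channels whose raw values differ by orders of magnitude.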
The challenge of fitting volatile financial data is that unpredictable pattern deviations occur without precedent. While, for example, the GARCH model allows for modelling the heteroskedastic nature of financial data (i.e. the varying magnitude of volatility over time), the spectral kernels do not, as they are stationary by definition, which is one of their drawbacks. While we can extract some of the interdependencies between the channels, these cross-correlations are hard to train and prone to fluctuations between trials.
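The contrast with heteroskedastic models can be made concrete by simulating a GARCH(1,1) process, whose conditional variance depends on past shocks. This is a minimal sketch with hypothetical coefficients, included only to show the time-varying variance that a stationary kernel does not represent.

```python
import random

def simulate_garch11(n, omega, alpha, beta, seed=0):
    """Simulate returns r_t = sigma_t * z_t with
    sigma_t^2 = omega + alpha * r_{t-1}^2 + beta * sigma_{t-1}^2."""
    rng = random.Random(seed)
    var = omega / (1.0 - alpha - beta)      # start at the unconditional variance
    returns, variances = [], []
    for _ in range(n):
        r = (var ** 0.5) * rng.gauss(0.0, 1.0)
        returns.append(r)
        variances.append(var)
        # Large shocks raise the next conditional variance (volatility clustering).
        var = omega + alpha * r * r + beta * var
    return returns, variances

# Hypothetical coefficients with alpha + beta < 1 for a finite variance.
returns, variances = simulate_garch11(n=1000, omega=1e-5, alpha=0.1, beta=0.85)
```

The sequence `variances` changes from step to step, whereas a GP with a stationary kernel and constant noise assigns the same prior variance to every time point.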
Future work could include exploring financial data sets with non-Gaussian likelihoods by warping GPs as proposed by [26, 29], or by using Student's t-distribution likelihoods to better capture heteroskedasticity, as done by GARCH and other financial models. Furthermore, better initialisation of hyperparameters and better training can also greatly improve the results of the models, and should remain an active area of research. Overall, the ability of MOGPs to explore relations across channels could become a valuable asset in financial modelling and market dependency assessment.
References
[1] (2016) TensorFlow: a system for large-scale machine learning. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pp. 265–283.
[2] (1959) Lectures on Fourier Integrals (AM-42). Princeton University Press.
[3] (1986) Generalized autoregressive conditional heteroskedasticity. Journal of Econometrics 31 (3), pp. 307–327.
[4] (2008) Crude oil prices and the USD/EUR exchange rate. Monetary Policy & the Economy, pp. 102–121.
[5] Brent oil price. https://www.eia.gov/dnav/pet/hist/RBRTEd.htm. Accessed: 2019-09-01.
[6] (1940) On the theory of stationary random processes. Annals of Mathematics 41 (1), pp. 215–230.
[7] (2017) GPflow: a Gaussian process library using TensorFlow. Journal of Machine Learning Research 18 (40), pp. 1–6.
[8] (2020) MOGPTK: The Multi-Output Gaussian Process Toolkit. arXiv e-prints, arXiv:2002.03471.
[9] (1982) Econometrica 50 (4), pp. 987–1007.
[10] (2011) Dynamic correlation between stock market and oil prices: The case of oil-importing and oil-exporting countries. International Review of Financial Analysis 20 (3), pp. 152–164.
[11] (1997) Geostatistics for natural resources evaluation. Oxford University Press.
[12] (2015) Financial time series volatility analysis using Gaussian process state-space models. In IEEE Global Conference on Signal and Information Processing, pp. 358–362.
[13] (2009) The impact of oil price shocks on the US stock market. International Economic Review 50 (4), pp. 1267–1287.
[14] (1992) ARCH modeling in finance: a review of the theory and empirical evidence. Journal of Econometrics 52 (1), pp. 5–59.
[15] LBMA gold price. https://fred.stlouisfed.org/series/GOLDAMGBD228NLBM. Accessed: 2019-09-01.
[16] (1998) Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences 168, pp. 133–166.
[17] (2016) Financial signal processing and machine learning. John Wiley & Sons, Ltd.
[18] (2011) Multi-kernel Gaussian processes. In 22nd International Joint Conference on Artificial Intelligence, Vol. 2, pp. 1408–1413.
[19] NASDAQ price. https://finance.yahoo.com/quote/%5EIXIC/history?p=%5EIXIC. Accessed: 2019-09-01.
[20] (2017) Spectral mixture kernels for multi-output Gaussian processes. Advances in Neural Information Processing Systems 30, pp. 6681–6690.
[21] (1988) The excess co-movement of commodity prices. Economic Journal 100 (403), pp. 1173–1189.
[22] (2006) Gaussian Processes for Machine Learning. MIT Press.
[23] (2014) Can gold hedge and preserve value when the US dollar depreciates? Economic Modelling 39, pp. 168–173.
[24] (2012) Modelling oil price and exchange rate co-movements. Journal of Policy Modeling 34 (3), pp. 419–440.
[25] (2001) A measure of comovement for economic variables: theory and empirics. The Review of Economics and Statistics 83 (2), pp. 232–241.
[26] (2019) Compositionally-warped Gaussian processes. Neural Networks 118, pp. 235–246.
[27] (2017) A novel approach to forecasting financial volatility with Gaussian process envelopes. arXiv e-prints, arXiv:1705.00891.
[28] (2018) Machine learning for quantitative finance: fast derivative pricing, hedging and fitting. Quantitative Finance 18 (10), pp. 1635–1643.
[29] (2004) Warped Gaussian processes. In Advances in Neural Information Processing Systems 16, pp. 337–344.
[30] (2012) Interpolation of spatial data: some theory for kriging. Springer Science & Business Media.
[31] (2018) Bayesian nonparametric spectral estimation. Advances in Neural Information Processing Systems 31, pp. 10127–10137.
[32] (2019) Discovering latent covariance structures for multiple time series. Proceedings of the 36th International Conference on Machine Learning 97, pp. 6285–6294.
[33] Trade weighted USD-index against the currencies of a broad group of trading partners from January 1995 till August 2019. https://fred.stlouisfed.org/series/TWEXB. Accessed: 2019-09-01.
[34] (2012) Machine learning in financial crisis prediction: a survey. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42 (4), pp. 421–436.
[35] (2016) Gaussian process kernels for cross-spectrum analysis in electrophysiological time series. Ph.D. Thesis, Duke University.
[36] (2013) Gaussian process kernels for pattern discovery and extrapolation. In Proceedings of the 30th International Conference on Machine Learning, pp. 1067–1075.
[37] (2014) Covariance kernels for fast automatic pattern discovery and extrapolation with Gaussian processes. Ph.D. Thesis, University of Cambridge.
[38] (2005) Comovement. Journal of Financial Economics 75 (2), pp. 283–317.
[39] (2019) Financial applications of Gaussian processes and Bayesian optimization. SSRN Electronic Journal.