## 1 Introduction

The Gaussian process (GP) is a Bayesian nonparametric model for time series, that has had a significant impact in the machine learning community following the seminal publication of

(Rasmussen and Williams, 2006). GPs are designed through parametrizing a covariance kernel, meaning that constructing expressive kernels allows for an improved representation of complex signals. Recent advances extend the GP concept to multiple series (or channels), where both auto-correlations and cross-correlations among channels are designed jointly; we refer to these models as multi-output GP (MOGP) models. A key attribute of MOGPs is that appropriate cross-correlations allow for improved data-imputation and prediction tasks when the channels have missing data. Popular MOGP models include: i) the Linear Model of Coregionalization (LMC) (Goovaerts, 1997), ii) the Cross-Spectral Mixture (CSM) (Ulrich et al., 2015), iii) the Convolutional Model (CONV) (Alvarez and Lawrence, 2009), and iv) the Multi-Output Spectral Mixture (MOSM) (Parra and Tobar, 2017). Training MOGPs is challenging due to the large number of parameters required to model all the cross-correlations, and the fact that most of MOGP models are parametrized in the spectral domain, thus being prone to local minima. Therefore, a unified framework that implements these MOGPs is required both by the the GP research community as well as by those interested in practical applications for multi-channel data.The multi-output Gaussian process toolkit (MOGPTK) aims to address the need for an MOGP computational toolkit in the form of a Python package that implements the mentioned MOGP kernels and provides a natural way to train and use them. MOGPTK is built upon GPflow (Matthews and others, 2017), an extensive GP framework with a wide variety of implemented kernels, likelihoods and training strategies. GPflow is in turn built upon TensorFlow (Abadi and others, 2016)

, a framework that allows for the construction of computational graphs of tensors and operations which can be calculated on either CPUs or GPUs. Needless to say, GPU-training is much desired due to the ability of graphics cards to perform linear operations in parallel.

## 2 Existing MOGP libraries and scope of MOGPTK

Previous toolkits for MOGPs include GPmat (Lawrence and others, 2015) (University of Sheffield) through its module called multigp, a MATLAB library that includes sparse approximations and implements multi-output support through convolution processes (Alvarez and Lawrence, 2009). Another library is GPy (GPy, since 2012)

(University of Sheffield), a Python package that implements the Intrinsic Model of Coregionalization (IMC) and LMC kernels. More recently, GPyTorch (Cornell University) is a Python library for general GP modelling that uses PyTorch to facilitate faster training on GPUs

(Gardner and others, 2018). GPyTorch implements the LMC kernel and the multi-task kernel by Williams (2008). Lastly, GPflow, the framework upon which our work is based, also has multi-output support using the LMC kernel (Matthews and others, 2017).Neither of the above libraries implement the—by now standard—CSM, CONV or MOSM models described in Section 1. Critically, not all libraries even allow for different numbers of data points per channel. Furthermore, existing libraries give little emphasis on improving training through parameter initialization and they usually lack parameter interpretation. MOGPTK, conversely, facilitates the whole process of implementing an MOGP, from data loading and parameter initialization to model training and interpretation. Our toolkit also implements all main MOGP models mentioned in Section 1.

## 3 Functionality

The main pillars of MOGPTK are the included MOGP models, data handling, parameter initialization and parameter interpretation, each discussed below.

### 3.1 Models

MOGPTK considers a base MOGP kernel from which specific kernels are derived. The base kernel provides the functionality to split the input data into multiple channels and process them by sub-kernels. While single-channel kernels implemented in GPflow have input data of shape (with the total number of data points and the number of input dimensions), the MOGPTK base kernel has input data of shape , where the first column contains integers denoting the channel index to which the remaining columns correspond. Then, using the channel indices the base kernel splits the data into its different channels for the sub-kernels to operate. This allows us to manage different amounts of data points per channel, therefore, the total amount of data points can be express as , with the number of channels and the number of data points in channel .

### 3.2 Data handling

MOGPTK features general-purpose classes to perform common data-analysis operations effortlessly. Data can be loaded from various sources (e.g., CSV files, Pandas DataFrames, or generated using Python functions) and formatted if necessary. For instance, data containing date and/or time values can be automatically converted to structured (numerical) representations for compatibility with the rest of the toolkit. Data can also be pre-processed using included transformations such as detrending or logarithm among others, which can be applied in compositional manner in order for the models to be trained effectively. After training, the transformation can be reverted to the original domain in the same vein of Rios and Tobar (2019). Additionally, MOGPTK allows for removing data ranges to simulate missing data or sensor failure and the data can be easily plotted in time or spectral domain.

### 3.3 Parameter initialization

Training MOGPs can be challenging due to their large number of hyperparameters and highly complex objective function. In this sense, MOGPTK features two methods for setting appropriate initial conditions for the hyperparameters. The first one is based on training independent spectral mixture kernels

(Wilson and Adams, 2013)to individual channels and using their spectral Gaussian means and variances to compute the initial parameter values for the multi-output kernels. The second method utilizes the Bayesian Nonparametric Spectral Estimation (BNSE)

(Tobar, 2018), or the Lomb-Scargle method, to identify hyperparametes from the channels spectral content. Both methods are single-output and thus are trained independently on each channel, therefore, subsequent (maximum likelihood) model training can focus on training the inter-channel cross-correlations.### 3.4 Parameter interpretation

Besides training and prediction, MOGPTK also provides interpretation of hyperparameter values via visualization techniques. MOGPTK shows the correlations between channels for different kernels, in the particular case of spectral kernels (e.g., SM, MOSM, CSM, SM-LMC), this reveals the cross-spectral coupling between channels. For instance, when comparing stock values in financial applications we can usually observe correlating quarterly patterns due to the anticipated quarterly reports by stock investors as shown in de Wolff et al. (2020). Additionally, retail markets may also correlate monthly due to salaries being paid at that frequency and thus producing a spike in sales. The interpretation of such relationships is relevant for practical applications that require deeper understanding and analysis beyond mere imputation and extrapolation of data.

## 4 Example

We next provide a short example of how MOGPTK operates on an air-quality time series of four channels. The dataset Vito et al. (2008) contains hourly-average measurements of five metal oxide chemical sensors embedded in an *Air Quality Chemical Multisensor*. The data were collected in a polluted Italian city between March 2004 and February 2005. We considered the MOSM kernel, with three spectral components per channel, initialized using BNSE and optimized using BFGS.

Listing 1 shows the corresponding code. We first loaded the (pre-processed) dataset into MOGPTK creating a four-channel model, then for each channel we removed a range of data to simulate sensor failure and additionally removed 30% of the data points randomly. For improved training results, the internal data representation for all channels were linearly detrended and normalized to have zero mean and unit variance using the transformations provided in the toolkit. Next, we set up the MOSM kernel and initialized the parameters using BNSE. Fig. 1 shows the results of solely initialization (not training), where only the main sinusoidal components can be identified. Lastly, the MOSM model was trained using the L-BFGS-B optimizer. With its three components per channel, MOSM featured 65 hyperparameter to train using 439 training points. Training took less than two minutes on an average CPU and was faster when utilizing a GPU. Fig. 2

shows the results of the trained MOSM kernel, where the model was able to accurately interpolate and extrapolate beyond the observation range.

## 5 Availability and documentation

MOGPTK is released under the MIT license and thus it can be used in both open-source and commercial applications. The source code is publicly available on GitHub at

https://github.com/GAMES-UChile/mogptk/, where contributions are encouraged and issues can be raised. The repository contains tutorials and examples with real-word data in the form of Jupyter notebooks. Additionally, the API documentation describing all methods can be accessed at https://games-uchile.github.io/mogptk. MOGPTK requires atleast Python 3.6 and TensorFlow 2, and can be installed executing pip install mogptk.## Acknowledgements

We are thankful to the Center for Mathematical Modeling (Conicyt AFB #170001), without its invaluable support MOGPTK would only be an idea. We also thank Fondecyt-Iniciación #11171165. We would like to thank Cristóbal Silva, Gabriel Parra, Mario Garrido, and Victor Caro for their feedback on earlier versions of MOGPTK. This document has been formatted using Elsevier’s *elsarticle.cls* LaTeX document class.

## References

- TensorFlow: a system for large-scale machine learning. In Proc. of the 12th USENIX Symposium on Operating Systems Design and Implementation, pp. 265–283. Cited by: §1.
- Sparse convolved Gaussian processes for multi-output regression. In NeurIPS 21, pp. 57–64. Cited by: §1, §2.
- Gaussian process imputation of multiple financial time series. In IEEE ICASSP (to appear), Cited by: §3.4.
- GPyTorch: blackbox matrix-matrix Gaussian process inference with GPU acceleration. In NeurIPS 31, pp. 7576–7586. Cited by: §2.
- Geostatistics for natural resources evaluation. Oxford University Press. Cited by: §1.
- GPy: a Gaussian process framework in Python. Note: http://github.com/SheffieldML/GPy Cited by: §2.
- GPmat. Note: https://github.com/SheffieldML/GPmat Cited by: §2.
- GPflow: A Gaussian process library using TensorFlow. Journal of Machine Learning Research 18 (40), pp. 1–6. Cited by: §1, §2.
- Spectral mixture kernels for multi-output Gaussian processes. In NeurIPS 30, pp. 6681–6690. Cited by: §1.
- Gaussian Processes for Machine Learning. MIT Press. Cited by: §1.
- Compositionally-warped Gaussian processes. Neural Networks 118, pp. 235 – 246. Cited by: §3.2.
- Bayesian nonparametric spectral estimation. NeurIPS 31, pp. 10127–10137. Cited by: §3.3.
- GP kernels for cross-spectrum analysis. In NeurIPS 28, pp. 1999–2007. Cited by: §1.
- On field calibration of an electronic nose for benzene estimation in an urban pollution monitoring scenario. Sensors and Actuators B: Chemical 129 (2), pp. 750 – 757. External Links: ISSN 0925-4005 Cited by: §4.
- Multi-task Gaussian process prediction. In NeurIPS 20, pp. 153–160. Cited by: §2.
- Gaussian process kernels for pattern discovery and extrapolation. In ICML, pp. 1067–1075. Cited by: §3.3.