A Fully Natural Gradient Scheme for Improving Inference of the Heterogeneous Multi-Output Gaussian Process Model
A recent extension of multi-output Gaussian processes handles heterogeneous outputs by assuming that each output has its own likelihood function. It uses a vector-valued Gaussian process prior to jointly model the parameters of all likelihoods as latent functions drawn from a Gaussian process with a linear model of coregionalisation covariance. Using an inducing-points framework, the model obtains tractable variational bounds amenable to stochastic variational inference. Nonetheless, the strong conditioning between the variational parameters and the hyper-parameters burdens the adaptive gradient optimisation methods used in the original approach. To overcome this issue, we borrow ideas from variational optimisation and introduce an exploratory distribution over the hyper-parameters, which allows them to be inferred together with the variational parameters through a fully natural gradient optimisation scheme. We show that our optimisation scheme can reach better local optima, with higher test performance, than adaptive gradient methods or a hybrid strategy that partially uses natural gradients in cooperation with the Adam method. We compare the performance of the different methods on toy and real datasets.
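As a rough illustration of the variational-optimisation idea mentioned above, the sketch below minimises a toy one-dimensional objective by placing a Gaussian exploratory distribution q(theta) = N(mu, 1/precision) over the parameter and taking natural-gradient steps on the expected objective E_q[f(theta)]. It is a generic sketch under stated assumptions, not the scheme from the paper: the quadratic objective, the function names, and the step size are all hypothetical, and the paper's method acts jointly on Gaussian process hyper-parameters and variational parameters rather than on a single scalar.

```python
# Toy variational optimisation with a Gaussian exploratory distribution.
# Assumption: f(theta) = 0.5 * (theta - 3)^2 is a stand-in objective;
# it is NOT the variational bound of the heterogeneous multi-output GP.

def grad(theta):
    """First derivative of the toy objective."""
    return theta - 3.0


def hess(theta):
    """Second derivative of the toy objective (constant here)."""
    return 1.0


def natural_gradient_vo(mu=0.0, precision=1.0, step=1.0, n_iters=500):
    """Minimise E_q[f(theta)] over q = N(mu, 1/precision).

    A natural-gradient step in the Gaussian's natural parameters gives
        precision <- precision + step * E_q[f''(theta)]
        mu        <- mu - step * E_q[f'(theta)] / precision
    For this quadratic f, both expectations are exact when evaluated
    at the mean, so no Monte Carlo sampling is needed.
    """
    for _ in range(n_iters):
        exp_grad = grad(mu)  # E_q[f'(theta)], exact because f' is linear
        exp_hess = hess(mu)  # E_q[f''(theta)], exact because f'' is constant
        precision = precision + step * exp_hess
        mu = mu - step * exp_grad / precision
    return mu, 1.0 / precision


mu, var = natural_gradient_vo()
print(f"mean = {mu:.4f}, variance = {var:.6f}")
```

In this toy setting the mean drifts toward the minimiser at 3 while the variance of the exploratory distribution shrinks, so the search narrows as it converges. For a non-quadratic objective the two expectations would typically be estimated by sampling from q.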