Optimally adaptive Bayesian spectral density estimation


Abstract

This paper studies spectral density estimates obtained assuming a Gaussian process prior, with various stationary and non-stationary covariance structures, modelling the log of the unknown power spectrum. We unify previously disparate techniques from machine learning and statistics, applying various covariance functions to spectral density estimation, and investigate their performance and properties. We show that all covariance functions perform comparatively well, with the smoothing spline model in the existing AdaptSPEC technique performing slightly worse. Subsequently, we propose an improvement on AdaptSPEC based on an optimisation of the number of eigenvectors used. We show this improves on every existing method in the case of stationary time series, and describe an application to non-stationary time series. We introduce new measures of accuracy for the spectral density estimate, inspired by the physical sciences. Finally, we validate our models in an extensive simulation study and with real data, analysing autoregressive processes with known spectra, and sunspot and airline passenger data respectively.

1 Introduction and background

Spectral density estimation is the key technique for understanding the autocovariance structure of a stationary time series. Parametric techniques, such as MA and ARMA models Brockwell1991, have been used for many years, but tend to perform badly in the absence of appropriate assumptions concerning the underlying data. Even non-parametric methods, which do not assume the data has any particular structure, may introduce bias; for example, Welch's method Welch1967 requires a choice of the number of segments into which to divide the data. In this paper, we analyse non-parametric methods for spectral density estimation in a Bayesian framework, investigating and ameliorating their limitations. We begin with a discussion of the accomplishments and drawbacks of the literature from the statistics and machine learning communities, which is surprisingly disjoint.

Within the statistics literature, smoothing splines are one of the most important techniques for spectral density estimation. Wahba Wahba1980 provided an initial framework for using splines to model data, and numerous adaptations have been developed, such as B-splines Eilers1996, P-splines Wood2017 and others Wahba1990; Gu2002. Smoothing splines have been used specifically for spectral analysis in a range of implementations Guo2003; Cogburn1974; Ganngopadhyay1999. Within the literature of non-parametric Bayesian approaches to spectral density estimation, Carter and Kohn Kohn1997 placed a prior on the log spectrum with an integrated Wiener process, while Liseo Liseo2001 similarly modelled the spectrum with Brownian motion. More recently, there has been a focus on the evaluation of spectra for non-stationary processes, such as AdaptSPEC Rosen2012; Rosen2017. Building on work by Dahlhaus Dahlhaus1997 and Adak Adak1998, AdaptSPEC uses a smoothing spline covariance structure, divides a time series into locally stationary segments, and uses an eigendecomposition with 10 eigenvectors. Although there is literature in the statistics community using Gaussian Process (GP) priors as flexible prior distributions over functions Rosen2012, there has been a notable absence in the use of complex covariance structures or forms of kernel learning.

By contrast, GPs with complex covariance functions have been well studied in the machine learning community. Rasmussen and Williams Rasmussen2006 provide an overview of a variety of frameworks for modelling complex phenomena with combinations of covariance functions. Paciorek Paciorek2004 performs GP regression and investigates smoothness over a class of non-stationary covariance functions, while Plagemann Plagemann2008 utilises such functions within a Bayesian regression framework. More recently, non-parametric regression methods with automatic kernel learning have produced exciting results, including a GP regression framework for kernel learning Duvenaud2013, and other frameworks Lu2016; Wilson2013. However, Bayesian regression techniques focusing on spectral density estimation have attracted less interest from the machine learning community than the statistics community.

In this paper, we aim to bridge the two literatures outlined above, investigate the performance of machine learning inspired covariance functions in spectral density estimation, and improve on existing methods.

Spectral analysis for time series using GPs other than smoothing splines, in a Whittle likelihood framework Whittle1954; Whittle1957, is absent from both the statistics and machine learning literatures. The critical aim in inference via spectral density estimation is correctly estimating the amplitude of peaks at appropriate frequencies. Bayesian methods using priors with a global degree of smoothness may fail to find multiple peaks. The experiments presented in this paper demonstrate that often, and more commonly in real data scenarios, there is a need for more complex covariance structures such as those with non-stationary features. On the other hand, our experiments illustrate that complex covariance structures may take significantly longer to converge in our sampling scheme, and may overfit even complex spectra when used as the sole covariance function. Covariance functions with the flexibility to make abrupt changes, such as the neural network, combined with covariance functions known for their smoothness, such as the squared exponential, may generalise best.

This paper proposes a model for learning the log spectrum for autoregressive models and real world processes, where we compare the results and candidate spectral densities for stationary, non-stationary and combinations of covariance functions to estimate spectral patterns. The posterior predictive distribution of the data, and the posterior distribution of the hyperparameters, are estimated using an optimisation-initialised adaptive Metropolis-Hastings algorithm.

We compare our results with the spectral density estimation of AdaptSPEC, in which a Wiener process prior was placed over the unknown log spectrum. We compare different covariance functions, and investigate the smoothness provided by each, with the smoothing spline covariance structure in AdaptSPEC as our point of comparison. Our analysis indicates that the smoothing spline covariance structure alone may not provide the smoothness previously thought, and that smoothness, in AdaptSPEC at least, is induced via modelling procedures, specifically an eigendecomposition.

In addition, we demonstrate an improvement over AdaptSPEC when the number of eigenvectors is treated as a variable and optimised. In this case, its performance generalises just as well as other covariance functions from the machine learning literature. Our findings suggest that covariance functions combining stationary and non-stationary features may generalise best across various spectral densities, and that our optimisation of the number of eigenvectors performs even better on more complex spectra. We validate our models in a simulation study, comparing GP spectral density estimates against the theoretical spectra of respective autoregressive (AR) processes, and apply the proposed class of GP spectral density models to two real data sets. Using both original and existing validation metrics, we demonstrate improved accuracy and more visually plausible spectral density estimates.

There are several key contributions of this paper:

  1. We bridge the statistics and machine learning literatures by applying new covariance structures in the Whittle likelihood framework for spectral density estimation. We compare and investigate the source of smoothness, provided by either the covariance functions or the modelling procedures, such as the eigendecomposition. We compare the performance of these covariance structures, and discuss the scenarios in which individual covariance functions or mixtures of covariance functions perform best. Our real data section is crucial here, as we believe there is an over-reliance in the statistics literature on validation against AR processes. Moreover, rather than merely optimising the hyperparameters of various covariance functions, as is often done in the machine learning literature, we quantify the uncertainty over the choice of hyperparameters and the spectrum within a Markov chain Monte Carlo (MCMC) scheme.

  2. The number of eigenvectors in the AdaptSPEC algorithm is hard-coded, not estimated from the data. We demonstrate improvement in all spectral density estimates by the AdaptSPEC model when the Whittle likelihood is optimised with respect to the number of eigenvectors. In fact, we demonstrate that when the appropriate number of eigenvectors is used, the AdaptSPEC model generalises as well as sophisticated covariance structures in the machine learning literature. We investigate this improvement in both the stationary and non-stationary cases.

  3. There is little consistency in the validation metrics used in spectral density estimation experiments in the statistics community. We propose new metrics to validate spectral density estimates, which better reflect factors that are important when making inference about a spectrum. We include an extensive simulation study in our results. For our analysis of sunspots and airline passenger data, we perform an original transformation to make the latter stationary, and analyse the resulting time series.

This paper is organized as follows: Section 2 describes the proposed model, priors and framework for estimation and inference. We also investigate the AdaptSPEC model and how the algorithm can be significantly improved by estimating the appropriate number of eigenvectors from the data. Sections 3 and 4 display the results from our experiments on simulated and real data respectively. We conclude in Section 5.

2 Mathematical model

2.1 Spectral density estimation

Let $\{X_t\}_{t=1}^{n}$ be a discrete real valued time series.

Definition 2.1.

$\{X_t\}$ is stationary if

  1. Each $X_t$ is integrable with a finite common mean $\mathbb{E}(X_t) = \mu$ for each $t$. By subtracting the mean, we may assume henceforth $\mathbb{E}(X_t) = 0$.

  2. The autocovariance $\mathbb{E}(X_t X_{t+k})$ is a function only of $k$, which we denote $\gamma(k)$.

Spectral analysis allows us to study the second-order properties of a stationary time series expressed in the autocovariance structure. The power spectral density function (PSD) is defined as follows:

$$f(\omega) = \frac{1}{2\pi} \sum_{k=-\infty}^{\infty} \gamma(k) e^{-ik\omega}, \quad \omega \in (-\pi, \pi].$$
Most important are the Fourier frequencies $\omega_j = 2\pi j / n$. Define the discrete Fourier transform (DFT) of the time series:

$$d(\omega_j) = \frac{1}{\sqrt{n}} \sum_{t=1}^{n} X_t e^{-it\omega_j}.$$

Whittle initially showed that under certain conditions, the DFTs have an approximately independent complex normal (CN) distribution Whittle1954; Whittle1957:

$$d(\omega_j) \sim \mathcal{CN}\big(0, f(\omega_j)\big).$$
For each Fourier frequency component, a noisy but unbiased estimate of the PSD is the periodogram Choudhuri2004, defined by $I(\omega_j) = |d(\omega_j)|^2$. By symmetry, the periodogram contains $m = \lfloor (n-1)/2 \rfloor$ effective observations, corresponding to $\omega_1, \dots, \omega_m$. Rosen et al. Rosen2009; Rosen2012 outline a signal plus noise representation of the periodogram:

$$\log I(\omega_j) = \log f(\omega_j) + \varepsilon_j, \qquad j = 1, \dots, m, \tag{1}$$

where $\exp(\varepsilon_j)$ approximately follows a standard exponential distribution.

Indeed, for a wide class of theoretical models, Theorem 10.3.2 of Brockwell1991 proves that the vector of quotients

$$\left( \frac{I(\omega_1)}{f(\omega_1)}, \dots, \frac{I(\omega_m)}{f(\omega_m)} \right)$$

converges in distribution to a vector of independent standard exponential random variables. For our applications, we assume the quotient $I(\omega_j)/f(\omega_j)$ is approximately exponentially distributed with mean 1. This is a simpler representation of the second moment of the distribution, reducing the problem of covariance estimation to a simpler problem of mean estimation. Note the periodogram oscillates around the true spectral density, so there is a delicate balance between inferring the spectrum of a process and excessive smoothing, which leads to negligible inference.
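As a concrete illustration, the following is a minimal sketch (not the authors' code) of the quantities defined above, computed with NumPy's FFT under the $n^{-1/2}$ scaling convention:

```python
import numpy as np

def log_periodogram(x):
    """Fourier frequencies in (0, pi) and the log periodogram of a series x."""
    n = len(x)
    d = np.fft.fft(x) / np.sqrt(n)                 # DFT with n^(-1/2) scaling
    I = np.abs(d) ** 2                             # periodogram |d(omega_j)|^2
    m = (n - 1) // 2                               # effective observations by symmetry
    freqs = 2 * np.pi * np.arange(1, m + 1) / n    # omega_j = 2*pi*j/n
    return freqs, np.log(I[1 : m + 1])
```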

2.2 Model and priors

Definition 2.2.

A Gaussian process with mean function $m(\cdot)$ and covariance function $k(\cdot, \cdot)$ for some process $g$ is written $g \sim \mathcal{GP}(m, k)$. A GP is entirely determined by its mean and covariance Rasmussen2006; Paciorek2004; Plagemann2008.

Most Bayesian methods for spectral density estimation exploit the Whittle likelihood function. Following Choudhuri2004; Rosen2012, we use the Whittle likelihood to model the periodogram within a regression framework.

Let $y = (y_1, \dots, y_m)$ and $g = (g_1, \dots, g_m)$ be the log of the periodogram and the log of the PSD respectively. The likelihood of the log periodogram given the true spectrum can be approximated by

$$p(y \mid g) \propto \exp\left\{ \sum_{j=1}^{m} \left[ (y_j - g_j) - e^{y_j - g_j} \right] \right\}.$$
Following Rosen2009; Rosen2012 we rewrite equation 1 as

$$y_j = g_j + \varepsilon_j, \qquad \varepsilon_j = \log E_j, \quad E_j \overset{iid}{\sim} \mathrm{Exp}(1).$$
To obtain a flexible estimate of the spectrum, we place a GP prior over the unknown function $g$. That is, $g \sim \mathcal{GP}(0, k)$; note this is determined by its covariance function $k$. The literature to date has used smoothing splines Wahba1980 or other spline varieties for this covariance structure. This has proved suitable for smooth processes with mild peaks such as autoregressive processes, against which statisticians frequently validate their models. However, real data, such as the data examined in this paper, tend to exhibit more complicated spectra, displaying sharp peaks and subsequent harmonics, which conceivably could be difficult to estimate using stationary covariance functions. In the following simulation and real data study, we use a variety of covariance functions including the squared exponential, two Matérn variants, the neural network, and combinations of these such as the squared exponential + neural network (SQE+NN).
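The Whittle likelihood above is simple to evaluate in practice. A minimal sketch, assuming the exponential representation of equation 1:

```python
import numpy as np

def whittle_log_likelihood(y, g):
    """Whittle log likelihood of the log periodogram y given the log PSD g,
    from y_j = g_j + eps_j with exp(eps_j) ~ Exp(1)."""
    return np.sum((y - g) - np.exp(y - g))
```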

2.3 Estimation and inference using Monte Carlo methods

We take a Bayesian approach and estimate the log of the spectral density by its posterior mean via an adaptive MCMC algorithm:

$$\hat{g}(\omega) = \frac{1}{M} \sum_{i=1}^{M} g^{(i)}(\omega).$$

Here, $M$ is the number of post burn-in iterations in the MCMC scheme; $g^{(1)}, \dots, g^{(M)}$ are samples taken from the posterior distribution $p(g \mid y)$; the initial hyperparameter vector is taken from a Gaussian distribution centred around the value that maximises the log marginal likelihood; the initial step size is chosen arbitrarily; and $\hat{g}$ is the forecast log spectrum.

We also generate and include error bounds around the observed data periodogram. The quotient $I(\omega_j)/f(\omega_j)$ has approximate $\mathrm{Exp}(1)$ distribution Whittle1954; Whittle1957, and hence cumulative distribution function $F(x) = 1 - e^{-x}$. Solving $F(x_\alpha) = \alpha$ yields $x_\alpha = -\log(1 - \alpha)$, so the $\alpha$-quantile of the log periodogram around the log PSD is $g(\omega_j) + \log(-\log(1 - \alpha))$. With this in mind, we include the resulting lower and upper uncertainty bounds in our algorithm and final plots, which reflect the underlying noise in the data of the log periodogram random variable.
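A minimal sketch of these pointwise bounds; the 95% level used as a default here is an assumption, as the paper does not state the level:

```python
import numpy as np

def log_periodogram_bounds(g_hat, alpha=0.05):
    """Pointwise exponential-quantile bounds for the log periodogram
    around an estimated log PSD g_hat."""
    lo = g_hat + np.log(-np.log(1 - alpha / 2))   # lower Exp(1) quantile, logged
    hi = g_hat + np.log(-np.log(alpha / 2))       # upper Exp(1) quantile, logged
    return lo, hi
```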

Monte Carlo algorithms have been highly prominent for estimation of hyperparameters and spectral density in a nonparametric Bayesian framework. Metropolis Metropolis1953 first proposed the Metropolis algorithm; this was generalised by Hastings in a more focused, statistical context Hastings1970. The random walk Metropolis-Hastings algorithm aims to sample from a target density $\pi(\theta)$, given some candidate transition probability function $q(\theta^{*} \mid \theta)$. In our context, $\pi$ represents the Whittle likelihood function multiplied by respective priors. The acceptance ratio is:

$$\alpha(\theta, \theta^{*}) = \min\left\{ 1, \frac{\pi(\theta^{*})\, q(\theta \mid \theta^{*})}{\pi(\theta)\, q(\theta^{*} \mid \theta)} \right\}.$$

Our MCMC scheme calculates the acceptance rate every 50 iterations; based on an ideal acceptance rate Roberts2004, the step size is adjusted for each hyperparameter. The power spectrum and hyperparameters for the GP covariance function are calculated in each iteration of our sampling scheme and integrated over. Our proposed MCMC scheme is below:

1: Initialize $\theta^{(0)}$. Throughout, $\theta$ and $\theta^{*}$ denote current and proposed values.
2: $\theta$ are the GP kernel hyperparameters. Priors are centred around the value maximising the log marginal likelihood.
3: $g$ is the current draw from the GP posterior modelling the log PSD.
4: $\sigma_{\text{step}}$ is the perturbation (step size) for the random walk proposal.
5: $\gamma$ is the adaptive adjustment for the step size, based on the trailing acceptance ratio.
6: $K(X, X)$, $K(X, X_{*})$ and $K(X_{*}, X_{*})$ refer to a respective kernel, where $X$ refers to training data and $X_{*}$ refers to test data, following Rasmussen2006.
7: $y$ is the log periodogram (data).
8: for $i = 1, \dots, N$ do
9:     $K_{\theta}$ is a candidate covariance function given GP hyperparameters $\theta$.
10:    $\mu = K_{\theta}(X_{*}, X) K_{\theta}(X, X)^{-1} y$
11:    $\Sigma = K_{\theta}(X_{*}, X_{*}) - K_{\theta}(X_{*}, X) K_{\theta}(X, X)^{-1} K_{\theta}(X, X_{*})$
12:    Propose $\theta^{*} \sim N(\theta, \sigma_{\text{step}}^{2})$, a Gaussian centred on $\theta$.
13:    $\mu^{*} = K_{\theta^{*}}(X_{*}, X) K_{\theta^{*}}(X, X)^{-1} y$
14:    $\Sigma^{*} = K_{\theta^{*}}(X_{*}, X_{*}) - K_{\theta^{*}}(X_{*}, X) K_{\theta^{*}}(X, X)^{-1} K_{\theta^{*}}(X, X_{*})$
15:    Propose $u \sim \mathrm{Uniform}(0, 1)$
16:    if $u < \min\{1, [p(y \mid \theta^{*})\, p(\theta^{*})] / [p(y \mid \theta)\, p(\theta)]\}$ then
17:        $\theta \leftarrow \theta^{*}$
18:        $\mu \leftarrow \mu^{*}$
19:        $\Sigma \leftarrow \Sigma^{*}$, where $K_{\theta^{*}}$ is the kernel under $\theta^{*}$
20:        draw $g^{*} \sim N(\mu, \Sigma)$
21:    else
22:        retain $\theta$
23:        retain $\mu$
24:        retain $\Sigma$, where $K_{\theta}$ is the kernel under $\theta$
25:        draw $g^{*} \sim N(\mu, \Sigma)$
26:    if $u < \min\{1, p(y \mid g^{*}) / p(y \mid g)\}$ then
27:        $g \leftarrow g^{*}$
28:    else
29:        retain $g$
30:    if acceptance rate $< 0.234$ then
31:        decrease $\sigma_{\text{step}}$ by $\gamma$
32:    else
33:        increase $\sigma_{\text{step}}$ by $\gamma$
34:    Determine trailing acceptance rate over the past 50 values and adjust step-size accordingly.
35: end for
36: $\hat{g}$ = median of post burn-in draws $g^{(i)}$
37: $\hat{\theta}$ = median of post burn-in draws $\theta^{(i)}$
Algorithm 1 Adaptive MCMC algorithm: sample log PSD, GP hyperparameters and uncertainty bounds from respective posterior distributions.
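A minimal, self-contained sketch of one possible implementation of this scheme follows. It simplifies Algorithm 1: a squared exponential kernel only, flat priors on the log hyperparameters, Gaussian noise of variance $\pi^2/6$ (the variance of $\log \mathrm{Exp}(1)$) inside the GP posterior, and the Whittle likelihood evaluated at the conditional posterior mean. None of these choices are taken from the paper.

```python
import numpy as np

def sq_exp_kernel(x, xp, ell, sig2):
    """Squared exponential covariance matrix."""
    d = x[:, None] - xp[None, :]
    return sig2 * np.exp(-0.5 * (d / ell) ** 2)

def gp_posterior(freqs, y, theta, noise=np.pi ** 2 / 6):
    """Posterior mean/covariance of the log PSD at the observed frequencies."""
    ell, sig2 = np.exp(theta)                       # sample on the log scale
    n = len(y)
    K = sq_exp_kernel(freqs, freqs, ell, sig2)
    L = np.linalg.cholesky(K + (noise + 1e-8) * np.eye(n))
    solve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    mu = K @ solve(y)
    cov = K - K @ solve(K) + 1e-8 * np.eye(n)       # jitter for sampling
    return mu, cov

def whittle_ll(y, g):
    return np.sum((y - g) - np.exp(y - g))

def adaptive_mh(freqs, y, n_iter=10_000, burn=5_000, target=0.234):
    theta, step = np.zeros(2), 0.1                  # log(ell), log(sig2)
    mu, cov = gp_posterior(freqs, y, theta)
    ll = whittle_ll(y, mu)
    draws, accepts = [], 0
    for i in range(1, n_iter + 1):
        theta_star = theta + step * np.random.randn(2)
        mu_s, cov_s = gp_posterior(freqs, y, theta_star)
        ll_star = whittle_ll(y, mu_s)
        # Flat priors: the Metropolis ratio reduces to the likelihood ratio.
        if np.log(np.random.rand()) < ll_star - ll:
            theta, mu, cov, ll = theta_star, mu_s, cov_s, ll_star
            accepts += 1
        if i % 50 == 0:                             # adapt toward 0.234 rate
            step *= 1.1 if accepts / 50 > target else 0.9
            accepts = 0
        if i > burn:
            draws.append(np.random.multivariate_normal(mu, cov))
    return np.median(np.array(draws), axis=0)       # posterior median log PSD
```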

2.4 Kernel definitions

In the table below, S/NS signify stationary and non-stationary respectively.

Name                                     Hyperparameters                    S/NS
Spline*                                  —                                  NS
Squared exponential                      lengthscale, signal variance       S
Matérn                                   lengthscale, signal variance       S
Matérn                                   lengthscale, signal variance       S
Multilayer perceptron (neural network)   weight and bias variances (tanh)   NS

*This is a Brownian motion type prior.
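Sketches of some of these covariance functions, using standard parametrisations that may differ from the paper's; the Matérn smoothness 3/2 below is one common choice, and the neural network kernel is given in the arcsine form of Williams (1998), where the paper lists a tanh nonlinearity:

```python
import numpy as np

def squared_exponential(x, xp, ell, sig2):
    d = x[:, None] - xp[None, :]
    return sig2 * np.exp(-0.5 * (d / ell) ** 2)

def matern32(x, xp, ell, sig2):
    """Matern kernel with smoothness 3/2 (one common choice)."""
    r = np.abs(x[:, None] - xp[None, :]) / ell
    return sig2 * (1 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)

def neural_network(x, xp, sig0, sig1):
    """Non-stationary 'neural network' (MLP) kernel, arcsine form."""
    s = lambda a, b: sig0 ** 2 + sig1 ** 2 * np.outer(a, b)
    den = np.sqrt((1 + 2 * np.diag(s(x, x)))[:, None]
                  * (1 + 2 * np.diag(s(xp, xp)))[None, :])
    return (2 / np.pi) * np.arcsin(2 * s(x, xp) / den)

def sqe_plus_nn(x, xp, ell, sig2, sig0, sig1):
    """The SQE+NN combination used in the experiments."""
    return squared_exponential(x, xp, ell, sig2) + neural_network(x, xp, sig0, sig1)
```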

2.5 AdaptSPEC model analysis and improvement

The AdaptSPEC algorithm performs change point detection and spectral density estimation via an eigendecomposition. The log PSD is modelled by a Gaussian process with covariance matrix $\Sigma$. The eigendecomposition of $\Sigma$ is a factorization $\Sigma = Q \Lambda Q^{T}$; the model only keeps the basis functions corresponding to the 10 largest eigenvalues for computational speed. Let $\Lambda_{10}$ be the truncated diagonal matrix; then let $X = Q \Lambda_{10}^{1/2}$ be the design matrix and $\beta$ be a vector of regression coefficients. Then $X\beta$ has the required distribution.

The precise fit of an eigendecomposition is a function of the number of eigenvalues used, as well as the flexibility of the basis functions within any candidate covariance function. A lower number of eigenvalues produces computational savings, as well as a smoother estimate of the spectrum. Unfortunately, the chosen number of eigenvalues in AdaptSPEC is hardcoded at 10, and frequently underfits the data. In Figures 1(e), 1(f), 4(b), 4(c) and 4(d), the spectral density estimation underestimates the amplitude at peaks in the data; in Figures 6(e) and 6(f) it drastically underfits the many peaks in the complex spectrum. To produce the best fit of the data, we maximise the marginal likelihood with respect to the number of eigenvalues.

Definition 2.3.

The optimal number of eigenvalues $J^{*}$ is chosen by analysing posterior samples of the log spectrum and integrating over all possible values of regression coefficients to maximise the likelihood:

$$J^{*} = \operatorname*{argmax}_{J} \int p(y \mid \beta_{J}, J)\, p(\beta_{J})\, d\beta_{J}, \tag{2}$$

where $y$ is the log periodogram data, $X_{J}$ is the design matrix under an eigendecomposition using $J$ eigenvalues, and $\beta_{J}$ is a vector of coefficients determining the weight for respective eigenvectors.
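A sketch of this selection under a Gaussian working likelihood, a stand-in for the Whittle likelihood that keeps the integral over $\beta_J$ in closed form; the prior variance tau2 is hypothetical, and K denotes the GP covariance matrix:

```python
import numpy as np
from scipy.stats import multivariate_normal

def choose_n_eigenvectors(y, K, tau2=1.0, noise=np.pi ** 2 / 6, j_max=50):
    """Number of eigenvectors maximising an approximate marginal likelihood."""
    evals, evecs = np.linalg.eigh(K)
    order = np.argsort(evals)[::-1]                # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    best_j, best_ll = 1, -np.inf
    for j in range(1, min(j_max, len(y)) + 1):
        X = evecs[:, :j] * np.sqrt(np.clip(evals[:j], 0, None))  # design matrix
        cov = tau2 * X @ X.T + noise * np.eye(len(y))             # beta integrated out
        ll = multivariate_normal(np.zeros(len(y)), cov).logpdf(y)
        if ll > best_ll:
            best_j, best_ll = j, ll
    return best_j
```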

3 Simulation study

In this section, we study the frequentist properties of our estimators with a detailed simulation study. We simulate three stationary autoregressive processes, AR(1), AR(2) and AR(4), with known power spectra, and validate our spectral density estimates against these known spectra. For each covariance function analysed, we run 50 simulations with our adaptive MCMC scheme. Each scheme consists of 10 000 iterations, with 5000 used for burn-in. Each process contains $n$ data points. Following the conventions of Section 2, we set $m = \lfloor (n-1)/2 \rfloor$ and consider Fourier frequencies $\omega_1, \dots, \omega_m$. Removing redundancy, this is the appropriate range of the periodogram and spectrum. As such, the range of the frequency axis in all plots is $(0, \pi)$. Henceforth, $\varepsilon_t$ denotes a Gaussian white noise random variable.
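For reference, a sketch of simulating an AR(p) process and evaluating its analytic log PSD under the $1/(2\pi)$ convention used above; the coefficient below is hypothetical, as the paper's values are not reproduced here:

```python
import numpy as np

def simulate_ar(phi, n, burn=500):
    """Simulate an AR(p) process with standard Gaussian innovations."""
    p = len(phi)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        x[t] = np.dot(phi, x[t - p : t][::-1]) + np.random.randn()
    return x[burn:]

def ar_log_psd(phi, freqs, sigma2=1.0):
    """Analytic log PSD of an AR(p) process at angular frequencies in (0, pi)."""
    k = np.arange(1, len(phi) + 1)
    transfer = 1 - np.exp(-1j * np.outer(freqs, k)) @ np.asarray(phi)
    return np.log(sigma2 / (2 * np.pi)) - 2 * np.log(np.abs(transfer))

# Hypothetical AR(1) example:
x = simulate_ar([0.9], 1024)
freqs = 2 * np.pi * np.arange(1, 512) / 1024
g_true = ar_log_psd([0.9], freqs)
```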

3.1 Validation metrics

As seen in Table 1, we use five metrics to compare our spectral density estimates $\hat{g}$ with the known analytic log PSD $g$. First, root mean squared error (RMSE) is an averaged $L^2$ norm of $\hat{g} - g$, namely $\sqrt{\frac{1}{m}\sum_{j}(\hat{g}_j - g_j)^2}$. Mean absolute error (MAE) is an analogous averaged $L^1$ norm, $\frac{1}{m}\sum_{j}|\hat{g}_j - g_j|$. Third, we include the Wasserstein distance between the uniform probability measures on $\{\hat{g}_j\}$ and $\{g_j\}$ respectively, an effective measure between finite sets. In practical settings, spectral densities often exhibit a single dominant peak that is of the greatest importance to observing scientists, such as the base peak of a mass spectrometry reading ToddChem. Such a peak has two quantities, its value and the frequency at which it occurs. With this in mind, we introduce two validation metrics measuring how well the spectral density estimate determines the frequency and amplitude of the greatest peak.
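One plausible reading of these five metrics as code; the exact definitions of the two peak metrics are our reconstruction from the description above:

```python
import numpy as np

def spectral_metrics(g_hat, g_true, freqs):
    """RMSE, MAE, 1-D Wasserstein, and the two peak metrics."""
    rmse = np.sqrt(np.mean((g_hat - g_true) ** 2))
    mae = np.mean(np.abs(g_hat - g_true))
    # 1-Wasserstein distance between the empirical measures on the values.
    wass = np.mean(np.abs(np.sort(g_hat) - np.sort(g_true)))
    j_hat, j_true = np.argmax(g_hat), np.argmax(g_true)
    freq_err = np.abs(freqs[j_hat] - freqs[j_true])     # peak-frequency error
    amp_err = np.abs(g_hat[j_hat] - g_true[j_true])     # peak-amplitude error
    return rmse, mae, wass, freq_err, amp_err
```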

Figure 1: (a)-(c) Log likelihood vs number of eigenvectors for AR(1), AR(2) and AR(4); (d)-(f) spline estimates with 10 and the optimal number of eigenvectors for the AR(1), AR(2) and AR(4) spectra.
AR(1) model

Covariance function        RMSE   MAE    Wasserstein   Peak freq. error   Peak amp. error
Spline (10 eigenvectors)   0.26   0.20   0.18          0                  0.90
Spline (optimal)           0.25   0.19   0.16          0                  0.93
Squared exponential        0.16   0.12   0.12          0                  0.49
Matérn                     0.19   0.15   0.13          0                  0.51
Matérn                     0.19   0.14   0.12          0                  0.49
Neural network             0.15   0.11   0.11          0                  0.34
SQE+NN                     0.18   0.14   0.13          0                  0.40

AR(2) model

Covariance function        RMSE   MAE    Wasserstein   Peak freq. error   Peak amp. error
Spline (10 eigenvectors)   0.33   0.26   0.24          0.0020             0.73
Spline (optimal)           0.21   0.17   0.14          0.0020             0.36
Squared exponential        0.22   0.17   0.15          0.0020             0.56
Matérn                     0.21   0.17   0.14          0.0020             0.24
Matérn                     0.19   0.15   0.13          0.0020             0.37
Neural network             0.32   0.24   0.22          0.0040             0.85
SQE+NN                     0.21   0.17   0.15          0.0020             0.53

AR(4) model

Covariance function        RMSE   MAE    Wasserstein   Peak freq. error   Peak amp. error
Spline (10 eigenvectors)   0.82   0.63   0.54          0.20               2.23
Spline (optimal)           0.43   0.33   0.26          0.20               1.57
Squared exponential        0.45   0.31   0.27          0.20               1.98
Matérn                     0.37   0.27   0.21          0.20               1.75
Matérn                     0.40   0.29   0.24          0.20               1.90
Neural network             0.97   0.75   0.41          0.21               2.12
SQE+NN                     0.45   0.32   0.26          0.20               2.06

Table 1: Results for synthetic data experiments. Median error over 50 simulations is recorded. The last two columns are the peak-frequency and peak-amplitude metrics introduced in Section 3.1.

3.2 AR(1)

First, we generate data from an AR(1) process. We calculate the log periodogram from the observed data and estimate the log PSD and GP hyperparameters with our MCMC scheme. Table 1 indicates that the best performing covariance functions are the squared exponential and neural network. The log PSD has high power at low frequency and gradually declines in power when moving toward higher frequency components. The spectrum does not exhibit any sharp peaks, and it is unsurprising that the highly smooth squared exponential performs well in estimating the log PSD, as shown in Figure 2(a). The worst performing covariance function is the smoothing spline with 10 basis functions, the same as AdaptSPEC. When optimising the marginal Whittle likelihood for the number of basis functions, there is limited improvement in the spline model, as seen in Figure 1(d). This experiment suggests that for a simple spectrum, there is limited benefit in optimising the number of basis functions in the smoothing spline, and alternative GP covariance functions provide superior performance.

3.3 AR(2)

Next, we generate data from an AR(2) process, compute the log periodogram and estimate the log PSD and GP hyperparameters with the proposed MCMC scheme. Table 1 indicates that several covariance functions perform similarly well: SQE+NN (Figure 2(b)), squared exponential, both Matérn variants, and the optimised spline. The neural network and the spline with 10 basis functions perform worse. The log PSD has a relatively flat gradient and a mild peak. Once again, smoother covariance functions tend to perform best, while the neural network covariance function, popularised for its ability to model abrupt changes in data modelling problems, is a poor performer. Figure 1(e) demonstrates a substantial improvement in estimating the log PSD when an optimal number, larger than 10, of basis functions is used. In particular, the estimate improves at detecting the maximum amplitude of the power spectrum.

3.4 AR(4)

Finally, we follow Edwards2018 and generate data from an AR(4) process, again estimating the log PSD and GP hyperparameters with the MCMC scheme. The log PSD of the AR(4) process is more complex than in the prior two experiments. First, there are two peaks rather than one, of different amplitudes. Second, the spectrum changes more abruptly than the AR(1) and AR(2) spectra. Table 1 indicates that the best performing covariance functions are the smoothing spline with an optimal number of basis functions and the two Matérn covariance functions (seen in Figure 2(c)). The worst performing covariance functions are the smoothing spline with 10 basis functions and the neural network. Once again we see a substantial improvement in estimation when an optimal number of basis functions is used (Figure 1(f)).

For all three AR processes, note Figures 1(a), 1(b) and 1(c) for the choice of the optimal number of basis functions. There we see that consistently more than 10 basis functions are required to best model the data. This is concerning, given that AR processes have spectra that are in general much less complicated than the multiple periodic components of real world data.

Figure 2: Best performing covariance functions, analytic spectrum, periodogram and uncertainty bounds for each autoregressive process: (a) AR(1) and squared exponential; (b) AR(2) and SQE+NN; (c) AR(4) and Matérn.

3.5 Non-stationary data

In this section, we propose a methodological improvement for estimating a time varying power spectrum using a reversible jump MCMC algorithm, building on the process in AdaptSPEC Rosen2017. This algorithm involves both change point detection and spectral density estimation; previously we had only been experimenting with spectral density estimation. In the previous section, our simulation study demonstrated that optimising the number of basis functions associated with the smoothing spline covariance function generalised best across a range of AR processes. In particular, the optimised smoothing spline provided markedly improved estimation of the amplitude of peak frequency components. We generate a piecewise autoregressive process consisting of three locally stationary segments of equal length, displayed in Figure 3.

In this experiment, we similarly optimise the number of basis functions based on the samples from the AdaptSPEC algorithm. Having optimised the number of basis functions over the entire time-varying spectral surface, we compare, in each segment, the estimates using the optimal number of basis functions and 10 basis functions against the log PSD. Figure 4(a) demonstrates the need for optimising the number of eigenvectors: hard-coding 10 basis functions into the AdaptSPEC model provides the lowest likelihood of the data, with 39 basis functions being optimal. Figures 4(b), 4(c) and 4(d) show that the optimised smoothing spline model provides improved performance over the model with 10 basis functions in each segment.

In Figure 4(b), the optimised model more effectively captures abrupt peaks in the PSD. One should note that the slight overfitting exhibited by the optimised spline model would not lead to erroneous inference, as this overfitting occurs at frequency components with limited power. In Figure 4(c), the PSD exhibits two peaks of differing amplitudes; the optimised model does a superior job of estimating both peak amplitudes, while avoiding overfitting at the frequency components that exhibit the most power. In Figure 4(d) we see two peaks in the PSD with significantly varying amplitude. There, the optimised model more appropriately determines the amplitude and frequency of the greatest peaks, just as our validation measures in Table 1 recorded. Such measures are absent from the spectral density estimation literature, yet they are of great importance when making inference about periodic phenomena in the physical sciences.

Figure 5 displays the two time-varying PSD plots generated using the original AdaptSPEC model and our optimised model. Figure 5(b) clearly displays a less smooth and more intricate posterior surface than Figure 5(a). In particular, the scale of the power axis demonstrates the optimised model's improvement when estimating appropriate peaks at respective frequencies in the PSD.

Figure 3: Synthetic locally stationary AR process.
Figure 4: (a) Log likelihood vs number of eigenvectors; (b)-(d) spline estimates with 10 vs the optimal number of eigenvectors for segments 1, 2 and 3 of the locally stationary process.
Figure 5: Time varying power spectra with (a) 10 basis functions and (b) the optimal number of basis functions.

4 Real data

In this section, we analyse the spectra of two canonical datasets, sunspots and airline passenger data. Both time series are well studied, with the former being particularly well studied in the spectral density estimation literature. We compare the performance of the optimised smoothing spline model and the smoothing spline with 10 basis functions. Unlike AR processes, real data problems do not have an analytic PSD against which to validate spectral estimates. Both time series require transformations to ensure they are stationary. For the sunspots data, following Choudhuri Choudhuri2004, we perform the following transformation:

$$y_t = \sqrt{x_t} - \frac{1}{n}\sum_{s=1}^{n} \sqrt{x_s}.$$
That is, we take the square root of the observations and then mean-centre the resulting values. For the airline passenger data, we perform an original transformation based on first differences of fourth roots:

$$y_t = x_t^{1/4} - x_{t-1}^{1/4}.$$
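A sketch of the two transformations, taking the raw series as NumPy arrays:

```python
import numpy as np

def transform_sunspots(x):
    """Square root, then mean-centre (following Choudhuri 2004)."""
    r = np.sqrt(x)
    return r - r.mean()

def transform_airline(x):
    """First differences of fourth roots."""
    return np.diff(x ** 0.25)
```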

The two transformed time series can be seen in Figures 6(a) and 6(b). As before, the number of basis functions is optimised to maximise the marginal Whittle likelihood. For the sunspots and airline passenger data, the optimal number of basis functions is 24 and 49 respectively, seen in Figures 6(c) and 6(d). Figures 6(e) and 6(f) demonstrate that the smoothing spline with 10 basis functions does not have the complexity to model such processes. Figure 6(e) has one significant peak in the log periodogram; the 10 basis function spline fails to capture this, while the optimised smoothing spline captures the abruptness and amplitude of the peak, removing some of the noise characteristic of the periodogram. Figure 6(f) has five peaks; the smoothing spline with 10 basis functions smooths over all variability in the data, failing to provide any meaningful or accurate inference, while the optimised smoothing spline detects the five peaks in amplitude at the appropriate frequency components with impressive accuracy.

Figure 6: Transformed time series ((a) sunspots, (b) airline), optimisation of the marginal Whittle likelihood ((c) sunspots, (d) airline), and spline spectral density estimates ((e) sunspot spectrum, (f) airline spectrum).

5 Conclusion

The research in spectral density estimation to date within the machine learning and statistics communities differs widely. We seek to bridge this gap, using a variety of GP covariance structures that are well studied in the machine learning literature, and compare these with alternative methods of smoothing currently used within statistics. We study the frequentist properties of our estimates via a simulation study and analyse performance using new and existing validation metrics. We introduce two new validation metrics, measuring the amplitude and frequency of the dominant spectral peak, inspired by spectral inference within the physical sciences.

Our simulation study confirms that most well-known covariance functions perform acceptably well when analysing the power spectral density of stationary time series. Generally, smoother covariance functions such as the squared exponential and Matérn perform better in the cases of smoother and simpler spectra, while models that combine stationary and non-stationary covariance functions, such as the SQE+NN, appear to generalise best across more complex spectral densities. The smoothing spline with 10 basis functions, as used within AdaptSPEC, is the worst performing model.

There has been a heavy reliance on the use of AR processes' known spectral densities when validating various models. This has two major drawbacks. First, real world spectra are often far more complex than those of the AR processes used in experiments, as shown in our experiments with sunspots and airline passenger data. This may have introduced a bias in many models within the statistics literature, such as AdaptSPEC, which performs well on relatively simple spectra, but performs badly when presented with more complex spectra. To demonstrate, we compare the spectral density estimates from two models in these real data experiments, using the smoothing spline with 10 basis functions and the smoothing spline with an optimal number of basis functions. We demonstrate the need for optimising the number of basis functions, as the function may require varying degrees of complexity based on the dynamics of the underlying periodicities.

Second, the vast majority of real world phenomena are non-stationary, rendering stationary methods for spectral density estimation useless. We have proposed a new method for non-stationary spectral analysis, where the number of basis functions is optimised (using the Whittle likelihood) to maximise the probability of the posterior spectrum. Our experiments demonstrate vast improvement in spectral density estimation over the existing AdaptSPEC technique for all locally stationary segments within our piecewise AR process.

There are a variety of interesting directions for future research. We could use more sophisticated MCMC schemes for our experiments. In particular, one could partition a locally stationary time series while at the same time optimising the number of basis functions used in the MCMC scheme within each segment, rather than over the entire time series period, as we have done. Future modelling frameworks could consider Bayesian and frequentist methods for identifying and learning the most appropriate covariance function, or combinations thereof, to use within each segment of the time series. Such kernel learning could consider how stationary and smooth the segments are, and other properties, to determine the best method for spectral density estimation in varying scenarios. This could be performed within a Bayesian framework, or with alternative techniques such as hypothesis testing, which has been used in other change point detection algorithms.

To our knowledge, kernel learning has not yet been performed within the context of spectral density estimation, and we believe our learning of the optimal number of eigenvectors is a first step in this direction. We could also combine the eigendecomposition, a statistical method, with the other covariance functions we have explored from the machine learning literature, including a learned choice of best covariance function. Going forward, we believe we have demonstrated conclusively that the number of eigenvectors should always be treated as a variable to be optimised.

Acknowledgements

Many thanks to Alex Judge and Sally Cripps for helpful discussions.

References