Transformed series, the UK spectrum and fusing the world and Europe
We transformed the number of new daily COVID19 cases by applying a signed log transform to the first differences of the new case time series (see Methods). The transformed number of new daily cases for 16 countries are shown in Figure 1 each showing a distorted noisy, but characteristic sinusoidal trace.
The estimated logspectrum for the UK transformed new daily cases is shown in Figure 2 and for all other countries we analysed in the Extended Data figures. Spectral estimates are commonly displayed on a logarithmic scale[16]
. Spectral peaks can be observed at wavelengths of 6.7, 3.2 and 2.3 days, respectively. Although the peaks are visible, the credible intervals indicate that there is a fairly large degree of uncertainty, because this time series contains 52 observations. A frequentist analysis, e.g. using the
spectrum function in R[16], produces a similar result, but with even wider confidence bands.Similar spectral analyses for each country indicate three similar spectral peaks, although not always as welldefined nor in precisely the same location.
Figure 3 shows an estimate that is the result of coherently fusing spectra from 18 countries, giving an an effective sample size of 916 days. Here, the clear spectral peaks have narrow credible intervals, due to the large effective number of days afforded by using 18 countries together. The spectral peaks are located at wavelengths of 6.7, 4.1 and 2.7 days. The peak around 6.7 days is observed in the spectral plots for individual countries and we interpret it to be a weekly effect. Such a weekly effect could be produced by reporting artefacts (e.g. paperwork being delayed until Monday, or carried out differently at the weekend) or due to the behaviour differences of people at weekends. All countries analysed have a 5+2 working week/weekend pattern, although not necessarily the same days of the week (the actual days for a weekend are a phase effect, which does not effect the spectrum).
Clustering spectra and groups of countries with similar spectra
We next clustered our 18 countries based on their spectrum, by calculating a dissimilarity between the spectra for each pair of countries, and then performing both a hierarchical cluster analysis and multidimensional scaling on the dissimilarity matrix. The scaling solution indicated that only two dimensions were required to encapsulate 72% of variation in the data. Figure
4 shows the resultant twodimensional solution.Attaching a meaning to the scaling axes in Figure 4 is not easy. We hypothesise that Axis 1 might indicate how badly a country has been perceived to have been affected by the virus with Australia, New Zealand and Sweden less so and those on the left of the plot considerably more so. However, Germany is the obvious anomaly to this interpretation as, currently, it has perhaps been perceived to have handled the crisis well so far.
Figure 5 show the spectral estimates for the three groups of countries identified in Figure 4, using the clustering techniques mentioned in Methods.
The peak frequencies for each of these groups is listed in Table 1, which shows differences between them. However, each group possesses a possible weekly peak and higherfrequency peaks labelled a., of around three to four days, and b., around 2.6 days.
Peak  Group 1  Group 2  Group 3 

Weekly  6.48  7.27  6.59 
a.  3.31  4.30  4.09 
b.  2.52  2.77  2.70 
Spectral changes after lockdown
Many countries experiencing the COVID19 pandemic have instituted a lockdown procedure to dramatically reduce virus transmission. At the time of writing, these countries have observed new daily COVID19 cases for between 43 and 54 days. We assume that, on average, it takes about seven days for the virus to incubate, for a person to seek attention and then be tested positive for the SARSCoV2 coronavirus. For each country, the number of days prior to and after the lockdown (including incubation time) are the UK (38, 14), Italy (26,28), France (23, 26), Germany (24, 25), Spain (22, 27), Switzerland (22, 25), Belgium (18, 25) and the Netherlands (19, 26). For some of these countries the lockdown was applied over a period of two of three days and we took the median of these as the lockdown start date.
The number of days before and after the lockdown are, in each case, too small to carry out anything other than the most simplistic time series to maintain statistical reliability. In particular, a spectral estimate in this situation would be subject to a high degree of uncertainty. However, Figure 6 shows our coherently fused spectral estimates[3] across these countries before and after the lockdown period, making use of 192 effective days prior to lockdown and 196 days afterwards.
The weekly peak is clearly visible in both estimates. The second and third peaks (labelled a. and b. in Table 1) are visible prelockdown, but have all but disappeared postlockdown. The spectrum is flat in the location where peak a. was previously, and spectral power declines considerably, relatively, where peak b. was located previously.
This result is particularly interesting as it suggests that peaks a. and b. have been disrupted by the lockdown. The weekly effect seems relatively unchanged by the lockdown, indicating that perhaps it was strongly driven by nonepidemic effects, such as recording/paperwork or bureaucracy caused by weekends.
The postlockdown spectrum is higher overall than the prelockdown spectrum, this is due to the larger variation associated with the larger number of cases identified during the progress of the epidemic. Our transformation suppresses this variation, but does not remove it entirely.
Forecasting daily cases
We have had varied success in forecasting daily cases using a sum of two timemodulated cosine waves model, described in Methods, and more research is required. We used the NelderMead[17] optimisation routine built into R[16], with starting frequencies of and taken from our UK spectral estimate plots, and built the model on the transformed cases up to April 11th. After optimisation, the fitted model resulted in modified frequencies of , close to the starting frequencies (the other estimated parameters were ).
Figure 7 shows transformed new daily UK cases, the model fit and forecasts. The model fit does not look too bad, many spectral peaks are being identified, but perhaps the amplitudes of them could be better matched.
The untransformed forecasts for April 12th, 13th, 14th and 15th were 5250, 5200, 6373 and 6164, all with approximate 95% confidence interval of
. The actual number of cases for April 12th turned out to be 5288. In this case, the forecast was good. However, the twostep ahead forecast of 5200 was wrong — the true value turned out to be 4342 on April 13th. We also used several stochastic forecasting methods based on autoregressive integrated moving average modelling and exponential smoothing, but nothing that we tried was particularly successful. The series is difficult as its amplitude/variance is not constant and we suspect that frequencies are changing over time (as, e.g., the lockdown plot Figure
6 indicated).However, rather than point forecasts, the general sinusoidal nature of the transformed cases suggests a further, perhaps more reasonable strategy. At this stage, the UK Government and media are looking expectantly at the daily case numbers to try and detect a sustained downward trend in cases. Excitement has been generated by a drop in cases two days in a row. This happened on April 5th with 5903 cases, followed by a drop to 3802 and then 3634 on April 6th and 7th and then, unfortunately, increasing to 5491 on the 8th. However, the general sinusoidal patten, with a wavelengths of about 2.7 and four days shows that we should only perhaps start believing that a downward turn is a downward trend after a sustained decrease of four days or more. However, caution needs to be applied here as there is no guarantee that the dynamics will remain unchanged.
Discussion
We analysed numbers of deaths using similar methods described here and found similar cycles. Although we have not carried out a detailed analysis, if the number of deaths process can be approximated by a linear system[1, 10] with the numbers of cases as input, then similar cycles are to be expected.
A time series with a fixed sampling rate and length has a minimum and maximum (Nyquist) frequency that can be observed.[1, 10] Although our spectral fusion methods[3] provide more accurate estimates of the spectrum in the normal range (equivalent to having a larger sample size), they can not provide information on frequencies outside of the normal range. To estimate lower frequencies, we would need a genuinely longer series and, for higher frequencies, we would require cases more frequently than once a day, which are arguably not really necessary for any practical purpose.
Our analyses assume approximate stationarity and linearity for the transformed series, which is unlikely to be exactly true in practice. For example, in the UK transformed case series in Figure 7, there are hints of the series oscillation speeding up over the last ten days. Practically speaking, changes in the testing regime, recording practices, the lockdowns or other interventions will change the dynamics of either the pandemic itself or recording of it. Ideally, it would be of interest to use methods for nonstationary time series[18, 19], but the current series available to us are far too short for such analyses.
Methods
All computations were executed in R[16] and packages that are mentioned specifically below.
COVID19 New Daily Cases Transformation.
Let , for represent the number of new daily cases for days for country . The spectral dynamics of the number of daily cases for different countries are all countries masked by the wellknown and characteristic exponential increases (and decrease, for those countries that locked down and have now passed their peak). Hence, we transform our number of daily cases series to reveal the spectral dynamics. After exploration[10] the following transform was used for all series , where the sign function is , if is positive or , if is negative, and for . The transform is easily inverted, which is essential for forecasting the number of daily cases.
Bayesian Spectral Estimation and Fusion: Regspec
We use the regspec[3, 20] Bayesian spectral estimation method with a neutral white noise prior with prior variance of and all default arguments, except for a smoothing parameter of , although the results are not sensitive to the latter. Regspec straightforwardly enables the production of spectral estimates using multiple data sets, with each having different lengths and produces coherent credible intervals to properly ascertain the uncertainty inherent in the estimation process. Regspec can also fuse spectra for multiple series recorded at different sampling rates, but we do not need to use this aspect of its functionality here as all our case time series are reported daily. However, if a country decided to release case numbers on some other sampling plan (e.g. every two days, or weekly) then Regspec would be able to fuse the spectral estimates as described here. Such a feature might be of use when dealing with reporting structures that are not equipped to provide daily reporting of cases or where weekly cases are thought to be more accurate. For example, this might apply to regions with fragile health or reporting systems or populations that are spread across widely dispersed geographical regions with poor communications.
Clustering of Spectra.
Although the number of cases transformed time series show similar spectral behaviour, it is possible to observe closer similarities within certain subgroups of countries. We used unsupervised clustering and scaling techniques[21, 22] to depict the relationship between different countries and suggest a clustering for them. First, for each country we produced a spectral estimate using regspec as mentioned above, and then formed a dissimilarity for each pair of countries by computing the Euclidean distance between their spectral values (using the dist function in R[16]). Classical multidimensional scaling was then used to produce an estimated configuration using the cmdscale function in R[16]. For clustering we use hierarchical cluster analysis on the dissimilarity matrix we computed. It is wellknown that dendrograms are sensitive to the input dissimilarity matrix, so we used the clusterwise cluster stability assessment by resampling method to produce a stable clustering[15].
Forecasting
Given the form of the transformed new daily cases we propose a model, , that is the sum of two timemodulated cosine waves, , each with formula
(1) 
where indexes the two waves and . Initial values for forecast model fitting we used , for
. For model evaluation we put more weight on getting later observations correct and use a residual weight vector
where and is the number of cases. For short term forecasting, we fit to the transformed daily cases by weighted leastsquares using standard R[16] optimisation functions and then extrapolate , using recent weighted residuals to estimate the forecasting error.Data Availability
The number of daily COVID19 cases for countries can be found at the website of the European Centre for Disease Prevention and Control[2].
References
 [1] Priestley, M.B. Spectral analysis and time series. (Academic Press, 1983).
 [2] European Centre for Disease Prevention and Control, COVID19 cases worldwide. https://www.ecdc.europa.eu/en/publicationsdata/downloadtodaysdatageographicdistributioncovid19casesworldwide, April 14th 2020.
 [3] Nason, G. P. et al. Should we sample a time series more frequently? decision support via multirate spectrum estimation (with discussion). J. Roy. Stat. Soc. A, 180, 353–407 (2016).
 [4] Drewett, Z. We don’t know if coronavirus deaths will peak this week. https://metro.co.uk/2020/04/06/dontknowcoronavirusdeathswillpeakweek12517835/ (2020).
 [5] Percival, D.B. & Walden, A.T. Spectra analysis for physical applications. (Cambridge University Press, 2010).
 [6] Grenfell, B.T. et al. Travelling waves and spatial hierarchies in measles epidemics. Nature, 414, 716–723 (2001).
 [7] Conlan, A.J.K. & Grenfell, B.T. Seasonality and the persistence and invasion of measles. Proc. Roy. Soc. B, 274, 1133–1141 (2007).
 [8] Ferrari, M.J. et al. The dynamics of measles in subSaharan Africa. Nature, 451, 679–684 (2008).
 [9] Benvenuto, D. Application of the ARIMA model on the COVID19 epidemic dataset. Data in Brief, 29, 105340 (2020).
 [10] Chatfield, C. The Analysis of Time Series: An Introduction, (Chapman and Hall/CRC Press, 2003).
 [11] Hyndman, R.J. & Athanasopoulos, G. Forecasting: Principles and Practice, (OTexts, 2013).
 [12] Nason, G.P. & Savchev, D. White noise testing using wavelets. Stat, 3, 351–362 (2014).
 [13] Chowell, G. et al. A novel subepidemic modeling framework for shortterm forecasting epidemic waves. BMC Medicine, 17, 164 (2019).
 [14] Roosa, K. et al. Shortterm forecasts of the COVID19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med., 9, 596 (2020).
 [15] Hennig, C. Flexible Procedures for Clustering, R package version 2.25. https://CRAN.Rproject.org/package=fpc (2020).
 [16] R Core Team. R: a language and environment for statistical computing (R Foundation for Statistical Computing, 2020).
 [17] Nelder, J.A. & Mead, R. A simplex algorithm for function minimization. Comput. J., 7, 308–313 (1965).
 [18] Dahlhaus, R. Locally stationary processes. In Handbook of Statistics, Subba Rao, T., Subba Rao, S. & Rao, C. (eds), 351–413. (Elsevier, Amsterdam, 2012).
 [19] Nason, G. P. A test for secondorder stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series. J. Roy. Stat. Soc. B, 75, 879–904 (2013).
 [20] Powell, B. et al. Nonparametric Bayesian spectrum estimation for multirate data. R package version 2.4, https://CRAN.Rproject.org/package=regspec (2016).

[21]
Chatfield, C. & Collins, A.J.
An Introduction to Multivariate Analysis
(Chapman and Hall/CRC, 1980).  [22] Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, (Springer, 2009).
Comments
There are no comments yet.