Transformed series, the UK spectrum and fusing the world and Europe
We transformed the number of new daily COVID-19 cases by applying a signed log transform to the first differences of the new case time series (see Methods). The transformed number of new daily cases for 16 countries are shown in Figure 1 each showing a distorted noisy, but characteristic sinusoidal trace.
The estimated log-spectrum for the UK transformed new daily cases is shown in Figure 2 and for all other countries we analysed in the Extended Data figures. Spectral estimates are commonly displayed on a logarithmic scale
. Spectral peaks can be observed at wavelengths of 6.7, 3.2 and 2.3 days, respectively. Although the peaks are visible, the credible intervals indicate that there is a fairly large degree of uncertainty, because this time series contains 52 observations. A frequentist analysis, e.g. using thespectrum function in R, produces a similar result, but with even wider confidence bands.
Similar spectral analyses for each country indicate three similar spectral peaks, although not always as well-defined nor in precisely the same location.
Figure 3 shows an estimate that is the result of coherently fusing spectra from 18 countries, giving an an effective sample size of 916 days. Here, the clear spectral peaks have narrow credible intervals, due to the large effective number of days afforded by using 18 countries together. The spectral peaks are located at wavelengths of 6.7, 4.1 and 2.7 days. The peak around 6.7 days is observed in the spectral plots for individual countries and we interpret it to be a weekly effect. Such a weekly effect could be produced by reporting artefacts (e.g. paperwork being delayed until Monday, or carried out differently at the weekend) or due to the behaviour differences of people at weekends. All countries analysed have a 5+2 working week/weekend pattern, although not necessarily the same days of the week (the actual days for a weekend are a phase effect, which does not effect the spectrum).
Clustering spectra and groups of countries with similar spectra
We next clustered our 18 countries based on their spectrum, by calculating a dissimilarity between the spectra for each pair of countries, and then performing both a hierarchical cluster analysis and multidimensional scaling on the dissimilarity matrix. The scaling solution indicated that only two dimensions were required to encapsulate 72% of variation in the data. Figure4 shows the resultant two-dimensional solution.
Attaching a meaning to the scaling axes in Figure 4 is not easy. We hypothesise that Axis 1 might indicate how badly a country has been perceived to have been affected by the virus with Australia, New Zealand and Sweden less so and those on the left of the plot considerably more so. However, Germany is the obvious anomaly to this interpretation as, currently, it has perhaps been perceived to have handled the crisis well so far.
The peak frequencies for each of these groups is listed in Table 1, which shows differences between them. However, each group possesses a possible weekly peak and higher-frequency peaks labelled a., of around three to four days, and b., around 2.6 days.
|Peak||Group 1||Group 2||Group 3|
Spectral changes after lockdown
Many countries experiencing the COVID-19 pandemic have instituted a lockdown procedure to dramatically reduce virus transmission. At the time of writing, these countries have observed new daily COVID-19 cases for between 43 and 54 days. We assume that, on average, it takes about seven days for the virus to incubate, for a person to seek attention and then be tested positive for the SARS-CoV-2 coronavirus. For each country, the number of days prior to and after the lockdown (including incubation time) are the UK (38, 14), Italy (26,28), France (23, 26), Germany (24, 25), Spain (22, 27), Switzerland (22, 25), Belgium (18, 25) and the Netherlands (19, 26). For some of these countries the lockdown was applied over a period of two of three days and we took the median of these as the lockdown start date.
The number of days before and after the lockdown are, in each case, too small to carry out anything other than the most simplistic time series to maintain statistical reliability. In particular, a spectral estimate in this situation would be subject to a high degree of uncertainty. However, Figure 6 shows our coherently fused spectral estimates across these countries before and after the lockdown period, making use of 192 effective days prior to lockdown and 196 days afterwards.
The weekly peak is clearly visible in both estimates. The second and third peaks (labelled a. and b. in Table 1) are visible pre-lockdown, but have all but disappeared post-lockdown. The spectrum is flat in the location where peak a. was previously, and spectral power declines considerably, relatively, where peak b. was located previously.
This result is particularly interesting as it suggests that peaks a. and b. have been disrupted by the lockdown. The weekly effect seems relatively unchanged by the lockdown, indicating that perhaps it was strongly driven by non-epidemic effects, such as recording/paperwork or bureaucracy caused by weekends.
The post-lockdown spectrum is higher overall than the pre-lockdown spectrum, this is due to the larger variation associated with the larger number of cases identified during the progress of the epidemic. Our transformation suppresses this variation, but does not remove it entirely.
Forecasting daily cases
We have had varied success in forecasting daily cases using a sum of two time-modulated cosine waves model, described in Methods, and more research is required. We used the Nelder-Mead optimisation routine built into R, with starting frequencies of and taken from our UK spectral estimate plots, and built the model on the transformed cases up to April 11th. After optimisation, the fitted model resulted in modified frequencies of , close to the starting frequencies (the other estimated parameters were ).
Figure 7 shows transformed new daily UK cases, the model fit and forecasts. The model fit does not look too bad, many spectral peaks are being identified, but perhaps the amplitudes of them could be better matched.
The untransformed forecasts for April 12th, 13th, 14th and 15th were 5250, 5200, 6373 and 6164, all with approximate 95% confidence interval of
. The actual number of cases for April 12th turned out to be 5288. In this case, the forecast was good. However, the two-step ahead forecast of 5200 was wrong — the true value turned out to be 4342 on April 13th. We also used several stochastic forecasting methods based on autoregressive integrated moving average modelling and exponential smoothing, but nothing that we tried was particularly successful. The series is difficult as its amplitude/variance is not constant and we suspect that frequencies are changing over time (as, e.g., the lockdown plot Figure6 indicated).
However, rather than point forecasts, the general sinusoidal nature of the transformed cases suggests a further, perhaps more reasonable strategy. At this stage, the UK Government and media are looking expectantly at the daily case numbers to try and detect a sustained downward trend in cases. Excitement has been generated by a drop in cases two days in a row. This happened on April 5th with 5903 cases, followed by a drop to 3802 and then 3634 on April 6th and 7th and then, unfortunately, increasing to 5491 on the 8th. However, the general sinusoidal patten, with a wavelengths of about 2.7 and four days shows that we should only perhaps start believing that a downward turn is a downward trend after a sustained decrease of four days or more. However, caution needs to be applied here as there is no guarantee that the dynamics will remain unchanged.
We analysed numbers of deaths using similar methods described here and found similar cycles. Although we have not carried out a detailed analysis, if the number of deaths process can be approximated by a linear system[1, 10] with the numbers of cases as input, then similar cycles are to be expected.
A time series with a fixed sampling rate and length has a minimum and maximum (Nyquist) frequency that can be observed.[1, 10] Although our spectral fusion methods provide more accurate estimates of the spectrum in the normal range (equivalent to having a larger sample size), they can not provide information on frequencies outside of the normal range. To estimate lower frequencies, we would need a genuinely longer series and, for higher frequencies, we would require cases more frequently than once a day, which are arguably not really necessary for any practical purpose.
Our analyses assume approximate stationarity and linearity for the transformed series, which is unlikely to be exactly true in practice. For example, in the UK transformed case series in Figure 7, there are hints of the series oscillation speeding up over the last ten days. Practically speaking, changes in the testing regime, recording practices, the lockdowns or other interventions will change the dynamics of either the pandemic itself or recording of it. Ideally, it would be of interest to use methods for non-stationary time series[18, 19], but the current series available to us are far too short for such analyses.
All computations were executed in R and packages that are mentioned specifically below.
COVID-19 New Daily Cases Transformation.
Let , for represent the number of new daily cases for days for country . The spectral dynamics of the number of daily cases for different countries are all countries masked by the well-known and characteristic exponential increases (and decrease, for those countries that locked down and have now passed their peak). Hence, we transform our number of daily cases series to reveal the spectral dynamics. After exploration the following transform was used for all series , where the sign function is , if is positive or , if is negative, and for . The transform is easily inverted, which is essential for forecasting the number of daily cases.
Bayesian Spectral Estimation and Fusion: Regspec
We use the regspec[3, 20] Bayesian spectral estimation method with a neutral white noise prior with prior variance of and all default arguments, except for a smoothing parameter of , although the results are not sensitive to the latter. Regspec straightforwardly enables the production of spectral estimates using multiple data sets, with each having different lengths and produces coherent credible intervals to properly ascertain the uncertainty inherent in the estimation process. Regspec can also fuse spectra for multiple series recorded at different sampling rates, but we do not need to use this aspect of its functionality here as all our case time series are reported daily. However, if a country decided to release case numbers on some other sampling plan (e.g. every two days, or weekly) then Regspec would be able to fuse the spectral estimates as described here. Such a feature might be of use when dealing with reporting structures that are not equipped to provide daily reporting of cases or where weekly cases are thought to be more accurate. For example, this might apply to regions with fragile health or reporting systems or populations that are spread across widely dispersed geographical regions with poor communications.
Clustering of Spectra.
Although the number of cases transformed time series show similar spectral behaviour, it is possible to observe closer similarities within certain subgroups of countries. We used unsupervised clustering and scaling techniques[21, 22] to depict the relationship between different countries and suggest a clustering for them. First, for each country we produced a spectral estimate using regspec as mentioned above, and then formed a dissimilarity for each pair of countries by computing the Euclidean distance between their spectral values (using the dist function in R). Classical multidimensional scaling was then used to produce an estimated configuration using the cmdscale function in R. For clustering we use hierarchical cluster analysis on the dissimilarity matrix we computed. It is well-known that dendrograms are sensitive to the input dissimilarity matrix, so we used the clusterwise cluster stability assessment by resampling method to produce a stable clustering.
Given the form of the transformed new daily cases we propose a model, , that is the sum of two time-modulated cosine waves, , each with formula
where indexes the two waves and . Initial values for forecast model fitting we used , for
. For model evaluation we put more weight on getting later observations correct and use a residual weight vectorwhere and is the number of cases. For short term forecasting, we fit to the transformed daily cases by weighted least-squares using standard R optimisation functions and then extrapolate , using recent weighted residuals to estimate the forecasting error.
The number of daily COVID-19 cases for countries can be found at the website of the European Centre for Disease Prevention and Control.
-  Priestley, M.B. Spectral analysis and time series. (Academic Press, 1983).
-  European Centre for Disease Prevention and Control, COVID-19 cases worldwide. https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide, April 14th 2020.
-  Nason, G. P. et al. Should we sample a time series more frequently? decision support via multirate spectrum estimation (with discussion). J. Roy. Stat. Soc. A, 180, 353–407 (2016).
-  Drewett, Z. We don’t know if coronavirus deaths will peak this week. https://metro.co.uk/2020/04/06/dont-know-coronavirus-deaths-will-peak-week-12517835/ (2020).
-  Percival, D.B. & Walden, A.T. Spectra analysis for physical applications. (Cambridge University Press, 2010).
-  Grenfell, B.T. et al. Travelling waves and spatial hierarchies in measles epidemics. Nature, 414, 716–723 (2001).
-  Conlan, A.J.K. & Grenfell, B.T. Seasonality and the persistence and invasion of measles. Proc. Roy. Soc. B, 274, 1133–1141 (2007).
-  Ferrari, M.J. et al. The dynamics of measles in sub-Saharan Africa. Nature, 451, 679–684 (2008).
-  Benvenuto, D. Application of the ARIMA model on the COVID-19 epidemic dataset. Data in Brief, 29, 105340 (2020).
-  Chatfield, C. The Analysis of Time Series: An Introduction, (Chapman and Hall/CRC Press, 2003).
-  Hyndman, R.J. & Athanasopoulos, G. Forecasting: Principles and Practice, (OTexts, 2013).
-  Nason, G.P. & Savchev, D. White noise testing using wavelets. Stat, 3, 351–362 (2014).
-  Chowell, G. et al. A novel sub-epidemic modeling framework for short-term forecasting epidemic waves. BMC Medicine, 17, 164 (2019).
-  Roosa, K. et al. Short-term forecasts of the COVID-19 epidemic in Guangdong and Zhejiang, China: February 13–23, 2020. J. Clin. Med., 9, 596 (2020).
-  Hennig, C. Flexible Procedures for Clustering, R package version 2.2-5. https://CRAN.R-project.org/package=fpc (2020).
-  R Core Team. R: a language and environment for statistical computing (R Foundation for Statistical Computing, 2020).
-  Nelder, J.A. & Mead, R. A simplex algorithm for function minimization. Comput. J., 7, 308–313 (1965).
-  Dahlhaus, R. Locally stationary processes. In Handbook of Statistics, Subba Rao, T., Subba Rao, S. & Rao, C. (eds), 351–413. (Elsevier, Amsterdam, 2012).
-  Nason, G. P. A test for second-order stationarity and approximate confidence intervals for localized autocovariances for locally stationary time series. J. Roy. Stat. Soc. B, 75, 879–904 (2013).
-  Powell, B. et al. Non-parametric Bayesian spectrum estimation for multirate data. R package version 2.4, https://CRAN.R-project.org/package=regspec (2016).
Chatfield, C. & Collins, A.J.
An Introduction to Multivariate Analysis(Chapman and Hall/CRC, 1980).
-  Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning, (Springer, 2009).