Time Series Using Exponential Smoothing Cells

by Avner Abrami, et al.

Time series analysis is used to understand and predict dynamic processes, including evolving demands in business, weather, markets, and biological rhythms. Exponential smoothing is used in all these domains to obtain simple interpretable models of time series and to forecast future values. Despite its popularity, exponential smoothing fails dramatically in the presence of outliers, large amounts of noise, or when the underlying time series changes. We propose a flexible model for time series analysis, using exponential smoothing cells for overlapping time windows. The approach can detect and remove outliers, denoise data, fill in missing observations, and provide meaningful forecasts in challenging situations. In contrast to classic exponential smoothing, which solves a nonconvex optimization problem over the smoothing parameters and initial state, the proposed approach requires solving a single structured convex optimization problem. Recent developments in efficient convex optimization of large-scale dynamic models make the approach tractable. We illustrate new capabilities using synthetic examples, and then use the approach to analyze and forecast noisy real-world time series. Code for the approach and experiments is publicly available.








1 Introduction

Exponential smoothing (ES) methods model current and future time series observations as weighted combinations of past observations, with more weight given to recent data. The word ‘exponential’ reflects the exponential decay of weights for older observations. ES methods have been around since the 1950s, and are still very popular forecasting methods used in business and industry, including supply chain forecasting [4], stock market analysis [15, 12, 3], weather prediction [16, 13], and electricity demand forecasting [14, 11].

In contrast to many techniques in machine learning, ES provides simple and interpretable models and forecasting capability by assuming a fixed structure for the evolution of the time series. For example, a simple (level-only) model is

$$\hat y_{t+1|t} = \hat y_{t|t-1} + \alpha\,(y_t - \hat y_{t|t-1}), \qquad (1)$$

where $y_t$ is an observation at time $t$, and $\hat y_{t|t-1}$ is the estimate of $y_t$ at time $t-1$ given $y_1, \dots, y_{t-1}$. The forecast at $t+1$ is adjusted by a fraction $\alpha$ of the error at time $t$; larger $\alpha$ means greater adjustment. Iterating (1), we have

$$\hat y_{t+1|t} = \alpha \sum_{j=0}^{t-1} (1-\alpha)^j\, y_{t-j} + (1-\alpha)^t\, \hat y_{1|0}, \qquad (2)$$

illustrating the exponential decay.
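As a concrete sketch (in Python; the function names are illustrative, not from the paper's code), the recursion (1) and its unrolled, exponentially weighted form (2) can be written and checked against each other:

```python
import numpy as np

def simple_es(y, alpha, init):
    """One-step-ahead forecasts from the simple ES recursion (1)."""
    f = init                      # forecast of y[0], i.e. the initial state
    forecasts = []
    for obs in y:
        forecasts.append(f)
        f = f + alpha * (obs - f)  # adjust by a fraction alpha of the error
    return np.array(forecasts)

def weighted_form(y, alpha, init, t):
    """Forecast of y[t] as the exponentially weighted sum in (2)."""
    j = np.arange(t)
    return alpha * np.sum((1 - alpha) ** j * y[t - 1 - j]) + (1 - alpha) ** t * init
```

The two forms agree exactly, which makes the exponential decay of the weights on older observations explicit.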

Figure 1: Limitations of classic ES smoothing (e.g. Holt-Winters and robust variants) are eliminated by the new ES-Cells approach. In all panels, standard Holt-Winters (HW) estimates are shown in green, robust Holt-Winters (RHW) [6] results (obtained by pre-filtering using a robust loss) in transparent red, and ES cell results in solid red. Panel (a) displays the full time series (outliers plotted as triangles): HW results are noisy; RHW improves after a ‘burn-in’ period; the ES cell recovers the underlying time series. Panels (b), (c), and (d) compare the truth (blue) to estimates of the level, trend, and seasonality components. Panel (e) shows the mean-only forecast for 100 steps ahead.

The $\alpha$ in (1) is fit using available data and used across the entire period of interest. More generally, a time series model also includes trend (long-term direction) and periodic seasonality components, with additional smoothing parameters (e.g. $\beta$, $\gamma$) for these components. Classic ES methods use observed data to fit smoothing parameters, error variance, and initial state, and then use these quantities to provide point forecasts and quantify uncertainty. Every additive ES model can be formulated using the compact single source of error (SSOE) representation [9, 8]:

$$y_t = w^\top x_{t-1} + \epsilon_t, \qquad x_t = F x_{t-1} + g\,\epsilon_t, \qquad (3)$$

where $w$ is a linear measurement model, $F$ is a linear transition function, and $g$ is the vector of smoothing parameters. In (1), we have $x_t = \ell_t$ (the level), $w = 1$, $F = 1$, and $g = \alpha$. More generally, $F x_{t-1}$ tracks the deterministic components of the time series (level, trend, seasonality) while $g\,\epsilon_t$ adjusts for stochastic disturbances.

Flexibility of ES models

To show how ES models are constructed and transformed into (3), we compare the simple linear, Holt’s linear, and Holt-Winters models. The simple linear model from (1) tracks the level $\ell_t$ using a zero-order polynomial approximation:

$$y_t = \ell_{t-1} + \epsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha\,\epsilon_t. \qquad (4)$$

Holt’s linear model uses a first-order (tangent) line approximation, and tracks both level $\ell_t$ and trend $b_t$:

$$y_t = \ell_{t-1} + b_{t-1} + \epsilon_t, \qquad \ell_t = \ell_{t-1} + b_{t-1} + \alpha\,\epsilon_t, \qquad b_t = b_{t-1} + \beta\,\epsilon_t. \qquad (5)$$

Finally, the Holt-Winters model adds a seasonality component $s_t$ with known periodicity $p$, giving the augmented state $x_t = (\ell_t, b_t, s_t, s_{t-1}, \dots, s_{t-p+1})^\top$:

$$y_t = \ell_{t-1} + b_{t-1} + s_{t-p} + \epsilon_t, \quad \ell_t = \ell_{t-1} + b_{t-1} + \alpha\,\epsilon_t, \quad b_t = b_{t-1} + \beta\,\epsilon_t, \quad s_t = s_{t-p} + \gamma\,\epsilon_t. \qquad (6)$$

To write the Holt-Winters model in the form of (3), take

$$w = (1, 1, 0, \dots, 0, 1)^\top, \qquad g = (\alpha, \beta, \gamma, 0, \dots, 0)^\top, \qquad (7)$$

with $F$ combining the level/trend update with a cyclic shift of the stored seasonal states.

Figure 2: Time series using ES-Cells. Left (a): a single ES cell, estimating parameters over a particular window. Right (b): the overall approach, with multiple ES-Cells linked by a dynamic process model.
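The SSOE construction of the Holt-Winters model can be sketched in code. The following (an illustrative sketch; `holt_winters_ssoe` and `ssoe_filter` are made-up names, not from the paper's codebase) builds $w$, $F$, $g$ for the additive model and runs the recursion (3):

```python
import numpy as np

def holt_winters_ssoe(alpha, beta, gamma, p):
    """Build w, F, g for the additive Holt-Winters model in SSOE form (3).

    State: x_t = (level, trend, s_t, s_{t-1}, ..., s_{t-p+1})."""
    n = 2 + p
    w = np.zeros(n)
    w[0] = w[1] = w[-1] = 1.0        # measurement picks level + trend + s_{t-p}
    F = np.zeros((n, n))
    F[0, 0] = F[0, 1] = 1.0          # level_t = level_{t-1} + trend_{t-1}
    F[1, 1] = 1.0                    # trend_t = trend_{t-1}
    F[2, -1] = 1.0                   # s_t is updated from s_{t-p}
    for i in range(3, n):
        F[i, i - 1] = 1.0            # cyclically shift the stored seasonal states
    g = np.zeros(n)
    g[0], g[1], g[2] = alpha, beta, gamma
    return w, F, g

def ssoe_filter(y, w, F, g, x0):
    """Run recursion (3): eps_t = y_t - w'x_{t-1};  x_t = F x_{t-1} + g*eps_t."""
    x = np.array(x0, dtype=float)
    preds = []
    for obs in y:
        preds.append(w @ x)
        x = F @ x + g * (obs - w @ x)
    return np.array(preds), x
```

As a sanity check, on a noiseless seasonal series the one-step predictions reproduce the data exactly (all $\epsilon_t = 0$).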

Limitations of classic ES

SSOE models are attractive in their simplicity, since they use a single parameter vector $g$ to model perturbations. However, the same $g$ must be applied at every time point; this limits model flexibility and ensures that estimates of $g$ are strongly affected by artifacts in the data, including noise and outliers. Consider a toy example (3) with two measurements; $g$ and $x_0$ are found by solving

$$\min_{g,\,x_0}\ \epsilon_1^2 + \epsilon_2^2, \qquad \epsilon_1 = y_1 - w^\top x_0, \quad x_1 = F x_0 + g\,\epsilon_1, \quad \epsilon_2 = y_2 - w^\top x_1. \qquad (8)$$

If $y_1$ is an outlier, it affects both $g$ and $x_0$ in the fit. Robust statistics are powerless for SSOE models: ignoring $y_1$ necessarily leaves a large $\epsilon_1$, which affects $x_1$. The propagation is recursive: each $\epsilon_t$ appears in all $x_s$, $s \ge t$, so an outlier at any point has to affect $g$ and $x_0$, which control the entire time series. This phenomenon does not occur in other time series formulations, including AR models, where robust methods have been developed, see e.g. [10].
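A small numerical experiment (with made-up numbers) illustrates the propagation: in the level-only model, a single corrupted observation shifts every subsequent state, and the shift never washes out exactly.

```python
import numpy as np

def level_states(y, alpha, x0):
    """States of the level-only SSOE model: x_t = x_{t-1} + alpha*(y_t - x_{t-1})."""
    states = []
    x = x0
    for obs in y:
        x = x + alpha * (obs - x)
        states.append(x)
    return np.array(states)

clean = np.full(20, 5.0)
dirty = clean.copy()
dirty[3] = 50.0                      # a single outlier at t = 3
a = level_states(clean, 0.4, 5.0)
b = level_states(dirty, 0.4, 5.0)
# States agree before the outlier; every state from t = 3 onward is shifted.
```

The shift decays geometrically (at rate $1-\alpha$) but remains nonzero, so every downstream estimate is contaminated.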

Earlier work in robust Holt-Winters (RHW) uses M-estimators to filter the observations [5, 6], and then applies standard HW. In contrast, ES-Cells formulates a single problem to analyze the entire time series, simultaneously denoising, decomposing, and imputing.

The classic approach (8) is nonconvex, and has weak guarantees: stationarity conditions (such as a vanishing gradient) do not imply global optimality, and solutions found by iterative methods depend on the initial point. As the level of noise and outliers increases, the ability of black-box optimizers to obtain reasonable $g$ and $x_0$ breaks down. The ES-Cells approach uses a strongly convex formulation; it has a unique global minimum and no other stationary points.

We created a synthetic time series with trend and level shifts, as well as heteroscedastic noise and outliers in the observations. Figure 1 compares the performance of the HW model (implemented in the standard Holt-Winters R module [7]) and RHW [6] (our own implementation of [6, Eqs. 13, 14]) to the ES-Cells approach (code at https://github.com/UW-AMO/TimeSeriesES-Cell). Automated methods for obtaining the smoothing parameters failed in the presence of noise, so for the HW model we used hand-tuned smoothing parameters, initialized with the first 50 elements of the noisy series. HW propagates outliers and is adversely affected by heteroscedastic noise, affecting the estimates of the level, trend, and especially seasonality components (see Figure 1 (b-d)). This lack of robustness gives a poor understanding of the overall time series (Figure 1 (a)) and leads to low forecasting accuracy, as corrupted errors are propagated to future times (see Figure 1 (e)). For RHW, automatic approaches to find the smoothing parameters also failed, and we had to hand-tune them; the final result improves on HW but requires a long ‘burn-in’ period, and still produces a somewhat noisy forecast. The ES-Cells approach captures and removes outliers and heteroscedastic noise, and correctly identifies the components.

Time series estimation using ES-Cells

The ES-Cells approach is constructed from interconnected building blocks. The basic cell consists of local ES estimation over a fixed window, equipped with a convex regularization term (for denoising) and a robust loss function (to guard against outliers), see Fig. 2(a). The cells are then linked together by the time series dynamics, while allowing discrepancies between $\hat x_{t+1}$ and $F \hat x_t$, see Fig. 2(b). These differences are treated as samples of the process noise, analyzed, and used to build forecasting confidence intervals. Fitting the entire ES-Cells model is a convex problem, and can be done at scale using efficient methods for dynamic optimization [1, 2].

1.1 ES cell model

First we formulate inference for a single ES cell. Given a time point $t$ and an integer $m$, we take a window of size $2m+1$ that includes all the points in the interval $[t-m, t+m]$; some measurements in the window can be missing. To model these cases, we introduce indicator variables

$$c_k = \begin{cases} 1 & \text{if } y_k \text{ is observed}, \\ 0 & \text{otherwise}. \end{cases}$$

We also define a unimodal sequence of weights $\omega_{-m}, \dots, \omega_m$, with the largest weight at the center of the window. The estimate $\hat x_t$ depends only on observations at the times $[t-m, t+m]$, and is obtained by propagating the estimate $x$ at the start of the window, at time $t-m$, to the middle of the window at time $t$:

$$\hat x_t = F^m x. \qquad (9)$$

The estimate $x$ solves the optimization problem

$$\min_{x}\ \sum_{k=t-m}^{t+m} c_k\,\omega_{k-t}\,\rho\!\left(y_k - w^\top F^{\,k-(t-m)} x\right) \;+\; \lambda \sum_{k=t-m}^{t+m} \left| D F^{\,k-(t-m)} x \right|, \qquad (10)$$

where $\rho$ is a robust loss, and $D$ extracts the difference of two seasonality components from the state. The objective function (10) extends the classic ES approach (8) in three respects.

  1. The indicators $c_k$ keep track of missing observations.

  2. The loss $\rho$ used to compare $y_k$ to its prediction is robust to outliers.

  3. The $\lambda$-weighted term adds total variation regularization for the seasonality components.

The objective function (10) is convex, as long as the loss and regularizer are both chosen to be convex.
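A minimal sketch of a single-cell objective, using the Huber loss as one convex choice of robust loss and a generic linear map `G` from the window-start state to the window predictions (both simplifications of the paper's exact formulation; all names are illustrative):

```python
import numpy as np

def huber(r, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails (robust to outliers)."""
    a = np.abs(r)
    return np.where(a <= delta, 0.5 * r**2, delta * (a - 0.5 * delta))

def cell_objective(x, y, c, omega, G, D, lam=1.0):
    """Weighted robust fit plus a total-variation penalty, in the spirit of (10).

    x: state at the window start; G maps it to predictions over the window;
    c: 0/1 indicators for observed points; omega: unimodal window weights;
    D: difference operator acting on the seasonality components."""
    r = y - G @ x
    return np.sum(c * omega * huber(r)) + lam * np.sum(np.abs(D @ x))
```

Since the Huber loss and the absolute value are convex, and both enter through linear maps with nonnegative weights, the objective is convex in `x`; the midpoint inequality can be checked numerically.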

Before presenting the fully linked dynamic model, we rewrite (10) more compactly, avoiding sums. Stack the windowed observations into a vector $y$, collect the indicators and weights into a diagonal matrix $W$, and let $G$ be the linear map taking the window-start state $x$ to the predictions for the window. We can now write (10) as

$$\min_x\ \rho\big(W(y - Gx)\big) + \lambda \|\mathcal{D} x\|_1, \qquad (11)$$

where $\mathcal{D}$ stacks the seasonal difference operators.
1.2 Linking the ES-Cells

Each cell estimate $\hat x_t$ only depends on local data. To connect $\hat x_t$ and $\hat x_{t+1}$, we assume that the estimates satisfy

$$\hat x_{t+1} = F \hat x_t + \nu_t,$$

where, in contrast to the error terms $\epsilon_t$, the $\nu_t$ are i.i.d. Gaussian errors. This is equivalent to adding the penalty

$$\frac{1}{2}\sum_t \|x_{t+1} - F x_t\|^2.$$

This links together objectives of the form (11) to generate a single objective over the entire sequence $\{x_t\}$:

$$\min_{x_1,\dots,x_N}\ \sum_t \Big[ \rho\big(W_t(y_t - G x_t)\big) + \lambda \|\mathcal{D} x_t\|_1 \Big] + \frac{1}{2}\sum_t \|x_{t+1} - F x_t\|^2. \qquad (12)$$

The problem in (12) is nonsmooth but convex, and has dynamic structure. It has far more variables than the classic nonconvex ES formulation in (8). Nonetheless, it can be efficiently solved at scale using recent algorithms for generalized Kalman smoothing [1, 2]. Given the estimates $\{\hat x_t\}$, the final time series estimate is given by $\hat y_t = w^\top \hat x_t$.

Figure 3: Forecasting in the ES-Cells framework. Estimated smoothing parameters are sampled to provide 99% confidence intervals (CI) for all components; panels (a), (b), and (c) show forecasts for the level, trend, and seasonality components, while panel (d) shows the combined forecast for the time series. Filtered signals are blue; the mean forecast is green, with the 99% CI shown in red.

Time series forecasting using ES-Cells

ES-Cells capture two main sources of uncertainty that are important for forecasting future values of a time series: uncertainty in the residuals $\epsilon_t$, and uncertainty in the smoothing parameters $g_t$. ES-Cells track these two sources of uncertainty and can be used to create two separate confidence intervals: one representing the variability of each component of the signal, and the other capturing the structure of the residual.

Solving the full problem (12), we obtain the entire sequence $\{\hat x_t\}$, as well as corresponding estimates of the residuals and smoothing parameters:

$$\hat \epsilon_t = y_t - w^\top \hat x_{t-1}, \qquad \hat g_t\,\hat \epsilon_t = \hat x_t - F \hat x_{t-1}.$$

In order to obtain the prediction distribution, we simulate sample paths from the model, using the empirical distributions of $\hat \epsilon_t$ and $\hat g_t$, conditioned on the final state. This allows us to estimate any desired characteristics of the prediction distribution at a specific forecast horizon, and in particular to estimate confidence intervals that incorporate both smoothing-parameter and residual uncertainty. We can also incorporate model-based residuals (instead of using the empirical distribution) by generating forecasted values from any given distribution.
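The sampling procedure can be sketched as follows (an illustrative implementation, not the authors' code): residuals and smoothing vectors are drawn from their empirical pools and pushed through the dynamics, conditioned on the final fitted state.

```python
import numpy as np

def forecast_paths(x_final, F, w, resid_pool, g_pool, horizon, n_paths=1000, seed=0):
    """Simulate sample paths from the fitted SSOE dynamics.

    Residuals and smoothing vectors are resampled from their empirical pools;
    each path starts from the final fitted state."""
    rng = np.random.default_rng(seed)
    paths = np.empty((n_paths, horizon))
    for i in range(n_paths):
        x = x_final.copy()
        for h in range(horizon):
            eps = rng.choice(resid_pool)              # empirical residual draw
            g = g_pool[rng.integers(len(g_pool))]     # empirical smoothing draw
            paths[i, h] = w @ x + eps
            x = F @ x + g * eps
    return paths

# Percentile bands over the simulated paths give, e.g., a 99% interval:
# lo, hi = np.percentile(paths, [0.5, 99.5], axis=0)
```

With zero residuals and a level-only model, every path stays flat at the final level, which is a quick degenerate-case check.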

To illustrate the ES forecasting framework, Figure 3 presents forecasts for the noisy synthetic model introduced in Figure 1. In particular, 100-step-ahead forecasts and their 99% confidence intervals (using 10000 Monte Carlo runs) are shown for the trend and seasonality components (panels (a) and (b)). These are obtained using the empirical distribution. The forecast for the full time series (and a zoomed plot) are shown in panels (c) and (d). The inner 99% CI (strictly inside the shaded region) takes into account only uncertainty in the smoothing parameters, while the outer CI (the border of the shaded region) also takes into account uncertainty in the residuals. Since the time series is contaminated by outliers, the outer CI is very wide in this case.

Real-world time series: Twitter’s user engagement dataset

To test our algorithm we examine an anonymized time series representing user engagement on Twitter. This dataset is publicly available on Twitter’s official blog (https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series) and is fully representative of the challenges tackled in this paper:

  • Distinct seasonal patterns due to user behavior across different geographies

  • An underlying trend which could be interpreted as organic growth (new people engaging with the social network)

  • Anomalies or outliers due to either special events surrounding holidays (Christmas, breaking news) or unwanted behavior (bots or spammers)

  • Underlying heteroscedastic noise, embodying the variance of the signal.

(a) Classic Holt-Winters analysis and forecast. (b) ES-Cells analysis and forecast.
Figure 4: Analysis of Twitter user engagement data, using the classic Holt-Winters model (a) and the ES-Cells approach (b). The classic approach (a) fits the outliers, obtaining a very wide 99% CI, and forecasts sharp growth of the average user engagement. The ES-Cells approach does not fit the outliers, obtains a tighter 99% CI, and correctly forecasts a decrease in user engagement. Moreover, the 99% CI in (b) is not symmetric; it is far tighter above than below.
Figure 5: A comparison of the mean absolute percentage error (MAPE), quantifying the improvement of ES-Cells (red) over HW (purple) for the Twitter data in Figure 4.

The dataset was originally put online to showcase a robust anomaly detection procedure. With the ES-Cells framework, we can go much further, decomposing the time series into interpretable components, and then forecasting both the components and the entire time series under uncertainty. The original aim (anomaly detection) is easily accomplished by studying the tails of the empirical residual distribution recovered by the approach.

The classic Holt-Winters model fits the outliers, forecasting sharp growth of engagement, which misses the observed trend (Figure 4(a)), and finds a very wide 99% confidence interval. In contrast, the ES-Cells approach (Figure 4(b)) avoids fitting the outliers; the average forecast correctly captures the decrease in the trend, and provides a much tighter asymmetric 99% CI. The improvement can be quantitatively assessed by looking at the mean absolute percentage error (MAPE) for the forecasted time series over a sliding window of 10 observations (Figure 5). Traditional Holt-Winters has a higher MAPE than ES-Cells at every time point; moreover, the MAPE of the ES-Cells method is stable over time, while the MAPE of HW increases, illustrating its failure to robustly capture the long-term trend of the time series. Trend, level, and seasonality are shown in Figure 7. There is a clear decrease in level and trend, which is detected despite the large amount of noise in the data.
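The sliding-window MAPE used for this comparison can be computed as follows (a standard formula; the window length of 10 matches the text, and the function name is illustrative):

```python
import numpy as np

def sliding_mape(actual, forecast, window=10):
    """Mean absolute percentage error over a sliding window of observations."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    ape = np.abs((actual - forecast) / actual) * 100.0   # pointwise % error
    return np.array([ape[i:i + window].mean()
                     for i in range(len(ape) - window + 1)])
```

For a constant series forecast 10% high, every window reports a MAPE of 10%, which pins down the units and normalization.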

Figure 6: Anomaly detection from the ES-Cells fit. The 1.5% most extreme observations are highlighted using red dots.

1.3 Anomaly Detection

After fitting the ES procedure, we are left with a residual that we can analyze to understand anomalies in the time series. Figure 6 shows an example of outlier detection by looking at the 1.5% tails of the residual distribution.
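This tail-based detection rule is easy to sketch (an illustrative implementation, with the 1.5% fraction taken from the text):

```python
import numpy as np

def tail_anomalies(residuals, frac=0.015):
    """Flag the `frac` most extreme residuals (two-sided tails) as anomalies."""
    residuals = np.asarray(residuals, dtype=float)
    lo, hi = np.percentile(residuals, [100 * frac / 2, 100 * (1 - frac / 2)])
    return (residuals < lo) | (residuals > hi)
```

Because the rule operates on the residual after denoising and decomposition, level shifts and seasonality are not mistaken for anomalies.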

(a) Trend, Forecast, & 99% CI.
(b) Level, Forecast, & 99% CI.
(c) Seas., Forecast, & 99% CI
Figure 7: Twitter dataset: Forecasts and 99% CI for trend (a), level (b) and seasonality (c) obtained by the ES-Cells approach. There is a clear downward direction in level and trend.

1.4 Robust auto completion of missing data

The ES-Cell algorithm is also robust to missing observations. Whether the data is missing at random, or in significant contiguous batches, it is automatically filled in by the ES-Cells algorithm. Since the problem is solved globally, nearby outliers do not affect the interpolated values, in contrast to local interpolation methods. Figure 9 shows the result obtained by removing two contiguous chunks of 100 observations each in two distinct parts of the time series. The data is automatically ‘in-filled’ using the ES-Cells procedure.

(a) Classic approach (b) Global approach
Figure 8: Classic TS analysis is sequential (a); data are pre-processed, then decomposed into components using models such as Holt-Winters. The approach is limited, because information about underlying structures such as level, trend, and seasonality is not available during the pre-processing. The new approach (b) is global; cleaning, interpolation, and decomposition are done in a unified context. Downstream applications, including anomaly detection and forecasting, significantly improve.


ES-Cells is a new model for time series inference, capable of fitting and forecasting data in situations with high noise, frequent outliers, and large contiguous portions of missing data. These features are present in many real-world large-scale datasets. The ES-Cells formulation differs from previous models in its global approach, as shown in Figure 8. We simultaneously denoise, impute, and decompose the time series by solving a single convex optimization problem with dynamic structure, then use sampling-based strategies for forecasting and uncertainty quantification. The results are illustrated on simulated and real data, where the proposed method yields a 5-fold improvement in forecasting MAPE. The simplicity and versatility of the ES-Cells formulation make it a superior alternative to Holt-Winters and related time series models. Code for the approach and experiments is publicly available at https://github.com/UW-AMO/TimeSeriesES-Cell.

(a) Deleted data

(b) Auto fill-in.

Figure 9: ES-Cells can fill in data missing both at random and in contiguous chunks. In panel (a), two sections of 100 observations each have been deleted. Panel (b) shows the in-filled values (green) plotted over the data from panel (a) (red). The deleted data from panel (a) is shown in blue in panel (b).


The work of A. Aravkin was supported by the Washington Research Foundation Data Science Professorship.


  • [1] A. Aravkin, J. V. Burke, L. Ljung, A. Lozano, and G. Pillonetto. Generalized Kalman smoothing: Modeling and algorithms. arXiv preprint arXiv:1609.06369, survey to appear in Automatica, 2017.
  • [2] A. Y. Aravkin, J. V. Burke, and G. Pillonetto. Sparse/robust estimation and Kalman smoothing with nonsmooth log-concave densities: Modeling, computation, and theory. Journal of Machine Learning Research, 14:2689–2728, 2013.
  • [3] R. G. Brown and R. F. Meyer. The fundamental theorem of exponential smoothing. Operations Research, 9(5):673–685, 1961.
  • [4] F. Chen, J. K. Ryan, and D. Simchi-Levi. The impact of exponential smoothing forecasts on the bullwhip effect. Naval Research Logistics, 47(4):269–286, 2000.
  • [5] T. Cipra. Robust exponential smoothing. Journal of Forecasting, 11(1):57–69, 1992.
  • [6] S. Gelper, R. Fried, and C. Croux. Robust forecasting with exponential and Holt-Winters smoothing. Journal of Forecasting, 29(3):285–300, 2010.
  • [7] C. C. Holt. Forecasting seasonals and trends by exponentially weighted moving averages. International Journal of Forecasting, 20(1):5–10, 2004.
  • [8] R. Hyndman, A. B. Koehler, J. K. Ord, and R. D. Snyder. Forecasting with exponential smoothing: the state space approach. Springer Science & Business Media, 2008.
  • [9] R. J. Hyndman, A. B. Koehler, R. D. Snyder, and S. Grose. A state space framework for automatic forecasting using exponential smoothing methods. International Journal of Forecasting, 18(3):439–454, 2002.
  • [10] R. Maronna, R. D. Martin, and V. Yohai. Robust Statistics. John Wiley & Sons, Chichester, 2006.
  • [11] I. Moghram and S. Rahman. Analysis and evaluation of five short-term load forecasting techniques. IEEE Transactions on Power Systems, 4(4):1484–1491, 1989.
  • [12] R. S. Pindyck. Risk, inflation, and the stock market. Technical report, National Bureau of Economic Research, 1983.
  • [13] S. S. Soman, H. Zareipour, O. Malik, and P. Mandal. A review of wind power and wind speed forecasting methods with different time horizons. In North American Power Symposium (NAPS), 2010, pages 1–8. IEEE, 2010.
  • [14] J. W. Taylor. Short-term electricity demand forecasting using double seasonal exponential smoothing. Journal of the Operational Research Society, 54(8):799–805, 2003.
  • [15] J. W. Taylor. Volatility forecasting with smooth transition exponential smoothing. International Journal of Forecasting, 20(2):273–286, 2004.
  • [16] J. W. Taylor and R. Buizza. Neural network load forecasting with weather ensemble predictions. IEEE Transactions on Power Systems, 17(3):626–632, 2002.