In typical forecasting problems, we make probabilistic estimates of future outcomes based on past observations. Recently, it has been shown that effective forecasting models can be complex, nonconvex models Flunkert et al. (2017); Wen et al. (2017). Frequent updates of these models are desirable because the relationship between the targets and outputs might change over time; however, re-training these models can be time-consuming.
Online learning is a method of updating the model on each pattern as it is observed, as opposed to batch learning, where training is performed over groups of patterns. It is a common technique for dynamically adapting to new patterns in the data, or for settings where training over the entire data set is infeasible. The literature on online learning is rich with interesting theoretical and practical results, but it is usually limited to convex problems, where global optimization is computationally tractable Zinkevich (2003). In contrast, it is NP-hard to compute the global minimum of a nonconvex function over a convex domain Hazan et al. (2017); Hsu et al. (2012).
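The distinction between online and batch updates can be made concrete with a short sketch. The following minimal example is illustrative only: the linear model, squared loss, and step size are our own choices, not part of this paper's setting. The parameters are updated after every single observation rather than after a pass over a batch:

```python
import numpy as np

def online_sgd(stream, dim, eta=0.1):
    """Update parameters x on each (a, y) pair as it arrives."""
    x = np.zeros(dim)
    losses = []
    for a, y in stream:
        pred = a @ x
        losses.append(0.5 * (pred - y) ** 2)  # loss suffered before the update
        grad = (pred - y) * a                 # gradient of the squared loss
        x = x - eta * grad                    # single-pattern update
    return x, losses

# toy stream: noiseless linear targets from hypothetical true parameters
rng = np.random.default_rng(0)
true_x = np.array([1.0, -2.0])
stream = [(a, a @ true_x) for a in rng.normal(size=(200, 2))]
x_hat, losses = online_sgd(stream, dim=2)
```

On this noiseless toy stream the per-pattern losses shrink as the estimate approaches the true parameters, which is the sense in which the online player "learns" over rounds.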
Due to the intractability of nonconvex problems, various assumptions on the input have been used to design polynomial-time algorithms Arora et al. (2014); Hsu et al. (2012). However, these assumptions were too specific to particular models, and a more generic approach was needed. One way to achieve this is to replace the “global optimality” requirement with the more modest requirement of stationarity Allen-Zhu and Hazan (2016).
The idea of online learning was borrowed from game theory, where an online player answers a sequence of questions. The true answers are unknown to the player at the time of each decision, and the player suffers a loss after committing to a decision. Future losses are unknown to the player in advance, and the performance of the sequence of decisions is evaluated by the difference between the accumulated loss and the loss of the best fixed decision in hindsight. Most recently, Hazan et al. (2017) proposed a notion of gradient-based local regret for nonconvex games.
Inspired by Hazan’s approach and incorporating the notion of calibration, we introduce a novel gradient-based local regret for forecasting problems. Calibration is a well-studied concept in forecasting Foster and Vohra (1998). From a game-theoretic point of view, we call a forecasting procedure “calibrated” if the forecasts are consistent in hindsight. To the best of our knowledge, such a definition of regret is new. We show that the proposed regret has a logarithmic bound under certain circumstances, and we provide insights into the proposed regret. We conjecture that more efficient algorithms can be developed that minimize our regret.
In online forecasting, our goal is to update the model parameters at each time step in order to incorporate the most recently available information. Assume that $x_1, \dots, x_T \in \mathcal{K}$ represents a collection of $T$ consecutive forecast points, where $T \geq 1$ is an integer and $x_1$ represents an initial forecast point.
Here $f_1, \dots, f_T$ are nonconvex loss functions on some convex subset $\mathcal{K} \subseteq \mathbb{R}^n$. To put it another way,
$x_t$ represents the parameters of a machine learning model at time $t$, and $f_t(x_t)$ represents the loss computed using the data available at time $t$ given the model parameters $x_t$.
2.1 Regret Analysis
The performance of online learning algorithms is commonly evaluated by the regret, which is defined as the difference between the actual cumulative loss and the minimum cumulative loss attainable by the best fixed decision in hindsight:
$$\mathcal{R}(T) = \sum_{t=1}^{T} f_t(x_t) - \min_{x \in \mathcal{K}} \sum_{t=1}^{T} f_t(x).$$
If the regret grows linearly with $T$, it can be concluded that the player is not learning. If, on the other hand, the regret grows sublinearly, the player is learning and its accuracy is improving. While this definition of regret makes sense for convex optimization problems, it is not appropriate for nonconvex problems, due to the NP-hardness of nonconvex global optimization even in offline settings. Indeed, most research on nonconvex problems focuses on finding local optima, and in the literature on nonconvex optimization algorithms it is common to use the magnitude of the gradient to analyze convergence. Hazan et al. (2017) introduced a local regret measure, a new notion of regret that quantifies the objective of predicting points with small gradients on average. At each round of the game, the gradients of the loss functions from the $w$ most recent rounds of play are evaluated at the current forecast, and these gradients are then averaged. Hazan et al. (2017)’s local regret is defined as the sum of the squared magnitudes of these gradient averages.
(Hazan’s local regret) The $w$-local regret of an online algorithm is defined as:
$$\mathfrak{R}_w(T) \triangleq \sum_{t=1}^{T} \left\| \nabla F_{t,w}(x_t) \right\|^2, \qquad F_{t,w}(x) \triangleq \frac{1}{w} \sum_{i=0}^{w-1} f_{t-i}(x),$$
where $1 \leq w \leq T$ and $f_t \equiv 0$ for $t \leq 0$. Hazan et al. (2017) proposed various gradient descent algorithms for which this regret is sublinear.
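As an illustration, Hazan et al. (2017)'s time-smoothed local regret can be computed numerically as follows. The function names and the toy quadratic losses are our own; losses with non-positive index are treated as identically zero:

```python
import numpy as np

def hazan_local_regret(grads, xs, w):
    """Sum over t of the squared norm of the window-averaged gradients,
    with the w most recent gradients all evaluated at the current play x_t.
    grads[i] is the gradient function of loss f_{i+1}; f_t = 0 for t <= 0."""
    T = len(xs)
    regret = 0.0
    for t in range(1, T + 1):
        avg = np.zeros_like(xs[0])
        for i in range(w):
            if t - i >= 1:                          # skip f_{t-i} = 0 terms
                avg += grads[t - i - 1](xs[t - 1])  # all evaluated at x_t
        avg /= w
        regret += float(avg @ avg)                  # squared Euclidean norm
    return regret

# toy example: quadratic losses f_t(x) = 0.5 * ||x - c_t||^2, gradient x - c_t
cs = [np.array([float(t)]) for t in range(5)]
grads = [lambda x, c=c: x - c for c in cs]
xs = [np.array([0.0]) for _ in range(5)]   # a player that never moves
r = hazan_local_regret(grads, xs, w=2)
```

A stationary player facing drifting loss minimizers accumulates a growing local regret, since the averaged gradients at its fixed play remain large.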
2.2 Proposed Local Regret
In order to introduce the concept of calibration Foster and Vohra (1998), let us consider the first-order Taylor series expansion of the cumulative loss:
$$\sum_{t=1}^{T} f_t(x_t + \delta u_t) \approx \sum_{t=1}^{T} f_t(x_t) + \delta \sum_{t=1}^{T} \langle \nabla f_t(x_t), u_t \rangle,$$
where $\|u_t\| \leq 1$ for any $t$ and $\delta > 0$ is small. If the forecasts are well-calibrated, then perturbing them by any such directions cannot substantially reduce the cumulative loss. Hence, we can say that the sequence $\{x_t\}$ is asymptotically calibrated with respect to $\{f_t\}$ if:
$$\lim_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \langle \nabla f_t(x_t), u_t \rangle = 0.$$
(Proposed Regret) We propose a $w$-local regret as:
$$\mathfrak{R}_w^{*}(T) \triangleq \sum_{t=1}^{T} \left\| \frac{1}{w} \sum_{i=0}^{w-1} \nabla f_{t-i}(x_{t-i}) \right\|^2,$$
where $f_t \equiv 0$ for $t \leq 0$. To motivate equation 5, we use the following equality:
$$\nabla f_{t-i}(x_{t-i}) = \frac{1}{\eta_{t-i}} \left( x_{t-i} - x_{t-i+1} \right),$$
which holds for interior points of the feasible set. Using our definition of regret, we effectively evaluate an online learning algorithm by computing the average of the loss gradients at the corresponding forecast values over a sliding window. Hazan et al. (2017)’s local regret, on the other hand, averages the gradients of previous losses all evaluated at the most recent forecast. We believe that our definition of regret is more applicable to forecasting problems, as evaluating today’s forecast on previous loss functions can be misleading.
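Under our reading of this sliding-window construction, the proposed regret evaluates each gradient at the forecast that was actually in effect at that time. A minimal sketch, with function names and toy losses of our own choosing:

```python
import numpy as np

def proposed_local_regret(grads, xs, w):
    """Sum over t of the squared norm of the window-averaged gradients,
    each gradient evaluated at its own contemporaneous forecast x_{t-i}.
    grads[i] is the gradient function of loss f_{i+1}; f_t = 0 for t <= 0."""
    T = len(xs)
    regret = 0.0
    for t in range(1, T + 1):
        avg = np.zeros_like(xs[0])
        for i in range(w):
            if t - i >= 1:                              # skip f_{t-i} = 0 terms
                avg += grads[t - i - 1](xs[t - i - 1])  # evaluated at x_{t-i}
        avg /= w
        regret += float(avg @ avg)
    return regret

# toy example: quadratic losses f_t(x) = 0.5 * ||x - c_t||^2, gradient x - c_t
cs = [np.array([float(t)]) for t in range(5)]
grads = [lambda x, c=c: x - c for c in cs]
xs = cs  # forecasts that exactly track each loss minimizer as it drifts
r = proposed_local_regret(grads, xs, w=2)
```

In this toy, a forecaster that tracks each drifting minimizer incurs zero proposed regret, whereas re-evaluating old gradients at the newest forecast would penalize it; this is the asymmetry the paragraph above describes.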
3 Bound Analysis
We provide bounds for different scenarios for the proposed regret in equation 5, for interior points in the feasible set, under the following assumptions: the loss gradients are bounded; the losses are smooth; and the parameter update at time $t$ is $x_{t+1} = \Pi_{\mathcal{K}}\left( x_t - \eta_t \nabla f_t(x_t) \right)$, where $\eta_t > 0$ is a small learning rate and $\Pi_{\mathcal{K}}$ denotes projection onto $\mathcal{K}$. We consider three scenarios that differ in the choice of the feasible set, the learning rate $\eta_t$, and the window size $w$. We also note the following theorem, whose proof is provided in Section 5.1.
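The assumed parameter update, a gradient step followed by projection onto the feasible set, can be sketched as follows. Here the feasible set is taken to be a Euclidean ball purely for illustration; the projection for other convex sets differs:

```python
import numpy as np

def project_onto_ball(x, R):
    """Euclidean projection onto the L2 ball of radius R (illustrative choice of K)."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def update(x_t, grad_t, eta_t, R):
    """One projected-gradient step: x_{t+1} = Pi_K(x_t - eta_t * grad f_t(x_t))."""
    return project_onto_ball(x_t - eta_t * grad_t, R)

x = np.array([3.0, 4.0])   # ||x|| = 5, outside the unit ball
x_next = update(x, np.zeros(2), eta_t=0.1, R=1.0)
# with a zero gradient the step reduces to pure projection onto the ball
```

For interior points the projection is inactive and the step reduces to plain gradient descent, which is the case the bound analysis below treats.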
3.1 Scenario 1: Unconstrained domain with constant learning rate
Since $\mathcal{K} = \mathbb{R}^n$, the update rule becomes $x_{t+1} = x_t - \eta \nabla f_t(x_t)$; in other words, no projection operator is necessary. Hence we can write:
Taking $u_t$ as a unit vector chosen so that $\langle \nabla f_t(x_t), u_t \rangle = \|\nabla f_t(x_t)\|$, we can rewrite the expression accordingly. Hence, the bound for the proposed regret becomes:
which can be made sublinear in $T$ if the window size $w$ is selected large enough.
3.2 Scenario 2: Interior points with decaying learning rate
Assuming $x_t$ lies in the interior of the feasible set for all $t$ and setting the learning rate as stated, we can write the result in Theorem 3.1 as:
where the learning rate is set as above. Hence, we get:
Summing this over $t = 1, \dots, T$ yields:
which establishes the logarithmic bound for the proposed regret for interior points in this scenario.
3.3 Scenario 3: Interior points with constant window size
Similar to Section 3.2, we can write:
Summing this result across $t = 1, \dots, T$ yields:
which is quadratic in $T$, but the window size $w$ can be selected accordingly to make the upper bound sublinear.
We introduced a new definition of local regret to study nonconvex problems in forecasting. We used the concept of calibration and showed that our regret can be written as a local regret for interior points in the feasible set. Our regret differs from Hazan’s regret in the sense that it emphasizes today’s reward as opposed to past rewards. We also showed that our definition of regret has a logarithmic bound under certain conditions. As a future direction, we plan to study the behavior of our regret at boundary points of the feasible set and to propose efficient machine learning algorithms for nonconvex online learning that are optimal in terms of our definition of regret.
- Allen-Zhu and Hazan  Zeyuan Allen-Zhu and Elad Hazan. Variance reduction for faster non-convex optimization. In International Conference on Machine Learning, pages 699–707, 2016.
- Arora et al.  Sanjeev Arora, Rong Ge, and Ankur Moitra. New algorithms for learning incoherent and overcomplete dictionaries. In Conference on Learning Theory, pages 779–806, 2014.
- Flunkert et al.  Valentin Flunkert, David Salinas, and Jan Gasthaus. DeepAR: Probabilistic forecasting with autoregressive recurrent networks. arXiv preprint arXiv:1704.04110, 2017.
- Foster and Vohra  Dean P Foster and Rakesh V Vohra. Asymptotic calibration. Biometrika, 85(2):379–390, 1998.
- Hazan et al.  Elad Hazan, Karan Singh, and Cyril Zhang. Efficient regret minimization in non-convex games. arXiv preprint arXiv:1708.00075, 2017.
- Hsu et al.  Daniel Hsu, Sham M Kakade, and Tong Zhang. A spectral algorithm for learning hidden Markov models. Journal of Computer and System Sciences, 78(5):1460–1480, 2012.
- Wen et al.  Ruofeng Wen, Kari Torkkola, and Balakrishnan Narayanaswamy. A multi-horizon quantile recurrent forecaster. arXiv preprint arXiv:1711.11053, 2017.
- Zinkevich  Martin Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pages 928–936, 2003.
where , for any such that .
Let and recall that . Then we have:
Plugging , we have:
where equation 20 is a result of . By rewriting as , we get:
Note that by replacing with and with in Figure 1, we can see that . Since , we get:
The first term can be rewritten as
The bound for the second term can be written as:
as a result of . The bound for the third term can be rewritten as:
where equation 5 is a result of . Hence, we have:
Now, let us explore the bound for this quantity for any $t$. By its definition, we can write: