1 Introduction
In this paper, we propose a matrix recovery algorithm using nonnegative matrix factorization (NMF, [1]) on temporal aggregates of multiple time series. This is motivated by the use case of residential electricity metering. Residential utility consumption used to be recorded only once every few months. Nowadays, with smart meters, measurements may be recorded locally every minute or second, but only transmitted as daily sums, due to data transmission and processing costs and/or privacy issues. Recent advances in matrix completion have made it clear that when a large number of individuals and features are involved, even partial data can be enough to recover much of the lost information, thanks to the low-rank property: although the data matrix $V^* \in \mathbb{R}_+^{T \times N}$ is only partially known, if $V^* \simeq WH$, where $W \in \mathbb{R}_+^{T \times k}$ and $H \in \mathbb{R}_+^{k \times N}$, with $k$ much smaller than both $T$ and $N$, one might be able to recover $V^*$ entirely.
Consider a nonnegative matrix $V^* \in \mathbb{R}_+^{T \times N}$ of $N$ time series over $T$ periods. An entry $v^*_{t,n}$ of this matrix represents, for example, the electricity consumption of Client $n$ for Period $t$. We consider the case where the observations consist of temporal aggregates, represented by a data vector $a = \mathcal{A}(V^*) \in \mathbb{R}^d$, where $\mathcal{A}$ is a $d$-dimensional linear operator. To recover $V^*$ from the measurements $a$, we search for a low-rank NMF of $V$: $V \simeq WH$, where $W \in \mathbb{R}_+^{T \times k}$ and $H \in \mathbb{R}_+^{k \times N}$. The columns of $W$ are nonnegative factors, interpreted as typical profiles of the time series, and the columns of $H$ contain the weights of each individual. To estimate $V$, $W$, and $H$, we minimize a quadratic loss function under nonnegativity and data constraints:
(1) $\min_{V, W, H} \|V - WH\|_F^2$
s.t. $W \geq 0, \; H \geq 0, \; \mathcal{A}(V) = a$,
where $W \geq 0$ (or $a \geq 0$) means that the matrix $W$ (or the vector $a$) is elementwise nonnegative.
Our measurement operator $\mathcal{A}$ is a special instance of the trace regression model [2], which generalizes the matrix completion setting [3]. In matrix completion, each measurement is exactly one entry of the matrix. Various forms of linear measurements other than matrix completion have been considered for matrix recovery without nonnegativity [4, 5, 6]. To the best of our knowledge, most studies in NMF focus either on full observation for dimension reduction [7, 8], or on matrix completion [9, 10]. We extend classical alternating-direction NMF algorithms to use linear measurements as observations. This extension is discussed in Section 2.1.
In real-world applications, global information such as temporal autocorrelation may be available in addition to the measurements. Previous approaches combining matrix factorization and autoregressive structure are often focused on obtaining factors that are smoother and/or sparser, both in NMF [11, 12, 13] and without nonnegativity [14, 15]. Our objective is different: we try to further improve the matrix recovery by constraining the temporal correlation of the individual time series (not of the factors). We use a recent convex relaxation of quadratically constrained quadratic programs [16] to deduce an algorithm for this case. This is discussed in Section 2.2. In Section 3, both algorithms are validated on synthetic and real electricity consumption datasets.
2 Matrix of nonnegative time series
2.1 Iterative algorithm with simplex projection
We represent temporal aggregation by a linear operator $\mathcal{A} : \mathbb{R}_+^{T \times N} \to \mathbb{R}^d$. For each $1 \le i \le d$, the $i$-th measurement on $V$, $\mathcal{A}(V)_i = a_i$, is the sum of several consecutive rows on a column of $V$. Each measurement covers a disjoint index set. Not all entries of $V$ are necessarily involved in the measurements.
We use a block Gauss-Seidel algorithm (Algorithm 1) to solve (1). We alternately minimize over $V$, $W$, or $H$, keeping the other two matrices fixed. Methods from classical NMF problems are used to update $W$ and $H$ [17]. In our implementation, we use two variants that seem similarly efficient (more details in Section 3): Hierarchical Alternating Least Squares (HALS, [18]), and a matrix-based NMF solver with Nesterov-type acceleration (NeNMF, [19]).
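As an illustration, here is a minimal sketch (ours, not the authors' implementation) of one HALS sweep updating $W$ for fixed $V$ and $H$; each column of $W$ is minimized exactly in turn, so the objective never increases:

```python
import numpy as np

def hals_update_W(V, W, H, eps=1e-12):
    """One HALS sweep over the columns of W for min ||V - W H||_F^2, W >= 0."""
    HHt = H @ H.T          # k x k Gram matrix, reused by every column
    VHt = V @ H.T          # T x k
    for j in range(W.shape[1]):
        denom = max(HHt[j, j], eps)
        # exact nonnegative minimization over column j, the others fixed
        W[:, j] = np.maximum(0.0, W[:, j] + (VHt[:, j] - W @ HHt[:, j]) / denom)
    return W
```

The update on $H$ is symmetric: `hals_update_W(V.T, H.T, W.T).T`.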
When $W$ and $H$ are fixed, the optimization problem on $V$ is equivalent to $d$ simplex projection problems, one for each scalar measurement. For $1 \le i \le d$, denoting by $\Omega_i$ the index set covered by the $i$-th measurement, we have
(2) $\min_{(v_{t,n})_{(t,n) \in \Omega_i}} \sum_{(t,n) \in \Omega_i} \big(v_{t,n} - (WH)_{t,n}\big)^2$
s.t. $\sum_{(t,n) \in \Omega_i} v_{t,n} = a_i, \quad v_{t,n} \ge 0.$
We use the simplex projection algorithm introduced by [20] to solve this subproblem efficiently. In Algorithm 1, we denote this step by the projection operator $P$ onto the affine subspace $\{V : \mathcal{A}(V) = a\}$, intersected with the nonnegative orthant. The projector $P$ encodes the measurement data $a$.
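A sketch of the sort-based projection (as in [20]); it projects the entries covered by one measurement onto the scaled simplex $\{x \ge 0 : \sum_i x_i = a\}$ in $O(T \log T)$ time (code and names are ours):

```python
import numpy as np

def project_simplex(p, a):
    """Euclidean projection of p onto {x : x >= 0, sum(x) = a}, with a > 0."""
    u = np.sort(p)[::-1]                  # entries in decreasing order
    css = np.cumsum(u) - a
    ks = np.arange(1, len(p) + 1)
    k = ks[u > css / ks][-1]              # largest k with u_k > (cumsum_k - a)/k
    tau = css[k - 1] / k                  # optimal shift
    return np.maximum(p - tau, 0.0)
```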
We use a classical stopping criterion from the NMF literature, based on the Karush-Kuhn-Tucker (KKT) conditions on (1) ([7, Section 3.1.7]). With $f(V, W, H) = \|V - WH\|_F^2$, we calculate the projected residual $R(W, H) = \|\min(W, \nabla_W f)\|_F^2 + \|\min(H, \nabla_H f)\|_F^2$, where the minimum is taken entrywise. The algorithm is stopped if $R(W_i, H_i) \le \epsilon R(W_0, H_0)$, for a small threshold $\epsilon$.
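A sketch of this criterion (the gradients below drop the immaterial factor 2; `min` is entrywise): the residual vanishes exactly at a point satisfying the KKT conditions in $W$ and $H$.

```python
import numpy as np

def kkt_residual(V, W, H):
    """Projected KKT residual of min ||V - W H||_F^2 over W >= 0, H >= 0."""
    R = W @ H - V
    res_W = np.minimum(W, R @ H.T)    # entrywise min(W, grad_W)
    res_H = np.minimum(H, W.T @ R)    # entrywise min(H, grad_H)
    return np.sqrt(np.linalg.norm(res_W) ** 2 + np.linalg.norm(res_H) ** 2)
```

One stops when `kkt_residual(V, W, H)` falls below $\epsilon$ times its initial value.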
Algorithm 1 can be generalized to other types of measurement operators $\mathcal{A}$, as long as a projection onto the set verifying the data constraint can be efficiently computed.
2.2 From autocorrelation constraint to penalization
In addition to the measurements in $a$, we may have some prior knowledge on the temporal autocorrelation of the individuals. For $1 \le n \le N$, suppose that the lag-1 autocorrelation of Individual $n$'s time series is at least equal to a threshold $\rho_n$ (e.g. estimated from historical data), that is,
(3) $\sum_{t=1}^{T-1} v_{t,n} v_{t+1,n} \ge \rho_n \sum_{t=1}^{T} v_{t,n}^2.$
Notice that with $\Delta$ the $T \times T$ lag matrix ($\Delta_{t+1,t} = 1$ for $1 \le t \le T-1$, zero elsewhere), $v_n^\top \Delta v_n = \sum_{t=1}^{T-1} v_{t,n} v_{t+1,n}$. We define $A(\rho) \equiv \rho I_T - \frac{1}{2}(\Delta + \Delta^\top)$, for a threshold $\rho$. Inequality (3) is then equivalent to $v_n^\top A(\rho_n) v_n \le 0$. Adding an additional projection step in Algorithm 1 would mean solving $N$ quadratically constrained quadratic programs (QCQP) of the form:
(4) $\min_{x \in \mathbb{R}^T} \|x - p\|_2^2$
s.t. $x^\top A x \le 0$,
where $p$ is the corresponding column of the current $WH$, and $A = A(\rho_n)$.
Let $A = U D U^\top$ be a diagonalization of $A$. The matrix $U$ is orthogonal, because $A$ is symmetric. Let $\lambda_1 > \cdots > \lambda_T$ be the diagonal entries of $D$. In our case, $\lambda_1 = \rho + \cos(\frac{\pi}{T+1}) > 0$ and $\lambda_T = \rho - \cos(\frac{\pi}{T+1}) < 0$ for the relevant values of $\rho$. This means that (4) is nonconvex for most values of $\rho$. Nevertheless, the following theorem shows that its optimum can be computed relatively easily.
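The matrix $A(\rho)$ and its spectrum are easy to check numerically; a small sketch (our code, not part of the original algorithms):

```python
import numpy as np

def lag_quadratic_form(T, rho):
    """A(rho) = rho * I - (Delta + Delta^T)/2, with Delta the T x T lag matrix,
    so that x^T A(rho) x = rho * ||x||^2 - sum_t x_t x_{t+1}."""
    Delta = np.diag(np.ones(T - 1), k=-1)   # ones on the first subdiagonal
    return rho * np.eye(T) - (Delta + Delta.T) / 2.0

# eigenvalues are rho - cos(k * pi / (T+1)), k = 1..T, so A(rho) is
# indefinite for most thresholds rho: the QCQP (4) is nonconvex
```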
Theorem 1
Suppose that $\lambda \equiv (\lambda_1, \dots, \lambda_T)$ has at least one strictly negative component, and that $b \equiv U^\top p$ has no zero component (generically satisfied). Then there exists $\nu^* \ge 0$, so that $x^* = U(I_T + \nu^* D)^{-1} U^\top p$ is the optimal solution of (4).
Proof. We follow [16] to get a convex relaxation of (4). Notice that $\lambda_1 > \lambda_2 > \cdots > \lambda_T$ are all distinct (nondegenerate) eigenvalues. Define $y \equiv U^\top x$ and $b \equiv U^\top p$. Recall that $\lambda_T < 0$, that $U$ is orthogonal, and that $b$ has no zero component. Problem (4) is equivalent to the nonconvex problem
(5) $\min_{y \in \mathbb{R}^T} \|y - b\|_2^2$
s.t. $\sum_{i=1}^T \lambda_i y_i^2 \le 0.$
Now consider the convex problem
(6) $\min_{y, z \in \mathbb{R}^T} \sum_{i=1}^T z_i - 2 \sum_{i=1}^T b_i y_i$
s.t. $\sum_{i=1}^T \lambda_i z_i \le 0, \quad y_i^2 \le z_i, \; 1 \le i \le T.$
By [16, Theorem 3], if $(y^*, z^*)$ is an optimal solution of (6), and if $z_i^* = (y_i^*)^2$ for all $1 \le i \le T$, then $y^*$ is also an optimal solution of (5), which makes $x^* = U y^*$ an optimal solution of (4).
Problem (6) is convex, and it verifies the Slater condition: there is a strictly feasible point $(y, z)$ with $\sum_{i=1}^T \lambda_i z_i < 0$ and $y_i^2 < z_i$ for all $i$. This is true because $\lambda_T < 0$: we could choose $y = 0$, a large value for $z_T$, and small but strictly positive values for the other components of $z$, so as to have $\sum_{i=1}^T \lambda_i z_i < 0$ and $y_i^2 < z_i$. Thus, (6) always has an optimal solution, because the objective function is coercive over the constraint set. This shows the existence of $(y^*, z^*)$.
Now we show that $z_i^* = (y_i^*)^2$ for all $1 \le i \le T$. The KKT conditions of (6) are verified by $(y^*, z^*)$. In particular, there are dual variables $\nu^* \ge 0$ and $\mu_i^* \ge 0$, $1 \le i \le T$, that verify,
(7) $\mu_i^* y_i^* = b_i, \quad 1 \le i \le T,$
(8) $1 + \nu^* \lambda_i - \mu_i^* = 0, \quad 1 \le i \le T,$
(9) $\mu_i^* \big((y_i^*)^2 - z_i^*\big) = 0, \quad 1 \le i \le T,$
(10) $\nu^* \sum_{i=1}^T \lambda_i z_i^* = 0.$
Since $b_i \neq 0$, we have $\mu_i^* \neq 0$ by (7), and therefore $z_i^* = (y_i^*)^2$ by (9). The values $y_i^* = b_i / (1 + \nu^* \lambda_i)$ can be deduced from (7) and (8). By (10), $\nu^* \sum_{i=1}^T \lambda_i (y_i^*)^2 = 0$. Therefore, $x^* = U y^* = U(I_T + \nu^* D)^{-1} U^\top p$. This shows that $x^*$ is an optimal solution for (4). □
Theorem 1 shows that with a properly chosen $\nu^*$, Problem (4) can be easily solved. However, computing $\nu^*$ exactly is computationally heavy. Therefore, we propose to replace the constraints by penalty terms in the objective function of (1), which becomes
(11) $\min_{V, W, H} \|V - WH\|_F^2 + \nu \sum_{n=1}^N v_n^\top A(\rho_n) v_n$
s.t. $W \ge 0, \; H \ge 0, \; \mathcal{A}(V) = a$,
where $\nu > 0$ is a single fixed regularization parameter.
When $W$ and $H$ are fixed, the subproblem on $V$ can be separated into $N$ constrained problems of the form,
(12) $\min_{v_n \in \mathbb{R}^T} \|v_n - p_n\|_2^2 + \nu\, v_n^\top A(\rho_n) v_n$
s.t. $A_n v_n = a_n$,
where $v_n$ is the $n$-th column of $V$, $p_n$ the $n$-th column of $WH$, $a_n$ the vector of observations on the $n$-th column, and $A_n$ a $d_n \times T$ matrix which encodes the observation pattern over that column. The following theorem shows how to solve this problem.
Theorem 2
Suppose that $0 \le \nu < -1/\lambda_T$, with $\lambda_T$ the smallest eigenvalue of $A(\rho_n)$, and that $A_n$ is of full rank. Then the optimal solution of (12) is $v_n^* = (I_T + \nu A(\rho_n))^{-1}(p_n - \frac{1}{2} A_n^\top \mu^*)$, with $\mu^* = 2\big(A_n (I_T + \nu A(\rho_n))^{-1} A_n^\top\big)^{-1}\big(A_n (I_T + \nu A(\rho_n))^{-1} p_n - a_n\big)$.
Proof. Let $d_n$ be the dimension of $a_n$. Define $\iota$ as the indicator function for the constraint of (12), that is, $\iota(x) = 0$ if $A_n x = a_n$, and $\iota(x) = +\infty$ otherwise. Problem (12) is then equivalent to
(13) $\min_{x \in \mathbb{R}^T} \|x - p_n\|_2^2 + \nu\, x^\top A(\rho_n) x + \iota(x).$
The subgradient of (13) is $2(I_T + \nu A(\rho_n)) x - 2 p_n + \{A_n^\top \mu : \mu \in \mathbb{R}^{d_n}\}$. When $0 \le \nu < -1/\lambda_T$, the matrix $I_T + \nu A(\rho_n)$ is positive definite, and (13) is convex. Therefore, $x$ is a minimizer if and only if there is some $\mu \in \mathbb{R}^{d_n}$ such that $2(I_T + \nu A(\rho_n)) x - 2 p_n + A_n^\top \mu = 0$, and $A_n x = a_n$. That is, $x = (I_T + \nu A(\rho_n))^{-1}(p_n - \frac{1}{2} A_n^\top \mu)$. The vector $\mu$ thereby verifies $A_n (I_T + \nu A(\rho_n))^{-1} A_n^\top \mu = 2\big(A_n (I_T + \nu A(\rho_n))^{-1} p_n - a_n\big)$.
The $d_n \times d_n$ matrix $A_n (I_T + \nu A(\rho_n))^{-1} A_n^\top$ is invertible, because $d_n$ is smaller than $T$, and $A_n$ is of full rank (because each measurement covers disjoint periods). Therefore, $\mu^*$ is uniquely determined by the equation above, which proves the theorem. □
Both $(I_T + \nu A(\rho_n))$ and $A_n (I_T + \nu A(\rho_n))^{-1} A_n^\top$ are always invertible under the conditions of the theorem. The matrix inversion only needs to be done once for each individual. After computing $(I_T + \nu A(\rho_n))^{-1}$ and $\big(A_n (I_T + \nu A(\rho_n))^{-1} A_n^\top\big)^{-1}$ for each $n$, we use Algorithm 2 to solve (11).
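A numerical sketch (ours) of this closed-form column update for problem (12); `M` plays the role of $A_n$ and `nu` of the penalization parameter:

```python
import numpy as np

def penalized_column_update(p, A, M, a, nu):
    """Minimize ||x - p||^2 + nu * x^T A x  subject to  M x = a, in closed form.

    M (d_n x T) encodes disjoint row sums for one column; nu must be small
    enough that I + nu * A is positive definite. Names are ours (sketch)."""
    T = len(p)
    B = np.linalg.inv(np.eye(T) + nu * A)     # (I + nu A)^{-1}, reusable
    # stationarity: 2 (I + nu A) x - 2 p + M^T mu = 0  =>  x = B (p - M^T mu / 2)
    # feasibility:  M x = a                   =>  (M B M^T) (mu/2) = M B p - a
    half_mu = np.linalg.solve(M @ B @ M.T, M @ B @ p - a)
    return B @ (p - M.T @ half_mu)
```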
Convergence to a stationary point has been proved for past NMF solvers in the full-observation and matrix-completion settings ([19, 17]). Our algorithms have a similar convergence property. Although the subproblems on $W$ and $H$ do not necessarily have a unique optimum, the projection step on $V$ attains a unique minimizer. By [21, Proposition 5], convergence to a stationary point is guaranteed.
An optimal value of $\nu^*$ could be calculated. Substituting $y_i^* = b_i/(1 + \nu^* \lambda_i)$ in the complementary slackness condition (10) shows that an active $\nu^*$ is a root of the polynomial $\sum_{i=1}^T \lambda_i b_i^2 \prod_{j \neq i} (1 + \nu \lambda_j)^2$. It also verifies $0 \le \nu^* < -1/\lambda_T$, where $\lambda_T$ is the smallest eigenvalue of $A$. This root-finding is too expensive to do at every iteration. In our experimental studies, we chose a fixed value of $\nu$ in $(0, -1/\lambda_T)$ for the penalization when the constraint in (4) is active, and $\nu = 0$ (no penalization) when the constraint is already verified by $p_n$.
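For completeness, a sketch (ours) of solving (4) via Theorem 1. Instead of the polynomial root-finding, it bisects on the constraint value $x(\nu)^\top A x(\nu)$, which is strictly decreasing in $\nu$ on $[0, -1/\lambda_T)$:

```python
import numpy as np

def solve_qcqp(A, p, tol=1e-10):
    """min ||x - p||^2  s.t.  x^T A x <= 0, for a symmetric indefinite A.

    Theorem 1: x* = U (I + nu* D)^{-1} U^T p for some nu* in [0, -1/lambda_min).
    Sketch: find nu* by bisection on the monotone constraint value."""
    lam, U = np.linalg.eigh(A)            # eigenvalues ascending, lam[0] < 0
    b = U.T @ p
    x_of = lambda nu: U @ (b / (1.0 + nu * lam))
    if p @ A @ p <= 0:                    # constraint inactive: nu* = 0
        return p
    lo, hi = 0.0, -1.0 / lam[0] - tol
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        x = x_of(mid)
        if x @ A @ x > 0:
            lo = mid                      # still infeasible: increase nu
        else:
            hi = mid                      # feasible: decrease nu
    return x_of(hi)
```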
3 Experimental results
We used one synthetic dataset and three real-world electricity consumption datasets to evaluate our algorithms. In each dataset, the individual autocorrelation thresholds are calculated on historical data from the same dataset, which is excluded from the matrix recovery.

Synthetic data: 120 random mixtures of 20 independent Gaussian processes with a Matérn covariance function (shifted to be nonnegative) are simulated over 150 periods ($N = 120$, $T = 150$).

French electricity consumption (proprietary dataset of Enedis): daily consumption of 636 groups (medium-voltage feeders) of around 1,500 consumers each, based near Lyon in France, during 2012.

Portuguese electricity consumption [22]: daily consumption in kW of 370 clients during 2014.
For each individual in a dataset, we generated observations by selecting a number of observation periods. The temporal aggregates are the sums of the time series between two consecutive observation periods. The observation periods are chosen in two possible ways: periodically (at fixed intervals, with the first observation period sampled at random), or uniformly at random. The fixed intervals for periodic observations range over several lengths $h$. With random observations, we used sampling rates that are equivalent to these intervals; that is, the number of observations verifies $d \simeq NT/h$.
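The observation-generation protocol above can be sketched as follows (function and variable names are ours); each aggregate sums one column of $V^*$ between two consecutive observation periods:

```python
import numpy as np

def aggregate_observations(V, obs_periods):
    """Temporal aggregates of each column of V between consecutive
    observation periods (sorted indices), stacked column by column."""
    a = []
    for n in range(V.shape[1]):
        for start, stop in zip(obs_periods[:-1], obs_periods[1:]):
            a.append(V[start:stop, n].sum())
    return np.array(a)
```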
We applied both algorithms on each dataset, over a range of values for the rank $k$. When autocorrelation penalization is used, we chose $\nu$ as described at the end of Section 2.2. With a recovered matrix $\hat{V}$ obtained in an algorithm run, we compute the normalized error in Frobenius norm: $\|\hat{V} - V^*\|_F / \|V^*\|_F$. The error rate for the best rank (smallest oracle error) is then reported. In practice, the choice of rank should be dictated by the interpretation, i.e. the number of profiles ($k$) needed in the application.
Each experiment (dataset, sampling scheme, sampling rate, unpenalized or penalized, implementation of the update on $W$ and $H$) is run three times, and the average error rate is reported in Figure 1. Both update implementations (NeNMF with solid lines, HALS with dotted lines) produce almost identical error rates. As a comparison to our algorithms, we also provide the error rate of an interpolation method: temporal aggregates are distributed equally over the covered periods (red dot-dashed line). The figure is zoomed to show the error rates of the proposed algorithms; the much higher error rates of the interpolation method are sometimes not shown.
The proposed algorithms, whether unpenalized (green lines) or penalized (blue lines), outperform the interpolation benchmark by a large margin when the sampling rate is smaller than 10%. With higher sampling rates, unpenalized recovery continues to work well on random observations (green lines in lower panels), but less so on real datasets with periodic observations (green lines in upper panels). Real electricity consumption has significant weekly periodicity, which is poorly captured by observations at similar periods. However, this shortcoming is more than compensated for by the autocorrelation penalization (blue lines). Penalized recovery consistently outperforms interpolation under both observation schemes. This makes penalized recovery particularly useful for real-world applications, where it may be costly to change the sampling scheme from periodic to random (e.g. in electricity metering).
4 Perspectives
We extended NMF to use temporal aggregates as observations, by adding a projection step into NMF algorithms. With appropriate projection algorithms, this approach could be further generalized to other types of data, such as disaggregating spatially aggregated data, or general linear measurements.
We also introduced a penalization on individual autocorrelation, to be used when such information is available, which improves the recovery performance of the base algorithm. This component can be generalized to larger lags (with a lag matrix with 1's further off the diagonal), or to multiple lags (by adding several lag matrices together). It is also possible to generalize this approach to other types of expert knowledge, through additional constraints on $V$.
References
 [1] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401(6755):788–791, 1999.
 [2] Angelika Rohde and Alexandre B. Tsybakov. Estimation of high-dimensional low-rank matrices. The Annals of Statistics, 39(2):887–930, 2011.
 [3] Emmanuel J. Candès and Benjamin Recht. Exact Matrix Completion via Convex Optimization. Foundations of Computational Mathematics, 9(6):717–772, 2009.
 [4] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
 [5] Emmanuel J. Candès and Yaniv Plan. Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements. IEEE Transactions on Information Theory, 57(4):2342–2359, 2011.

[6] Or Zuk and Avishai Wagner. Low-Rank Matrix Recovery from Row-and-Column Affine Measurements. In International Conference on Machine Learning (ICML), 2015.
 [7] Nicolas Gillis. The why and how of nonnegative matrix factorization. Regularization, Optimization, Kernels, and Support Vector Machines, 12:257, 2014.
 [8] Pierre Alquier and Benjamin Guedj. An Oracle Inequality for Quasi-Bayesian Non-Negative Matrix Factorization. arXiv preprint arXiv:1601.01345, 2016.
 [9] Nicolas Gillis and François Glineur. Low-rank matrix approximation with weights or missing data is NP-hard. SIAM Journal on Matrix Analysis and Applications, 32(4):1149–1165, 2011.

[10] Yangyang Xu and Wotao Yin. A Block Coordinate Descent Method for Regularized Multiconvex Optimization with Applications to Nonnegative Tensor Factorization and Completion. SIAM Journal on Imaging Sciences, 6(3):1758–1789, 2013.
 [11] Zhe Chen and Andrzej Cichocki. Nonnegative matrix factorization with temporal smoothness and/or spatial decorrelation constraints. Laboratory for Advanced Brain Signal Processing, RIKEN, Tech. Rep, 68, 2005.
 [12] Cédric Févotte and Jérôme Idier. Algorithms for nonnegative matrix factorization with the beta-divergence. Neural Computation, 23(9):2421–2456, 2011.
 [13] Paris Smaragdis, Cédric Févotte, Gautham J. Mysore, Nasser Mohammadiha, and Matthew Hoffman. Static and Dynamic Source Separation Using Nonnegative Factorizations: A unified view. IEEE Signal Processing Magazine, 31(3):66–75, 2014.
 [14] Madeleine Udell, Corinne Horn, Reza Zadeh, and Stephen Boyd. Generalized low rank models. arXiv preprint arXiv:1410.0342, 2014.
 [15] Hsiang-Fu Yu, Nikhil Rao, and Inderjit S. Dhillon. High-dimensional Time Series Prediction with Missing Values. arXiv preprint arXiv:1509.08333, 2015.
 [16] Aharon Ben-Tal and Dick den Hertog. Hidden conic quadratic representation of some nonconvex quadratic optimization problems. Mathematical Programming, 143(1-2):1–29, 2013.
 [17] Jingu Kim, Yunlong He, and Haesun Park. Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework. Journal of Global Optimization, 58(2):285–319, 2014.
 [18] Andrzej Cichocki, Rafal Zdunek, and Shun-ichi Amari. Hierarchical ALS algorithms for nonnegative matrix and 3D tensor factorization. In Independent Component Analysis and Signal Separation, pages 169–176. Springer, 2007.
 [19] Naiyang Guan, Dacheng Tao, Zhigang Luo, and Bo Yuan. NeNMF: An Optimal Gradient Method for Nonnegative Matrix Factorization. IEEE Transactions on Signal Processing, 60(6):2882–2898, 2012.
 [20] Yunmei Chen and Xiaojing Ye. Projection Onto A Simplex. arXiv preprint arXiv:1101.6081, 2011.
 [21] Luigi Grippo and Marco Sciandrone. On the convergence of the block nonlinear Gauss–Seidel method under convex constraints. Operations Research Letters, 26(3):127–136, 2000.
 [22] Artur Trindade. UCI Machine Learning Repository - ElectricityLoadDiagrams20112014 Data Set. http://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014, 2016.
 [23] Commission for Energy Regulation, Dublin. Electricity smart metering customer behaviour trials findings report. 2011.
 [24] Commission for Energy Regulation, Dublin. Results of electricity cost-benefit analysis, customer behaviour trials and technology trials. 2011.