I. Introduction
For a large-scale power system, multiple simultaneous events can hardly be identified properly, because the features of a multi-event are difficult to distinguish from those of a single event. A multi-event poses a more serious threat to the system: it can hardly be identified, and thus addressed, which may lead to a widespread blackout.
This paper proposes a statistical, data-driven solution, rather than a deterministic, empirical, or model-based one, for the problem above. The study builds upon our work of the last several years; see Section I-B for details.
I-A. Contribution
Based on random matrix theory (RMT), this paper proposes a high-dimensional statistical tool, namely, the factor model, for multi-event detection and recognition in a modern grid. The method extracts the spatial and temporal information from the massive raw data in the form of principal components (factors) and residuals (bulk), respectively. The factors map event signals, and the residuals map white or non-Gaussian noises.
The proposed method can be used effectively for multi-event analytics. For the factors, we find experimentally that there is a linear relationship between the number of factors and the number of events in a multi-event. For the residuals, on the other hand, we extract their information rather than simply assuming them to be independent and identically distributed "pure white noise" as in [1]. The time series information contained in the noise, together with the spatial information in the factors, reveals the multi-event status. The proposed method is practical for real-time analysis. Besides, the proposed solution is model-free, requiring no knowledge of the topology or parameters of the power system [2, 3, 4], and is able to handle non-Gaussian noises. To the best of our knowledge, this is the first algorithm for multi-event detection based on random matrix theory in the field of power systems.
I-B. Related Work
In our previous work, a universal architecture for big data analytics was proposed [5] and applied to anomaly detection [6, 7]. Little work, however, has been done on multi-event analytics in a complex situation. Current research on event analysis is mainly model-based and aims at single-event analytics; it may not be suitable for real-time analytics in a complex situation [8]. Some other methods adopt graph theory [9, 10]; these methods strongly depend on the structure of the power system. Some data-driven methods for event analysis have been proposed recently [11] and applied to multi-event analytics [12].
Rafferty utilizes principal component analysis (PCA) for real-time multi-event detection and classification in [12]. In that approach, a threshold of cumulative percentage of total variation is selected in advance, and the number of principal components is then determined according to this threshold. The threshold is selected empirically and subjectively. Besides, other supervised tools, such as deep learning [13] and kernel-based algorithms [14], which are hot topics among data-driven approaches, face the same inevitable problem. Deep learning algorithms automatically select features from massive datasets; this is one big advantage of deep learning over our paradigm. Our paradigm, however, has the advantage of transparency, in that our results are provable. It is also deeply rooted in random matrix theory.

II. Theory Foundation and Data Processing
The frequently used notations are given in Table I.
II-A. Random Matrix Theory and Spectral Analytics
Random matrices have been an important subject in multivariate statistical analysis since the landmark work of Wigner and Wishart [18], motivated by problems in quantum physics. The factor model, on the other hand, can be used to identify non-random properties (in the form of spikes/outliers), which are deviations from the universal predictions (in the form of the bulk) [19, 20]. To be specific, the eigenvalues of the covariance matrix (the spectrum) can usually be divided into two parts: a bulk and several spikes. The bulk represents the noises, while the spikes represent the signals, namely, the factors.
In previous work on power systems, noises are usually assumed to be independent and identically distributed, namely, white noises. This "no information" or "pure noise" assumption, however, is invalid in practice. For instance, for a certain PMU, there exists time correlation between the measured voltage magnitude data of adjacent sampling points [21, 22]. This time correlation is non-negligible, especially for a large interconnected system. This paper formulates the noises using time series analytics.
Notations  Meanings

  a matrix, a vector, an entry of a matrix
  mean and variance
  raw data source
  dimensional complex space
  the row and column sizes of the moving split-window
  the number of measurable status variables
  the sampling time
  a raw data matrix
  a standard non-Hermitian matrix
  the number of factors
  the covariance structure of the residuals
  a complex eigenvalue
  the estimated value
Reference [23] provides a fundamental theory for estimating the noises and formulates them as:
(1)
where the noise term is a matrix with i.i.d. (independent and identically distributed) Gaussian entries, and the two symmetric non-negative definite matrices represent the cross- and auto-covariances, respectively. For more details about the model, please refer to Appendix A. On the other hand, the spikes (deviating eigenvalues) of the spectrum map the event signals; they represent the dominant information about the system operating status.
II-B. Data Processing
Massive raw data can naturally be represented as a matrix [24]. In a power system, assume that there are a number of kinds of measurable variables. At each sampling time, we arrange the measured data of these variables in the form of a column vector [25]. Then, arranging the column vectors in chronological order, we obtain the raw data source.
From the raw data source, we can cut off an arbitrary part of fixed size, at any sampling time, forming a raw data matrix as
(2) 
where each column is the measured data at one sampling time. It is worth noting that the window length equals the number of sampling times in the moving split-window. If we keep the last sampling time as the current time, real-time analytics can be conducted with the moving split-window.
Then, we convert the raw data matrix obtained at each sampling time into a standard non-Hermitian matrix with the following algorithm:
(3) 
where , , , and .
In the following section, the standardized matrix is used to analyze the factors and noises at each sampling time.
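As a minimal sketch of the split-window and standardization procedure above (assuming NumPy; the matrix sizes follow Section IV, and the exact normalization prescribed in (3) may differ in detail):

```python
import numpy as np

def split_window(omega, t, T):
    """Cut the last T samples ending at time t from the raw data source
    (rows = measurement channels, columns = sampling times)."""
    return omega[:, t - T + 1 : t + 1]

def standardize(X):
    """Row-wise z-score normalization, producing a 'standard
    non-Hermitian matrix' for spectral analysis."""
    mu = X.mean(axis=1, keepdims=True)
    sigma = X.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0          # guard against constant rows
    return (X - mu) / sigma

# toy raw data source: 118 channels, 2500 samples (sizes from Section IV)
rng = np.random.default_rng(0)
omega = rng.normal(size=(118, 2500))
X_hat = standardize(split_window(omega, t=500, T=250))
print(X_hat.shape)  # (118, 250)
```

Sliding `t` forward one sample at a time yields the real-time sequence of windows analyzed in the case studies.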
III. Factor Model Algorithm
For a certain window, e.g., the one obtained at the current sampling time, we aim to decompose the standard non-Hermitian matrix, as given in (3), into factors and residuals as follows:
(4) 
where the number of factors, the factors themselves, the corresponding loadings, and the residuals appear on the right-hand side. Usually, only the data matrix is available, while the factors, the loadings, and their number need to be estimated.
The factor model aims to simultaneously provide estimators of the factor number and of the correlation structure in the residuals. We turn this parameter-estimation problem into a minimum-distance problem. Specifically, we minimize the distance between the empirical spectral distribution and the theoretical spectral distribution. The empirical one, depending on the sampled data, is obtained as the empirical eigenvalue density (EED) in (7), and the theoretical one, based on Sokhotsky's formula, is given in (8). As a result, we turn factor model estimation into a classical optimization problem:
(5) 
where
is a spectral distance measure or loss function. The solution of this minimization problem gives the number of factors and the parameters of the correlation structure of the residuals.

III-A. Principal Component Estimation
The first step is to generate the empirical residuals by subtracting the largest principal components according to (4).
(6) 
where the factor matrix contains the principal components as rows, and the factor loadings are estimated by multivariate least squares regression of the data on the factors.
Then the covariance matrix from level residuals is obtained as
(7) 
The subscript “real” indicates that is obtained from real data. The steps can be summarized as follows:
Steps of Calculating the Empirical Eigenvalue Density

1. Calculate the factor matrix: each row is a principal component obtained from the correlation matrix of the standardized data.
2. Conduct least squares regression of the data on the factors to obtain the loadings.
3. Calculate the residuals.
4. Calculate the covariance matrix from the residuals.
5. Calculate the empirical eigenvalue density of the residual covariance matrix.
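The five steps above can be sketched as follows (a minimal sketch assuming NumPy; the function name and implementation details are illustrative, not from the paper):

```python
import numpy as np

def empirical_residual_spectrum(X, k):
    """Remove the k largest principal components from the standardized
    data X (N x T) and return the eigenvalues of the residual covariance."""
    N, T = X.shape
    corr = X @ X.T / T                      # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)
    # rows of F: the k principal-component time series (largest eigenvalues)
    F = eigvecs[:, -k:].T @ X               # k x T
    # least squares regression of X on F gives the loadings
    coef, *_ = np.linalg.lstsq(F.T, X.T, rcond=None)
    residual = X - (F.T @ coef).T           # empirical residuals
    cov = residual @ residual.T / T
    return np.linalg.eigvalsh(cov)          # spectrum for the EED

rng = np.random.default_rng(1)
X = rng.normal(size=(118, 250))
lam = empirical_residual_spectrum(X, k=3)
print(lam.shape)  # (118,)
```

A histogram of `lam` then serves as the empirical density compared against the theoretical one in the minimization (5).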
III-B. Modeling Covariance of Residuals
As discussed above, we consider the residuals as time series, represented by (1). The theoretical spectral density, however, is difficult to obtain, since the limiting distribution for general cross- and auto-covariances costs too much computational resource via the Stieltjes transform in [23]. Fortunately, a recent work [26] provides an analytic derivation of the limiting spectral density using free random variable techniques. This paper uses the results of [26] for the calculation. If we assume that the cross-correlations [23] are effectively removed by the factors, then the cross-correlations among the normalized residuals are negligible. Under this assumption, only the autocorrelations are left, taking the form of exponential decay with respect to the time lag. This enables us to calculate the modeled spectral density much more easily through the free random variable techniques proposed in [26] (refer to Appendix A for the analytic derivation). The steps can be summarized as follows:
Steps of Calculating the Theoretical Spectral Density

1. Get the mean spectral density from the Green's function G(z) by using Sokhotsky's formula.
2. Obtain the Green's function from the moment generating function.
3. Solve the polynomial equation for the moment generating function (up to a 6th-order polynomial equation).
With the above procedure, we can rewrite (5) as
(11) 
III-C. Distance Measure
Since the empirical spectrum contains spikes, a distance measure that is sensitive to the presence of spikes should be used. This paper uses the Jensen-Shannon divergence, which is a symmetrized version of the Kullback-Leibler divergence.
(12) 
where the two arguments are probability densities, and the Kullback-Leibler divergence between them is defined in Appendix B. Note that the Kullback-Leibler distance becomes large if one density has a spike at a point while the other is almost zero at the same point. Refer to Appendix B for more details.

IV. Case Studies
The proposed method is tested with simulated data on the standard IEEE 118-bus system (the topology is shown in Fig. 1) on the Matpower platform [27]. In these simulations, we regard a sudden change of power consumption (active power) at some node as an event.
Three cases are designed to validate the proposed method. In Case 1, Case 2, and Case 3, we set different numbers of events on nodes 52, 117, and 75, and observe the number of factors, respectively. For comparison, we illustrate the results of the three cases in the same figure, Fig. 2.
The raw data source is of size 118 by 2500, i.e., 118 variables and 2500 sampling times. The size of the moving split-window is set to 118 by 250.
Then, (11) is used to estimate the two parameters: the number of assumed events and the correlation structure of the noises. Note that we run the simulated system model dozens of times to collect data, with the noise of each run following the same distribution. Running the simulation dozens of times is reasonable, since for a real physical system we can likewise obtain dozens of observations through its sampling data, whose noises follow a certain distribution.
Then, in each simulation, the estimates are obtained with (11), and their mean values across simulations are calculated, which may appear in decimal form. We need to point out that the estimation begins at sampling time 250 due to the length of the split-window. In Fig. 2, we amplify the value of the correlation parameter by a factor of twenty to make it visible.
The events in Case 1, Case 2, and Case 3 are given in Tab. II, Tab. III, and Tab. IV, respectively. The corresponding estimation results, i.e., the number of factors and the correlation structure of the residuals, are shown in Fig. 2.
IV-A. Case 1: Single Event Detection
Single Step Signal on Node 52:
Node    Sampling Time   Power Consumption (MW)
52                      0 + constant + noise
                        100 + constant + noise
others                  constant + noise

* The constant term is the base load of the node.
* The noise terms follow an AR(1) model.
Fig. 2(a) shows that:

During the initial sampling period¹, the two estimates (the number of factors and the correlation parameter) remain steady around 3 and 0.28, respectively.
¹From (the length of the split-window) to (the beginning of the signal).

At sampling time 500, the estimated number of factors starts to decline to around 2. The correlation parameter also declines slightly.
Actually, as can be seen from Tab. II, no event occurs in the system while the estimates keep steady. Right at sampling time 500, the power consumption changes from 0 to 100 MW. Therefore, we can conduct event detection with the proposed method.
Moreover, in this case, for the split-window ending after the event, there exists a single event (the step signal on Node 52 at sampling time 500) and 2 factors, i.e., 1 event. A linear relationship between the number of events and the number of factors will be revealed afterwards.
IV-B. Case 2: Multiple Event Detection (Two Events)
Multiple Step Signal on Node 52 and Node 117:
Node    Sampling Time   Active Load (MW)
52                      0 + constant + noise
                        100 + constant + noise
117                     0 + constant + noise
                        150 + constant + noise
                        0 + constant + noise
others                  constant + noise
Fig. 2(b) shows that:

During the initial sampling period², the two estimates remain steady. Thus, we deduce that no event occurs in the system, which agrees with Tab. III.
²From (the length of the split-window) to (the beginning of the signal).

At sampling time 1300, the estimated number of factors starts to decline (from 3.023 to 2.603) and then keeps around 2 until sampling time 1399. For the corresponding split-window, there exists a single event (at sampling time 1300) and 2 factors, i.e., 1 event.

At sampling time 1400, the estimated number of factors starts to rise (from 2.333 to 3.047) and then keeps around 3. For the corresponding split-window, there exist two events (at sampling times 1300 and 1400) and 3 factors, i.e., 2 events.
IV-C. Case 3: Multiple Event Detection (Three Events)
Multiple Step Signal on Node 52, Node 117 and Node 75:
Node    Sampling Time   Active Load (MW)
52                      0 + constant + noise
                        100 + constant + noise
117                     0 + constant + noise
                        150 + constant + noise
75                      0 + constant + noise
                        400 + constant + noise
others                  constant + noise
Fig. 2(c) shows that:

During the initial sampling period³, the two estimates remain steady. Thus, we deduce that no event occurs in the system, which agrees with Tab. IV.
³From (the length of the split-window) to (the beginning of the signal).

At sampling time 2250, the estimated number of factors starts to decline (from 2.873 to 2.603) and then keeps around 2 until sampling time 2300. For the corresponding split-window, there exists a single event (at sampling time 2250) and 2 factors, i.e., 1 event.

At sampling time 2300, the estimated number of factors starts to rise (from 2.307 to 3.173) and then keeps around 3 until sampling time 2400. For the corresponding split-window, there exist two events (at sampling times 2250 and 2300) and 3 factors, i.e., 2 events.

At sampling time 2400, the estimated number of factors starts to rise (from 3.433 to 3.547) and then keeps around 4. For the corresponding split-window, there exist three events (at sampling times 2250, 2300, and 2400) and 4 factors, i.e., 3 events.
IV-D. Further Discussion of the Cases
Through the above three cases, the relationship between the number of events and the number of factors is revealed.
The results of the three cases are summarized in Tab. V:
Case    Split Window    Number of Events    Number of Factors
1                       1                   2
2                       1                   2
                        2                   3
3                       1                   2
                        2                   3
                        3                   4
The number of factors, estimated by the factor model, is approximately equal to the number of events plus one; there is a linear relationship between them. Therefore, we can deduce the number of events of a multi-event for a certain split-window.
Besides, every time an event occurs, the estimated correlation parameter drops. This indicates that the correlation in the residuals decreases when events occur.
V. Conclusion
This paper proposes a data-driven method, namely, the factor model, to conduct multi-event detection and recognition in a large power system. In the analysis procedure, we estimate the number of factors and the parameter of the correlation structure of the residuals by minimizing the distance between two spectra. Then, we conduct real-time analysis of the two parameters using a moving split-window. The proposed method is direct and practical for multi-event analytics in a complex situation. The following conclusions are obtained. First, the number of factors estimated by the factor model has an approximately linear relationship with the number of events that occur in the system. Second, taking non-Gaussian noises into account, time series analytics is implemented to extract information from the noises; the decrease of the correlation parameter is related to the occurrence of events. Note that the number of factors reveals the spatial information (events on different nodes) in the system, while the correlation structure of the noises contains temporal information; the proposed method thus considers the space-time correlation in a large power system. Finally, the case studies verify the effectiveness of the method.
Along this direction, further work can be done. For example, we can employ more general noise models: if we consider vector ARMA(1, 1) processes, we obtain up to 6th-order polynomial equations [26]. Furthermore, the relationship between the number of factors and the number of events can be further studied with a physical model.
Appendix A An Overview of Free Random Variable Techniques
We summarize the main concepts and key results of the free random variable techniques that we employ to derive the theoretical spectral density. We follow the notations and derivations of [26, 28]. First, consider a simple decomposition of the covariance structure:
(13) 
where the first matrix is the cross-covariance matrix and the second is the auto-covariance matrix. Suppose we have a Gaussian random matrix with i.i.d. entries; a correlated Gaussian random matrix (a correlated time series) is then obtained by applying the square roots of the two covariance matrices. Its sample (empirical) covariance matrix is
(14) 
Consider a real symmetric random matrix .
Definition 1 Mean Spectral Density
(15) 
where the expectation is taken with respect to the rotationally invariant probability measure, using the Dirac delta function and the identity matrix.
Definition 2 The Green’s Function (or Stieltjes Transform)
(16) 
The relationship between and is:
(17) 
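Assuming the usual conventions of the RMT literature, the relationship between the density and the Green's function takes the form:

```latex
G(z) = \int \frac{\rho(\lambda)}{z - \lambda}\, d\lambda,
\qquad
\rho(\lambda) = -\frac{1}{\pi}\,\lim_{\epsilon \to 0^{+}} \operatorname{Im} G(\lambda + i\epsilon),
```

the second identity being Sokhotsky's formula used in Section III-B.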
The Green's function generates the moments of a probability distribution, where the moments are defined as follows.
Definition 3 Moment
(18) 
Definition 4 Moment Generating Function
(19) 
The relationship between and is
(20) 
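Assuming the standard definition of the moment generating function as a power series in 1/z, this relationship reads:

```latex
M(z) = z\,G(z) - 1, \qquad G(z) = \frac{M(z) + 1}{z}.
```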
The Blue's function and the N-transform are the functional inverses of the Green's function and the moment generating function, respectively.
Definition 5 Blue's Function and N-transform
(21) 
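Assuming the standard conventions, these are defined by the functional-inverse relations:

```latex
B\big(G(z)\big) = z, \qquad G\big(B(z)\big) = z,
\qquad
M\big(N(z)\big) = z, \qquad N\big(M(z)\big) = z.
```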
Then, return to the sample covariance matrix. Its N-transform can be derived as:
(22) 
Using the moment generating function and its inverse relation to the N-transform, the Green's function can be written as:
(23) 
Now, we consider the simplified model with trivial cross-covariance. In this case, each residual series is a time series following the AR(1) autoregressive model:
(24) 
where the AR(1) coefficient and the innovation variance are the model parameters. We calculate the eigenvalue distribution of the correlation matrix based on the following strategy.
Step 1: Find the N-transform from its defining equation.
Step 2: Find the Green's function from the N-transform.
Step 3: Find the mean spectral density from the Green's function.
For Step 1, note that in the simplified model the cross-covariance matrix is the identity, so its N-transform is trivial. Therefore, the N-transform of the sample covariance matrix can be rewritten as:
(25) 
To proceed, note that the auto-covariance matrix of the AR(1) process has a simple form:
(26) 
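Assuming the stationary AR(1) process with coefficient a (with magnitude less than one) and innovation variance sigma squared, the standard auto-covariance, which we take to be the form intended here, is:

```latex
B_{ij} = \frac{\sigma^{2}}{1 - a^{2}}\, a^{|i-j|}, \qquad |a| < 1 .
```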
Using the Fourier transform of the matrix B, it can be shown that the moment generating function of B is
(27) 
Therefore, we obtain the quantity required in Step 1. The remaining steps follow straightforwardly.
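As an illustrative check (a minimal sketch assuming NumPy; `ar1_matrix` is a hypothetical helper, not from the paper), one can simulate AR(1) residuals and inspect the empirical spectrum that the above derivation models:

```python
import numpy as np

def ar1_matrix(N, T, a, sigma=1.0, seed=None):
    """Simulate N independent AR(1) time series of length T:
    x_t = a * x_{t-1} + eps_t, eps_t ~ N(0, sigma^2)."""
    rng = np.random.default_rng(seed)
    X = np.zeros((N, T))
    # stationary initial condition with variance sigma^2 / (1 - a^2)
    X[:, 0] = rng.normal(scale=sigma / np.sqrt(1 - a**2), size=N)
    for t in range(1, T):
        X[:, t] = a * X[:, t - 1] + rng.normal(scale=sigma, size=N)
    return X

# empirical spectrum of the sample covariance of AR(1) residuals
N, T, a = 118, 1000, 0.3
X = ar1_matrix(N, T, a, seed=0)
lam = np.linalg.eigvalsh(X @ X.T / T)
# with autocorrelation, the bulk is wider than the white-noise
# (Marchenko-Pastur) bulk of the same variance
print(lam.min(), lam.max())
```

A histogram of `lam` can be compared against the analytic density obtained from Steps 1 through 3.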
Appendix B Kullback-Leibler Divergence
The KullbackLeibler divergence is defined as follows:
(28) 
where the two arguments are probability densities. To deal with zero elements of the second density, we use:
(29) 
where a sufficiently small positive number replaces each zero element, and the number of zero elements is accounted for so that the density still sums to one. The first probability density is treated in the same way.
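A minimal sketch of this smoothing and of the resulting Jensen-Shannon divergence used in Section III-C, assuming NumPy (function names and the default value of the small constant are illustrative):

```python
import numpy as np

def smooth_density(q, eps=1e-12):
    """Replace zero bins of a discrete density with a small eps and
    shrink the nonzero bins so the result still sums to one."""
    q = np.asarray(q, dtype=float)
    zeros = q == 0
    out = q * (1.0 - zeros.sum() * eps)  # rescale nonzero mass
    out[zeros] = eps                     # fill zero bins
    return out

def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) after smoothing both
    densities, so every term is well defined."""
    p, q = smooth_density(p), smooth_density(q)
    return float(np.sum(p * np.log(p / q)))

def js_divergence(p, q):
    """Symmetrized Jensen-Shannon divergence of Section III-C."""
    m = 0.5 * (p + q)
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(js_divergence(p, q))  # symmetric and finite despite the zeros
```

The smoothing keeps every logarithm finite even when one spectrum has a spike where the other has none, which is exactly the situation the distance measure is designed to penalize.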
References
 [1] X. Xu, X. He, Q. Ai, and R. C. Qiu, “A correlation analysis method for power systems based on random matrix theory,” IEEE Transactions on Smart Grid, vol. PP, no. 99, pp. 1–10, 2015.
 [2] X. Miao and D. Zhang, “The opportunity and challenge of big data’s application in distribution grids,” in China International Conference on Electricity Distribution, 2014, pp. 962–964.
 [3] J. Zhang and M. L. Huang, “5Ws model for big data analysis and visualization,” in IEEE International Conference on Computational Science and Engineering, 2013, pp. 1021–1028.
 [4] M. Mayilvaganan and M. Sabitha, “A cloud-based architecture for big-data analytics in smart grid: A proposal,” 2013.
 [5] X. He, Q. Ai, R. C. Qiu, W. Huang, L. Piao, and H. Liu, “A big data architecture design for smart grids based on random matrix theory,” IEEE Transactions on Smart Grid, vol. 8, no. 2, pp. 674–686, 2017.

 [6] X. He, R. C. Qiu, Q. Ai, and X. Xu, “An unsupervised learning method for early event detection in smart grid with big data,” Computer Science, 2015.
 [7] L. Chu, R. C. Qiu, X. He, Z. Ling, and Y. Liu, “Massive streaming PMU data modeling and analytics in smart grid state evaluation based on multiple high-dimensional covariance tests,” IEEE Transactions on Big Data, vol. PP, no. 99, pp. 1–1, 2016.
 [8] M. J. Smith and K. Wedeward, “Event detection and location in electric power systems using constrained optimization,” in IEEE Power and Energy Society General Meeting, 2009, pp. 1–6.
 [9] S. Soltan, D. Mazauric, and G. Zussman, “Cascading failures in power grids: Analysis and algorithms,” in International Conference on Future Energy Systems, 2014, pp. 195–206.
 [10] M. He and J. Zhang, “A dependency graph approach for fault detection and localization towards secure smart grid,” IEEE Transactions on Smart Grid, vol. 2, no. 2, pp. 342–351, 2011.
 [11] L. Xie, Y. Chen, and P. R. Kumar, “Dimensionality reduction of synchrophasor data for early event detection: Linearized analysis,” IEEE Transactions on Power Systems, vol. 29, no. 6, pp. 2784–2794, 2014.
 [12] M. Rafferty, X. Liu, D. M. Laverty, and S. Mcloone, “Real-time multiple event detection and classification using moving window PCA,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2537–2548, 2016.

 [13] Y. Wang, M. Liu, and Z. Bao, “Deep learning neural network for power system fault diagnosis,” in Control Conference, 2016, pp. 6678–6683.
 [14] S. Naderian and A. Salemnia, “An implementation of type-2 fuzzy kernel based support vector machine algorithm for power quality events classification,” International Transactions on Electrical Energy Systems, vol. 27, no. 5, 2016.
 [15] M. Forni, A. Giovannelli, M. Lippi, and S. Soccorsi, “Dynamic factor model with infinite dimensional factor space: Forecasting,” Center for Economic Research, 2016.
 [16] J. Yeo and G. Papanicolaou, “Random matrix approach to estimation of high-dimensional factor models,” arXiv preprint, 2016.
 [17] Z. Sun, X. Liu, and L. Wang, “A hybrid segmentation method for multivariate time series based on the dynamic factor model,” Stochastic Environmental Research and Risk Assessment, pp. 1–14, 2016.
 [18] E. P. Wigner, “On a class of analytic functions from the quantum theory of collisions,” Annals of Mathematics, vol. 53, no. 1, pp. 36–67, 1951.
 [19] V. Plerou, P. Gopikrishnan, B. Rosenow, L. A. N. Amaral, and H. E. Stanley, “Universal and nonuniversal properties of cross-correlations in financial time series,” Physical Review Letters, vol. 83, no. 7, pp. 1471–1474, 1999.
 [20] L. Laloux, P. Cizeau, M. Potters, and J.-P. Bouchaud, “Random matrix theory and financial correlations,” International Journal of Theoretical and Applied Finance, vol. 3, no. 3, pp. 391–397, 2000.
 [21] S. C. Chevalier and P. D. H. Hines, “Identifying systemwide early warning signs of instability in stochastic power systems,” in Power and Energy Society General Meeting, 2016, pp. 1–5.
 [22] C. Xu, J. Liang, Z. Yun, and L. Zhang, “The small-disturbance voltage stability analysis through adaptive AR model based on PMU,” in Transmission and Distribution Conference and Exhibition: Asia and Pacific, 2005 IEEE/PES, 2005, pp. 1–5.
 [23] L. Zhang, “Spectral analysis of large dimensional random matrices,” Ph.D. dissertation, 2007.
 [24] R. Qiu and M. Wicks, Cognitive Networked Sensing and Big Data. Springer Publishing Company, Incorporated, 2013.

 [25] D. L. Donoho, “High-dimensional data analysis: The curses and blessings of dimensionality,” 2000.
 [26] Z. Burda, A. Jarosz, M. A. Nowak, and M. Snarska, “A random matrix approach to VARMA processes,” New Journal of Physics, vol. 12, 2010 (arXiv:1002.0934).
 [27] R. D. Zimmerman and C. E. MurilloSánchez, “Matpower 4.1 user’s manual,” Power Systems Engineering Research Center, 2011.
 [28] Z. Burda, J. Jurkiewicz, and B. Waclaw, “Spectral moments of correlated Wishart matrices,” Physical Review E, vol. 71, no. 2, p. 026111, 2005.