I Introduction
The distribution network is an important part of the power system, and its reliability is directly related to the safety of the entire system. One main factor that influences reliability is anomalies caused by overload, unbalanced three-phase voltages or currents, system swing, etc. Detecting and locating anomalies at an early stage is important for safety analysis and for decision making in self-healing control strategies. With numerous branch lines and a complicated network structure, anomaly detection and location in the distribution network is difficult. Moreover, the anomalies may present intermittent, asymmetric, and sporadic spikes, which are random in magnitude and could involve sporadic bursts as well, and exhibit complex, nonlinear, and dynamic characteristics [1]. In this setting, model-based methods are questionable because they usually rely on certain assumptions and simplifications.
Over the years, online monitoring systems have been widely deployed in distribution networks. The massive data collected through them contain rich information on the operating status of the distribution network, which has stimulated research on data-driven methods. In [2], a real-time anomaly detection and abnormal line identification functionality for active distribution networks is proposed. It merges the anomaly detection and location functionalities by using phasor-measurement-unit (PMU)-based state estimation. In [3], the dimensionality of the PMU data is analyzed and an online application for early event detection using the reduced dimensionality is proposed. In [4], a density-based detection algorithm is proposed to detect local outliers. It can differentiate high-quality synchrophasor data from low-quality data during system physical disturbances. In [5], using massive streaming PMU data, a real-time power state detection algorithm based on the multiple high-dimensional covariance matrix test is developed. The feasibility of the method is validated, and it can effectively reveal the attributes of a system event.
Random matrix theory (RMT), which originated in the early 20th century, is an important mathematical tool for the statistical analysis of high-dimensional data. By analyzing the statistical characteristics of the eigenvalues of target data matrices, it can reveal the regular or abnormal behavior contained in the data, thus helping to explore complex interconnected systems from a macroscopic perspective. So far, the RMT has been widely used in wireless communication [6], finance [7], quantum information [8][9], etc. However, its application in power systems has begun only in recent years. In [10], an architecture applying the RMT to smart grids is proposed. In [11], based on the RMT, a data-driven method to reveal the correlations between factors and the power system status is proposed. In [12], [13], the RMT is used for power system transient analysis and steady-state analysis, respectively.
In this paper, based on the RMT, we propose a fully data-driven approach to anomaly detection and location in the distribution network. It merges the anomaly detection and location functionalities by using online monitoring data. For each feeder in the distribution network, a spatio-temporal data matrix built from multiple-time-instant measurements of the monitoring devices serves as the data source. The data collected from each monitoring device can be three-phase voltages, three-phase currents, active power, reactive power, etc. We then conduct real-time analysis by using a moving split-window method and the RMT, and compare the findings with the theoretical predictions (i.e., the Marchenko-Pastur Law and the Ring Law). During this process, the linear eigenvalue statistic (LES), a high-dimensional statistic of the spatio-temporal data matrix, is used to indicate the data behavior. For the low-dimensional data matrices corresponding to some feeders, a method of increasing the data dimension by tensor product is developed so that they can be analyzed with the RMT. For a feeder judged abnormal, analyses of the latencies introduced by different branch lines are provided. The main features of the proposed approach can be summarized as follows: 1) It is a fully data-driven approach that does not require much prior knowledge of the complex topology or parameter information of the distribution network. 2) Based on the RMT, the developed approach merges the anomaly detection and location functionalities by using online monitoring data.
3) It is theoretically justified that the developed approach is robust against random fluctuations and measurement errors, which can reduce the potential false alarm probability. 4) Both online and offline analysis can be handled by the developed approach. 5) The developed approach is capable of detecting anomalies and locating them in real time, which can help safety analysis and decision making in self-healing control.
The rest of this paper is organized as follows. Section II presents the RMT for spectrum analysis of random matrices under both normal and abnormal states, defines the LES, and gives the detailed steps of RMT-based anomaly detection and location. In Section III, spatio-temporal matrices are formulated from online monitoring data in the distribution network, and the specific procedures of the anomaly detection and location algorithm are designed. For the low-dimensional data matrices formed from feeders with only a few distribution transformers, a method of increasing the data dimension is developed so that they can be analyzed with the RMT. In Section IV, both synthetic data from IEEE standard bus systems and real-world online monitoring data from a grid are used to validate the effectiveness of the developed approach. Conclusions and possible further research directions are presented in Section V.
II Random Matrix Theory for Anomaly Detection
The RMT was first introduced in mathematical statistics. For large random matrices, the importance of the RMT for statistics comes from the fact that it may be used to correct traditional tests or estimators which fail in the ‘large $n$, large $p$’ setting, where $p$ is the number of parameters (dimensions) and $n$ is the sample size. In practice, massive data can be naturally represented by large random matrices [14]. With the rise of large-dimensional data sets in various fields, the RMT has become an important mathematical tool for studying the statistical properties of functions of such matrices. In this section, the statistical properties of both normal and abnormal random matrices are first analyzed. On this basis, an anomaly detection and location approach based on the RMT is described.
II-A Statistical Properties of Normal and Abnormal Random Matrices
Let $\mathbf{X}$ be an $N \times T$ random matrix. According to the RMT, when $N, T \to \infty$ but $c = N/T \in (0, 1]$, the empirical spectral distribution (ESD) of its corresponding covariance matrix, or of the product of its singular value equivalent matrices, converges to certain theoretical limits, such as the Marchenko-Pastur Law (MP-law) and the Ring Law (Ring-law). In this subsection, using the MP-law and the Ring-law, we discuss the ESD and compare it with the theoretical values under both normal and abnormal states.
1) Marchenko-Pastur Law Case: Let $\mathbf{X} = \{x_{ij}\}$ be an $N \times T$ random matrix whose entries are independent identically distributed (i.i.d.) variables with mean $\mu(x) = 0$ and variance $\sigma^2(x) < \infty$. The corresponding covariance matrix is defined as $\mathbf{\Sigma} = \frac{1}{T}\mathbf{X}\mathbf{X}^H$. As $N, T \to \infty$ but $c = N/T \in (0, 1]$, according to the MP-law [15], the ESD of $\mathbf{\Sigma}$ converges to the limit with probability density function (PDF)
$$f_{MP}(\lambda) = \begin{cases} \dfrac{1}{2\pi c \sigma^2 \lambda}\sqrt{(b-\lambda)(\lambda-a)}, & a \le \lambda \le b \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where $a = \sigma^2(1-\sqrt{c})^2$, $b = \sigma^2(1+\sqrt{c})^2$.
In the normal state, according to the MP-law, the ESD of the Wishart matrix $\mathbf{\Sigma}$ converges to the spectral density $f_{MP}(\lambda)$, which is shown in Figure 1a. The blue bars indicate the eigenvalue distribution of $\mathbf{\Sigma}$ and the MP-law is plotted as the red curve. What happens in the abnormal state? Here, “abnormal” usually means that signals occur in the random matrix and the correlations among its rows have changed. Figure 1b shows the result: the empirical spectral density of the Wishart matrix can no longer be fitted by the MP-law, and the spikes caused by the outliers lie outside the support $[a, b]$.
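The MP-law comparison can be reproduced numerically. The following is a minimal sketch rather than the setup used for Figure 1: the matrix size, the random seed, and the way the "abnormal" signal is injected (a common component added to five rows) are all assumptions chosen for illustration.

```python
import numpy as np

def mp_pdf(lam, c, sigma2=1.0):
    """Marchenko-Pastur density for a ratio c = N/T in (0, 1]."""
    a = sigma2 * (1 - np.sqrt(c)) ** 2
    b = sigma2 * (1 + np.sqrt(c)) ** 2
    lam = np.asarray(lam, dtype=float)
    pdf = np.zeros_like(lam)
    inside = (lam > a) & (lam < b)
    pdf[inside] = np.sqrt((b - lam[inside]) * (lam[inside] - a)) / (
        2 * np.pi * c * sigma2 * lam[inside]
    )
    return pdf

def covariance_eigenvalues(X):
    """Eigenvalues of the sample covariance matrix (1/T) X X^T."""
    T = X.shape[1]
    return np.linalg.eigvalsh(X @ X.T / T)

rng = np.random.default_rng(0)
N, T = 100, 400                      # assumed sizes, c = 0.25
c = N / T
b = (1 + np.sqrt(c)) ** 2            # upper edge of the MP support

normal = rng.standard_normal((N, T))
abnormal = normal.copy()
# Assumed anomaly: one common signal added to the first five rows,
# which makes those rows strongly correlated.
abnormal[:5] += 2.0 * rng.standard_normal(T)

eig_normal = covariance_eigenvalues(normal)
eig_abnormal = covariance_eigenvalues(abnormal)
print("largest eigenvalue, normal  :", eig_normal.max(), "(MP edge b =", b, ")")
print("largest eigenvalue, abnormal:", eig_abnormal.max())
```

In the normal case the largest eigenvalue stays close to the edge $b$; in the abnormal case at least one eigenvalue moves far beyond it, which is the outlier behavior described above.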
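2) Ring Law Case: Let $\mathbf{X} = \{x_{ij}\}$ be an $N \times T$ non-Hermitian random matrix with i.i.d. entries, with mean $\mu(x) = 0$ and variance $\sigma^2(x) = 1$. The product of $L$ non-Hermitian random matrices is defined as
$$\mathbf{Z} = \prod_{i=1}^{L} \mathbf{X}_{u,i} \tag{2}$$
where $\mathbf{X}_u \in \mathbb{C}^{N \times N}$ is the singular value equivalent [16] of $\mathbf{X}$. The product $\mathbf{Z}$ can be standardized into $\tilde{\mathbf{Z}}$ (i.e., $\sigma^2(\tilde{z}_i) = 1/N$ for each row $\tilde{z}_i$). As $N, T \to \infty$ but $c = N/T \in (0, 1]$, according to the Ring-law [17][18], the ESD of $\tilde{\mathbf{Z}}$ converges to the limit with PDF
$$f_{Ring}(\lambda) = \begin{cases} \dfrac{1}{\pi c L}\,|\lambda|^{(2/L - 2)}, & (1-c)^{L/2} \le |\lambda| \le 1 \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
In the normal state, the ESD of $\tilde{\mathbf{Z}}$ converges to the spectral density $f_{Ring}(\lambda)$, which is shown in Figure 2a. In the complex plane, the blue dots represent the eigenvalues of $\tilde{\mathbf{Z}}$, the radius of the inner red circle is $(1-c)^{L/2}$, and the radius of the outer red circle is unity. In the abnormal state, the ESD of $\tilde{\mathbf{Z}}$ and the Ring-law are plotted in Figure 2b.
Some blue dots are scattered inside the inner circle, so the ESD of $\tilde{\mathbf{Z}}$ does not converge to the Ring-law.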
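A small numerical check of the Ring-law construction is sketched below. It is an illustration under stated assumptions, not the procedure behind Figure 2: the singular value equivalent is taken here as $\sqrt{\mathbf{X}\mathbf{X}^H}\,\mathbf{U}$ with a Haar-distributed unitary $\mathbf{U}$, and the sizes, seed, and the choice $L = 1$ are arbitrary.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-distributed unitary matrix via QR of a complex Gaussian matrix."""
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(g)
    d = np.diag(r)
    return q * (d / np.abs(d))       # fix the column phases

def singular_value_equivalent(X, rng):
    """Assumed form X_u = sqrt(X X^H) U: an N x N matrix with the singular values of X."""
    w, v = np.linalg.eigh(X @ X.conj().T)
    root = (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T
    return root @ haar_unitary(X.shape[0], rng)

def standardized_product(mats, rng):
    """Product of singular value equivalents, rows rescaled to variance 1/N."""
    N = mats[0].shape[0]
    Z = np.eye(N, dtype=complex)
    for X in mats:
        Z = Z @ singular_value_equivalent(X, rng)
    row_std = Z.std(axis=1, keepdims=True)
    return Z / (np.sqrt(N) * row_std)

rng = np.random.default_rng(1)
N, T, L = 100, 200, 1                # assumed sizes, c = 0.5
c = N / T
Z = standardized_product([rng.standard_normal((N, T)) for _ in range(L)], rng)

radii = np.abs(np.linalg.eigvals(Z))
inner = (1 - c) ** (L / 2)           # inner radius predicted by the Ring-law
msr = radii.mean()                   # mean spectral radius
print("inner radius:", inner, " mean spectral radius:", msr)
```

For white-noise input most eigenvalue radii should fall between the inner radius and one; correlated rows would pull some eigenvalues toward the origin.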
3) Linear Eigenvalue Statistics: From the discussions in 1) and 2), it can be concluded that the eigenvalue distribution indicates the data behavior of a high-dimensional random matrix, which motivates a statistic defined on the eigenvalues. The LES is such a high-dimensional statistic of the eigenvalues, defined as [19]
$$\mathcal{N}_N(\varphi) = \sum_{i=1}^{N} \varphi(\lambda_i) \tag{4}$$
where $\lambda_i$ $(i = 1, \dots, N)$ are the eigenvalues and $\varphi(\cdot)$ is a test function, which applies a linear or nonlinear mapping to the eigenvalues. The commonly used nonlinear test functions include

Chebyshev Polynomial (CP): $\varphi(\lambda) = \sum_{n=0}^{k} a_n \lambda^n$, where $a_n$ are real numbers;

Information Entropy (IE): $\varphi(\lambda) = -\lambda \ln \lambda$;

Likelihood Ratio Function (LRF): $\varphi(\lambda) = \lambda - \ln \lambda - 1$;

Wasserstein Distance (WD): $\varphi(\lambda) = \lambda - 2\sqrt{\lambda} + 1$.
In the complex plane of eigenvalues, the mean spectral radius (MSR), which can be regarded as a special form of LES, is used to indicate the eigenvalue distribution. It is the mean distribution radius of the eigenvalues, defined as
$$\kappa_{MSR} = \frac{1}{N}\sum_{i=1}^{N} |\lambda_i| \tag{5}$$
where $|\lambda_i|$ is the radius of the eigenvalue $\lambda_i$ on the complex plane.
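The LES and the MSR are straightforward to compute once the eigenvalues are available. The sketch below is illustrative only: the closed forms used for IE, LRF, and WD are the ones commonly found in the RMT literature, and the Chebyshev polynomial is fixed to $T_2(\lambda) = 2\lambda^2 - 1$, which is an assumption since the text does not fix the coefficients. The names `TEST_FUNCTIONS`, `les`, and `msr` are hypothetical.

```python
import numpy as np

# Test functions phi(lambda); eigenvalues are assumed strictly positive
# for the logarithm and square root to be defined.
TEST_FUNCTIONS = {
    "CP": lambda lam: 2.0 * lam ** 2 - 1.0,            # Chebyshev T2 (assumed choice)
    "IE": lambda lam: -lam * np.log(lam),              # information entropy
    "LRF": lambda lam: lam - np.log(lam) - 1.0,        # likelihood ratio function
    "WD": lambda lam: lam - 2.0 * np.sqrt(lam) + 1.0,  # Wasserstein distance
}

def les(eigenvalues, name="LRF"):
    """Linear eigenvalue statistic: sum of phi over all eigenvalues."""
    lam = np.asarray(eigenvalues, dtype=float)
    return float(np.sum(TEST_FUNCTIONS[name](lam)))

def msr(eigenvalues):
    """Mean spectral radius: mean modulus of (possibly complex) eigenvalues."""
    return float(np.mean(np.abs(eigenvalues)))

unit = np.ones(4)                     # eigenvalues of an identity covariance
print({name: les(unit, name) for name in TEST_FUNCTIONS})
print(msr(np.array([3 + 4j, 1.0])))   # (5 + 1) / 2 = 3.0
```

LRF and WD vanish when all eigenvalues equal one, so both measure how far the spectrum departs from that of an identity covariance matrix.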
II-B RMT for Anomaly Detection and Location
In practice, we assume there are dimensional observations . At the sampling time , measured data of the
dimensional observations can be formed as a column vector
. For a series of time , by arranging these vectors in chronological order, we can obtain the data source for further analysis.By using a splitwindow on , we can obtain a raw data matrix . For each matrix , we can convert it into the standard form matrix by
(6) 
where , , and . The standardization process in practice i.e. in , can be performed by using function.
For the standard matrix , the corresponding covariance matrix is computed by
(7) 
Then we can compute the eigenvalues
and the eigenvectors
, which in , can be performed by using function. In order to characterize the distribution of the eigenvalues, the LES is defined and computed by(8) 
where are the eigenvalues of , and is the selected test function.
For the computed eigenvalues and eigenvectors of the covariance matrix , according to their definitions in matrix theory, we can obtain
(9) 
The derivation of Equation (9) regarding the elements is
(10) 
Since is real and symmetric, and there exist . Left multiply for equation (10), we can obtain
(11) 
where gets the value of 1 only for the element in and 0 for others. So equation (11) can be simplified as
(12) 
Then the contribution of the th row’s elements to can be computed by
(13) 
From equation (13), we can conclude that the th row’s contribution to the eigenvalue can be measured by the th element of the corresponding eigenvector . From the MPlaw case in IIA, we know there exist outliers (i.e., ) when the system is in abnormal condition. This inspires us to realize abnormal location by studying the corresponding eigenvectors of the outliers. Inspired by the work in [20], here, we design an abnormal location indicator as
(14) 
where . The indicator measures the scale of the th row’s induction to the abnormity. Thus, we can obtain abnormal location indexes through
(15) 
where represents the fault index set, and is the threshold calculated through
(16) 
In the equation, and
represent the mean and the standard deviation of
, and is a coefficient related to the confidence level. Assuming the confidence level is, then the confidence interval of the mean is shown as
(17) 
In practice, we can calculate the value of once the confidence level is set. For example, let , then the calculated is 1.96.
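The location step can be sketched as follows. This is a hedged illustration, not the exact indicator of equations (14)-(16): it assumes the indicator of row $i$ is the sum, over the outlier eigenvalues, of the eigenvalue times the squared $i$-th eigenvector element, that outliers are eigenvalues above the MP upper edge, and that the threshold is the mean plus $1.96$ standard deviations of the indicator. The function name `locate` and the test data are hypothetical.

```python
import numpy as np

def locate(X, k=1.96):
    """Return row indices flagged as abnormal and the per-row indicator.

    Assumed indicator: eta_i = sum over outliers j of lambda_j * v_ij^2.
    Assumed threshold: mean(eta) + k * std(eta).
    """
    N, T = X.shape
    # Row-wise standardization to zero mean and unit variance.
    Xs = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    lam, vec = np.linalg.eigh(Xs @ Xs.T / T)
    b = (1 + np.sqrt(N / T)) ** 2            # MP upper edge for unit variance
    outliers = lam > b
    eta = (vec[:, outliers] ** 2) @ lam[outliers]
    threshold = eta.mean() + k * eta.std()
    return np.flatnonzero(eta > threshold), eta

rng = np.random.default_rng(2)
N, T = 60, 300                               # assumed window size
X = rng.standard_normal((N, T))
X[[9, 10, 11], T // 2:] += 5.0               # assumed step signal on rows 9-11
idx, eta = locate(X)
print("flagged rows:", idx)
```

Because the step covers only half of the window, the three affected rows become strongly correlated, produce an outlier eigenvalue, and dominate the corresponding eigenvector.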
In realtime analysis, we can continuously obtain the raw data matrices by using a slidingwindow method, i.e. moving the splitwindow at continuous sampling times, and the last sampling time is the current time. For example, at the sampling time , the obtained raw data matrix is formed by
(18) 
where for is the sampling data at time . The fundamental abnormal detection and location procedures based on the RMT is given as the steps below.
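A minimal sliding-window sketch of the detection steps is given below. It is an illustration under assumptions: the LRF is used as the test function, each window is standardized row by row, and the data set (white noise with a step added to five rows at sample 200) is synthetic and hypothetical.

```python
import numpy as np

def window_les(X):
    """LES (with the LRF test function) of one split-window matrix X (N x T)."""
    T = X.shape[1]
    Xs = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    lam = np.linalg.eigvalsh(Xs @ Xs.T / T)
    return float(np.sum(lam - np.log(lam) - 1.0))

def sliding_les(D, width):
    """LES curve: entry k uses the window of columns k .. k + width - 1."""
    total = D.shape[1]
    return np.array([window_les(D[:, t - width:t]) for t in range(width, total + 1)])

rng = np.random.default_rng(3)
N, total, width = 20, 400, 100               # assumed sizes
D = rng.standard_normal((N, total))
D[:5, 200:] += 8.0                           # assumed step signal from sample 200 on

curve = sliding_les(D, width)
print("normal level   :", curve[:100].mean())
print("mid-transition :", curve[150])        # window holds half of the step
print("after the step :", curve[250:].mean())
```

The curve departs from its normal level as soon as the window starts to cover the step and returns to it once the whole window lies after the step, so the duration of the excursion equals the window width.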
III Anomaly Detection and Location Using Online Monitoring Data in Distribution Network
In this section, using online monitoring data in the distribution network, we develop a correlation analysis approach to detect anomalies and locate them at an early stage. Anomalies may, in general, last for a period of time, but if they are not detected and handled in time, they are more likely to expand and even cause power failures. First, for each feeder, a high-dimensional data matrix is formulated from the online monitoring data of all distribution transformers on it. The data matrix contains rich information on the operating states of the feeder, and the correlations among its rows change once an abnormal signal occurs. Then our anomaly detection and location approach based on correlation change analysis is described and the specific procedures are given. Finally, we analyze the advantages of the approach.
III-A Formulation of Online Monitoring Data as Spatio-Temporal Matrices
Figure 3 illustrates circuitry topology of the distribution network. The feeder consists of different levels of branch lines and substations with distribution transformers. On the low voltage side of each distribution transformer, one online monitoring device is installed, through which we can obtain many types of measurement variables such as threephase voltages (i.e.,), threephase currents (i.e.,), active load (), etc. The operating state of the feeder can be reflected through those condition monitoring data. Here, we choose those 7 measurement variables at the sampling time as the elements to form a data vector , where denotes the number of distribution transformers on the feeder. Assume , for a series of time , we can obtain the data set . Since the measurements in have different units and magnitudes, we normalize into by
(19) 
where denotes the normalization value and for and , and mean the minimum and maximum value in the th row vector .
Then we add white noise into
to reduce the correlations among its rows. Letis a white noise matrix with standard normally distributed entries, the final data matrix with white noise is formulated by
(20) 
where is the formulated data set and denotes the introduced white noise magnitude for . The signaltonoise ratio (SNR) for the finally matrix is defined as
(21) 
where denotes the trace function of the matrix. In practice, the value of deserves carefully selected, which will affect the sensibility to the abnormal signal. Once the value of is set, can be calculated through
(22) 
Thus, for each feeder in a series of time , the finally spatiotemporal data set is formulated.
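The preprocessing above can be sketched in a few lines. The following is an illustration under assumptions: rows are normalized by min-max scaling to $[0, 1]$, the SNR is taken as the ratio of $\mathrm{Tr}(\tilde{\mathbf{D}}\tilde{\mathbf{D}}^T)$ to $m^2\,\mathrm{Tr}(\mathbf{E}\mathbf{E}^T)$, and the noise magnitude $m$ is solved from that ratio. The function names, the input data, and the target SNR of 100 are hypothetical.

```python
import numpy as np

def min_max_normalize(D):
    """Scale every row of D to [0, 1]; constant rows are mapped to 0."""
    lo = D.min(axis=1, keepdims=True)
    hi = D.max(axis=1, keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)    # avoid division by zero
    return (D - lo) / span

def add_white_noise(D_norm, snr, rng):
    """Add standard normal noise scaled so that the assumed SNR definition holds."""
    E = rng.standard_normal(D_norm.shape)
    m = np.sqrt(np.trace(D_norm @ D_norm.T) / (snr * np.trace(E @ E.T)))
    return D_norm + m * E, m, E

rng = np.random.default_rng(4)
# Hypothetical raw measurements with very different magnitudes per row.
raw = rng.uniform(0.0, 1.0, size=(14, 500)) * np.logspace(0, 3, 14)[:, None]
D_norm = min_max_normalize(raw)
D_noisy, m, E = add_white_noise(D_norm, snr=100.0, rng=rng)

achieved = np.trace(D_norm @ D_norm.T) / (m ** 2 * np.trace(E @ E.T))
print("noise magnitude m:", m, " achieved SNR:", achieved)
```

A larger SNR keeps the matrix closer to the measurements, while a smaller one makes the normal-state spectrum closer to the white-noise limits, which is why the value affects the sensitivity to abnormal signals.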
III-B Anomaly Detection and Location in Distribution Network
Based on the RMT and the formulated spatiotemporal data set, an incipient anomaly detection and location algorithm in distribution network is designed. The specific procedures are characterized as follows.
Procedures 1-3 are conducted for data preparation and preprocessing. For each sampling time, using a split-window, the ESD of the sample covariance matrix is computed and compared with its theoretical values in procedure 4, where the MP-law and the Ring-law are used as auxiliary analysis tools. Meanwhile, procedure 4e calculates the linear eigenvalue statistic and procedure 4f calculates the location indicator. Procedure 5 draws the LES curve over time in order to detect the anomalies. Procedure 6 plots the 3D figure of the anomaly location indicator, which further locates the abnormal indexes.
The developed anomaly detection and location approach is driven by online monitoring data in the distribution network and is based on statistical theory. It reveals the correlation changes of the input data at the incipient anomaly moment. The procedures above involve no mechanism models, thus avoiding the errors introduced by assumptions and simplifications. The approach merges the anomaly detection and location functionalities, which makes it faster to detect and locate faults. In addition, the method is theoretically robust against random fluctuations and measurement errors, which helps to improve anomaly detection accuracy and reduce the potential false alarm probability. Moreover, with the sliding-window method, the developed approach is practical for real-time anomaly detection and location.
III-C More Discussions about the Proposed Approach
Note that the MP-law in the RMT holds for infinite or high data dimensions. However, in the application of anomaly detection and location in the distribution network, there exist some feeders with only a few distribution transformers. The dimensions of the data matrices formulated for those feeders are often moderate, such as tens or fewer. In [21], a natural way of increasing the dimensions of vectors is introduced. On this basis, we develop an approach to increase the dimensions of data matrices. This approach allows low-dimensional data to be analyzed as high-dimensional data matrices and yields smaller variance for the related functionals.
Assume be a random matrix with i.i.d. entries, . For , we construct a new random vector by using the tensor product of the column vectors of in the form
(23) 
where are i.i.d. copies of a normalized isotropic random vector and ‘’ denotes tensor product operation. The new random vector lies in the dimensional normed space. Through the tensor product in equation (23), we obtain the dimension increased random matrix .
We consider random matrices of the form[22]
(24) 
where are real numbers. The asymptotic behavior of is well studied in [23] . For every fixed , as but , the ESD of converges to a nonrandom measure.
The ESD of the original covariance matrix and that of the tensor product version, together with their comparisons with the theoretical MP-law, are depicted in Figure 4a and Figure 4b, respectively. It can be observed that the ESD of the original covariance matrix does not fit the theoretical MP-law because of its low dimension. In contrast, the ESD of the tensor product version of the covariance matrix converges almost surely to the theoretical limit. The developed method of increasing the data dimension thus makes it possible to analyze the low-dimensional data matrices corresponding to feeders with only a few distribution transformers.
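One way to realize the dimension increase is sketched below. It is an assumption inferred from the case studies (30 variables split into two parts of 15, giving $15 \times 15 = 225$ rows), not a statement of the exact construction in equation (23): each column of the enlarged matrix is the Kronecker product of the former and latter halves of the original column. The helper `original_indices`, which maps a row of the enlarged matrix back to one row in each half, is likewise hypothetical.

```python
import numpy as np

def increase_dimension(X):
    """Column-wise Kronecker product of the two row-halves of X.

    A (2n x T) matrix becomes an (n*n x T) matrix whose row i*n + j
    is the element-wise product of row i (former half) and row n + j.
    """
    n = X.shape[0] // 2
    A, B = X[:n], X[n:2 * n]
    return np.einsum("it,jt->ijt", A, B).reshape(n * n, X.shape[1])

def original_indices(tensor_index, n):
    """Map a row of the enlarged matrix to (former-part row, latter-part row)."""
    return tensor_index // n, tensor_index % n + n

X = np.arange(30 * 4, dtype=float).reshape(30, 4) + 1.0   # toy 30 x 4 matrix
Y = increase_dimension(X)
print(Y.shape)                                            # (225, 4)
print(original_indices(9 * 15 + 3, 15))                   # (9, 18)
```

With this layout, integer division of an abnormal row index by 15 recovers the row in the former part, and the remainder plus 15 recovers the row in the latter part.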
IV Case Studies
In this section, the approach developed in this paper is tested with both synthetic data from standard IEEE bus systems and real-world online monitoring data from a power grid. Five cases in different scenarios are designed to validate the effectiveness of the approach. In all the numerical cases, white noise is introduced to represent fluctuations and system errors.
IV-A Case Study with Synthetic Data
The synthetic data is sampled from the simulation results of the standard IEEE 118-bus and IEEE 30-bus systems [24], with a sampling rate of 50 Hz. In the simulations, a sudden change of the active load at one node is considered as a signal.
1) Case Study on Effectiveness of Different Test Functions: In this case, the synthetic data set contains voltage measurement variables with sampling times, size of the splitwindow is , and is set to be . In order to test the effectiveness of the developed approach with different test functions, an assumed step signal is set for node and others stay unchanged, which is shown in Table I.
For comparing the effectiveness of the proposed method with different test functions, we normalize the calculated results into . The commonly used test functions, illustrated in IIA, are tested and Figure 5 illustrates the comparison results. It is noted that the curve begins at , because the initial splitwindow includes 199 times of historical sampling and the present sampling data. Another needs to be noted is that the index number starts from 0 in Python.
Based on the curve, it can be observed that:
I. During , computed through the proposed method with 4 different test functions remains almost constant, which means the system is under normal status. As is shown in Figure 6a and 6b, the ESD converges almost surely to its theoretical MPlaw and Ringlaw.
II. From to , changes dramatically, which means signal occurs and the system is under abnormal status. Figure 6c and 6d shows that there exist outliers.
III. At , recovers its initial value and remains almost constant afterwards, which means the system returns to the normal state. Figure 6e and 6f shows that the outliers vanish and the ESD fits its MPlaw and Ringlaw again.
From the above analyses, we can conclude that the developed approach with any one of the 4 test functions can detect the anomaly signal effectively and the delay lag of the signal to is equal to the split-window’s width. More importantly, the curve with test function has the highest variance ratio, which indicates that it is more sensitive to abnormal data behavior than the others. Therefore, we choose as the test function in the following cases.
2) Case Study on Effectiveness of Increasing Data Dimension Method: In case 2, we aim to test the effectiveness of the proposed increasing data dimension method. The synthetic data from IEEE30 bus system contains 30 voltage measurement variables, the splitwindow’s size is , the parameter is 1, and other parameters are set the same as in case 1. An assumed step signal is set for bus and others stay unchanged, which is shown in Table II. By using the developed increasing data dimension method, dimensions of the matrices are increased from to . The anomaly detection results are shown in Figure 7.
From the curve, it can be seen that the assumed step signal for bus can be more easily detected when the data dimension is increased from 15 to 225. The detection process is shown as follows:
I. During , the value of remains almost constant, which indicates that the system is in normal state without signals. The ESD does not accurately fit its theoretical MPlaw in Figure 8a1 for the low data dimension. However, in Figure 8a2, by using the developed increasing data dimension method, the ESD converges almost surely to its theoretical limits.
II. From to , the curve is almost shaped and the delay lag is equal to the splitwindow’s width, which indicates that the signal occurs at and remains afterwards. Besides, the curve reaches its local minimum at , when just half of the splitwindow contains the signal. During this period of time, the ESD does not fit their theoretical MPlaw, which are shown in Figure 8b1 and 8b2. The distinction lies in more outliers occur and the value of the largest outlier become bigger when the data dimension is increased from 15 to 225, which makes it more easier to detect the abnormal behavior.
III. At , increases back to its initial value and remains constant afterwards, which means the whole splitwindow contains the signal.
The analyses above are in accord with the assumed step signal in Table II. They indicate that the developed method of increasing the data dimension is feasible for obtaining high-dimensional data matrices that can be analyzed with the RMT, which makes anomaly detection easier and can be theoretically justified through the MP-law.
3) Case Study On Effectiveness of Anomaly Location Function: In case , the developed anomaly location function both for highdimensional and lowdimensional data matrices are tested.
3a) Highdimensional data scenario: The synthetic data set from simulation results of IEEE118 bus system contains 118 active load measurement variables with sampling 1000 times. Parameters about the splitwindow are set the same as in case 1. Three assumed step signals are set for node , which is shown in Table III.
Figure 9 is the 3D plot of the location indicator regarding IEEE 118 buses from . It can be observed that:
I. During and , no signal occurs and are small random real numbers nearly 0.
II. From , corresponding to those buses most affected by signals increase dramatically and reaches their peaks when half of the splitwindow contains those assumed signals, which are in accord with the signal detection processes. Thus, we can calculate the indexes of buses with signals through equation (15), (16) and (17), i.e., 99, 100, 101.
3b) low dimensionaldata scenario: In this scenario, the synthetic data set is the simulation results of IEEE30 bus system and it contains 30 active load measurement variables with sampling 1000 times. Parameters of the splitwindow and others are set the same as in case 2. For each splitwindow, the dimension of data matrix is increased from 30 to 225 by using the developed increasing data dimension method. Here, two cases are tested: three assumed step signals are separately set for the former part (node ) and the latter part (node ) of the matrix, which are shown in Table IV and Table V. The corresponding location results are shown in Figure 10a and 10b.
From Figure 10a, we can see that the values of corresponding to some indexes increase dramatically from and reaches their peaks at . By using equation (15), (16) and (17), we can calculate the index set . Considering in the increasing data dimension process, we can obtain the anomaly index set in the original matrix through
(25) 
i.e., 9,10,11. Similarly, in Figure 10b, we can obtain the fault index set through equation (15), (16) and (17) in the dimension increased matrix, i.e., . combining the data division method in increasing data dimension process, we can locate the anomaly by
(26) 
i.e., 19, 20, 21.
Analyses of the developed anomaly location function under both the high-dimensional and the low-dimensional data scenarios indicate that the approach can effectively identify buses with signals. We further test the location method in the following cases by using real-world online monitoring data from distribution networks.
IV-B Case Study with Real-World Online Monitoring Data
Online monitoring data obtained from a real-world distribution network are used to test the developed anomaly detection and location approach. The data are sampled every 15 minutes over the period from March 1st, 2017 to March 14th, 2017. The power failure time and location for every feeder are recorded, as shown in Figure 11. In the following cases, three-phase voltages, three-phase currents and active load measurements for each distribution transformer are chosen as the components to form the data matrix.
4) Case Study on Highdimensional Feeders: In case 4, the developed approach for highdimensional feeder is tested. The selected main feeder with branch lines and substations contains 17 distribution transformers in total, thus forming a data set. The size of the splitwindow is , and is set to be 100. The realworld online monitoring data with power failure time and location recorded is shown in Figure 11a. It can be obtained that the recorded fault time is 2017/3/8 13:47:00 and the fault induction are the branch lines or substations with ID 1554766408 and ID 1238878713 transformer.
Figure 12a illustrates the detection results for the main feeder. From the curve, we can realize the anomaly detection as follows:
I. During 2017/3/3 00:00:00 and 2017/3/7 22:30:00, the value of changes smoothly, which indicates the feeder runs in normal state.
II. At 2017/3/7 22:30:00, the value of begins to decrease dramatically. It indicates anomaly signals occur and the status of the feeder change. Comparing the recorded power failure time, the anomaly can be detected before that. Meanwhile, from 2017/3/7 22:30:00 to 2017/3/9 23:45:00, the curve is almost shaped and the delay lag is equal to the splitwindow’s width, which is consistent with our simulation result in case 2.
In realtime analysis, faults may happen any time after anomaly signals occur and we may not observe the complete shaped curve. Therefore, the decreasing rate of for a series of time is usually calculated to judge whether incipient anomaly signals occur. Furthermore, by using the developed location method, induction row numbers of data matrices are located, which is shown in Figure 12b.
Figure 12b is the 3D plot of location indicator regarding 119 observations from 2017/3/3 00:00:00 to 2017/3/14 23:59:59. At the incipient anomaly detection moment, i.e., 2017/3/7 22:30:00, we obtain the anomaly indexes through equation (15), (16) and (17), i.e., . As 7 measurement variables for each distribution transformer are chosen to form the data matrix, we can locate the abnormal transformers by getting round numbers of divided by 7, i.e., NO.7 and NO.8 transformer in the original matrix. Transformer IDs that correspond to NO.7 and NO.8 are 1554766408 and 1238878713, which are in accord with the recorded failure transformer IDs. Thus, combining the topology of the main feeder, abnormal branch lines or substations with those 2 transformers can be identified in time.
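The mapping from abnormal row indexes to transformer numbers is a plain integer division. The sketch below assumes that the 7 measurement variables of each transformer occupy consecutive rows and that numbering starts from 0; the function name and the example row indexes are hypothetical, not the values obtained in this case.

```python
def rows_to_transformers(row_indices, variables_per_transformer=7):
    """Map abnormal row indexes to the sorted set of transformer numbers."""
    return sorted({r // variables_per_transformer for r in row_indices})

# Hypothetical abnormal rows: 49-55 belong to transformer 7, 56-62 to transformer 8.
print(rows_to_transformers([49, 52, 56, 60]))   # [7, 8]
```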
5) Case Study on Low Dimensional Feeders: In this case, effectiveness of the developed approach for a lowdimensional feeder is validated. We choose one feeder with only 6 distribution transformers as our analyzing object. The size of the formulated data set is . The original online monitoring data with power failure time and location recorded is shown in Figure 11b. It can be seen that the recorded power failure time are 2017/3/8 13:31:00 and 2017/3/13 03:10:00, and the failure transformer ID is 2513743732.
By using the increasing data dimension method, we increase the data’s dimension from to . Thus, the size of the splitwindow is . Parameter is set to be 100 and is 1. The anomaly detection result is shown in Figure 13a.
Based on the curve, we can detect the anomaly in an incipient phase as follows:
I. During 2017/3/3 00:00:00 and 2017/3/7 22:09:00, the value of changes smoothly, which means no abnormal signals occur and the status of the main feeder is normal.
II. At 2017/3/7 22:09:00, decreases dramatically, which indicates incipient anomaly signals occur and the status of the main feeder is getting worse. By comparing with the recorded power failure time, we can detect the anomaly before that. Besides, from 2017/3/7 22:09:00 to 2017/3/9 22:09:00, the curve is almost shaped and the delay lag is equal to the splitwindow’s width, which is consistent with our simulation results.
III. Similarly, at 2017/3/11 13:46:00, incipient anomaly signals are detected again before the power failure. Note that the delay lag of the shaped curve from 2017/3/11 13:46:00 to 2017/3/13 05:00:00 is not equal to the width of the split-window, which may be caused by the superposition of incipient anomaly signals with some other undesired signals.
Furthermore, we locate the anomaly by using the developed location method, which is shown in Figure 13b. From 2017/3/7 22:09:00, incipient anomaly signals occur and we obtain those abnormal indexes in the dimension increased matrix through equation (15), (16) and (17), i.e., . Then we can obtain anomaly index set in the original matrix through equation (25), i.e., . Considering 7 measurement variables for each distribution transformer, we can locate abnormal transformers by getting round numbers of those anomaly indexes divided by 7, i.e., NO.0 transformer. Similarly, we can locate the anomaly at 2017/3/11 13:46:00. The obtained indexes in the dimension increased matrix are and , and the corresponding anomaly indexes in the original matrix are 1,2 and 7, result of which divided by 7 is also NO.0 transformer. The ID corresponding to NO.0 transformer for the feeder is 2513743732, which is in accord with the recorded failure transformer ID. Thus, we can locate the anomaly branch line or substation with ID 2513743732 transformer.
V Conclusion
This paper develops a data-driven approach to incipient anomaly detection and location in distribution networks. For each feeder, a spatio-temporal data matrix is formulated from online monitoring data. Based on RMT spectrum analysis, the proposed approach performs real-time anomaly detection and location with a sliding-window method: the LES of each split-window is calculated to indicate the data behaviour, and the M-P law and the ring law serve as auxiliary tools for comparing the ESD with its theoretical predictions. Meanwhile, for the data matrices formulated from low-dimensional feeders, we develop a dimension-increasing method so that they can be analyzed with the RMT. For the detected abnormal feeders, the eigenvectors corresponding to the outliers are analyzed to locate the anomalies. The developed approach is purely data-driven and merges the anomaly detection and location functionalities, which makes it suitable for real-time applications. Both synthetic data and real-world data are used to corroborate the effectiveness of the approach, and software based on it has been developed for practical application.
During our work, we noticed that the status of feeders in a distribution network is directly related to information about the outliers, such as their number and their values. Therefore, our future research could focus on estimation methods for outliers and on the exploration of statistics regarding them.
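The sliding-window LES computation summarized above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the implementation used in this paper: the row-wise standardization, the sample covariance form, the Chebyshev test function 2x^2 - 1, the function names, and the synthetic data are all choices made here for the example.

```python
import numpy as np


def les(window, test_fn=lambda x: 2.0 * x**2 - 1.0):
    """Linear eigenvalue statistic of one split-window (rows = variables).

    Assumptions: rows are standardized, the sample covariance matrix is
    Z Z^T / T, and the test function is the Chebyshev polynomial 2x^2 - 1.
    """
    mean = window.mean(axis=1, keepdims=True)
    std = window.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # avoid division by zero for constant rows
    z = (window - mean) / std
    cov = z @ z.T / window.shape[1]
    eigvals = np.linalg.eigvalsh(cov)  # covariance matrix is symmetric
    return float(np.sum(test_fn(eigvals)))


def sliding_les(data, width):
    """Compute the LES for every split-window of `width` columns."""
    n_cols = data.shape[1]
    return np.array(
        [les(data[:, end - width:end]) for end in range(width, n_cols + 1)]
    )


# Synthetic example: 20 variables, 300 samples, with a common disturbance
# injected into the first 5 variables from sample 200 onwards.
rng = np.random.default_rng(0)
data = rng.standard_normal((20, 300))
data[:5, 200:] += 3.0 * rng.standard_normal(100)

curve = sliding_les(data, width=100)
print(curve[0], curve[-1])
```

In this synthetic setting the LES of the last window, which lies entirely inside the disturbed interval, differs clearly from that of the first window. The direction and size of the change depend on the chosen test function and on the nature of the disturbance.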
References
 [1] M. R. Jaafari Mousavi, “Underground distribution cable incipient fault diagnosis system,” Ph.D. dissertation, Texas A&M University, 2007.
 [2] M. Pignati, L. Zanni, P. Romano, R. Cherkaoui, and M. Paolone, “Fault detection and faulted line identification in active distribution networks using synchrophasors-based real-time state estimation,” IEEE Transactions on Power Delivery, vol. 32, no. 1, pp. 381–392, 2017.
 [3] L. Xie, Y. Chen, and P. R. Kumar, “Dimensionality reduction of synchrophasor data for early event detection: Linearized analysis,” IEEE Transactions on Power Systems, vol. 29, no. 6, pp. 2784–2794, 2014.
 [4] M. Wu and L. Xie, “Online detection of low-quality synchrophasor measurements: A data-driven approach,” IEEE Transactions on Power Systems, vol. 32, no. 4, pp. 2817–2827, 2017.
 [5] L. Chu, R. C. Qiu, X. He, Z. Ling, and Y. Liu, “Massive streaming PMU data modeling and analytics in smart grid state evaluation based on multiple high-dimensional covariance tests,” IEEE Transactions on Big Data, vol. PP, no. 99, pp. 1–1, 2016.
 [6] R. C. Qiu, Z. Hu, H. Li, and M. C. Wicks, Cognitive radio communication and networking: Principles and practice. John Wiley & Sons, 2012.
 [7] N. A. S. B. K. Saad, Random Matrix Theory with Applications in Statistics and Finance. University of Ottawa (Canada), 2013.
 [8] K. Chaitanya, “Random matrix theory approach to quantum mechanics,” arXiv preprint arXiv:1501.06665, 2015.
 [9] J. Watrous, “Theory of quantum information,” University of Waterloo Fall, vol. 128, p. 19, 2011.
 [10] X. He, Q. Ai, R. C. Qiu, W. Huang, L. Piao, and H. Liu, “A big data architecture design for smart grids based on random matrix theory,” IEEE Transactions on Smart Grid, vol. 8, no. 2, pp. 674–686, 2017.
 [11] X. Xu, X. He, Q. Ai, and R. C. Qiu, “A correlation analysis method for power systems based on random matrix theory,” IEEE Transactions on Smart Grid, vol. 8, no. 4, pp. 1811–1820, 2017.
 [12] W. Liu, D. Zhang, X. Wang, D. Liu, and X. Wu, “Power system transient stability analysis based on random matrix theory,” Proceedings of the CSEE, vol. 36, no. 18, pp. 4854–4863, 2016.
 [13] X. Wu, D. Zhang, D. Liu, W. Liu, and C. Deng, “A method for power system steady stability situation assessment based on random matrix theory,” Proceedings of the CSEE, vol. 36, no. 20, pp. 5414–5420, 2016.
 [14] R. Qiu and M. Wicks, Cognitive Networked Sensing and Big Data. Springer Publishing Company, Incorporated, 2013.
 [15] V. A. Marčenko and L. A. Pastur, “Distribution of eigenvalues for some sets of random matrices,” Mathematics of the USSR-Sbornik, vol. 1, no. 4, p. 457, 1967.
 [16] J. R. Ipsen and M. Kieburg, “Weak commutation relations and eigenvalue statistics for products of rectangular random matrices,” Physical Review E, vol. 89, no. 3, p. 032106, 2014.
 [17] A. Guionnet, M. Krishnapur, and O. Zeitouni, “The single ring theorem,” Annals of mathematics, pp. 1189–1217, 2011.
 [18] F. Benaych-Georges and J. Rochet, “Outliers in the single ring theorem,” Probability Theory and Related Fields, vol. 165, no. 1-2, pp. 313–363, 2016.
 [19] I. Jana, K. Saha, and A. Soshnikov, “Fluctuations of linear eigenvalue statistics of random band matrices,” Theory of Probability & Its Applications, vol. 60, no. 3, pp. 407–443, 2016.
 [20] Z. Ling, R. C. Qiu, X. He, and L. Chu, “A new approach of exploiting self-adjoint matrix polynomials of large random matrices for anomaly detection and fault location,” arXiv preprint arXiv:1802.03503, 2018.
 [21] R. C. Qiu, Random Matrix Theory and Big Data Analysis, 2017.
 [22] A. Ambainis, A. W. Harrow, and M. B. Hastings, “Random tensor theory: Extending random matrix theory to mixtures of random product states,” Communications in Mathematical Physics, vol. 310, no. 1, pp. 25–74, 2012.
 [23] A. Lytova, “Central limit theorem for linear eigenvalue statistics for a tensor product version of sample covariance matrices,” Journal of Theoretical Probability, pp. 1–34, 2017.
 [24] R. D. Zimmerman, C. E. Murillo-Sánchez, and R. J. Thomas, “MATPOWER: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 12–19, Feb 2011.