I Introduction
Operational risk assessment is a fundamental task in distribution networks: it supports situation awareness of the network, safety analysis, and control decisions. One main factor that influences operational risk is faults or fluctuations caused by anomalies in the distribution network. These anomalies may present as intermittent, asymmetric, and sporadic spikes that are random in magnitude, may involve sporadic bursts, and exhibit complex, nonlinear, and dynamic characteristics [1]. Additionally, with numerous branch lines and a changeable network topology, traditional model-based approaches, which usually rest on certain assumptions or simplifications, struggle to detect anomalies in the distribution network fully and accurately.
In recent years, online monitoring systems have been widely deployed in distribution networks, and a large amount of data is collected through them. These massive data contain rich information on the operating state of the distribution network. To leverage the data, many advanced analytics have been developed. For example, in [2], a PCA-based approach is proposed to reduce the dimensionality of phasor measurement unit (PMU) data, and the result is used to detect early events in the network. In [3], characteristic features are extracted through wavelet energy spectrum entropy decomposition of disturbance waveforms to detect and classify faults in a distribution network. In [4], a density-based outlier detection approach using multi-time-instant synchrophasor data is used to detect low-quality synchrophasor measurements. In [5], a real-time fault detection and faulted line identification methodology is proposed by computing parallel synchrophasor-based state estimators. In [6], a method using time-frequency analysis is proposed for feature extraction, and a classifier is trained on the extracted features. In [7], a statistically-based anomaly detection algorithm for streaming PMU data is proposed based on multiple high-dimensional covariance matrix test theory.
A review of current efforts on anomaly detection using online monitoring data reveals two main weaknesses: 1) they rely on a prior parametric model of the monitoring data, which is usually based on certain assumptions and simplifications; 2) they often use simple features calculated from the time-series monitoring data, such as the mean, variance, spectrum, and higher-order moments. Methods based on predesigned parametric models are sensitive to parameter values, and it is not easy to find the optimal parameters that capture the essential features of each data segment, so false alarms can easily occur. For statistically-based methods, simple statistical features often generalize poorly and are susceptible to random fluctuations, which makes latent anomalies difficult to detect.
Generative adversarial nets (GANs) were first proposed by Goodfellow et al. in 2014 [8], overcoming the difficulty of approximating many intractable probabilistic computations and leveraging the benefits of piecewise linear units in deep generative models. A GAN trains a generator to automatically capture the distribution of the sample data from a simple latent distribution, and a discriminator to distinguish between real and generated samples. Compared with probability distributions calculated by traditional techniques, the automatically captured ones can better depict the rich structural information of arbitrarily complex data. However, the original GANs cannot project data back into the latent space, which makes it impossible to use the latent feature representations for auxiliary problems. To address this, Donahue et al. proposed an improved framework in 2016, bidirectional generative adversarial networks (BiGANs) [9]. BiGANs solve the problem of inaccessible feature representations by adding an encoder to the original GAN framework. The result is a robust and highly generative feature learning approach for arbitrarily complex data that makes no assumptions or simplifications about the data.
In this paper, we propose a new unsupervised approach for operational risk assessment in distribution networks. It uses BiGANs to automatically learn the most representative features from the input data in an adversarial way. Based on the extracted features, a statistical index is defined and calculated to indicate the data behavior. Furthermore, to quantify the operational risks of feeder lines in the distribution network, the risk levels are classified into emergency, high risk, preventive, and normal, and they are combined with the confidence level for the population mean of the standardized index. By comparing the p value for each data point of the standardized index with the intervals of the confidence level, the operational risks of the feeder lines can be judged intuitively. The main advantages of the proposed approach are summarized as follows: 1) It is a purely data-driven approach that requires little prior knowledge of the complex topology of the distribution network, which eliminates the potential detection errors caused by inaccurate network information. 2) It is an unsupervised learning approach requiring no anomaly labels or records, which addresses the problem of missing or inaccurate labels in the distribution network. 3) It automatically learns features from the online monitoring data in the distribution network, which makes it possible to detect latent anomalies, because the learned features represent the real data more powerfully than artificially designed ones. 4) It is suitable for both online and offline analysis.
The rest of this paper is organized as follows. Section II describes the proposed BiGANs-based anomaly detection algorithm, i.e., data preparation and normalization, adversarial feature learning, and anomaly detection. In Section III, spatio-temporal matrices are formulated from the online monitoring data in distribution networks, and the specific steps of operational risk assessment are given. In Section IV, synthetic data from the IEEE 118-bus system are used to illustrate the selection of the parameters involved in the approach, and real-world online monitoring data from a distribution network are used to validate the effectiveness and advantages of the proposed approach. Conclusions are presented in Section V.
II BiGANs-Based Anomaly Detection
In this section, the BiGANs-based anomaly detection algorithm is introduced. First, the multidimensional time series data are partitioned into a series of segments in chronological order. BiGANs are then used to automatically learn the most representative features from the data segments without making any prior assumptions or simplifications. Based on the extracted features of each data segment, a statistical index is calculated to indicate the data behavior. The designed algorithm offers an end-to-end solution for anomaly detection, and its specific steps are described below.
II-A Data Preparing and Normalization
Assume multidimensional measurements are available (such as voltage measurements from the sensors installed on one feeder line). At each sampling time, the measurements can be formulated as a column vector. For a series of sampling times, arranging these vectors in chronological order yields a spatio-temporal data matrix.
With a window moving along this matrix at a fixed step size, a series of data segments is generated. For example, at a given sampling time, the generated data segment is
(1) 
where each entry is the sampling data at one time instant inside the window. The data segment at each sampling time is reshaped into a column vector. The spatio-temporal data matrix is thus reformulated as a sequence of such vectors, as shown in Figure 1.
To reduce the calculation error and improve the convergence speed of training BiGANs in the subsequent feature learning process, each segment vector is normalized by
(2) 
where the normalized value lies in a fixed range, and the minimum and maximum values of the segment vector serve as the scaling bounds.
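The partitioning and normalization steps can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: the data matrix is stored as a list of rows (one row per measurement), the window ends at the current sampling index, segments are flattened column by column, and the normalized range is [0, 1]. The names `segment_at`, `min_max_normalize`, `segments`, `window`, and `step` are hypothetical.

```python
from typing import List

Matrix = List[List[float]]  # rows = measurements, columns = sampling times


def segment_at(data: Matrix, end: int, window: int) -> List[float]:
    """Return the `window` columns ending at column `end` (inclusive),
    flattened column by column into one vector."""
    start = end - window + 1
    if start < 0 or end >= len(data[0]):
        raise ValueError("window does not fit inside the data matrix")
    return [row[t] for t in range(start, end + 1) for row in data]


def min_max_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to [0, 1] (assumed range) using its own min and max."""
    lo, hi = min(vec), max(vec)
    if hi == lo:
        return [0.0 for _ in vec]  # constant segment: avoid division by zero
    return [(v - lo) / (hi - lo) for v in vec]


def segments(data: Matrix, window: int, step: int) -> List[List[float]]:
    """Slide the window along the time axis and normalize each segment."""
    n_times = len(data[0])
    return [
        min_max_normalize(segment_at(data, end, window))
        for end in range(window - 1, n_times, step)
    ]


# Toy 2 x 5 matrix (two measurements, five sampling times); values are made up.
toy = [[1.0, 2.0, 3.0, 4.0, 5.0],
       [10.0, 20.0, 30.0, 40.0, 50.0]]
normalized = segments(toy, window=3, step=1)
```

If the generator's output layer uses tanh, a symmetric range such as [-1, 1] may be the more natural target; that would only change the last line of `min_max_normalize`.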
II-B Adversarial Feature Learning
BiGANs train a generative model to capture the distribution of the sample data, a discriminative model to distinguish the real samples from the generated ones as accurately as possible, and an encoding model to project the sample data back into the latent space. The feature representations learned by the encoding model depict the rich structural information of the sample data well and can be used for other auxiliary problems. The framework of BiGANs is shown in Figure 2.
The generative model $G$, also called the generator, is composed of a multilayer network. For convenience, we assume a three-layer network, i.e., the input layer, one hidden layer, and the output layer. The data transition from the input layer to the output layer is given by Equations (3) and (4):
$h_G = f_1(W_1 z + b_1)$ (3) 
$G(z) = f_2(W_2 h_G + b_2)$ (4) 
where $z$ is the input data sampled from a simple latent distribution (Gaussian, uniform, exponential, etc.), $h_G$ is the output of the hidden layer in $G$, $W_1$ is the weight matrix between the input layer and the hidden layer, and $W_2$ is the weight matrix between the hidden layer and the output layer. To avoid vanishing gradients when training a multilayer $G$, as proposed in [10], $W_1$ and $W_2$ can be initialized from a uniform distribution
$W_j \sim U\left[-\frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}},\ \frac{\sqrt{6}}{\sqrt{n_j + n_{j+1}}}\right]$ (5) 
where $n_j$ and $n_{j+1}$ are the fan-in and fan-out of the units in the $j$th layer, and $U[-a, a]$ denotes the uniform distribution supported by $-a$ and $a$. $b_1$ and $b_2$ are the bias vectors, which can be initialized as small random values or 0.
The activation functions in Equations (3) and (4) can be the sigmoid function, the tanh function, the rectified linear unit (ReLu) first proposed in [11],
$f(x) = \max(0, x)$ (6) 
or the leaky ReLu (LReLu) proposed in [12][13],
$f(x) = \max(0, x) + \alpha \min(0, x)$ (7) 
The parameter $\alpha$ in Equation (7) represents the slope of the leak. Compared with ReLu, LReLu keeps a small gradient even when the unit is saturated. The output of the generator is the generated sample, which has the same size as the real data sample.
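A minimal sketch of the three-layer generator follows, assuming the uniform initialization of Equation (5) is the widely used Glorot (Xavier) uniform rule, with LReLu in the hidden layer and tanh at the output as stated for Algorithm 1. The layer sizes and the helper names (`glorot_uniform`, `dense`, `generator_forward`) are hypothetical and chosen only for illustration.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def glorot_uniform(fan_in: int, fan_out: int, rng: random.Random) -> Matrix:
    """Weights drawn from U[-a, a] with a = sqrt(6 / (fan_in + fan_out))."""
    a = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-a, a) for _ in range(fan_in)] for _ in range(fan_out)]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    """max(0, x) + slope * min(0, x); 0.2 is the leak slope used in the case studies."""
    return max(0.0, x) + slope * min(0.0, x)


def dense(weights: Matrix, bias: Vector, inputs: Vector,
          activation: Callable[[float], float]) -> Vector:
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * v for w, v in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def generator_forward(z: Vector, w1: Matrix, b1: Vector,
                      w2: Matrix, b2: Vector) -> Vector:
    """Latent vector -> hidden layer (LReLu) -> generated sample (tanh)."""
    hidden = dense(w1, b1, z, leaky_relu)
    return dense(w2, b2, hidden, math.tanh)


rng = random.Random(0)
latent_dim, hidden_dim, data_dim = 4, 8, 6  # illustrative sizes only
w1 = glorot_uniform(latent_dim, hidden_dim, rng)
w2 = glorot_uniform(hidden_dim, data_dim, rng)
b1 = [0.0] * hidden_dim
b2 = [0.0] * data_dim
z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
sample = generator_forward(z, w1, b1, w2, b2)
```

Because the output activation is tanh, every entry of `sample` lies in [-1, 1], so the real data would need to be scaled to a compatible range before being compared with generated samples.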
The encoding model $E$, also called the encoder, is a stack of multilayer neural networks. For an encoder with only a two-layer network, i.e., the input layer and one hidden layer, the data transition from the input data $x$ to the hidden units is called encoding, which is defined as
$E(x) = f_E(W_E x + b_E)$ (8) 
where $W_E$ is the weight matrix between the input layer and the hidden layer, which can be initialized through Equation (5), $b_E$ is the bias vector initialized with small random values or 0, and $f_E$ is the activation function. The output $E(x)$ can be considered the feature representation of the real sample data in the latent space.
The discriminative model $D$, also called the discriminator, likewise has a multilayer network structure, i.e., the input layer, multiple hidden layers, and the output layer. It takes the combination of the sample data and its latent features as the input (i.e., $(x, E(x))$ or $(G(z), z)$), and outputs a value representing the probability that the input is from the real sample rather than the generated one. For a discriminator with a three-layer network, the discriminative process is given by Equations (9) and (10):
$h_D = f_3(W_3 u + b_3)$ (9) 
$D(u) = f_4(W_4 h_D + b_4)$ (10) 
where $u$ denotes the input pair, $f_3$ and $f_4$ are the activation functions, $W_3$ and $W_4$ are the weight matrices initialized through Equation (5), $b_3$ and $b_4$ are the bias vectors initialized with small random values or 0, and $h_D$ represents the output of the hidden layer in $D$.
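The discriminator's treatment of a (sample, latent feature) pair can be sketched as below. This is an illustration under assumptions not stated in the text: the pair is combined by simple concatenation, the hidden layer uses LReLu, and a sigmoid output is used so that the result can be read as a probability. All weights and sizes are made-up toy values, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return max(0.0, x) + slope * min(0.0, x)


def affine(weights: Matrix, bias: Vector, inputs: Vector) -> Vector:
    """Compute W x + b for a weight matrix stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]


def discriminator_forward(sample: Vector, features: Vector,
                          w_hidden: Matrix, b_hidden: Vector,
                          w_out: Matrix, b_out: Vector) -> float:
    """Probability that the (sample, features) pair comes from real data."""
    joint = list(sample) + list(features)  # concatenate data and latent code
    hidden = [leaky_relu(h) for h in affine(w_hidden, b_hidden, joint)]
    logit = affine(w_out, b_out, hidden)[0]
    return sigmoid(logit)


# Toy sizes: 2-dimensional sample, 1-dimensional latent code, 2 hidden units.
w_hidden = [[0.5, -0.3, 0.8], [-0.2, 0.4, 0.1]]
b_hidden = [0.0, 0.0]
w_out = [[1.0, -1.0]]
b_out = [0.0]
prob_real = discriminator_forward([0.1, 0.2], [0.3],
                                  w_hidden, b_hidden, w_out, b_out)
```

The same function serves both input types: a real sample paired with its encoded features, or a generated sample paired with the latent vector it was generated from.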
Let $p_X(x)$ be the distribution of the real data $x$ (e.g., data segments), and let $p_Z(z)$ be the distribution of the sampled latent data $z$. In BiGANs, we train $D$ to maximize the probability of distinguishing the real samples from the generated ones, train $G$ to minimize the probability that $D$ makes correct distinctions, and simultaneously train $E$ to map the real data into the latent space of $G$. Thus, the objective function of training BiGANs can be defined as [9]
$\min_{G, E} \max_{D} V(D, E, G)$ (11) 
where
$V(D, E, G) = \mathbb{E}_{x \sim p_X}\left[\log D(x, E(x))\right] + \mathbb{E}_{z \sim p_Z}\left[\log\left(1 - D(G(z), z)\right)\right]$ (12) 
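As a small numerical illustration, the sketch below estimates the value function from batches of discriminator outputs, assuming Equations (11) and (12) take the standard BiGAN form of [9] (the expected log-probability assigned to real pairs plus the expected log-probability of rejecting generated pairs). The function name `bigan_value` and the clipping constant `eps` are hypothetical additions.

```python
import math
from typing import Sequence


def bigan_value(d_real: Sequence[float], d_fake: Sequence[float],
                eps: float = 1e-12) -> float:
    """Monte Carlo estimate of
    E[log D(x, E(x))] + E[log(1 - D(G(z), z))].

    d_real: discriminator outputs on (real sample, encoded features) pairs.
    d_fake: discriminator outputs on (generated sample, latent vector) pairs.
    eps guards against log(0).
    """
    real_term = sum(math.log(max(p, eps)) for p in d_real) / len(d_real)
    fake_term = sum(math.log(max(1.0 - p, eps)) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that cannot tell the pairs apart outputs 0.5 everywhere.
undecided = bigan_value([0.5, 0.5], [0.5, 0.5])
# A confident discriminator pushes the value towards 0 from below.
confident = bigan_value([0.99, 0.98], [0.01, 0.02])
```

With outputs of 0.5 everywhere the estimate equals -log 4, the value the standard objective takes when the discriminator can no longer separate the two kinds of pairs.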
Considering the large number of parameters in BiGANs, a regularization technique is needed to prevent overfitting. Dropout, first proposed in [14], addresses this problem by introducing randomness, i.e., dropping out the units in the hidden layers with a fixed probability, such as 0.2. The minimax objective function in Equation (11) can be optimized with stochastic gradient descent (SGD) based techniques, such as the adaptive subgradient (AdaGrad) method [15], the root mean square prop (RMSprop) algorithm [16], and adaptive moment (Adam) estimation [17]. Here, we choose Adam as the optimization algorithm; it combines the advantages of AdaGrad and RMSprop, i.e., it works well with sparse gradients and in online and non-stationary settings.
In practice, the objective function in Equation (11) may not provide sufficient gradients for the generator to learn well, because the discriminator can clearly distinguish real samples from generated ones early in learning, which easily causes the generator's loss term to saturate. Therefore, we can train the generator by maximizing the log-probability that the discriminator assigns to generated samples instead of minimizing the log-probability that it rejects them. Theoretical results in [9] show that the objective function achieves its global minimum if and only if the joint distribution of the real data and their encoded features is the same as that of the generated data and the sampled latent variables.
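For reference, a single Adam update can be sketched as follows. The default learning rate of 0.0002 matches the initial learning rate listed in the case studies; the decay rates `beta1`, `beta2` and the constant `eps` are the commonly used defaults and are assumptions here, since the text does not state them. The function name `adam_step` and the toy quadratic are hypothetical.

```python
import math
from typing import List, Tuple


def adam_step(params: List[float], grads: List[float],
              m: List[float], v: List[float], t: int,
              lr: float = 0.0002, beta1: float = 0.9,
              beta2: float = 0.999, eps: float = 1e-8
              ) -> Tuple[List[float], List[float], List[float]]:
    """One Adam update for a flat parameter list; t is the 1-based step count."""
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g        # first-moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m_i / (1.0 - beta1 ** t)             # bias correction
        v_hat = v_i / (1.0 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v


# Toy check: minimize (w - 3)^2, using a larger learning rate for a short run.
w, m, v = [0.0], [0.0], [0.0]
for t in range(1, 501):
    grad = [2.0 * (w[0] - 3.0)]
    w, m, v = adam_step(w, grad, m, v, t, lr=0.1)
```

In adversarial training the same update would be applied separately to each network's parameters, with the sign of the gradient chosen according to whether that network minimizes or maximizes the objective.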
Feature learning amounts to training BiGANs, i.e., obtaining the optimal parameters of the generator, encoder, and discriminator (their weight matrices and bias vectors) by solving the minimax problem in Equation (11). When the network has almost converged, the features output by the encoder can be considered the latent representations of the real data in the generator’s space, which can be used for the subsequent anomaly detection task.
II-C Anomaly Detection
Based on the features extracted through BiGANs for each data segment, a high-dimensional statistical index is calculated to indicate the data behavior. For example, at a given sampling time, the statistical index for the learned features is calculated as
(13) 
where the test function makes a linear or nonlinear mapping of the features; it can be a Chebyshev polynomial (CP), information entropy (IE), a likelihood ratio function (LRF), or the Wasserstein distance (WD). Detailed information about the test functions can be found in [18]. The index is a complex function of the extracted features, which will be further discussed in Section IV-A.
Because random weight initialization and dropout introduce randomness into the adversarial feature learning, the average value of the objective function over a number of consecutive iterations is calculated to decide whether to terminate the training. For example, at a given iteration, the average value is calculated as
(14) 
where each term is the objective function value calculated in one of the most recent iterations, and the result is their average over those consecutive iterations. Here, the simple averaging method from ensemble learning is used. In each iteration, one network learning model is built and outputs an objective function value. Thus, the average over the models of consecutive iterations is a more reliable basis for deciding whether to terminate the training than the output of the last iteration alone, because it is more stable and reduces both the error caused by randomness and the risk of the network falling into a local optimum. The procedure for anomaly detection based on BiGANs is summarized in Algorithm 1.
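The averaged termination check can be illustrated with the sketch below. The exact stopping condition is not recoverable from the text, so the rule used here is an assumption: training stops once the mean of the last `window` objective values lies within `tol` of a chosen `target` value. The function name, parameters, and the objective values in the example are all hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Optional


def first_converged_iteration(values: Iterable[float], window: int,
                              target: float, tol: float) -> Optional[int]:
    """Return the first 1-based iteration at which the mean of the last
    `window` objective values is within `tol` of `target`, else None."""
    recent: Deque[float] = deque(maxlen=window)
    for iteration, value in enumerate(values, start=1):
        recent.append(value)
        if len(recent) == window and abs(sum(recent) / window - target) < tol:
            return iteration
    return None


# Made-up objective trace: a single value hits the target at iteration 3,
# but the averaged check waits until the values have settled.
trace = [-2.0, -1.9, -1.5, -1.7, -1.5, -1.5, -1.5]
stop_at = first_converged_iteration(trace, window=3, target=-1.5, tol=0.05)
```

Checking a single iteration would have stopped at iteration 3 in this trace; averaging over the window postpones the decision until the objective is stable.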
Algorithm 1: The proposed BiGANs-based algorithm for anomaly detection. For the networks in BiGANs, the LReLu function is used as the activation function in hidden layers and the tanh function in output layers, and Adam is chosen as the optimization method; see Section II-B for details. The number of inner update steps and the number of iterations used to calculate the averaged objective value are set in advance. 
Input: The data segment , the required 
approximation error ; 
Output: The anomaly index ; 
1. For each data segment do 
2. Normalize into according to Equation (2); 
3. Initialize the network parameters as illustrated in Section II-B; 
4. For iteration do 
5. For steps do 
6. Sample from a simple latent distribution; 
7. Update by descending their gradients: 
End for 
8. Sample from the same distribution as in step 6; 
9. Update by descending its gradient: 
10. If do 
11. Calculate through Equation (14); 
12. If do 
13. ; 
14. Output the learned features calculated 
through Equation (8); 
15. Break; 
End if 
End if 
End for 
16. Calculate through Equation (13); 
17. Calculate the statistical index for : 
; 
End for 
III Operational Risk Assessment Using Online Monitoring Data in Distribution Networks
In this section, a new unsupervised learning approach that uses the online monitoring data to assess the operational risks of feeder lines in a distribution network is proposed. First, a spatio-temporal data matrix is formulated for each feeder line from the online monitoring data, and the anomaly index is calculated as illustrated in Section II. Then, by combining the confidence level for the population mean of the standardized anomaly index, the operational risks are classified into different levels with clearly defined criteria. The specific steps of the proposed approach are given and analyzed.
III-A Formulation of Online Monitoring Data as Spatio-Temporal Matrices
As illustrated in Figure 3, one feeder line in the partial distribution network consists of branch lines and substations with distribution transformers. On the low-voltage side of each distribution transformer, one online monitoring sensor is installed, through which multiple measurements can be obtained, such as three-phase voltages. Here, we choose the three-phase voltages at each sampling time to formulate a data vector that stacks the measurements of all sensors installed on the feeder. For a series of sampling times, a spatio-temporal data matrix is obtained. Note that, by stacking the voltage measurements together, the formulated spatio-temporal data matrix contains rich information on the operating state of the feeder line.
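The stacking described above can be sketched as follows, assuming each sensor reports one (phase A, phase B, phase C) voltage triple per sampling time and that the matrix stores measurements as rows and sampling times as columns. The helper names and the per-unit voltage values are hypothetical.

```python
from typing import List, Sequence, Tuple

Phases = Tuple[float, float, float]  # (phase A, phase B, phase C) voltages


def data_vector(readings: Sequence[Phases]) -> List[float]:
    """Stack the three phase voltages of every sensor into one vector."""
    return [voltage for sensor in readings for voltage in sensor]


def data_matrix(snapshots: Sequence[Sequence[Phases]]) -> List[List[float]]:
    """Arrange the per-time vectors as columns:
    rows = measurements, columns = sampling times."""
    columns = [data_vector(snapshot) for snapshot in snapshots]
    return [list(row) for row in zip(*columns)]


# Two sensors observed at three sampling times (made-up per-unit values).
snapshots = [
    [(1.00, 0.99, 1.01), (0.98, 0.97, 0.99)],
    [(1.01, 1.00, 1.02), (0.99, 0.98, 1.00)],
    [(0.99, 0.98, 1.00), (0.97, 0.96, 0.98)],
]
matrix = data_matrix(snapshots)  # 6 rows (2 sensors x 3 phases), 3 columns
```

With this layout, a moving window over the columns selects a block of consecutive sampling times for all sensors at once.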
III-B Operational Risk Classification in Distribution Networks
The anomaly detection result can indicate the operating states of the feeder lines in distribution networks. Here, it is used as the basis for assessing the operational risks. For a series of sampling times, the anomaly index is calculated for each data segment. We first standardize the anomaly index by
(15) 
where the sample mean and sample standard deviation of the anomaly index are taken over the series of sampling times.
Considering that the sample size is sometimes small, the standardized anomaly index is assumed to follow a Student’s t-distribution with the corresponding degrees of freedom. According to the central limit theorem, the confidence level for the population mean of the standardized anomaly index is defined as
(16) 
where the sample mean and sample standard deviation are those of the standardized anomaly index, the critical value is the upper critical value of the t-distribution with the corresponding degrees of freedom, and the probability is taken with respect to that distribution. The confidence interval at a given level is thus bounded by the critical values, and the p value at the interval critical values equals one minus the confidence level. For a given confidence level and degrees of freedom, the corresponding p value can be obtained from the t-distribution table.
To further quantify the operational risks of feeder lines in distribution networks, we classify the operational risk levels into emergency, high risk, preventive, and normal according to defined intervals of the confidence level for the population mean of the standardized anomaly index, as shown in Table I. Thus, for a calculated index value, we can judge the operational risk level by comparing its p value with the corresponding interval: the smaller the p value, the higher the risk level.
Operational risk level  Confidence level () 

Emergency  
High risk  
Preventive  
Normal 
Here, “emergency” means that a feeder line operates in an abnormal state and serious faults may happen at any time. If a feeder is diagnosed as being in an emergency state, it will be analyzed further. “High risk” denotes that a feeder line is at high risk of suffering from faults and deserves special attention. “Preventive” means that a feeder line operates in a normal state but is not safe and should be watched for a period of time. “Normal” denotes that a feeder line is in a healthy state. By using Table I, the operational risks of feeders are quantified, which offers a reference for operators making safety assessments.
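The standardization, p value calculation, and risk classification can be sketched as below. Several points are assumptions rather than the paper's settings: the degrees of freedom are taken as the sample size minus one, the two-sided p value is obtained by numerically integrating the Student's t density, and the threshold values passed to `risk_level` are placeholders, because the intervals of Table I are not reproduced here. All function names and the index values are hypothetical.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def t_pdf(x: float, dof: int) -> float:
    """Density of the Student's t-distribution with `dof` degrees of freedom."""
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(x * x / dof))


def two_sided_p_value(t_stat: float, dof: int, n_steps: int = 2000) -> float:
    """P(|T| >= |t_stat|), via Simpson integration of the density on [0, |t_stat|]."""
    upper = abs(t_stat)
    if upper == 0.0:
        return 1.0
    h = upper / n_steps  # n_steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(upper, dof)
    for i in range(1, n_steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def standardize(values: Sequence[float]) -> List[float]:
    """Subtract the sample mean and divide by the sample standard deviation."""
    mu, sigma = mean(values), stdev(values)
    return [(value - mu) / sigma for value in values]


def risk_level(p_value: float, thresholds: Sequence[Tuple[float, str]]) -> str:
    """Map a p value to a level; `thresholds` lists ascending upper bounds."""
    for bound, label in thresholds:
        if p_value < bound:
            return label
    return "normal"


# Placeholder thresholds, NOT the intervals of Table I.
example_thresholds = [(0.01, "emergency"), (0.05, "high risk"), (0.10, "preventive")]
indices = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 1.0, 0.2]  # made-up anomaly indices
dof = len(indices) - 1  # assumed degrees of freedom
levels = [risk_level(two_sided_p_value(z, dof), example_thresholds)
          for z in standardize(indices)]
```

In this toy series only the last point deviates strongly from the others, so it is the only one that leaves the normal level under the placeholder thresholds.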
III-C The Operational Risk Assessment Approach in Distribution Networks
Based on the research above, an unsupervised risk assessment approach in distribution networks is proposed. The steps of the approach are shown as follows.
Steps of the operational risk assessment in distribution networks 

1. For each feeder, formulate a spatio-temporal data matrix as illustrated in Section III-A. 
2. Partition the data matrix into a series of data segments with a window moving on it at a fixed step size. 
3. For the data segment at each sampling time: 
3a) Reshape it into a column vector; 
3b) Normalize the vector according to Equation (2); 
3c) Calculate the anomaly detection index; see Algorithm 1 for details. 
4. Draw the anomaly index curve for each feeder over the series of sampling times. 
5. Calculate the p value for each data point of the standardized anomaly index in Equation (15). 
6. Assess the feeder’s operational risk level by comparing the calculated p value with the intervals of the confidence level defined in Table I. 
The proposed operational risk assessment approach is driven by the online monitoring data and based on adversarial feature learning theory. Step 1 formulates a spatio-temporal data matrix for each feeder. In Step 2, the data matrix is partitioned into a series of data segments by a moving window method. Step 3 is the adversarial feature learning process for each data segment, in which no assumptions or simplifications are made about the underlying structure of the real data. Steps 4 to 6 perform the operational risk assessment based on the central limit theorem. The proposed approach is practical for online analysis when the last sampling time is taken as the current time.
IV Case Studies
In this section, we validate the effectiveness of the proposed approach and compare it with other existing approaches. Six cases in different scenarios are designed. The first three cases, using synthetic data generated from the IEEE 118-bus test system, test the performance of the proposed approach under different parameter settings and offer parameter selection guidelines for analyzing the real data. The last three cases, using real-world online monitoring data, validate the proposed approach and compare it with other existing approaches.
IV-A Case Study with Synthetic Data
The synthetic data were sampled from the simulation results of the IEEE 118-bus test system [19]. In the simulations, a sudden change of the active load at one bus was considered an anomaly signal, and a small amount of white noise was introduced to represent random fluctuations.
1) Case Study on Sampling Distribution: In BiGANs, the latent variable is the input data of the generative model and is sampled from a simple distribution, such as a uniform, Gaussian, or exponential distribution. In this case, we explore whether the sampling distribution of the latent variable affects the performance of the proposed approach. The synthetic data set contained 118 voltage measurements sampled 500 times. An assumed step signal was set for bus 20 while the others stayed unchanged, as shown in Table II.
Bus  Sampling Time  Active Load (MW) 

20  20  
120  
20  
Others  Unchanged 
The other involved parameters were set as follows:
– The moving window’s size : ;
– The moving step size : ;
– The number of layers for //: ;
– The number of neurons in each hidden layer of
:;
– The number of neurons in each hidden layer of :
;
– The number of neurons in each hidden layer of :
;
– The feature size: 64;
– The number of steps applied to : 1;
– The number of iterations to calculate : 10;
– The initial learning rate : 0.0002;
– The slope of the leak in LReLu: 0.2;
– The dropout coefficient: 0.1;
– The required approximation error : 0.0001;
– The test function : ln.
To explore the effect of the sampling distribution on the performance of the proposed approach, the statistical indices with the latent variable sampled from a uniform distribution, a Gaussian distribution, and an exponential distribution were calculated, and the corresponding curves are plotted in Figure 4. It can be observed that the assumed anomaly signal is detected regardless of the distribution from which the latent variable is sampled. Meanwhile, the p values corresponding to the anomaly point were calculated for each curve. It can be concluded that the detection performance of the proposed approach is almost unaffected by the choice of sampling distribution.
In the experiment, the synthetic data set was partitioned into data segments, and an anomaly index value was generated for each data segment. To explore the shape of the sampling distribution of the anomaly index under different sampling distributions of the latent variable, the probability density function (PDF) curves of the anomaly index, with outliers (the values corresponding to the anomaly point) dropped, are plotted in Figure 5. It can be observed that the sampling distribution of the anomaly index is approximately normal when the degrees of freedom are large, regardless of the sampling distribution of the latent variable. This validates our assumption in Section III-B that the standardized anomaly index follows a t-distribution.
2) Case Study on Model Depth: Since the networks in BiGANs are composed of multilayer networks, in this case we explore how the model depth (i.e., the number of layers in each network) affects the performance of the proposed approach. The data set generated in Case 1) was used, and the latent variable was sampled from the standard Gaussian distribution. The other parameters were set the same as in Case 1). To illustrate the effect of model depth on performance, the anomaly indices corresponding to different model depths were calculated and normalized, as shown in Figure 6.
It can be observed that the assumed anomaly signal can be detected for different model depth (i.e., ). Meanwhile, for the curves, the p values of corresponding to the anomaly point were calculated, results of which were , , , , respectively. It shows that the best anomaly detection performance is achieved when the model depth is or . Furthermore, the effect of model depth on the convergence rate in training BiGANs is illustrated in Figure 7. Considering the performance and efficiency comprehensively, the model depth is set as in the subsequent experiments.
3) Case Study on Feature Size: In this case, the effect of feature size on the performance of the proposed approach is explored. The generated data set in case 1) was used in this case, and the model depth was set to be . The other involved parameters were set the same as in Case 1). The anomaly indices corresponding to different feature size were calculated and normalized into , which was shown in Figure 8.
It can be observed that the assumed anomaly signal can be detected for different feature size (i.e., ). For the curve, the p values of corresponding to the anomaly point were calculated and the results were , , , , , respectively. It can be concluded that: 1) when the feature size is small (such as 20 or 40), the proposed approach is sensitive to the anomaly signal, but it is vulnerable to random fluctuations; 2) with the increase of feature size, the proposed approach becomes less sensitive to the anomaly signal and more robust to random fluctuations. In the experiment, we note that large feature size will lead to a slow convergence rate in training BiGANs. Therefore, a moderate feature size is often selected empirically.
IV-B Case Study on Real-World Online Monitoring Data
In this section, online monitoring data obtained from a distribution network in Hangzhou, China, are used to validate the proposed approach. The distribution network contains feeder lines with distribution transformers. The online monitoring data were sampled every 15 minutes. The anomaly time and type for each feeder line were recorded during operation. In the following cases, three-phase voltages were chosen as the measurement variables to formulate the data matrices. Voltage violation and disturbance were considered as the risk items.
1) Case Study on Voltage Violation: Voltage violation is a common anomaly type in distribution networks, which increases the operational risks of the networks. It covers two cases, i.e., exceeding the upper limit or falling below the lower limit. In this case, we assess the operational risk of one feeder line suffering from voltage violation to validate the proposed approach. The feeder consisted of branch lines and substations with distribution transformers. The online monitoring data were sampled from 2017/3/1 00:00:00 to 2017/3/14 23:45:00, from which a data matrix was formulated. The data, with anomaly time and type labelled, are shown in Figure 9. The involved parameters were set as follows:
– the moving window’s size : ;
– the moving step size : ;
– the model depth: ;
– the number of neurons in each hidden layer of :
;
– the number of neurons in each hidden layer of :
;
– the number of neurons in each hidden layer of :
;
– the feature size: ;
– the number of steps applied to : ;
– the number of iterations to calculate : ;
– the initial learning rate : ;
– the slope of the leak in LReLu: ;
– the dropout coefficient: ;
– the required approximation error : ;
– the test function : ln.
Figure 10 shows the anomaly detection results when the latent variable is sampled from the uniform, Gaussian, and exponential distributions, respectively. From the curves, we can observe the following:
I. The value of the anomaly index on March 13th is significantly smaller than those on other days, which indicates that an anomaly occurred on March 13th. The PDFs of the extracted features corresponding to March 13th and to other days (such as March 1st) are shown in Figure 11. It can be observed that the PDFs of the extracted features differ when the feeder operates in different states, i.e., the PDF of the extracted features in the normal operating state is more centered.
II. The p values corresponding to the anomaly time were calculated for the different sampling distributions. The results indicate that the performance of the proposed approach is almost unaffected by the assumed sampling distribution.
III. The calculated p values on March 13th are small enough that the feeder is judged to operate in an emergency state, and it needs to be analyzed further.
2) Case Study on Voltage Disturbance: Voltage disturbance is a complex anomaly type in distribution networks, which is random in magnitude and may also involve sporadic bursts. It may be caused by a short-circuit fault, a sudden load change, the connection of distributed generation, etc. In this case, the performance of the proposed approach is tested by assessing the operational risk of one feeder line suffering from voltage disturbance. The online monitoring data of the feeder's distribution transformers were sampled from 2017/3/1 00:00:00 to 2017/3/14 23:45:00, from which a data matrix was formulated. The data, with anomaly time and type labelled, are shown in Figure 12. The moving window’s size and the numbers of neurons in the hidden layers of the networks were set specifically for this feeder; the other parameters were set the same as in the above case.
Figure 13 shows the anomaly detection results corresponding to different sampling distributions. From the curves, we can observe the following:
I. The value of the anomaly index on March 9th is smaller than those on other days, which indicates that the latent anomaly is accurately detected. The PDFs of the extracted features corresponding to abnormal and normal feeder operating states are shown in Figure 14. It can be observed that the PDF of the extracted features in the normal operating state is more centered.
II. For each curve, the p values corresponding to the anomaly time were calculated. The results again indicate that the performance of the proposed approach is almost unaffected by the assumed sampling distribution.
III. The calculated p values on March 9th are small enough that the feeder is judged to operate in an emergency state, and it deserves further analysis.
3) Comparison with Other Existing Approaches: We further compare the proposed approach with other existing approaches in terms of accuracy and efficiency by assessing the operational risks of feeder lines suffering from anomalies. Here, the risk levels in Table I are simplified as abnormal (emergency state) and normal (the other states). Anomaly detection techniques based on deep autoencoders (DAE) [20][21], principal component analysis (PCA) [2], spectrum analysis (SPA) [18], and threshold analysis (THA) [22] have been well studied. To make a full comparison with these techniques, we analyzed feeder lines with anomaly records from 2017/3/1 00:00:00 to 2017/4/30 23:45:00. Here, voltage violation and fluctuation were considered as anomaly items. For DAE, PCA, and SPA, the moving window’s size was , the moving step size was and the test function was ln. For THA, the anomaly index was defined as
(17)
where is the total number of sampling times, is the duration of each abnormal state (i.e., the period in which the voltage exceeds the upper or lower limit), is the number of abnormal states, and . In the experiments, the parameters of each detection approach were tuned, and the optimal settings are listed in Table III.
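Because the symbols of (17) are not reproduced here, the following is only one possible reading of the description above: the index is taken as the share of sampling times spent in abnormal states, i.e., the summed durations of all voltage-violation episodes divided by the total number of sampling times. The function name `violation_index`, the limits, and the sample values are hypothetical.

```python
import numpy as np


def violation_index(voltage, lower, upper):
    """Return (index, durations) for a 1-D series of voltage samples.

    durations[i] is the length, in sampling times, of the i-th run of
    consecutive samples outside [lower, upper]. index is the sum of the
    durations divided by the total number of sampling times (assumed form).
    """
    voltage = np.asarray(voltage, dtype=float)
    abnormal = (voltage < lower) | (voltage > upper)
    # Pad with False so every run has a detectable start and end.
    padded = np.concatenate(([False], abnormal, [False])).astype(int)
    edges = np.diff(padded)
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    durations = ends - starts
    index = durations.sum() / voltage.size
    return index, durations


# Per-unit voltages with hypothetical limits of 0.93 and 1.07.
v = [1.00, 1.08, 1.09, 1.00, 0.92, 1.00, 1.00, 1.00]
index, durations = violation_index(v, lower=0.93, upper=1.07)
```

Under this reading, a feeder would be flagged when `index` exceeds the anomaly threshold listed for THA in Table III.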
Approaches  Parameter Settings
BiGANs  the model depth: 5;
  the number of neurons in each hidden layer of : ;
  the number of neurons in each hidden layer of : ;
  the number of neurons in each hidden layer of : ;
  the feature size: ;
  the number of steps applied to : 1;
  the number of iterations to calculate : 5;
  the initial learning rate : 0.0001;
  the slope of the leak in LReLu: 0.2;
  the dropout coefficient: 0.1;
  the required approximation error : 0.0001;
DAE  the model depth: ;
  the number of neurons in each hidden layer of encoder: ;
  the number of neurons in each hidden layer of decoder: ;
  the feature size: ;
  the initial learning rate: ;
  the activation function: ;
  the minimum reconstruction error: ;
  the optimizer: .
PCA  the contribution rate of top eigenvalues: .
SPA  the signal-to-noise ratio: .
THA  the lower limit of voltage violation: ;
  the upper limit of voltage violation: ;
  the anomaly threshold : .
To compare the detection performance of the different approaches, and are used as performance measures. The and are defined as
(18)
where is the number of anomalies that are correctly detected, denotes the number of ground-truth anomalies, and is the number of all detected alarms. The higher the and the smaller the , the better the detection performance of an approach. Meanwhile, to compare the efficiency of the different approaches, the for each sampling times (i.e., the moving window’s width) was recorded. The experiments were conducted on a server with GHz central processing unit (CPU) and GB random access memory (RAM). The comparison results are shown in Table IV.
Methods  ()  ()  (s) 

BiGANs  76.80  13.90  4.235 
DAE  69.60  30.95  1.856 
PCA  53.60  23.86  0.587 
SPA  56.40  35.02  0.790 
THA  42.80  29.17  0.314 
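The two detection measures reported in Table IV can be sketched from the three counts described around (18). Since the symbols of (18) are not reproduced here, the definitions below are assumptions: the first measure is taken as the fraction of ground-truth anomalies that are correctly detected, and the second as the fraction of raised alarms that do not correspond to a true anomaly. The function and argument names are hypothetical, and the counts in the example are illustrative, not experimental results.

```python
def detection_metrics(n_correct, n_truth, n_alarms):
    """Return (detection_rate, false_alarm_rate) in percent.

    Assumed definitions, not confirmed by the source:
      detection_rate   = n_correct / n_truth
      false_alarm_rate = (n_alarms - n_correct) / n_alarms
    """
    if n_truth <= 0 or n_alarms <= 0:
        raise ValueError("n_truth and n_alarms must be positive")
    if n_correct < 0 or n_correct > min(n_truth, n_alarms):
        raise ValueError("n_correct must lie in [0, min(n_truth, n_alarms)]")
    detection_rate = 100.0 * n_correct / n_truth
    false_alarm_rate = 100.0 * (n_alarms - n_correct) / n_alarms
    return detection_rate, false_alarm_rate


# Illustrative counts only: 3 of 4 true anomalies found among 5 alarms.
rate, false_rate = detection_metrics(n_correct=3, n_truth=4, n_alarms=5)
```

With these definitions, a method that raises many alarms can reach a high first measure only at the cost of a high second measure, which is why both are reported together.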
It can be observed that the proposed BiGANs-based approach has the highest and the smallest , which indicates that it outperforms DAE, PCA, SPA, and THA in anomaly detection performance. The reasons are:

THA relies on simple statistical features that often generalize poorly, which makes it unable to detect latent anomalies.

SPA assumes that the input data follow a certain distribution and that the entries of the data matrix are independently and identically distributed. Besides, SPA is an anomaly detection approach based on correlation analysis, which is not sensitive to amplitude variations in the data.

PCA is a linear dimension reduction approach, and an optimal value of the parameter measuring the contribution rate of the top eigenvalues is hard to find for all data segments.

DAE is a nonlinear generalization of PCA, and its simple network structure and learning algorithm make it vulnerable to random fluctuations.
The proposed approach overcomes the shortcomings of the other existing approaches by using complex to learn the features of the input data in an adversarial way. Meanwhile, it is noted that the proposed approach has the highest (i.e., 4.235 s) because of its complex network structure and learning algorithm, which makes it the least efficient of the compared approaches. However, since the online monitoring data in the researched network are sampled every 15 minutes, a delay of a few seconds is acceptable and the proposed approach remains practical for online operational risk analysis. Moreover, with the development of graphics processing unit (GPU) and field-programmable gate array (FPGA) techniques, the computational efficiency can be expected to improve considerably.
V Conclusion
This paper proposes a new unsupervised approach to operational risk assessment in distribution networks. The proposed approach is capable of mining the hidden structure of the real data and automatically learning the most representative features of the data in an adversarial way. By analyzing the distribution of the extracted features, a statistical index is calculated to indicate the data behavior. The standardized form of the index is shown experimentally to approximately follow the t distribution. Furthermore, the operational risks of feeders are divided into emergency, high risk, preventive, and normal levels by the defined intervals of the confidence level for the population mean of the standardized index, which makes quantitative risk assessment possible.
Cases on the synthetic data offer guidelines for the parameter selection of the proposed approach, including a simple sampling distribution, a moderate model depth, and a moderate feature size. Cases on the real-world online monitoring data indicate that the proposed approach improves the risk assessment accuracy substantially compared with the other existing techniques. However, the computational burden increases because of the complex network structure and learning algorithm of the approach. In view of its advantages in assessment performance, the proposed approach can serve as a primitive for analyzing the spatio-temporal data in distribution networks.
References
[1] M. R. Jaafari Mousavi, “Underground distribution cable incipient fault diagnosis system,” 2007.
[2] L. Xie, Y. Chen, and P. R. Kumar, “Dimensionality reduction of synchrophasor data for early event detection: Linearized analysis,” IEEE Trans. Power Syst., vol. 29, no. 6, pp. 2784–2794, Nov. 2014.
[3] A. C. Adewole and R. Tzoneva, “Fault detection and classification in a distribution network integrated with distributed generators,” in IEEE Power and Energy Society Conference and Exposition in Africa: Intelligent Grid Integration of Renewable Energy Resources (PowerAfrica), Jul. 2012, pp. 1–8.
[4] M. Wu and L. Xie, “Online detection of low-quality synchrophasor measurements: A data-driven approach,” IEEE Trans. Power Syst., vol. 32, no. 4, pp. 2817–2827, Jul.
[5] M. Pignati, L. Zanni, P. Romano, R. Cherkaoui, and M. Paolone, “Fault detection and faulted line identification in active distribution networks using synchrophasors-based real-time state estimation,” IEEE Trans. Power Del., vol. 32, no. 1, pp. 381–392, Feb. 2017.
[6] A. Ghaderi, H. A. Mohammadpour, H. L. Ginn, and Y.-J. Shin, “High-impedance fault detection in the distribution network using the time-frequency-based algorithm,” IEEE Trans. Power Del., vol. 30, no. 3, pp. 1260–1268, Jun. 2015.
[7] L. Chu, R. C. Qiu, X. He, Z. Ling, and Y. Liu, “Massive streaming PMU data modeling and analytics in smart grid state evaluation based on multiple high-dimensional covariance tests,” IEEE Trans. Big Data, vol. 4, no. 1, pp. 55–64, Mar. 2018.
[8] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Proc. Adv. Neural Inform. Process. Syst., 2014, pp. 2672–2680.
[9] J. Donahue, P. Krähenbühl, and T. Darrell, “Adversarial feature learning,” in Proc. Int. Conf. Learn. Rep.
[10] X. Glorot and Y. Bengio, “Understanding the difficulty of training deep feedforward neural networks,” in Proc. 13th Int. Conf. Artf. Intell. Stat., 2010, pp. 249–256.
[11] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Mach. Learn., 2010, pp. 807–814.
[12] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. 30th Int. Conf. Mach. Learn., vol. 30, no. 1, 2013, p. 3.
[13] B. Xu, N. Wang, T. Chen, and M. Li, “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015. [Online]. Available: http://arxiv.org/abs/1505.00853
[14] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, Jun. 2014.
[15] J. Duchi, E. Hazan, and Y. Singer, “Adaptive subgradient methods for online learning and stochastic optimization,” J. Mach. Learn. Res., vol. 12, pp. 2121–2159, Jul. 2011.
[16] T. Tieleman and G. Hinton, “Lecture 6.5-rmsprop, coursera: Neural networks for machine learning,” University of Toronto, Technical Report, 2012. [Online]. Available: https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf
[17] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” 2015.
[18] X. Shi, R. Qiu, X. He, L. Chu, and Z. Ling, “Anomaly detection and location in distribution networks: A data-driven approach,” arXiv preprint arXiv:1801.01669, 2018. [Online]. Available: https://arxiv.org/abs/1801.01669
[19] R. D. Zimmerman, C. E. Murillo-Sanchez, and R. J. Thomas, “Matpower: Steady-state operations, planning, and analysis tools for power systems research and education,” IEEE Trans. Power Syst., vol. 26, no. 1, pp. 12–19, Feb. 2011.
[20] W.-H. Lee, J. Ortiz, B. Ko, and R. Lee, “Time series segmentation through automatic feature learning,” arXiv preprint arXiv:1801.05394, 2018. [Online]. Available: https://arxiv.org/abs/1801.05394
[21] P. P. Barbeiro, H. Teixeira, J. Krstulovic, J. Pereira, and F. Soares, “Exploiting autoencoders for three-phase state estimation in unbalanced distributions grids,” Elect. Power Syst. Res., vol. 123, pp. 108–118, Jun. 2015.
[22] T. Xiao, W. Pei, H. Ye, G. Niu, H. Xiao, and Z. Qi, “Operation risk assessment of distribution network considering time dependence correlation coefficient,” J. Eng., vol. 2017, no. 13, pp. 2489–2495, Dec. 2017.