1 Introduction
Power system transient stability assessment is one of the most important means of ensuring the security and stability of power systems. It assesses the ability of a power system to return to its original secure state, or to transition to a new secure state, after withstanding a specific disturbance [wu_survey_2012]. Fast and accurate transient stability assessment is therefore needed to deal with emergencies in time and to ensure the secure operation of power systems. However, Time Domain Simulation (TDS), a traditional method of transient stability assessment, is extremely time-consuming due to the nonlinear complexity of power systems [tang_transient_1994]. In recent years, several transient stability assessment methods based on deep learning have been proposed to improve the computational speed of assessment models [gao_transient_2019, li_transient_2020, tacchi_model_2020, wei_realtime_2019]. These assessment methods are usually data-driven and need large-scale valid samples [bo_power_2016, vasant2018intelligent, vasant2019intelligent, vasant2020intelligent].
However, two problems need to be addressed when training the assessment model in practice. First, insufficient samples cannot effectively represent the distribution of features, which creates a risk of model overfitting. Second, since the class distribution of samples is highly imbalanced, learning from the unstable samples is usually inhibited, leading to poor performance of trained models on unstable samples.
To address the insufficient and imbalanced samples and improve the performance of transient stability assessment models, we use a sample generation model to supplement the transient stability samples, especially the unstable samples. The Generative Adversarial Network (GAN) [goodfellow_generative_2014], which trains a generator and a discriminator in an adversarial process, is widely used in sample generation tasks. However, the generation process of GAN is uncontrollable, resulting in a large number of unnecessary samples. Conditional GAN (CGAN) instead adds a conditional generation mechanism to the architecture of GAN so that the required samples can be generated [mirza_conditional_2014]. Furthermore, since transient stability samples are usually recorded as tabular data, we focus on the dedicated CTGAN method, which implements mode-specific normalization and conditional generation for tabular data [xu_modeling_2019].
Therefore, this paper proposes an imbalanced sample generation framework based on CTGAN for power system transient stability. Considering the structural characteristics of transient stability samples, the generation framework first models the samples as tabular data and uses the Gaussian Mixture Model (GMM) to normalize the tabular data [anzai_2012_pattern, tsukakoshi_analysis_2012]. Multiple conditions, including the transient stability and the load level, are converted into a single condition vector to enable multi-conditional generation. In addition, we design a multi-metric evaluation of the resulting sample generation framework. The evaluation covers the effect of conditional generation, distance calculation, and the performance of transient stability assessment models trained with generated samples. Case studies on the IEEE 39-bus system show that the proposed framework can effectively balance the transient stability samples and significantly improve the performance of transient stability assessment models.
2 Sample Generation Framework for Transient Stability
In this section, we detail the proposed sample generation framework based on CTGAN for power system transient stability. As shown in Fig. 1, the proposed sample generation framework first models transient stability samples as tabular data, then transforms the data using one-hot codes and GMM normalization, and finally trains the CTGAN model.
2.1 Transient Stability Sample Representation
To construct appropriate input characteristics, we should consider not only the correlation between the characteristics and transient stability, but also whether the characteristics can be obtained in real time or calculated quickly in an actual power system. Assuming that no further fault occurs during the transient process, the transient stability of the power system is already determined at the moment of fault removal. Therefore, we take the values at the moment of fault clearing as the representation of transient stability samples. A transient stability sample is represented by the voltage magnitude and voltage angle of bus nodes, the active power and reactive power of load nodes, and the active power and reactive power of generator nodes at the moment of fault clearing.
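The representation above can be pictured as one row of a table. The following is a minimal sketch, assuming the IEEE 39-bus case used in Section 4 (39 buses, 19 loads, 10 generators) and assuming every node contributes exactly two values; the column names and the helper `feature_names` are hypothetical and only illustrate the layout.

```python
# Hypothetical column layout for one transient stability sample.
# Node counts follow the IEEE 39-bus case; the naming scheme is an assumption.
N_BUS, N_LOAD, N_GEN = 39, 19, 10


def feature_names(n_bus=N_BUS, n_load=N_LOAD, n_gen=N_GEN):
    """Return column names for the tabular representation of one sample."""
    names = []
    for i in range(1, n_bus + 1):
        names += [f"bus{i}_vm", f"bus{i}_va"]   # voltage magnitude, voltage angle
    for i in range(1, n_load + 1):
        names += [f"load{i}_p", f"load{i}_q"]   # active power, reactive power
    for i in range(1, n_gen + 1):
        names += [f"gen{i}_p", f"gen{i}_q"]     # active power, reactive power
    return names


columns = feature_names()
print(len(columns))  # 2 * (39 + 19 + 10) = 136 continuous columns
```

Under these assumptions each sample has 136 continuous columns, to which the condition labels (transient stability and load level) are attached as discrete columns.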
2.2 Transformation of Multi-condition Vector
Following the basic idea of conditional generation, the transient stability and the load level of transient stability samples are used as generation conditions to realize multi-conditional generation. However, in common CGANs, the conditional vector is a one-hot code, which can only represent a single condition. Therefore, the proposed generation framework includes a simple transformation method that converts multiple condition vectors into a single condition vector:
$$c = c_1 \oplus c_2 \oplus \cdots \oplus c_n \tag{1}$$
where $c_i$ represents the $i$-th condition vector, $n$ represents the number of conditions, and $\oplus$ represents serial concatenation. That is, the condition vectors are serially concatenated and thereby transformed into one condition vector, which is used as the condition input of the CTGAN model.
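A minimal sketch of this concatenation, assuming the two conditions used in this paper: a two-class stability label and the 18 load levels (60% to 145% in steps of 5%) reported in Section 4. The helper names `one_hot`, `concat_conditions`, and `condition_vector` are illustrative assumptions, not part of CTGAN.

```python
STABILITY = ["stable", "unstable"]
LOAD_LEVELS = list(range(60, 150, 5))  # 60, 65, ..., 145 (18 levels, in percent)


def one_hot(index, size):
    """Return a one-hot list of the given size with a 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def concat_conditions(*conditions):
    """Serially concatenate several condition vectors into one."""
    merged = []
    for cond in conditions:
        merged.extend(cond)
    return merged


def condition_vector(stability, load_level):
    """Build the single condition vector from both conditions."""
    s = one_hot(STABILITY.index(stability), len(STABILITY))
    l = one_hot(LOAD_LEVELS.index(load_level), len(LOAD_LEVELS))
    return concat_conditions(s, l)


cond = condition_vector("unstable", 100)
print(len(cond))  # 2 + 18 = 20 entries, two of which are 1
```

Because each block of the concatenated vector remains one-hot, the individual conditions can still be read back from their positions.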
2.3 Normalization with GMM
To eliminate the dimensional influence between different characteristics, it is important to transform the samples through appropriate methods before inputting them into the model for training. Transient stability samples are composed of the feature values of bus, load, and generator nodes. However, these continuous values cannot be represented by a one-hot code alone.
Because the distribution of transient stability samples is complex, general min-max normalization is unable to fit it. Therefore, when processing transient stability samples, the variational GMM is used to fit the complex distribution of each continuous feature. The basic steps of the normalization are as follows:
2.3.1 Learning GMM.
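For each continuous column $C_i$, we use a variational Gaussian mixture model to learn a GMM distribution:
$$P_{C_i}(c_{i,j}) = \sum_{k=1}^{m} w_k \, \mathcal{N}(c_{i,j};\, \mu_k, \sigma_k) \tag{2}$$
where $c_{i,j}$ is a value in column $C_i$, $m$ is the number of modes, and $w_k$, $\mu_k$ and $\sigma_k$ are the weight, mean value and standard deviation of the $k$-th mode, respectively.
2.3.2 Calculating probability density.
For each value $c_{i,j}$, we calculate its probability density under each mode, $\rho_k = w_k \, \mathcal{N}(c_{i,j};\, \mu_k, \sigma_k)$.
2.3.3 Normalization.
We find the mode with the highest probability density and normalize the value within that mode. For instance, if the $k$-th of three modes has the highest probability density $\rho_k$, the value $c_{i,j}$ is transformed into a one-hot code indicating mode $k$ and a scalar normalized to $(c_{i,j} - \mu_k)/(4\sigma_k)$, as in CTGAN [xu_modeling_2019].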
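The three steps can be sketched for a single value as follows. This is a minimal illustration, assuming the mode parameters (weights, means, standard deviations) have already been fitted by a variational GMM, and assuming the scalar is scaled by four standard deviations as in the CTGAN paper; the function names and the example parameters are hypothetical.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mode_specific_normalize(x, weights, means, stds):
    """Return (one-hot mode indicator, normalized scalar) for one value."""
    # Step 2: weighted probability density of x under each mode.
    densities = [w * gaussian_pdf(x, m, s)
                 for w, m, s in zip(weights, means, stds)]
    # Step 3: pick the mode with the highest density and normalize within it.
    k = max(range(len(densities)), key=densities.__getitem__)
    indicator = [1 if i == k else 0 for i in range(len(densities))]
    scalar = (x - means[k]) / (4.0 * stds[k])  # 4-sigma scaling (CTGAN convention)
    return indicator, scalar


# Hypothetical fitted modes for a per-unit voltage magnitude column.
weights = [0.5, 0.3, 0.2]
means = [0.95, 1.00, 1.05]
stds = [0.01, 0.01, 0.01]

beta, alpha = mode_specific_normalize(1.045, weights, means, stds)
print(beta, alpha)
```

In this example the value 1.045 lies closest to the third mode, so the indicator selects that mode and the scalar measures the offset from its mean.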
2.4 CTGAN-based Network
We adopt the CTGAN model as the basic sample generation model, which includes a generator and a discriminator. Both the generator and the discriminator are constructed with fully connected layers.
The processed transient stability samples are used as the training input of the constructed CTGAN-based network. In the training process, the discriminator and the generator are trained in turn to obtain the model for the sample generation framework. To test the model, we apply it to the generation task of transient stability samples with labels. We can also control the generating conditions to purposefully generate samples with specific labels, such as controlling the model to generate transient unstable samples.
3 Multi-metric Evaluation
After building the generation framework for power system transient stability samples, it is necessary to evaluate it. This paper designs a multi-metric evaluation for the transient stability sample generation framework. As shown in Fig. 2, the evaluation is composed of the following three metrics: the effect of conditional generation, the distance between real samples and generated synthetic samples, and the performance of assessment models trained with generated samples.
3.1 Conditional Generation
The power system transient stability sample generation framework should be able to control the transient stability and the load level characteristics of power system samples during generation. By comparing the proportions of transient stability samples generated under different settings (without setting conditions, setting the condition as transient stable, and setting the condition as transient unstable), the conditional generation ability for transient stability can be evaluated. The same is true for the evaluation of the load level condition.
3.2 Distance Calculation
Without setting the generating conditions, the generated samples should be as similar as possible to the real samples. Therefore, the similarity or distance between the two distributions is an effective metric for evaluating the generation framework.
First, dimensionality reduction methods, such as Principal Component Analysis (PCA) [yata_principal_2015], should be used to reduce the dimension of the samples to an appropriate degree. Second, we convert the dimensionality-reduced samples into discrete probability distributions through a binning operation. Finally, we calculate the distance between the probability distributions of synthetic samples and real samples. Common measures of the similarity between two distributions are adopted for this purpose, such as KL divergence, JS divergence, and Wasserstein distance [arjovsky_wasserstein_2017].
3.3 Performance of Assessment Models
To evaluate the generated samples more practically, the performance of the transient stability assessment model trained with generated samples is a proper metric. Some classical classification models are selected as the power system transient stability assessment model. The generated samples are used to train the assessment models, and the performance is obtained by testing the assessment models.
More specifically, the real dataset of transient stability samples is randomly divided into a training set $D_{train}$ and a test set $D_{test}$. We randomly generate a synthetic set $D_{gen}$ and form the united set $D_{train} \cup D_{gen}$. Then $D_{train}$, $D_{gen}$, and $D_{train} \cup D_{gen}$ are each used to train the transient stability assessment models, and the resulting models are tested on the real test set $D_{test}$ to obtain the accuracy, the recall rate of transient stable samples, and the recall rate of transient unstable samples. These test scores serve as the evaluation metric for the quality of the generation framework.
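The test scores named above can be computed from true and predicted labels as in the following minimal sketch. The label encoding (1 for stable, 0 for unstable), the function name `evaluate`, and the toy labels are assumptions made for illustration only.

```python
def evaluate(y_true, y_pred, stable=1, unstable=0):
    """Return accuracy and per-class recall for a binary stability classifier."""

    def recall(label):
        # Fraction of samples of this class that the model labels correctly.
        total = sum(1 for t in y_true if t == label)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        return hits / total if total else float("nan")

    accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
    return {
        "accuracy": accuracy,
        "recall_stable": recall(stable),
        "recall_unstable": recall(unstable),
    }


# Toy labels: four stable and two unstable test samples.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

Reporting recall separately for each class matters here because, with imbalanced data, a high overall accuracy can hide poor recall on the scarce unstable class.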
4 Experiment
In this section, we study our proposed framework on the classical IEEE 39-bus power system [pai_energy_1989] and demonstrate its performance by evaluating the effect of conditional control, calculating the distance between distributions, and scoring assessment models trained with generated samples.
4.1 Experimental Setup
4.1.1 Time Domain Simulation Samples.
MATPOWER [zimmerman_matpowermatlab_1997] and the Power System Analysis Toolbox (PSAT) [ayasun_voltage_2006] are used to obtain the original dataset of real samples, taking the IEEE 39-bus system as the basic system. The power system contains 39 buses, 10 generators, 19 loads and 46 transmission lines. To simulate the transient stability samples, we adopt the following principles:

Randomly changing both the active and reactive power of all loads from 60% to 145% of the basic load level.

Using MATPOWER to compute the optimal power flow for the subsequent TDS.

Randomly selecting a fault line, setting a three-phase grounding fault at 20% to 80% of the line, and clearing it after a time from 1/60 to 1/3 seconds.

Using PSAT to run the time domain simulation for 10 seconds.

Labeling the stability of each generated sample by the values of generators after TDS.
By performing the simulation operations above, we generate a total of 14,221 transient stability samples, comprising 11,510 stable samples and 2,711 unstable samples, as the original dataset.
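The random part of the principles above can be sketched as a scenario sampler. This is only an illustration under stated assumptions: the load level is drawn from the 18 discrete levels (60% to 145% in steps of 5%) and applied to all loads at once, the 20% to 80% range is read as the fault position along the line, and all names are hypothetical. The power flow and time domain simulation themselves (MATPOWER and PSAT) are not reproduced here.

```python
import random

N_LINES = 46                                # transmission lines in the IEEE 39-bus case
LOAD_LEVELS_PCT = list(range(60, 150, 5))   # assumed discrete load levels, in percent


def sample_scenario(rng):
    """Draw the random settings for one time domain simulation run."""
    return {
        "load_level_pct": rng.choice(LOAD_LEVELS_PCT),   # scaling of all loads
        "fault_line": rng.randrange(N_LINES),            # index of the faulted line
        "fault_position": rng.uniform(0.20, 0.80),       # fraction of line length
        "clearing_time_s": rng.uniform(1 / 60, 1 / 3),   # fault duration in seconds
    }


rng = random.Random(0)  # fixed seed so the sketch is reproducible
scenarios = [sample_scenario(rng) for _ in range(5)]
for s in scenarios:
    print(s)
```

Each sampled scenario would then be passed to the power flow and simulation tools, and the simulated trajectory would be labeled as stable or unstable.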
4.1.2 Generation Model Training.
CTGAN is used as the primary sample generation model, which includes a generator and a discriminator. In the generator, two fully connected layers are used, each followed by a batch normalization layer and a ReLU activation layer. The tanh and softmax activation functions are used for the output layer. In the discriminator, two fully connected layers are used, together with a dropout layer that randomly drops nodes to reduce overfitting.
4.2 Evaluation Metrics
The CTGAN-based generation framework for power system transient stability samples is trained with the simulated samples as the training set. After that, it is necessary to evaluate the quality of the generation framework. This paper designs a multi-metric evaluation for the generation framework of transient stability samples, composed of three evaluation metrics.
4.2.1 The Effect of Conditional Generation.
We evaluate the ability to control the transient stability and the load level of power system samples during generation.
Table 1 shows the result of conditional generation with different transient stability condition settings: no condition, stable, and unstable. When the condition is set to stable, the proportion of stable samples generated rises from 59.92% to 71.10%, a relative increase of 18.7% over the samples generated without a condition. When the condition is set to unstable, the proportion of unstable samples rises from 40.08% to 59.62%, a relative increase of 48.8%. The result shows that the transient stability ratio of the generated samples can be effectively controlled, and that the framework can balance the transient stability samples by generating more unstable samples.
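Table 1. Conditional generation results under different transient stability conditions.

| Condition | Stable proportion (%) | Unstable proportion (%) |
|---|---|---|
| Without condition | 59.92 | 40.08 |
| With condition (stable) | 71.10 | 28.90 |
| With condition (unstable) | 40.38 | 59.62 |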
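The increments quoted for Table 1 are relative changes with respect to unconditioned generation. A short check of that arithmetic, using only the proportions in the table (the helper name `relative_increase` is an illustrative assumption):

```python
def relative_increase(baseline, value):
    """Relative change of `value` with respect to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


# Proportions taken from Table 1 (in percent).
stable_gain = relative_increase(59.92, 71.10)     # stable share, stable condition
unstable_gain = relative_increase(40.08, 59.62)   # unstable share, unstable condition

print(round(stable_gain, 1), round(unstable_gain, 1))  # 18.7 48.8
```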
Moreover, the result of conditional generation with different load level condition settings is shown in Table 2. We set the conditions as no condition and as each of 18 load levels (60% to 145%, with a step of 5%). We count the number of samples of the corresponding load level in the samples generated under each condition and calculate the proportion for comparison. When the condition is set to a specific load level, the proportion of generated samples at that load level is higher than in the samples generated without a condition. The results show that the generation framework can effectively control the load level proportion of the generated samples.





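Table 2. Conditional generation results under different load level conditions.

| Load level | Proportion without condition (%) | Proportion with condition (%) | Increment (%) |
|---|---|---|---|
| 70% | 2.49 | 3.35 | 34.54 |
| 80% | 3.89 | 4.64 | 19.32 |
| 90% | 2.70 | 4.92 | 81.90 |
| 100% | 3.38 | 4.59 | 36.09 |
| 110% | 5.81 | 8.86 | 52.60 |
| 120% | 2.51 | 2.58 | 2.67 |
| 130% | 2.50 | 4.60 | 84.53 |
| 140% | 0.54 | 1.81 | 233.76 |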
4.2.2 The Distance between Real and Generated Sample Distribution.
The generated samples should be as similar as possible to the real samples. Therefore, the distance between the two distributions is an effective metric for evaluating the generation framework.
Table 3 shows the JS divergence and Wasserstein distance calculated between distributions. We randomly select 2,000 samples from the real samples as real set 1 ($R_1$), repeat the operation to get real set 2 ($R_2$), and generate 2,000 samples as the generated set ($G$). From Table 3, we can see that the distance between $R_1$ and $R_2$ and the distance between either real set and $G$ are in the same order of magnitude, which means that the samples generated by the generation framework are similar to the real samples under these distance measures.




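Table 3. Distances between sample distributions.

| Sample sets | JS divergence | Wasserstein distance |
|---|---|---|
| Real set 1, real set 2 | 0.002826 | 0.001429 |
| Real set 1, generated set | 0.063141 | 0.006388 |
| Real set 2, generated set | 0.063084 | 0.005939 |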
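The binning and distance steps behind such a table can be sketched as follows. This is a minimal illustration, assuming the samples have already been reduced to a single dimension (for example the first principal component), that both sets share the same bin edges, and that natural logarithms are used; the bin count, the value range, and the toy data are arbitrary assumptions, not the settings used in the paper.

```python
import math


def histogram(values, lo, hi, bins):
    """Convert 1-D values into a discrete probability distribution."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # put v == hi in the last bin
        counts[idx] += 1
    return [c / len(values) for c in counts]


def kl_divergence(p, q):
    """KL divergence; bins with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log, so bounded by ln 2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def wasserstein_1d(p, q, width):
    """1-D Wasserstein distance between histograms on the same equal-width bins."""
    cdf_gap, total = 0.0, 0.0
    for pi, qi in zip(p, q):
        cdf_gap += pi - qi          # difference of the cumulative distributions
        total += abs(cdf_gap) * width
    return total


# Toy 1-D projections of real and generated samples.
real = [0.10, 0.20, 0.40, 0.80, 0.90]
generated = [0.15, 0.25, 0.35, 0.70, 0.95]
lo, hi, bins = 0.0, 1.0, 5

p = histogram(real, lo, hi, bins)
q = histogram(generated, lo, hi, bins)
js = js_divergence(p, q)
wd = wasserstein_1d(p, q, (hi - lo) / bins)
print(js, wd)
```

Comparing two independent draws of real samples, as in the first row of Table 3, gives a baseline for how small these distances are when both sets come from the same distribution.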
4.2.3 The Performance of Assessment Models Trained with Generated Samples.
The performance of assessment models trained with generated samples is a valuable metric. In this paper, we select the Multilayer Perceptron (MLP) and the Decision Tree (DT) as the power system transient stability assessment models for training and testing, since they are classical models for classification problems. The hidden layer size of the MLP is 200, and the maximum number of iterations is 500. The maximum depth of the DT is 100.
Table 4 shows the test results of assessment models trained with different datasets. We randomly divide the real dataset into a training set $D_{train}$ with 8,533 samples and a test set $D_{test}$ with 5,688 samples. We randomly generate a set $D_{gen}$ with 8,533 samples and form the united set $D_{train} \cup D_{gen}$. Then $D_{train}$, $D_{gen}$, and $D_{train} \cup D_{gen}$ are each used to train the transient stability assessment models, and the resulting models are tested on $D_{test}$. The scores of the models trained with $D_{gen}$ are lower than those of the models trained with $D_{train}$. However, the models trained with $D_{train} \cup D_{gen}$ score higher than those trained with $D_{train}$ on most metrics. In relative terms, the recall rate of unstable samples increases by 1.48% for DT (from 0.9348 to 0.9486) and by 2.74% for MLP (from 0.9261 to 0.9515). The results show that adding the generated samples to the training set improves the performance of transient stability assessment models, especially for the unstable label, which is the scarce class in the training set.







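Table 4. Test results of assessment models trained with different datasets.

| Model | Training set | Recall (stable) | Recall (unstable) | F1 score | Accuracy |
|---|---|---|---|---|---|
| DT | Real | 0.9770 | 0.9348 | 0.9813 | 0.9694 |
| DT | Generated | 0.9430 | 0.6856 | 0.9376 | 0.8969 |
| DT | Real + generated | 0.9788 | 0.9486 | 0.9837 | 0.9734 |
| MLP | Real | 0.9883 | 0.9261 | 0.9861 | 0.9771 |
| MLP | Generated | 0.7719 | 0.8061 | 0.8509 | 0.7780 |
| MLP | Real + generated | 0.9832 | 0.9515 | 0.9863 | 0.9775 |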
5 Conclusion
In this paper, we attempt to solve the problems of imbalanced distribution and insufficient samples in research on power system transient stability assessment. We propose a CTGAN-based controllable sample generation framework for transient stability. In the generation framework, the transient stability samples are first processed into tabular data. Then the transient stability and the load level are converted into the conditional vector, and the variational Gaussian mixture model is used to fit and normalize the tabular data. Finally, the CTGAN model is trained with the processed samples. Moreover, we design a multi-metric evaluation to assess the generation framework from three aspects: the effect of conditional generation, the distance between the real and generated sample distributions, and the performance of the assessment model trained with generated samples. Experiments demonstrate that samples generated through the proposed generation framework are valid and effective under multiple metrics.
5.0.1 Acknowledgement.
This work is funded by the National Key Research and Development Project (Grant No: 2018AAA0101503) and the State Grid Corporation of China Scientific and Technology Project: Fundamental Theory of Human-in-the-loop Hybrid-Augmented Intelligence for Power Grid Dispatch and Control.