I. Introduction
The “deep learning revolution”, largely ignited by the October 2012 ImageNet victory [1], has transformed various industries, including artificial intelligence, health care, online advertising, transportation, and robotics. As the most widely used and mature model in deep learning, the Deep Neural Network (DNN) [2] demonstrates superb performance in complex engineering tasks such as recommendation [3], bioinformatics [4], mastering difficult games like Go [5], and human pose estimation [6]. The capability of approximating continuous mappings and the desirable scalability make DNNs a favorable choice in the arsenal for solving large-scale optimization and decision problems in engineering systems. In this paper, we apply DNNs to power systems for solving the essential security-constrained direct current optimal power flow (SC-DCOPF) problem in power system operation.

The OPF problem, first posed by Carpentier in 1962 [7], is to minimize an objective function, such as the cost of power generation, subject to all physical, operational, and technical constraints, by optimizing the dispatch and transmission decisions. These constraints include Kirchhoff’s laws, operating limits of generators, voltage levels, and loading limits of transmission lines [8]. The OPF problem is central to power system operations as it underpins various applications, including economic dispatch, unit commitment, stability and reliability assessment, and demand response. While OPF with a full AC power flow formulation (AC-OPF) is most accurate, it is a non-convex problem whose complexity obscures practicability. Meanwhile, based on linearized power flows, DC-OPF is a convex problem admitting a wide variety of applications, including electricity market clearing and power transmission management. See, e.g., [9, 10] for a survey.
The SC-DCOPF problem, a variant of DC-OPF, is critical for reliable power system operation against contingencies caused by equipment failure [11]. It considers not only constraints under normal operation, but also additional steady-state security constraints for each possible contingency [12].^1 While SC-DCOPF is important for reliable power system operation, solving it incurs excessive computational complexity, limiting its applicability in large-scale power networks [13].

^1 There are two types of SC-DCOPF problems, namely the preventive SC-DCOPF problem and the corrective SC-DCOPF problem. In the preventive SC-DCOPF problem, the system operating decisions cannot change once they are determined; thus they need to guarantee feasibility under both the pre- and post-contingency constraints. In the corrective SC-DCOPF problem, the system operator has a short time (e.g., 5 minutes) [12] to adjust the operating points after the occurrence of each contingency. Our DeepOPF approach is applicable to both problems. We focus on the preventive SC-DCOPF problem in this paper for ease of illustration.
To this end, we propose a machine-learning approach for solving the SC-DCOPF problem efficiently. Our approach is inspired by the following observations.

Given a power network, solving the SC-DCOPF problem is equivalent to depicting a high-dimensional mapping between the load inputs and the generation and voltage outputs.

In practice, the SC-DCOPF problem is usually solved repeatedly for the same power network, e.g., every 5 minutes, with different load inputs at different time epochs.
As such, it is conceivable to leverage the universal approximation capability of deep feed-forward neural networks [14, 15] to learn the input-to-output mapping for a given power network, and then apply the mapping to obtain operating decisions upon given load inputs (e.g., once every 5 minutes).

Specifically, we develop DeepOPF as a DNN-based solution for the SC-DCOPF problem. As compared to conventional approaches based on interior-point methods [16], DeepOPF excels in (i) reducing the computing time and (ii) scaling well with the problem size. These salient features are particularly appealing for solving (large-scale) SC-DCOPF problems, which are central to secure power system operation with contingencies in consideration. Note that the complexity of constructing and training a DNN model is minor when amortized over the many problem instances (e.g., one per every 5 minutes) that can be solved using the same model. In more detail, our contributions are summarized as follows.
First, after reviewing the SC-DCOPF problem in Sec. III, we describe DeepOPF as a DNN framework for solving the SC-DCOPF problem in Sec. IV. In DeepOPF, we first construct and train a DNN to learn the mapping between the load inputs and the generations. We then directly compute the phase angles from the generations and loads by using the (linearized) power flow equations. Such a two-step procedure significantly reduces the dimension of the mapping to learn, subsequently cutting down the size of our DNN and the amount of training data/time needed. We also design a post-processing procedure to ensure the feasibility of the final solution.
Then, in Sec. V, we derive a condition suggesting that the approximation error of the neural network in DeepOPF decreases exponentially in the number of layers and polynomially in the number of neurons per layer. This allows us to systematically tune the size of the neural network in DeepOPF according to a pre-specified performance guarantee. We also derive the computational complexity of DeepOPF.

Finally, we carry out simulations and summarize the results in Sec. VI. Simulation results for IEEE test cases show that DeepOPF always generates feasible solutions with negligible optimality loss, while speeding up the computing time by up to 400x as compared to a state-of-the-art solver. The results also highlight a tradeoff between the prediction accuracy and the running time of DeepOPF.
Due to the space limitation, all proofs are in the supplementary material.
II. Related Work
Existing studies on solving SC-OPF mainly follow three lines of approaches. The first is numerical iteration algorithms, where the SC-OPF problem is first approximated as an optimization problem, e.g., a quadratic program [17] or a linear program [18], and numerical iteration solvers like interior-point methods [19] are applied to obtain the optimal solutions. However, the time complexity of these numerical-iteration-based algorithms can be substantial for large-scale power systems due to the excessive number of constraints corresponding to different contingencies. See [12] for a survey on numerical iteration algorithms for SC-OPF.

The second is heuristic algorithms based on computational intelligence techniques, including evolutionary programming like swarm optimization. For instance, a particle swarm optimization method with reconstruction operators was proposed in [20] for solving the SC-OPF problem, where the reconstruction operators and an external penalty are adopted to handle the constraints and improve the quality of the final solution. However, there are two drawbacks to this kind of method. First, there is no performance guarantee on either optimality or feasibility. Second, the methods may still incur high computational complexity.

The third is learning-based methods. Existing studies focus on integrating learning techniques (e.g., neural networks, decision trees) into conventional algorithms to facilitate the process of solving SC-OPF problems. For instance, [21] applies a neural network to learn the system security boundaries as an explicit function to be used in the OPF formulation. In [22, 23], decision trees are used to derive tractable rules from large data sets of operating points, which can efficiently represent the feasible region and identify possible solutions. However, the proposed heuristic schemes are still iteration based and may still incur a significant amount of running time for large-scale instances. Recently, there have been works on learning the active constraint set so as to reduce the size of the OPF problems to solve [24, 25]. Determining the active constraint set, however, is highly non-trivial for SC-OPF problems. With an incorrect active constraint set, the approach may generate infeasible solutions, and it is not clear how to derive a feasible solution at the end. In addition, [26] proposes neural-network/decision-tree based methods to directly obtain a solution for AC-OPF problems, but these methods cannot guarantee the feasibility of the solutions.
Different from existing studies, our DeepOPF uses neural networks to learn the mapping between the load inputs and the generation and voltage outputs, so as to directly obtain solutions for the SC-DCOPF problem with a feasibility guarantee. As compared to our previous effort in [27], this paper studies the more challenging SC-DCOPF problem and, more importantly, characterizes a useful condition that allows us to design the neural network according to a pre-specified performance guarantee on the obtained solution.
III. Security-Constrained DC-OPF Problem
We study the widely-studied SC-DCOPF problem, which considers contingencies due to the outage of any single transmission line in the power system. The objective is to minimize the total generation cost subject to the generator operation limits, the power balance equations, and the transmission line capacity constraints under all contingencies [28]. Assuming the power network remains connected upon contingency, the SC-DCOPF problem is formulated as follows:^2

^2 We note that there is another formulation involving only the generations, because the phase angles can be uniquely determined by the generations and loads; see, e.g., [29]. As the complexity of solving either formulation is similar [30], we focus on the standard formulation.
$$\min_{P_G,\,\theta^0,\ldots,\theta^C} \;\; \sum_{i=1}^{G} C_i\left(P_{Gi}\right)$$

subject to

$$P_{Gi}^{\min} \leq P_{Gi} \leq P_{Gi}^{\max}, \quad i = 1, \ldots, G, \qquad (1)$$

$$B^c \theta^c = P_G - P_D, \quad c = 0, 1, \ldots, C,$$

$$\left| \frac{\theta_i^c - \theta_j^c}{x_{ij}} \right| \leq P_{ij}^{\max}, \quad \text{for all lines } (i, j), \; c = 0, 1, \ldots, C.$$

Here $N$ is the number of buses, $G$ is the number of generators, and $C$ is the number of contingency cases ($c = 0$ denotes the case without any contingency). $P_{Gi}$ is the generator output, $P_{Gi}^{\min}$ and $P_{Gi}^{\max}$ are the generation limits at the $i$-th bus, and $P_{Di}$ is the load input. $\theta^c$ is the phase-angle vector under the $c$-th contingency and $x_{ij}$ is the reactance of the transmission line between the $i$-th bus and the $j$-th bus. $B^c$ is the admittance matrix for the $c$-th contingency, which is an $N \times N$ matrix with entries

$$B_{ij}^c = \begin{cases} -\dfrac{1}{x_{ij}}, & i \neq j \text{ and line } (i, j) \text{ is in service under contingency } c, \\[4pt] 0, & i \neq j \text{ and buses } i, j \text{ are not connected under contingency } c, \\[4pt] -\sum_{k \neq i} B_{ik}^c, & i = j. \end{cases} \qquad (2)$$

In the above SC-DCOPF formulation, the first set of constraints describes the generation limits. The second set of constraints consists of the power flow equations with contingencies taken into account. The third set of constraints captures the line transmission capacity for both pre-contingency and post-contingency cases. In the objective, $C_i(\cdot)$ is the cost function of the generator at the $i$-th bus, commonly modeled as a quadratic function [31]:

$$C_i\left(P_{Gi}\right) = \lambda_{2i} P_{Gi}^2 + \lambda_{1i} P_{Gi} + \lambda_{0i}, \qquad (3)$$

where $\lambda_{2i}$, $\lambda_{1i}$, and $\lambda_{0i}$ are the model parameters, which can be obtained from measured data of the heat rate curve [28].
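To make the role of the admittance matrices in (2) concrete, the following is a minimal NumPy sketch of constructing $B^c$ from a line list; the 3-bus network and reactance values are illustrative, and a line-outage contingency is modeled simply by omitting the corresponding line.

```python
import numpy as np

def admittance_matrix(n_bus, lines):
    """Build the DC admittance matrix B from line reactances.

    `lines` is a list of (i, j, x_ij) tuples for in-service lines;
    dropping a tuple models a single-line-outage contingency.
    Off-diagonal entries are -1/x_ij; diagonal entries make rows sum to zero.
    """
    B = np.zeros((n_bus, n_bus))
    for i, j, x in lines:
        B[i, j] -= 1.0 / x
        B[j, i] -= 1.0 / x
        B[i, i] += 1.0 / x
        B[j, j] += 1.0 / x
    return B

# Toy 3-bus network: lines (0,1), (1,2), (0,2), all with reactance 0.1.
lines = [(0, 1, 0.1), (1, 2, 0.1), (0, 2, 0.1)]
B = admittance_matrix(3, lines)
# Rows of B sum to zero, as required for a DC admittance matrix.
print(np.allclose(B.sum(axis=1), 0))  # True
```

Each contingency case $c$ simply yields its own matrix from the reduced line list, which is then used in the power flow equations above.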
While the SC-DCOPF problem is important for reliable power system operation, and it is a convex (quadratic) problem with efficient solvers, solving it for large-scale power networks in practice still incurs excessive running time, limiting its practicability [13]. In what follows, we address this issue by proposing a neural network approach that solves the SC-DCOPF problem in a fraction of the time used by existing solvers.
IV. DeepOPF for Solving SC-DCOPF
IV-A. A Neural-Network Framework for OPF
We outline a general predict-and-reconstruct framework for solving OPF in Fig. 1. Specifically, we exploit the dependency induced by the equality constraints among the decision variables in the OPF formulation. Given the load inputs, the learning model (e.g., a DNN) is applied only to predict a set of independent variables. The remaining variables are then determined by leveraging the (power flow) equality constraints. This way, we not only reduce the number of variables to be predicted, but also ensure that the equality constraints are satisfied, which is usually difficult in generic learning-based approaches. In this paper, we materialize the general framework to develop DeepOPF for solving the SC-DCOPF problem and obtain strong theoretical and empirical results.
IV-B. Overview of DeepOPF
The framework of DeepOPF is shown in Fig. 2; it is divided into a training stage and an inference stage. We first construct and train a DNN to learn the mapping between the load inputs and the generations. We then directly compute the phase angles from the generations and loads by using the (linearized) power flow equations.
We discuss the process of constructing and training the DNN model in the following subsections. In particular, we discuss the preparation of the training data in Sec. IV-C, the variable prediction and reconstruction in Sec. IV-D, and the design and training of the DNN in Sec. IV-E.
In the inference stage, we directly apply DeepOPF to solve the SC-DCOPF problem with given load inputs. This is different from recent learning-based approaches for solving OPF, where machine learning only helps to facilitate existing solvers, e.g., by identifying the active constraints [24]. We describe a post-processing step to ensure the feasibility of the obtained solutions in Sec. IV-F.
IV-C. Load Sampling and Preprocessing
We sample the load of each bus within $[(1 - \Delta) \cdot P_{Di}^0, (1 + \Delta) \cdot P_{Di}^0]$ uniformly at random, where $P_{Di}^0$ is the default power load at the $i$-th bus and $\Delta$ is the percentage of the sample range, e.g., 10%. The sampled loads are then fed into a traditional quadratic programming solver [32] to generate the optimal solutions. Uniform sampling is applied to avoid the overfitting issue common in generic DNN approaches.^3

^3 For load inputs of large dimension, a uniform mechanism may not be sufficient to guarantee enough good samples, especially near the boundary. In those cases, Markov chain Monte Carlo (MCMC) methods can be applied to sample according to a preset probability distribution, so as to collect sufficient samples near the boundary of the sampling space and obtain a dense sample set around the significant elements of the load vector.
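The uniform load sampling step can be sketched as follows; the default loads, the 10% range, and the sample count are illustrative values, not ones prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_loads(p_default, delta, n_samples):
    """Draw load vectors uniformly in [(1-delta)*Pd, (1+delta)*Pd],
    independently for each bus, as in the training-data generation step."""
    p_default = np.asarray(p_default, dtype=float)
    low = (1.0 - delta) * p_default
    high = (1.0 + delta) * p_default
    return rng.uniform(low, high, size=(n_samples, p_default.size))

# Three buses with default loads 50, 30, and 0 MW; a zero-load bus stays at zero.
samples = sample_loads([50.0, 30.0, 0.0], delta=0.1, n_samples=1000)
print(samples.shape)  # (1000, 3)
```

Each sampled load vector would then be solved by the quadratic programming solver to produce one (input, optimal solution) training pair.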
After that, the training data is normalized (using the statistical mean and standard deviation) to improve training efficiency.

IV-D. Generation Prediction and Phase Angle Reconstruction
We express $P_{Gi}$ as follows, for $i = 1, \ldots, G$:

$$P_{Gi} = P_{Gi}^{\min} + \alpha_i \cdot \left(P_{Gi}^{\max} - P_{Gi}^{\min}\right), \quad \alpha_i \in [0, 1], \qquad (4)$$

where $\alpha_i$ is a scaling factor. Instead of predicting the generations with diverse value ranges, we predict the scaling factors $\alpha_i$ and recover $P_{Gi}$. This simplifies the design of the DNN output layer, to be discussed later. Note that the generation of the slack bus is obtained by subtracting the generations of the other buses from the total load.
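The per-generator normalization of Eq. (4) and its inverse can be sketched as two small helpers; the generation limits below are illustrative numbers.

```python
import numpy as np

def to_scaling_factor(p_g, p_min, p_max):
    """Map generations onto [0, 1] scaling factors (the DNN targets)."""
    return (p_g - p_min) / (p_max - p_min)

def from_scaling_factor(alpha, p_min, p_max):
    """Recover generations from predicted scaling factors, per Eq. (4)."""
    return p_min + alpha * (p_max - p_min)

# Two generators with limits [10, 100] and [0, 50] MW.
p_min, p_max = np.array([10.0, 0.0]), np.array([100.0, 50.0])
alpha = to_scaling_factor(np.array([55.0, 25.0]), p_min, p_max)
print(alpha)  # [0.5 0.5]
```

Training the DNN on $\alpha_i$ rather than raw $P_{Gi}$ keeps every output in the same $[0, 1]$ range, which matches the Sigmoid output layer used later.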
Once we obtain $P_G$, we directly compute the phase angles using a useful property of the admittance matrices [33, 34]. We first obtain an $(N-1) \times (N-1)$ matrix $\tilde{B}^c$ by eliminating the row and column corresponding to the slack bus from the admittance matrix $B^c$, for each contingency $c = 0, 1, \ldots, C$. It is well understood that $\tilde{B}^c$ is a full-rank matrix [28], [35]. Then we compute the $(N-1)$-dimensional phase-angle vector as

$$\tilde{\theta}^c = \left(\tilde{B}^c\right)^{-1} \left(\tilde{P}_G - \tilde{P}_D\right), \qquad (5)$$

where $\tilde{P}_G$ and $\tilde{P}_D$ stand for the $(N-1)$-dimensional generation and load vectors for the buses excluding the slack bus, respectively. At the end, we output the $N$-dimensional phase-angle vector $\theta^c$ by inserting a constant representing the phase angle of the slack bus into $\tilde{\theta}^c$.
Again, there are two advantages of this approach. On the one hand, we use the property of the admittance matrix to reduce the number of variables to be predicted by our neural network, cutting down the size of our DNN model and the amount of training data/time needed. On the other hand, the equality constraints involving the generations and the phase angles are satisfied automatically, which is difficult to achieve in alternative learning-based approaches.
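The reconstruction step of Eq. (5) can be sketched as follows; the 3-bus chain network, reactances, and injections are illustrative, and the slack-bus angle is fixed at zero for concreteness.

```python
import numpy as np

def reconstruct_angles(B, p_g, p_d, slack=0):
    """Recover bus phase angles from net injections via Eq. (5):
    delete the slack-bus row/column, solve the reduced full-rank
    system, and re-insert a zero angle for the slack bus."""
    n = B.shape[0]
    keep = [i for i in range(n) if i != slack]
    B_red = B[np.ix_(keep, keep)]             # (N-1) x (N-1), full rank
    theta_red = np.linalg.solve(B_red, (p_g - p_d)[keep])
    return np.insert(theta_red, slack, 0.0)   # slack angle fixed at 0

# Toy 3-bus chain: lines (0,1) and (1,2), both with reactance 0.1.
B = np.array([[ 10.0, -10.0,   0.0],
              [-10.0,  20.0, -10.0],
              [  0.0, -10.0,  10.0]])
theta = reconstruct_angles(B, p_g=np.array([1.0, 0.0, 0.0]),
                           p_d=np.array([0.0, 0.0, 1.0]))
# The full power flow equation B @ theta = P_G - P_D holds automatically.
print(np.allclose(B @ theta, [1.0, 0.0, -1.0]))  # True
```

Because the angles are computed exactly from the predicted generations, the power-balance equality constraints need no learning at all.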
IV-E. The DNN Model
The core of DeepOPF is the DNN model, which is applied to approximate the load-to-generation mapping for a given power network. The DNN model is based on the multi-layer feed-forward neural network structure, which consists of a typical architecture: one input layer, several hidden layers, and one output layer. More specifically, the applied DNN model is defined as

$$h_0 = x, \qquad h_i = \sigma\left(W_i h_{i-1} + b_i\right), \; i = 1, \ldots, N_{\text{hid}}, \qquad \hat{\alpha} = \sigma'\left(W_o h_{N_{\text{hid}}} + b_o\right),$$

where $x$ denotes the input vector of the network, $h_i$ is the output vector of the $i$-th hidden layer, and $\hat{\alpha}$ is the output vector (of the output layer), i.e., the generated scaling-factor vector for the generators. The weight matrices $W_i$, $W_o$, the bias vectors $b_i$, $b_o$, and the activation functions $\sigma$ and $\sigma'$ are subject to design.

IV-E1. The architecture
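To make the layered mapping concrete, here is a minimal NumPy sketch of the forward pass with ReLU hidden layers and a Sigmoid output layer; the hidden-layer widths follow the 32/16/8 Case30 configuration in Table I, while the input dimension and the random weights are purely illustrative.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """One forward pass: ReLU hidden layers, then a Sigmoid output
    layer so the predicted scaling factors land in [0, 1]."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(W @ h + b)
    return sigmoid(weights[-1] @ h + biases[-1])

rng = np.random.default_rng(0)
dims = [20, 32, 16, 8, 6]  # 20 load inputs -> 32/16/8 hidden -> 6 generators
weights = [rng.standard_normal((dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
biases = [np.zeros(d) for d in dims[1:]]
alpha_hat = forward(rng.standard_normal(20), weights, biases)
print(alpha_hat.shape)  # (6,)
```

In practice the same structure is built and trained with PyTorch (Sec. VI); this sketch only illustrates the computation each layer performs.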
In the DNN model, $x$ represents the normalized load data, which is the input of the network. Features are then learned from the input vector by several fully connected hidden layers. The $i$-th hidden layer models the interactions between features by introducing a connection weight matrix $W_i$ and a bias vector $b_i$. The activation function $\sigma$ further introduces non-linearity into the hidden layers. In our DNN model, we adopt the widely used Rectified Linear Unit (ReLU), $\sigma(z) = \max(z, 0)$, as the activation function of the hidden layers, which helps accelerate the convergence and alleviate the vanishing-gradient problem [1]. In addition, the Sigmoid function [2], $\sigma'(z) = 1/(1 + e^{-z})$, is applied at the output layer to project the outputs of the network onto $(0, 1)$.

IV-E2. The loss function
After constructing the DNN model, we need to design a corresponding loss function to guide the training. Since there exists a linear correspondence between the phase angles and the generations, there is no need to introduce a loss term for the phase angles. The difference between the generated solution and the actual solution is expressed by the mean square error between the generated scaling factors $\hat{\alpha}_i$ and the optimal scaling factors $\alpha_i$:

$$\mathcal{L}_{\alpha} = \frac{1}{G} \sum_{i=1}^{G} \left(\hat{\alpha}_i - \alpha_i\right)^2, \qquad (6)$$

where $G$ represents the number of generators.
Meanwhile, we introduce a penalty term related to the inequality constraints into the loss function. We first introduce an $M \times N$ matrix $A^c$ for each contingency $c$, where $M$ is the number of adjacent bus pairs (i.e., transmission lines). Each row in $A^c$ corresponds to an adjacent bus pair. Given an adjacent bus pair $(i, j)$ under the $c$-th contingency, let the power flow from the $i$-th bus to the $j$-th bus. The corresponding entries of the row of $A^c$, in the $i$-th and $j$-th columns, are defined as

$$\frac{1}{x_{ij}} \quad \text{and} \quad -\frac{1}{x_{ij}}, \qquad (7)$$

respectively, with all other entries of the row being zero. Based on (5) and (7), the capacity constraints for the transmission lines in (1) can be expressed as

$$\left|\left(A^c \hat{\theta}^c\right)_k\right| \leq P_k^{\max}, \quad k = 1, \ldots, M, \qquad (8)$$

where $(A^c \hat{\theta}^c)_k$ represents the $k$-th element of $A^c \hat{\theta}^c$. Note that $\hat{\theta}^c$ is the phase-angle vector generated based on (5) and the discussion below it, and it is computed from the predicted generations and the loads. We can then calculate the penalty value for each line and add the average penalty value into the loss function for training. The penalty term capturing the feasibility of the generated solutions is defined as

$$\mathcal{L}_{\text{pen}} = \frac{1}{M} \sum_{k=1}^{M} \max\left(\left|\left(A^c \hat{\theta}^c\right)_k\right| - P_k^{\max}, \; 0\right)^2. \qquad (9)$$
Thus, for each item in the training data set, the loss function consists of two parts: the difference between the generated solution and the reference solution, and the penalty for solutions being infeasible. The total loss is expressed as a weighted sum of the two parts:

$$\mathcal{L} = w_1 \mathcal{L}_{\alpha} + w_2 \mathcal{L}_{\text{pen}}, \qquad (10)$$

where $w_1$ and $w_2$ are positive weighting factors balancing the influence of each term in the training phase.
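The two-part loss can be sketched as below. The exact functional form of the penalty in the paper's Eq. (9) is not fully recoverable here, so this sketch uses a plausible squared-hinge penalty on line-flow violations; the numerical inputs are illustrative.

```python
import numpy as np

def total_loss(alpha_pred, alpha_opt, flows, flow_max, w1=1.0, w2=1.0):
    """Training loss (10): mean-squared error on scaling factors (6)
    plus a penalty (9) on transmission-line flows exceeding limits."""
    mse = np.mean((alpha_pred - alpha_opt) ** 2)
    violation = np.maximum(np.abs(flows) - flow_max, 0.0)  # zero when feasible
    penalty = np.mean(violation ** 2)
    return w1 * mse + w2 * penalty

# One line within its 100 MW limit, one line overloaded by 20 MW.
loss = total_loss(np.array([0.5, 0.6]), np.array([0.5, 0.5]),
                  flows=np.array([90.0, 120.0]),
                  flow_max=np.array([100.0, 100.0]))
print(round(loss, 3))  # 200.005
```

The penalty term vanishes for feasible predictions, so near the optimum the loss reduces to the pure scaling-factor error.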
IV-E3. The training process
In general, the training process can be regarded as minimizing the average loss over the given training data by tuning the parameters of the DNN model:

$$\min_{\{W_i, b_i\}} \; \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}_k, \qquad (11)$$

where $W_i$ and $b_i$ represent the connection weight matrix and the bias vector for layer $i$, $K$ is the amount of training data, and $\mathcal{L}_k$ is the loss of the $k$-th item in the training data. We apply the widely used optimization technique in deep learning, stochastic gradient descent (SGD) [2], in the training stage; it is effective for large-scale data sets and economizes on the computational cost at every iteration by evaluating the gradient on a subset of the training data at each step.

IV-F. Post-Processing
After obtaining a solution including the generations and phase angles, we check its feasibility by examining whether the constraints on the generation limits and the line transmission limits are satisfied. We output the solution if it passes the feasibility test. Otherwise, we solve the following quadratic program,

$$\min_{P_G} \; \left\| P_G - \hat{P}_G \right\|_2^2 \quad \text{subject to the constraints of (1)}, \qquad (12)$$

to project the infeasible solution $\hat{P}_G$ onto the constraint set and output the projected (and thus feasible) solution.
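A minimal sketch of the repair idea follows. The full projection in (12) is a QP over all constraints and needs a QP solver; this simplified version only clips generations to their box limits and restores power balance through the slack bus, which conveys the intent without the solver dependency. All numbers are illustrative.

```python
import numpy as np

def project_to_box(p_g, p_min, p_max, total_load):
    """Simplified feasibility repair: clip generations to their limits,
    then shift the slack-bus generation to restore power balance.
    (The full post-processing in (12) is an l2-projection QP that also
    enforces the line-flow constraints; the slack adjustment here may
    itself need re-checking against the slack bus's own limits.)"""
    p = np.clip(p_g, p_min, p_max)
    p[0] += total_load - p.sum()   # bus 0 taken as the slack bus
    return p

# Generator 0 predicted above its 100 MW limit; total load is 125 MW.
p = project_to_box(np.array([120.0, 30.0]), np.array([0.0, 0.0]),
                   np.array([100.0, 50.0]), total_load=125.0)
print(p)  # [95. 30.]
```

Since the DNN rarely produces infeasible outputs (Sec. VI), this repair path is exercised only occasionally and contributes little to the average running time.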
V. Performance Analysis of DeepOPF
V-A. Approximation Error of the Load-to-Generation Mapping
Given a power network, the SC-DCOPF problem is a quadratic programming problem with linear constraints. We denote the mapping between the load input $P_D$ and the optimal generation as $f^*(P_D)$. Following the common practice in deep-learning analysis (e.g., [36, 37, 38]) and without loss of generality, we focus on the case of one-dimensional output in the following analysis, i.e., $f^*(P_D)$ is a scalar.^4 Assuming the load input domain $\mathcal{D}$ is compact, which usually holds in practice, $f^*$ has certain properties.

^4 A common practice to extend results for mappings with one-dimensional output to mappings with multi-dimensional outputs is to view the latter as multiple mappings each with one-dimensional output, apply the results for one-dimensional output multiple times, and combine them to obtain the result for multi-dimensional output.
Lemma 1.

The function $f^*$ is piecewise linear. Moreover, it is Lipschitz-continuous; that is, there exists a constant $\lambda > 0$ such that for any $P_D^1, P_D^2$ in the domain of $f^*$, $\left| f^*(P_D^1) - f^*(P_D^2) \right| \leq \lambda \left\| P_D^1 - P_D^2 \right\|_2$.
Define $f_{\text{NN}}(P_D)$ as the mapping between $P_D$ and the generation obtained by DeepOPF using a neural network with depth $d$ and at most $n$ neurons per layer. Again, we study the case of one-dimensional output. As $f_{\text{NN}}$ is generated from a neural network with ReLU activation functions, it is also piecewise linear [39].
Before we proceed, we present a result on the approximation error between two scalar function classes, which can be of independent interest.
Lemma 2.

Let $\mathcal{F}_{\lambda}$ be the class of two-segment piecewise linear functions with Lipschitz constant $\lambda$ over an interval $[a, b]$ ($b > a$). Let $\mathcal{G}$ be the class of all linear scalar functions over $[a, b]$. Then the following holds:

$$\min_{g \in \mathcal{G}} \; \max_{f \in \mathcal{F}_{\lambda}} \; \max_{x \in [a, b]} \left| f(x) - g(x) \right| \geq \frac{\lambda (b - a)}{4}. \qquad (13)$$
Essentially, the lemma gives a lower bound on the worst-case error of using a linear function to best approximate a two-segment piecewise linear function. By generalizing Lemma 2 to multi-input functions, we study the approximation error between $f^*$ and $f_{\text{NN}}$.
Theorem 3.

Let $\mathcal{F}_{\lambda}$ be the class of all possible $f^*$ with Lipschitz constant $\lambda$. Let $\mathcal{H}_{d,n}$ be the class of all $f_{\text{NN}}$ generated by a neural network with depth $d$ and at most $n$ neurons per layer. Then

$$\min_{f_{\text{NN}} \in \mathcal{H}_{d,n}} \; \max_{f^* \in \mathcal{F}_{\lambda}} \; \sup_{P_D \in \mathcal{D}} \left| f^*(P_D) - f_{\text{NN}}(P_D) \right| \geq \frac{\lambda \cdot d(\mathcal{D})}{4 (2n)^d}, \qquad (14)$$

where $d(\mathcal{D})$ is the diameter of the load input domain $\mathcal{D}$.
The theorem characterizes a lower bound on the worst-case error of using neural networks to approximate the load-to-generation mapping in SC-DCOPF problems. The bound is linear in $d(\mathcal{D})$, which captures the size of the load input domain, and in $\lambda$, which captures the “curviness” of the mapping to learn. Meanwhile, interestingly, the approximation error bound decreases exponentially in the number of layers but only polynomially in the number of neurons per layer. This suggests the benefits of using “deep” neural networks in mapping approximation, similar to the observations in [36, 37, 38].^5

^5 While our observations are similar to those in [36, 37, 38], there is a distinct difference in the results and the proof techniques, as we explore the piecewise linear property of the mapping unique to our setting.
A useful corollary suggested by Theorem 3 is the following.
Corollary 4.

The following gives a condition on the neural network parameters such that it is ever possible to approximate the most difficult load-to-generation mapping with Lipschitz constant $\lambda$ up to an error of $\epsilon > 0$:

$$(2n)^d \geq \frac{\lambda \cdot d(\mathcal{D})}{4 \epsilon}, \qquad (15)$$

where $d(\mathcal{D})$ is the diameter of the input domain $\mathcal{D}$.
V-B. Computational Complexity
The computational complexity of conventional approaches is related to the scale of the SC-DCOPF problem. For example, the computational complexity of an interior-point-method-based approach for convex quadratic programming is $O(n^{3.5} L)$ arithmetic operations [40], where $L$ is the number of input bits and $n$ is the number of variables. Plugging in the parameters of the SC-DCOPF problem, for which $n$ is $O(C \cdot N)$, this computational complexity turns out to be $O\left((C \cdot N)^{3.5} L\right)$.
The computational complexity of DeepOPF mainly consists of two parts: the calculation as the input data passes through the DNN model, and the post-processing. The computational complexity of the post-processing may be negligible in practice, as the DNN model barely generates infeasible solutions, as seen in Sec. VI. Thus, the computational complexity of DeepOPF is dominated by the calculation with respect to the DNN model, which can be evaluated by the method in [41].
Specifically, recall that the numbers of buses and contingencies are $N$ and $C$, respectively. The input and the output of the DNN model have $O(N)$ and $G$ dimensions, respectively, and the DNN model has $d$ hidden layers, each with at most $n$ neurons. Once we finish training the DNN model, the complexity of generating solutions using DeepOPF is characterized in the following proposition.
Proposition 5.

The computational complexity (measured as the number of arithmetic operations) to generate the generations for the SC-DCOPF problem using DeepOPF is

$$2 \left( N n + (d - 1) n^2 + n G \right), \qquad (16)$$

which is $O(d n^2)$ when $n \geq \max(N, G)$.
From empirical experience, we set $n$ to be on the same order as $N$ and set $d$ to be a small constant. Thus, the complexity of our DeepOPF is $O\left((C + 1) N^2\right)$,^6 significantly smaller than that of the interior point method. Our simulation results in the next section corroborate this observation.

^6 This result also takes into account the complexity of recovering the phase angles for all contingency cases and of verifying the feasibility of the solutions.
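The per-inference operation count of (16) can be reproduced with a short helper; the layer sizes below follow the Case30-scale model in Table I, with the 20-dimensional input being an illustrative figure.

```python
def dnn_ops(n_in, hidden, n_out):
    """Rough count of arithmetic operations (one multiply and one add
    per weight) for a single forward pass through fully connected
    layers; bias additions and activations are lower-order terms."""
    dims = [n_in] + list(hidden) + [n_out]
    return sum(2 * a * b for a, b in zip(dims[:-1], dims[1:]))

# Case30-scale model from Table I: 20 load inputs, 32/16/8 hidden, 6 outputs.
print(dnn_ops(20, [32, 16, 8], 6))  # 2656
```

A few thousand operations per instance is negligible next to an interior-point solve, which is the source of the speedups reported in Sec. VI.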
V-C. Tradeoff between Accuracy and Complexity
The results in Theorem 3 and Proposition 5 suggest a tradeoff between accuracy and complexity. In particular, we can tune the number of hidden layers $d$ and the number of neurons per layer $n$ to trade off the approximation accuracy against the computational complexity of the DNN approach. It appears desirable to design multi-layer neural networks in DeepOPF, as increasing $d$ may reduce the approximation error exponentially while only increasing the complexity linearly.
TABLE I: Parameters of the test cases and the DNN models.

| Case | # of generators | # of load buses* | # of branches | # of hidden layers | # of neurons per hidden layer** |
|---|---|---|---|---|---|
| IEEE Case30 | 6 | 20 | 41 | 3 | 32/16/8 |
| IEEE Case57 | 7 | 42 | 80 | 3 | 128/64/32 |
| IEEE Case118 | 54 | 99 | 186 | 3 | 128/64/32 |
| IEEE Case300 | 69 | 199 | 411 | 3 | 256/128/64 |

*The number of load buses is calculated based on the default load on each bus. If the default active-power load on a bus is non-zero, the bus is considered a load bus, and vice versa.

**The values for these parameters are not unique. Different combinations of the parameters may achieve similar performance.
TABLE II: Performance comparison of DeepOPF and the Gurobi solver (Ref.).

| Case | # of contingencies | # of constraints | Feasibility rate (%) | Cost (DeepOPF) | Cost (Ref.) | Optimality loss (%) | Running time, DeepOPF (ms) | Running time, Ref. (ms) |
|---|---|---|---|---|---|---|---|---|
| IEEE Case30 | 38 | 1176 | 100 | 494.7 | 494.6 | 0.03 | 0.70 | 40 |
| IEEE Case57 | 79 | 4567 | 100 | 35834.9 | 35832.4 | 0.01 | 0.77 | 87 |
| IEEE Case118 | 177 | 21058 | 100 | 109706.7 | 109656.7 | 0.05 | 2.08 | 600 |
| IEEE Case300 | 322 | 96969 | 100 | 615084.9 | 614477.1 | 0.5 | 15.0 | 5993 |
VI. Numerical Experiments
VI-A. Experiment Setup
VI-A1. Simulation environment

The experiments are conducted in CentOS 7.6 on a quad-core (Intel i7-3770 @ 3.40 GHz) CPU workstation with 16 GB RAM.
VI-A2. Test cases
VI-A3. Training data
VI-A4. Implementation of the DNN model

We design the DNN model on the PyTorch platform and apply the stochastic gradient descent method [2] to train the neural network. The number of epochs is set to 200 and the batch size is 64. We set the weighting factors $w_1$ and $w_2$ in the loss function in (10) based on empirical experience. The remaining parameters, including the number of hidden layers and the number of neurons in each layer for each test case, are shown in Table I. We illustrate the detailed architecture of our DNN model for the IEEE Case30 in Fig. 3.

VI-A5. Evaluation metrics
We compare the performance of DeepOPF and a state-of-the-art Gurobi solver using the following metrics, averaged over 10,000 instances. The first is the percentage of feasible solutions obtained by each approach (for DeepOPF, we only count the feasible solutions before post-processing). The second is the objective cost obtained by each approach. The third is the running time, i.e., the average computation time for obtaining solutions for the instances. We then compute the speedup as the ratio between the running times of the Gurobi solver and DeepOPF.
VI-B. Performance Evaluation
The simulation results for the test cases are shown in Table II, and we have several observations. First, as compared to the Gurobi solver, our DeepOPF approach speeds up the computing time by up to three orders of magnitude. The speedup is increasingly significant as the test cases get larger, suggesting that our DeepOPF approach is more efficient for large-scale power networks. Second, the percentage of feasible solutions obtained by DeepOPF is 100% before post-processing, which implies that DeepOPF barely generates infeasible solutions and can find feasible solutions through the learned mapping. Third, the cost difference between the DeepOPF solution and the reference Gurobi solution is negligible, which means each dimension of the generated solution has high accuracy when compared to that of the optimal solution.
To further understand the performance of DeepOPF, we plot the empirical cumulative distributions of the speedup and the optimality loss for the IEEE 118-bus test case in Fig. 4(a) and Fig. 4(b), respectively. As seen, DeepOPF consistently achieves excellent optimality-loss and speedup performance for all the test instances. Overall, our results show that DeepOPF can generate solutions with minor optimality loss within a fraction of the time used by the Gurobi solver.
VI-C. The Benefit of the Multi-layer Structure
We also carry out comparative experiments to compare the optimality loss and speedup of DeepOPF with different numbers of neural network layers, for the IEEE Case118:

- DeepOPF-V1: A simple network with one hidden layer of 16 neurons.
- DeepOPF-V2: A network with two hidden layers, of 32 and 16 neurons, respectively.
- DeepOPF-V3: A network with three hidden layers, of 64, 32, and 16 neurons, respectively.
The results are shown in Table III. In alignment with our theoretical analysis in Sec. V-A, increasing the depth and the size of the neural network improves the optimality-loss performance, at the (minor) cost of longer computing time.
TABLE III: Performance of DeepOPF variants on the IEEE Case118.

| Variant | Optimality loss | Running time (ms) | Speedup |
|---|---|---|---|
| DeepOPF-V1 | 6.74 (0.2%) | 1.98 | 302 |
| DeepOPF-V2 | 3.44 (0.1%) | 2.03 | 300 |
| DeepOPF-V3 | 2.54 (0.1%) | 2.06 | 290 |
VII. Conclusion
We develop DeepOPF for solving the SC-DCOPF problem. DeepOPF is inspired by the observation that solving the SC-DCOPF problem for a given power network is equivalent to learning a high-dimensional mapping between the load inputs and the dispatch and transmission decisions. DeepOPF employs a DNN to learn such a mapping. With the learned mapping, it first obtains the generations from the load inputs and then directly computes the phase angles from the generations and loads. We characterize the approximation capability and computational complexity of DeepOPF. Simulation results also show that DeepOPF scales well with the problem size and speeds up the computing time by up to 400x as compared to conventional approaches. Future directions include extending DeepOPF to the AC-OPF setting and exploring joint learning-based and optimization-based algorithm design.
References

[1]
A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in
Proceedings of the International Conference on Neural Information Processing Systems, vol. 1, Lake Tahoe, Nevada, USA, 2012, pp. 1097–1105.  [2] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning. MIT Press Cambridge, 2016, vol. 1.
 [3] P. Covington, J. Adams, and E. Sargin, “Deep Neural Networks for YouTube Recommendations,” in Proceedings of the ACM Conference on Recommender Systems, New York, NY, USA, Sep 2016, pp. 191–198.
 [4] F. Wan, L. Hong, A. Xiao, T. Jiang, and J. Zeng, “NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drugtarget interactions,” Bioinformatics, vol. 35, no. 1, pp. 104–111, Jul 2018.
 [5] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan 2016.

[6] A. Toshev and C. Szegedy, “DeepPose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, June 2014, pp. 1653–1660.
 [7] J. Carpentier, “Contribution to the economic dispatch problem,” Bulletin de la Société Française des Électriciens, vol. 3, no. 8, pp. 431–447, 1962.
 [8] D. E. Johnson, J. R. Johnson, J. L. Hilburn, and P. D. Scott, Electric Circuit Analysis. Prentice Hall Englewood Cliffs, 1989, vol. 3.
 [9] S. Frank, I. Steponavice, and S. Rebennack, “Optimal power flow: a bibliographic survey I,” Energy Systems, vol. 3, no. 3, pp. 221–258, Sep 2012.
 [10] ——, “Optimal power flow: a bibliographic survey II,” Energy Systems, vol. 3, no. 3, pp. 259–289, Sep 2012.
 [11] M. B. Cain, R. P. O’Neill, and A. Castillo, “History of optimal power flow and formulations,” Federal Energy Regulatory Commission, vol. 1, pp. 1–36, 2012.
 [12] F. Capitanescu, J. M. Ramos, P. Panciatici, D. Kirschen, A. M. Marcolini, L. Platbrood, and L. Wehenkel, “State-of-the-art, challenges, and future trends in security constrained optimal power flow,” Electric Power Systems Research, vol. 81, no. 8, pp. 1731–1741, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378779611000885
 [13] N. Chiang and A. Grothey, “Solving security constrained optimal power flow problems by a structure exploiting interior point method,” Optimization and Engineering, vol. 16, no. 1, pp. 49–71, 2015.
 [14] K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
 [15] B. Karg and S. Lucia, “Efficient representation and approximation of model predictive control laws via deep learning,” arXiv preprint arXiv:1806.10644, 2018.
 [16] J. A. Momoh and J. Z. Zhu, “Improved interior point method for OPF problems,” IEEE Transactions on Power Systems, vol. 14, no. 3, pp. 1114–1120, Aug 1999.
 [17] J. A. Momoh, “A generalized quadratic-based model for optimal power flow,” in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, vol. 1, Cambridge, MA, USA, Nov 1989, pp. 261–271.
 [18] S. H. Low, “Convex relaxation of optimal power flow—Part I: Formulations and equivalence,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 15–27, March 2014.
 [19] A. A. Sousa and G. L. Torres, “Globally convergent optimal power flow by trust-region interior-point methods,” in 2007 IEEE Lausanne Power Tech, Lausanne, Switzerland, Jul 2007, pp. 1386–1391.
 [20] P. E. O. Yumbla, J. M. Ramirez, and C. A. C. Coello, “Optimal power flow subject to security constraints solved with a particle swarm optimizer,” IEEE Transactions on Power Systems, vol. 23, no. 1, pp. 33–40, 2008.
 [21] V. J. Gutierrez-Martinez, C. A. Cañizares, C. R. Fuerte-Esquivel, A. Pizano-Martinez, and X. Gu, “Neural-network security-boundary constrained optimal power flow,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 63–72, 2010.
 [22] F. Thams, L. Halilbasic, P. Pinson, S. Chatzivasileiadis, and R. Eriksson, “Data-driven security-constrained OPF,” in X Bulk Power Systems Dynamics and Control Symposium, 2017.
 [23] L. Halilbašić, F. Thams, A. Venzke, S. Chatzivasileiadis, and P. Pinson, “Data-driven security-constrained AC-OPF for operations and markets,” in 2018 Power Systems Computation Conference (PSCC). IEEE, 2018, pp. 1–7.
 [24] Y. Ng, S. Misra, L. A. Roald, and S. Backhaus, “Statistical Learning For DC Optimal Power Flow,” arXiv preprint arXiv:1801.07809, 2018.
 [25] D. Deka and S. Misra, “Learning for DC-OPF: Classifying active sets using neural nets,” arXiv preprint arXiv:1902.05607, 2019.
 [26] K. Baker, “Learning WarmStart Points for AC Optimal Power Flow,” arXiv preprint arXiv:1905.08860, 2019.
 [27] X. Pan, T. Zhao, and M. Chen, “DeepOPF: Deep neural network for DC optimal power flow,” arXiv preprint arXiv:04479, 2019.
 [28] R. D. Christie, B. F. Wollenberg, and I. Wangensteen, “Transmission management in the deregulated environment,” Proceedings of the IEEE, vol. 88, no. 2, pp. 170–195, Feb 2000.
 [29] X. Cheng and T. J. Overbye, “PTDF-based power system equivalents,” IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 1868–1876, 2005.
 [30] V. H. Hinojosa and F. Gonzalez-Longatt, “Preventive Security-Constrained DCOPF Formulation Using Power Transmission Distribution Factors and Line Outage Distribution Factors,” Energies, vol. 11, no. 6, 2018.
 [31] J. H. Park, Y. S. Kim, I. K. Eom, and K. Y. Lee, “Economic load dispatch for piecewise quadratic cost function using Hopfield neural network,” IEEE Transactions on Power Systems, vol. 8, no. 3, pp. 1030–1038, Aug 1993.
 [32] Gurobi Optimization, LLC, “Gurobi optimizer reference manual,” 2019. [Online]. Available: http://www.gurobi.com
 [33] P. J. Martínez-Lacañina, J. L. Martínez-Ramos, A. de la Villa-Jaén, and A. Marano-Marcolini, “DC corrective optimal power flow based on generator and branch outages modelled as fictitious nodal injections,” IET Generation, Transmission & Distribution, vol. 8, no. 3, pp. 401–409, 2013.
 [34] S. Chatzivasileiadis, “Lecture Notes on Optimal Power Flow (OPF),” 2018. [Online]. Available: http://arxiv.org/abs/1811.00943
 [35] A. M. Kettner and M. Paolone, “On the properties of the power systems nodal admittance matrix,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1130–1131, Jan 2018.
 [36] D. Yarotsky, “Error bounds for approximations with deep ReLU networks,” Neural Networks, vol. 94, pp. 103–114, 2017.
 [37] I. Safran and O. Shamir, “Depth-Width Tradeoffs in Approximating Natural Functions with Neural Networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17, 2017, pp. 2979–2987.
 [38] S. Liang and R. Srikant, “Why deep neural networks for function approximation?” arXiv preprint arXiv:1610.04161, 2016.
 [39] G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2924–2932. [Online]. Available: http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf
 [40] Y. Ye and E. Tse, “An extension of Karmarkar’s projective algorithm for convex quadratic programming,” Mathematical Programming, vol. 44, no. 1, pp. 157–179, May 1989.
 [41] K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015, pp. 5353–5360.
 [42] “Power Systems Test Case Archive,” 2018, http://labs.ece.uw.edu/pstca/.
 [43] C. H. Liang, C. Y. Chung, K. P. Wong, and X. Z. Duan, “Parallel Optimal Reactive Power Flow Based on Cooperative Co-Evolutionary Differential Evolution and Power System Decomposition,” IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 249–257, Feb 2007.
 [44] “IEEE case300 topology,” 2018, https://www.al-roomi.org/power-flow/300-bus-system.
 [45] H. Wang, C. E. MurilloSanchez, R. D. Zimmerman, and R. J. Thomas, “On computational issues of marketbased optimal power flow,” IEEE Transactions on Power Systems, vol. 22, no. 3, pp. 1185–1193, Aug 2007.
 [46] M. Telgarsky, “Benefits of depth in neural networks,” arXiv preprint arXiv:1602.04485, 2016.
Appendix A Proof of Lemma 1
Proof.
We now show that the considered piecewise linear one-dimensional output function $f$ is Lipschitz-continuous on the input domain $\mathcal{D}$, which can be partitioned into $m$ different convex polyhedral regions, $\mathcal{D}_1, \dots, \mathcal{D}_m$. The mapping is piecewise linear and can be defined as follows:
$$ f(x) = a_i^{\top} x + b_i, \quad x \in \mathcal{D}_i, \ i = 1, \dots, m, $$
where $a_i \in \mathbb{R}^{n}$ and $b_i \in \mathbb{R}$. Then, we can have: for any $x, y \in \mathcal{D}$, the line segment between $x$ and $y$ passes through the regions in some order; denote its intersection points with the region boundaries by $x = z_0, z_1, \dots, z_K = y$. Since $f$ is continuous, $f$ is linear within each region, and the points $z_k$ are collinear,
$$ |f(x) - f(y)| \le \sum_{k=1}^{K} |f(z_{k-1}) - f(z_k)| \le \max_{1 \le i \le m} \|a_i\|_2 \sum_{k=1}^{K} \|z_{k-1} - z_k\|_2 = \max_{1 \le i \le m} \|a_i\|_2 \, \|x - y\|_2. $$
Thus, let $\Lambda = \max_{1 \le i \le m} \|a_i\|_2$. We have
$$ |f(x) - f(y)| \le \Lambda \|x - y\|_2, \quad \forall\, x, y \in \mathcal{D}. $$
Therefore, $f$ is Lipschitz-continuous. ∎
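The property shown in the proof can be checked numerically: for a continuous piecewise linear function, the maximum slope magnitude over the pieces serves as a Lipschitz constant. A small stdlib-only sketch; the knots and slopes below are arbitrary illustrative values, not from the paper.

```python
import random

# A continuous 1-D piecewise linear function through hypothetical knots (x_k, y_k).
knots = [(0.0, 0.0), (1.0, 2.0), (2.0, -1.0), (3.0, 0.5)]

def f(x):
    # Linear interpolation within the piece containing x.
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x outside domain")

# Lambda = maximum slope magnitude over the pieces (as in the proof).
Lam = max(abs((y1 - y0) / (x1 - x0))
          for (x0, y0), (x1, y1) in zip(knots, knots[1:]))

# Check |f(x) - f(y)| <= Lam * |x - y| on many random pairs.
random.seed(0)
ok = all(
    abs(f(x) - f(y)) <= Lam * abs(x - y) + 1e-12
    for x, y in ((random.uniform(0, 3), random.uniform(0, 3)) for _ in range(1000))
)
```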
Appendix B Proof of Lemma 2
Proof.
We can derive the lower bound on the worst-case $\ell_{\infty}$-based approximation error as follows. Suppose we want to find a function $h$ belonging to the linear scalar function class to approximate a function $g$ belonging to the two-segment piecewise linear function class with a Lipschitz constant $\Lambda$, over an interval $[u, u+\ell]$ ($\ell > 0$). An illustration is shown in Fig. 5. Let $h(x) = w x + c$, for $x \in [u, u+\ell]$. Let $g$ be the following:
$$ g(x) = \begin{cases} \Lambda\,(x - u), & x \in [u, u + \ell/2],\\ \Lambda\,(u + \ell - x), & x \in (u + \ell/2, u + \ell], \end{cases} \qquad (17) $$
Then, we can obtain the lower bound for the $\ell_{\infty}$-based approximation error between $h$ and $g$ by a case analysis on the value of $h$ at the midpoint $x_2 = u + \ell/2$. Note that $g(u) = g(u+\ell) = 0$, $g(x_2) = \Lambda\ell/2$, and, since $h$ is linear, $h(u) + h(u+\ell) = 2h(x_2)$.

If $h(x_2) \le \Lambda\ell/4$: under this case, we can get:
$$ \max_{x \in [u, u+\ell]} |g(x) - h(x)| \ge g(x_2) - h(x_2) \ge \frac{\Lambda\ell}{2} - \frac{\Lambda\ell}{4} = \frac{\Lambda\ell}{4}. $$

Otherwise $h(x_2) > \Lambda\ell/4$, and hence $h(u) + h(u+\ell) > \Lambda\ell/2$. If $h(u) > \Lambda\ell/4$, under this case we can have:
$$ \max_{x \in [u, u+\ell]} |g(x) - h(x)| \ge h(u) - g(u) > \frac{\Lambda\ell}{4}. $$
Otherwise $h(u) \le \Lambda\ell/4$, which forces $h(u+\ell) > \Lambda\ell/4$; we can consider the point $u+\ell$ and obtain the same result.
Thus overall, we observe
$$ \max_{x \in [u, u+\ell]} |g(x) - h(x)| \ge \frac{\Lambda\ell}{4} \quad \text{for every linear } h. $$
For the worst-case $\ell_{\infty}$-based approximation error, we have
$$ \sup_{g} \inf_{h} \max_{x \in [u, u+\ell]} |g(x) - h(x)| \ge \frac{\Lambda\ell}{4}. $$
∎
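The lemma's lower bound can be probed numerically: a grid search over linear functions h(x) = wx + c cannot drive the sup-norm error to a two-segment tent below Lambda*l/4. This is an illustrative sketch with assumed values (Lambda = 2, interval length 1, starting at 0); the grids are coarse but contain the optimal linear approximation h(x) = Lambda*l/4.

```python
# Hypothetical Lipschitz constant and interval length.
Lam, ell = 2.0, 1.0

def g(x):
    """Two-segment tent on [0, ell] with slopes +Lam then -Lam."""
    return Lam * x if x <= ell / 2 else Lam * (ell - x)

# Sample points for the sup-norm (includes the endpoints and the midpoint).
xs = [ell * k / 200 for k in range(201)]

def sup_err(w, c):
    """Sampled sup-norm error between g and the linear h(x) = w*x + c."""
    return max(abs(g(x) - (w * x + c)) for x in xs)

# Coarse grid search over slope w and intercept c in [-4, 4].
best = min(
    sup_err(i / 10 - 4, j / 10 - 4)
    for i in range(81)
    for j in range(81)
)
```

The search bottoms out at Lam * ell / 4 (here 0.5), attained by the constant function h(x) = 0.5, in line with the three-point argument in the proof.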
Appendix C Proof of Theorem 3
Proof.
We can characterize the lower bound on the worst-case error of using neural networks to approximate load-to-generation mappings in SC-DCOPF problems as follows.
Suppose $\mathcal{F}$ is the family of piecewise linear functions generated by a neural network with depth $D$ and maximum number of neurons per layer $M$, on the load input domain with diameter $d$; we restrict attention to a line segment of length $d$ in the domain and identify it with the interval $[0, d]$. The maximal number of segments any function belonging to $\mathcal{F}$ can have is denoted by $K_{\max}$. Let $\mathcal{H}_{\Lambda}$ be the class of all possible load-to-generation mappings with a Lipschitz constant $\Lambda$. Let $g^{*}$, comprising $2K_{\max}$ linear segments with equal length, be the following zigzag function:
$$ g^{*}(x) = \begin{cases} \Lambda\,(x - x_k), & x \in [x_k, x_{k+1}],\ k \text{ even},\\ \Lambda\,(x_{k+1} - x), & x \in [x_k, x_{k+1}],\ k \text{ odd}, \end{cases} $$
where $x_k = k d / (2K_{\max})$, $k = 0, 1, \dots, 2K_{\max}$. Any $f \in \mathcal{F}$ has at most $K_{\max} - 1$ breakpoints, while $g^{*}$ contains $K_{\max}$ tent-shaped pieces on the intervals $[x_{2k}, x_{2k+2}]$; hence at least one such interval contains no breakpoint of $f$, so $f$ is linear on it. According to Lemma 2, on that interval of length $d / K_{\max}$, we can have:
$$ \max_{x \in [0, d]} |f(x) - g^{*}(x)| \ge \frac{\Lambda d}{4 K_{\max}}. $$
Thus,
$$ \sup_{g \in \mathcal{H}_{\Lambda}} \inf_{f \in \mathcal{F}} \max_{x \in [0, d]} |f(x) - g(x)| \ge \inf_{f \in \mathcal{F}} \max_{x \in [0, d]} |f(x) - g^{*}(x)| \ge \frac{\Lambda d}{4 K_{\max}}. $$
Meanwhile, we use the result in [46], of which the following is an immediate corollary:
Corollary 6.
The maximal number of linear segments generated from the family of ReLU neural networks with depth $D$ (the number of hidden layers) and maximal width $M$ (neurons on the hidden layer) is at most $(2M)^{D}$.
By the above corollary, we have $K_{\max} \le (2M)^{D}$. Consequently,
$$ \sup_{g \in \mathcal{H}_{\Lambda}} \inf_{f \in \mathcal{F}} \max_{x \in [0, d]} |f(x) - g(x)| \ge \frac{\Lambda d}{4\,(2M)^{D}}. \qquad (18) $$
∎
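The segment-count bound can be sanity-checked empirically for a tiny network. The sketch below builds a random 1-D ReLU network (the weights are arbitrary, purely illustrative) and counts maximal runs of constant ReLU activation patterns along the input line; the network is linear wherever the pattern is constant, so the run count upper-bounds the number of linear segments and should stay below the corollary's bound, here assumed to take the form (2M)^D.

```python
import random

random.seed(1)
D, M = 2, 3  # hypothetical depth (hidden layers) and maximal width

def make_layer(n_in, n_out):
    """Random fully connected layer: (weights, biases)."""
    return ([[random.uniform(-2, 2) for _ in range(n_in)] for _ in range(n_out)],
            [random.uniform(-1, 1) for _ in range(n_out)])

# Scalar-input ReLU network: 1 -> M -> ... -> M (D hidden layers).
layers = [make_layer(1, M)] + [make_layer(M, M) for _ in range(D - 1)]

def activation_pattern(x):
    """Return the on/off pattern of all hidden ReLUs at scalar input x."""
    h, pattern = [x], []
    for W, b in layers:
        pre = [sum(wij * hj for wij, hj in zip(row, h)) + bi
               for row, bi in zip(W, b)]
        pattern.extend(p > 0 for p in pre)
        h = [max(0.0, p) for p in pre]
    return tuple(pattern)

# Count maximal constant-pattern runs on a fine grid over [-5, 5].
xs = [-5 + 10 * k / 5000 for k in range(5001)]
pats = [activation_pattern(x) for x in xs]
segments = 1 + sum(p0 != p1 for p0, p1 in zip(pats, pats[1:]))
```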
Appendix D Proof of Corollary 4
Proof.
We next show how to derive Corollary 4. Suppose $\epsilon$ is defined as the upper bound for the worst-case approximation error, that is:
$$ \sup_{g \in \mathcal{H}_{\Lambda}} \inf_{f \in \mathcal{F}} \max_{x \in [0, d]} |f(x) - g(x)| \le \epsilon. \qquad (19) $$
Then, we can derive the following inequality based on the above definition and Theorem 3:
$$ \frac{\Lambda d}{4\,(2M)^{D}} \le \epsilon. \qquad (20) $$
After some transformations, we obtain the following necessary condition on the DNN's scale in Corollary 4, which must hold for the designed DNN to be able to approximate the most difficult load-to-generation mapping with a Lipschitz constant $\Lambda$ up to an error of $\epsilon$:
$$ M \ge \frac{1}{2} \left( \frac{\Lambda d}{4 \epsilon} \right)^{1/D}. \qquad (21) $$
∎
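To read the necessary condition quantitatively, the sketch below computes the minimal width for a few depths, assuming the condition takes the form (2M)^D >= Lambda*d/(4*eps), with Lambda the Lipschitz constant, d the domain diameter, D the depth, and M the width; the numerical values are hypothetical.

```python
import math

# Hypothetical values: Lipschitz constant, domain diameter, target error.
Lam, d, eps = 9.0, 5.0, 0.01
ratio = Lam * d / (4 * eps)  # right-hand side Lambda*d/(4*eps)

# Minimal width per depth from M >= (ratio ** (1/D)) / 2.
min_width = {D: math.ceil((ratio ** (1.0 / D)) / 2) for D in (1, 2, 3)}
```

The exponential dependence on depth is the point: each extra hidden layer shrinks the required width from hundreds to single digits in this toy setting.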
Appendix E Proof of Proposition 5
Proof.
We next show how to derive the computational complexity of using the DNN model to obtain the generation output from the given load input. Recall that the input and the output of the DNN model in DeepOPF have dimensions $n_0$ and $n_{\text{out}}$, respectively, and the DNN model has $N_{\text{hid}}$ hidden layers, where the $i$-th hidden layer has $n_i$ neurons, $i = 1, \dots, N_{\text{hid}}$. The maximal number of neurons on a hidden layer is $n_{\max} = \max_{1 \le i \le N_{\text{hid}}} n_i$. For each neuron in the DNN model, we regard the computational complexity per input (measured by basic arithmetic operations) as $O(1)$. As we apply the fully-connected architecture, the output of each neuron is calculated by taking a weighted sum of the outputs of the neurons on the previous layer and passing it through an activation function.
Thus, the computational complexity (measured as the number of arithmetic operations) to generate the output from the input by a DNN model consists of the following three parts:

Complexity of the computation from the input to the first hidden layer. As each neuron on the first hidden layer takes the $n_0$-dimensional input data, the corresponding complexity is $O(n_0 n_1)$.

Complexity of the computation between consecutive hidden layers. Each neuron on the current hidden layer takes the output of every neuron on the previous hidden layer as its input; thus the corresponding complexity is $O\left(\sum_{i=2}^{N_{\text{hid}}} n_{i-1} n_i\right) = O\left((N_{\text{hid}} - 1)\, n_{\max}^2\right)$.

Complexity of the computation from the last hidden layer to the output. As the output of each neuron on the last hidden layer is used to calculate the $n_{\text{out}}$-dimensional output, the corresponding complexity is $O(n_{\text{out}}\, n_{N_{\text{hid}}})$.
Hence, the overall complexity of the calculation by a DNN model is:
$$ O\left( n_0 n_1 + (N_{\text{hid}} - 1)\, n_{\max}^2 + n_{\text{out}}\, n_{N_{\text{hid}}} \right). $$
∎
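The three parts above can be reproduced by summing multiply-accumulate operations layer by layer for a fully connected network; the layer sizes below are hypothetical, chosen only to make the count concrete.

```python
def dnn_mac_count(n_in, hidden, n_out):
    """Multiply-accumulate count of one fully connected forward pass."""
    sizes = [n_in] + list(hidden) + [n_out]
    # Each consecutive pair of layers contributes (fan-in * fan-out) MACs.
    return sum(a * b for a, b in zip(sizes, sizes[1:]))

# Hypothetical sizes: 50 load inputs, three hidden layers, 10 generator outputs.
n_in, hidden, n_out = 50, [64, 32, 16], 10
total = dnn_mac_count(n_in, hidden, n_out)

# Upper bound in the form of the proposition, with n_max the maximal width.
n_max = max(hidden)
bound = n_in * n_max + (len(hidden) - 1) * n_max ** 2 + n_out * n_max
```

The exact count is always at most the bound, since every hidden-layer width is replaced by the maximum in the bound's middle term.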