Deep learning has transformed various industries in human society, including artificial intelligence, health care, online advertising, transportation, and robotics. As the most widely used and mature model in deep learning, the Deep Neural Network (DNN) demonstrates superb performance in complex engineering tasks such as recommendation, bio-informatics, mastering difficult games like Go, and human pose estimation. The capability of approximating continuous mappings and the desirable scalability make DNNs a favorable choice in the arsenal for solving large-scale optimization and decision problems in engineering systems. In this paper, we apply DNNs to power systems for solving the essential security-constrained direct current optimal power flow (SC-DCOPF) problem in power system operation.
The OPF problem, first posed by Carpentier in 1962, is to minimize an objective function, such as the cost of power generation, subject to all physical, operational, and technical constraints, by optimizing the dispatch and transmission decisions. These constraints include Kirchhoff's laws, operating limits of generators, voltage levels, and loading limits of transmission lines. The OPF problem is central to power system operations as it underpins various applications including economic dispatch, unit commitment, stability and reliability assessment, and demand response. While OPF with a full AC power flow formulation (AC-OPF) is most accurate, it is a non-convex problem and its complexity limits its practicability. Meanwhile, based on linearized power flows, DC-OPF is a convex problem admitting a wide variety of applications, including electricity market clearing and power transmission management. See, e.g., [9, 10] for a survey.
The SC-DCOPF problem, a variant of DC-OPF, is critical for reliable power system operation against contingencies caused by equipment failure. It considers not only constraints under normal operation, but also additional steady-state security constraints for each possible contingency. There are two types of SC-DCOPF problems, namely the preventive SC-DCOPF problem and the corrective SC-DCOPF problem. In the preventive SC-DCOPF problem, the system operating decisions cannot change once they are determined, and thus they need to guarantee feasibility under both the pre- and post-contingency constraints. In the corrective SC-DCOPF problem, the system operator has a short time (e.g., 5 minutes) to adjust the operating points after the occurrence of each contingency. Our DeepOPF approach is applicable to both problems; we focus on the preventive SC-DCOPF problem in this paper for ease of illustration. While SC-DCOPF is important for reliable power system operation, solving it incurs excessive computational complexity, limiting its applicability in large-scale power networks.
To this end, we propose a machine learning approach for solving the SC-DCOPF problem efficiently. Our approach is inspired by the following observations.
Given a power network, solving the SC-DCOPF problem is equivalent to depicting a high-dimensional mapping between the load inputs and the generation and voltage outputs.
In practice, the SC-DCOPF problem is usually solved repeatedly for the same power network, e.g., every 5 minutes, with different load inputs at different time epochs.
As such, it is conceivable to leverage the universal approximation capability of deep feed-forward neural networks [14, 15] to learn the input-to-output mapping for a given power network, and then apply the mapping to obtain operating decisions for given load inputs (e.g., once every 5 minutes).
Specifically, we develop DeepOPF as a DNN-based solution for the SC-DCOPF problem. As compared to conventional approaches based on interior-point methods, DeepOPF excels in (i) reducing computing time and (ii) scaling well with the problem size. These salient features are particularly appealing for solving (large-scale) SC-DCOPF problems, which are central to secure power system operation with contingencies in consideration. Note that the complexity of constructing and training a DNN model is minor if amortized over the many problem instances (e.g., one every 5 minutes) that can be solved using the same model. In more detail, our contributions are summarized as follows.
First, after reviewing the SC-DCOPF problem in Sec. III, we describe DeepOPF as a DNN framework for solving the SC-DCOPF problem in Sec. IV. In DeepOPF, we first construct and train a DNN to learn the mapping between the load inputs and the generations. We then directly compute the phase angles from the generations and loads by using the (linearized) power flow equations. Such a two-step procedure significantly reduces the dimension of the mapping to learn, subsequently cutting down the size of our DNN and the amount of training data/time needed. We also design a post-processing procedure to ensure the feasibility of the final solution.
Then in Sec. V, we derive a condition suggesting that the approximation error of the neural network in DeepOPF decreases exponentially in the number of layers and polynomially in the number of neurons per layer. This allows us to systematically tune the size of the neural network in DeepOPF according to a pre-specified performance guarantee. We also derive the computational complexity of DeepOPF.
Finally, we carry out simulations and summarize the results in Sec. VI. Simulation results for IEEE test cases show that DeepOPF always generates feasible solutions with negligible optimality loss, while speeding up the computing time by up to 400x as compared to a state-of-the-art solver. The results also highlight a trade-off between the prediction accuracy and the running time of DeepOPF.
Due to the space limitation, all proofs are in the supplementary material.
II Related Work
Existing studies on solving SC-OPF mainly follow three lines of approaches. The first is numerical iteration algorithms, where the SC-OPF problem is first approximated as an optimization problem, e.g., quadratic programming or linear programming, and numerical iteration solvers like interior-point methods are applied to obtain the optimal solutions. However, the time complexity of these numerical-iteration-based algorithms can be substantial for large-scale power systems due to the excessive number of constraints arising from the different contingencies. See  for a survey on numerical iteration algorithms for SC-OPF.
The second is heuristic algorithms based on computational-intelligence techniques, including evolutionary programming like swarm optimization. For instance, a particle swarm optimization method with reconstruction operators was proposed in  for solving the SC-OPF problem, where the reconstruction operators and an external penalty are adapted to handle the constraints and improve the quality of the final solution. However, there are two drawbacks of this kind of method. First, there is no performance guarantee on either optimality or feasibility. Second, the method may still incur high computational complexity.
The third is learning-based methods. Existing studies focus on integrating learning techniques (e.g., neural networks, decision trees) into conventional algorithms to facilitate the process of solving SC-OPF problems. For instance,  applies a neural network to learn the system security boundaries as an explicit function to be used in the OPF formulation. In [22, 23], decision trees are used to derive tractable rules from large data sets of operating points, which can efficiently represent the feasible region and identify possible solutions. However, the proposed heuristic schemes are still iteration based and may still incur a significant amount of running time for large-scale instances.
Recently, there have been some works on learning the active constraint set so as to reduce the size of the OPF problems to solve [24, 25]. Determining the active constraint set, however, is highly non-trivial for SC-OPF problems. With incorrect active constraint sets, the approach may generate infeasible solutions, and it is not clear how to derive a feasible solution at the end. In addition,  proposes neural-network/decision-tree-based methods to directly obtain a solution for AC-OPF problems, but these methods cannot guarantee the feasibility of the solutions.
Different from existing studies, our DeepOPF uses neural networks to learn the mapping between the load inputs and the generation and voltage outputs, so as to directly obtain solutions for the SC-DCOPF problem with a feasibility guarantee. As compared to our previous effort in , this paper studies the more challenging SC-DCOPF problem and, more importantly, characterizes a useful condition that allows us to design the neural network according to a pre-specified performance guarantee on the obtained solution.
III Security-Constrained DCOPF Problem
We study the widely-studied SC-DCOPF problem, which considers contingencies due to the outage of any single transmission line in the power system. The objective is to minimize the total generation cost subject to the generator operation limits, the power balance equations, and the transmission line capacity constraints under all contingencies. Assuming the power network remains connected upon contingency, the SC-DCOPF problem is formulated as follows. We note that there is another formulation involving only the generations, because the phase angles can be uniquely determined by the generations and loads; see, e.g., . As the complexity of solving either formulation is similar, we focus on the standard formulation:
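The formulation itself can be sketched in standard form as follows (a reconstruction using common DC-OPF notation; the symbols $P_G$, $\theta^{(c)}$, $B^{(c)}$, $x_{ij}$, and the limits are assumed rather than taken verbatim from the original):

```latex
\begin{aligned}
\min_{P_G,\ \{\theta^{(c)}\}} \quad & \sum_{i=1}^{N_g} C_i\left(P_{Gi}\right) \\
\text{s.t.} \quad & P_{Gi}^{\min} \le P_{Gi} \le P_{Gi}^{\max}, && i = 1, \ldots, N_g, \\
& B^{(c)} \theta^{(c)} = P_G - P_D, && c = 0, 1, \ldots, C, \\
& \left| \left( \theta_i^{(c)} - \theta_j^{(c)} \right) / x_{ij} \right| \le P_{ij}^{\max}, && \forall \text{ lines } (i, j),\ c = 0, 1, \ldots, C.
\end{aligned}
```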
Here $N$ is the number of buses, $N_g$ is the number of generators, and $C$ is the number of contingency cases ($c = 0$ denotes the case without any contingency). $P_{Gi}$ is the generator output, $P_{Gi}^{\min}$ and $P_{Gi}^{\max}$ are the generation limits at the $i$-th bus, and $P_{Di}$ is the load input.
$\theta^{(c)}$ is the phase-angle vector under the $c$-th contingency, and $x_{ij}$ is the reactance of the transmission line between the $i$-th bus and the $j$-th bus. $B^{(c)}$ is the admittance matrix for the $c$-th contingency, which is an $N \times N$ matrix with entries $B^{(c)}_{ij} = -1/x_{ij}$ if buses $i$ and $j$ are connected under contingency $c$, $B^{(c)}_{ii} = \sum_{k \ne i} 1/x_{ik}$ on the diagonal, and $0$ otherwise.
In the above SC-DCOPF formulation, the first set of constraints describes the generation limits. The second set of constraints comprises the power flow equations with contingencies taken into account. The third set of constraints captures the line transmission capacity for both pre-contingency and post-contingency cases. In the objective, $C_i(\cdot)$ is the cost function for the generator at the $i$-th bus, commonly modeled as a quadratic function: $C_i(P_{Gi}) = \lambda_{2i} P_{Gi}^2 + \lambda_{1i} P_{Gi} + \lambda_{0i},$ where $\lambda_{2i}$, $\lambda_{1i}$, and $\lambda_{0i}$ are the model parameters, which can be obtained from measured data of the heat rate curve.
While the SC-DCOPF problem is important for reliable power system operation and is a convex (quadratic) problem with efficient solvers, solving it for large-scale power networks in practice still incurs excessive running time, limiting its practicability. In the following, we address this issue by proposing a neural-network approach that solves the SC-DCOPF problem in a fraction of the time used by existing solvers.
IV DeepOPF for Solving SC-DCOPF
IV-A A Neural-Network Framework for OPF
We outline a general predict-and-reconstruct framework for solving OPF in Fig. 1. Specifically, we exploit the dependency induced by the equality constraints among the decision variables in the OPF formulation. Given the load inputs, the learning model (e.g., a DNN) is applied only to predict a set of independent variables. The remaining variables are then determined by leveraging the (power flow) equality constraints. This way, we not only reduce the number of variables to be predicted, but also ensure that the equality constraints are satisfied, which is usually difficult in generic learning-based approaches. In this paper, we materialize the general framework to develop DeepOPF for solving the SC-DCOPF problem and obtain strong theoretical and empirical results.
IV-B Overview of DeepOPF
The framework of DeepOPF is shown in Fig. 2; it is divided into a training stage and an inference stage. We first construct and train a DNN to learn the mapping between the load inputs and the generations. We then directly compute the voltages from the generations and loads by using the (linearized) power flow equations.
We discuss the process of constructing and training the DNN model in the following subsections. In particular, we discuss the preparation of the training data in Sec. IV-C, the variable prediction and reconstruction in Sec. IV-D, and the design and training of the DNN in Sec. IV-E.
In the inference stage, we directly apply DeepOPF to solve the SC-DCOPF problem with given load inputs. This is different from recent learning-based approaches for solving OPF, where machine learning only helps to facilitate existing solvers, e.g., by identifying the active constraints. We describe a post-processing step to ensure the feasibility of the obtained solutions in Sec. IV-F.
IV-C Load Sampling and Pre-processing
We sample the loads uniformly at random within $[(1 - \delta) P_{Di}^{0}, (1 + \delta) P_{Di}^{0}]$, where $P_{Di}^{0}$ is the default power load at the $i$-th bus and $\delta$ is the percentage of the sample range, e.g., 10%. The sampled loads are then fed into a traditional quadratic programming solver to generate the optimal solutions. Uniform sampling is applied to avoid the over-fitting issue, which is common in generic DNN approaches. For load inputs of large dimension, a uniform mechanism may not be sufficient to guarantee enough good samples, especially near the boundary. In those cases, Markov chain Monte Carlo (MCMC) methods can be applied to sample according to a preset probability distribution, so as to collect sufficient samples near the boundary of the sampling space and obtain a dense sample set around the significant elements of the load vector. After that, the training data is normalized (using the statistical mean and standard deviation) to improve training efficiency.
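The sampling and normalization steps above can be sketched as follows (a minimal sketch; the default loads, the 10% range, and the function names are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_loads(p_d_default, delta, num_samples):
    """Sample load vectors uniformly within +/- delta of the default loads."""
    low = (1.0 - delta) * p_d_default
    high = (1.0 + delta) * p_d_default
    return rng.uniform(low, high, size=(num_samples, len(p_d_default)))

def normalize(samples):
    """Normalize each load dimension by its statistical mean and standard deviation."""
    mean = samples.mean(axis=0)
    std = samples.std(axis=0)
    return (samples - mean) / std, mean, std

p_d_default = np.array([50.0, 80.0, 30.0])      # hypothetical default loads (MW)
samples = sample_loads(p_d_default, 0.1, 1000)  # 10% sampling range
normed, mean, std = normalize(samples)
```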
IV-D Generation Prediction and Phase Angle Reconstruction
We express each generation as follows, for $i = 1, \ldots, N_g$: $P_{Gi} = \alpha_i \left( P_{Gi}^{\max} - P_{Gi}^{\min} \right) + P_{Gi}^{\min}, \ \alpha_i \in [0, 1],$ where $\alpha_i$ is a scaling factor. Instead of predicting the generations, which have diverse value ranges, we predict the scaling factors $\alpha_i$ and recover $P_{Gi}$. This simplifies the DNN output-layer design, to be discussed later. Note that the generation of the slack bus is obtained by subtracting the generations of the other buses from the total load.
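The scaling-factor recovery admits a one-line implementation; the generator limits below are hypothetical values for illustration:

```python
import numpy as np

def recover_generation(alpha, p_min, p_max):
    """Map predicted scaling factors in [0, 1] back to generation levels."""
    return alpha * (p_max - p_min) + p_min

# hypothetical generation limits for three generators (MW)
p_min = np.array([10.0, 20.0, 0.0])
p_max = np.array([100.0, 150.0, 80.0])

alpha = np.array([0.0, 0.5, 1.0])              # predicted scaling factors
p_g = recover_generation(alpha, p_min, p_max)  # endpoints map to the limits
```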
Once we obtain the generations, we directly compute the phase angles by a useful property of the admittance matrices [33, 34]. We first obtain an $(N-1) \times (N-1)$ matrix $\tilde{B}^{(c)}$ by eliminating the row and column corresponding to the slack bus from the admittance matrix $B^{(c)}$, for each contingency $c = 0, 1, \ldots, C$. It is well understood that $\tilde{B}^{(c)}$ is a full-rank matrix. Then we compute an $(N-1)$-dimensional phase-angle vector as $\tilde{\theta}^{(c)} = \big( \tilde{B}^{(c)} \big)^{-1} \big( \tilde{P}_G - \tilde{P}_D \big),$ where $\tilde{P}_G$ and $\tilde{P}_D$ stand for the $(N-1)$-dimensional generation and load vectors for the buses excluding the slack bus, respectively. At the end, we output the $N$-dimensional phase-angle vector by inserting a constant, representing the phase angle of the slack bus, into $\tilde{\theta}^{(c)}$.
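A minimal sketch of this reconstruction step, assuming a toy 3-bus network and a zero slack-bus angle (solving the reduced linear system rather than forming the inverse explicitly):

```python
import numpy as np

def reconstruct_angles(b_full, p_g, p_d, slack=0):
    """Recover bus phase angles from net injections via the reduced B matrix.

    b_full : (N, N) DC admittance matrix for one contingency
    p_g, p_d : (N,) generation and load vectors
    slack : index of the slack bus, whose angle is fixed to 0
    """
    n = b_full.shape[0]
    keep = [i for i in range(n) if i != slack]
    b_red = b_full[np.ix_(keep, keep)]         # full-rank (N-1) x (N-1) matrix
    inj = (p_g - p_d)[keep]                    # net injections, slack bus removed
    theta = np.zeros(n)
    theta[keep] = np.linalg.solve(b_red, inj)  # solve rather than invert
    return theta

# toy 3-bus chain: lines (0,1) and (1,2), each with reactance 0.1
x = 0.1
b = np.array([[ 1/x, -1/x,  0.0],
              [-1/x,  2/x, -1/x],
              [ 0.0, -1/x,  1/x]])
p_g = np.array([1.0, 0.0, 0.0])   # all generation at the slack bus
p_d = np.array([0.0, 0.4, 0.6])
theta = reconstruct_angles(b, p_g, p_d)
```

The recovered angles reproduce the full power-flow equations, i.e., $B\theta$ equals the net injection vector.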
Again, there are two advantages of this approach. On one hand, we use the property of the admittance matrix to reduce the number of variables to predict by our neural network, cutting down the size of our DNN model and the amount of training data/time needed. On the other hand, the equality constraints involving the generations and the phase angles can be satisfied automatically, which is difficult to handle in alternative learning-based approaches.
IV-E The DNN Model
The core of DeepOPF is the DNN model, which is applied to approximate the load-to-generation mapping for a given power network. The DNN model is based on the multi-layer feed-forward neural network structure, which consists of a typical three-level architecture: one input layer, several hidden layers, and one output layer. More specifically, the applied DNN model is defined as: $h_0 = \hat{P}_D; \quad h_i = \sigma\left( W_i h_{i-1} + b_i \right), \ i = 1, \ldots, L; \quad \hat{\alpha} = \sigma'\left( W_o h_L + b_o \right),$ where $\hat{P}_D$ denotes the input vector of the network, $h_i$ is the output vector of the $i$-th hidden layer, and $\hat{\alpha}$ is the output vector (of the output layer), i.e., the generated scaling-factor vector for the generators. The matrices $W_i$, bias vectors $b_i$, and activation functions $\sigma$ and $\sigma'$ are subject to design.
IV-E1 The architecture
In the DNN model, $h_0$ represents the normalized load data, which is the input of the network. Features are then learned from the input vector by several fully connected hidden layers. The $i$-th hidden layer models the interactions between features by introducing a connection weight matrix $W_i$ and a bias vector $b_i$. The activation function $\sigma(\cdot)$ further introduces non-linearity into the hidden layers. In our DNN model, we adopt the widely used Rectified Linear Unit (ReLU), $\sigma(x) = \max(x, 0)$, as the activation function of the hidden layers, which helps accelerate convergence and alleviate the vanishing-gradient problem. In addition, the Sigmoid function, $\sigma'(x) = 1 / (1 + e^{-x})$, is applied at the output layer to project the outputs of the network to $(0, 1)$.
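A framework-agnostic sketch of this forward pass (the layer sizes and random weights are illustrative; the paper's actual model is trained in PyTorch):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(load, weights, biases):
    """ReLU hidden layers followed by a sigmoid output layer in (0, 1)."""
    h = load
    for w, b in zip(weights[:-1], biases[:-1]):
        h = relu(w @ h + b)                       # hidden layers
    return sigmoid(weights[-1] @ h + biases[-1])  # scaling factors in (0, 1)

rng = np.random.default_rng(1)
dims = [4, 32, 16, 3]  # hypothetical: 4 load inputs -> 3 scaling factors
weights = [rng.normal(scale=0.1, size=(dims[i + 1], dims[i])) for i in range(3)]
biases = [np.zeros(dims[i + 1]) for i in range(3)]
alpha = forward(rng.random(4), weights, biases)
```

By construction, every output lies strictly inside $(0, 1)$, matching the scaling-factor range.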
IV-E2 The loss function
After constructing the DNN model, we need to design a corresponding loss function to guide the training. Since there exists a linear correspondence between the phase angles and the generations, there is no need to introduce a loss term for the phase angles. The difference between the generated solution and the actual solution is expressed by the mean squared error between the generated scaling factors $\hat{\alpha}_i$ and the optimal scaling factors $\alpha_i$: $\mathcal{L}_{\alpha} = \frac{1}{N_g} \sum_{i=1}^{N_g} \left( \hat{\alpha}_i - \alpha_i \right)^2,$ where $N_g$ represents the number of generators.
Meanwhile, we introduce a penalty term related to the inequality constraints into the loss function. We first introduce an $E \times N$ matrix $A^{(c)}$ for each contingency, where $E$ is the number of adjacent bus pairs, i.e., transmission lines. Each row in $A^{(c)}$ corresponds to an adjacent bus pair. Given the adjacent bus pair $(i, j)$ under the $c$-th contingency, let the power flow from the $i$-th bus to the $j$-th bus be $(\theta_i^{(c)} - \theta_j^{(c)}) / x_{ij}$. Thus the corresponding entries of the $k$-th row of $A^{(c)}$ are defined as $a_{ki} = 1/x_{ij}$ and $a_{kj} = -1/x_{ij}$, with all remaining entries zero, so that the $k$-th element of $A^{(c)} \theta^{(c)}$ is the power flow on the $k$-th line. Note that $\theta^{(c)}$ is the phase-angle vector generated based on (5) and the discussion below it, computed from the predicted generations and the loads. We can then calculate the penalty value for each line and add the average penalty into the loss function for training. The penalty term capturing the feasibility of the generated solutions is defined as: $\mathcal{L}_{pen} = \frac{1}{E} \sum_{k=1}^{E} \max\left( \big| \big( A^{(c)} \theta^{(c)} \big)_k \big| - P_k^{\max}, \ 0 \right),$ where $P_k^{\max}$ is the transmission capacity of the $k$-th line.
Thus, for each item in the training data set, the loss function consists of two parts: the difference between the generated solution and the reference solution, and the penalty upon solutions being infeasible. The total loss can be expressed as a weighted sum of the two parts: $\mathcal{L} = w_1 \mathcal{L}_{\alpha} + w_2 \mathcal{L}_{pen},$ where $w_1$ and $w_2$ are positive weighting factors balancing the influence of each term in the training phase.
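The two-part loss can be sketched as follows; the flow values, limits, and default weights are illustrative assumptions:

```python
import numpy as np

def total_loss(alpha_pred, alpha_opt, line_flows, line_limits, w1=1.0, w2=1.0):
    """Weighted sum of scaling-factor MSE and average line-limit violation."""
    mse = np.mean((alpha_pred - alpha_opt) ** 2)
    violation = np.maximum(np.abs(line_flows) - line_limits, 0.0)
    return w1 * mse + w2 * violation.mean()

alpha_pred = np.array([0.5, 0.8])
alpha_opt = np.array([0.5, 0.6])
flows = np.array([0.9, -1.2])   # per-unit branch flows from the predicted angles
limits = np.array([1.0, 1.0])   # line capacities
loss = total_loss(alpha_pred, alpha_opt, flows, limits)  # 0.02 + 0.1 = 0.12
```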
IV-E3 The training process
In general, the training process can be regarded as minimizing the average value of the loss function over the given training data by tuning the parameters of the DNN model as follows: $\min_{\{W_i, b_i\}} \ \frac{1}{N_T} \sum_{t=1}^{N_T} \mathcal{L}_t,$ where $W_i$ and $b_i$ represent the connection weight matrix and bias vector of the $i$-th layer, $N_T$ is the amount of training data, and $\mathcal{L}_t$ is the loss of the $t$-th item in the training set. We apply a widely used optimization technique in deep learning, stochastic gradient descent (SGD), in the training stage; it is effective for large-scale data sets and economizes on the computational cost of each iteration by evaluating the gradient on a subset of the training data at each step.
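The mini-batch SGD idea can be illustrated on a toy least-squares model with a hand-computed gradient (a sketch of the optimization loop only, not of the paper's actual DNN training):

```python
import numpy as np

rng = np.random.default_rng(2)

# toy regression task: recover w_true from exactly realizable data
w_true = np.array([1.0, -2.0])
X = rng.normal(size=(512, 2))
y = X @ w_true

w = np.zeros(2)
lr, batch = 0.1, 64
for epoch in range(50):
    order = rng.permutation(len(X))  # reshuffle each epoch
    for start in range(0, len(X), batch):
        idx = order[start:start + batch]
        xb, yb = X[idx], y[idx]
        grad = 2.0 * xb.T @ (xb @ w - yb) / len(idx)  # gradient of the batch MSE
        w -= lr * grad                                 # SGD update
```

Each update touches only one mini-batch, which is what keeps the per-iteration cost low on large data sets.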
IV-F Post-Processing
After obtaining a solution, including the generations and phase angles, we check its feasibility by examining whether the constraints on the generation limits and the line transmission limits are satisfied. We output the solution if it passes the feasibility test. Otherwise, we solve the following quadratic programming problem,
to project the infeasible solution onto the constraint set and output the projected (and thus feasible) solution.
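A simplified sketch of the post-processing logic: it checks feasibility and, instead of the full quadratic-programming projection used in the paper, projects only onto the generation box limits via clipping (handling the line limits as well would require a QP solver):

```python
import numpy as np

def postprocess(p_g, p_min, p_max):
    """Check feasibility; if violated, project onto the generation box limits.

    Note: clipping is the exact Euclidean projection only when the
    line-flow constraints are inactive; the paper projects onto the
    full constraint set with a QP.
    """
    feasible = bool(np.all((p_g >= p_min) & (p_g <= p_max)))
    if feasible:
        return p_g, True
    return np.clip(p_g, p_min, p_max), False

p_min = np.array([10.0, 0.0])
p_max = np.array([100.0, 80.0])
proj, was_feasible = postprocess(np.array([105.0, 40.0]), p_min, p_max)
```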
V Performance Analysis of DeepOPF
V-a Approximation Error of the Load-to-Generation Mapping
Given a power network, the SC-DCOPF problem is a quadratic programming problem with linear constraints. We denote the mapping between the load input and the optimal generation as $f^*$. Following the common practice in deep-learning analysis (e.g., [36, 37, 38]) and without loss of generality, we focus on the case of a one-dimensional output in the following analysis, i.e., $f^*$ is scalar-valued. A common practice to extend results for mappings with a one-dimensional output to mappings with multi-dimensional outputs is to view the latter as multiple mappings, each with a one-dimensional output, apply the results for the one-dimensional output multiple times, and combine them. Assuming the load input domain $\mathcal{D}$ is compact, which usually holds in practice, $f^*$ has certain properties.
The function $f^*$ is piece-wise linear. Moreover, it is Lipschitz-continuous; that is, there exists a constant $\Lambda > 0$ such that for any $d_1, d_2$ in the domain of $f^*$, $\left| f^*(d_1) - f^*(d_2) \right| \le \Lambda \left\| d_1 - d_2 \right\|.$
Define $\hat{f}$ as the mapping between the load input and the generation obtained by DeepOPF using a neural network with depth $L$ and a maximum of $M$ neurons per layer. Again, we study the case of a one-dimensional output. As $\hat{f}$ is generated by a neural network with ReLU activation functions, it is also piece-wise linear.
Before we proceed, we present a result on the approximation error between two scalar function classes, which can be of independent interest.
Let $\mathcal{F}$ be the class of two-segment piece-wise linear functions with a Lipschitz constant $\Lambda$ over an interval $[a, b]$ ($a < b$). Let $\mathcal{G}$ be the class of all linear scalar functions over $[a, b]$. Then, the following holds:
Essentially, the lemma gives a lower bound on the worst-case error when using a linear function to best approximate a two-segment piece-wise linear function. By generalizing Lemma 2 to multi-input functions, we study the approximation error between $f^*$ and $\hat{f}$.
Let $\mathcal{F}$ be the class of all possible $f^*$ with a Lipschitz constant $\Lambda$. Let $\mathcal{H}$ be the class of all $\hat{f}$ generated by a neural network with depth $L$ and a maximum of $M$ neurons per layer.
where $d_{\max}$ is the diameter of the load input domain $\mathcal{D}$.
The theorem characterizes a lower bound on the worst-case error of using neural networks to approximate load-to-generation mappings in SC-DCOPF problems. The bound is linear in $d_{\max}$, which captures the size of the load input domain, and in $\Lambda$, which captures the "curveness" of the mapping to learn. Meanwhile, interestingly, the approximation error bound decreases exponentially in the number of layers but only polynomially in the number of neurons per layer. This suggests the benefit of using "deep" neural networks in mapping approximation, similar to the observations in [36, 37, 38]. While our observations are similar to those in [36, 37, 38], there are distinct differences in the results and the proof techniques, as we exploit the piece-wise linear property of the function unique to our setting.
A useful corollary suggested by Theorem 3 is the following.
The following gives a condition on the neural network parameters such that it is possible to approximate the most difficult load-to-generation mapping with a Lipschitz constant $\Lambda$ up to an error of $\epsilon$:
where $d_{\max}$ is the diameter of the input domain $\mathcal{D}$.
V-B Computational Complexity
The computational complexity of conventional approaches is related to the scale of the SC-DCOPF problem. For example, the computational complexity of the interior-point-method-based approach for convex quadratic programming, measured as the number of arithmetic operations, is polynomial in the number of variables and the number of input bits. The corresponding complexity for the SC-DCOPF problem is obtained by plugging in its problem parameters.
The computational complexity of DeepOPF mainly consists of two parts: the forward-pass calculation as the input data passes through the DNN model, and the post-processing. The computational complexity of the post-processing is negligible in practice, as the DNN model barely generates infeasible solutions, as seen in Sec. VI. Thus, the computational complexity of DeepOPF is dominated by the calculation with respect to the DNN model, which can be evaluated by the method in .
Specifically, recall that the number of buses and the number of contingencies are $N$ and $C$, respectively. The input and the output of the DNN model have $N$ and $N_g$ dimensions, respectively, and the DNN model has $L$ hidden layers, each with at most $M$ neurons. Once we finish training the DNN model, the complexity of generating solutions by using DeepOPF is characterized in the following proposition.
The computational complexity (measured as the number of arithmetic operations) to generate the solution to the SC-DCOPF problem by using DeepOPF is
which is .
From empirical experience, we set $M$ to be on the same order as $N$ and set $L$ to be a small constant. The resulting complexity of DeepOPF, which also takes into account the complexity of recovering the phase angles and verifying the feasibility of solutions, is significantly smaller than that of the interior-point method. Our simulation results in the next section corroborate this observation.
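The dominant forward-pass cost can be counted directly; the layer widths below are hypothetical:

```python
def dnn_ops(dims):
    """Count multiply-add operations of one dense forward pass.

    dims lists the layer widths [input, hidden..., output]; a dense layer
    with n_in inputs and n_out outputs costs roughly 2 * n_in * n_out
    operations (multiplies plus additions), ignoring activations.
    """
    return sum(2 * n_in * n_out for n_in, n_out in zip(dims[:-1], dims[1:]))

# hypothetical widths for a 118-bus case: 99 load inputs -> 54 scaling factors
ops = dnn_ops([99, 64, 32, 16, 54])
```

The count grows linearly in the number of layers but quadratically in the layer width, consistent with the discussion above.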
V-C Trade-off between Accuracy and Complexity
The results in Theorem 3 and Proposition 5 suggest a trade-off between accuracy and complexity. In particular, we can tune the number of hidden layers and the number of neurons per layer to trade between the approximation accuracy and the computational complexity of the DNN approach. It appears desirable to design multi-layer neural networks in DeepOPF, as increasing the depth may reduce the approximation error exponentially while only increasing the complexity linearly.
The number of load buses is determined based on the default load on each bus: if the default active-power load on a bus is nonzero, the bus is considered a load bus, and vice versa. The values of these parameters are not unique; different combinations of the parameters may achieve similar performance.
VI Numerical Experiments
VI-A Experiment Setup
VI-A1 Simulation environment
The experiments are conducted on a quad-core CPU workstation with 16 GB RAM, running CentOS 7.6.
VI-A2 Test case
VI-A3 Training data
VI-A4 The implementation of the DNN model
We design the DNN model based on the PyTorch platform and apply the stochastic gradient descent method to train the neural network. In addition, the number of epochs is set to 200 and the batch size is 64. We set the weighting factors $w_1$ and $w_2$ in the loss function in (10) based on empirical experience. The remaining parameters, including the number of hidden layers and the number of neurons in each layer for each test case, are shown in Table I. We illustrate the detailed architecture of our DNN model for the IEEE case30 in Fig. 3.
VI-A5 Evaluation Metrics
We compare the performance of DeepOPF and a state-of-the-art Gurobi solver using the following metrics, averaged over 10,000 instances. The first is the percentage of feasible solutions obtained by each approach (for DeepOPF, we only count the feasible solutions before post-processing). The second is the objective cost obtained by each approach. The third is the running time, i.e., the average computation time for obtaining solutions for the instances. We then compute the speedup as the ratio between the running times of the Gurobi solver and DeepOPF.
VI-B Performance Evaluation
The simulation results for the test cases are shown in Table II, from which we make several observations. First, as compared to the Gurobi solver, our DeepOPF approach speeds up the computing time by up to three orders of magnitude. The speedup is increasingly significant as the test cases get larger, suggesting that our DeepOPF approach is more efficient for large-scale power networks. Second, the percentage of feasible solutions obtained by DeepOPF is 100% before post-processing, which implies that DeepOPF barely generates infeasible solutions and can find feasible solutions through the learned mapping. Third, the cost difference between the DeepOPF solution and the reference Gurobi solution is negligible, which means each dimension of the generated solution has high accuracy when compared to the optimal solution.
To further understand the performance of DeepOPF, we plot the empirical cumulative distributions of the speedup and the optimality loss for the IEEE 118-bus test case in Fig. 4. As seen, DeepOPF consistently achieves excellent optimality-loss and speedup performance across all test instances. Overall, our results show that DeepOPF can generate solutions with minor optimality loss within a fraction of the time used by the Gurobi solver.
VI-C The Benefit of Multi-layer Structure
We also carry out comparative experiments to compare the optimality loss and speedup of DeepOPF with different numbers of neural-network layers, for the IEEE case118.
DeepOPF-V1: A simple network with one hidden layer of 16 neurons.
DeepOPF-V2: A network with two hidden layers; the numbers of neurons in the hidden layers are 32 and 16, respectively.
DeepOPF-V3: A network with three hidden layers; the numbers of neurons in the hidden layers are 64, 32, and 16, respectively.
The results are shown in Table III. In alignment with our theoretical analysis in Sec. V-A, increasing the depth and the size of the neural network improves the optimality-loss performance, at the (minor) cost of longer computing time.
VII Conclusion
We develop DeepOPF for solving the SC-DCOPF problem. DeepOPF is inspired by the observation that solving SC-DCOPF for a given power network is equivalent to learning a high-dimensional mapping between the load inputs and the dispatch and transmission decisions. DeepOPF employs a DNN to learn such a mapping. With the learned mapping, it first obtains the generations from the load inputs and then directly computes the phase angles from the generations and loads. We characterize the approximation capability and computational complexity of DeepOPF. Simulation results also show that DeepOPF scales well with the problem size and speeds up the computing time by up to 400x as compared to conventional approaches. Future directions include extending DeepOPF to the AC-OPF setting and exploring joint learning-based and optimization-based algorithm design.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Proceedings of the International Conference on Neural Information Processing Systems, vol. 1, Lake Tahoe, Nevada, USA, 2012, pp. 1097–1105.
-  I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep Learning. MIT Press Cambridge, 2016, vol. 1.
-  P. Covington, J. Adams, and E. Sargin, “Deep Neural Networks for YouTube Recommendations,” in Proceedings of the ACM Conference on Recommender Systems, New York, NY, USA, Sep 2016, pp. 191–198.
-  F. Wan, L. Hong, A. Xiao, T. Jiang, and J. Zeng, “NeoDTI: neural integration of neighbor information from a heterogeneous network for discovering new drug-target interactions,” Bioinformatics, vol. 35, no. 1, pp. 104–111, Jul 2018.
-  D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, Jan 2016.
-  A. Toshev and C. Szegedy, “Deeppose: Human pose estimation via deep neural networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
-  J. Carpentier, “Contribution to the economic dispatch problem,” Bulletin de la Societe Francoise des Electriciens, vol. 3, no. 8, pp. 431–447, 1962.
-  D. E. Johnson, J. R. Johnson, J. L. Hilburn, and P. D. Scott, Electric Circuit Analysis. Prentice Hall Englewood Cliffs, 1989, vol. 3.
-  S. Frank, I. Steponavice, and S. Rebennack, “Optimal power flow: a bibliographic survey i,” Energy Systems, vol. 3, no. 3, pp. 221–258, Sep 2012.
-  ——, “Optimal power flow: a bibliographic survey ii,” Energy Systems, vol. 3, no. 3, pp. 259–289, Sep 2012.
-  M. B. Cain, R. P. O’Neill, and A. Castillo, “History of optimal power flow and formulations,” Federal Energy Regulatory Commission, vol. 1, pp. 1–36, 2012.
-  F. Capitanescu, J. M. Ramos, P. Panciatici, D. Kirschen, A. M. Marcolini, L. Platbrood, and L. Wehenkel, “State-of-the-art, challenges, and future trends in security constrained optimal power flow,” Electric Power Systems Research, vol. 81, no. 8, pp. 1731 – 1741, 2011. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0378779611000885
-  N. Chiang and A. Grothey, “Solving security constrained optimal power flow problems by a structure exploiting interior point method,” Optimization and Engineering, vol. 16, no. 1, pp. 49–71, 2015.
-  K. Hornik, “Approximation capabilities of multilayer feedforward networks,” Neural networks, vol. 4, no. 2, pp. 251–257, 1991.
-  B. Karg and S. Lucia, “Efficient representation and approximation of model predictive control laws via deep learning,” arXiv preprint arXiv:1806.10644, 2018.
-  J. A. Momoh and J. Z. Zhu, “Improved interior point method for OPF problems,” IEEE Transactions on Power Systems, vol. 14, no. 3, pp. 1114–1120, Aug 1999.
-  J. A. Momoh, “A generalized quadratic-based model for optimal power flow,” in Proceedings of IEEE International Conference on Systems, Man and Cybernetics, vol. 1, Cambridge, MA, USA, Nov 1989, pp. 261–271.
-  S. H. Low, “Convex relaxation of optimal power flow—part i: Formulations and equivalence,” IEEE Transactions on Control of Network Systems, vol. 1, no. 1, pp. 15–27, March 2014.
-  A. A. Sousa and G. L. Torres, “Globally convergent optimal power flow by trust-region interior-point methods,” in 2007 IEEE Lausanne Power Tech, Lausanne, Switzerland, Jul 2007, pp. 1386–1391.
-  P. E. O. Yumbla, J. M. Ramirez, and C. A. C. Coello, “Optimal power flow subject to security constraints solved with a particle swarm optimizer,” IEEE Transactions on Power Systems, vol. 23, no. 1, pp. 33–40, 2008.
-  V. J. Gutierrez-Martinez, C. A. Cañizares, C. R. Fuerte-Esquivel, A. Pizano-Martinez, and X. Gu, “Neural-network security-boundary constrained optimal power flow,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 63–72, 2010.
-  F. Thams, L. Halilbasic, P. Pinson, S. Chatzivasileiadis, and R. Eriksson, “Data-driven security-constrained OPF,” in X Bulk Power Systems Dynamics and Control Symposium, 2017.
-  L. Halilbašić, F. Thams, A. Venzke, S. Chatzivasileiadis, and P. Pinson, “Data-driven security-constrained AC-OPF for operations and markets,” in 2018 Power Systems Computation Conference (PSCC). IEEE, 2018, pp. 1–7.
-  Y. Ng, S. Misra, L. A. Roald, and S. Backhaus, “Statistical Learning For DC Optimal Power Flow,” arXiv preprint arXiv:1801.07809, 2018.
-  D. Deka and S. Misra, “Learning for DC-OPF: Classifying active sets using neural nets,” arXiv preprint arXiv:1902.05607, 2019.
-  K. Baker, “Learning Warm-Start Points for AC Optimal Power Flow,” arXiv preprint arXiv:1905.08860, 2019.
-  X. Pan, T. Zhao, and M. Chen, “DeepOPF: Deep neural network for DC optimal power flow,” arXiv preprint arXiv:1905.04479, 2019.
-  R. D. Christie, B. F. Wollenberg, and I. Wangensteen, “Transmission management in the deregulated environment,” Proceedings of the IEEE, vol. 88, no. 2, pp. 170–195, Feb 2000.
-  X. Cheng and T. J. Overbye, “PTDF-based power system equivalents,” IEEE Transactions on Power Systems, vol. 20, no. 4, pp. 1868–1876, 2005.
-  V. H. Hinojosa and F. Gonzalez-Longatt, “Preventive Security-Constrained DCOPF Formulation Using Power Transmission Distribution Factors and Line Outage Distribution Factors,” Energies, vol. 11, no. 6, 2018.
-  J. H. Park, Y. S. Kim, I. K. Eom, and K. Y. Lee, “Economic load dispatch for piecewise quadratic cost function using hopfield neural network,” IEEE Transactions on Power Systems, vol. 8, no. 3, pp. 1030–1038, Aug 1993.
-  Gurobi Optimization, LLC, “Gurobi optimizer reference manual,” 2019. [Online]. Available: http://www.gurobi.com
-  P. J. Martínez-Lacañina, J. L. Martínez-Ramos, A. de la Villa-Jaén, and A. Marano-Marcolini, “DC corrective optimal power flow based on generator and branch outages modelled as fictitious nodal injections,” IET Generation, Transmission & Distribution, vol. 8, no. 3, pp. 401–409, 2013.
-  S. Chatzivasileiadis, “Lecture Notes on Optimal Power Flow (OPF),” 2018. [Online]. Available: http://arxiv.org/abs/1811.00943
-  A. M. Kettner and M. Paolone, “On the properties of the power systems nodal admittance matrix,” IEEE Transactions on Power Systems, vol. 33, no. 1, pp. 1130–1131, Jan 2018.
-  D. Yarotsky, “Error bounds for approximations with deep ReLU networks,” Neural Networks, vol. 94, pp. 103–114, 2017.
-  I. Safran and O. Shamir, “Depth-width Tradeoffs in Approximating Natural Functions with Neural Networks,” in Proceedings of the 34th International Conference on Machine Learning - Volume 70, ser. ICML’17, 2017, pp. 2979–2987.
-  S. Liang and R. Srikant, “Why deep neural networks for function approximation?” arXiv preprint arXiv:1610.04161, 2016.
-  G. F. Montufar, R. Pascanu, K. Cho, and Y. Bengio, “On the number of linear regions of deep neural networks,” in Advances in Neural Information Processing Systems 27, Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, Eds. Curran Associates, Inc., 2014, pp. 2924–2932. [Online]. Available: http://papers.nips.cc/paper/5422-on-the-number-of-linear-regions-of-deep-neural-networks.pdf
-  Y. Ye and E. Tse, “An extension of karmarkar’s projective algorithm for convex quadratic programming,” Mathematical Programming, vol. 44, no. 1, pp. 157–179, May 1989.
-  K. He and J. Sun, “Convolutional neural networks at constrained time cost,” in Proceeding of IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, June 2015, pp. 5353–5360.
-  “Power Systems Test Case Archive,” 2018, http://labs.ece.uw.edu/pstca/.
-  C. H. Liang, C. Y. Chung, K. P. Wong, and X. Z. Duan, “Parallel Optimal Reactive Power Flow Based on Cooperative Co-Evolutionary Differential Evolution and Power System Decomposition,” IEEE Transactions on Power Systems, vol. 22, no. 1, pp. 249–257, Feb 2007.
-  “IEEE case300 topology,” 2018, https://www.al-roomi.org/power-flow/300-bus-system.
-  H. Wang, C. E. Murillo-Sanchez, R. D. Zimmerman, and R. J. Thomas, “On computational issues of market-based optimal power flow,” IEEE Transactions on Power Systems, vol. 22, no. 3, pp. 1185–1193, Aug 2007.
-  M. Telgarsky, “Benefits of depth in neural networks,” arXiv preprint arXiv:1602.04485, 2016.
Appendix A Proof of Lemma 1
We now show that the considered piecewise-linear function $f$ with a one-dimensional output is Lipschitz-continuous on the input domain $\mathcal{D}$, which can be partitioned into convex polyhedral regions $\mathcal{D}_1, \ldots, \mathcal{D}_p$. The mapping is piecewise linear and can be defined as follows:
\[ f(x) = w_i^\top x + b_i, \quad x \in \mathcal{D}_i, \; i = 1, \ldots, p, \]
where $w_i \in \mathbb{R}^n$ and $b_i \in \mathbb{R}$. For any $x_1, x_2 \in \mathcal{D}$, the line segment joining $x_1$ and $x_2$ crosses a finite sequence of regions, and $f$ is continuous and linear on each of them; applying the triangle inequality along this segment, we have
\[ |f(x_1) - f(x_2)| \le \max_{1 \le i \le p} \|w_i\|_2 \cdot \|x_1 - x_2\|_2. \]
Thus, let $\Lambda = \max_{1 \le i \le p} \|w_i\|_2$. We have
\[ |f(x_1) - f(x_2)| \le \Lambda \|x_1 - x_2\|_2, \quad \forall\, x_1, x_2 \in \mathcal{D}. \]
Therefore, $f$ is Lipschitz-continuous with Lipschitz constant $\Lambda$. ∎
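As a quick numerical illustration of Lemma 1, consider a hypothetical scalar piecewise-linear map (the break points, slopes, and intercepts below are arbitrary test choices, with intercepts picked so the pieces join continuously); the empirical Lipschitz ratio never exceeds the largest per-piece slope magnitude.

```python
import numpy as np

breaks = [-1.0, 0.5]              # region boundaries partitioning the domain
slopes = [2.0, -3.0, 1.5]         # w_i for each region
intercepts = [0.0, -5.0, -7.25]   # b_i chosen so the pieces join continuously

def f(x):
    i = sum(x > b for b in breaks)    # index of the region containing x
    return slopes[i] * x + intercepts[i]

Lam = max(abs(w) for w in slopes)     # Lipschitz constant from the lemma
xs = np.linspace(-3, 3, 2001)
ratios = [abs(f(u) - f(v)) / abs(u - v)
          for u in xs[::50] for v in xs[::50] if u != v]
assert max(ratios) <= Lam + 1e-9      # no pair of points violates the bound
```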
Appendix B Proof of Lemma 2
We can derive the lower bound on the worst-case $\ell_\infty$-based approximation error as follows. Suppose we want to find a function $g$ belonging to the linear scalar function class to approximate a function $h$ belonging to the two-segment piecewise-linear function class with Lipschitz constant $\Lambda$, over an interval $[a, b]$ ($a < b$). An illustration is shown in Fig. 5. Let $\ell = (b-a)/2$ and $x_0 = (a+b)/2$. Let $h$ be the following tent-shaped function:
\[ h(x) = \begin{cases} \Lambda (x - a), & x \in [a, x_0], \\ \Lambda (b - x), & x \in (x_0, b], \end{cases} \]
so that $h(a) = h(b) = 0$ and $h(x_0) = \Lambda \ell$. Then, we can obtain the lower bound for the $\ell_\infty$-based approximation error of $g$ and $h$ by a case discussion on the values of $g$ at $a$, $x_0$, and $b$.

If $g(x_0) \le \Lambda\ell/2$. Under this case, we can get:
\[ \sup_{x \in [a,b]} |g(x) - h(x)| \ge h(x_0) - g(x_0) \ge \frac{\Lambda\ell}{2}. \]
Otherwise $g(x_0) > \Lambda\ell/2$, and since $g$ is linear, $g(a) + g(b) = 2 g(x_0) > \Lambda\ell$. If $g(a) \ge \Lambda\ell/2$, under this case we can have:
\[ \sup_{x \in [a,b]} |g(x) - h(x)| \ge g(a) - h(a) \ge \frac{\Lambda\ell}{2}. \]
Otherwise $g(a) < \Lambda\ell/2$, which implies $g(b) > \Lambda\ell - g(a) > \Lambda\ell/2$; we can consider the point $b$ and obtain the same result.

Thus overall, we observe
\[ \sup_{x \in [a,b]} |g(x) - h(x)| \ge \frac{\Lambda\ell}{2}. \]
For the worst-case $\ell_\infty$-based approximation error, we have
\[ \sup_{h} \inf_{g} \sup_{x \in [a,b]} |g(x) - h(x)| \ge \frac{\Lambda\ell}{2} = \frac{\Lambda (b-a)}{4}. \] ∎
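The lower bound can be checked numerically on a toy instance: take the tent function $h(x) = \Lambda \min(x-a,\, b-x)$ on $[a,b]$ and grid-search over linear approximants $g(x) = cx + d$; no choice of $(c, d)$ gets the worst-case error below $\Lambda\ell/2$ (the grid ranges below are arbitrary test choices).

```python
import numpy as np

a, b, Lam = 0.0, 2.0, 1.0
l = (b - a) / 2.0
xs = np.linspace(a, b, 1001)
h = Lam * np.minimum(xs - a, b - xs)    # the two-segment "tent" from the proof

best = np.inf
for c in np.linspace(-2, 2, 81):        # coarse grid over slopes...
    for d in np.linspace(-2, 2, 81):    # ...and intercepts of g(x) = c*x + d
        best = min(best, np.max(np.abs(c * xs + d - h)))
assert best >= Lam * l / 2 - 1e-9       # lower bound from the lemma
```

Here the bound is tight: the constant $g \equiv \Lambda\ell/2$ achieves worst-case error exactly $\Lambda\ell/2$.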
Appendix C Proof of Theorem 3
We can characterize the lower bound on the worst-case error of using neural networks to approximate load-to-generation mappings in SC-DCOPF problems as follows.
Suppose $\mathcal{F}$ is the family of piecewise-linear functions generated by a neural network with depth $D$ and a maximum of $M$ neurons per layer, on the load input domain with diameter $d$. The maximal number of segments any function belonging to $\mathcal{F}$ can have is denoted as $k_{\max}$. Let $\mathcal{H}$ be the class of all possible piecewise-linear mappings with a Lipschitz constant $\Lambda$. Let $h \in \mathcal{H}$ comprise $2(k_{\max}+1)$ linear segments of equal length: partition an interval $[0, d]$ in the domain into $k_{\max}+1$ subintervals $I_j = [2(j-1)\ell,\, 2j\ell]$, $j = 1, \ldots, k_{\max}+1$, and let $h$ be the tent function of Lemma 2 on each subinterval:
\[ h(x) = \Lambda \cdot \min_{0 \le j \le k_{\max}+1} |x - 2j\ell|, \]
where $\ell = \frac{d}{2(k_{\max}+1)}$. Any $f \in \mathcal{F}$ has at most $k_{\max}$ segments and hence at most $k_{\max}-1$ breakpoints, so at least one subinterval $I_j$ contains no breakpoint of $f$, i.e., $f$ is linear on $I_j$. According to Lemma 2, on the interval $I_j$, we can have:
\[ \sup_{x \in I_j} |f(x) - h(x)| \ge \frac{\Lambda\ell}{2} = \frac{\Lambda d}{4(k_{\max}+1)}. \]
Meanwhile, we use the result of Telgarsky, of which the following is an immediate corollary:
The maximal number of linear segments generated from the family of ReLU neural networks with depth $D$ (the number of hidden layers) and maximal width $M$ (neurons per hidden layer) is at most $(2M)^D$.
By the above corollary, we have $k_{\max} \le (2M)^D$. Consequently,
\[ \sup_{h \in \mathcal{H}} \inf_{f \in \mathcal{F}} \sup_{x} |f(x) - h(x)| \ge \frac{\Lambda d}{4\left((2M)^D + 1\right)}. \] ∎
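The segment-count bound can be sanity-checked empirically: the sketch below counts the linear segments of a small, randomly initialized scalar ReLU network by detecting slope changes on a fine grid (an approximation that can only miss kinks, never invent segments beyond roughly two grid cells per kink) and compares against Telgarsky's $(2M)^D$ upper bound.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M = 2, 3   # depth (hidden layers) and width of a toy scalar ReLU network
Ws = [rng.normal(size=(M, 1))] + [rng.normal(size=(M, M)) for _ in range(D - 1)]
bs = [rng.normal(size=(M, 1)) for _ in range(D)]
w_out, b_out = rng.normal(size=(1, M)), rng.normal()

def net(x):
    h = np.full((1, 1), x)
    for W, b in zip(Ws, bs):
        h = np.maximum(W @ h + b, 0.0)    # ReLU hidden layers
    return float(w_out @ h + b_out)       # linear output layer

xs = np.linspace(-10, 10, 20001)
ys = np.array([net(x) for x in xs])
slopes = np.diff(ys) / np.diff(xs)
# Count slope changes (up to numerical tolerance) => number of linear segments.
segments = 1 + int(np.sum(np.abs(np.diff(slopes)) > 1e-6))
assert segments <= (2 * M) ** D           # Telgarsky's upper bound
```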
Appendix D Proof of Corollary 4
We next show how to derive Corollary 4. Suppose $\epsilon$ is defined as the upper bound on the worst-case approximation error, that is:
\[ \sup_{h \in \mathcal{H}} \inf_{f \in \mathcal{F}} \sup_{x} |f(x) - h(x)| \le \epsilon. \]
Then, we can derive the following inequality based on the above definition and Theorem 3:
\[ \frac{\Lambda d}{4\left((2M)^D + 1\right)} \le \epsilon. \]
After some transformations, we obtain the following necessary condition on the DNN's scale in Corollary 4, which must hold for the designed DNN to be able to approximate the most difficult load-to-generation mapping with a Lipschitz constant $\Lambda$, up to an error of $\epsilon$:
\[ (2M)^D \ge \frac{\Lambda d}{4\epsilon} - 1. \] ∎
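To illustrate the trade-off in this necessary condition, the sketch below solves for the smallest width $M$ satisfying $(2M)^D \ge \Lambda d / (4\epsilon) - 1$ for a given depth $D$; `min_width` is a hypothetical helper, not part of DeepOPF.

```python
import math

def min_width(Lam, d, eps, D):
    """Smallest width M with (2*M)**D >= Lam*d/(4*eps) - 1."""
    target = Lam * d / (4.0 * eps) - 1.0
    if target <= 1.0:
        return 1                          # any positive width suffices
    return math.ceil(target ** (1.0 / D) / 2.0)

# Tighter error targets or shallower networks require wider layers;
# e.g. min_width(1.0, 100.0, 0.01, D) shrinks rapidly as D grows.
```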
Appendix E Proof of Proposition 5
We next show how to derive the computational complexity of using the DNN model to obtain the generation output from a given load input. Recall that the input and the output of the DNN model in DeepOPF have $N_{\text{in}}$ and $N_{\text{out}}$ dimensions, respectively, and the DNN model has $N_{\text{hid}}$ hidden layers, where the $i$-th hidden layer has $M_i$ neurons, $i = 1, \ldots, N_{\text{hid}}$. Let $M = \max_i M_i$ denote the maximal number of neurons on a hidden layer. As we apply the fully-connected architecture, the output of each neuron is calculated by taking a weighted sum of the outputs of the neurons on the previous layer and passing it through an activation function; hence the computation at a neuron with $k$ inputs takes $\mathcal{O}(k)$ basic arithmetic operations.
Thus, the computational complexity (measured as the number of arithmetic operations) to generate the output from the input by the DNN model consists of the following three parts:
Complexity of the computation from the input to the first hidden layer. Each neuron on the first hidden layer takes the $N_{\text{in}}$-dimensional input, so the corresponding complexity is $\mathcal{O}(N_{\text{in}} M_1)$.
Complexity of the computation between consecutive hidden layers. Each neuron on a hidden layer takes the output of every neuron on the previous hidden layer as input, so the corresponding complexity is $\mathcal{O}\left(\sum_{i=2}^{N_{\text{hid}}} M_{i-1} M_i\right) = \mathcal{O}\left((N_{\text{hid}} - 1) M^2\right)$.
Complexity of the computation from the last hidden layer to the output. The output of each neuron on the last hidden layer is used to calculate each of the $N_{\text{out}}$ outputs, so the corresponding complexity is $\mathcal{O}(M_{N_{\text{hid}}} N_{\text{out}})$.
Hence, the overall complexity of the calculation by the DNN model is:
\[ \mathcal{O}\left(N_{\text{in}} M + (N_{\text{hid}} - 1) M^2 + M N_{\text{out}}\right). \] ∎
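The three-part count above can be sketched as follows; `dnn_ops` is a hypothetical helper that counts multiply-accumulate operations only (activations and biases add lower-order terms).

```python
def dnn_ops(n_in, hidden, n_out):
    """hidden is the list [M_1, ..., M_Nhid] of hidden-layer widths."""
    ops = n_in * hidden[0]                # input -> first hidden layer
    for prev, cur in zip(hidden, hidden[1:]):
        ops += prev * cur                 # consecutive hidden layers
    ops += hidden[-1] * n_out             # last hidden layer -> output
    return ops

# With M = max_i M_i, the total is O(N_in*M + (N_hid-1)*M^2 + M*N_out),
# matching the three parts derived above.
```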