I Introduction
In power flow studies, the linearized dc power flow (DCPF) model offers compelling computational advantages over the nonlinear ac power flow (ACPF) model based on the Newton-Raphson (NR) method, which requires iterative solutions. However, the DCPF model becomes increasingly inaccurate where its assumptions no longer hold, e.g., with high R/X ratios, large phase angles, and heavy or light loads [31]. Additionally, it can be difficult to find initial conditions from which NR-based ACPF converges [8]. This paper presents a framework that produces initial conditions that reduce the ACPF iterations and solution times, and that generalizes to grid topologies of different sizes. We use feedforward artificial neural networks, specifically one-dimensional convolutional neural networks (1D CNNs), to achieve these goals.
Feedforward neural networks with nonlinear activation functions can approximate any continuous function [34]. CNNs, in particular, can capture local features of interest more effectively than other feedforward neural networks such as multilayer perceptrons (MLPs), as demonstrated in many computer vision applications [30]. In the context of this paper, an example of such a local feature is the small voltage angle difference between neighboring buses. Additionally, we choose 1D CNNs since the bus signals we are interested in (real and reactive power, voltage magnitude and phase) can be represented as vectors.
For 1D CNNs trained on DCPF results as input data and corresponding ACPF results as ground-truth values, our goal is to produce bus voltage values that, when used as initial conditions for NR ACPF, result in fewer solution iterations and less solution time compared to cold-start (also known as “flat start”) conditions, i.e., a flat voltage profile (1.0 p.u. magnitude, zero angle) for all load (PQ) bus voltages [12], or warm-start conditions such as those generated by DCPF or past solutions.
Our proposed method considers only the fluctuations of PQ bus demands, i.e., we vary the real and reactive power demand levels at each load bus and solve for the voltage magnitude and phase at each bus, for a specific set of load bus demands. Fluctuations at the generator (PV) buses, e.g., real power injection variations from wind generation or changes in bus voltage magnitudes, are not included in our data, but can be incorporated relatively easily in future studies. In Section II, we review some related studies and provide a high-level description of the proposed model. We then present the data generation process, CNN training and the hot-start ACPF procedure in detail in Sections III and IV. Finally, in Section V, we present results based on the IEEE 118-bus and the Pegase 2869-bus systems [17, 10] available from Matpower [35]. We conclude with a short summary, pointing out some limitations of our proposed method as well as potential directions for future studies on this topic.
II Background and Proposed Method
II-A Related Work
There have been past attempts at either improving DCPF results or directly predicting ACPF results using artificial neural networks, specifically MLPs [9, 1, 2]. However, they suffer from one or more of the following deficiencies: insufficient dataset size, poorly justified MLP input feature selection that could potentially lead to numerical instability during training, arbitrary and/or unclear performance criteria, and small system sizes for which full ACPF can already be computed easily and efficiently. Aside from these, there have also been studies on solving or reducing the optimal power flow problem using artificial neural networks [25, 29, 26]. Compared to the MLPs that we trained on the same dataset (formatted differently from what is shown in Section III-C, for implementation purposes), 1D CNNs are capable of producing results with values of the error metric in Equation 3 that are almost 10 times smaller than those produced by MLPs. Generally speaking, although 1D CNNs take longer to train, they often have fewer model parameters and thus a lower memory requirement than MLPs to achieve the same or better results. This can be a major advantage for extremely large systems.
II-B Proposed Method
Our proposed method is as follows. Suppose that for a specific system, we have a set of load conditions that can be solved by warm-start ACPF. We need to find the respective load bus voltage magnitude and phase values that meet the mismatch tolerance for these load conditions. First, we take a subset of this set for “warm start,” and let the remainder be for “hot start.” Let T be the number of load conditions in the warm-start subset. Next, we compute the DCPF results for all load conditions, and compute ACPF results with DCPF results as initial conditions (i.e., warm start) for the load conditions in the warm-start subset. We then use the DCPF and ACPF results corresponding to these T load conditions as input data and output targets to train the 1D CNNs. Finally, for the load conditions in the hot-start subset, for which we only have the DCPF results, we produce hot-start conditions by passing their DCPF results and corresponding load conditions into the trained 1D CNNs, and compute the ACPF results for these load conditions with the hot-start conditions. This process is shown in Figure 1. Once the 1D CNNs are trained, any new load condition from the same system would follow the path that the hot-start load conditions take in Figure 1 — first compute the DCPF results, then use the trained 1D CNNs to generate the initial conditions to compute the ACPF results.
The reason to treat the CNN-predicted values as hot-start conditions for ACPF instead of as finished products is so that we have a fair comparison between our method and the ACPF model with warm-start conditions, since both, if they converge successfully, will have a final mismatch within the same tolerance level. Clearly, since ACPF is computationally expensive, and neural network training with more data takes more time (empirically, quasi-linearly on a single GPU), we would like to find the minimum T that provides reasonably good hot-start conditions for the remaining load conditions on the chosen CNN architecture. We also point out that in this study, we are investigating the feasibility of this approach by evaluating its effectiveness on the IEEE 118-bus and, more realistically, the larger Pegase 2869-bus systems; we are not focused on devising the best possible CNN architecture, as discussed in Section IV.
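The warm-start/hot-start split described above can be sketched as the following data flow. This is an illustrative Python skeleton, not the paper's implementation (which uses Matpower and Flux); the solver and model routines are passed in as hypothetical callables.

```python
def hot_start_pipeline(load_conditions, T, run_dcpf, run_acpf, train_cnn, cnn_predict):
    """Sketch of Section II-B: train on the first T (warm-start) load
    conditions, then produce hot-start ACPF results for the remainder."""
    S_w, S_h = load_conditions[:T], load_conditions[T:]

    # 1. DCPF for every load condition (cheap, non-iterative).
    dc = {c: run_dcpf(c) for c in load_conditions}

    # 2. Warm-start ACPF only for S_w: these become the training targets.
    ac_w = {c: run_acpf(c, dc[c]) for c in S_w}

    # 3. Train the 1D CNNs on (DCPF result, load condition) -> ACPF result pairs.
    model = train_cnn([dc[c] for c in S_w], [ac_w[c] for c in S_w])

    # 4. Hot-start ACPF for S_h with CNN-predicted initial conditions.
    return {c: run_acpf(c, cnn_predict(model, dc[c], c)) for c in S_h}
```

Any new load condition from the same system would take the same path as the conditions in S_h.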
III Data Generation
III-A Load Fluctuation Modeling
As discussed in Section II-B, the first step in creating our dataset is to generate load demand fluctuations for all PQ buses in the system. For each PQ bus in a Matpower case, we extract the default real power demand value as the mean real power demand, and compute the corresponding standard deviation as follows (from [7]):

(1)

Next, we generate samples from the resulting Gaussian distribution to create the real power demand fluctuations for each PQ bus; for every other bus we keep the default value. We generate the demand realizations for each bus independently and with a fixed random seed for reproducibility. We can now represent the real power demand fluctuations for all buses in a given system as the matrix P. To solve the power flow problem, the known values for each load bus are real power P and reactive power Q. Therefore, we now need to generate the Q matrix to similarly represent the fluctuations in reactive power. First, we generate p.f., a vector containing samples of lagging power factor, drawn from a Gaussian distribution truncated between 0.7 and 1.0. The choices of the mean and standard deviation are based on the distributions of power factor values for all PQ buses in the Matpower cases. We choose the truncation lower bound of 0.7 because a utility would step in and fix power factors lower than that (e.g., by penalizing businesses to discourage low power factors) to avoid losses [22, 27]. We then calculate each entry of the Q matrix as follows:
(2) 
As with the P matrix, we keep the default reactive power value for any bus that is not a PQ bus.
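The sampling procedure of this subsection can be sketched as follows. This is a minimal Python illustration, not the paper's MATLAB code: the standard-deviation rule (here sigma = 0.1 × mean) and the power-factor distribution parameters (mean 0.9, s.d. 0.05) are stand-in assumptions, since Equation 1 and the Matpower-derived values are not reproduced here; only the relation Q = P·tan(arccos(p.f.)) is the standard power-factor identity.

```python
import math
import random

def truncated_gauss(rng, mu, sigma, lo, hi):
    # Rejection sampling from N(mu, sigma) truncated to [lo, hi].
    while True:
        x = rng.gauss(mu, sigma)
        if lo <= x <= hi:
            return x

def sample_load(p_default, n_samples, seed=0):
    """Generate real/reactive demand samples for one PQ bus.
    ASSUMPTIONS: sigma = 0.1 * mu stands in for Eq. (1); the power-factor
    mean/sd (0.9, 0.05) stand in for the Matpower-derived values."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    mu, sigma = p_default, 0.1 * p_default
    P = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    # Lagging power factor, truncated Gaussian on [0.7, 1.0].
    pf = [truncated_gauss(rng, 0.9, 0.05, 0.7, 1.0) for _ in range(n_samples)]
    # Standard relation: Q = P * tan(arccos(power factor)).
    Q = [p * math.tan(math.acos(f)) for p, f in zip(P, pf)]
    return P, Q
```

Non-PQ buses would simply keep their default values instead of being sampled.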
III-B Power Flow Computation
As discussed in Section II, we first run DCPF based on the P, Q values for all load conditions and warm-start ACPF for the warm-start subset, then use the DCPF and ACPF results corresponding to the warm-start subset to train a CNN. For benchmarking purposes, we also run warm-start ACPF for the remaining P, Q values and present the warm-start performance in Section V. Once a model is trained, we can then follow Section II-B and only run hot-start ACPF for the remaining load levels or any new load conditions.
We use Matpower’s rundcpf and runpf functions to perform the dc and ac power flow computations, with the mismatch tolerance for runpf set in per unit. For each execution of rundcpf and runpf, we replace the default real and reactive power demand values of each bus with the corresponding entries from the columns of P and Q. We do not change any other values. We then collect the solved voltage magnitude and phase values — the DCPF voltage magnitudes (which will always be 1.0) and phases (in radians), along with the ACPF voltage magnitudes and phases (in radians). The reason to extract voltage phase values in radians is that large negative phase values in degrees easily cause exploding loss when passed through the ELU activation function (Equation 4). We collect the DCPF solution times as a vector to calculate the average hot-start ACPF time as described in Section V. We also collect the ACPF solution times and iteration counts corresponding to the remaining load levels as vectors, which are used for comparison with the hot-start ACPF performance.
Finally, since the phase values are in radians, we would like to ensure that all the input data are in a similar range, so that the model parameter updates are input-unit agnostic. Therefore, we perform the following data processing steps. We subtract 1.0, the nominal voltage, from all DCPF and ACPF voltage magnitudes, so that they have near-zero mean. We do not normalize the input and target voltage magnitude and phase values to a fixed range (as is done, e.g., in computer vision applications), since the lower and upper bounds of the ground truth, which the normalization would require, are not known a priori. We also compute the P and Q matrices with entries in per unit. We then construct matrices, with the same dimensions as P and Q, from the offset DCPF and ACPF voltage magnitudes and phases.
Note that in Section II-B, we assumed that none of the load conditions causes non-convergence of ACPF, so that the result matrices contain the same number of samples as the load condition set. However, this assumption is not realistic, since not all load conditions are guaranteed to converge with warm-start conditions. Therefore, if an ACPF execution for a warm-start load condition fails to converge, we add that condition to a set containing all load conditions that fail to converge. We stop the data generation process once enough samples have successfully converged. If any executions failed, we also note the successful ACPF convergence rate.
For the bus voltage magnitude and phase values generated by the 1D CNNs, we shift the voltage magnitude values back up by 1.0 and convert the phase values back to degrees as required by Matpower, before performing the ACPF computations with these as the initial conditions.
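The offset-and-convert steps in both directions can be captured by a pair of helpers; a small Python sketch (the paper's actual pipeline uses MATLAB/Matpower and Julia/Flux):

```python
import math

def preprocess(vm, va_deg):
    """Before training/inference: offset magnitudes to near-zero mean
    and convert phases to radians (Section III-B)."""
    return [v - 1.0 for v in vm], [math.radians(a) for a in va_deg]

def postprocess(vm_off, va_rad):
    """On CNN outputs: shift magnitudes back up by 1.0 and convert
    phases back to degrees, as Matpower requires."""
    return [v + 1.0 for v in vm_off], [math.degrees(a) for a in va_rad]
```

The two functions are exact inverses, so a round trip recovers the original Matpower-format values.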
III-C Dataset Format
We now form the dataset for training. The input to the 1D CNNs, containing the offset DCPF bus voltage values and the load conditions, is a tensor with the format shown in Figure 2. Specifically, a single sample of this tensor is shown in Figure 3.

We train two 1D CNNs with identical architectures (shown in Figure 4), both taking this tensor as input — one for producing hot-start bus voltage magnitudes (the V model) and the other for bus voltage phases (the phase model). This approach ensures that the 1D CNN model parameter updates are computed based on the loss with respect to distinct targets, the magnitudes and the phases, even though we offset the magnitudes to have near-zero mean.
Additionally, we reshape the input to add a “width” dimension of length 1, in order to fit the Width–Height–Channel–Samples format that the machine learning library we use, Flux [15], requires.

IV CNN Training
IV-A Loss Function and Performance Criteria
The loss function we use to train our model is the squared L2 norm of the difference between predicted values and ground truth, summed over all elements. This loss is commonly used for optimization in regression problems, and we found that it outperforms the mean square error, another popular loss function in regression problems, in terms of the error metric defined below. We do not have an accuracy measure, since there is no robust and generalizable way of establishing such a criterion for our problem. The downside of relative measurements (e.g., mean absolute relative error), in this problem in particular, is that the error would explode when the denominator is close or equal to zero, which is not uncommon among the voltage angle or offset voltage magnitude values. Instead, we compare the initial and final L2 norms on the test set to see how much the norm decreased at the end of training, via the metric in Equation 3: the ratio of the norm between the predicted hot-start conditions and the true ac power flow results to the norm between the dc and ac power flow results (both averaged over voltage magnitude and phase results), which we compute for benchmarking purposes but which will not be available in practice.

(3)
The effectiveness of the 1D CNNs, however, is best demonstrated by the performance of computing ACPF results for the hot-start load conditions, i.e., by how much the bus voltage values produced by the 1D CNNs decrease the ACPF iterations and solution time (as discussed in Section V).
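A sketch of the loss and the benchmarking metric in Python. The precise form of Equation 3 is not reproduced above, so the final-to-initial L2-distance ratio below is our reading of the surrounding description, not a verbatim transcription.

```python
import math

def sq_l2_loss(pred, truth):
    # Squared L2 norm of the prediction error (the training loss of Section IV-A).
    return sum((p - t) ** 2 for p, t in zip(pred, truth))

def l2_improvement(dc, hot, ac):
    """ASSUMED reading of Eq. (3): ||hot - ac|| / ||dc - ac||, as a percentage.
    ||dc - ac|| is the DCPF-to-ACPF gap (initial norm) and ||hot - ac|| the
    CNN-prediction-to-ACPF gap (final norm)."""
    initial = math.sqrt(sum((d - a) ** 2 for d, a in zip(dc, ac)))
    final = math.sqrt(sum((h - a) ** 2 for h, a in zip(hot, ac)))
    return 100.0 * final / initial
```

Under this reading, a warm start scores exactly 100% (its "prediction" is the DCPF result itself), matching the first rows of Tables I and II.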
IV-B CNN Model Architecture and Hyperparameter Selection
We determine the 1D CNN model architecture and select the hyperparameters in the following way. (For the meaning of the terms used here, please refer to the Appendix for a brief overview of CNNs.) For the IEEE 118-bus system, we first use a training set of fixed size and a set of hyperparameters commonly seen in machine learning applications [23, 13]: the initial learning rate, a batch size of 64, and a maximum of 500 epochs. With these hyperparameters selected, we train multiple 1D CNNs with different combinations of the number of convolutional layers, kernel sizes and number of channels. We then make a decision on these hyperparameters based on the validation set losses of each candidate architecture. We repeat this process for the Pegase 2869-bus system. Once we arrive at a satisfactory final validation loss, we go back and test our initial hyperparameter choices, again based on the validation loss. In our case studies, we keep the initial learning rate and maximum epochs the same as initially chosen, and only change the batch size from 64 to 32.

We will now discuss our chosen 1D CNN architectures in detail. Figure 4 shows the Height (which equals the number of buses) and Channel dimensions in each convolutional layer, with a single sample (as in Figure 3) as the input to each 1D CNN model. Each of the outputs in Figure 4 is a vector containing the predicted voltage magnitude or phase values; the output becomes a matrix when we feed more than one input sample into the 1D CNN. For the Pegase 2869-bus system, we use a deeper architecture with 5 convolutional layers and a final fully connected layer. The first convolutional layer has kernel size 7 with channel size 8, zero padding size 3 and stride 1; the remaining convolutional layers are identical, with kernel size 3, channel size 8, zero padding size 1 and stride 1. The CNN model for the smaller IEEE 118-bus system has three identical convolutional layers with kernel size 3, zero padding size 1 and stride 1, i.e., the same as the second through fifth convolutional layers of the larger model. The fully connected layer in both architectures first reshapes the data into a vector (for one sample), then produces the final bus voltage magnitude and phase values. Empirically, adding more than 8 channels to the convolutional layers results in worse validation-set performance. We apply the zero paddings and stride 1 to all convolutional layers, and omit pooling layers, since we want to keep the hidden layer dimension the same as the input feature vector dimension (i.e., the number of buses) throughout the architecture. The reason for this choice is that, for systems with an odd number of buses, convolution, pooling and upsampling operations with strides larger than 1 will produce predictions with one extra or one missing value, i.e., an extra or a missing bus. This is not to say that such architectures can never work in our case, since we can, for example, apply asymmetric padding to solve this off-by-one caveat. However, operations such as pooling and upsampling, even if they improve the final predictions, would make the justification of our method more difficult. In particular, the pooling operation is lossy, since it downsamples the input data into a low-dimensional representation.
Since the numbers of positive and negative values in our datasets are roughly equal, the nonlinear activation function we use has to account for both. We compared the performance and rate of convergence of multiple popular activation functions — Rectified Linear Unit (ReLU) [24], LeakyReLU [21], and Exponential Linear Unit (ELU) [5] — and chose ELU with its default parameter α = 1, which has the piecewise definition and derivative in Equations 4 and 5, respectively:

ELU(x) = x for x > 0, and α(exp(x) − 1) for x ≤ 0. (4)

ELU′(x) = 1 for x > 0, and α·exp(x) = ELU(x) + α for x ≤ 0. (5)
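Equations 4 and 5 translate directly into code; a minimal sketch:

```python
import math

def elu(x, alpha=1.0):
    # Eq. (4): identity for x > 0, alpha * (exp(x) - 1) otherwise.
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def elu_grad(x, alpha=1.0):
    # Eq. (5): 1 for x > 0, alpha * exp(x) = ELU(x) + alpha otherwise.
    return 1.0 if x > 0 else alpha * math.exp(x)
```

Unlike ReLU, ELU stays smooth and non-zero for negative inputs (saturating at −α), which suits a dataset with roughly equal numbers of positive and negative values.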
With the CNN architecture fixed for a particular system, the most important hyperparameter left in this problem is T, the training set size: we want to find the optimal trade-off point between training set size (with which the training time scales quasi-linearly, as Tables I and II in the next section show) and the quality of the trained model (i.e., how much its predictions decrease solution time and iterations). Since the training and validation targets are the ACPF results, a smaller T means a smaller number of warm-start ACPF executions in the data generation process, as mentioned in Section III.
Since we also computed the ACPF results for all load conditions for benchmarking purposes, we have ground-truth values for all samples. Thus, we can use the true ACPF results of the hot-start load conditions as the test set (in practice, we would not have these data, since we would not compute warm-start ACPF for all samples). We separate the T warm-start samples into training and validation sets with a 90/10 split. Since the dataset is generated with Gaussian distributions instead of gathered from real systems, the training and validation loss values remain extremely close throughout training; as a result, we do not need a large validation set to adjust model hyperparameters, or methods such as cross-validation during training.
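The 90/10 split can be done deterministically; a Python sketch, where the seed value is an arbitrary choice for reproducibility (not taken from the paper):

```python
import random

def split_90_10(indices, seed=42):
    """Deterministic 90/10 train/validation split of the T warm-start
    samples. The seed (42) is an illustrative assumption."""
    rng = random.Random(seed)
    idx = list(indices)
    rng.shuffle(idx)
    cut = int(0.9 * len(idx))
    return idx[:cut], idx[cut:]
```

For T = 5000, this yields 4500 training and 500 validation samples.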
Since learning hot-start conditions is a rather uncommon application of CNNs, we cannot take advantage of previously trained models and use transfer learning [28, 33] to accelerate training. Therefore, we initialize the CNN parameters with the Xavier Initialization [11], and train the models from scratch with the Adam optimizer [18]. During training, we randomly shuffle the batch indices with a fixed random seed before each epoch starts. We also use the following learning rate decay policy [19]: if the training set loss does not decrease for 5 consecutive epochs, we decrease the learning rate by a factor of 10 (down to a fixed floor) to encourage the model to jump out of local minima. Training is terminated if the elapsed epochs reach the maximum of 500 or the error metric of Equation 3 on the validation set becomes less than 0.01%.

V Case Study
The dataset generation and power flow computations are done with MATLAB and Matpower on a local computer with an Intel Core i7-8750H CPU and 32 GB RAM. The 1D CNN training is done on a Compute Canada cluster [14] using a single Intel Xeon Gold 5120 CPU and a single NVIDIA Tesla V100 GPU with 16 GB of GPU memory. We use Flux [15], an open-source machine learning library developed in the Julia language [3], for the 1D CNN implementation, training and inference.

The following tables show the average solution time and average iteration count over the hot-start samples, all of which successfully converged. The first row contains the warm-start performance, i.e., the average time for warm-start ACPF with DCPF solutions as initial conditions. The rows below contain the hot-start performances as T, the number of samples used in CNN training, is varied. The hot-start average time is the sum of the average DCPF solution time, the average hot-start ACPF solution time, and the average inference time (which is negligible compared to the other two). We report the hot-start results in 1000-sample increments of T. We start at T = 3000 for the Pegase 2869-bus case, since a smaller T produced initial conditions causing non-convergence for some load conditions. The error metric is calculated as described by Equation 3; in particular, it equals 100% for the warm-start results, whose initial conditions are the DCPF solutions themselves.
We also include the training times in the tables. These are extremely large compared to the power flow solution times, but they can be amortized over the usable lifetime of the model, since training is a one-time cost. Additionally, training can be performed in parallel on clusters with multiple GPUs, which can greatly reduce the time needed; for example, prior work [4, 16, 32] has shown that training throughput (the number of samples processed per second) increases quasi-linearly with the number of GPUs. The main reason the training times are included here is to show the trade-off between training time and the quality of the output initial conditions.
V-A IEEE 118-bus Results
From Table I, we can see that a moderate T results in a good balance between T (i.e., the number of warm-start executions required to train the 1D CNN) and the quality of the hot-start conditions. Although the largest T yields both a lower final error and fewer solution iterations, it comes at the cost of a longer training time. Compared to the warm-start results, the proposed method substantially reduces both the solution time and the average number of solution iterations required, with the chosen T.
              Avg. Time (ms)  Avg. Iter.  Eq. 3 Metric  Training Time (s)
Warm Start        1.57971       3.000       100%            N/A
Hot Start
  T=1000          1.22120       1.391       0.2185%         120.96713
  T=2000          1.06184       1.022       0.0929%         222.45857
  T=3000          1.04953       1.006       0.05937%        326.47676
  T=4000          1.05426       1.006       0.06306%        437.17291
  T=5000          1.07008       1.000       0.03435%        544.79339
V-B Pegase 2869-bus Results
From Table II, a mid-range T is a reasonably good choice for the larger study system, considering the rate of increase of the training time as T grows and the diminishing gain in the quality of the hot-start conditions: between the larger T values, the improvements in solution time and iterations are modest, while the training time increases by several hours. With the chosen T, the average ACPF solution time and the average ACPF iterations required are both decreased considerably compared to the warm-start results.
              Avg. Time (ms)  Avg. Iter.  Eq. 3 Metric  Training Time (s)
Warm Start       29.71916       4.000       100%            N/A
Hot Start
  T=3000         27.99254       3.146       0.2452%         4098.02066
  T=4000         27.11124       3.003       0.1688%         5450.61909
  T=5000         24.46336       2.896       0.1395%         6475.0688
  T=6000         22.86895       2.356       0.1038%         8952.58763
  T=7000         23.40030       2.435       0.09801%        14539.66965
  T=8000         20.78429       2.019       0.07374%        19398.17254
  T=9000         20.73071       2.000       0.05416%        24179.44558
VI Conclusion and Future Research Directions
In this paper, we propose a generalizable framework for obtaining better initial conditions for Newton-Raphson-based ACPF using 1D CNNs. The performance of the proposed method on the IEEE 118-bus and Pegase 2869-bus systems shows that it is capable of effectively decreasing both solution time and solution iterations.
Although our proposed method is shown to be generalizable to both large and small systems, we acknowledge one limitation: systems with different topologies (specifically, different numbers of buses and/or different connectivity) would require training different 1D CNNs to generate system-specific hot-start conditions. Consequently, we need to address the long training time associated with this limitation. As discussed in Sections IV-B and V, the CNN training time can be amortized, as it is a one-time cost for each system, and it can be further reduced by applying transfer learning and by training in parallel on multiple GPUs.
A potential research direction that builds on our proposed method is to incorporate the effect of ACPF solution times and/or iterations directly into the 1D CNN training stage, e.g., through a meta-optimization step similar to the one discussed in [29]. We can also include different contingency scenarios in the dataset generation step, and train 1D CNNs to produce initial conditions for performing contingency analysis more efficiently than with the DCPF model.
Finally, we could also try different CNN architectures, e.g., deeper architectures such as the fully convolutional ResNet in [20], although they would require significantly longer training time and a much larger and more diverse dataset to be properly trained.
Appendix: CNN Overview
We give a brief overview of CNNs and the terms used in previous sections, based mostly on [13]. The most important building block of a CNN is the convolutional layer, which contains the “filter” or “kernel” and produces an output called the “feature map.” These are controlled by adjustable hyperparameters such as kernel size (height, width, channels), stride, and zero padding size. In our 1D CNN, kernel size refers to the dimension of the kernel along the height dimension. The kernel contains the parameters of the CNN to be updated based on the loss computed by a loss function. The kernel values can be initialized randomly, or via a scheme such as the Xavier Initialization [11]. The operation of the kernel on the input is the 1D convolution commonly seen in many areas of engineering. The “stride” refers to the step size of the moving kernel, which in our case is 1. Our chosen architecture ends with a fully connected layer, which produces the desired number of values based on the outputs of the last convolutional layer, each resembling a target value during training. The parameters in this fully connected layer are the edge weights in a complete bipartite graph between the reshaped output of the last convolutional layer and the final outputs.
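The kernel operation described above can be made concrete with a toy single-channel example. As in most deep learning frameworks, "convolution" here is implemented as cross-correlation (no kernel flip); this is an illustrative sketch, not the paper's Flux implementation.

```python
def conv1d(signal, kernel, pad=1, stride=1):
    """Single-channel 1D convolution with zero padding: slide the kernel
    over the padded signal in steps of `stride`, taking a dot product at
    each position to build the feature map."""
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    k = len(kernel)
    return [sum(kernel[j] * padded[i + j] for j in range(k))
            for i in range(0, len(padded) - k + 1, stride)]
```

With kernel size 3, padding 1 and stride 1 — the configuration used throughout our architectures — the feature map has the same length as the input, so no bus is gained or lost.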
The number of “channels” originates from the RGB channels of images. In our case, the input channel of size 4 represents the four signals at each bus that we consider (real and reactive power, voltage magnitude and phase). The kernel channel sizes have no explicit meaning, but can be roughly thought of as filters that extract different hidden features in the data.
Controlling the dimensions of the subsequent feature maps, as mentioned before, is helpful since we can avoid upsampling layers by keeping the input and output dimensions the same. Figure 6 demonstrates a simple example of a single convolutional layer, with only the first channel of each component shown. To visualize the convolution operation on all four channels, refer to the demonstration in [6] in the context of the dataset format described in Section III-C.
Finally, training of CNNs is now almost always done on hardware accelerators designed for large parallel computations, such as GPUs and TPUs. Training samples are usually fed into the CNN in mini-batches with sizes between 1 and the full dataset size, and the errors at the output are backpropagated through the CNN for parameter updates. Separating training samples into mini-batches is preferred since training with a single sample at a time results in more frequent, thus computationally more expensive (and less accurate), parameter updates. On the other hand, training by feeding in the full dataset can require large amounts of memory to store all training samples, and can be prone to erroneous early convergence due to local minima. During training, “epochs” count the number of times the CNN processes the entire training set, which typically ranges from a few dozen to a few hundred.
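The per-epoch batching described above (shuffle with a fixed seed, then walk the dataset in fixed-size chunks) can be sketched as a small generator; an illustration in Python rather than the paper's Julia training loop:

```python
import random

def minibatches(n_samples, batch_size, seed):
    """Yield shuffled index batches for one epoch: every sample appears
    exactly once, and the fixed seed makes the order reproducible."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    for start in range(0, n_samples, batch_size):
        yield idx[start:start + batch_size]
```

A training loop would call this once per epoch (e.g., seeding with the epoch number) and run a forward/backward pass per batch; the final batch may be smaller when the batch size does not divide the dataset size.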
References
 [1] (201712) Artificial neural network based load flow solution of saudi national grid. In 2017 Saudi Arabia Smart Grid (SASG), Vol. , pp. 1–7. External Links: Document, ISSN null Cited by: §IIA.
 [2] (201210) Load flow estimation in electrical systems using artificial neural networks. In 2012 International Conference and Exposition on Electrical and Power Engineering, Vol. , pp. 276–279. External Links: Document, ISSN null Cited by: §IIA.
 [3] (2017) Julia: a fresh approach to numerical computing. SIAM Review 59 (1), pp. 65–98. External Links: Document Cited by: §V.
 [4] (2017) PowerAI DDL. CoRR abs/1708.02188. External Links: Link, 1708.02188 Cited by: §V.
 [5] (2016) Fast and accurate deep network learning by exponential linear units (elus). See DBLP:conf/iclr/2016, External Links: Link Cited by: §IVB.
 [6] (2019) CS231n: convolutional neural networks for visual recognition. GitHub. Note: https://github.com/cs231n/cs231n.github.io Cited by: §VI.
 [7] (201905) A datadriven load fluctuation model for multiregion power systems. IEEE Transactions on Power Systems 34 (3), pp. 2152–2159. External Links: Document, ISSN 15580679 Cited by: §IIIA.
 [8] (201311) Convergence region of newton iterative power flow method: numerical studies. Journal of Applied Mathematics 2013, pp. . External Links: Document Cited by: §I.
 [9] (201804) Power flow analysis by numerical techniques and artificial neural networks. In 2018 Renewable Energies, Power Systems Green Inclusive Economy (REPSGIE), Vol. , pp. 1–5. External Links: Document, ISSN null Cited by: §IIA.
 [10] (201311) Contingency ranking with respect to overloads in very large power systems taking into account uncertainty, preventive, and corrective actions. IEEE Transactions on Power Systems 28 (4), pp. 4909–4917. External Links: Document, ISSN 15580679 Cited by: §I.

 [11] (2010) Understanding the difficulty of training deep feedforward neural networks. In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Y. W. Teh and M. Titterington (Eds.), Proceedings of Machine Learning Research, Vol. 9, Chia Laguna Resort, Sardinia, Italy, pp. 249–256. External Links: Link Cited by: §IVB, §VI.
 [12] Power System Analysis and Design. Global Engineering. Cited by: §I.
 [13] (2017) Deep learning. The MIT Press. Cited by: §IVB, §VI.
 [14] Graham. Compute Canada Documentation. External Links: Link Cited by: §V.
 [15] (2018) Flux: elegant machine learning with julia. Journal of Open Source Software. External Links: Document Cited by: §IIIC, §V.
 [16] (2018) Exploring hidden dimensions in parallelizing convolutional neural networks. CoRR abs/1802.04924. External Links: Link, 1802.04924 Cited by: §V.
 [17] (201603) AC Power Flow Data in MATPOWER and QCQP Format: iTesla, RTE Snapshots, and PEGASE. arXiv eprints, pp. arXiv:1603.01533. External Links: 1603.01533 Cited by: §I.
 [18] (2015) Adam: A method for stochastic optimization. See DBLP:conf/iclr/2015, External Links: Link Cited by: §IVB.
 [19] (2017) ImageNet classification with deep convolutional neural networks. Commun. ACM 60, pp. 84–90. Cited by: §IVB.
 [20] (2016) Deeper depth prediction with fully convolutional residual networks. CoRR abs/1606.00373. External Links: Link, 1606.00373 Cited by: §VI.
 [21] (2013) Rectifier nonlinearities improve neural network acoustic models. In in ICML Workshop on Deep Learning for Audio, Speech and Language Processing, Cited by: §IVB.
 [22] (201509) Market Rules for the Ontario Electricity Market. Technical report Cited by: §IIIA.
 [23] (2018) Revisiting small batch training for deep neural networks. CoRR abs/1804.07612. External Links: Link, 1804.07612 Cited by: §IVB.

 [24] (2010) Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on International Conference on Machine Learning, ICML’10, Madison, WI, USA, pp. 807–814. External Links: ISBN 9781605589077 Cited by: §IVB.
 [25] (2019) Optimal power flow using graph neural networks. ArXiv abs/1910.09658. Cited by: §IIA.
 [26] (2019) DeepOPF: deep neural network for dc optimal power flow. 2019 IEEE International Conference on Communications, Control, and Computing Technologies for Smart Grids (SmartGridComm), pp. 1–6. Cited by: §IIA.
 [27] Power factor and your bill. External Links: Link Cited by: §IIIA.
 [28] (2014) CNN features offtheshelf: an astounding baseline for recognition. CoRR abs/1403.6382. External Links: Link, 1403.6382 Cited by: §IVB.
 [29] (2019) Learning an optimally reduced formulation of opf through metaoptimization. ArXiv abs/1911.06784. Cited by: §IIA, §VI.
 [30] (2015) ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115 (3), pp. 211–252. External Links: Document Cited by: §I.
 [31] (200908) DC power flow revisited. IEEE Transactions on Power Systems 24 (3), pp. 1290–1300. External Links: Document, ISSN 15580679 Cited by: §I.
 [32] (2013) Multigpu training of convnets. arXiv preprint arXiv:1312.5853. Cited by: §V.
 [33] (2014) How transferable are features in deep neural networks?. CoRR abs/1411.1792. External Links: Link, 1411.1792 Cited by: §IVB.
 [34] (2018) Universality of deep convolutional neural networks. CoRR abs/1805.10769. External Links: Link, 1805.10769 Cited by: §I.
 [35] (201102) MATPOWER: steadystate operations, planning, and analysis tools for power systems research and education. IEEE Transactions on Power Systems 26 (1), pp. 12–19. External Links: Document, ISSN 15580679 Cited by: §I.