I Introduction
Parameter Estimation (PE) and State Estimation (SE) are at the core of Power System (PS) operations, control and optimization [1, 2, 3] at all levels, and specifically at the transmission level on which this manuscript focuses. PE and SE are built into many Energy Management System (EMS) [4] tasks, such as situational awareness for efficient, comfortable, profitable and secure operation, maintenance of equipment, demand management and prediction, and short- and long-term planning. Modern EMS are implemented in the form of the so-called Wide Area Measurement Systems (WAMS), envisioned in the 1990s [5] and deployed massively in the USA [6] and around the globe within the last decade. They are cloud-based, with links to locally hosted Supervisory Control and Data Acquisition (SCADA) systems, traditionally (since the 1970s) used to control site/node-local consumption (or generation) of active and reactive power automatically, and to Phasor Measurement Units (PMUs) [7, 8], synchronized to each other via GPS, positioned at the grid's central locations (e.g. major generators), and reporting local measurements, i.e. the (absolute value of the) voltage, the phase (against a reference point, the so-called slack bus) and the active and reactive injected powers. Integration of the WAMS-PMU technology into the EMS has already generated many new applications, e.g. in monitoring, state estimation, fault localization and protective relaying, and islanding detection (see the recent review [9] for extended discussions and references).
PSSE complements the analysis of the Power Flow (PF) equations, see e.g. Eqs. (1), by accounting for measurement errors and also taking into account limited observability, i.e. the availability of PMU and/or SCADA measurements at only some locations. PSPE extends PSSE by accounting for parametric uncertainties within the PS, e.g. in the admittance matrix. Following the modern literature on the subject, see e.g. [10] and references therein, SE, PE, or the combination of the two, are stated as a data-fitting (signal-processing) optimization, with many novel solutions reported in recent years [11, 12, 13, 14], in particular those taking advantage of the emerging Machine Learning (ML) methodologies [15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28] (some of these in the context of the related problem of fault detection).
This manuscript continues the thread, started in our recent paper [21] and in a companion submission to CDC [29], suggesting the application of Physics-Informed ML (PIML) to SE and PE in PS models. The essence of PIML is in resolving the problem as a regression based on the PF equations in their static or dynamic form (the latter working with the so-called swing equations) [30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]. We focus here on the static PF setting, therefore assuming access to static (or quasi-static) data consistent with solutions of the PF Eqs. (1).
In our prior work on the subject [21] we resolved SE and PE by training an ML scheme to reproduce the explicit Inverse Map (function) from the amplitudes and phases of the voltage potentials to the complex powers at the nodes of the PS (that is, utilizing the map formally defined below in Section II). This specific choice of the input and output in the supervised learning setting of [21] was made to turn the PF equations into an explicitly executable map (i.e. a map not requiring the solution of a nonlinear multi-parametric algebraic equation). Even though the approach is technically appealing, it is also handicapped in terms of PS applications because sampling of the training data is distorted in comparison with the practical input-output setting (see Eq. (2) below).
The desire to correct this problem motivates the alternative approach this manuscript suggests: embedding the standard way of solving the PF Eqs. (1), via the Newton-Raphson (NR) scheme [41, 42], into a PIML algorithm.
We show in this manuscript that:

- The NR scheme with a fixed number of iterations can be built into the ML algorithm explicitly, thus allowing one to take advantage of automatic differentiation for efficient training (back-propagation). We coin it the Newton-Raphson-informed Machine Learning (NRML) algorithm.

- Only a few iterations of the NR scheme, hard-coded within NRML, are sufficient to get a high-quality SE and PE reconstruction.

- SE is achieved relatively fast, while a longer training is required to achieve an acceptable quality of PE.

- The Power-System-Informed approach taken in the manuscript allows the scheme to generalize/extrapolate, i.e. to achieve a very good quality of SE reconstruction even when tested in regimes (on samples) sufficiently far from those used in training.
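The first claim, unrolling a fixed number of NR iterations so that the solver becomes an explicit, differentiable function of its parameters, can be illustrated on a toy scalar problem. The sketch below is our own illustration, not the paper's implementation: it unrolls a few Newton steps for solving x^2 - a = 0 and differentiates the final iterate with respect to the parameter a by a finite difference, mimicking what automatic differentiation would do through the unrolled computational graph.

```python
# Unrolling a fixed number of Newton iterations makes the solver an
# explicit, differentiable function of its parameters (here: a).
# Illustrative sketch only; the paper embeds the full power-flow NR
# scheme in PyTorch, where autodiff replaces the finite difference.

def newton_sqrt(a, n_iter=3, x0=1.0):
    """n_iter Newton steps for x^2 - a = 0, starting from x0."""
    x = x0
    for _ in range(n_iter):
        x = x - (x * x - a) / (2.0 * x)  # x_{k+1} = x_k - f/f'
    return x

a = 2.0
x3 = newton_sqrt(a, n_iter=3)          # already close to sqrt(2)

# Sensitivity of the unrolled solver's output to the parameter a,
# via finite differences (a stand-in for back-propagation):
eps = 1e-6
grad = (newton_sqrt(a + eps) - newton_sqrt(a - eps)) / (2 * eps)
# For x = sqrt(a), the exact sensitivity is dx/da = 1/(2*sqrt(a)).
```

Because the iteration count is fixed, the whole computation is an ordinary composition of differentiable operations, which is exactly what makes gradient-based training through the solver possible.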
The material in the remainder of the manuscript is organized as follows. We set the nomenclature (e.g. for the NR solution of the PF problem) in Section II. Specifics of the NRML algorithm, e.g. the Input-Output data, the PE-prescribed Loss Function for training and the SE validation test, are described in Section III. Our training and validation experiments with NRML on an exemplary 118-node IEEE model are described in Section IV. Section V is reserved for conclusions and a discussion of the path forward.
II Solving the Power Flow Equations
We consider a transmission-level power grid over the grid-graph, $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, where $\mathcal{V}$ and $\mathcal{E}$ are the set of nodes (generators or loads) and the set of lines, respectively. The Power Flow (PF) equations, governing the steady redistribution of power over the system, are stated in terms of the complex injected powers, $S_i = P_i + \mathrm{i} Q_i$, and in terms of the complex potentials, $U_i = V_i e^{\mathrm{i}\theta_i}$, where $i \in \mathcal{V}$; $V_i$ and $\theta_i$ denote the voltage (magnitude) and the phase of the (voltage) potential at the node $i$:
$\forall i \in \mathcal{V}: \quad S_i = U_i \sum_{j \in \mathcal{V}} Y_{ij}^* U_j^*,$  (1)
where $Y = (Y_{ij})$ is the admittance matrix. Equation (1) can be viewed as the map from the complex powers (input) to the complex potentials, $S \mapsto U$, which is non-injective and implicit, and which may allow no solution or multiple solutions. It can also be interpreted in terms of the inverse map, $U \mapsto S$, which is the one utilized in the ML schemes of [21]. (It is an injective and explicit function mapping the complex potentials (input) to the complex powers (output).) In the practical (transmission system) setting, the inputs and outputs in Eqs. (1) are mixed, depending on whether a node corresponds to a generator, $i \in \mathcal{V}_g$, or a load, $i \in \mathcal{V}_l$, where $\mathcal{V} = \mathcal{V}_g \cup \mathcal{V}_l$ and $\mathcal{V}_g \cap \mathcal{V}_l = \emptyset$:
$\forall i \in \mathcal{V}_g: \ (P_i, V_i) \text{ are inputs}, \ (Q_i, \theta_i) \text{ are outputs}; \qquad \forall i \in \mathcal{V}_l: \ (P_i, Q_i) \text{ are inputs}, \ (V_i, \theta_i) \text{ are outputs},$  (2)
and $0 \in \mathcal{V}_g$ is a special bus – the so-called slack bus (usually associated with the largest generator in the system) – where the phase and the voltage are fixed, $\theta_0 = 0$ and $V_0 = \text{const}$. In this manuscript we work with the (practical) PF setting, described by Eqs. (1,2) and also supplemented by the slack-node conditions, which is coined the Mixed-PV-PQ map. (This name reflects the fact that, according to Eq. (2), the map has a mixed input – $(P, V)$ for the generators and $(P, Q)$ for the loads.) Notice that the Mixed-PV-PQ map may have a unique solution, no solution or multiple solutions, as in the inverse-map setting described above.
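For concreteness, the input/output bookkeeping of the Mixed-PV-PQ setting can be written down as a small lookup. The function below is an illustrative sketch of our own (the bus-type labels and the (known, unknown) convention are assumptions made for illustration), not part of the paper's code:

```python
# Which quantities are given (inputs) and which must be solved for
# (outputs) at each bus of the Mixed-PV-PQ power-flow problem.
# Bus-type labels and the (known, unknown) convention are assumptions.

def pf_variables(bus_type):
    """Return (known, unknown) quantities for a given bus type."""
    table = {
        "slack":     (("V", "theta"), ("P", "Q")),       # reference bus
        "generator": (("P", "V"),     ("Q", "theta")),   # PV bus
        "load":      (("P", "Q"),     ("V", "theta")),   # PQ bus
    }
    return table[bus_type]

known, unknown = pf_variables("load")
# At a load (PQ) bus the injections are specified, while the voltage
# magnitude and phase are the unknowns resolved by the NR iterations.
```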
The Newton-Raphson (NR) scheme is normally used to resolve the PF equations in the Mixed-PV-PQ setting. The NR scheme works as follows. The admittance matrix, $Y$, the vector of active power injections/consumptions at the loads and generators, $P$, the vector of the reactive demands at the loads, $Q_l$, and the vector of voltage magnitudes at the generators, $V_g$, are fixed. We introduce the vector of voltage magnitudes at the loads, $V_l$, the vector of phases, $\theta$, and the vectors of the active and reactive power mismatches, $\Delta P$ and $\Delta Q$, respectively, which are updated at each step of the NR iterations as follows:
$\begin{pmatrix} \Delta\theta \\ \Delta V_l \end{pmatrix} = -J^{-1} \begin{pmatrix} \Delta P \\ \Delta Q \end{pmatrix},$  (3)
where $\Delta\theta$ and $\Delta V_l$ are the increments of the phase and voltage vectors, respectively, acquired during the elementary NR step. $J$ in Eq. (3) is the Jacobian matrix of the PF equations, which it is convenient to decompose into sub-matrices according to
$J = \begin{pmatrix} J_{P\theta} & J_{PV} \\ J_{Q\theta} & J_{QV} \end{pmatrix},$  (4)
where $J_{P\theta} = \partial \Delta P/\partial\theta$, $J_{PV} = \partial \Delta P/\partial V_l$, $J_{Q\theta} = \partial \Delta Q/\partial\theta$, $J_{QV} = \partial \Delta Q/\partial V_l$. Eq. (3) should also be complemented by the (direct) computation of the vector of the reactive powers injected (or consumed) at the generators, $Q_g$. See Algorithm 1 for the pseudo-code of the NR algorithm repeated $n$ times within the Mixed-PV-PQ setting. The algorithm is initialized with the "flat start": $\theta = 0$, $V_l = 1$ and $V_g = V_g^{\text{set}}$, where $V_g^{\text{set}}$ are the voltage set-points at the generators (in dimensionless units).
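As a minimal runnable illustration of the scheme just described, the sketch below applies NR with a flat start to a toy two-bus system (one slack bus and one load/PQ bus). All numbers (line impedance, load) are made up for illustration, and the Jacobian is formed by finite differences for brevity, whereas the paper works with the analytic block decomposition of Eq. (4):

```python
import numpy as np

# Toy two-bus network: bus 0 is the slack (V=1, theta=0),
# bus 1 is a load (PQ) bus consuming S = 0.5 + 0.2j p.u.
# (illustrative numbers, not from the paper).
y = 1.0 / (0.01 + 0.05j)                 # series admittance of the line
Y = np.array([[y, -y], [-y, y]])         # admittance matrix
P_spec, Q_spec = -0.5, -0.2              # specified injections at bus 1

def mismatch(x):
    """Active/reactive power mismatch at the load bus; x = [theta1, V1]."""
    theta1, V1 = x
    U = np.array([1.0, V1 * np.exp(1j * theta1)])
    S1 = U[1] * np.conj(Y[1] @ U)        # injected complex power at bus 1
    return np.array([S1.real - P_spec, S1.imag - Q_spec])

x = np.array([0.0, 1.0])                 # flat start: theta = 0, V = 1
for _ in range(4):                       # a few NR iterations suffice
    f = mismatch(x)
    J = np.zeros((2, 2))                 # finite-difference Jacobian
    for k in range(2):
        dx = np.zeros(2); dx[k] = 1e-7
        J[:, k] = (mismatch(x + dx) - f) / 1e-7
    x = x - np.linalg.solve(J, f)        # NR update, cf. Eq. (3)

theta1, V1 = x                           # solved state at the load bus
```

After four iterations the power mismatch is far below typical engineering tolerances, and the load voltage settles slightly below 1 p.u. with a negative phase, as expected for a consuming bus.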
III Machine Learning Framework
We assume that PMUs are installed at the generators, therefore providing observations of the active and reactive powers, voltages and phases at all the generators, and that the power injections (active and reactive) are observed at the load nodes (equipped with SCADA sensors).
We therefore generate $N$ observation sets, each labeled by the sample index $s = 1, \dots, N$, and minimize the following loss function over the vector of parameters (the admittance matrix $Y$):
$\mathcal{L}(Y) = \frac{1}{N} \sum_{s=1}^{N} \sum_{i \in \mathcal{V}_g} \left[ \big(\hat{\theta}^{(s)}_i(Y) - \theta^{(s)}_i\big)^2 + \big(\hat{Q}^{(s)}_i(Y) - Q^{(s)}_i\big)^2 \right],$  (5)
where $\hat{\theta}^{(s)}_i(Y)$ and $\hat{Q}^{(s)}_i(Y)$ denote the phases and reactive powers at the generators predicted by the NR iterations for the current parameter guess $Y$.
The minimization is performed with the Adam optimizer [43] (an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments) in PyTorch [44], utilizing, for faster convergence, the tensor representation, automatic differentiation over the grid parameters, $Y$, and alternation between batches of samples at each backward-propagation epoch. See Appendix A for implementation details.
IV Experiments
We experiment with the IEEE-118 model. We follow the data-generation process described in [21] and prepare five different data sets, each consisting of 2000 samples. The data sets are sorted from case #1 to case #5 in terms of their perceived mapping complexities. As observed in [21], the Physics-Informed ML scheme requires much smaller training sets than its physics-agnostic counterparts. This observation extends to the present work. Here we assign one out of every fifty samples to the training set; the remaining samples are used for validation.
The dependence of the Loss Function (LF) on the number of epochs is selected as an indicator of the quality of training. Fig. 1(a) shows that the LF decreases monotonically and that rather small values are reached relatively fast. Moreover, the monotonic decrease of the LF with the training time (number of epochs) is observed for any number of NR steps (even at $n = 1$) hard-coded into the training process, and also for different values of the learning rate. We notice that the decay of the LF saturates to a small value relatively fast in the process of training, as seen in Fig. 1(a). This observation is also confirmed in the validation test (see Table 1), therefore reporting a State Estimation of good quality. To test the Parameter Estimation, which measures the quality of reconstructing the admittances, we analyze the Admittance Reconstruction Error (ARE):
$\mathrm{ARE}(\hat{Y}) = \frac{\|\hat{Y} - Y^*\|_F}{\|Y^{(0)} - Y^*\|_F},$  (6)
where $\|Y^{(0)} - Y^*\|_F$ is the reconstruction error obtained with the initial values of the grid parameters and $Y^*$ is the actual admittance matrix. Fig. 1(b), showing the dependence of the ARE on the number of training epochs, suggests that the reconstruction of the State Estimation (SE), controlled by the Loss Function, is achieved relatively early in the training process, when the Parameter Estimation (PE), controlled by the ARE, is still poor. The ARE decreases with an increase in the number of epochs, leading eventually to a PE of sufficiently high quality. We also observe in Fig. 1(b) a consistent decrease of the ARE with the number of NR iterations, $n$, and also a stronger dependence on $n$ than seen in the LF. At $n = 1$, the ARE plateaus at a much higher value than for the schemes with $n \geq 2$.
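Under the normalization described above (the error of the initial parameter guess serves as the denominator, so the metric equals one before any training), the ARE can be computed in a few lines. The Frobenius norm and the variable names below are our own assumptions:

```python
import numpy as np

def admittance_reconstruction_error(Y_hat, Y_true, Y_init):
    """Reconstruction error of Y_hat, normalized by the error of the
    initial guess Y_init, so that the metric is 1 before training."""
    return (np.linalg.norm(Y_hat - Y_true, "fro")
            / np.linalg.norm(Y_init - Y_true, "fro"))

# Tiny illustrative example (made-up 2x2 "admittance" matrices):
Y_true = np.array([[2.0, -2.0], [-2.0, 2.0]])
Y_init = np.zeros((2, 2))                      # untrained parameters
assert admittance_reconstruction_error(Y_init, Y_true, Y_init) == 1.0
assert admittance_reconstruction_error(Y_true, Y_true, Y_init) == 0.0
```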
We have also experimented with the learning rate and observed that it has a strong influence on both SE and PE. We have reached the conclusion that, even though increasing the learning rate may speed up the decrease of the ARE at the early stages of training, the aggressive strategy is suboptimal overall, as it does not result in the LF reaching its minimal value. Moreover, when we increase the number of epochs (in the discussed case of a high learning rate), the LF continues to decrease while the ARE starts to increase. We relate this trouble to the overfitting phenomenon, and plan to continue the experimental work in the future in order to find an empirically satisfactory stopping criterion.
Following the protocol standard in ML, we conduct extensive validation tests on the samples not used for training. Such tests are important for assessing the quality of the SE in terms of the ability to generalize, i.e. to extrapolate into regimes not seen in training. Specifically, we introduce the following Validation Error, testing the quality of the prediction at the loads:
$E_{\text{valid}} = \frac{1}{N_{\text{valid}}} \sum_{s} \sum_{i \in \mathcal{V}_l} \left[ \big(\hat{V}^{(s)}_i - V^{(s)}_i\big)^2 + \big(\hat{\theta}^{(s)}_i - \theta^{(s)}_i\big)^2 \right].$  (7)
The dependence of the Training Error (Loss Function), the Validation Error, the Reconstruction Error and the Duration (normalized by the training time for $n = 1$) on the number of NR iterations, $n$, is reported in Table 1. The results shown in the Table were collected for fixed values of the learning rate and of the number of epochs. We observe that the training time increases roughly linearly with the number of NR iterations. On the other hand, we also report that the accuracy of the results stops improving after $n = 3$.
NR iter. ($n$)   Train. Err.   Valid. Err.   Adm. Rec. Err.   Duration
1                6.79E-7       7.68E-3       0.493            1.00
2                5.33E-7       1.42E-3       0.284            2.36
3                5.35E-7       1.16E-3       0.290            3.52
4                5.34E-7       1.15E-3       0.290            4.24
6                5.36E-7       1.15E-3       0.290            6.45
8                5.35E-7       1.15E-3       0.290            8.30
In Fig. 2, we compare how our method performs on the different data sets. Paradoxically, case #3 gives a slightly better reconstruction of the admittance matrix while having slightly higher loss-function values. Finally, Fig. 3 shows a comparison between the estimated and the real line admittances. The main message here is that the agreement is almost perfect. We also note that for the few lines where the reconstruction of the admittances is not perfect, the reconstructed admittances still lie within their physically sensible ranges (positive for conductances and negative for susceptances). We attribute this success of the Parameter Estimation to the exponential parametrization of the susceptances. (See Appendix B for details.)
V Conclusions and Path Forward
In this manuscript we showed how the Newton-Raphson scheme for resolving the Power Flow equations, which is a key tool of standard power-system analysis, can be used within an ML technique to produce efficient State Estimation and, in parallel, efficient (and interpretable) Parameter Estimation based on PMU and SCADA data sources. The advantage of the approach, in comparison with other alternatives recently discussed in the literature (in particular by the authors of this manuscript), is that the Input-Output, Mixed-PV-PQ framework is much closer to the practical engineering reality of power-system operations, control and planning. We therefore believe that the suggested methodology will become an important building block in the future development of a number of data-driven optimization schemes, such as data-aware (driven) generation and demand-response dispatch, unit commitment and multi-scenario planning.
The most significant technical result of our approach consists in building the Newton-Raphson (NR) scheme into a modern ML framework, specifically PyTorch, which has allowed us to take advantage of the automatic differentiation and related technical tools developed by the ML community in recent years. Quite remarkably, our experiments show that hard-coding only 2-3 iterations of the NR scheme into the training procedure is sufficient for obtaining high-quality State and Parameter Estimations while keeping the computational time and memory requirements relatively light.
We also encountered an implementation problem which requires the attention of the ML community developing PyTorch (or other similar products). Our attempt to scale the computations up to larger grids ran into a memory problem, because PyTorch does not yet fully support sparse (linear-algebra) computations; specifically, it lacks the ability to solve efficiently large but sparse systems of equations (to invert large sparse matrices). Building an ML computational framework which allows the compression of sparse data structures, such as those associated with the tensor blocks our algorithm is built on (see Appendix A), is important for making the Power-System-Informed (and, more generally, Physics-Informed) ML schemes practical.
A Batch version of the Mixed-PV-PQ Newton-Raphson Scheme
ML schemes are usually trained over (mini-)batches. The batch approximation of the gradient reduces oscillations and noise during the gradient descent and therefore provides a more stable implementation. Most algorithms of modern ML, e.g. those implemented in PyTorch, have been optimized to handle tensor inputs, because tensor operations are typically much more efficient than nested loops.
In this Appendix we give the technical details, critical for an efficient implementation, of the tensor operations needed to train the Mixed-PV-PQ Newton-Raphson scheme.
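To illustrate the point about tensor operations versus nested loops, the sketch below (our own example, not the paper's code) solves a batch of small linear systems, which is the core operation of a batched NR step, cf. Eq. (3), in one vectorized call and checks it against a per-sample loop:

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n = 64, 6                       # 64 samples, 6 unknowns each

# A stack of Jacobians (one per sample) and of mismatch vectors,
# standing in for the batched NR update dx = -J^{-1} f.
# The diagonal shift keeps the random matrices well-conditioned.
J = rng.normal(size=(batch, n, n)) + 3.0 * np.eye(n)
f = rng.normal(size=(batch, n))

dx_batched = -np.linalg.solve(J, f)    # one call over the whole batch

# Reference: the same computation with an explicit Python loop.
dx_loop = np.stack([-np.linalg.solve(J[s], f[s]) for s in range(batch)])
assert np.allclose(dx_batched, dx_loop)
```

The vectorized call delegates the whole batch to compiled linear-algebra routines, which is precisely the efficiency gain that motivates the tensor formulation of this Appendix.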
Let us define the source incidence matrix
$E^{(s)}_{li} = 1$ if node $i$ is the source (sending end) of line $l$, and $E^{(s)}_{li} = 0$ otherwise,  (8)
and the target incidence matrix
$E^{(t)}_{li} = 1$ if node $i$ is the target (receiving end) of line $l$, and $E^{(t)}_{li} = 0$ otherwise;  (9)
then the undirected and directed incidence matrices, $E$ and $\hat{E}$, read
$E = E^{(s)} + E^{(t)},$  (10)
$\hat{E} = E^{(s)} - E^{(t)}.$  (11)
In the following we use the Einstein summation convention (summation over repeated indexes). The components of the admittance matrix can be written in terms of the line admittances and the shunt admittances as follows:
$Y_{ij} = \hat{E}_{li}\, y_l\, \hat{E}_{lj} + \delta_{ij}\, y^{sh}_i,$  (12)
where $y_l$ is the admittance of line $l$ and $y^{sh}_i$ is the shunt admittance at bus $i$. Let us introduce the following two useful elementary tensors
(13)  
(14) 
Then the active and reactive power injections, based on these two elementary tensors, are
(15)  
(16) 
where $\delta_{ij}$ is the Kronecker delta. The tensors contributing to the Jacobian are
(17)  
(18)  
(19)  
(20) 
with
(21)  
(22)  
(23)  
(24) 
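The construction of the admittance matrix from the incidence matrices and the line/shunt admittances (Eq. (12)) translates directly into a vectorized einsum. The sketch below is our own illustration with made-up network data, checked against the textbook per-line construction:

```python
import numpy as np

# Lines of a small 3-bus example: (source bus, target bus, admittance).
# All values are made up for illustration.
lines = [(0, 1, 1.0 - 5.0j), (1, 2, 2.0 - 8.0j), (0, 2, 1.5 - 6.0j)]
n_bus = 3
y_sh = np.array([0.01j, 0.02j, 0.01j])        # shunt admittances

# Directed incidence matrix: +1 at the source, -1 at the target.
E_dir = np.zeros((len(lines), n_bus))
y_line = np.zeros(len(lines), dtype=complex)
for l, (i, j, y) in enumerate(lines):
    E_dir[l, i], E_dir[l, j] = 1.0, -1.0
    y_line[l] = y

# Y_ij = E_dir[l,i] * y_line[l] * E_dir[l,j] + delta_ij * y_sh[i]
Y = np.einsum("li,l,lj->ij", E_dir, y_line, E_dir) + np.diag(y_sh)

# Sanity checks: Y is symmetric and its off-diagonal entries equal
# minus the corresponding line admittances.
assert np.allclose(Y, Y.T)
assert np.isclose(Y[0, 1], -(1.0 - 5.0j))
```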
B Parameterization of line admittances
It is highly advisable in any ML approach to adjust the different parameters so as to make their initializations (priors), ranges and expected values comparable. However, the bare variation of the line admittances over the system may spread over more than a decade. To overcome this difficulty, we represented in [21] the line admittances via resistances and reactances:
$g_l = \frac{r_l}{r_l^2 + x_l^2},$  (25)
$b_l = -\frac{x_l}{r_l^2 + x_l^2},$  (26)
taking advantage of the fact that the latter characteristics spread over a much narrower range. One wishes the reconstructed admittances to be within their physically sensible ranges; this can be achieved by adding the following regularization term to the loss function:
(27) 
In this manuscript, we have implemented what we believe is a simpler solution, achieving the same goal however without regularization. We simply use the following "exponential" parameterization:
$g_l = e^{\gamma_l},$  (28)
$b_l = -e^{\beta_l},$  (29)
where $\gamma_l$ and $\beta_l$ are the unconstrained parameters learned during training.
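A minimal sketch of why an exponential parameterization enforces the physical sign constraints by construction (the parameter names below are our own assumption, as is the use of random stand-in values for the trained parameters):

```python
import numpy as np

rng = np.random.default_rng(1)
gamma = rng.normal(size=100)      # unconstrained trainable parameters
beta = rng.normal(size=100)       # (random stand-ins for illustration)

g = np.exp(gamma)                 # conductances: strictly positive
b = -np.exp(beta)                 # susceptances: strictly negative

# Whatever values the optimizer assigns to gamma and beta, the
# resulting conductances and susceptances stay in their physically
# sensible ranges, so no sign-enforcing regularization is needed.
assert np.all(g > 0) and np.all(b < 0)
```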
References
 [1] F. C. Schweppe and J. Wildes, “Power system static-state estimation, part I: Exact model,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-89, no. 1, pp. 120–125, 1970.
 [2] A. Gómez-Expósito, A. J. Conejo, and C. Cañizares, Energy Systems: Analysis and Operation. Taylor & Francis Group, 2009.
 [3] A. J. Wood, B. F. Wollenberg, and G. B. Sheblé, Power Generation, Operation, and Control, 3rd edition. Wiley, 2014.
 [4] A. Abur and A. G. Expósito, Power System State Estimation: Theory and Implementation. CRC Press, 2004.
 [5] J. S. Thorp, A. G. Phadke, and K. J. Karimi, “Real time voltage-phasor measurement for static state estimation,” IEEE Transactions on Power Apparatus and Systems, vol. PAS-104, no. 11, pp. 3098–3106, 1985.
 [6] DOE, “Recovery act: Synchrophasor applications in transmission systems,” https://www.smartgrid.gov/recovery_act/program_impacts/applications_synchrophasor_technology.
 [7] U.S. Energy Information Administration, “New technology can improve electric power system efficiency and reliability – Today in Energy.” https://www.eia.gov/todayinenergy/detail.php?id=5630, accessed 2021-01-31.
 [8] J. De La Ree, V. Centeno, J. S. Thorp, and A. G. Phadke, “Synchronized phasor measurement applications in power systems,” IEEE Transactions on Smart Grid, vol. 1, no. 1, pp. 20–27, 2010.
 [9] M. U. Usman and M. O. Faruque, “Applications of synchrophasor technologies in power systems,” Journal of Modern Power Systems and Clean Energy, vol. 7, pp. 211–226, Mar. 2019.
 [10] G. B. Giannakis, V. Kekatos, N. Gatsis, S. Kim, H. Zhu, and B. F. Wollenberg, “Monitoring and optimization for power grids: A signal processing perspective,” IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 107–128, 2013.
 [11] L. Vanfretti, J. H. Chow, S. Sarawgi, and B. Fardanesh, “A phasor-data-based state estimator incorporating phase bias correction,” IEEE Transactions on Power Systems, vol. 26, no. 1, pp. 111–119, 2011.
 [12] S. G. Ghiocel, J. H. Chow, G. Stefopoulos, B. Fardanesh, D. Maragal, B. Blanchard, M. Razanousky, and D. B. Bertagnolli, “Phasor-measurement-based state estimation for synchrophasor data quality improvement and power transfer interface monitoring,” IEEE Transactions on Power Systems, vol. 29, no. 2, pp. 881–888, 2014.
 [13] H. Zhu and G. B. Giannakis, “Power system nonlinear state estimation using distributed semidefinite programming,” IEEE Journal of Selected Topics in Signal Processing, vol. 8, no. 6, pp. 1039–1050, 2014.
 [14] R. Madani, M. Ashraphijuo, J. Lavaei, and R. Baldick, “Power system state estimation with a limited number of measurements,” in 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 672–679, 2016.

 [15] P. N. P. Barbeiro, J. Krstulovic, H. Teixeira, J. Pereira, F. J. Soares, and J. P. Iria, “State estimation in distribution smart grids using autoencoders,” in 2014 IEEE 8th International Power Engineering and Optimization Conference (PEOCO2014), pp. 358–363, 2014.
 [16] K. R. Mestav, J. Luengo-Rozas, and L. Tong, “State estimation for unobservable distribution systems via deep neural networks,” in 2018 IEEE Power & Energy Society General Meeting (PESGM), pp. 1–5, 2018.
 [17] W. Li, D. Deka, M. Chertkov, and M. Wang, “Real-time faulted line localization and PMU placement in power systems through convolutional neural networks,” IEEE Transactions on Power Systems, vol. 34, pp. 4640–4651, Nov 2019.
 [18] A. S. Zamzam, X. Fu, and N. D. Sidiropoulos, “Data-driven learning-based optimization for distribution system state estimation,” arXiv:1807.01671, 2019.
 [19] L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Transactions on Signal Processing, vol. 67, no. 15, pp. 4069–4077, 2019.

 [20] L. Wang, Q. Zhou, and S. Jin, “Physics-guided deep learning for power system state estimation,” Journal of Modern Power Systems and Clean Energy, vol. 8, no. 4, pp. 607–615, 2020.
 [21] L. Pagnier and M. Chertkov, “Physics-informed graphical neural network for parameter & state estimations in power systems,” arXiv:2102.06349, 2021.
 [22] V. Bolz, J. Rueß, and A. Zell, “Power flow approximation based on graph convolutional networks,” in 2019 18th IEEE International Conference On Machine Learning And Applications (ICMLA), pp. 1679–1686, 2019.
 [23] B. Donon, B. Donnot, I. Guyon, and A. Marot, “Graph neural solver for power systems,” in 2019 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2019.
 [24] D. Owerko, F. Gama, and A. Ribeiro, “Optimal power flow using graph neural networks,” arXiv:1910.09658, 2019.
 [25] W. Liao, B. Bak-Jensen, J. R. Pillai, Y. Wang, and Y. Wang, “A review of graph neural networks and their applications in power systems,” arXiv:2101.10025, 2021.
 [26] C. Kim, K. Kim, P. Balaprakash, and M. Anitescu, “Graph convolutional neural networks for optimal load shedding under line contingency,” in 2019 IEEE Power Energy Society General Meeting (PESGM), pp. 1–5, 2019.
 [27] W. Liao, D. Yang, Y. Wang, and X. Ren, “Fault diagnosis of power transformers using graph convolutional network,” CSEE Journal of Power and Energy Systems, pp. 1–9, 2020.
 [28] K. Chen, J. Hu, Y. Zhang, Z. Yu, and J. He, “Fault location in power distribution systems via deep graph convolutional networks,” IEEE Journal on Selected Areas in Communications, vol. 38, p. 119–131, Jan 2020.
 [29] A. Afonin and M. Chertkov, “Which Neural Network to Choose for Post-Fault Localization and Dynamic State Estimation in Power Systems?,” submitted to the 60th Conference on Decision and Control (CDC), 2021.
 [30] J. Machowski, J. Bialek, and J. R. Bumby, Power system dynamics and stability. John Wiley & Sons, 1997.

 [31] Z. Huang, P. Du, D. Kosterev, and B. Yang, “Application of extended Kalman filter techniques for dynamic model parameter calibration,” in Power & Energy Society General Meeting (PES’09), pp. 1–8, IEEE, 2009.
 [32] H.-D. Chiang, Direct Methods for Stability Analysis of Electric Power Systems: Theoretical Foundation, BCU Methodologies, and Applications. John Wiley & Sons, 2011.
 [33] N. Zhou, S. Lu, R. Singh, and M. A. Elizondo, “Calibration of reduced dynamic models of power systems using phasor measurement unit (PMU) data,” in North American Power Symposium (NAPS), pp. 1–7, IEEE, 2011.
 [34] S. Guo, S. Norris, and J. Bialek, “Adaptive parameter estimation of power system dynamic model using modal information,” IEEE Transactions on Power Systems, vol. 29, no. 6, pp. 2854–2861, 2014.
 [35] N. Zhou, D. Meng, Z. Huang, and G. Welch, “Dynamic state estimation of a synchronous machine using PMU data: A comparative study,” IEEE Transactions on Smart Grid, vol. 6, no. 1, pp. 450–460, 2015.
 [36] Y. C. Chen, J. Wang, A. D. Domínguez-García, and P. W. Sauer, “Measurement-based estimation of the power flow Jacobian matrix,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2507–2515, 2016.
 [37] G. Chavan, M. Weiss, A. Chakrabortty, S. Bhattacharya, A. Salazar, and F. Habibi-Ashrafi, “Identification and predictive analysis of a multi-area WECC power system model using synchrophasors,” IEEE Transactions on Smart Grid, 2017.
 [38] X. Wang, J. Bialek, and K. Turitsyn, “PMU-based estimation of dynamic state Jacobian matrix and dynamic system state matrix in ambient conditions,” IEEE Transactions on Power Systems, 2017.
 [39] A. Y. Lokhov, M. Vuffray, D. Shemetov, D. Deka, and M. Chertkov, “Online learning of power transmission dynamics,” in 2018 Power Systems Computation Conference (PSCC), pp. 1–7, June 2018.
 [40] J. Ostrometzky, K. Berestizshevsky, A. Bernstein, and G. Zussman, “Physics-informed deep neural network method for limited observability state estimation,” arXiv:1910.06401, 2020.
 [41] A. R. Bergen and V. Vittal, Power systems analysis. Prentice Hall, 2000.
 [42] J. J. Grainger and W. D. Stevenson, Jr., Power System Analysis. McGraw-Hill, 1994.
 [43] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7–9, 2015, Conference Track Proceedings (Y. Bengio and Y. LeCun, eds.), 2015.
 [44] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32 (H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, and R. Garnett, eds.), pp. 8024–8035, Curran Associates, Inc., 2019.