I Introduction
Ensuring stable, secure and reliable operations of the power grid is a primary concern for system operators [1]. Security assessment and control actions heavily rely on the accuracy of the assumed power system model and its parameters and of the estimated state [2]. Thus, inaccuracies in state estimation data or in the networked dynamic model can impact the assessment of the system stability and the efficacy of the corresponding control measures. In this paper, we explore the possibility to leverage the proliferation of Phasor Measurement Units (PMUs) that collect time synchronous data in a distributed way, for validating the assumed power system model and the current system state. In particular, our goal is to develop a dataefficient learning framework for performing an online reconstruction of the dynamic model using the minimal number of assumptions and exclusively relying on the PMU measurements.
A number of recent works showed promising results in attacking this problem [3, 4, 5, 6, 7, 8, 9]. Here, we propose to extend the scope of existing works to the problem of extracting the dynamic state matrix from PMU measurements in a purely datadriven way, without assuming any knowledge of model parameters. We take advantage of the separation of scales that exists in the regime of ambient fluctuations around the steady state leading to power system dynamics excited by stochastic load variations. Under quite general and widely accepted assumptions in this ambient regime, we develop a provably consistent maximum likelihood based method that recovers the dynamic state matrix with a low number of observations. Importantly, the proposed methodology can be naturally extended to cases of unknown network topology and partial observations, and has a low computational complexity which is conducive for realtime estimations.
An accurate estimation of the dynamic state matrix has a large number of applications that have been well explored in the literature [10, 11], including model validation and parameter calibration [3, 4], probing the proximity to instability and helping in design of the corresponding emergency control actions [12, 13], optimization and resource allocation [14, 15], as well as identification and analysis of forced oscillations in the system [16]. The potential ability to use the learned dynamic parameters to simultaneously perform a purely measurementbased state estimation of deviations in power consumption from nominal values represents another attractive feature of our framework. A validated state estimation can improve resource allocation for generation redispatch.
The paper is organized as follows: in Section II we formulate the model and the reconstruction problem; in Section III we state our learning method and discuss the convergence properties of the proposed algorithm; in Section IV we illustrate our approach on a test system, and provide an empirical assessment of the performance of our algorithms; finally, in Section V we discuss possible extensions of our method and state some open problems.
Ii Problem formulation
We model the power network by a graph with a set of nodes (buses) and a set of edges (transmission lines) . We consider the regime of ambient oscillations around the steady state that is governed by the dynamics of generator angles. It is common to model ambient dynamics with a classical equivalent model of aggregated generators [17] that corresponds to a networkreduced power system where passive loads are eliminated via Kron reduction [18]. Although this modeling choice is not necessary for our analysis, it facilitates a uniform mathematical description where we can assume that every node in essentially corresponds to a generator with nonzero inertia and damping coefficients with temporal evolution governed by the swing equations [1]:
(1) 
where is the synchronous frequency (60 Hz in U.S.A.); and respectively correspond to the generator rotor angles and speeds; is the net power injection (e.g. the generator mechanical power input); and is the electrical power output. can be further expressed as a sum of power flows out of node : , with
(2) 
where conductance and susceptance correspond to the real and imaginary parts of the complex admittance () associated with each line in the network.
In the vicinity of the synchronous state, the difference of rotor angles is typically small, so that is close to zero for every pair . Therefore, in the regime of moderate ambient fluctuations it is standard [1] to linearize the expression (2) around the current operating point using and . Over the period of time where the voltage magnitudes can be approximately considered as constant, line admittances are characterized by effective susceptances and conductances , that absorb constant voltage magnitudes by definition. Given these simplifications due to DC linearization, we assume that the following relation is valid in expectation in the steady state regime:
(3) 
where denote the mean steady state values of rotor angles, and deviations of phase from these values are small. Note that conductances are typically negligible for power transmission lines and hence is usually omitted in the expression (3) under the assumption of purely inductive lines [1].
Finally, the resulting dynamic model that we consider in this paper takes the following form:
(4) 
where represents the effect of exogenous power deviations, and denotes the relative generator rotor speed measured with respect to the reference synchronous frequency .
From (4), equations for the whole system can be written in the matrix form as
(5) 
where denotes a generic
component vector and
can refer to , , , and ; is a dimensional vector; and and denote zero and identity matrices, respectively. is the susceptanceweighted Laplacian matrix defined as for ; ; and otherwise. Finally, and respectively represent diagonal inertia and damping matrices parametrized by . For compactness, let us rewrite the system (5) as(6) 
where is a shorthand notation for the system state vector at time , and is the dependent vector.
In this paper, our goal is to estimate the dynamic state matrix from PMU data providing time series measurements of dynamic variables . In principle, can be computed if all parameters entering (1) are known. As motivated above, here we deliberately pursue a purely datadriven approach that will provide a characterization of the system dynamic based on the solution of the inverse problem using the observed time series, and hence would allow to validate the assumed dynamic model. However, the dynamic state matrix can be assumed to be constant only at time scales for which the parameters entering (5) remain unchanged. In the ambient fluctuations setting, there exists a natural separation of time scales in parameter variations, see Figure 1. The parameters of generators, such as inertia or damping coefficients can be regarded as stable on the scale of several hours, with a potential slow drift due to droop or local feedback control [2]. Although infrequent changes in the system may provoke a change in , overall voltage magnitudes may be considered stable on the time interval of the order of tens of minutes [1]. The same conclusion holds for the values that may fluctuate due to variations in temperatures that rarely happen on shorter scales compared to tens of minutes. On the other hand, we assume that ambient fluctuations themselves are caused by fluctuations of loads and around base nominal values or generator noise [9], and therefore occur at very short scales of the order of the frequency cycle of Hz. Because of this reason and due to the aggregated nature of loads, the total power deviations are commonly modeled as random Gaussian variations [19]. Therefore, learning must take place using measurements obtained during the interval of time that happens on the scale of minutes, below the scale on which voltage magnitudes might change, and above the scale of load fluctuations . We model
as a zero mean Gaussian noise term with standard deviation
that incorporates the aggregated ambient fluctuations in power injection and consumption.PMU measurements are discrete in nature, and arrive as timeseparated data samples with a typical frequency of several cycles. Therefore, the observed data points approximately follow the dynamics representing the discretization of (6) with a certain step . Using the EulerMaruyama discretization scheme, we get to the first order in :
(7) 
with ,
is the standard multivariate normally distributed noise and
summarizes the resulting scale of fluctuations. It is reasonable to assume that load fluctuations are spatially independent across nodes so thatis diagonal; however, variance at individual buses can be different, so that
for has a meaning of the noise standard deviation at node and for . As we will see below, is an important parameter that drives the reconstruction procedure. Indeed, can not be smaller than the resolution of the PMU data, and should not be too small so thatcould be conveniently interpreted as uncorrelated white noise across time. At the same time,
can also not be too large because in this case would be essentially independent across time, meaning that the dynamic state matrix could not be recovered in principle, and one could only hope to get the estimation of the steady state covariance [20]. In order to facilitate the learning task, it is advantageous to select in such a way that it satisfies the tradeoff between the amount of observations used and the accuracy achieved. In what follows and unless stated otherwise, we will assume that is independent and identically distributed for the purpose of reconstruction of the dynamic state matrix .Iii Estimators
Iiia Maximum likelihood formulation
In this section, we present our estimators for the dynamic state matrix from discrete observations of the system . Consider the empirical crosscorrelation matrices with and without displacement that respectively read
(8)  
(9) 
Based on Eq. (7) we introduce the following estimator for the matrix which exists whenever the crosscorrelation matrix in Eq. (9) is invertible:
(10) 
We refer to this estimator as to the Maximum Likelihood (ML) estimator for the . Notice that the matrices and are at most of rank as expressions (8) and (8) are sums of rank one matrices. This implies that the matrix is not invertible for . When
, the matrix is invertible with probability one since
and are independent normally distributed vectors.Bounds on the expected reconstruction error for the ML estimator are given by the following theorem.
Theorem 1 (Reconstruction error for discrete dynamics)
Let and . The ML estimator (10) reconstructs the matrix with probability at least within a Frobenius norm error bounded by
(11) 
Proof:
The proof is given in Appendix A.
The bound in Theorem 1 is valid without any assumptions on the matrix . For a stable system with a steady state dynamics one expects that concentrates to its expectation as . In this case Theorem 1 shows that the error on the ML estimator decreases as the inverse squareroot of the number of observations and increases linearly with the noise intensity.
The dynamic state matrix describing the continuous dynamics in Eq. (6) is estimated from the discrete dynamic matrix using the relation
(12) 
Guarantees on the reconstruction error for the continuous dynamic matrix follow from Theorem 1.
Corollary 2
Let and . The error on the reconstruction of is with probability at least upperbounded by
(13) 
Proof:
It is a direct application of Theorem 1 with satisfying for .
The main implication of Corollary 2 is that the error on decreases with respect to the product . This product corresponds to the total observation time of the system . Therefore, if the number of data samples is large enough for to concentrate to its average, the reconstruction error only depends on through its inverse square root. Notice that for a fixed observation time, the error does not depend on the discretization, see [21] for an extended discussion on this property.
As mentioned earlier, (10) can be interpreted as the maximum likelihood estimator for the dynamic matrix . This means that the estimator can be seen as the outcome of some minimization procedure. While obtaining through an optimization problem might seem unnecessarily more complicated, formulating the estimator as a minimization procedure renders possible the incorporation of extra information in the estimation. This extra information can take the form of a prior on the inertia, damping parameters or line susceptances in the system, or it can serve as a prior on the location of zero elements in . As it is crucial to keep the learning time below the typical time for which system parameters drift (see Figure 1), adding extra information in the reconstruction help in accomplishing this task by increasing the learning accuracy for a fixed observation period. The precise minimization procedure is given by the following proposition.
Proposition 3 (MaximumLikelihood estimator)
Given observations resulting from the discrete linear dynamics (7), the maximum likelihood estimator of represents the solution of the following leastsquares regression
(14) 
Moreover, for , the minimum of the leastsquare regression is achieved by .
Proof:
The proof is given in Appendix B.
Notice that the first rows of the dynamic state matrix in Eq. (5) are . Even though it seems natural to incorporate this information in the estimation procedure, in fact it appears to be unnecessary as these rows are not directly affected by noise (given that for ) and are always reconstructed perfectly. However the situation is different for the diagonal lowerright block of , that is subject to noise. Thanks to the optimization formulation in Eq. (14), we can restrict the regression to matrices that have a diagonal lowerright block to obtain the following constrained estimator,
(15) 
While the constrained estimator (15) provides a more accurate reconstruction than its unconstrained counterpart (10), it is computationally more expensive as it requires to solve an optimization problem. This tradeoff between computational power and accuracy can be crucial for practical applications.
IiiB Extensions
Other extensions of the ML estimator can be considered based on prior information that we can incorporate in the leastsquares regression (14). For instance, when the system parameters are expected to drift slowly it is beneficial to incorporate a prior on the matrix . This prior can take the form of a Gaussian prior centered around a previous reconstruction. This translates into a leastsquared regression with a Tikhonov regularization [22], i.e.
(16) 
where is a parameter indicating our degree of confidence that the current matrix is close to its previous estimate .
There exists instances when the system parameters might not have been previously estimated, for example when the topology has been modified due to outages, lines tripping or controller changes [9], but the topology of the observed grid is known to be sufficiently sparse. In this case it would be beneficial to promote the sparsity of the matrix with a Laplace prior . This prior leads to the following LASSO estimator
(17) 
The LASSO estimator proves to be much more efficient than the unconstrained leastsquares (14) when the matrix is sparse, rendering the sample requirement very weakly dependent of the size of the problem [21].
Finally in cases where the state of only a subset of buses is observed [20], it is still possible to retrieve part
of the dynamic state matrix corresponding to the visible part of the system. This can be done with the socalled “sparse plus lowrank” heuristic producing
and , the minimizers of(18) 
where summarizes the effects of the unobserved measurements and the nuclear norm penalty represents a convex surrogate for the low rank constraint, see [23] for more details.
Iv Case study
We illustrate the performance of our learning algorithms on the IEEE 39bus 10generator test system [24]; the topology of this system is shown in Figure 2 (A). We assume that PMUs are located at the generator buses. First, we perform Kron reduction and eliminate passive loads, obtaining the parameters of the system (5) obeyed by the generators [9]. Given the established parameters of the reduced network and hence the ground truth , we use the discretized representation (7) to simulate the dynamics and produce the time series on and for each generator using the smallest resolution sec (1 cycle) for which our model (4) is still valid; in Figure 2 (B), we show one example of such run with the simulated data over minutes. In all simulations, the load variation is fixed at the level of p.u. [9]. We use data obtained in this way in all reconstruction experiments reported below.
Notice that PMUs might have a lower time resolution, subsampling these data points with a different time step , for instance once every cycles (for many measuring devices, the maximum sampling rate corresponds to or ). Moreover, because of the data processing reasons, one might wish to deliberately sample data at a lower frequency for the reconstruction purposes. Therefore, it is instructive to check the sensitivity of the algorithmic performance to the chosen subsampling step . But prior to that, we need to establish the measure of performance for the two estimators introduced in this paper: the Unconstrained Maximum Likelihood (UML) estimator (10) and the Constrained Maximum Likelihood (CML) estimator (15), where the word “constrained” means that the support pattern of has been explicitly enforced. We use both algorithms to produce an estimate of the dynamic state matrix . For a given time separation between measurement samples , first the discrete matrix at the corresponding scale (7) is estimated, and then the dynamic state matrix is recovered using the linear approximation (12).
In order to account for the additional sparsity structure present in , we supplement the application of (10) in the UML estimator with a postestimation thresholding of matrix elements that are known to be zero, in particular in the block corresponding to the diagonal submatrix in (5). We quantify the quality of the estimation using the relative Frobenius error , defined as
(19) 
where is the recovered matrix and is the ground truth dynamic state matrix used to produce data (at finer discretization sec).
The dependence on is shown in Figure 3 (A). In this figure, the observation window is fixed to min, and therefore the number of samples seen by the algorithms decreases with as . According to the Corollary 2, the reconstruction error should stay constant in . We see that for both algorithms we don’t see any clear plateau in dependence even for small . This is due to the increasing with error in the firstorder approximation (7) of the finegrained dynamics and hence to the level of validity of (12). Nevertheless, we see that for both estimators is growing slowly with , and the error seen for cycles is very close to the one realized when the finest possible discretization is taken, although the algorithms use three times less samples in the former case. Moreover, given that cycles represents a normal sampling rate for many measurement devices, we use this sampling rate in the subsequent experiments. The final thing that we observe is that the algorithm CML that exploits the structure of systematically yields a lower error compared to UML even with the added heuristic postreconstruction thresholding; on the other hand, it should be noted that CML is slower as one needs to solve the optimization problem (15) instead of just inverting the matrix (in the present experiments, the optimization was carried out using the Ipopt solver). Still, both algorithms run in seconds for this test case, which represents a premise for an online realtime implementation.
In Figure 3 (B), we study the performance of our estimator as a function of the number of samples for a fixed sampling step cycles (and hence for growing observation time from to minutes). We see that the experiment confirms the conclusion of the Corollary 2: the estimated dynamic state matrix quickly converges to the ground truth matrix , with CML algorithm achieving the relative error of by min and by min. This fact shows that it is possible to estimate the dynamic state matrix to an impressive accuracy under the time constraints outlined in Figure 1.
Besides the accurate prediction of the dynamic state matrix per se, a desirable feature of the estimators would consist in an accurate prediction of the properties of this matrix, in particular including the spectral properties [9]
. Indeed, the critical eigenvalue is known to serve as measures of proximity to the instability
[10, 12, 13], while the associated critical eigenvector might provide useful information on the system response and facilitate the design of control actions such as the generation redispatch
[12, 16]. In Figure 4, we test the accuracy of the critical eigenvalues prediction using the samples obtained within the min observation interval at the sampling rate cycles. Is is apparent that the critical eigenvalues are predicted to a good accuracy, which shows that our learning procedure can be used for the online monitoring of the system stability.V Discussion and conclusions
In this work, we explored the maximum likelihood based approach to the problem of estimation of the dynamic state matrix from PMU measurements. In particular, we constructed and tested two leastsquares estimators, one based on a fast inversion of the empirical covariance matrix, and another one based on the solution of a convex quadratic regression taking full advantage of the problem structure and leading to a more accurate reconstruction, but at an expense of a slightly higher computational time. These two estimators realize a common tradeoff in practical applications between the accuracy and the speed of computations. In this contribution, we have verified the properties of our algorithms on synthetic data from standard IEEE test case; in future studies, it would be instructive to test their performance on data collected from realworld PMUs.
As clarified in section III, our framework is very broad and can naturally accommodate the regularized online learning by incorporating the previously learned models as a prior thus potentially decreasing the computational time even further, as well as extensions to the case of sparse network topologies and to realistic scenario of incomplete observations due to a partial PMU coverage [20]. It would be useful to perform a theoretical finitesample analysis of these extensions in the future similarly to the analysis presented in this paper, as well as to provide an empirical assessment of the algorithm performance in realistic problems.
As we commented while motivating the dynamic learning problem, estimated parameters can be useful in a number of tasks related to optimization, control and security of the transmission grid. Another attractive feature of our methodology consists in an ability to perform datadriven state estimation of power fluctuations through estimation of the matrix related to power fluctuations (7), as explained in the proof of the Proposition 3 in Appendix B. It is important to quantify the potential advantage of using the learningbased method in these applications.
Some of relevant open questions that we did not address in this work include the construction of optimal estimators in the cases of more general noise distributions than the Gaussian case (for example, the powerlaw distributed noise), as well as for noise correlated in space and time. We anticipate that in the regime of weak spatial and temporal coupling the estimators introduced in this paper should be sufficiently robust and perform reasonably well, but this statement should be thoroughly checked on realistic test cases.
Finally, an interesting and natural direction for future exploration consists in extending our scheme to nonstationarity and to higherorder dynamic models.
Appendix A Proof of Theorem 1
First multiply both sides of Eq. (7) by and perform a sum over to obtain
(A.20) 
where
(A.21) 
Since , the crosscorrelation matrix is invertible. Multiplying both sides of Eq. (A.20) by gives . In expectation the difference in Frobenius norm between the ML estimator and the matrix A is upperbounded by the following expression
(A.22)  
(A.23)  
(A.24) 
where in two last lines we have used CauchySchwarz inequality for the Frobenius norm and for the expectation, respectively. In order to compute the expected Frobenius norm of , we first evaluate the expectations of its elements squared
(A.25)  
(A.26) 
where we used that and if . It is now easy to compute the expected Frobenius norm of the matrix ,
(A.27) 
The final step of the proof consists in combining Eq. (A.24), Eq. (A.27) and Markov inequality.
Appendix B Proof of Proposition 3
The likelihood function is the probability density function (PDF) of the observation given the parameters
that define the model. Since and the noise is identically and independently distributed, we can related the PDF of for a given to the PDF of a single , i.e.(B.28) 
The maximum likelihood estimator is given by the argmax of the likelihood in Eq. (B.28) with respect to . Equivalently one can minimize the opposite of the logarithm of the likelihood to obtain
(B.29) 
After replacing the normally distributed PDF for , we arrive at the following optimization problem
(B.30) 
Notice that in Eq. (B.30), the optimization over is independent of the value of and can be performed separately, yielding the optimization problem (14) that after some algebra can be equivalently represented as
(B.31) 
Whenever is invertible, the minimization in Eq. (B.31) can be done analytically and gives . The estimate can then be obtained by solving (B.30).
References
 [1] P. Kundur, N. J. Balu, and M. G. Lauby, Power system stability and control. McGrawhill New York, 1994, vol. 7.
 [2] P. W. Sauer, M. A. Pai, and J. H. Chow, Power System Dynamics and Stability: With Synchrophasor Measurement and Power System Toolbox. John Wiley & Sons, 2017.

[3]
Z. Huang, P. Du, D. Kosterev, and B. Yang, “Application of extended kalman filter techniques for dynamic model parameter calibration,” in
Power & Energy Society General Meeting, 2009. PES’09. IEEE. IEEE, 2009, pp. 1–8.  [4] N. Zhou, S. Lu, R. Singh, and M. A. Elizondo, “Calibration of reduced dynamic models of power systems using phasor measurement unit (pmu) data,” in North American Power Symposium (NAPS), 2011. IEEE, 2011, pp. 1–7.
 [5] S. Guo, S. Norris, and J. Bialek, “Adaptive parameter estimation of power system dynamic model using modal information,” IEEE Transactions on Power Systems, vol. 29, no. 6, pp. 2854–2861, 2014.
 [6] N. Zhou, D. Meng, Z. Huang, and G. Welch, “Dynamic state estimation of a synchronous machine using pmu data: A comparative study,” IEEE Transactions on Smart Grid, vol. 6, no. 1, pp. 450–460, 2015.
 [7] Y. C. Chen, J. Wang, A. D. DomínguezGarcía, and P. W. Sauer, “Measurementbased estimation of the power flow jacobian matrix,” IEEE Transactions on Smart Grid, vol. 7, no. 5, pp. 2507–2515, 2016.
 [8] G. Chavan, M. Weiss, A. Chakrabortty, S. Bhattacharya, A. Salazar, and F. HabibiAshrafi, “Identification and predictive analysis of a multiarea wecc power system model using synchrophasors,” IEEE Transactions on Smart Grid, 2017.
 [9] X. Wang, J. Bialek, and K. Turitsyn, “Pmubased estimation of dynamic state jacobian matrix and dynamic system state matrix in ambient conditions,” IEEE Transactions on Power Systems, 2017.
 [10] J. Machowski, J. Bialek, and J. R. Bumby, Power system dynamics and stability. John Wiley & Sons, 1997.
 [11] H.D. Chiang, Direct methods for stability analysis of electric power systems: theoretical foundation, BCU methodologies, and applications. John Wiley & Sons, 2011.
 [12] G. Ghanavati, P. D. Hines, and T. I. Lakoba, “Identifying useful statistical indicators of proximity to instability in stochastic power systems,” IEEE Transactions on Power Systems, vol. 31, no. 2, pp. 1360–1368, 2016.
 [13] T. Van Cutsem and C. Vournas, Voltage stability of electric power systems. Springer Science & Business Media, 1998, vol. 441.
 [14] B. K. Poolla, S. Bolognani, and F. Dorfler, “Optimal placement of virtual inertia in power grids,” IEEE Transactions on Automatic Control, 2017.
 [15] D. Deka, H. Nagarajan, and S. Backhaus, “Optimal topology design for disturbance minimization in power grids,” in American Control Conference (ACC), May 2017, pp. 2719–2724.
 [16] S. MendozaArmenta and I. Dobson, “Applying a formula for generator redispatch to damp interarea oscillations using synchrophasors,” IEEE Transactions on Power Systems, vol. 31, no. 4, pp. 3119–3128, 2016.
 [17] J. H. Chow, Power system coherency and model reduction. Springer, 2013.
 [18] F. Dorfler and F. Bullo, “Kron reduction of graphs with applications to electrical networks,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 60, no. 1, pp. 150–163, 2013.

[19]
R. Singh, B. C. Pal, and R. A. Jabr, “Statistical representation of distribution system loads using gaussian mixture model,”
IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 29–37, 2010.  [20] D. Deka, A. Zare, A. Lokhov, M. Jovanovic, and M. Chertkov, “State and noise covariance estimation in power grids using limited nodal pmus,” in IEEE Global Conference on Signal and Information Processing, 2017.
 [21] J. Bento, M. Ibrahimi, and A. Montanari, “Learning networks of stochastic differential equations,” in Advances in Neural Information Processing Systems, 2010, pp. 172–180.
 [22] M. Vauhkonen, D. Vadasz, P. A. Karjalainen, E. Somersalo, and J. P. Kaipio, “Tikhonov regularization and prior information in electrical impedance tomography,” IEEE Transactions on Medical Imaging, vol. 17, no. 2, pp. 285–293, April 1998.

[23]
A. Jalali and S. Sanghavi, “Learning the dependence graph of time series with
latent factors,” in
Proceedings of the 29th International Conference on Machine Learning (ICML)
, 2012, pp. 473–480.  [24] P. Demetriou, M. Asprou, J. QuirosTortos, and E. Kyriakides, “Dynamic ieee test systems for transient analysis,” IEEE Systems Journal, 2015.
Comments
There are no comments yet.