I Introduction
Reliability and operational efficiency of modern distribution systems are challenged by the high penetration of unpredictable renewable energy resources, the large-scale deployment of electric vehicles, and 'human-in-the-loop' demand response programs. As a consequence, reverse power flows as well as voltage magnitude fluctuations are prevalent in today's residential grids [4]. For instance, solar power generation may drop by a sizable fraction of the photovoltaic (PV) nameplate rating within one minute due, for example, to intermittent cloud coverage [3], resulting in a sizable voltage sag if no action is taken. The role of networked control in power systems is to maintain desired operation, while preventing contingency events involving voltage and/or frequency instabilities from developing into large-scale cascades and blackouts. To protect electrical devices, bus voltage magnitudes in distribution grids are typically regulated to lie within a certain range around their nominal values. A common practice to achieve this is reactive power compensation.
Traditional approaches have relied on utility-owned devices, including load-tap-changing transformers, voltage regulators, and capacitor banks, to control reactive power injection into the grid. Although these devices perform well in certain cases, slow response times, discrete control actions, and lifespan limitations render them ill-suited for fast reactive power control [6]. Recent advances in smart inverters offer new opportunities by circumventing these limitations. However, computing the optimal setpoints for smart inverters can be cast as an instance of the optimal power flow task, which entails solving a nonconvex optimization problem [6, 11]. Furthermore, to deal with renewable energy uncertainties as well as unreliable communication links (which cause delays and even communication failures), stochastic, online, decentralized, and localized smart inverter control schemes have been developed [11, 16, 13, 23, 21]. Nonetheless, centralized solvers suffer from high computational complexity, while decentralized and localized schemes converge slowly.
To bypass these hurdles, recent proposals have engaged machine learning approaches for fast networked control and monitoring [14, 17, 20, 19, 18]. A support vector machine-based method was devised in [10] to approximate a near-optimal inverter control rule. In [17], the authors developed a voltage regulation scheme using deep reinforcement learning. Deep (recurrent) neural networks were used for power system state estimation and forecasting in [20]. By exploiting the power grid topology, a physics-aware neural network was proposed for state estimation [19]. Related schemes leveraging deep neural networks that 'learn-to-optimize' have also appeared in resource allocation [12] and outage detection [22]. Unfortunately, training existing supervised learning models for reactive power control requires large-scale labeled training data, which are difficult to obtain in real-world physical systems. Reinforcement learning approaches, on the other hand, entail prior knowledge for designing the so-called reward functions, and often converge slowly.
Different from existing efforts, in this work an unsupervised statistical learning approach is developed for computationally intensive and time-sensitive reactive power control. Specifically, a deep neural network is used to parameterize the functional relationship between the grid state vector and the optimal reactive power compensation. The computational complexity of solving nonconvex optimization problems is thereby shifted to offline training of the deep neural network. In the training phase, by feeding in grid state vectors obtained from historical data or through simulations, the weight parameters of the deep neural network are updated iteratively via the policy gradient method. In the online inference phase, or real-time implementation, one simply passes the observed state vector through the trained deep neural network, and obtains a near-optimal reactive power control at the output. Our model-free approach requires no system knowledge and is computationally inexpensive. It also bypasses the need for data labels, and tackles the optimal reactive control problem through policy gradients.
Regarding the remainder of this paper, Section II introduces our system model. Section III outlines the reactive power control problem formulation, followed by the proposed statistical learning solver in Section IV. Numerical tests using a realworld feeder are presented in Section V, with concluding remarks drawn in Section VI.
Notation. Lower- (upper-) case boldface letters denote column vectors (matrices), with the exception of the power flow vectors $\mathbf{P}$ and $\mathbf{Q}$, and normal letters represent scalars. Calligraphic symbols are reserved for sets, and $p(\cdot)$ represents a probability distribution over the corresponding space.
II System Model
Consider a radial power distribution network modeled by a tree graph $\mathcal{G} = (\mathcal{N}, \mathcal{E})$, where $\mathcal{N} := \{0, 1, \ldots, N\}$ denotes the set of buses, and $\mathcal{E}$ the set of edges. The tree is rooted at the substation bus indexed by $0$, and all branch buses are collected in $\mathcal{N}^+ := \{1, \ldots, N\}$. For each bus $n \in \mathcal{N}$, let $v_n$ denote its squared voltage magnitude, and $s_n = p_n + j q_n$ its complex power injection, where $p_n = p_n^g - p_n^c$ and $q_n = q_n^g - q_n^c$, with superscript $g$ ($c$) specifying generation (consumption).
Thanks to the radial distribution grid topology, every non-root bus $n \in \mathcal{N}^+$ has a unique parent bus, denoted by $\pi_n$; the two are joined through the $n$-th distribution line $(\pi_n, n) \in \mathcal{E}$, whose impedance is given by $r_n + j x_n$. Let $P_n + j Q_n$ represent the complex power flow from bus $\pi_n$ to bus $n$ seen at the 'front' end, and $\ell_n$ represent the magnitude square of the current over line $n$. For future reference, collect all nodal and line quantities into column vectors $\mathbf{v}$, $\mathbf{p}$, $\mathbf{q}$, $\mathbf{p}^g$, $\mathbf{q}^g$, $\mathbf{P}$, $\mathbf{Q}$, and $\boldsymbol{\ell}$. See Fig. 1 for a depiction.
The radial grid can be described by the so-termed branch flow model [2], which enforces the following equations for all $n \in \mathcal{N}^+$:
(1a) $P_n = \sum_{k \in \mathcal{C}_n} P_k - p_n + r_n \ell_n$
(1b) $Q_n = \sum_{k \in \mathcal{C}_n} Q_k - q_n + x_n \ell_n$
(1c) $v_n = v_{\pi_n} - 2 (r_n P_n + x_n Q_n) + (r_n^2 + x_n^2) \ell_n$
(1d) $\ell_n = (P_n^2 + Q_n^2) / v_{\pi_n}$
where the set $\mathcal{C}_n$ collects all children buses of bus $n$.
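To make the branch flow equations concrete, the following sketch solves (1a)–(1d) by fixed-point iteration for a hypothetical two-bus feeder; the line parameters and load values are illustrative assumptions (in per unit), not taken from the paper:

```python
import numpy as np

# Hypothetical two-bus feeder: substation (bus 0) -- one line -- load bus 1.
v0 = 1.0              # squared voltage magnitude at the substation
r, x = 0.01, 0.02     # line resistance and reactance (p.u., assumed)
p1, q1 = -0.1, -0.05  # net injections at bus 1 (negative: a pure load bus)

# Bus 1 is a leaf, so the children sums in (1a)-(1b) vanish.
ell = 0.0
for _ in range(50):  # fixed-point iteration over the squared line current
    P = -p1 + r * ell           # (1a)
    Q = -q1 + x * ell           # (1b)
    ell = (P**2 + Q**2) / v0    # (1d)

v1 = v0 - 2 * (r * P + x * Q) + (r**2 + x**2) * ell  # (1c)

# Verify the branch-flow equations hold at the computed operating point.
assert abs(P - (-p1 + r * ell)) < 1e-12
assert abs(ell * v0 - (P**2 + Q**2)) < 1e-12
print(f"P={P:.6f}, Q={Q:.6f}, ell={ell:.6f}, v1={v1:.6f}")
```

The iteration converges rapidly here because the line losses are small relative to the delivered power; the resulting $v_1$ is slightly below $v_0$, reflecting the voltage drop along the line.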
Traditionally, for a smart inverter located at bus $n$ with nominal apparent power capacity $\bar{s}_n$, and a solar panel equipped at this bus with nameplate active power capacity $\bar{p}_n$, it should hold that $\bar{s}_n \ge \bar{p}_n$. In addition, the reactive power generated by the inverter is constrained by $|q_n^g| \le \sqrt{\bar{s}_n^2 - (p_n^g)^2}$, where $p_n^g$ is the smart inverter's active power output. However, to cope with the special scenario that no reactive power can be provided when the maximum inverter output is reached (i.e., $p_n^g = \bar{s}_n$), oversizing the inverters' nameplate capacity (i.e., $\bar{s}_n > \bar{p}_n$) is used in practice. For instance, by slightly oversizing $\bar{s}_n$ relative to $\bar{p}_n$ and limiting $p_n^g$ to at most $\bar{p}_n$, inverter $n$ can provide reactive power compensation of up to $\bar{q}_n := \sqrt{\bar{s}_n^2 - \bar{p}_n^2}$, regardless of the instantaneous PV output [11]. Under this policy, the reactive injection region is the time-invariant convex set
(2) $\mathcal{Q} := \big\{ \mathbf{q}^g \in \mathbb{R}^G : |q_n^g| \le \bar{q}_n, \ n = 1, \ldots, G \big\}$
where $\bar{q}_n = \sqrt{\bar{s}_n^2 - \bar{p}_n^2}$, and $G$ denotes the number of inverters in the grid. Moreover, the voltage magnitude at every bus should be maintained within a prespecified range, i.e., $\underline{v} \le v_n \le \bar{v}$. In practice, this range is chosen to be within a few percent of the nominal value. For future use, the voltage regulation constraints at all buses are rewritten compactly as
(3) $\underline{v} \mathbf{1} \le \mathbf{v} \le \bar{v} \mathbf{1}$
In distribution grids, it holds that $p_n^g = 0$ and $q_n^g \ge 0$ when bus $n$ only hosts a capacitor; $p_n^g = q_n^g = 0$ when bus $n$ is a purely load bus; and a distributed generation bus not only consumes power $p_n^c, q_n^c \ge 0$, but also generates active power $p_n^g \ge 0$ and provides negative or positive reactive power $q_n^g$. Moreover, active power consumption and solar generation can be predicted through the hourly and real-time markets (see e.g., [11]), or by running load demand (solar generation) prediction algorithms [20].
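The sign conventions above can be illustrated by assembling net injection vectors for a hypothetical three-bus example, one bus of each type; all numbers below are made-up per-unit values:

```python
import numpy as np

# Columns: [capacitor bus, pure load bus, distributed-generation bus]
p_gen = np.array([0.00, 0.00, 0.30])   # active generation: only the DG bus
q_gen = np.array([0.05, 0.00, -0.10])  # capacitor injects q >= 0; DG q may be +/-
p_con = np.array([0.00, 0.20, 0.15])   # active consumption
q_con = np.array([0.00, 0.08, 0.06])   # reactive consumption

# Net injections per bus: p = p^g - p^c and q = q^g - q^c
p = p_gen - p_con
q = q_gen - q_con

assert p[0] == 0.0 and q[0] >= 0.0   # capacitor bus: no active power
assert p[1] <= 0.0 and q[1] <= 0.0   # pure load bus: net consumption only
print(p, q)
```

These are exactly the vectors $\mathbf{p}$ and $\mathbf{q}$ that enter the branch flow equations (1a)–(1b).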
III Problem Formulation
In the envisioned distribution network operation scenario, active power is controlled at a coarse timescale. Depending on the variability of active power and on cyber resources (sensing, communication, and computation delays), reactive power compensation occurs over time intervals indexed by $t$, which could be real-time market periods, e.g., minutes, or even shorter, e.g., seconds. Let $(\mathbf{p}(t), \mathbf{q}(t))$ denote the active and reactive power injections at all non-root buses during control period $t$. The total power loss across all distribution lines can be expressed as $\sum_{n \in \mathcal{N}^+} r_n \ell_n$. Given load consumptions and generation at the beginning of each interval $t$, the goal of reactive power control is to find feasible reactive power injections $\mathbf{q}^g(t)$ for the smart inverters such that the power loss across all distribution lines is minimized, while maintaining all bus voltage magnitudes within the prescribed range. Formally, the reactive power control problem is formulated as follows
(4) $\mathbf{q}^{g*}(t) \in \arg\min_{\mathbf{q}^g \in \mathcal{Q}} \ L\big(\mathbf{p}(t), \mathbf{q}(t)\big)$
where $L(\mathbf{p}, \mathbf{q})$ admits the following form
(5a) $L(\mathbf{p}, \mathbf{q}) := \min_{\mathbf{v}, \boldsymbol{\ell}, \mathbf{P}, \mathbf{Q}} \ \sum_{n \in \mathcal{N}^+} r_n \ell_n$
(5b) s.to $P_n = \sum_{k \in \mathcal{C}_n} P_k - p_n + r_n \ell_n, \ \forall n \in \mathcal{N}^+$
(5c) $Q_n = \sum_{k \in \mathcal{C}_n} Q_k - q_n + x_n \ell_n, \ \forall n \in \mathcal{N}^+$
(5d) $v_n = v_{\pi_n} - 2 (r_n P_n + x_n Q_n) + (r_n^2 + x_n^2) \ell_n, \ \forall n \in \mathcal{N}^+$
(5e) $\ell_n v_{\pi_n} = P_n^2 + Q_n^2, \ \forall n \in \mathcal{N}^+$
(5f) $\underline{v} \le v_n \le \bar{v}, \ \forall n \in \mathcal{N}^+$
Clearly, constraints (5b)–(5d) and (5f) are linear with respect to the system variables $\{\mathbf{v}, \boldsymbol{\ell}, \mathbf{P}, \mathbf{Q}\}$. Nevertheless, the constraints in (5e) are quadratic equalities, describing a nonconvex feasible set and rendering the optimization problem nonconvex and NP-hard in general [7].
To address this issue, the equalities in (5e) have recently been relaxed to convex inequalities described by the hyperbolic constraints [7]
(6) $\ell_n v_{\pi_n} \ge P_n^2 + Q_n^2, \ \forall n \in \mathcal{N}^+$
Substituting (6) into (5) yields
(7a) $\min_{\mathbf{v}, \boldsymbol{\ell}, \mathbf{P}, \mathbf{Q}} \ \sum_{n \in \mathcal{N}^+} r_n \ell_n$
(7b) s.to (5b)–(5d), (5f)
(7c) $\ell_n v_{\pi_n} \ge P_n^2 + Q_n^2, \ \forall n \in \mathcal{N}^+$
where (7c) can equivalently be expressed as the second-order cone constraint
(8) $\big\| [2 P_n, \ 2 Q_n, \ \ell_n - v_{\pi_n}]^\top \big\|_2 \le \ell_n + v_{\pi_n}, \ \forall n \in \mathcal{N}^+$
Constraints (7b) and (7c) now represent a convex feasible set, and the problem in (7) can be solved by standard convex programming methods. Interestingly, it has been shown that under certain conditions, equalities are attained in (8) at the optimal solution of (7); see details in e.g., [7]. In this case, the optimal solution of the original problem (5) is recovered too.
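The equivalence between the hyperbolic form (7c) and the cone form (8) follows from the identity $(\ell_n + v_{\pi_n})^2 - (\ell_n - v_{\pi_n})^2 = 4 \ell_n v_{\pi_n}$, valid whenever $\ell_n, v_{\pi_n} \ge 0$. The sketch below verifies this equivalence numerically on random points; it is a sanity check, not part of the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(1)
P = rng.normal(size=1000)               # random candidate line flows
Q = rng.normal(size=1000)
ell = rng.uniform(0.0, 2.0, 1000)       # squared currents (nonnegative)
v = rng.uniform(0.5, 1.5, 1000)         # squared parent voltages (positive)

# Hyperbolic constraint (7c): ell * v >= P^2 + Q^2
hyperbolic = ell * v >= P**2 + Q**2
# Second-order cone constraint (8): ||(2P, 2Q, ell - v)|| <= ell + v
soc = np.sqrt((2 * P)**2 + (2 * Q)**2 + (ell - v)**2) <= ell + v

# Both constraints carve out the same feasible set.
assert np.array_equal(hyperbolic, soc)
print(f"{hyperbolic.sum()} of 1000 random points feasible under both forms")
```

The cone form is preferred in practice because off-the-shelf SOCP solvers accept it directly.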
It is worth pointing out that problem (4) formally characterizes the optimal reactive power control policies for a diverse set of networked control problems, including e.g., voltage regulation, Volt/VAR control, and optimal power flow [7], upon choosing suitable objective functions. If active and reactive power injections were both known precisely in advance and remained constant within period $t$, the optimal reactive power compensation could be found by solving (4). However, such conditions are hardly met in contemporary distribution systems, due partly to i) time-varying active and reactive injections; and ii) noise-contaminated observations caused by direct measurements, delayed estimates, or inaccurate forecasts. To bypass these challenges, minimizing the average power loss over the distribution of power injections provides an alternative to the static reactive power control formulation in (4), given by
(9) $\min_{\mathbf{q}^g \in \mathcal{Q}} \ \mathbb{E}\big[ L(\mathbf{p}, \mathbf{q}) \big]$
For notational convenience, define the state vector $\mathbf{s} := [(\mathbf{p}^c)^\top, (\mathbf{q}^c)^\top, (\mathbf{p}^g)^\top]^\top$, which is assumed to be a stationary random process, and rewrite the loss function as $\ell(\mathbf{s}, \mathbf{q}^g) := L(\mathbf{p}, \mathbf{q})$. Substituting this into the original problem (9) yields

(10) $\min_{\mathbf{q}^g \in \mathcal{Q}} \ \mathbb{E}_{\mathbf{s}}\big[ \ell(\mathbf{s}, \mathbf{q}^g) \big]$
Rather than the unreliable and possibly obsolete instantaneous decisions found through (4), problem (10) is expected to yield smoother power control decisions. However, evaluating the expectation in (10) is nearly impossible in practice, even if the probability density function of $\mathbf{s}$ were known. A further challenge comes from the computational burden of dealing with the nonconvex constraint (5e). To approximate the solution of (10) in a computationally efficient manner, a statistical learning approach is developed next.

IV Statistical Learning
The rapid growth in renewable generation is displacing traditional forms of energy generation while increasing the need for controllable and flexible resources to balance fluctuations in load and generation. In this section, we introduce a novel parameterization form of the reactive power control problem, as well as a learning solver based on a deep neural network.
IV-A Parameterization
Instead of solving (10) exactly, consider the following parameterization of the reactive power compensation
(11) $\mathbf{q}^g = \boldsymbol{\pi}(\mathbf{s}; \boldsymbol{\theta})$
where $\boldsymbol{\pi}(\cdot\,; \boldsymbol{\theta})$ is some function given by, e.g., a deep neural network, and $\boldsymbol{\theta}$ collects all unknown parameters. Building on this, finding the optimal reactive power control in (10) boils down to finding the optimal parameter vector $\boldsymbol{\theta}^*$, such that the expected loss is minimized; that is,
(12) $\boldsymbol{\theta}^* \in \arg\min_{\boldsymbol{\theta}} \ \mathbb{E}_{\mathbf{s}}\big[ \ell\big(\mathbf{s}, \boldsymbol{\pi}(\mathbf{s}; \boldsymbol{\theta})\big) \big]$
To find $\boldsymbol{\theta}^*$, a natural approach is to apply gradient descent type algorithms. To this end, one needs the gradient of the objective function in (12) with respect to $\boldsymbol{\theta}$. In practice, however, there is no analytic form of $\ell$ as a function of $\mathbf{s}$ or $\mathbf{q}^g$; in (5), for instance, the loss function depends only implicitly on $\mathbf{q}^g$. Instead, we can observe the function value $\ell(\mathbf{s}, \mathbf{q}^g)$ for any grid operating point [cf. (5)], which can be used to estimate the gradient. This motivates the development of a model-free approach [5]. Specifically, for a given set of iterates $\{\boldsymbol{\theta}^k\}$ and reactive power realizations $\{\mathbf{q}^g_t\}$, the corresponding loss function values $\{\ell_t\}$ can be observed from the system. Using $\{\mathbf{q}^g_t\}$ and $\{\ell_t\}$, the parameter vector can be updated through the policy gradient method [15], which constructs a gradient estimate from function observations only.
A control policy here is a mapping from state vectors $\mathbf{s}$ to reactive power control decisions (a.k.a. actions) $\mathbf{q}^g$. Consider first a stochastic control policy $\pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s})$, specifying a conditional distribution over all possible decisions given the current state $\mathbf{s}$. Denoting the probability of taking action $\mathbf{q}^g$ at state $\mathbf{s}$ as $\pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s})$, the gradient of $L(\boldsymbol{\theta}) := \mathbb{E}\big[\ell(\mathbf{s}, \mathbf{q}^g)\big]$ with respect to $\boldsymbol{\theta}$ can be written as
(13a) $\nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) = \nabla_{\boldsymbol{\theta}} \int\!\!\int p(\mathbf{s}) \, \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s}) \, \ell(\mathbf{s}, \mathbf{q}^g) \, d\mathbf{q}^g \, d\mathbf{s}$
(13b) $= \int\!\!\int p(\mathbf{s}) \, \nabla_{\boldsymbol{\theta}} \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s}) \, \ell(\mathbf{s}, \mathbf{q}^g) \, d\mathbf{q}^g \, d\mathbf{s}$
(13c) $= \int\!\!\int p(\mathbf{s}) \, \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s}) \, \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s}) \, \ell(\mathbf{s}, \mathbf{q}^g) \, d\mathbf{q}^g \, d\mathbf{s}$
(13d) $= \mathbb{E}\big[ \ell(\mathbf{s}, \mathbf{q}^g) \, \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s}) \big]$
where $p(\mathbf{s})$ denotes the probability of state $\mathbf{s}$, and $\mathbf{q}^g$ is drawn from the distribution $\pi_{\boldsymbol{\theta}}(\cdot | \mathbf{s})$. Here, the computation of $\nabla_{\boldsymbol{\theta}} L(\boldsymbol{\theta})$ is translated to evaluating the expectation of the loss value multiplied by the gradient of the log-policy $\log \pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s})$. This is indeed useful whenever an analytic form for $\pi_{\boldsymbol{\theta}}$ is available. In such a case, we may further replace the expectation on the right-hand side of (13) with a sample mean. Specifically, by using the past $T$ function observations, we obtain the following gradient estimate
(14) $\hat{\nabla}_{\boldsymbol{\theta}} L(\boldsymbol{\theta}) = \frac{1}{T} \sum_{t=1}^{T} \ell(\mathbf{s}_t, \mathbf{q}^g_t) \, \nabla_{\boldsymbol{\theta}} \log \pi_{\boldsymbol{\theta}}(\mathbf{q}^g_t | \mathbf{s}_t)$
where $\mathbf{q}^g_t$ is the reactive power injected into the distribution grid at time $t$, drawn from the distribution $\pi_{\boldsymbol{\theta}}(\cdot | \mathbf{s}_t)$, and $\ell(\mathbf{s}_t, \mathbf{q}^g_t)$ is the corresponding observed loss function value obtained by solving (5).
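As a toy illustration of the estimator (14) and the update it drives, the sketch below uses a scalar Gaussian policy and a synthetic quadratic loss in place of the grid loss $\ell$ (which the paper obtains by solving (5)); all numbers and the loss itself are assumptions for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.3   # fixed policy standard deviation (assumed)
theta = 0.0   # scalar policy parameter: action q ~ N(theta * s, sigma^2)
eta = 0.02    # learning rate

def loss(s, q):
    # synthetic stand-in for the observed grid loss ell(s, q);
    # minimized at q = s, so the optimal policy parameter is theta = 1
    return (q - s) ** 2

for _ in range(2000):
    s = rng.uniform(0.5, 1.5, size=32)               # observed states
    q = theta * s + sigma * rng.standard_normal(32)  # sampled actions
    # sample-mean gradient estimate, cf. (14); for a Gaussian policy,
    # grad_theta log pi(q|s) = (q - theta * s) / sigma**2 * s
    grad = np.mean(loss(s, q) * (q - theta * s) / sigma**2 * s)
    theta -= eta * grad                              # gradient step

print(f"learned theta = {theta:.3f}  (optimum is 1.0)")
```

Note that the loss is never differentiated; only its sampled values enter the estimate, which is precisely what makes the scheme model-free.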
Previously, it was assumed that the policy is stochastic. In the deterministic case, the distribution $\pi_{\boldsymbol{\theta}}(\mathbf{q}^g | \mathbf{s})$ becomes a delta function centered at $\boldsymbol{\pi}(\mathbf{s}; \boldsymbol{\theta})$. To evaluate (14) in this case, one may approximate the delta function with a known density function centered around $\boldsymbol{\pi}(\mathbf{s}; \boldsymbol{\theta})$. To capture the power constraint $\mathbf{q}^g \in \mathcal{Q}$, a truncated Gaussian distribution with fixed support $[-\bar{q}_n, \bar{q}_n]$ per inverter is considered in the next subsection.

IV-B Model-free Learning
To find the policy $\pi_{\boldsymbol{\theta}}$, we restrict ourselves to an increasingly popular family of parameterizations, namely deep neural networks [8]. Indeed, deep neural networks have recently demonstrated remarkable performance in numerous fields, including computer vision, speech recognition, and robotics. A deep neural network can effectively tackle the 'curse of dimensionality' by extracting low-dimensional representations of high-dimensional data [8]. Consider a feed-forward deep neural network connected to a truncated Gaussian probability density function block; see Fig. 2 for an illustration. It takes the state vector $\mathbf{s}$ as input, followed by fully connected hidden layers with ReLU activation functions. The output of the deep neural network is a set of mean and standard deviation pairs, each corresponding to one of the $G$ truncated Gaussian distributions. By feeding the outputs of the deep neural network into the probability density function block, the reactive power compensation vector $\mathbf{q}^g$ is sampled from $\pi_{\boldsymbol{\theta}}(\cdot | \mathbf{s})$. Stacking all the weights of the deep neural network into the vector $\boldsymbol{\theta}$, we have a function approximation to estimate the reactive power compensation $\mathbf{q}^g$. Using the gradient estimate in (14), the weights can be successively updated as follows
(15) $\boldsymbol{\theta}^{k+1} = \boldsymbol{\theta}^k - \eta \, \hat{\nabla}_{\boldsymbol{\theta}} L(\boldsymbol{\theta}^k)$
where $\eta > 0$ is a preselected learning rate. The update in (15) is model-free, since it requires no explicit knowledge of the actual form of the loss function $\ell$ or of the distribution of $\mathbf{s}$. Different from a traditional supervised approach, which requires a set of labeled training data [20], the method developed here is unsupervised; it hence circumvents the need for labeled data and directly solves (10).
The proposed reactive power control procedure is tabulated in Alg. 1. It is implemented in two phases, namely an offline training phase and an online inference phase, as shown in Fig. 3. Specifically, in the training phase, historical or simulated state vectors are fed into the deep neural network. For a given input $\mathbf{s}_t$, the network outputs a reactive power compensation $\mathbf{q}^g_t$. Subsequently, the distribution network returns a loss for this state-action pair (which can also be found by solving (5)). Finally, a gradient estimate is obtained using the policy gradient method in (14), based on which the neural network weight parameters are updated following (15). The trained deep neural network is then utilized in the inference phase. Taking the real-time state vector as input, the trained deep neural network outputs a near-optimal reactive power compensation to be implemented in the grid. Note that the proposed statistical learning approach is well suited for real-time reactive power control, as it shifts the computational burden of tackling nonconvex optimization to offline training of a neural network.
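The truncated-Gaussian output block used in the training phase can be sketched as follows for a single inverter; the mean and standard deviation below are hypothetical placeholders for what the network would output, and the inverter limit is made up:

```python
import numpy as np
from math import erf, log, sqrt, pi

rng = np.random.default_rng(0)
q_bar = 0.2             # hypothetical inverter reactive limit (p.u.)
mu, sigma = 0.05, 0.08  # stand-ins for the network's (mean, std) outputs

def Phi(z):  # standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

a, b = (-q_bar - mu) / sigma, (q_bar - mu) / sigma
Z = Phi(b) - Phi(a)     # probability mass inside the support [-q_bar, q_bar]

# Rejection sampling from N(mu, sigma^2) restricted to [-q_bar, q_bar]
def sample():
    while True:
        q = mu + sigma * rng.standard_normal()
        if -q_bar <= q <= q_bar:
            return q

q_g = sample()
# log pi(q_g | s) of the truncated Gaussian, as needed in the estimate (14)
log_pi = (-0.5 * ((q_g - mu) / sigma) ** 2
          - log(sigma) - 0.5 * log(2 * pi) - log(Z))

assert -q_bar <= q_g <= q_bar   # sampled setpoint respects the box in (2)
print(f"q_g = {q_g:.4f}, log_pi = {log_pi:.4f}")
```

By construction, every sampled setpoint lies in the feasible set (2), so the constraint $\mathbf{q}^g \in \mathcal{Q}$ is satisfied throughout training without any projection step.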
V Numerical Tests
In this section, the performance of the proposed statistical learning scheme was evaluated on a real-world distribution feeder with high penetration of renewables [6]; see Fig. 4. This feeder is integrated with smart inverters located at five buses with different kW capacities. A fixed power factor was assumed for all loads.
The training and test data were obtained by splitting the consumption and solar generation data from the Smart* project collected on August 24, 2011 [3]. The CVX toolbox [9] was used to solve the SOCP problem in (7) to evaluate the loss $\ell$. The deep neural network used here consists of three fully connected hidden layers. The simulations were carried out using TensorFlow [1] on an NVIDIA Titan X GPU with 12 GB RAM. The weight parameters of the deep neural network were updated using the back-propagation algorithm with the 'Adam' optimizer, with the learning rate and batch size fixed throughout training.
To assess the performance of the proposed approach, the following baseline was considered. Assuming perfect observations of the active and reactive power injections at the beginning of slot $t$, the optimal reactive power control can be found by solving the following problem
(16a) $\mathbf{q}^{g*}(t) := \arg\min_{\mathbf{q}^g} \ \ell\big(\mathbf{s}(t), \mathbf{q}^g\big)$
(16b) s.to $\mathbf{q}^g \in \mathcal{Q}$
where $\mathbf{q}^g$ is treated as an optimization variable. It should be noted that tackling this problem in real time is computationally expensive, while the proposed approach finds $\mathbf{q}^g$ after performing only several matrix-vector multiplications. The red curve in Fig. 5 shows the observed loss for the proposed approach, while the blue one depicts the loss of the deterministic optimum obtained via (16) during the training phase. The light-colored curves correspond to the actual observed losses, while the dark ones are their running averages. Clearly, our model-free approach learns to make near-optimal decisions. In the inference phase, the loss of the proposed approach versus the baseline is presented in Fig. 6. This plot demonstrates that the proposed model-free approach finds near-optimal reactive power control decisions. The running time of the proposed approach is one order of magnitude less than that of the optimization-based approach.
VI Conclusions
In this work, a statistical learning framework for reactive power control in distribution grids was developed. Uncertainties and delays in acquiring the grid state well motivate this learning framework. The nonconvexity of the underlying optimization and the lack of model knowledge make reactive power control in modern grids challenging, if not impossible, to solve directly. The theory of statistical learning, empowered by the nonlinear function approximation capability of deep neural networks, provides a fresh viewpoint on this problem. In particular, this work modeled the reactive power control policy via a deep neural network. The weights of the deep neural network were updated in an unsupervised and model-free fashion, circumventing the need for labeled data as well as for an explicit system model. The proposed method is computationally inexpensive, since all computational complexity is shifted to the training phase. Preliminary numerical results on a real-world distribution network using real load data corroborate the merits of the developed approach. This work opens up several interesting directions for future research. Robust methods for reactive power control in the presence of corrupted or adversarial observations are worth investigating. Exploiting the topology of the power grid to design physics-informed architectures is also pertinent.
References
 [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” arXiv:1603.04467, 2016.
 [2] M. Baran and F. F. Wu, “Optimal sizing of capacitors placed on a radial distribution system,” IEEE Trans. Power Del., vol. 4, no. 1, pp. 735–743, Jan. 1989.
 [3] S. Barker, A. Mishra, D. Irwin, E. Cecchet, P. Shenoy, and J. Albrecht, “Smart*: An open data set and tools for enabling research in sustainable homes,” SustKDD, vol. 111, no. 112, p. 108, Aug. 2012.
 [4] P. M. Carvalho, P. F. Correia, and L. A. Ferreira, “Distributed reactive power generation control for voltage rise mitigation in distribution networks,” IEEE Trans. Power Syst., vol. 23, no. 2, pp. 766–772, April 2008.
 [5] M. Eisen, C. Zhang, L. F. Chamon, D. D. Lee, and A. Ribeiro, “Learning optimal resource allocations in wireless systems,” IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2775–2790, 2019.
 [6] M. Farivar, C. R. Clarke, S. H. Low, and K. M. Chandy, “Inverter VAR control for distribution systems with renewables,” in Proc. of SmartGridComm., Brussels, Belgium, Oct. 2011, pp. 457–462.
 [7] L. Gan, N. Li, U. Topcu, and S. Low, “On the exactness of convex relaxation for optimal power flow in tree networks,” in Proc. of Conf. Decision and Control, Maui, HI, USA, Dec. 2012, pp. 465–471.
 [8] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA: MIT press, 2016.
 [9] M. Grant and S. Boyd, “CVX: Matlab software for disciplined convex programming, version 2.1,” 2014.
 [10] M. Jalali, V. Kekatos, N. Gatsis, and D. Deka, “Designing reactive power control rules for smart inverters using support vector machines,” arXiv:1903.01016, 2019.
 [11] V. Kekatos, G. Wang, A. J. Conejo, and G. B. Giannakis, “Stochastic reactive power management in microgrids with renewables,” IEEE Trans. Power Syst., vol. 30, no. 6, pp. 3386–3395, Dec. 2015.

 [12] W. Lee, M. Kim, and D. Cho, “Deep power control: Transmit power control scheme based on convolutional neural network,” IEEE Commun. Lett., vol. 22, no. 6, pp. 1276–1279, June 2018.
 [13] W. Lin, R. Thomas, and E. Bitar, “Real-time voltage regulation in distribution systems via decentralized PV inverter control,” in Proc. of Hawaii Intl. Conf. System Sciences, Waikoloa Village, Hawaii, Jan. 2018.
 [14] O. Sondermeijer, R. Dobbe, D. Arnold, C. Tomlin, and T. Keviczky, “Regression-based inverter control for decentralized optimal power flow and voltage regulation,” arXiv:1902.08594, 2019.
 [15] R. S. Sutton, D. A. McAllester, S. P. Singh, and Y. Mansour, “Policy gradient methods for reinforcement learning with function approximation,” in Proc. of Adv. Neural Inf. Process. Syst., 2000, pp. 1057–1063.
 [16] G. Wang, V. Kekatos, A. J. Conejo, and G. B. Giannakis, “Ergodic energy management leveraging resource variability in distribution grids,” IEEE Trans. Power Syst., vol. 31, no. 6, pp. 4765–4775, Nov. 2016.
 [17] Q. Yang, G. Wang, A. Sadeghi, G. B. Giannakis, and J. Sun, “Two-timescale voltage control in distribution grids using deep reinforcement learning,” arXiv:1904.09374, 2019.
 [18] A. Zamzam and K. Baker, “Learning optimal solutions for extremely fast AC optimal power flow,” arXiv:1910.01213, 2019.
 [19] A. S. Zamzam and N. D. Sidiropoulos, “Physicsaware neural networks for distribution system state estimation,” arXiv:1903.09669, 2019.
 [20] L. Zhang, G. Wang, and G. B. Giannakis, “Real-time power system state estimation and forecasting via deep unrolled neural networks,” IEEE Trans. Signal Process., vol. 67, no. 15, pp. 4069–4077, Aug. 2019.
 [21] Y. Zhang, M. Hong, E. Dall’Anese, S. V. Dhople, and Z. Xu, “Distributed controllers seeking AC optimal power flow solutions using ADMM,” IEEE Trans. Smart Grid, vol. 9, no. 5, pp. 4525–4537, Sept. 2018.
 [22] Y. Zhao, J. Chen, and H. V. Poor, “A learning-to-infer method for real-time power grid multi-line outage identification,” IEEE Trans. Smart Grid, pp. 1–1, 2019.
 [23] H. Zhu and H. J. Liu, “Fast local voltage control under limited reactive power: Optimality and stability analysis,” IEEE Trans. Power Syst., vol. 31, no. 5, pp. 3794–3803, Dec. 2016.