DER Forecast using Privacy Preserving Federated Learning

by   Venkatesh Venkataramanan, et al.

With the increasing penetration of Distributed Energy Resources (DERs) at the grid edge, including renewable generation, flexible loads, and storage, accurate prediction of distributed generation and consumption at the consumer level becomes important. However, DER prediction based on the transmission of customer-level data, either repeatedly or in large amounts, is not feasible due to privacy concerns. In this paper, a distributed machine learning approach, Federated Learning (FL), is proposed to carry out DER forecasting using a network of IoT nodes, each of which transmits a model of the consumption and generation patterns without revealing consumer data. We consider a simulation study which includes 1000 DERs, and show that our method preserves consumer privacy while still leading to an accurate forecast. We also evaluate grid-specific performance metrics such as load swings and load curtailment and show that our FL algorithm leads to satisfactory performance. Simulations are also performed on the Pecan Street dataset to demonstrate the validity of the proposed approach on real data.




I Introduction

Internet-of-Things (IoT) is becoming attractive in a wide range of applications in energy, transportation, healthcare, manufacturing, and others. The rapid adoption of IoT and IoT networks is leading to an unprecedented growth in the volume of data generated by these devices. Juniper Research forecasts that the total number of IoT device endpoints will hit 83 billion by 2024 [18]. Specifically, utilities are expected to be one of the highest users, with 1.37 billion endpoints. Gartner Inc.’s report also states that “Electricity smart metering, both residential and commercial will boost the adoption of IoT among utilities” [11]. Cloud computing has been proposed as a method for storing and analyzing such large volumes of data, due to advantages such as cost efficiency and computing and storage capabilities [33]. However, a purely centralized cloud-based data storage and analytics approach becomes unrealistic due to ever-rising data privacy concerns. The General Data Protection Regulation (GDPR) in Europe lays out strict guidelines for users’ data privacy, while similar provisions are present in the US Consumer Privacy Bill of Rights [30]. Distributed methods that can efficiently and privately communicate the desired information and enable decisions are becoming more and more attractive.

In addition to privacy, other challenges that limit a centralized approach for analyzing IoT data are the need for fast processing, low latency and sufficient bandwidth of the underlying communication network [37]. Cloud data centers are often deployed in locations that are far from those of the data owners, leading to high communication latency and insufficient time for real-time operation. At the same time, enabling technologies such as edge computing are maturing, with edge nodes such as smartphones, sensors, micro servers, autonomous vehicles and home gateways becoming increasingly smarter [33]. This in turn implies that a distributed framework with fast computing as well as fast and reliable communication to other agents in the network is becoming a feasible and viable alternative.

The focus of this paper is the application of IoT networks to the forecasting of DERs in distribution systems while ensuring privacy and security for the users. In this paper, we define DERs as electricity producing assets such as solar PV and other distributed generation (DG), and controllable loads that are capable of providing a grid function [29]. In general, distribution system operators have limited visibility into their systems, as very few measurements are available. As DER penetration grows, this limitation becomes even more of a concern and can compromise the balance of supply and demand and therefore grid reliability. Load forecasting is essential in distribution system operation, with long-term forecasting necessary for planning studies while mid-term and short-term load forecasting are key in day-to-day operations [26]. However, with the increase in DERs, short-term load forecasting (STLF) has proven to be a challenging task because of increased volatility. DG forecasting has also experienced similar problems, with prediction accuracy still posing challenges during operation [39]. State-of-the-art benchmarks [1, 7] have found that deep neural networks (DNN) are a promising solution for the STLF problem at the household level, due to their ability to capture complex and nonlinear patterns. However, neural networks require a large amount of accurate and diverse training data to capture these nonlinear behaviors, which is a challenge at the consumer level.

The challenges of privacy as well as the requirement of large training data can be met using a distributed machine learning paradigm, Federated Learning (FL) [40, 23, 24, 13]. FL is a machine learning framework where each device participates in training a central model without sending actual data; it only exchanges gradient information in the training phase and sends prediction estimates in deployment. The current state of the art, such as [35, 32], requires that all data records are transferred from smart meters to a centralized computational infrastructure through communication networks for the training of the ML models. Consumer data is typically privately owned, and sharing of this sensitive behavioral data might have negative consequences. A solution based on FL and the use of IoT networks overcomes this hurdle of revealing consumer data to third parties. A lack of diverse training data often leads to overfitting [17] when deploying ML approaches. This can also be overcome using the FL method by accessing data from different nodes in a diverse IoT network, such as data from several home energy managers, DGs, and Electric Vehicles. A general overview of the DER prediction process is shown in Fig. 1.

Fig. 1: An overview of the DER prediction process using federated learning

The two most common issues that arise when it comes to IoT networks are privacy and security. In the context of the latter, a compelling example that has been cited in the literature is the MIRAI botnet attack [3] which demonstrates that a power grid disruption can occur when a large number of IoT devices are compromised. In this paper, we focus our attention on privacy. The equally important issue of security with the use of IoT networks is not addressed in this paper.

The contributions of this paper are as follows: (i) the development of a privacy-preserving algorithm based on Federated Learning, that allows exchange of information between IoT nodes and a global server leading to useful decisions, (ii) validation of the proposed algorithm using a simulation study of 1000 IoT nodes, with each node representing DERs such as home energy managers, electric vehicles, and solar photovoltaics, (iii) demonstration of grid services such as prediction of load swings and load curtailments based on the FL-based DER forecast, and (iv) validation of the proposed approach using real field data, Pecan street dataset [31].

The overall organization of this paper is as follows. In Section II, related work is presented. The DER prediction approach using FL and neural networks is described in Section III. In Section IV, the simulation setup for validating the proposed approach is described. Simulation scenarios for demonstrating the accuracy of the proposed approach, as well as validation using real world data, are presented in Section V. Finally, we wrap up with some discussion and conclusions in Section VI.

II Related work

In what follows, we summarize the current state of the art, grouping the related work into three categories: DER prediction, Federated Learning, and IoT in smart grids.

II-A DER prediction

Apart from forecasting at the bulk energy level and at the substation level, recent grid modernization efforts have led to an increased focus on load forecasting closer to the consumer [35, 15, 38, 22, 21, 16]. The need for load forecasting at the consumer level is driven by the increasing number of DERs, and the need for finer control of resources that DER penetration necessitates. The authors of [35] apply a clustering technique to provide a forecast for aggregated residential load based on the practice theory of human behavior. Similar clustering techniques have been adopted successfully for day-ahead prediction [15]. The authors of [38] focus on the load forecasting of a residential building with a one-hour resolution, while the authors of [22] forecast at both individual and aggregate levels with half-hourly data. When household appliances are monitored by separate meters, single customer forecasting improves through correlations between specific appliances [21]. In [16], the authors show that the spatial correlations between different appliances used in a household can be leveraged to increase the accuracy of single household load forecasting.

The studies listed above focus only on load forecasting, with a particular focus on the residential level. Aggregate-level forecasting has been attempted only recently [19, 20, 34]. The authors of [19] propose a forecasting model at the feeder level which estimates the PV penetration and then integrates this information into load forecasting, considering different PV penetration scenarios at the aggregate level to achieve an effective demand-side management approach. The authors of [20] investigate short-term net energy forecasting for a micro-neighborhood consisting of 75 single houses, with data at a 15-minute temporal resolution. Reference [34] illustrates the use of aggregated net energy forecasting in the context of a secure energy trading platform. These studies reveal that PV generation behind the meter increases the uncertainty and, in turn, the complexity of the net energy forecasting problem even at a low aggregation level. Much of the above literature, however, does not address the problem of user privacy when dealing with load prediction either at the consumer level or at the aggregated level.

II-B Federated learning

Federated learning (FL) is a new machine learning paradigm that trains the ML algorithm in a distributed fashion, allowing the user data to remain local. This opens up new applications for ML algorithms where privacy is paramount. However, several challenges still need to be resolved before FL can be successfully applied to power grid problems. The reader is referred to [40, 23, 24] for surveys on FL approaches. References [23] and [24] discuss the performance of FL as compared to ML, which becomes important for critical infrastructure applications, as accuracy and implementation constraints undergo greater scrutiny for such applications.

The studies listed above provide a general overview of the federated learning process. The work in [13, 36], on the other hand, addresses the same specific problem as this paper, namely the application of FL to load forecasting. The authors of [13] adopt the FL framework with a well designed parameter server-client architecture, and apply this architecture to estimating HVAC performance within a building. In this paper, we utilize an architecture very similar to that in [13] for grid-wide services such as DER forecasting and load swing prediction. We pay close attention to how the neural network is designed and how its hyperparameters are selected so as to meet the constraints of the data and the overall problem of DER prediction. The authors of [36] also address load forecasting using FL, with a focus on the communication efficiency of edge equipment as well as personalization of the local data at the global model so as to ensure accurate forecasting at the household level. In contrast, we focus in this paper on grid-wide needs at the distribution grid, develop a new FL implementation, and show how the FL algorithm can be trained so as to ensure privacy of consumer data and at the same time lead not only to accurate DER forecasts but also to desired grid services.

II-C IoT device application in smart grids

IoT devices are experiencing exponential growth in all sectors including transportation and smart cities [12]. With much of the innovation in smart grids occurring at the grid edge, IoT is poised to play a major role in enabling dynamic power balance at all points of the grid, especially at the distribution grid level. Much of the existing literature focuses on applications of IoT devices to smart homes, demand response, and related smart community applications [25, 27, 41, 4, 5]. IoT devices have already become ubiquitous at the home level, with the proliferation of smart home devices such as Nest thermostats, and platforms such as Google Home and Alexa [25]. IoT devices have been proposed for the management of extended outage conditions, such as automated fault location, isolation and service restoration (FLISR) in [27]. With more sensors deployed in control devices, IoT devices also provide a way to determine the position of defective parts, separate them, and apply switching tasks to recover the largest possible number of healthy parts of the affected feeder [41]. At a more advanced level, this function can be extended using self-healing methods that are able to activate the participation of customers as well as of dispersed generation units [41]. Various real-world deployments detail the benefit of increased sensing using IoT devices for applications ranging from traffic management to urban innovation [4]. Implementing these strategies leads to increased reliability, power quality and profits [5]. Further applications of IoT technology in the power system domain can be found in the survey [5].

In the specific context of this paper relating to accurate DER forecast, IoT technology is poised to play a major role. The two major hurdles in its implementation are privacy and security. This paper pertains to the first, and proposes the use of Federated Learning so as to ensure privacy preservation and leverage the IoT technology to lead to accurate DER forecasts.

III DER prediction using Federated Learning

We begin with an overview of the neural network procedure to be utilized in FL. The underlying problem is a nonlinear mapping $f$ between a vector of inputs $x$ and an output $y$. A neural network is an extraordinarily effective tool in learning this mapping, and constitutes a bulk of the ML approaches used for learning [39, 7, 22]. A typical process by which a neural network learns the nonlinearity is through a training and testing phase.

The training phase consists of a vector of inputs $x_k^i$, $i = 1, \ldots, N$, collected at epoch $k$, each of which leads to an output $y_k^i$. The total number of samples in epoch $k$ is defined as $N$. A typical neural network architecture, referred to as a deep-learning network, consists of multiple layers, interspersed with activation functions, with the output $\hat{y}$ related to $x$ in the form indicated in (1).

$$\hat{y} = W_2\,\sigma(W_1 x + b) \tag{1}$$

where $\sigma$ denotes an activation function, $n$ is the number of neurons, $W_i$ are the weights of layer $i$, $b$ is a bias term, and (1) represents the input-output relation for a network with one hidden layer. The training procedure then consists of adjusting the weights as

$$w_{k+1} = w_k - \eta\, g_k \tag{2}$$

where $g_k$ corresponds to the gradient of a loss function $\mathcal{L}$, which is defined as

$$g_k = \nabla_w \mathcal{L}(w_k) \tag{3}$$

where $\eta$ is a learning rate, and $w_k$ is a vector of all weights at an iteration at epoch $k$. A typical loss function is the mean squared prediction error over $N$ samples, which is defined as

$$\mathcal{L}(w_k) = \frac{1}{N} \sum_{i=1}^{N} \left( y_k^i - \hat{y}_k^i \right)^2 \tag{4}$$

Through a repeated training of these weights, the output of the neural network $\hat{y}$ is then allowed to approximate the true value, $y$. The testing phase then consists of using a distinct set of $x$, which is different from the training phase, and using the trained neural network to predict the corresponding output $\hat{y}$.
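The training loop in (1)-(4) can be illustrated with a small NumPy sketch. This is a minimal illustration and not the implementation used in the paper: the `tanh` activation, the toy target function, the initialization scale, and the helper names `init_weights`, `forward`, `loss_and_grad` and `sgd_step` are all assumptions made for the example.

```python
import numpy as np

def init_weights(m, n, rng):
    """Weights of a one-hidden-layer network: W1 (n x m), b (n,), W2 (n,)."""
    return {"W1": rng.normal(0.0, 0.5, (n, m)),
            "b": np.zeros(n),
            "W2": rng.normal(0.0, 0.5, n)}

def forward(w, X):
    """Eq. (1): y_hat = W2 . sigma(W1 x + b), with sigma = tanh (assumed)."""
    H = np.tanh(X @ w["W1"].T + w["b"])    # hidden activations, shape (N, n)
    return H @ w["W2"], H

def loss_and_grad(w, X, y):
    """Eq. (4): mean squared error, and Eq. (3): its gradient w.r.t. all weights."""
    y_hat, H = forward(w, X)
    err = y_hat - y
    N = len(y)
    loss = np.mean(err ** 2)
    dZ = np.outer(err, w["W2"]) * (1.0 - H ** 2)   # back-propagate through tanh
    grad = {"W2": 2.0 / N * H.T @ err,
            "b": 2.0 / N * dZ.sum(axis=0),
            "W1": 2.0 / N * dZ.T @ X}
    return loss, grad

def sgd_step(w, grad, eta):
    """Eq. (2): w_{k+1} = w_k - eta * g_k."""
    return {name: w[name] - eta * grad[name] for name in w}

# Toy nonlinear mapping y = f(x), used only to exercise the loop.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, (200, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

w = init_weights(m=2, n=20, rng=rng)
initial_loss, _ = loss_and_grad(w, X, y)
for k in range(500):
    loss, g = loss_and_grad(w, X, y)
    w = sgd_step(w, g, eta=0.05)
final_loss, _ = loss_and_grad(w, X, y)
print(f"loss before: {initial_loss:.4f}, after: {final_loss:.4f}")
```

In this sketch all weights are kept in one dictionary so that the update in (2) is a single line; a deeper network would only change `forward` and the back-propagation in `loss_and_grad`.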

One can use the above neural network to forecast the power consumption (or generation) of a device (or a renewable generation source) in the following manner. Suppose $y$ denotes $P(t)$, the power-consumption forecast at time $t$, and the input $x$ denotes a vector $[P(t-\tau_1), \ldots, P(t-\tau_m)]$, where $t - \tau_i$ corresponds to a past instant relative to $t$; that is, the power consumption on a given day at a given hour may be correlated with power consumptions at the same hour during the previous day, the power consumption a few minutes before $t$, or a combination thereof. The underlying relation between such a vector of previous consumptions before $t$, $x$, and the consumption at time $t$ can be viewed as a nonlinear relation $y = f(x)$. The neural network as in (1) is tasked with learning this mapping using the training procedure in which the weights $w_k$ are trained using several samples until the loss function $\mathcal{L}$ falls below a threshold $\epsilon$. That the neural network has indeed been satisfactorily trained is tested by fixing the neural network with the converged weights and evaluating the loss function for a distinct set of inputs not utilized for training, using the testing procedure described above. We shall refer to this overall neural network as a global model $G$. That is, the global model $G$ can perform the computations in (1)-(3) repeatedly to forecast the power consumption of a device $D_j$. This is the typical procedure used in any ML approach that uses neural networks.

It should however be noted that the above procedure implies that data from device $D_j$ in the form of $(x, y)$ has to be sent to a global model $G$ repeatedly during the training process. That is, every device will then have to share the data $(x_k, y_k)$ for every epoch $k$ and every device $j$ repeatedly. Given the pervasive and large number of IoT devices that generate this data, it is unrealistic to expect all users to consent to their data being accessed to create machine learning models in such a manner, especially if the devices are not owned by the utility. For this purpose, we introduce a variation in the ML training process using the tenets of FL [40], which is described below.

Fig. 2: The schematic of neural network training using federated learning is shown here. The steps (1)-(6) are repeated until the pre-determined tolerance is met

In what follows we assume that all IoT devices under consideration can be grouped into two types, $D^L$ and $D^G$, where $D^L$ corresponds to home energy managers that may manage a collection of home appliances including HVAC, smart thermostats, smart refrigerators, dishwashers, laundry machines, and other home appliances. Devices $D^G$ correspond to rooftop solar panels, EVs, and storage units. Both types of devices are assumed to be connected to the secondary feeder network, with the total number of devices given by $M_L$ and $M_G$, with $M = M_L + M_G$. The power consumed by $D^L_j$ and $D^G_j$ at time $t$ is given by $P^L_j(t)$ and $P^G_j(t)$. We refer to each of these devices $D^L_j$ and $D^G_j$ as a federate $D_j$ for the FL process. During the training process, the adjustment of the weights of the neural networks described in (1)-(3) proceeds in the following manner.

The global model $G$ sends initial weights $w_0$ at time $k = 0$. The local model then generates several samples $x_k^i$ that correspond to a range of time-instants as well as the resulting true output $y_k^i$, and the predicted $\hat{y}_k^i$ for the weights $w_k$ using (1). The corresponding loss function $\mathcal{L}$ is then computed by the local model as in (4) and the gradient $g_k^j$ as in (3). The local model, which corresponds to federate $D_j$, then sends this gradient to the global model $G$. The global model then updates its weights as $w_{k+1}$ using the collection of all gradients from the $M$ federates as,

$$w_{k+1} = w_k - \eta \sum_{j=1}^{M} \alpha_j\, g_k^j \tag{5}$$

where $\alpha_j$ is a pre-determined set of weights that combines all gradients. The global model then sends the updated weight $w_{k+1}$ back to each of the federates $D_j$. The federate $D_j$ in turn collects new input-output pairs using the updated weights $w_{k+1}$ using (1), computes the new loss function using (4), and the new gradient $g_{k+1}^j$ using (3). This new gradient is then sent by the federate $D_j$ to the global model $G$, and the whole training process repeats. The output of the neural network corresponds to the predicted outputs $\hat{P}^L_j(t)$ or $\hat{P}^G_j(t)$ for federate $D_j$, at time $t$. The overall schematic of the neural network training using FL is presented in Fig. 2.

It should be noted that throughout the training process, each local model only sends the gradient $g_k^j$ to the global model, and does not need to send $(x, y)$. As a result, the actual data $x$, $y$, and the predicted data $\hat{y}$ stay local to the federate $D_j$. The only information revealed to the global model is $g_k^j$, which therefore preserves the privacy of the device $D_j$. In addition, the FL procedure described above is such that the global model is able to utilize a large variety of federates $D_j$, and therefore leverage a rich variety of training data, leading to better prediction. Finally, our procedure is applicable in equal measure to both IoT devices of type $D^L$ and $D^G$, and therefore includes both distributed generation and distributed loads.

Algorithm FL()
      Set $k = 0$;
      For $j = 1, \ldots, M$;
      Initialize weights $w_0$ at the global model $G$;
      Send $w_k$ to each federate $D_j$;
      LocalTraining($w_k$);
      Set $g_k^j$ to the gradient returned by federate $D_j$;
      Compute $w_{k+1}$ using (5);
      Set $k = k + 1$;
      Repeat until $\mathcal{L} \le \epsilon$, a pre-determined tolerance;
      return $w_k$;
Procedure LocalTraining($w_k$)
      Set $w = w_k$ in (1);
      Compute $\hat{y}$ for the $N$ samples using (1);
      Compute $\mathcal{L}$ using (4);
      Compute $g_k^j$ using (3);
      return $g_k^j$;
Algorithm 1 Federated Learning Algorithm
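The exchange in Algorithm 1 can be sketched in a few lines of Python. This is an illustrative sketch under several assumptions, not the BuildFL implementation: a linear predictor stands in for the neural network of (1) to keep the example short, the combination weights $\alpha_j$ are taken to be uniform, the stopping test compares the mean federate loss with the tolerance, and the function names `local_training` and `federated_learning` are hypothetical.

```python
import numpy as np

def local_training(w, X, y):
    """Federate side: use the received weights, return only loss and gradient."""
    err = X @ w - y                      # linear stand-in for the network in (1)
    loss = np.mean(err ** 2)             # loss as in (4)
    grad = 2.0 / len(y) * X.T @ err      # gradient as in (3)
    return loss, grad

def federated_learning(federates, dim, eta=0.1, eps=1e-3, max_rounds=500, alpha=None):
    """Server side: combine federate gradients as in (5) until mean loss <= eps."""
    M = len(federates)
    alpha = np.full(M, 1.0 / M) if alpha is None else np.asarray(alpha, dtype=float)
    w = np.zeros(dim)                    # initial global weights w_0
    for k in range(max_rounds):
        # Each federate computes on its own (X, y); only (loss, grad) is returned.
        results = [local_training(w, X, y) for X, y in federates]
        mean_loss = float(np.mean([loss for loss, _ in results]))
        if mean_loss <= eps:             # assumed form of the tolerance test
            break
        g = sum(a * grad for a, (_, grad) in zip(alpha, results))
        w = w - eta * g                  # global update (5)
    return w, mean_loss, k

# Five toy federates whose private data follow the same underlying relation.
rng = np.random.default_rng(0)
w_true = np.array([0.5, -0.2, 0.1])
federates = []
for _ in range(5):
    X = rng.normal(size=(96, 3))
    y = X @ w_true + 0.01 * rng.normal(size=96)
    federates.append((X, y))

w, final_loss, rounds = federated_learning(federates, dim=3)
print(f"stopped after {rounds} rounds, mean loss {final_loss:.5f}")
```

In this sketch the raw pairs `(X, y)` never leave `local_training`; the server only ever sees the scalar loss and the gradient vector, which mirrors the information flow described for the federates above.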

A typical process of DER forecasting can occur in the following manner. Collect the input-output pair $(x, y)$ for a federate $D_j$ for several samples $N$. An example is $x = [P(t-\tau_1), \ldots, P(t-\tau_m)]$, where $P$ denotes the actual power consumption, and $\tau_i$ denotes the minutes prior to time $t$. The number of samples is $N$ = 2880, obtained by collecting data every 15 minutes over a period of 30 days. The overall training procedure of the FL-based neural network is summarized in Algorithm 1.
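As a rough illustration of how such input-output pairs could be assembled from a 15-minute time series, consider the sketch below. The helper `make_lagged_samples` and the sinusoidal toy series are hypothetical; the lags are the six values used later in Section V. One assumption is worth flagging: because the largest lag is one day, the sketch uses 31 days of data so that 2880 complete pairs remain.

```python
import numpy as np

STEP_MIN = 15  # sampling interval in minutes

def make_lagged_samples(power, lags_min):
    """Build pairs with x = [P(t - tau_1), ..., P(t - tau_m)] and y = P(t)."""
    lags = [tau // STEP_MIN for tau in lags_min]    # lags expressed in samples
    start = max(lags)                                # first t with a full history
    X = np.column_stack([power[start - lag: len(power) - lag] for lag in lags])
    y = power[start:]
    return X, y

# Toy daily-periodic consumption profile (placeholder for real measurements).
t = np.arange(31 * 96)
power = 1.0 + 0.5 * np.sin(2 * np.pi * t / 96)

lags_min = [15, 30, 60, 90, 120, 1440]
X, y = make_lagged_samples(power, lags_min)
print(X.shape, y.shape)
```

Each row of `X` holds the six past readings for one time instant, and the matching entry of `y` is the reading to be predicted.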

A simpler training procedure than Algorithm 1 can be adopted, and is utilized in the results reported in the subsequent sections. It is briefly described here. The total number of samples available each day for training is 96. Rather than use 96 samples from all 30 days, a day was chosen at random, and the computations at each federate were carried out using the mini-batch of samples from that day. A tolerance $\epsilon$ was chosen, and a total of 150 epochs was found to be sufficient to achieve this desired tolerance. The neural network consisted of 2 hidden layers with 20 neurons in each layer. The hyperparameters of the simplified training procedure are shown in Table I.

Hyper-parameter     Value
Mini-batch size     96 samples
Learning rate       0.001
Maximum epochs      150
TABLE I: Neural network hyper-parameters
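The random-day mini-batch selection can be sketched as follows. The values in `HYPER` are taken from Table I; the helper `random_day_batch` and the placeholder arrays are hypothetical, and the text does not state whether a new day is drawn at every epoch, so the sketch leaves that choice to the caller.

```python
import numpy as np

SAMPLES_PER_DAY = 96   # one sample every 15 minutes
N_DAYS = 30            # length of the training window
HYPER = {"batch_size": 96, "learning_rate": 0.001, "max_epochs": 150}

def random_day_batch(X, y, rng):
    """Return the index of a randomly chosen day and its 96 samples."""
    day = int(rng.integers(N_DAYS))
    rows = slice(day * SAMPLES_PER_DAY, (day + 1) * SAMPLES_PER_DAY)
    return day, X[rows], y[rows]

# Placeholder data standing in for one federate's 30 days of samples.
rng = np.random.default_rng(0)
X = np.arange(N_DAYS * SAMPLES_PER_DAY * 6, dtype=float).reshape(-1, 6)
y = np.arange(N_DAYS * SAMPLES_PER_DAY, dtype=float)

day, X_batch, y_batch = random_day_batch(X, y, rng)
print(day, X_batch.shape, y_batch.shape)
```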

In the testing phase, the model performance is evaluated by testing with a new set of data from the next month, from a similar season, to ensure that the model is still valid. In the case studies, we will use the root mean square error (RMSE), which coincides with the square root of the loss function defined in (4), to quantify the performance of the FL algorithm. That is,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y^i - \hat{y}^i \right)^2} \tag{6}$$
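For concreteness, the RMSE metric can be computed as in the short sketch below. The function name `rmse` and the three-sample example are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared prediction error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors are 0, 0 and 2, so the mean squared error is 4/3.
example = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(round(example, 4))  # 1.1547
```

Averaging this quantity over all nodes gives a single feeder-level figure of the kind reported in the use-cases.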


IV Simulation setup

To validate the concept of federated learning for privacy preserving load prediction, we create a numerical testbed that comprises three components: (1) Power physics simulation, (2) Federated learning platform, and (3) Grid service model.

IV-A Power physics simulation

The physical layer simulation needs to generate the IoT level data corresponding to the various grid components. Various physics simulators such as building models, GridLAB-D, and others can be used for this purpose. GridLAB-D [8] provides detailed models for various power system components, with active ongoing upgrades to existing models. GridLAB-D uses an advanced algorithm that solves for the states of all the different devices in the power system simultaneously rather than sequentially, and therefore offers the flexibility to develop complex models and to implement user-developed algorithms for various control purposes. In the context of the current problem, we will utilize GridLAB-D to generate physical data from the distribution system, ranging from the primary feeders and secondary feeders all the way to consumer buildings and individual IoT devices such as HVAC units, EVs, and rooftop PVs. This makes the platform suitable to validate the overall forecast using the FL Algorithm 1.

Fig. 3: A simple peak shaving algorithm to minimize power swings

IV-B Federated Learning platform

Fig. 4: Load prediction using Federated learning

To implement federated learning, we leverage the approach in [13], denoted as the Building Federated Learning (BuildFL) platform. BuildFL adopts the popular parameter server (PS) architecture, in which a server is created to store the parameters of the ML model (in this case, the weights of the neural network) and then serves them to the clients, which are the federates $D_j$. Each federate creates a local update by training with the data set it collects and exchanges model parameters with the global model $G$. In BuildFL, building sites are workers and a cloud server is created to serve as the parameter server. The BuildFL platform integrates machine learning libraries such as PyTorch, which is mainly used for computer vision or natural language processing and whose library only supports gradient-based models. To support other models that facilitate building analytics, such as bagging and boosting algorithms, BuildFL integrates Scikit-Learn, which includes a library of models such as random forests, boosted trees, etc.

BuildFL provides a uniform function template for distributed training of different models. As detailed in Section III, a DNN with two hidden layers was chosen as the ML architecture in this paper. The hyper-parameters of the neural network are defined in Table I, and stochastic gradient descent (SGD) [6] is chosen as the optimization algorithm to update the weights.

IV-C Grid service model

In order to demonstrate the impact of accurate DER forecast using our proposed FL method, we carry out a particular grid service using the forecast, which corresponds to predicting load swings and curtailing them using a peak shaving algorithm. Such a load swing prediction is of growing importance with increased DER penetration, as evidenced by the focus on the "duck curve" problem [9]. This problem corresponds to the large unserved load and its increase over a short period of time as solar based resources disappear, which occurs on a daily basis at dusk. We assume that a certain percentage of the loads are flexible and directly controlled by the utility, and that once these loads are accurately forecasted, they can be commanded to follow a particular command signal. One such algorithm is described in Fig. 3, and we will utilize this to demonstrate the impact of the FL-based DER forecast described in Section III.

Fig. 5: Time of maximum power swings

The details of the load swing prediction proceed as follows. First, a threshold value $\Delta P_{th}$ is computed by generating a time-series $\Delta P(t)$, which corresponds to the change in power consumption $P$ over an interval $\Delta t$, and determining the 90th percentile of $\Delta P(t)$ over a 24-hour period. A load swing is said to occur at $t$ if $\Delta P(t) > \Delta P_{th}$. One can therefore predict load swings using the power consumption forecast $\hat{P}(t)$ that is carried out using the FL method described above by computing $\Delta \hat{P}(t)$ and determining load swings using $\Delta \hat{P}(t) > \Delta P_{th}$. Once these load swings are predicted, the next step in the grid-service model is to command a flexible load to be curtailed as described in Fig. 3. In the next section, we will demonstrate this grid service using data obtained from 1000 nodes.
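The swing detection rule can be sketched as below. Two points are assumptions of this sketch rather than statements from the text: the change is taken as the signed difference between consecutive 15-minute samples, so only upward ramps are flagged, and the threshold is computed from the actual series and then applied to the forecast. The helper names and the toy series are hypothetical.

```python
import numpy as np

def power_change(power):
    """Change in power between consecutive samples (assumed signed difference)."""
    return np.diff(power)

def swing_threshold(actual_power, percentile=90):
    """Threshold as the given percentile of the changes over the period."""
    return float(np.percentile(power_change(actual_power), percentile))

def detect_swings(power, threshold):
    """Sample indices t at which the change from t-1 to t exceeds the threshold."""
    return np.flatnonzero(power_change(power) > threshold) + 1

# One day of toy data: a steady ramp of 1 unit per step with a jump at sample 40.
actual = np.arange(96, dtype=float)
actual[40:] += 5.0
forecast = actual.copy()       # placeholder for the FL forecast

threshold = swing_threshold(actual)
predicted_swings = detect_swings(forecast, threshold)
print(threshold, predicted_swings.tolist())
```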

Fig. 6: Reduction in power swings in consecutive days

V Use-cases

In this section, we validate the proposed FL method for DER forecast using a use-case of a distribution feeder with 1000 nodes. The goal is to (i) predict over a 24-hour period (a) the power consumption $P(t)$ at time $t$ of all 1000 nodes, 15 minutes prior to $t$, i.e., at $t - 15$ min, and (b) the number of load swings of these nodes, also at $t$, and (ii) curtail these load swings at some of these nodes using the peak shaving procedure detailed in Fig. 3. GridLAB-D is used to simulate the distribution feeder with 1000 nodes, each of which was modeled to emulate a distribution feeder node. The GridLAB-D model was built as one distribution feeder with 1000 nodes, and the load shapes were synthetically generated. We accomplished this by starting from a single default load profile from GridLAB-D, and created the other 999 load profiles synthetically using a standard Gaussian distribution (zero mean) around this single load profile. Algorithm 1 was then employed to predict the power consumption of each of those nodes. The input vector $x$ was chosen to be a six-dimensional vector as in Section III, with $\tau_1$ = 15min, $\tau_2$ = 30min, $\tau_3$ = 60min, $\tau_4$ = 90min, $\tau_5$ = 120min, and $\tau_6$ = 1440min.
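The synthetic load generation can be imitated outside GridLAB-D with a few lines of NumPy. This is only a sketch: the base profile below is a made-up daily shape rather than the GridLAB-D default, and the standard deviation `sigma` is a free parameter of the example whose value here is an assumption.

```python
import numpy as np

def synthetic_profiles(base_profile, n_nodes, sigma, rng):
    """Return n_nodes profiles: the base profile plus n_nodes - 1 noisy copies."""
    base = np.asarray(base_profile, dtype=float)
    noise = rng.normal(0.0, sigma, size=(n_nodes - 1, base.size))  # zero-mean Gaussian
    return np.vstack([base, base + noise])

# Hypothetical one-day base profile sampled every 15 minutes.
t = np.arange(96)
base = 20.0 + 10.0 * np.sin(2 * np.pi * (t - 24) / 96)

rng = np.random.default_rng(0)
profiles = synthetic_profiles(base, n_nodes=1000, sigma=1.0, rng=rng)
print(profiles.shape)
```

Row 0 keeps the unperturbed base profile and rows 1 to 999 are the synthetic nodes; each row can then be turned into lagged input vectors with the six lags listed above.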

The results from the load prediction and the peak shaving procedure are presented below. Finally, an evaluation of this method is carried out using a realistic dataset from the Pecan Street Inc. Dataport [31].

V-A Prediction of load

We illustrate in Fig. 4 the prediction of the power consumption of a node over a 24-hour period. This prediction (shown in orange) is compared with the actual consumption (shown in blue). The prediction and data correspond to a node picked at random out of the 1000 nodes. The training occurred over several epochs of data collected every 15 minutes over a 30 day period. The results illustrate the accuracy of the FL approach in estimating the power consumption of the nodes. The prediction accuracy is observed to improve with larger amounts of training data, and is sensitive to the hyper-parameters. Weekend consumption is observed to differ significantly from weekday consumption, which is a characteristic of the chosen distribution feeder. An RMSE was calculated as in (6) and averaged over all 1000 nodes, and was found to be 1.642, validating the accuracy of the proposed FL method.

We illustrate the usefulness of the power consumption prediction using a second performance metric. For this purpose, we first plot the average number of load swings for the 1000 nodes over a 30 day period in Fig. 5, with a load swing as defined above in Section IV-C. The X-axis represents the time of day at which the load swings happen over a 24 hour period, and the Y-axis represents the average number of swings that occur at that time over a month. The orange curve represents the actual number of load swings determined from the GridLAB-D data, while the blue curve represents the predicted load swings from the FL procedure. It is observed from the figure that the FL procedure is able to predict the time of swings accurately, as evidenced by the peaks occurring at the same time in both curves, albeit with an under-prediction in the magnitude. More sophisticated procedures for defining load swings may help improve this prediction.

Fig. 7: Comparison of prediction and actual data from the Pecan Street dataset over a week-long period

V-B Reduction in power swings

We now utilize the load-swing prediction to carry out load curtailment, assuming that flexible loads compatible with Demand Response are present among the 1000 nodes. The threshold was set to 31 kW, based on the 90th percentile of all power consumption over the 1000 nodes over a 2-month period. When at , if it was observed that at a particular node, a direct load control signal was sent to curtail the power consumption at that node by enforcing a reduction of 31 kW, as indicated in Fig. 3. Figure 6 illustrates the result of such a peak-shaving algorithm over a 24-hour period. The orange bar represents the load swings without any load curtailment, while the blue bar represents the load swings after peak shaving. While the results indicate an overall reduction in the number of load swings on most days, they also show that curtailment based on the power prediction cannot prevent all swings. The negative ticks are also a result of an over-prediction of the power swings.
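The threshold selection and a simple form of curtailment can be sketched as follows. This is an illustration under stated assumptions, not the algorithm of Fig. 3: the percentile uses linear interpolation between ranks, and curtailment is modeled as capping a predicted sample at the threshold. The names `percentile` and `curtail` and all numbers are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0 to 100), linearly interpolated between ranks."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("values must be non-empty")
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def curtail(predicted, threshold):
    """Cap each predicted sample at the threshold (assumed curtailment rule).

    Returns the capped series and the amount of load shed per sample.
    """
    capped = [min(p, threshold) for p in predicted]
    shed = [p - c for p, c in zip(predicted, capped)]
    return capped, shed


# Toy consumption history pooled over all nodes (not data from the paper).
history = list(range(11))             # 0, 1, ..., 10
threshold = percentile(history, 90)   # 90th percentile of the pooled data
capped, shed = curtail([5, 12, 9], threshold)
print(threshold, capped, shed)
```

Pooling consumption across all nodes before taking the percentile gives a single feeder-wide threshold; a per-node threshold would be an alternative design.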

In deriving the grid performance shown in Fig. 6, we have assumed that a peak-shaving algorithm as in Fig. 3 can be implemented. This requires the loads to be not only flexible but also curtailable when demanded by the utility. Such direct load control may not always be possible and may require a transactive control architecture with suitable incentives or rebate programs [2].

V-C Pecan Street validation

The FL procedure is also tested on real-world data from Pecan Street, a project that provides data on household consumption and DER behavior from various locations such as Austin [31]. As in the earlier GridLAB-D exercise, the neural network is trained using data from a single building collected specifically over 30 weekdays, and the trained model is then used to predict the load for a random weekday in the next month. Fig. 7 shows the comparison between the actual load consumption and the prediction. The RMSE of the model's prediction is 1.98, once again demonstrating the accuracy obtainable using the proposed FL method. A similar procedure can be adopted for predicting power consumption over weekends by collecting training data over weekends rather than weekdays, over a period of time.

V-D Discussion

It is important to note the effect of selecting the right features on the accuracy of the trained neural network model [28, 10, 14]. The underlying data associated with the DER demonstrates a significant periodicity, which is evident in Fig. 4 and Fig. 6. It is important to pick appropriate features of the historical data to ensure that the model estimates the non-linearities properly. The authors of [28] observed that choosing similar days for training drastically improves the forecast accuracy. Previous studies have shown that filtering out weekends, public holidays, and other out-of-normal consumption patterns can also significantly improve prediction accuracy [14]. In the Pecan Street experiment in particular, we have focused only on weekdays and filtered out holidays from the data. When such filtering was not included, the prediction error was observed to increase significantly, to as much as 30%. In addition to historical data, other features can also be used to improve the prediction accuracy, such as weather, ambient temperature, wind speed, and price of electricity (especially with demand response).
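The weekday and holiday filtering described above can be sketched as follows. This is a minimal illustration; the function name `weekday_samples`, the holiday set, and the sample values are hypothetical, and the filtering actually applied to the Pecan Street data may differ.

```python
from datetime import date, datetime


def weekday_samples(samples, holidays=frozenset()):
    """Keep (timestamp, value) pairs on Monday to Friday, excluding holidays.

    `samples` is a list of (datetime, float) pairs; `holidays` is a set of
    datetime.date objects supplied by the caller.
    """
    return [
        (timestamp, value)
        for timestamp, value in samples
        if timestamp.weekday() < 5 and timestamp.date() not in holidays
    ]


# Toy readings (not values from the dataset).
samples = [
    (datetime(2021, 1, 1, 0, 0), 1.2),  # Friday, listed as a holiday
    (datetime(2021, 1, 2, 0, 0), 0.9),  # Saturday
    (datetime(2021, 1, 4, 0, 0), 1.5),  # Monday
]
training = weekday_samples(samples, holidays={date(2021, 1, 1)})
print(training)
```

Inverting the weekday test (`timestamp.weekday() >= 5`) would select weekend data for a separate weekend model.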

The FL approach has been observed to be somewhat less accurate than the traditional centralized ML approach [23, 13]. We have not carried out a comparison of the proposed FL approach with a general ML approach in this paper, as our objective was to demonstrate a DER prediction mechanism that ensures user privacy. Further research and experimentation are required to better explore the trade-offs between privacy and accuracy in an FL method.

VI Summary and Concluding Remarks

In this paper, we have considered the problem of DER prediction in a distribution grid at the consumer level using IoT nodes. Each IoT node is assumed to represent DERs such as renewable resources (e.g., solar PV), storage such as electric vehicles, and home energy managers consisting of an aggregated set of flexible loads such as HVAC and thermostatically controlled appliances. The problem of accurate DER prediction is of paramount importance in distribution grids, not only because distribution systems have limited sensing capability but also because they can directly impact grid-specific performance such as power balance. The challenge, however, is that typical approaches used for DER forecasting are data-centric and require consumers to share their consumption and generation data, which can compromise their privacy. The contribution of this paper is a distributed algorithm based on Federated Learning, which transmits a model of the consumption and generation patterns without revealing consumer data. We have described this privacy-preserving algorithm in detail and validated it using a simulation study of 1000 IoT nodes, leading to a forecast with an RMSE of 1.3. We have also demonstrated grid services such as prediction of load swings and load curtailment based on the forecast. Finally, we validated the proposed approach using real field data, with an RMSE of 1.98. Future research will involve a more detailed incorporation of grid physics such as optimal power flow, DER constraints such as adjustable loads, communication efficiencies of FL, and market structures with transactive energy and related incentives.


  • [1] A. Almalaq and J. J. Zhang (2018) Evolutionary deep learning-based energy consumption prediction for buildings. IEEE Access 7, pp. 1520–1531. Cited by: §I.
  • [2] A. Annaswamy and T. Nudell (2017) Transactive control–what’s in a name. Smart Grid Newsletter. Cited by: §V-B.
  • [3] M. Antonakakis, T. April, M. Bailey, M. Bernhard, E. Bursztein, J. Cochran, Z. Durumeric, J. A. Halderman, L. Invernizzi, M. Kallitsis, et al. (2017) Understanding the mirai botnet. In 26th USENIX security symposium (USENIX Security 17), pp. 1093–1110. Cited by: §I.
  • [4] H. Arasteh, V. Hosseinnezhad, V. Loia, A. Tommasetti, O. Troisi, M. Shafie-khah, and P. Siano (2016) IoT-based smart cities: a survey. In 2016 IEEE 16th International Conference on Environment and Electrical Engineering (EEEIC), pp. 1–6. Cited by: §II-C.
  • [5] G. Bedi, G. K. Venayagamoorthy, R. Singh, R. R. Brooks, and K. Wang (2018) Review of internet of things (IoT) in electric power and energy systems. IEEE Internet of Things Journal 5 (2), pp. 847–870. Cited by: §II-C.
  • [6] L. Bottou (2012) Stochastic gradient descent tricks. In Neural networks: Tricks of the trade, pp. 421–436. Cited by: §IV-B.
  • [7] S. Bouktif, A. Fiaz, A. Ouni, and M. A. Serhani (2018) Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: comparison with machine learning approaches. Energies 11 (7), pp. 1636. Cited by: §I, §III.
  • [8] D. P. Chassin, K. Schneider, and C. Gerkensmeyer (2008) GridLAB-D: an open-source power systems modeling and simulation environment. In 2008 IEEE/PES Transmission and Distribution Conference and Exposition. Cited by: §IV-A.
  • [9] P. Denholm, M. O’Connell, G. Brinkman, and J. Jorgenson (2015) Overgeneration from solar energy in california. a field guide to the duck chart. Technical report National Renewable Energy Lab.(NREL), Golden, CO (United States). Cited by: §IV-C.
  • [10] G. M. U. Din and A. K. Marnerides (2017) Short term power load forecasting using deep neural networks. In 2017 International Conference on Computing, Networking and Communications (ICNC), pp. 594–598. Cited by: §V-D.
  • [11] Gartner Inc. (2019) Gartner says 5.8 billion enterprise and automotive iot endpoints will be in use in 2020; available at Cited by: §I.
  • [12] J. Gubbi, R. Buyya, S. Marusic, and M. Palaniswami (2013) Internet of things (iot): a vision, architectural elements, and future directions. Future generation computer systems 29 (7), pp. 1645–1660. Cited by: §II-C.
  • [13] Y. Guo, D. Wang, A. Vishwanath, C. Xu, and Q. Li (2020) Towards federated learning for HVAC analytics: a measurement study. In Proceedings of the Eleventh ACM International Conference on Future Energy Systems, pp. 68–73. Cited by: §I, §II-B, §IV-B, §V-D.
  • [14] W. He (2017) Load forecasting via deep neural networks. Procedia Computer Science 122, pp. 308–314. Cited by: §V-D.
  • [15] T. Hong and S. Fan (2016) Probabilistic electric load forecasting: a tutorial review. International Journal of Forecasting 32 (3), pp. 914–938. Cited by: §II-A.
  • [16] Y. Hong, Y. Zhou, Q. Li, W. Xu, and X. Zheng (2020) A deep learning method for short-term residential load forecasting in smart grid. IEEE Access 8, pp. 55785–55797. Cited by: §II-A.
  • [17] Y. Huang (2009) Advances in artificial neural networks–methodological development and application. Algorithms 2 (3), pp. 973–1007. Cited by: §I.
  • [18] Juniper Research (2020) IoT-the internet of transformation 2020; available at Cited by: §I.
  • [19] Z. A. Khan and D. Jayaweera (2019) Smart meter data based load forecasting and demand side management in distribution networks with embedded pv systems. IEEE Access 8, pp. 2631–2644. Cited by: §II-A.
  • [20] P. Kobylinski, M. Wierzbowski, and K. Piotrowski (2020) High-resolution net load forecasting for micro-neighbourhoods with high penetration of renewable energy sources. International Journal of Electrical Power & Energy Systems 117, pp. 105635. Cited by: §II-A.
  • [21] W. Kong, Z. Y. Dong, D. J. Hill, F. Luo, and Y. Xu (2018) Short-term residential load forecasting based on resident behaviour learning. IEEE Transactions on Power Systems 33 (1), pp. 1087–1088. Cited by: §II-A.
  • [22] W. Kong, Z. Y. Dong, Y. Jia, D. J. Hill, Y. Xu, and Y. Zhang (2019) Short-term residential load forecasting based on LSTM recurrent neural network. IEEE Transactions on Smart Grid 10 (1), pp. 841–851. Cited by: §II-A, §III.
  • [23] T. Li, A. K. Sahu, A. Talwalkar, and V. Smith (2020) Federated learning: challenges, methods, and future directions. IEEE Signal Processing Magazine 37 (3), pp. 50–60. Cited by: §I, §II-B, §V-D.
  • [24] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y. Liang, Q. Yang, D. Niyato, and C. Miao (2020) Federated learning in mobile edge networks: a comprehensive survey. IEEE Communications Surveys & Tutorials 22 (3), pp. 2031–2063. Cited by: §I, §II-B.
  • [25] Y. Meng, W. Zhang, H. Zhu, and X. S. Shen (2018) Securing consumer IoT in the smart home: architecture, challenges, and countermeasures. IEEE Wireless Communications 25 (6), pp. 53–59. Cited by: §II-C.
  • [26] E. Mocanu, P. H. Nguyen, M. Gibescu, and W. L. Kling (2016) Deep learning for estimating building energy consumption. Sustainable Energy, Grids and Networks 6, pp. 91–99. Cited by: §I.
  • [27] M. S. Mohd Hafizi, N. A. Mat Leh, N. A. Kamarzaman, and N. H. Ishak (2018) Developing a monitoring system for tripping fault detection via IoT. In 2018 9th IEEE Control and System Graduate Research Colloquium (ICSGRC), pp. 110–115. Cited by: §II-C.
  • [28] Q. Mu, Y. Wu, X. Pan, L. Huang, and X. Li (2010) Short-term load forecasting using improved similar days method. In 2010 Asia-Pacific Power and Energy Engineering Conference, pp. 1–4. Cited by: §V-D.
  • [29] National Academies of Sciences Engineering and Medicine (2021) The future of electric power in the United States; available at The National Academies Press. Cited by: §I.
  • [30] E. Parliament and C. of the European Union (2020) General data protection regulation 2016/679; available at Cited by: §I.
  • [31] Pecan Street Inc. (2020) Pecan street inc. dataport; available at Cited by: §I, §V-C, §V.
  • [32] H. Shi, M. Xu, and R. Li (2017) Deep learning for household load forecasting—a novel pooling deep rnn. IEEE Transactions on Smart Grid 9 (5), pp. 5271–5280. Cited by: §I.
  • [33] W. Shi, J. Cao, Q. Zhang, Y. Li, and L. Xu (2016) Edge computing: vision and challenges. IEEE internet of things journal 3 (5), pp. 637–646. Cited by: §I, §I.
  • [34] D. Smith, P. Wang, M. Ding, J. Chan, B. Spak, X. Guan, P. Tyler, T. Rakotoarivelo, Z. Lin, and T. Abbasi (2020) Privacy-preserved optimal energy trading, statistics, and forecasting for a neighborhood area network. Computer 53 (5), pp. 25–34. Cited by: §II-A.
  • [35] B. Stephen, X. Tang, P. R. Harvey, S. Galloway, and K. I. Jennett (2015) Incorporating practice theory in sub-profile models for short term aggregated residential load forecasting. IEEE Transactions on Smart Grid 8 (4), pp. 1591–1598. Cited by: §I, §II-A.
  • [36] A. Taïk and S. Cherkaoui (2020) Electrical load forecasting using edge computing and federated learning. In ICC 2020 - 2020 IEEE International Conference on Communications (ICC), pp. 1–6. Cited by: §II-B.
  • [37] O. A. Wahab, A. Mourad, H. Otrok, and T. Taleb (2021) Federated machine learning: survey, multi-level classification, desirable criteria and future directions in communication and networking systems. IEEE Communications Surveys & Tutorials 23 (2), pp. 1342–1397. Cited by: §I.
  • [38] L. Wen, K. Zhou, and S. Yang (2020) Load demand forecasting of residential buildings using a deep learning model. Electric Power Systems Research 179, pp. 106073. Cited by: §II-A.
  • [39] A. K. Yadav and S. Chandel (2014) Solar radiation prediction using artificial neural network techniques: a review. Renewable and sustainable energy reviews 33, pp. 772–781. Cited by: §I, §III.
  • [40] Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2), pp. 1–19. Cited by: §I, §II-B, §III.
  • [41] M. Yun and B. Yuxin (2010) Research on the architecture and key technology of internet of things (iot) applied on smart grid. In 2010 International Conference on Advances in Energy Engineering, pp. 69–72. Cited by: §II-C.