Deep Reinforcement Learning-Assisted Federated Learning for Robust Short-term Utility Demand Forecasting in Electricity Wholesale Markets

Short-term load forecasting (STLF) plays a significant role in the operation of electricity trading markets. Given growing concerns over data privacy, federated learning (FL) has been increasingly adopted in recent research to train STLF models for utility companies (UCs). In wholesale markets, as it is not realistic for power plants (PPs) to access UCs' data directly, FL is a feasible way for PPs to obtain an accurate STLF model. However, due to FL's distributed nature and the intense competition among UCs, defects increasingly occur and degrade the STLF model's performance, so simply adopting FL is not enough. In this paper, we propose a DRL-assisted FL approach, DEfect-AwaRe federated soft actor-critic (DearFSAC), to robustly train an accurate STLF model for PPs to forecast short-term utility electricity demand precisely. First, we design an STLF model based on long short-term memory (LSTM) using only historical load data and time data. Then, considering the uncertainty of defect occurrence, a deep reinforcement learning (DRL) algorithm is adopted to assist FL by alleviating the model degradation caused by defects. In addition, for faster convergence of FL training, an auto-encoder is designed for both dimension reduction and quality evaluation of the uploaded models. In simulations, we validate our approach on real data of Helsinki's UCs in 2019. The results show that DearFSAC outperforms all other approaches whether or not defects occur.



I Introduction

In recent years, many countries and regions have gradually opened up their electricity trading markets, in which utility companies (UCs) purchase electricity from power plants (PPs) in a wholesale market and then sell it to consumers in a retail market. As the number of UCs grows, Texas has come to hold the largest share of the electricity trading market in the US; by November , more than UCs had been operating in the Electric Reliability Council of Texas [26]. As power supply and demand strongly influence energy transactions, market shares, and profits in competitive electricity markets, a precise load forecast, especially short-term load forecasting (STLF), is essential for electricity price estimation.


In the smart grid, with the widespread deployment of advanced metering infrastructure (AMI) in residential, commercial, industrial, and other buildings, STLF approaches for electricity have been actively studied. The work of [13] adopts long short-term memory (LSTM) networks to extract temporal features from the electricity consumption data of residential buildings, which has become a popular approach in STLF. Building on [13], research on STLF using deep learning (DL) has flourished. As UCs can collect abundant consumer profiles, such as historical load data, household characteristics, and behavior patterns, through AMIs, STLF for residential buildings and communities is well developed [20, 18, 2, 12, 16]. In addition, there are DL-based STLF approaches for industrial load data [5, 3, 32], commercial load data [7], hospital load data [35], and so on.

To maintain the stability of electricity trading markets, STLF on UCs' demand is also necessary for PPs. However, as pricing competition in Texas wholesale markets is intense [1], it is not realistic for PPs to obtain UCs' data. Besides, PPs have no AMIs or other devices with which to access consumers' profiles. Therefore, an approach for obtaining an accurate STLF model for PPs without intruding on UCs' data privacy is strongly needed.

Considering the growing concern over data privacy, federated learning (FL) [14], which aims to provide general solutions while ensuring data privacy and security, has been adopted in most recent research on STLF [28, 24], indicating the feasibility of adopting FL between one PP and multiple UCs to help the PP obtain an accurate STLF model.

Nevertheless, as mentioned above, competition in wholesale markets is intense, implying that malicious UCs inevitably exist. Due to FL's distributed nature, adverse conditions increasingly occur when training FL-based STLF models for PPs; in this paper, we call all such conditions defects. As stated in [23], malicious hackers or UCs can conduct various attacks, such as data poisoning attacks, on training data or trained models. Besides, uneven communication quality can introduce errors while uploading local models or downloading the global model. Thus, a robust design is required rather than simply adopting FL.

To tackle this challenge, deep reinforcement learning (DRL) [27] is adopted. In DRL, an agent is trained by interacting with the environment and has a strong capability for solving real-time decision tasks under significant uncertainty. As DRL becomes increasingly prevalent in solving problems that can be modeled as Markov decision processes (MDPs) [22], many studies adopting DRL in the smart grid have emerged [9, 21, 8]. On the other hand, the work of [31] combines FL and DRL by selecting a certain number of clients through a DRL model for image classification tasks. Motivated by the above works, we adopt a DRL algorithm to output optimal weights for uploaded models, so that the PP can aggregate model parameters into a global model that improves in each round despite the occurrence of defects.

To the best of our knowledge, only a few works adopt FL for probabilistic solar load forecasting [34, 15]. Considering the mature development and wide deployment of deterministic load forecasting (DLF) for demand, we construct our STLF model based on DLF in this paper.

The main contributions are summarized as follows:

  • An approach for obtaining an STLF model for PPs without intruding on data privacy: To the best of our knowledge, no prior work adopts FL to obtain an STLF model for PPs. In this paper, FL is adopted to aggregate UCs' STLF models into a highly accurate global model for PPs. In return, UCs can download the global model for better local STLF. Above all, the data privacy of UCs in wholesale markets is protected through FL.

  • A DRL-based design for robustness against defects: Considering defects occurring in wholesale markets, uploaded models of differing quality may harm the performance of the global model during model aggregation on the PP server. To alleviate the model degradation caused by defects, a DRL algorithm, soft actor-critic (SAC), is adopted to assign optimal weights to uploaded models, guaranteeing efficient model aggregation and making the FL process significantly more robust.

  • Model dimension reduction and quality evaluation: Since high-dimensional model parameters are uploaded to the server and the assigned weights are continuous, relying on DRL alone demands massive time and computation to converge. Thus, inspired by techniques for dealing with defects in FL [29, 33], an auto-encoder, called the quality evaluation embedding network (QEEN), is placed before the DRL model to reduce the dimension of uploaded models and evaluate their quality, thereby accelerating DRL training.

To sum up, a DRL-assisted FL approach, named DEfect-AwaRe federated soft actor-critic (DearFSAC), is proposed to robustly aggregate UCs' local models into an accurate STLF model for PPs.

The remaining parts of this paper proceed as follows. In Section II, we formulate the communication between one PP and UCs under an FL paradigm. DearFSAC is elaborated in Section III. In Section IV, we evaluate our approach through simulations. Finally, we conclude our work in Section V.

II System Model

II-A FL between one PP and UCs

Assume there are UCs and one PP. The UCs act as clients who upload their models to the server, i.e., the PP. First, the clients upload their STLF models; the server then aggregates the uploaded local models into a global model and allows clients to download it for further local training. These two steps form a loop, as illustrated in Fig. 1.
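The upload-aggregate-download loop can be sketched as follows; models are flat parameter lists for brevity, and `local_train` is a hypothetical stand-in for a UC's local LSTM update, not the paper's actual training step.

```python
# Minimal sketch of the FL loop between one PP (server) and several UCs
# (clients). Models are flat lists of floats; local_train is a
# hypothetical placeholder for a UC's local update.

def local_train(global_model, local_data):
    # Placeholder update: nudge each parameter toward the client's data mean.
    target = sum(local_data) / len(local_data)
    return [p + 0.1 * (target - p) for p in global_model]

def aggregate(local_models, weights):
    # Weighted element-wise combination of uploaded parameter vectors.
    dim = len(local_models[0])
    return [sum(w * m[i] for w, m in zip(weights, local_models))
            for i in range(dim)]

def fl_round(global_model, client_datasets):
    uploaded = [local_train(global_model, d) for d in client_datasets]
    weights = [1.0 / len(uploaded)] * len(uploaded)  # FedAvg-style weights
    return aggregate(uploaded, weights)

datasets = [[1.0, 2.0], [3.0, 4.0]]   # two toy clients
model = [0.0, 0.0]
for _ in range(3):                    # three communication rounds
    model = fl_round(model, datasets)
```

DearFSAC later replaces the uniform weights with DRL-assigned ones; the loop structure itself is unchanged.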

However, during the FL process, various defects may occur and decrease the accuracy of the PP’s STLF model. Briefly, in this paper, we consider the following defects:

  • Data integrity attacks (DIAs): As the quality of input data affects forecast accuracy directly, DIAs, in which hackers access supposedly protected data and inject false information, are harmful to load forecasting and hard to detect directly [17]. In FL, DIAs weaken both local models and the global model.

  • Communication noises: Due to the frequent transmission of model parameters between UCs and the PP [23], the uploaded parameters can be inaccurate, which randomly weakens the performance of model aggregation.

Two issues should be addressed in this FL framework:

  • Since UCs' data privacy is a concern in wholesale markets, the PP's STLF model should be designed to capture hidden temporal features using only historical load data and time data.

  • Since various defects occur in the FL process, a robust model aggregation approach is needed to obtain a global model with high STLF accuracy.

Fig. 1: The FL framework of model aggregation for one PP using UCs’ models with defects occurring.
Notation Meaning

The weight vector

The th UC
The electricity consumption dataset owned by
The loss of a model evaluated on
The objective of STLF adopting FL with defects occurring
Total number of UCs in wholesale market
The proportion of selecting uploaded models in each round
The proportion of defective models among all clients
The time period of
Local model parameters of
Global model parameters of the server
Defective local model
Features of
True electricity consumption of
Predicted electricity demand of
TABLE I: Summary of Main Notations

II-B Formulation

The main notations of this work are summarized in Table I. The clients and corresponding raw datasets are defined in Definition 2.1:

Definition 2.1: Let be the set of clients of an FL process, and 's dataset is , where is the historical electricity consumption and time data, and is a vector containing the true electricity demand. For each client , represents a vector containing the predicted demand of . To conduct model aggregation, the STLF models of all clients share the same structure. Then, the parameters of and the server are represented by parameter vectors and , respectively, where is the total number of one model's parameters. Thus, as is downloaded by all clients and is input without any preprocessing, can be computed as:


Based on Definition 2.1, to minimize the averaged loss of testing on , the objective of FL can be formulated into an empirical risk minimization (ERM) problem as follows:


where is downloaded by clients, and represents the loss of the model trained on , formulated as:


where is a subset of .

For the server, the objective is to find the optimal global model parameters:


For better performance, at each round, models among all uploaded models are randomly selected as a subset to participate in model aggregation, where and is a proportion within [31]. Assume we have a weight vector outputted by the DRL model. The PP multiplies and all selected model parameters to get the global model:
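This selection-and-weighting step can be sketched as follows; model parameters are stored as name-to-value dicts, and the selection fraction and weight handling are illustrative assumptions rather than the paper's exact implementation.

```python
import random

def select_and_aggregate(uploaded, weights=None, frac=0.5, seed=0):
    """Randomly select a fraction of the uploaded models, then combine
    them with the given weight vector (uniform by default)."""
    rng = random.Random(seed)
    k = max(1, int(frac * len(uploaded)))
    chosen = rng.sample(uploaded, k)
    if weights is None:
        weights = [1.0 / k] * k
    assert abs(sum(weights) - 1.0) < 1e-9  # weights must sum to one
    agg = {}
    for name in chosen[0]:  # iterate over parameter names
        agg[name] = sum(w * m[name] for w, m in zip(weights, chosen))
    return agg

# Four toy uploaded models, each with one named parameter "w".
models = [{"w": float(i)} for i in range(4)]
g = select_and_aggregate(models, frac=0.5)
```

In DearFSAC, the `weights` argument would come from the DRL model instead of being uniform.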


Next, considering that defects occur in the FL process or the smart grid, we define defective models in Definition 2.2:

Definition 2.2: Assume there is a subset containing models among all uploaded models that are affected by defects, where and is a proportion within . If is defective, it is denoted as . In each FL communication round, we assume there is a subset of defective models among the selected models, denoted as .

Fig. 2: The overall architecture of DearFSAC, in which the LSTM forecasting model is introduced in Subsection III-B, QEEN is introduced in Subsection III-C, and the DRL model is introduced in Subsection III-D.

The framework mentioned above is illustrated in Fig. 1. As is much higher than and it is uncertain which models are defective, fixed weights, such as averaged ones, lead to poor model aggregation. Therefore, an optimal weight assignment approach is required. The objective of FL model aggregation with defects occurring can be formulated as:


where the th model is a defective model if it is in . Then an optimal weight vector needs to be found:


III Proposed DEfect-AwaRe Federated Soft Actor-critic Approach

III-A Overall Architecture

A novel FL-based framework, DearFSAC, is proposed to accurately forecast the total electricity demand of UCs so that the PP can generate the appropriate amount of electricity even when defects occur. The overall architecture of DearFSAC is shown in Fig. 2. The approach consists of three main modules: 1) the STLF model based on LSTM; 2) the quality evaluation embedding network (QEEN); and 3) the DRL model based on soft actor-critic (SAC). As the PP has only historical load data and time data, the STLF model should be capable of capturing hidden temporal features. Furthermore, during the FL process, considering the varying quality of uploaded models and the various defects, the SAC-based DRL model assigns optimal weights to uploaded models for efficient aggregation. Besides, feeding raw model parameters into the DRL model would lead to the curse of dimensionality and very slow convergence; QEEN is therefore designed to reduce the dimension of uploaded model parameters and evaluate their quality, providing more effective information for faster convergence of the DRL model.

III-B STLF model based on LSTM

Based on the work in [13], we adopt the LSTM [11] network as the structure of the STLF model. To make the LSTM model work, the inputs need to be time series. Therefore, we create sliding windows from the data, where and are the length in days of one sliding window and the interval between sliding windows' starting points, respectively. In our work, each daily consumption profile consists of hourly intervals, which determines the width of all feature vectors. After data cleaning and feature selection, input features containing temporal information are generated, which include:

  • The sequence of electricity consumption for the past time intervals .

  • The incremental sequence of the hourly time indices for the past time steps .

  • The corresponding day of a week is within , which is simply mapped to and using trigonometric functions.

  • The binary holiday marks of corresponding dates are within .

To ensure the feature dimensions are consistent, we extend , , and to 24-dimensional vectors , , and , respectively. After we normalize and to and , respectively, where elements in and are both within , the input of the LSTM model can be constructed into a matrix of the five transposed vectors:


As , , , , and are all one-dimensional vectors, the dimension of a sliding window is . After the above preprocessing, the training input of 's forecasting model is a set of sliding windows, represented by , in which is the first sliding window computed from the th time step. To update the LSTM model parameters, mean square error (MSE) is adopted as the loss function. Then, at FL round , the parameters are updated as follows:


where are the updated model parameters, is the training loss, and is the learning rate of local training.

When using this model, at time , the input is a sliding window counting back time steps from , and the output of is the hourly electricity demand at time , where is the time index in the dataset .
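The sliding-window preprocessing described above can be sketched as follows, using hourly profiles, a next-hour forecast target, and the trigonometric day-of-week mapping; the window length, stride, and feature names are illustrative assumptions.

```python
import math

HOURS = 24  # hourly intervals per daily consumption profile

def day_of_week_features(day_index):
    # Map day-of-week (0..6) onto the unit circle via sin/cos so that
    # Sunday and Monday stay close in feature space.
    angle = 2.0 * math.pi * (day_index % 7) / 7.0
    return math.sin(angle), math.cos(angle)

def make_windows(loads, hours, window_days=1, stride=1):
    """Slide a window of window_days * HOURS steps over the series with
    the given stride (in steps); each window pairs the feature block
    with the next-hour load as the forecast target."""
    width = window_days * HOURS
    windows = []
    for start in range(0, len(loads) - width, stride):
        end = start + width
        feats = {
            "load": loads[start:end],   # past consumption sequence
            "hour": hours[start:end],   # incremental hourly time indices
        }
        windows.append((feats, loads[end]))  # target: demand one step ahead
    return windows

loads = [float(h % 24) for h in range(72)]   # three synthetic days
hours = [h % 24 for h in range(72)]
ws = make_windows(loads, hours)
```

Normalization of the load and time features, and extension of the calendar marks to 24-dimensional vectors, would follow the same per-window layout.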

III-C Dimension Reduction and Quality Evaluation via QEEN

For quality evaluation and dimension reduction, QEEN, an auto-encoder, is introduced before the DRL model. For training efficiency of QEEN, we upload all local model parameters to the server; in other words, we set . We then design a loss for the embedding of and a loss for quality prediction.

Firstly, we define the embedding vectors in Definition 3.1:

Definition 3.1: We feed each into an encoder composed of two FC layers to get the embedding vector of the th model. After obtaining all embedding vectors, we feed their concatenation into the decoder to produce a decoded representation that approximates .

For faster training, we design the decoder as parallel FC layers , where is the th parallel FC layer corresponding to the th layer of the original model structure [36], and is the number of layers of the original model. Next, for the th model, the embedding vector is fed into the th parallel layer to obtain the decoded parameters of that layer. By concatenating the outputs layer by layer, the entire decoded model parameters are obtained.
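A minimal PyTorch sketch of this architecture, assuming illustrative layer sizes (a 32-dimensional embedding and 128-/64-unit hidden layers) that the paper does not specify; the joint loss uses the (0.5, 0.5) weights from Table III, with dummy defect marks standing in for real ones.

```python
import torch
import torch.nn as nn

class QEEN(nn.Module):
    """Sketch of the quality evaluation embedding network: a two-layer
    FC encoder, per-layer parallel FC decoders, and a two-layer FC
    quality head. All sizes are illustrative assumptions."""

    def __init__(self, layer_dims, embed_dim=32):
        super().__init__()
        total = sum(layer_dims)
        self.encoder = nn.Sequential(
            nn.Linear(total, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )
        # One parallel decoder head per layer of the original model.
        self.decoders = nn.ModuleList(
            nn.Linear(embed_dim, d) for d in layer_dims
        )
        self.quality = nn.Sequential(
            nn.Linear(embed_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, flat_params):
        z = self.encoder(flat_params)             # embedding vectors
        recon = torch.cat([dec(z) for dec in self.decoders], dim=-1)
        q = self.quality(z).squeeze(-1)           # predicted quality marks
        return z, recon, q

layer_dims = [40, 10]                  # e.g. two layers of a tiny local model
net = QEEN(layer_dims)
params = torch.randn(5, sum(layer_dims))          # 5 uploaded models
z, recon, q = net(params)

# Joint loss: weighted sum of reconstruction MSE and quality-prediction
# MSE, with weights (0.5, 0.5) as in Table III; marks here are dummies.
marks = torch.rand(5)
loss = 0.5 * nn.functional.mse_loss(recon, params) \
     + 0.5 * nn.functional.mse_loss(q, marks)
```

Backpropagating `loss` then updates the encoder, decoders, and quality head jointly, matching the joint gradient descent described below.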

As multiple defects have different impacts on local models, we define defect marks and quality evaluation marks in Definition 3.2:

Definition 3.2: To train the QEEN for quality evaluation, we set the defect marks as the ground truth, where represents the severity of defects in . The defect mark is computed based on the accuracy of over a fixed validation dataset on the server:


where refers to the accuracy of testing on the dataset . Then, we compute the quality evaluation marks as:


where is the quality prediction of the , and is the quality evaluation module composed of two FC layers.

Next, we compare with . After getting , we use the MSE loss function to compute and :


where is the QEEN parameter, is the number of uploaded model parameters, and is the th parameter of the th model.

Finally, we set different weights and for and , respectively, to update the QEEN parameter using joint gradient descent.

III-D Optimal Weight Assignment via DRL

III-D1 MDP Modeling

The process of the FL-based STLF can be modeled as an MDP. Assume the PP has a target mean absolute percentage error (MAPE), which is usually close to . At each round, UCs are randomly selected to conduct local training on their own datasets and upload model parameters to the PP. After receiving the uploaded information as , the DRL model outputs , which is composed of the weights assigned to all uploaded models. The details and explanations of , , and of the MDP are defined as follows:

State : At round , the state is denoted as a vector , where denotes the embedding vector of model parameters of , denotes the embedding vector of the server’s model parameters, denotes the local training loss of , and denotes the action at the previous round.

Action : The action, denoted as , is a weight vector calculated by the DRL model for a randomly selected subset of models at round . All weights in are within and satisfy the constraint . After obtaining the weight vector, the server aggregates the local model parameters into the global model parameters as follows:
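One plausible way to satisfy the simplex constraint on the action (weights in [0, 1] summing to one) is to squash each Gaussian sample and renormalize; this post-processing is an assumption, as the paper does not spell out its squashing step.

```python
import math
import random

def sample_weights(means, stds, seed=0):
    """Sample a raw action from per-model Gaussians, squash each entry
    into (0, 1) with a sigmoid, then renormalize so the weights lie on
    the probability simplex. One plausible way to meet the constraint."""
    rng = random.Random(seed)
    raw = [rng.gauss(m, s) for m, s in zip(means, stds)]
    squashed = [1.0 / (1.0 + math.exp(-x)) for x in raw]  # sigmoid
    total = sum(squashed)
    return [x / total for x in squashed]

# Hypothetical actor outputs for three uploaded models.
w = sample_weights([0.0, 0.5, -0.5], [0.1, 0.1, 0.1])
```

The resulting vector can be used directly as the aggregation weights in the equation above.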


Reward : At round , the current reward guides the agent to maximize the cumulative return , i.e., the goal of DRL. We design a compound reward by combining two sub-rewards with appropriate weights and a discount factor , formulated as:


In Eq. (18), aims to minimize the global model's MAPE. The exponential term represents the MAPE gap, where , usually within , is the global model's MAPE on the held-out validation dataset at round . To mitigate the slow convergence caused by the diminishing marginal effect [31], we use , a positive constant, as the base to ensure exponential growth of . Assume is ; thus is within , and the term is within . Since the more rounds the agent takes, the less cumulative reward it obtains, the agent must be punished for finishing training in more rounds. Therefore, the second term serves as a per-round time penalty that keeps within for faster convergence.

Eq. (19) aims to provide auxiliary information for the agent to reduce exploration time. After obtaining the quality prediction mark from QEEN, we normalize it to calculate the MSE loss between and , where . Similar to Eq. (18), Eq. (19) is set to be negative as a time penalty.
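A hedged reconstruction of the compound reward, with the base 64 and the (0.5, 0.5) sub-reward weights taken from Table III; the exact functional form of the two sub-rewards is an assumption based on the description above.

```python
def reward(mape, target_mape, action, quality_marks,
           base=64.0, time_penalty=1.0, lambdas=(0.5, 0.5)):
    """Hedged reconstruction of the compound reward: r1 combines an
    exponential MAPE-gap term with a per-round time penalty; r2 is the
    negative MSE between the action and normalized quality marks."""
    # r1: the exponential base counters the diminishing marginal effect;
    # the constant punishes the agent for taking extra rounds.
    r1 = -(base ** (mape - target_mape)) - time_penalty
    # r2: auxiliary QEEN signal; normalize marks to sum to one so they
    # are comparable with the weight vector.
    total = sum(quality_marks)
    norm = [q / total for q in quality_marks]
    mse = sum((a - q) ** 2 for a, q in zip(action, norm)) / len(action)
    r2 = -mse
    l1, l2 = lambdas
    return l1 * r1 + l2 * r2

# A round with 5% MAPE against a 2% target and two equal-quality models.
r = reward(0.05, 0.02, [0.5, 0.5], [1.0, 1.0])
```

Both sub-rewards stay non-positive, so the agent maximizes return by closing the MAPE gap and matching the weights to the quality marks in as few rounds as possible.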

III-D2 Soft Actor-Critic

We adopt SAC [10] to solve the MDP. At the end of each round , the tuple , which is denoted as , is recorded in the replay buffer .

In SAC, the action is sampled from a Gaussian distribution , whose mean and standard deviation are output by the actor network of the DRL model:


For each iteration, SAC samples a batch from and updates the DRL network parameters. To deal with poor sampling efficiency and data imbalance in DRL [27], we adopt two replay-buffer techniques, emphasizing recent experience (ERE) [30] and prioritized experience replay (PER) [25], to sample data with priority and emphasis.

The update procedure of SAC is shown in Fig. 2, where is the action of next time step, i.e. the next FL round, and is the action of last time step, i.e. the last FL round.

III-E Workflow of DearFSAC

The workflow of the DearFSAC approach is summarized in Algorithm 1 and described as follows.

  • At the first communication round, the global model parameters are initialized on the PP, denoted as (line ).

  • All UCs send download requests to the PP and download the latest global STLF model as their local STLF model. After local training, all UCs obtain the corresponding updated models and training losses. During local training, defects may occur in datasets and model parameters (lines ).

  • The PP selects UCs to upload models and training losses. After obtaining embedding vectors by feeding model parameters into QEEN, the PP concatenates the embedding vectors, training losses, and the action of the last round to form the state. By feeding the state into the DRL model, the mean and variance of the Gaussian distribution [10] are obtained to sample the current action . Finally, by weighting the selected models with , the PP obtains the global model and sends it to the selected UCs. The whole process loops until convergence (line ).

1:  Input: Initial global model ; the set of UCs ; the number of selection ; the learning rate of local training ; initial action ; data owned by each UCs , where ; empty prioritized experience replay buffer .
2:  Output: The global STLF model .
3:  Initialize and for the PP;
4:  for each FL iteration do
5:         UCs execute:
6:         for each local model owned by do
7:                , if then ;
8:                ;
9:                ;
10:                ;
11:                If selected, upload and to the PP;
12:         end for
14:         PP executes:
15:         Select UCs to upload;
16:         Reset , , and to record , , and respectively;
17:         ;
18:         for each selected UC do
19:               ;
20:               ;
21:         end for
22:         , if then ;
23:         , where ;
24:  end for
Algorithm 1 Workflow of the DearFSAC Approach

IV Simulations & Analysis

IV-A Simulation Setup

IV-A1 Datasets

We conduct simulations using data from the Nuuka open API [4], which contains basic information and energy data of Helsinki's utility and service properties. As mentioned in Section II, the aim of this work is to obtain an accurate STLF model under the FL framework with defects occurring. To alleviate the effect of heterogeneous load characteristics, we cluster the UCs using K-means. After data cleansing and clustering, data of UCs spanning years from st March to st March are used in this work. Each UC is set as a client, and all clients have hourly-resolution data. Considering seasonal factors, we further split the dataset into four seasons. The data before the last week of each season is used for training, and the last week of each season is set as the test dataset.
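The hold-out-the-last-week split can be sketched as follows on a synthetic hourly series; the dates and series length are illustrative, not the dataset's actual span.

```python
import datetime

def split_last_week(timestamps, values):
    """Hold out the final 7 days of a season's hourly series as the
    test set; everything before the cutoff is training data."""
    cutoff = timestamps[-1] - datetime.timedelta(days=7)
    train = [(t, v) for t, v in zip(timestamps, values) if t <= cutoff]
    test = [(t, v) for t, v in zip(timestamps, values) if t > cutoff]
    return train, test

# A synthetic 30-day hourly series starting 1 March 2019.
start = datetime.datetime(2019, 3, 1)
ts = [start + datetime.timedelta(hours=h) for h in range(24 * 30)]
vals = [float(h) for h in range(len(ts))]
train, test = split_last_week(ts, vals)
```

Applying this per season yields the four seasonal train/test pairs described above.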

IV-A2 Defect Types

As mentioned above, the proportion of defective models is . Then we design scenarios in Table II based on types of defects as follows:

  • DIAs: Before making sliding windows, we simulate normally distributed DIAs on the training dataset by randomly selecting of all data points and altering their loads by multiplying by , where is sampled from a Gaussian distribution with mean and standard deviation [17].

  • Communication noises: We apply the signal-to-noise ratio (SNR) to model the noises, and the modified parameters are calculated as follows:


    where indicates the level of noise.

  • Mixed defect: Considering more severe conditions, we design the mixed defect by adding both DIAs and communication noises to the same client.
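The two basic defect types can be simulated along the lines below; the attack fraction, Gaussian parameters, and SNR value are illustrative placeholders for the numbers elided above, and the mixed defect simply applies both functions to the same client.

```python
import math
import random

def apply_dia(loads, frac=0.1, mu=1.0, sigma=0.3, seed=0):
    """Data integrity attack sketch: scale a random fraction of load
    points by a Gaussian factor (frac, mu, sigma are illustrative)."""
    rng = random.Random(seed)
    out = list(loads)
    idx = rng.sample(range(len(out)), int(frac * len(out)))
    for i in idx:
        out[i] *= rng.gauss(mu, sigma)
    return out

def apply_channel_noise(params, snr_db=20.0, seed=0):
    """Add white Gaussian noise to model parameters at a target SNR in
    dB: noise power = signal power / 10**(SNR / 10)."""
    rng = random.Random(seed)
    power = sum(p * p for p in params) / len(params)
    noise_std = math.sqrt(power / (10.0 ** (snr_db / 10.0)))
    return [p + rng.gauss(0.0, noise_std) for p in params]

noisy_loads = apply_dia([10.0] * 100)            # attacked training data
noisy_params = apply_channel_noise([1.0, -1.0, 0.5, -0.5])  # noisy upload
```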

IV-A3 Comparison Approaches

To evaluate the performance of our approach, the following approaches are adopted for comparison:

  • Centralized learning (CL-LSTM): All data is gathered to train a global model; note that data privacy may be intruded upon in this learning framework. Based on CL, we construct the LSTM model as CL-LSTM.

  • Federated learning (FL-LSTM): As the most common approach in FL [19], federated averaging (FedAvg), which assigns averaged weights to local models, is adopted with the proposed LSTM.

  • DearFSAC without QEEN (FL-LSTM-SAC): QEEN is an auto-encoder that incurs extra computational cost for training and may add time during FL communication. Therefore, FL-LSTM-SAC is adopted to evaluate the necessity of QEEN in our proposed approach.

IV-A4 Evaluation Metrics

The following metrics are used to evaluate the performance of our approach on the test dataset:

  • MAPE is a percentage quantifying the size of the prediction error, which is defined as:


    where is the actual value, is the predicted value, and is the number of predicted values.

  • The root mean square error (RMSE) quantifies the error in terms of electricity, which is defined as:


To evaluate the performance of the approaches more precisely, we conduct the simulations times and compute the averaged metrics.
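The two metrics can be implemented directly from their definitions:

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a)
                           for a, p in zip(actual, predicted))

def rmse(actual, predicted):
    """Root mean square error, in the units of the load (kW)."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2
                         for a, p in zip(actual, predicted)) / n)

actual = [100.0, 200.0, 400.0]   # toy hourly demands (kW)
pred = [110.0, 190.0, 400.0]
```

For the toy vectors above, MAPE is 5% and RMSE is about 8.16 kW.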

Scenario Defect type
I None
II Data integrity attacks
III Communication noises
IV Mixed defect
TABLE II: Scenarios of FL-based STLF training in wholesale markets
Parameter description Value
FL Total training round 1000

Local training epoch

LSTM Hidden layer size 512
Learning rate 1e-4
Length of sliding windows 24
Interval of sliding windows 1
Optimizer Adam
SAC Target MAPE 2%
’s base number 64
Reward weight set (0.5,0.5)
Decay rate 0.99
Buffer size 1e5
Soft update rate 5e-3
Learning rate 3e-4
QEEN Loss weight set (0.5,0.5)
TABLE III: Hyperparameters of the proposed approach

IV-A5 Details of Setup

All simulations and approaches are coded in Python. DearFSAC and the above comparison approaches are implemented using PyTorch and run on a personal computer with an NVIDIA GeForce RTX 2080 Ti GPU. Referring to [13, 10], the hyperparameters of the proposed DearFSAC are listed in Table III.

IV-B Performance Analysis

In this subsection, UCs participate in FL training under the scenarios, in which is set as , and is set as for Scenarios II, III, and IV. For DIAs, we set , , and as , , and , respectively. For communication noises, we set as .

Through comparisons, the approaches are evaluated in terms of MAPE and RMSE in Table IV, where the best results under each scenario are in bold. Note that there is a probability of adding defects into the CL model in each epoch.

As shown in Table IV, the results under Scenario I show that the MAPE and RMSE of DearFSAC are slightly lower than those of the other FL-based approaches and much lower than those of CL-LSTM. Under Scenarios II, III, and IV, DearFSAC maintains almost the same performance as under Scenario I and outperforms the other approaches. The reasons lie in the following key points: (1) When no defects occur, nearly averaged weights can achieve feasible model aggregation and an accurate global model because of the K-means clustering. CL-LSTM is a general model trained on the whole dataset and lacks personalization, whereas FL-based approaches use model aggregation to share parameters and prevent overfitting. Since the FL training process is not a convex problem, the distributed training manner may achieve better performance when better sub-optimal solutions are found. (2) When defects reduce model quality, errors accumulate in the global model during the FL process. Due to DearFSAC's capability of assigning nearly optimal weights to uploaded models, only DearFSAC can effectively conduct model aggregation in each FL communication round.

The load forecasting curves of the approaches under the scenarios on th February are shown in Fig. 3. Although the actual demand fluctuates, the approaches under Scenario I still forecast accurately, indicating that our LSTM-based STLF model is effective on this dataset. Under Scenarios II, III, and IV, FL-LSTM-SAC performs similarly well, showing the effectiveness of the DRL algorithm. Even so, DearFSAC performs better than FL-LSTM-SAC, demonstrating the superiority of our proposed approach. In contrast, the performance of the other two approaches is notably worse. When DIAs occur, as shown in Scenario II, the forecasting results of CL-LSTM and FL-LSTM deviate from the actual demand, but their general trend remains similar to that of the actual demand, indicating that DIAs mainly affect individual forecast values and have little impact on the overall tendency. Nevertheless, as shown in Scenario III, communication noises appear capable of affecting the forecasting tendency of CL-LSTM, while that of FL-LSTM is more stable. More severely, the mixed defects significantly harm the STLF performance of CL-LSTM and FL-LSTM, although FL-LSTM still retains a general trend similar to the actual demand. The reason is that, through model aggregation, the global model can alleviate the impact of communication noises, whereas errors accumulate in each FL communication round if defective models receive sizeable weights during model aggregation.

RMSE (kW) MAPE (%)
Scenario I Scenario II Scenario III Scenario IV Scenario I Scenario II Scenario III Scenario IV
TABLE IV: Comparison of performance using different approaches under scenarios
Fig. 3: approaches’ hourly forecasting on 1721 Lpk Aleksi ja Dh Alexia utility’s load on th February under scenarios.
Runtime (second)
TABLE V: Runtime for -round training under Scenario IV

Then, to assess computational efficiency, the runtime of the -round FL training is compared, where CL-LSTM conducts training for epochs. As shown in Table V, the runtime of the FL-based approaches is much lower than that of CL-LSTM, because the FL-based framework executes local training in parallel on each UC to update models in a distributed manner. Besides, although the DRL-based approaches cost more time than FL-LSTM, the runtime of FL-LSTM-SAC and DearFSAC is still acceptable. Furthermore, compared with FL-LSTM-SAC, DearFSAC's runtime is even shorter: directly feeding high-dimensional model parameters into the DRL model costs plenty of time, while QEEN spends little time yet significantly reduces the model dimension for faster DRL computation.

To verify that QEEN indeed outputs embedding vectors containing defect information, we adopt t-SNE to visualize each embedding vector in a two-dimensional space, setting as and changing . As shown in Fig. 4, the embedding vectors are divided into clusters by QEEN, where normal and defective models are labeled in advance, indicating that QEEN can effectively distinguish defective model parameters from normal ones.
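The t-SNE check can be reproduced along these lines with scikit-learn; the synthetic embedding vectors below merely stand in for real QEEN outputs, with the defective cluster shifted for illustration.

```python
import numpy as np
from sklearn.manifold import TSNE

# Synthetic stand-ins for QEEN embedding vectors: 20 "normal" and
# 20 "defective" 32-dimensional embeddings, the latter shifted so the
# two groups are separable.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 0.1, size=(20, 32))
defective = rng.normal(1.0, 0.1, size=(20, 32))
emb = np.vstack([normal, defective])

# Project to 2-D for visualization; perplexity must stay below the
# number of samples.
points = TSNE(n_components=2, perplexity=10,
              random_state=0).fit_transform(emb)
```

Plotting `points` colored by the normal/defective labels reproduces the kind of cluster separation shown in Fig. 4.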

Fig. 4: QEEN divides the model embedding vectors into clusters under three different proportions of defective models.

IV-C Robustness Analysis

In this subsection, we evaluate the robustness of DearFSAC by varying the defect-related parameters of Scenarios II and III: the proportion of defective models in both scenarios, the DIA parameters under Scenario II, and the noise level under Scenario III, with the remaining parameters fixed at their default values. The proposed DearFSAC is compared with FL-LSTM.

IV-C1 Different proportions of defective models

Firstly, we conduct simulations under Scenarios II and III while varying the proportion of defective models to study how it affects performance. As shown in Fig. 5, as the proportion increases, the MAPE and RMSE of FL-LSTM increase dramatically: the larger the proportion of defective models, the worse FL-LSTM performs. In contrast, DearFSAC performs well regardless of the proportion. The reason is that when the proportion is small, only a few defective models affect the aggregation, and their impact can be masked by the other normal model parameters in each round. However, as the proportion grows, errors accumulate round after round; rather than masking the defects, model aggregation is degraded by the defective models and outputs an increasingly poor global model that seriously harms the UCs' local models. In addition, we observe that the proportion of defective models influences DIAs and communication noises similarly in this paper.

Fig. 5: MAPE and RMSE of FL-LSTM and DearFSAC with different proportions of defective models under Scenarios II and III.

IV-C2 Different levels of DIAs

To simulate different levels of DIAs, we adjust the attack parameters under Scenario II and then compare the MAPE of FL-LSTM and DearFSAC.

As shown in Fig. 6, we vary the two DIA parameters over a range of values. When either parameter increases, the MAPE of FL-LSTM grows, indicating that the stronger the DIA level, the more harmful the defects. In contrast, our proposed approach maintains good performance, degrading only slightly. The reason is that DearFSAC can recognize DIAs based on QEEN and assign lower weights to models under stronger attacks. Besides, one of the two parameters affects the model performance more severely than the other, showing that it is the dominant factor in DIAs.

Furthermore, we change the proportion of data points under DIAs to study how it affects the global model performance. In addition to the MAPE, the squared error of each point is also recorded for clearer observation. Fig. 7 shows that as this proportion increases, the MAPE of FL-LSTM grows steadily, while DearFSAC's MAPE remains stable. This observation reinforces the conclusion that DearFSAC is capable of recognizing models under different levels of DIAs.
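A DIA of the kind varied here can be sketched as follows. The scaling-and-offset form and all parameter names are illustrative assumptions, not the paper's exact attack model: a random fraction of the load points is scaled and shifted.

```python
import numpy as np

def apply_dia(load, attacked_fraction, scale, offset, rng):
    """Corrupt a random fraction of load points with an illustrative
    scaling-and-offset data integrity attack (parameter names are
    hypothetical)."""
    load = np.asarray(load, dtype=float).copy()
    n_attacked = int(round(attacked_fraction * load.size))
    idx = rng.choice(load.size, size=n_attacked, replace=False)
    load[idx] = scale * load[idx] + offset
    return load

rng = np.random.default_rng(0)
clean = np.full(24, 100.0)                       # flat 100 kW daily profile
attacked = apply_dia(clean, attacked_fraction=0.25, scale=1.5, offset=10.0, rng=rng)

print(int((attacked != clean).sum()))            # 6 of 24 points corrupted
```

Increasing `attacked_fraction`, `scale`, or `offset` corresponds to the stronger attack levels studied in Figs. 6 and 7.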

Fig. 6: MAPE of FL-LSTM and DearFSAC with different DIA parameters under Scenario II.
Fig. 7: Comparison of squared errors and MAPE of FL-LSTM and DearFSAC with different proportions of attacked data points under Scenario II.
Fig. 8: Comparison of squared errors and MAPE of FL-LSTM and DearFSAC with different noise levels under Scenario III.

IV-C3 Different levels of communication noises

To simulate different levels of communication noises, we adjust the noise level under Scenario III and compare the performance of FL-LSTM and DearFSAC. As shown in Fig. 8, the performance of FL-LSTM degrades steadily as the noises intensify, while that of DearFSAC worsens only slightly and remains in a low range. When models are extremely defective, DearFSAC can assign reasonably low weights to them, whereas FL-LSTM assigns averaged weights and thus introduces large errors into the global model. Therefore, DearFSAC is capable of maintaining feasible performance even when the communication noises are severe.
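Communication noise of this kind is commonly modeled as additive white Gaussian noise at a target signal-to-noise ratio. The sketch below assumes such a model (the paper's exact noise model may differ): a lower SNR yields a larger distortion of the uploaded parameters.

```python
import numpy as np

def add_channel_noise(params, snr_db, rng):
    """Add white Gaussian noise to uploaded model parameters at a target
    signal-to-noise ratio (dB); a lower SNR means more severe noise.
    This is an illustrative noise model, not necessarily the paper's."""
    params = np.asarray(params, dtype=float)
    signal_power = np.mean(params ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return params + rng.normal(0.0, np.sqrt(noise_power), size=params.shape)

rng = np.random.default_rng(0)
clean = rng.normal(size=1000)

mild = add_channel_noise(clean, snr_db=20.0, rng=rng)
severe = add_channel_noise(clean, snr_db=0.0, rng=rng)

# The lower the SNR, the larger the parameter distortion.
print(np.mean((mild - clean) ** 2) < np.mean((severe - clean) ** 2))
```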

IV-D Scalability Analysis

In this subsection, we evaluate the scalability of DearFSAC by changing the total number of UCs and the selection proportion of uploaded models under Scenarios I and IV, with the other parameters fixed at their default values. As shown in Fig. 9, the performance varies little when both values are small. However, with the total number of UCs fixed, the performance drops as the selection proportion increases, and vice versa, showing that both should be kept within a small range. Besides, compared with the performance under Scenario I, our proposed approach performs slightly worse under Scenario IV. The reason is that an increasing number of models to be aggregated leads to larger computational costs; as DearFSAC cannot assign perfect weights to all selected models, the accumulated errors raise the probability of introducing defects into the global model. Note that too small a selection proportion also slightly worsens the performance, indicating that too few models participating in the aggregation weaken the global model. In general, our proposed approach is scalable with respect to the total number of UCs and the selection proportion of uploaded models.
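The client-selection step varied in this analysis can be sketched as uniform sampling of a fraction of the UCs each round. The function and parameter names are illustrative, not the paper's.

```python
import numpy as np

def select_clients(num_clients, proportion, rng):
    """Uniformly sample a fraction of UCs to upload models this round.
    num_clients and proportion correspond to the varied scalability
    parameters (names are illustrative)."""
    k = max(1, int(round(proportion * num_clients)))
    return rng.choice(num_clients, size=k, replace=False)

rng = np.random.default_rng(0)
print(len(select_clients(100, 0.1, rng)))  # 10 UCs selected out of 100
print(len(select_clients(100, 0.5, rng)))  # 50 UCs: more models to aggregate
```

A larger `proportion` means more uploaded models must be weighted per round, which matches the observation that aggregating more models raises both computational cost and the chance of imperfect weight assignment.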

Fig. 9: MAPE of DearFSAC with different total numbers of UCs and selection proportions under Scenarios I and IV.

V Conclusion & Future Works

In this paper, DearFSAC, a DRL-assisted FL approach, is proposed to robustly train STLF models for individual PPs. By adopting FL, a PP can obtain an accurate STLF model using only the UCs' local models, which protects data privacy. To handle defects, the SAC algorithm is adopted to conduct robust model aggregation. In addition, for better convergence, QEEN, an auto-encoder, is designed for both dimension reduction and quality evaluation of the uploaded models. The simulations demonstrate the superiority and robustness of the proposed approach in utility demand forecasting.

In the future, we will focus on the following three aspects: 1) stronger robustness against more types of defects; 2) improvement under more distributed scenarios; 3) extension to more types of energy resources.

