I Introduction
In recent years, many countries and regions have gradually opened up their electricity trading markets, in which utility companies (UC) purchase electricity from power plants (PP) in a wholesale market and then sell it to consumers in a retail market. As the number of UCs increases, Texas has gradually come to hold the largest share of the electricity trading market in the US. By November , more than UCs were operating in the Electric Reliability Council of Texas [26]. Because power supply and demand significantly influence energy transactions, market shares, and profits in competitive electricity markets, precise load forecasting, especially short-term load forecasting (STLF), is essential for electricity price estimation [6].
In the smart grid, with the widespread deployment of advanced metering infrastructure (AMI) in residential, commercial, industrial, and other buildings, STLF approaches for electricity have been actively studied. The work of [13] adopts long short-term memory (LSTM) networks to extract temporal features from the electricity consumption data of residential buildings, which has become a popular approach in STLF. Building on [13], research on STLF using deep learning (DL) has flourished. As UCs can collect abundant consumer profiles through AMIs, such as historical load data, household characteristics, and behavior patterns, STLF for residential buildings and communities is well developed [20, 18, 2, 12, 16]. In addition, there are DL-based STLF approaches for industrial load data [5, 3, 32], commercial load data [7], hospital load data [35], and so on.
To maintain the stability of electricity trading markets, PPs also need STLF of UCs’ demand. However, as price competition in the Texas wholesale market is intense [1], it is unrealistic for PPs to obtain UCs’ data. Besides, PPs have no AMIs or other devices to access consumers’ profiles. Therefore, an approach that lets PPs obtain an accurate STLF model without intruding on UCs’ data privacy is strongly needed.
Given the growing concern for data privacy, federated learning (FL) [14], which aims to provide general solutions while ensuring data privacy and security, has been adopted in STLF research [28, 24], which suggests that FL between one PP and multiple UCs is a feasible way to help the PP obtain an accurate STLF model.
Nevertheless, as mentioned above, competition in wholesale markets is intense, implying that malicious UCs are inevitable. Owing to FL’s distributed nature, adverse conditions increasingly arise during FL-based training of the PP’s STLF model; in this paper, we refer to all such adverse conditions as defects. As stated in [23], malicious hackers or UCs can launch various attacks, such as data poisoning attacks, on training data or trained models. Besides, uneven communication quality can introduce errors when local models are uploaded or the global model is downloaded. Thus, a robust design is required rather than simply adopting FL.
To tackle this challenge, deep reinforcement learning (DRL) [27] is adopted. In DRL, an agent is trained by interacting with the environment, which gives it a strong capability for solving real-time decision tasks with significant uncertainty. As DRL becomes increasingly prevalent for problems that can be modeled as a Markov decision process (MDP) [22], many studies have applied DRL to the smart grid [9, 21, 8]. In addition, the work of [31] combines FL and DRL, using a DRL model to select a certain number of clients from all clients for image classification tasks. Motivated by these works, we adopt a DRL algorithm that outputs optimal weights for the uploaded models, so that the PP can aggregate model parameters into a global model that improves in each round despite the occurrence of defects.
To our knowledge, only a few works adopt FL for probabilistic solar load forecasting [34, 15]. Given the mature development and wide deployment of deterministic load forecasting (DLF) for demand, we build our STLF model on DLF in this paper.
The main contributions are summarized as follows:

An approach for obtaining an STLF model for PPs without intruding on data privacy: To the best of our knowledge, no prior work adopts FL to obtain an STLF model for PPs. In this paper, FL is adopted to aggregate UCs’ STLF models into a highly accurate global model for PPs. In return, UCs can download the global model to improve their local STLF. Above all, FL protects the data privacy of UCs in wholesale markets.

A DRL-based design for robustness against defects: Because defects occur in wholesale markets, uploaded models of varying quality may harm the performance of the global model during aggregation on the PP server. To alleviate the model degradation caused by defects, a DRL algorithm, soft actor-critic (SAC), is adopted to assign optimal weights to uploaded models for efficient model aggregation, which makes the FL process significantly more robust.

Model dimension reduction and quality evaluation: Since high-dimensional model parameters are uploaded to the server and the assigned weights are continuous, relying on DRL alone requires massive time and computational resources to converge. Thus, inspired by techniques for dealing with defects in FL [29, 33], an autoencoder called the quality evaluation embedding network (QEEN) is placed before the DRL model to reduce the dimension of the uploaded models and evaluate their quality, which accelerates DRL training.
To sum up, a DRL-assisted FL approach, named DEfect-AwaRe federated soft actor-critic (DearFSAC), is proposed to robustly integrate an STLF model for PPs using UCs’ local models.
II System Model
II-A FL between one PP and UCs
Assume there are UCs and one PP. The UCs act as clients that upload their models to the server, i.e., the PP. First, clients upload their STLF models. Then, the server aggregates the uploaded local models into the global model and allows clients to download it for further local training. These two steps form a loop, as illustrated in Fig. 1.
However, during the FL process, various defects may occur and decrease the accuracy of the PP’s STLF model. In this paper, we consider the following defects:

Data integrity attacks (DIAs): As the quality of input data directly affects forecast accuracy, DIAs, in which hackers access supposedly protected data and inject false information, are harmful to load forecasting and hard to detect directly [17]. In FL, DIAs weaken both local models and the global model.

Communication noises: Due to the frequent transmission of model parameters between UCs and the PP [23], the uploaded parameters can be inaccurate, which randomly weakens the performance of model aggregation.
Two issues should be addressed in this FL framework:

Since UCs’ data privacy must be protected in wholesale markets, the PP’s STLF model should be designed to capture hidden temporal features well using only historical load data and time data.

Since various defects occur in the FL process, a robust model aggregation approach is needed to obtain a global model with high STLF accuracy.
Notation  Meaning 

The weight vector 

The th UC  
The electricity consumption dataset owned by  
The loss of a model evaluated on  
The objective of STLF adopting FL with defects occurring  
Total number of UCs in wholesale market  
The proportion of selecting uploaded models in each round  
The proportion of defective models among all clients  
The time period of  
Local model parameters of  
Global model parameters of the server  
Defective local model  
Features of  
True electricity consumption of  
Predicted electricity demand of  
II-B Formulation
The main notations of this work are summarized in Table I. Clients and their corresponding raw datasets are defined in Definition 2.1:
Definition 2.1: Let be the set of clients of an FL process, and let ’s dataset be , where is the historical electricity consumption and time data, and is a vector containing the true electricity demand. For each client , represents a vector containing the predicted demand of . To enable model aggregation, all clients’ STLF models have the same structure. The parameters of and the server are then represented by parameter vectors and , respectively, where is the total number of one model’s parameters. Thus, since is downloaded by all clients and is input without any preprocessing, can be computed as:
(1) 
Based on Definition 2.1, to minimize the averaged loss of testing on , the objective of FL can be formulated as an empirical risk minimization (ERM) problem as follows:
(2) 
where is downloaded by clients, and represents the loss of the model trained on , formulated as:
(3) 
where is a subset of .
For the server, the objective is to find the optimal global model parameters:
(4) 
For better performance, in each round, models among all uploaded models are randomly selected as a subset to participate in model aggregation, where and is a proportion within [31]. Assume the DRL model outputs a weight vector . The PP weights all selected model parameters by to obtain the global model:
(5) 
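The random subset selection and weighted aggregation in Eq. (5) can be sketched in Python as follows. This is a minimal illustration, not the authors’ implementation: model parameters are flattened into plain lists, the subset size is assumed to be the ceiling of the selection proportion times the number of uploaded models, and the weights are assumed to be non-negative and to sum to one. The function names `select_models` and `aggregate` are hypothetical.

```python
import math
import random
from typing import List, Sequence


def select_models(num_uploaded: int, proportion: float, rng: random.Random) -> List[int]:
    """Randomly choose a subset of uploaded models (the subset-size rule is an assumption)."""
    k = max(1, math.ceil(proportion * num_uploaded))
    return sorted(rng.sample(range(num_uploaded), k))


def aggregate(selected: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Global model = sum_k w_k * theta_k over the selected flattened parameter vectors."""
    if len(selected) != len(weights):
        raise ValueError("exactly one weight per selected model is required")
    # Assumed constraint: weights are non-negative and sum to one.
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("weights must be non-negative and sum to one")
    dim = len(selected[0])
    global_params = [0.0] * dim
    for params, w in zip(selected, weights):
        for j in range(dim):
            global_params[j] += w * params[j]
    return global_params


# Usage: four uploaded two-parameter models, half of them selected, uniform weights.
rng = random.Random(0)
uploaded = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
chosen = select_models(len(uploaded), 0.5, rng)
print(chosen, aggregate([uploaded[i] for i in chosen], [0.5, 0.5]))
```

With uniform weights this reduces to FedAvg-style averaging; in DearFSAC the weights instead come from the DRL model.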
Next, considering that defects occur in FL process or smart grid, we define defective models in Definition 2.2:
Definition 2.2: Assume there is a subset contains models among all uploaded models affected by defects, where and is a proportion within . If is defective, it can be denoted as . In each FL communication round, we assume there is a subset of defective models among selected models, denoted as .
The framework described above is illustrated in Fig. 1. As is much higher than , and it is uncertain which models are defective, fixed weights, such as uniformly averaged ones, lead to poor model aggregation. Therefore, an optimal weight assignment approach is required. The objective of FL model aggregation with defects can be formulated as:
(6)  
(7)  
where, if the th model is in , it is a defective model . Then, an optimal weight vector needs to be found:
(8) 
III Proposed DEfect-AwaRe Federated Soft Actor-Critic Approach
III-A Overall Architecture
A novel FL-based framework, DearFSAC, is proposed to accurately forecast the total electricity demand of UCs so that the PP can generate an appropriate amount of electricity even when defects occur. The overall architecture of DearFSAC is shown in Fig. 2. This approach mainly consists of three modules: 1) an STLF model based on LSTM; 2) the quality evaluation embedding network (QEEN); and 3) a DRL model based on soft actor-critic (SAC). As the PP only has historical load data and time data, the STLF model should be capable of capturing hidden temporal features. Furthermore, during the FL process, considering the varying quality of uploaded models and various defects, the SAC-based DRL model assigns optimal weights to uploaded models for efficient aggregation. Besides, feeding raw model parameters directly into the DRL model leads to the curse of dimensionality and very slow convergence. Therefore, QEEN is designed to reduce the dimension of the uploaded model parameters and evaluate the models’ quality, providing more informative inputs for faster convergence of the DRL model.
III-B STLF model based on LSTM
Following [13], we adopt the LSTM [11] network as the structure of the STLF model. Since the LSTM model requires time-series inputs, we create sliding windows from the data with and , where and are the length in days of one sliding window and the interval between the starting points of consecutive sliding windows, respectively. In our work, each daily consumption profile consists of 24 hourly intervals, and thus the width of all feature vectors should be 24. After data cleaning and feature selection, the following input features containing temporal information are generated:

The sequence of electricity consumption for the past time intervals .

The incremental sequence of the hourly time indices for the past time steps .

The corresponding day of the week, within , which is mapped to and using trigonometric functions.

The binary holiday marks of the corresponding dates, within .
To keep the feature dimensions consistent, we extend , , and to 24-dimensional vectors , , and , respectively. After we normalize and to and , respectively, where the elements of and are both within , the input of the LSTM model is constructed as a matrix of the five transposed vectors:
(9) 
As , , , , and are all one-dimensional vectors, the dimension of a sliding window is . After the above preprocessing, the training input of ’s forecasting model is a set of sliding windows, represented by , in which is the first sliding window computed from the th time step. To update the LSTM model parameters, mean squared error (MSE) is adopted as the loss function. Then, at FL round , the parameters are updated as follows:
(10)  
(11) 
where are the updated model parameters, is the training loss, and is the learning rate of local training.
When the model is used at time , the input is a sliding window counting back time steps from , and the output of is the hourly electricity demand at time , where is the time index in the dataset .
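The window construction described in this subsection can be sketched for a single window as follows. The sketch assumes 24 hourly steps per window (the window length of 24 in the parameter table), a day-of-week index in {0, ..., 6} encoded as the sine and cosine of 2πd/7, and min-max scaling to [0, 1] computed per window. The paper does not spell out its exact trigonometric mapping or normalization statistics, and a real pipeline would more likely scale with training-set statistics. The names `window_starts` and `build_window` are hypothetical.

```python
import math
from typing import List, Sequence


def min_max(values: Sequence[float]) -> List[float]:
    """Scale values to [0, 1]; a constant sequence maps to zeros (assumed convention)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def window_starts(series_len: int, length: int = 24, stride: int = 1) -> List[int]:
    """Start indices of all complete windows of `length` steps taken every `stride` steps."""
    return list(range(0, series_len - length + 1, stride))


def build_window(loads: Sequence[float], hours: Sequence[int], day_of_week: int,
                 is_holiday: bool) -> List[List[float]]:
    """Stack the five feature vectors as columns of a (window length) x 5 input matrix."""
    n = len(loads)
    if len(hours) != n:
        raise ValueError("loads and hours must have the same length")
    angle = 2.0 * math.pi * day_of_week / 7.0  # assumed encoding of the day of the week
    columns = [
        min_max(loads),                      # normalized electricity consumption
        min_max([float(h) for h in hours]),  # normalized hourly time indices
        [math.sin(angle)] * n,               # day of week, sine part, repeated
        [math.cos(angle)] * n,               # day of week, cosine part, repeated
        [1.0 if is_holiday else 0.0] * n,    # binary holiday mark, repeated
    ]
    return [list(row) for row in zip(*columns)]  # one row per time step
```

Repeating the scalar day-of-week and holiday features along the window is what makes all five columns the same length, so they can be stacked into one matrix.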
III-C Dimension Reduction and Quality Evaluation via QEEN
For quality evaluation and dimension reduction, QEEN, an autoencoder, is introduced before the DRL model. To train QEEN efficiently, all local model parameters are uploaded to the server; in other words, we set . Then, we design an embedding loss for and a quality prediction loss .
Firstly, we define the embedding vectors in Definition 3.1:
Definition 3.1: We feed each into the encoder, composed of two fully connected (FC) layers, to obtain the embedding vector of the th model. After obtaining all embedding vectors, we feed their concatenation into the decoder to produce a decoded representation that approximates .
For faster training, we design the decoder as parallel FC layers , where is the th parallel FC layer, corresponding to the th layer of the original model structure [36], and is the number of layers of the original model. Next, for the th model, the embedding vector is fed into the th parallel layer to obtain the decoded parameters of that layer of the original model. By concatenating the decoded parameters layer by layer, the entire set of decoded model parameters is obtained.
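A toy, pure-Python sketch of the encoder and parallel-decoder layout in Definition 3.1 is given below. It only illustrates the data flow and shapes: it assumes a ReLU between the two encoder FC layers, small uniform random initialization, and that each model is encoded and decoded independently. The class `ToyQEEN`, the layer sizes, and the hidden and embedding widths are hypothetical, and no training is shown.

```python
import random
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias)


def fc(x: Sequence[float], layer: Layer) -> List[float]:
    """Fully connected layer y = W x + b."""
    weight, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def init_fc(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class ToyQEEN:
    """Two-FC-layer encoder plus one parallel FC decoder head per layer of the original model."""

    def __init__(self, layer_sizes: Sequence[int], hidden: int, embed_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.enc = [init_fc(sum(layer_sizes), hidden, rng), init_fc(hidden, embed_dim, rng)]
        self.heads = [init_fc(embed_dim, size, rng) for size in layer_sizes]

    def encode(self, flat_params: Sequence[float]) -> List[float]:
        return fc(relu(fc(flat_params, self.enc[0])), self.enc[1])

    def decode(self, embedding: Sequence[float]) -> List[float]:
        # Each head reconstructs one layer; the outputs are concatenated layer by layer.
        return [v for head in self.heads for v in fc(embedding, head)]


# Usage: a hypothetical 3-layer model with 6 + 4 + 3 = 13 parameters.
qeen = ToyQEEN([6, 4, 3], hidden=8, embed_dim=2)
embedding = qeen.encode([0.1] * 13)
reconstruction = qeen.decode(embedding)
```

Splitting the decoder into one small head per original layer keeps each output layer narrow, which is the motivation the text gives for faster training.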
As multiple defects have different impacts on local models, we define defect marks and quality evaluation marks in Definition 3.2:
Definition 3.2: To train the QEEN for quality evaluation, we set the defect marks as the ground truth, where represents the severity of defects in . The defect mark is computed based on the accuracy of over a fixed validation dataset on the server:
(12) 
where refers to the accuracy of testing on the dataset . Then, we compute the quality evaluation marks as:
(13) 
where is the quality prediction of the , and is the quality evaluation module composed of two FC layers.
Next, we compare with . After obtaining , we use the MSE loss function to compute and :
(14)  
(15) 
where is the QEEN parameter, is the number of uploaded model parameters, and is the th parameter of the th model.
Finally, we assign weights and to and , respectively, and update the QEEN parameters by joint gradient descent on the weighted sum of the two losses.
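The weighted joint objective can be sketched as follows. Both terms use MSE, as stated above, and the default weights follow the QEEN loss weight set (0.5, 0.5) in the parameter table. Averaging the reconstruction error over all parameters of all models is an assumption about the exact normalization in Eq. (14), and `qeen_joint_loss` is a hypothetical name; the gradient step itself is left to whatever autodiff framework trains QEEN.

```python
from typing import Sequence, Tuple


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(pred) != len(target) or not pred:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def qeen_joint_loss(decoded: Sequence[Sequence[float]], original: Sequence[Sequence[float]],
                    quality_pred: Sequence[float], defect_marks: Sequence[float],
                    weights: Tuple[float, float] = (0.5, 0.5)) -> float:
    """Weighted sum of the embedding (reconstruction) loss and the quality prediction loss."""
    # Reconstruction error averaged over every parameter of every model (assumed normalization).
    flat_dec = [v for model in decoded for v in model]
    flat_org = [v for model in original for v in model]
    loss_embed = mse(flat_dec, flat_org)
    # Quality prediction error against the defect marks used as ground truth.
    loss_quality = mse(quality_pred, defect_marks)
    return weights[0] * loss_embed + weights[1] * loss_quality
```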
III-D Optimal Weight Assignment via DRL
III-D1 MDP Modeling
The FL-based STLF process can be modeled as an MDP. Assume the PP has a target mean absolute percentage error (MAPE), which is usually close to . In each round, UCs are randomly selected to conduct local training on their own datasets and upload model parameters to the PP. After receiving the uploaded information as , the DRL model outputs , which is composed of the weights assigned to all uploaded models. The state, action, and reward of the MDP are defined as follows:
State : At round , the state is denoted as a vector , where denotes the embedding vector of model parameters of , denotes the embedding vector of the server’s model parameters, denotes the local training loss of , and denotes the action at the previous round.
Action : The action, denoted as , is a weight vector calculated by the DRL model for a randomly selected subset of models at round . All the weights in are within and satisfy the constraint . After obtaining the weight vector, the server aggregates the local model parameters into the global model parameters as follows:
(16) 
Reward : At round , the current reward guides the agent to maximize the cumulative return , which is the goal of DRL. We design a compound reward by combining two sub-rewards with appropriate weights, together with a discount factor , formulated as:
(17)  
(18)  
(19) 
In Eq. (18), aims to minimize the global model’s MAPE. The exponential term represents the MAPE gap, where , which is usually within , is the global model’s MAPE on the held-out validation dataset at round . To mitigate the slow convergence caused by the diminishing marginal effect [31], we use , a positive constant, to ensure exponential growth of . Assume is ; then is within , and the term is within . To ensure that the agent obtains less cumulative reward when it takes more rounds, we penalize it for finishing training in more rounds. Therefore, the second term serves as a per-round time penalty that keeps within for faster convergence.
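The text above does not spell out the exact expressions of Eqs. (17)–(19), so the following is only a sketch of one reward with the described shape, assumed in the spirit of [31]: an exponential MAPE-gap term `base ** (target - mape) - 1`, which is zero at the target MAPE and shrinks toward -1 as the MAPE grows, plus a constant per-round time penalty, combined with weights and discounted over rounds. The defaults use the target MAPE (2%), base number (64), and reward weight set (0.5, 0.5) from the parameter table; treating the listed decay rate 0.99 as the discount factor and using -1 as the time penalty are assumptions, and all function names are hypothetical.

```python
from typing import Sequence, Tuple


def mape_term(mape: float, target: float = 0.02, base: float = 64.0) -> float:
    """ASSUMED first sub-reward: exponential MAPE gap, equal to 0 when MAPE hits the target."""
    return base ** (target - mape) - 1.0


def compound_reward(mape: float, weights: Tuple[float, float] = (0.5, 0.5),
                    time_penalty: float = -1.0, target: float = 0.02,
                    base: float = 64.0) -> float:
    """ASSUMED compound reward: weighted MAPE term plus a constant per-round penalty."""
    return weights[0] * mape_term(mape, target, base) + weights[1] * time_penalty


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Cumulative return G = sum_t gamma**t * r_t over the rounds of one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Usage: rewards for a few rounds whose global MAPE approaches the 2% target.
rewards = [compound_reward(m) for m in (0.10, 0.05, 0.03, 0.02)]
episode_return = discounted_return(rewards)
```

Because every round adds a negative penalty, an episode that reaches the target MAPE in fewer rounds collects a higher return, which matches the stated purpose of the time penalty.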
III-D2 Soft Actor-Critic
We adopt SAC [10] to solve the MDP. At the end of each round , the tuple , which is denoted as , is recorded in the replay buffer .
In SAC, the action is sampled from a Gaussian distribution , whose mean and variance are output by the actor network of the DRL model:
(20) 
In each iteration, SAC samples a batch of from and updates the DRL network parameters. To address poor sample efficiency and data imbalance in DRL [27], we adopt two replay buffer techniques, emphasizing recent experience (ERE) [30] and prioritized experience replay (PER) [25], to sample data with priority and emphasis.
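The two sampling techniques can be sketched as follows. `per_sample` implements proportional PER, where transition i is drawn with probability proportional to its priority raised to alpha and is corrected by the importance weight (N·P(i))^(-beta), normalized by the largest such weight. `ere_window` implements the ERE recent-window rule, in which the k-th of K updates samples only from the most recent max(N·eta^(k·1000/K), c_min) transitions. How DearFSAC actually combines the two is not specified above; applying PER inside the ERE window, as in `sample_recent_prioritized`, is a hypothetical choice, and the alpha and beta defaults are illustrative.

```python
import random
from typing import List, Sequence, Tuple


def ere_window(buffer_len: int, k: int, num_updates: int, eta: float, c_min: int) -> int:
    """ERE recent-window size c_k = max(N * eta ** (k * 1000 / K), c_min), capped at N."""
    c_k = int(buffer_len * eta ** (k * 1000 / num_updates))
    return min(buffer_len, max(c_k, c_min))


def per_sample(priorities: Sequence[float], batch_size: int, rng: random.Random,
               alpha: float = 0.6, beta: float = 0.4) -> Tuple[List[int], List[float]]:
    """Proportional PER: P(i) ~ p_i ** alpha, importance weights (N * P(i)) ** -beta."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    probs = [s / total for s in scaled]
    n = len(probs)
    idx = rng.choices(range(n), weights=probs, k=batch_size)
    max_w = max((n * p) ** (-beta) for p in probs)  # weight of the least likely transition
    weights = [((n * probs[i]) ** (-beta)) / max_w for i in idx]
    return idx, weights


def sample_recent_prioritized(priorities: Sequence[float], batch_size: int, k: int,
                              num_updates: int, eta: float, c_min: int,
                              rng: random.Random) -> Tuple[List[int], List[float]]:
    """Hypothetical combination: apply PER only within the ERE recent window."""
    c_k = ere_window(len(priorities), k, num_updates, eta, c_min)
    offset = len(priorities) - c_k
    idx, weights = per_sample(list(priorities[offset:]), batch_size, rng)
    return [offset + i for i in idx], weights
```

As k grows within an update phase, the window shrinks toward c_min, so later updates emphasize the most recent FL rounds while PER still favors high-priority transitions inside that window.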
The update procedure of SAC is shown in Fig. 2, where is the action of the next time step, i.e., the next FL round, and is the action of the last time step, i.e., the previous FL round.
III-E Workflow of DearFSAC
The workflow of the DearFSAC approach is summarized in Algorithm 1, and its steps are described as follows.

At the first communication round, the global model parameters are initialized on the PP, denoted as (line ).

All UCs send download requests to the PP and download the latest global STLF model as their local STLF model. After local training, all UCs obtain their updated models and training losses. During local training, defects may occur in datasets and model parameters (lines ).

The PP selects UCs to upload their models and training losses. After obtaining embedding vectors by feeding the model parameters into QEEN, the PP concatenates the embedding vectors, the training losses, and the action of the last round to form the state. By feeding the state into the DRL model, the mean and variance of the Gaussian distribution [10] are obtained, from which the current action is sampled. Finally, by combining the selected model parameters with the weights in , the PP obtains the global model and sends it to the selected UCs. The whole process loops until convergence (line ).
IV Simulations & Analysis
IV-A Simulation Setup
IV-A1 Datasets
We conduct simulations using data from the Nuuka open API [4], which contains basic information and energy data of Helsinki’s utility and service properties. As mentioned in Section II, the aim of this work is to obtain an accurate STLF model using an FL framework when defects occur. To alleviate the effect of differing load characteristics, we apply K-means clustering to the UCs. After data cleansing and clustering, data of UCs spanning years from st March to st March are used in this work. Each UC is set as a client, and all clients have hourly-resolution data. Considering seasonal factors, we further split the dataset into four seasons. For each season, the data before the last week are used for training, and the last week is used as the test dataset.
IV-A2 Defect Types
As mentioned above, the proportion of defective models is . We then design the scenarios in Table II based on the following defect types:

DIAs: Before creating sliding windows, we simulate normally distributed DIAs on the training dataset by randomly selecting of all data points and altering their loads by multiplying them by , where is sampled from a Gaussian distribution with mean and standard deviation [17].

Communication noises: We use the signal-to-noise ratio (SNR) to model the noises, and the modified parameters are calculated as follows: (21) where indicates the level of noises.

Mixed defect: To consider more severe conditions, we design the mixed defect by applying both DIAs and communication noises to the same client.
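The three defect types can be simulated roughly as follows. The DIA sketch draws one Gaussian factor per attacked data point, which is an assumption since the text does not say whether a single factor is shared. Because Eq. (21) is not reproduced above, the noise sketch assumes the common additive white Gaussian noise model with the SNR given in dB, where the noise power equals the mean signal power divided by 10^(SNR/10). All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def inject_dia(loads: Sequence[float], proportion: float, mu: float, sigma: float,
               rng: random.Random) -> List[float]:
    """Multiply a random subset of load points by factors drawn from N(mu, sigma**2)."""
    attacked = list(loads)
    for i in rng.sample(range(len(loads)), round(proportion * len(loads))):
        attacked[i] = loads[i] * rng.gauss(mu, sigma)  # one factor per attacked point (assumed)
    return attacked


def add_awgn(params: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Additive white Gaussian noise with power = signal power / 10**(snr_db / 10) (assumed)."""
    signal_power = sum(p * p for p in params) / len(params)
    std = math.sqrt(signal_power / 10.0 ** (snr_db / 10.0))
    return [p + rng.gauss(0.0, std) for p in params]


def mixed_defect(loads: Sequence[float], params: Sequence[float], proportion: float,
                 mu: float, sigma: float, snr_db: float,
                 rng: random.Random) -> Tuple[List[float], List[float]]:
    """Mixed defect: the same client suffers both a DIA on its data and noisy uploads."""
    return inject_dia(loads, proportion, mu, sigma, rng), add_awgn(params, snr_db, rng)
```

Under this noise model, a lower SNR means a larger noise standard deviation relative to the parameter magnitudes, i.e., more severe communication noise.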
IV-A3 Comparison Approaches
To evaluate the performance of our approach, three approaches are adopted for comparison:

Centralized learning (CL-LSTM): All data are gathered to train a single global model. Note that data privacy may be violated in this learning framework. Applying CL to the proposed LSTM yields CL-LSTM.

Federated learning (FL-LSTM): As the most common approach in FL [19], federated averaging (FedAvg), which assigns averaged weights to local models, is applied to the proposed LSTM.

DearFSAC without QEEN (FL-LSTM-SAC): QEEN is an autoencoder that incurs extra computational cost for training, and running QEEN during FL communication may take additional time. Therefore, FL-LSTM-SAC is included to evaluate the necessity of QEEN in our proposed approach.
IV-A4 Evaluation Metrics
The following metrics are used to evaluate the performance of our approach on the test dataset:

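MAPE quantifies the size of the prediction error as a percentage and is defined as:
$$\mathrm{MAPE}=\frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i-\hat{y}_i}{y_i}\right|, \qquad (22)$$
where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of predicted values.

The root mean square error (RMSE) quantifies the error in units of electricity and is defined as:
$$\mathrm{RMSE}=\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}. \qquad (23)$$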
To evaluate the approaches more precisely, we run the simulations times and report the averaged metrics.
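The two metrics in Eqs. (22) and (23), together with the averaging over repeated runs, can be computed as follows. This is a plain sketch of the standard MAPE (in percent) and RMSE definitions; the function names are hypothetical, and MAPE is rejected when an actual value is zero because the ratio is undefined there.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error, in the units of the data (e.g., kW)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def average_over_runs(values: Sequence[float]) -> float:
    """Average a metric over repeated simulation runs."""
    return sum(values) / len(values)
```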
Scenario  Defect type 

I  None 
II  Data integrity attacks 
III  Communication noises 
IV  Mixed defect 
Parameter description  Value  
FL  Total training rounds  1000 
Local training epochs  1  
LSTM  Hidden layer size  512 
Learning rate  1e-4  
Length of sliding windows  24  
Interval of sliding windows  1  
Optimizer  Adam  
SAC  Target MAPE  2% 
Base number of the exponential reward term  64  
Reward weight set  (0.5, 0.5)  
Decay rate  0.99  
Buffer size  1e5  
Soft update rate  5e-3  
Learning rate  3e-4  
QEEN  Loss weight set  (0.5,0.5) 
IV-A5 Details of Setup
IV-B Performance Analysis
In this subsection, UCs participate in FL training under four scenarios, in which is set as , and is set as for Scenarios II, III, and IV. For DIAs, we set , , and as , , and , respectively. For communication noises, we set as .
The approaches are compared in terms of MAPE and RMSE in Table IV, where the best result under each scenario is in bold. Note that, for CL-LSTM, defects are added with a certain probability in each training epoch.
As shown in Table IV, under Scenario I the MAPE and RMSE of DearFSAC are slightly lower than those of the other FL-based approaches and much lower than those of CL-LSTM. Moreover, under Scenarios II, III, and IV, DearFSAC maintains almost the same performance as under Scenario I and outperforms the other approaches. The reasons lie in two key points: (1) When no defects occur, nearly uniform weights suffice for feasible model aggregation and yield an accurate global model thanks to the K-means clustering. CL-LSTM is a general model trained on the whole dataset and lacks personalization. In contrast, FL-based approaches share parameters through model aggregation, which helps prevent overfitting. Since FL training is a non-convex problem, the distributed training process may find better suboptimal solutions and thus achieve better performance. (2) When defects reduce model quality, errors accumulate in the global model during the FL process. Because DearFSAC assigns nearly optimal weights to uploaded models, only DearFSAC effectively conducts model aggregation in every FL communication round.
The load forecasting curves of the four approaches under the four scenarios on th February are shown in Fig. 3. Although the actual demand fluctuates, all approaches forecast accurately under Scenario I, indicating that our LSTM-based STLF model is effective on this dataset. Under Scenarios II, III, and IV, FL-LSTM-SAC performs similarly well, showing the effectiveness of the DRL algorithm. Even so, DearFSAC performs better than FL-LSTM-SAC, demonstrating the advantage of our proposed approach. In contrast, the other two approaches perform considerably worse. When DIAs occur, as shown in Scenario II, the forecasts of CL-LSTM and FL-LSTM deviate from the actual demand but follow a similar general trend, indicating that DIAs mainly affect individual forecast values and have little impact on the trend. As shown in Scenario III, however, communication noises appear capable of altering the forecasting trend of CL-LSTM, while that of FL-LSTM is more stable. Worse still, the mixed defect significantly harms the STLF performance of both CL-LSTM and FL-LSTM, although FL-LSTM still roughly follows the general trend of the actual demand. The reason is that model aggregation lets the global model alleviate the impact of communication noises, whereas errors also accumulate in each FL communication round if defective models receive substantial weights during aggregation.
RMSE (kW)  MAPE (%)  

Scenario I  Scenario II  Scenario III  Scenario IV  Scenario I  Scenario II  Scenario III  Scenario IV  
CLLSTM  
FLLSTM  
FLLSTMSAC  
DearFSAC 
Runtime (second)  

CLLSTM  
FLLSTM  
FLLSTMSAC  
DearFSAC 
Then, to assess computational efficiency, we compare the runtime of the round FL training, where CL-LSTM is trained for epochs. As shown in Table V, the runtime of the FL-based approaches is much lower than that of CL-LSTM, because the FL-based framework executes local training in parallel in each UC and updates models in a distributed manner. Although the DRL-based approaches take more time than FL-LSTM, the runtime of FL-LSTM-SAC and DearFSAC is still acceptable. Furthermore, DearFSAC runs even faster than FL-LSTM-SAC, because feeding high-dimensional model parameters directly into the DRL model is time-consuming, whereas QEEN spends little time to significantly reduce the model dimension, which speeds up DRL computation.
To verify that QEEN indeed outputs embedding vectors containing defect information, we use t-SNE to visualize each embedding vector in a two-dimensional space, setting as and varying . As shown in Fig. 4, where normal and defective models are labeled in advance, QEEN separates the embedding vectors into clusters, indicating that QEEN can effectively distinguish defective model parameters from normal ones.
IV-C Robustness Analysis
In this subsection, we evaluate the robustness of DearFSAC by varying under Scenarios II and III. Then, we vary , , and under Scenario II, and under Scenario III. Note that we set and as and , respectively. The proposed DearFSAC is compared with FL-LSTM.
IV-C1 Different proportions of defective models
First, we conduct simulations under Scenarios II and III while varying to study how the proportion of defective models affects performance. As shown in Fig. 5, as increases, the MAPE and RMSE of FL-LSTM increase dramatically, indicating that the larger is, the worse FL-LSTM performs. However, DearFSAC performs well regardless of the value of . The reason is that when is small, only a few defects affect model aggregation, and their impact can be offset by the normal model parameters in each round. As grows, however, errors accumulate increasingly over rounds. Rather than masking defects, model aggregation is degraded by defective models and produces an increasingly worse global model, which seriously harms UCs’ local models. In addition, we observe that has a similar influence under DIAs and communication noises in our simulations.
IV-C2 Different levels of DIAs
To simulate different levels of DIAs, we adjust , , and under Scenario II and compare the MAPE of FL-LSTM and DearFSAC.
As shown in Fig. 6, we vary both and from to in increments of . When or increases, the MAPE of FL-LSTM grows, indicating that higher DIA levels cause more severe defects. In contrast, our proposed approach maintains good performance and degrades only slightly. This is because DearFSAC can recognize DIAs via QEEN and assign lower weights to models with stronger DIAs. Besides, variations in affect model performance more seriously when is larger, showing that is the more dominant factor in DIAs.
Furthermore, we vary to study how the proportion of data points under DIAs affects the global model’s performance. In addition to MAPE, the squared error of each point is recorded for clearer observation. Fig. 7 shows that as increases, the MAPE of FL-LSTM grows steadily, while DearFSAC’s MAPE remains stable. This observation reinforces the conclusion that DearFSAC can recognize models with different levels of DIAs.
IV-C3 Different levels of communication noises
To simulate different levels of communication noises, we adjust under Scenario III and compare the performance of FL-LSTM and DearFSAC. As shown in Fig. 8, the performance of FL-LSTM degrades increasingly as increases, while the error of DearFSAC increases only slightly and stays within a low range. The reason is that the smaller is, the more severe the communication noises are. When models are extremely defective, DearFSAC can assign them reasonably low weights, whereas FL-LSTM assigns averaged weights and introduces large errors into the global model. Thus, DearFSAC maintains feasible performance even when communication noises are severe.
IV-D Scalability Analysis
In this subsection, we evaluate the scalability of DearFSAC by varying the total number of clients and the selection proportion under Scenarios I and IV. We set , , , and as , , , and , respectively. As shown in Fig. 9, performance varies little when and are both small. However, with fixed, performance drops as increases, and the same holds for , showing that both the number of UCs and the selection proportion of uploaded models should be kept within a small range. Besides, our approach performs slightly worse under Scenario IV than under Scenario I. The reason is that a larger number of models to be aggregated leads to larger computational costs; since DearFSAC cannot assign perfect weights to all selected models, accumulated errors increase the probability of introducing defects into the global model. Note that when is , a small makes the performance slightly worse, indicating that too few models participating in aggregation weaken the global model. In general, our proposed approach is scalable with respect to the total number of UCs and the selection proportion of uploaded models.
V Conclusion & Future Works
In this paper, DearFSAC, a DRL-assisted FL approach, is proposed to robustly integrate STLF models for individual PPs. By adopting FL, one PP can obtain an accurate STLF model using only UCs’ local models, which protects data privacy. To handle defects, the SAC algorithm is adopted to conduct robust model aggregation. In addition, for better convergence of our approach, QEEN, an autoencoder, is designed for both dimension reduction and quality evaluation of uploaded models. Simulations demonstrate the superiority and robustness of the proposed approach in utility demand forecasting.
In the future, we will focus on three aspects: 1) stronger robustness against more types of defects; 2) improvement under more distributed scenarios; and 3) extension to more types of energy resources.
References
[1] 2020 state of the market report for the ERCOT electricity markets. https://www.puc.texas.gov/industry/electric/reports/ERCOT_annual_reports/Default.aspx
[2] (2020) Deep-based conditional probability density function forecasting of residential loads. IEEE Transactions on Smart Grid 11 (4), pp. 3646–3657.
[3] (2016) An accurate and fast converging short-term load forecasting model for industrial applications in a smart grid. IEEE Transactions on Industrial Informatics 13 (5), pp. 2587–2596.
[4] Nuuka open API. https://helsinkiopenapi.nuuka.cloud/swagger/index.html
[5] (2017) Short-term industrial load forecasting: a case study in an Italian factory. In 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), pp. 1–6.
[6] (2013) A strategy for short-term load forecasting by support vector regression machines. IEEE Transactions on Power Systems 28 (4), pp. 4356–4364.
[7] (2020) Robust short-term electrical load forecasting framework for commercial buildings using deep recurrent neural networks. Applied Energy 278, pp. 115410.
[8] (2019) Reinforcement learning based dynamic model selection for short-term load forecasting. In 2019 IEEE Power & Energy Society Innovative Smart Grid Technologies Conference (ISGT), pp. 1–5.
[9] (2019) (Deep) reinforcement learning for electric power system control and related problems: a short review and perspectives. Annual Reviews in Control 48, pp. 22–35.
[10] (2018) Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning, pp. 1861–1870.
[11] (1997) Long short-term memory. Neural Computation 9 (8), pp. 1735–1780.
[12] (2020) A deep learning method for short-term residential load forecasting in smart grid. IEEE Access 8, pp. 55785–55797.
[13] (2017) Short-term residential load forecasting based on resident behaviour learning. IEEE Transactions on Power Systems 33 (1), pp. 1087–1088.
[14] (2020) Federated learning-based ultra-short term load forecasting in power Internet of Things. In 2020 IEEE International Conference on Energy Internet (ICEI), pp. 63–68.
[15] (2021) A privacy-preserving federated learning method for probabilistic community-level behind-the-meter solar generation disaggregation. IEEE Transactions on Smart Grid 13 (1), pp. 268–279.
[16] (2021) Spatial-temporal residential short-term load forecasting via graph neural networks. IEEE Transactions on Smart Grid 12 (6), pp. 5373–5384.
[17] (2018) Benchmarking robustness of load forecasting models under data integrity attacks. International Journal of Forecasting 34 (1), pp. 89–104.
[18] (2020) A comprehensive review of the load forecasting techniques using single and hybrid predictive models. IEEE Access 8, pp. 134911–134939.
[19] (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282.
[20] (2019) Machine learning algorithms for short-term load forecast in residential buildings using smart meters, sensors and big data solutions. IEEE Access 7, pp. 177874–177889.
[21] (2020) Short-term load forecasting algorithm using a similar day selection method based on reinforcement learning. Energies 13 (10), pp. 2640.
[22] (1990) Markov decision processes. Handbooks in Operations Research and Management Science 2, pp. 331–434.
[23] (2022) Poisoning attacks against federated learning in load forecasting of smart energy. In NOMS 2022-2022 IEEE/IFIP Network Operations and Management Symposium, pp. 1–7.
[24] (2021) Short-term energy consumption forecasting at the edge: a federated learning approach. IEEE Access 9, pp. 95949–95969.
[25] (2015) Prioritized experience replay. arXiv preprint arXiv:1511.05952.
[26] SRP time-of-use price plan. https://www.srpnet.com/prices/home/tou.aspx
[27] (2018) Reinforcement learning: an introduction. MIT Press.
[28] (2020) Electrical load forecasting using edge computing and federated learning. In ICC 2020-2020 IEEE International Conference on Communications (ICC), pp. 1–6.
[29] (2021) Overcoming noisy and irrelevant data in federated learning. In 2020 25th International Conference on Pattern Recognition (ICPR), pp. 5020–5027.
[30] (2019) Boosting soft actor-critic: emphasizing recent experience without forgetting the past. arXiv preprint arXiv:1906.04009.
[31] (2020) Optimizing federated learning on non-IID data with reinforcement learning. In IEEE INFOCOM 2020-IEEE Conference on Computer Communications, pp. 1698–1707.
[32] (2020) Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Transactions on Power Systems 36 (3), pp. 1984–1997.
[33] (2022) Robust federated learning with noisy labels. IEEE Intelligent Systems.
[34] (2020) Probabilistic solar irradiation forecasting based on variational Bayesian inference with secure federated learning. IEEE Transactions on Industrial Informatics 17 (11), pp. 7849–7859.
[35] (2021) Hybrid deep learning Gaussian process for deterministic and probabilistic load forecasting. In 2021 IEEE/IAS Industrial and Commercial Power System Asia (I&CPS Asia), pp. 456–463.
[36] (2010) Parallelized stochastic gradient descent. In NIPS, Vol. 4, pp. 4.