Load forecasting is an essential part of the development of the smart grid. Long-term load forecasting is deemed necessary for infrastructure planning, while mid-term and short-term load forecasting are key tasks in system operations [i1]. Day-to-day operational efficiency of electrical power delivery, in particular, requires an accurate prediction of short-term load profiles, which is based on collecting and analysing large volumes of high-resolution data from households. However, individual short-term load forecasting (STLF) has been proven to be a challenging task because of profile volatility. In fact, the electrical load of a house has a high correlation to its residents’ behaviour, which is too stochastic and often hard to predict [r5, said_advanced_2013].
Benchmarks of state-of-the-art methods [r2, r3] have found that deep neural networks are a promising solution for the STLF problem at the household level, owing to their ability to capture complex and non-linear patterns. Neural networks outperform other prediction methods such as Auto-Regressive Integrated Moving Average (ARIMA) [filali_prediction-based_2019] and Support Vector Regression (SVR). Nevertheless, applying deep learning models alone does not lead to significant improvements, as the models tend to overfit [i4]. An overfitted model has learned the details of the training data, including its noise, which impairs its ability to generalize to new data. To tackle this issue, it is recommended to increase the size and diversity of the training data by combining usage records from different households. Typically, proposed frameworks [r6, r7] assume that all data records are transferred from smart meters to a centralized computational infrastructure through broadband networks to train models. However, this assumption raises privacy concerns, since load profiles reveal sensitive information such as device usage and household occupancy. Sending such detailed data over networks exposes it to malicious interception and misuse.
To address privacy concerns while still increasing data records’ volume and variety, a new on-device solution was recently proposed by the Machine Learning community: Federated Learning (FL) [f3]. Federated Learning is a decentralized machine learning scheme, where each device participates in training a central model without sending any data. As illustrated in Fig.1, the server first initializes the model either arbitrarily or by using publicly available data. Then, the model is sent to a set of randomly selected devices (clients) for local training using their data. Each client sends to the server an update of the model’s weights, which will be averaged and used to update the global model. This process will be repeated until the global model stabilizes.
The main purpose of this paper is to evaluate the use of Edge computing together with the Federated Learning approach for the STLF challenge for electricity in households. Edge computing refers to data processing at the edge of a network, as opposed to cloud or remote server processing. We use Long Short-Term Memory (LSTM) [lstm1], a deep neural network for forecasting time series, which uses previous observations of the house’s electrical load to predict future ones. We study a group of houses that have similar properties (geographical location, type of building) over a short period of time to limit the impact of weather fluctuations and seasonality. Federated learning is performed on the houses’ grid Edge equipment. Edge equipment is usually present at the end of the electrical distribution system as a smart interface between the customer and the electric power supply, be it a smart meter or more sophisticated equipment. Our contributions in this work can be summarized as follows: (1) we propose an enabling architecture for FL using Edge equipment in the smart grid; (2) we evaluate the potential gain of FL in terms of accuracy through simulations; and (3) we evaluate the potential network load gain through numerical results. To these contributions, we add the gain in privacy afforded by decentralization and Edge computing.
The remainder of this paper is structured as follows: Section II discusses related works focusing on load prediction and privacy. In Section III, we define the proposed approach and used methods. Section IV introduces the simulations and numerical results. Then in Section V we discuss the limitations and future work. Section VI concludes the paper.
II Related work
Many recent research works have used deep neural networks, and particularly Long Short-Term Memory (LSTM), to tackle the short-term load forecasting challenge. Benchmarks have shown LSTM’s potential compared to other methods [i2, i3], yet the results do not reach the desired level of accuracy in terms of Root Mean Square Error (RMSE) and Mean Absolute Percentage Error (MAPE). To improve forecasting accuracy, the authors of [r1] propose a sequence-to-sequence variant of LSTM, which gives better results for one-minute resolution data but no significant improvement over standard LSTM for one-hour resolution data. Other authors [r2] treat the search for the best LSTM network as a hyperparameter tuning problem and use a genetic algorithm to this end. They state that finding the best combination of window size and number of hidden neurons in each layer remains a probabilistic task.
Other works argue that the problem is not simply one of neural network architecture, and that the ability of data-driven forecasting models to generalize is the real issue. In fact, the accuracy of many of the proposed models drops when they are applied to new datasets [r3]. Some works suggest using complementary data about the weather [r4] or records from the appliances [r5]. While the weather has a real impact on the aggregated electrical consumption, the individual short-term load is more closely related to the occupants’ behaviour [said_advanced_2013, said_scheduling_2014, rezgui_smart_2017]. However, collecting data from appliances around each house is an expensive and privacy-intrusive task.
Another approach to enrich the training data is grouping data from several customers. The authors of [r6] use clustering to group users with similar profiles, hence reducing the variance of uncertainty within groups. The authors of [r7] propose a pooling technique that increases the diversity of the data to overcome the overfitting problem. Nonetheless, these methods are heavily centralized and are prone to privacy issues.
Fine-grained consumption data sent over networks is subject to many privacy threats when leaked through unauthorized interception or eavesdropping [r8]. Many efforts have been made to protect users’ identities in the smart grid. For instance, the authors of [r9] propose a clustering-based method where each group of users who are geographically close receives a common serial number. However, the resulting anonymity makes it hard to treat each client individually. Other works focus on masking the consumption data, with data aggregation being the most popular method [r10, r11], but aggregation runs counter to the requirements of STLF, which relies on high-resolution individual data.
None of the aforementioned papers addresses both user privacy and prediction accuracy. In the proposed work, we suggest using the Edge equipment that composes the Home Area Network (HAN) to carry out client selection and neural network training at the Edge following the federated learning scheme, allowing the data to be used to train a global model without compromising the residents’ privacy.
III System Model
We propose the network architecture shown in Fig.2 with two main components: a Multi-access Edge Computing (MEC) server [mec] and clients. Clients are houses with Edge equipment which is essentially composed of smart-meters and other devices in the HAN. FL is used to build a global LSTM-based model for STLF. The training rounds are orchestrated by the MEC server and executed by the clients using their own electrical consumption data. In this section, we explain in detail LSTM and how it comes to use in the forecasting, as well as FL and how it is used in our system model.
III-A Time series forecasting using LSTM
The prediction of the future electrical load in this work is achieved through the time series forecasting approach with LSTM. A time series is an ordered sequence of equally spaced data points that represent the evolution of a specific variable over time. Time series forecasting works by modeling the dependencies between current data points and historical data; the accuracy of the predictions relies heavily on the chosen model and on the quality of the historical data points.
LSTM is a recurrent neural network (RNN) that is fundamentally different from traditional feedforward neural networks, and more efficient than standard RNNs. Sequence learning is LSTM’s forte. It is able to establish the temporal correlations between previous data points and the current circumstances, while solving the vanishing and exploding gradient problems that are common in RNNs. Gradient vanishing means that the norm of the gradient for long-term components shrinks, so that the weights at lower layers effectively stop changing, while gradient exploding refers to the opposite event [lstm1]. LSTM avoids both through its key components: the memory cell, which is used to remember important states in the past, and the gates, which regulate the flow of information. LSTM has three gates: the input gate, the output gate and the forget gate. During the learning process, the gates learn to reset the memory cell for unimportant features. LSTM and its variants have achieved state-of-the-art results in sequence learning, especially in language translation and speech recognition. In the case of residential STLF, it is expected that the LSTM network would be able to form an abstraction of some residents’ states from the provided consumption profile, maintain the memory of the states, and make a forecast of the future consumption based on the learnt information.
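To make the role of the memory cell and the three gates concrete, the following is a minimal sketch of one LSTM step, assuming the standard LSTM formulation and reduced to a single hidden unit with a scalar input (one load reading per time step). The parameter names (`wf`, `uf`, `bf`, and so on) and the helper `run_window` are illustrative choices, not taken from the paper, and the model evaluated later uses much larger layers.

```python
import math

# Illustrative parameter names (assumption): w* weights the input,
# u* weights the previous hidden state, b* is the bias of each gate.
PARAM_NAMES = ("wf", "uf", "bf", "wi", "ui", "bi",
               "wo", "uo", "bo", "wg", "ug", "bg")


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a single-unit LSTM cell on a scalar input x."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate state
    c = f * c_prev + i * g   # memory cell: keep part of the past, add new info
    h = o * math.tanh(c)     # hidden state exposed to the next layer
    return h, c


def run_window(window, p):
    """Feed a look-back window of load values through the cell."""
    h, c = 0.0, 0.0
    for x in window:
        h, c = lstm_step(x, h, c, p)
    return h, c
```

A forget gate close to 0 resets the memory cell, while a forget gate close to 1 with an input gate close to 0 carries the stored state forward unchanged, which is how the cell can retain a resident state across many time steps.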
III-B Federated Learning
Federated learning is a form of machine learning where most of the training process is done in a distributed way among devices referred to as clients. It was first proposed and implemented by Google on keyboards of mobile devices for next-word prediction [f1]. This approach is ideal in several cases: 1) when data is privacy-sensitive, 2) when data is large in size compared to model updates, 3) in highly distributed systems where the number of devices is orders of magnitude larger than the number of nodes in a data center, and 4) in supervised training when labels can be inferred directly from the user. Federated learning has also proven to be very useful when datasets are unbalanced or non-identically distributed.
An iteration of federated learning proceeds as follows. First, a subset of clients is chosen and each of them receives the current model. In our case, clients are hosted on Edge equipment in houses (e.g. smart meters). The selected clients compute Stochastic Gradient Descent (SGD) updates on locally stored data, then a server aggregates the client updates to build a new global model. The new model is sent to another subset of clients. This process is repeated until the desired prediction accuracy is reached. The operations are detailed in Algorithm 1.
In order to combine the client updates, the server uses the FederatedAveraging algorithm [f3]. First, the initial global model $w_0$ is initialized randomly or pre-trained using publicly available data. In each training round $t$, the server sends the global model $w_t$ to a subset $S_t$ of clients who have enough data records and whose consumption load varies enough to enrich the training data. This condition was added to ensure that there is enough variation in the data points to represent the occupants’ regular consumption. Afterward, every client $k$ in the subset uses $n_k$ examples from its local data. In our case, this volume depends on how long the smart meter has been generating data and how much of it is saved locally. The dataset used is composed of sliding windows with a predetermined number of look-back steps.
SGD is then used by each client $k$ to compute the average gradient $g_k$, with a learning rate $\eta$. The updated models $w_{t+1}^{k}$ are sent to the server to be aggregated.
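The round structure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's Algorithm 1: a linear model stands in for the LSTM so that the sketch stays dependency-free, client selection is purely random (the data-volume and variability criteria are omitted), and the server weights each client model by its number of examples, as in FederatedAveraging [f3]. The function names and default values are illustrative.

```python
import random


def local_sgd(weights, data, lr, epochs):
    """Client side: SGD on local (x, y) pairs for a linear model y ~ w . x."""
    w = list(weights)  # work on a copy of the received global model
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # gradient of 0.5 * err**2 with respect to w is err * x
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w


def federated_averaging(clients, rounds, clients_per_round,
                        lr=0.1, epochs=1, seed=0):
    """Server side: each client is a list of (x, y) pairs kept locally."""
    rng = random.Random(seed)
    dim = len(clients[0][0][0])
    w_global = [0.0] * dim  # arbitrary initial global model
    for _ in range(rounds):
        selected = rng.sample(clients, clients_per_round)
        # only model weights and example counts leave the clients
        updates = [(local_sgd(w_global, data, lr, epochs), len(data))
                   for data in selected]
        total = sum(n for _, n in updates)
        w_global = [sum(w[j] * n for w, n in updates) / total
                    for j in range(dim)]
    return w_global
```

Raw consumption records never appear on the server side of this sketch; the server only sees the per-client weights and the example counts used for averaging.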
However, the centralized model may not fit the electrical consumption of every user. A proposed solution to this problem is personalization. Personalization is the focus of many applications that require understanding user behaviour and adapting to it. It consists of retraining the centralized model on user-specific data to build a personalized model for each user. This can be achieved by retraining the model locally for a small number of epochs using exclusively the user’s data [f6].
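As a rough illustration of personalization, the sketch below starts from a copy of the global weights and continues training on one client's data only. It again assumes a linear model as a stand-in for the LSTM; the function names, the learning rate and the default of 5 epochs are illustrative choices.

```python
def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def mse(w, data):
    """Mean squared error of weights w on a list of (x, y) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)


def personalize(global_w, local_data, lr=0.1, epochs=5):
    """Fine-tune a copy of the global model on one client's data."""
    w = list(global_w)  # the shared global model itself is left untouched
    for _ in range(epochs):
        for x, y in local_data:
            err = predict(w, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return w
```

Because the fine-tuning runs entirely on the client, the personalized weights never need to be sent back to the server.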
Federated learning has fewer privacy risks than centralized server storage, since even when data are anonymized, the users’ identities are still at risk and can be discovered through reverse engineering. The model updates sent by each client are ephemeral and never stored on the server; weight updates are processed in memory and are discarded after aggregation. The federated learning procedure requires that the individual weight uploads are not inspected or analyzed. This is still more secure than centralized training, because neither the network nor the server has to be entrusted with fine-grained user data. Some data still have to be sent in an aggregated form for billing, but these data do not reveal many details. Techniques such as secure aggregation [f5] and differential privacy [f4] are being explored to enforce trust requirements.
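III-C Networking Load Gain
To evaluate the gain in network load of FL compared with centralized training, we first define the network load in centralized training in Eq. 1 and the network load in FL in Eq. 2.
$$\mathcal{L}_{c} = \sum_{k=1}^{K} D_k \, H_k \tag{1}$$
where $K$ is the number of clients, $D_k$ is the size of the data sent by client $k$ and $M$ is the size of the model. In centralized training, $H_k$ is the number of hops between client $k$ and the server.
$$\mathcal{L}_{FL} = \sum_{t=1}^{T} \sum_{k=1}^{n} 2 \, M \, H_{k,t} \tag{2}$$
where $T$ is the number of rounds, $H_{k,t}$ is the number of hops between the $k$-th client selected in round $t$ and the server, and $n$ is the number of users in each subset. The factor 2 accounts for the model being sent to each selected client and the update being sent back to the server.
Using Eq. 1 and Eq. 2, we define the gain in networking load as follows:
$$G = 1 - \frac{\mathcal{L}_{FL}}{\mathcal{L}_{c}} \tag{3}$$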
IV Simulation and results
IV-A Dataset Pre-Processing and Evaluation Method
This research was conducted using data from the Pecan Street Inc. Dataport site. Dataport contains unique, circuit-level electricity use data at one-minute to one-second intervals for approximately 800 homes in the United States, with photovoltaic generation and electric vehicle charging data for a subset of these homes [database]. From this dataset, we chose a subset of 200 clients with similar properties: the same kind of houses (detached-family homes) located in the same area (Texas). The dataset is composed of records between January 1st 2019 and March 31st 2019 at a one-hour resolution. Weather fluctuations in this period are low, so the seasonal factor can be ignored in this study. The data of each client is then prepared for analysis. First, we scale the data to the range between 0 and 1. Then we transform the time series into sliding windows with a look-back of size 12 and a look-ahead of size 1. Finally, we split the data into train and test subsets (90% for training and 10% for testing). We also split the clients into two groups: 180 participate in the federated learning process, and 20 are held out to evaluate how well the model fits non-participating clients.
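The three pre-processing steps can be sketched as below. This is an illustrative reading rather than the authors' code: it assumes min-max scaling computed over the whole client series and a chronological 90/10 split, neither of which is specified in the text, and the function names are hypothetical.

```python
def min_max_scale(series):
    """Scale a list of load values to the range [0, 1]."""
    lo, hi = min(series), max(series)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in series]  # constant series: nothing to scale
    return [(v - lo) / span for v in series]


def sliding_windows(series, look_back=12, look_ahead=1):
    """Turn a series into (inputs, targets) pairs of consecutive values."""
    samples = []
    for start in range(len(series) - look_back - look_ahead + 1):
        x = series[start:start + look_back]
        y = series[start + look_back:start + look_back + look_ahead]
        samples.append((x, y))
    return samples


def chronological_split(samples, train_fraction=0.9):
    """Assumption: the oldest 90% of windows train, the newest 10% test."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]
```

With a one-hour resolution, a look-back of 12 means each sample uses the previous 12 hours of consumption to predict the next hour.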
We use RMSE and MAPE to evaluate the model’s performance with regard to the prediction error. RMSE quantifies the error in terms of energy, while MAPE is a percentage quantifying the size of the error relative to the real value. The expressions of RMSE and MAPE are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAPE} = \frac{100}{N}\sum_{i=1}^{N}\left|\frac{y_i - \hat{y}_i}{y_i}\right|$$
where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value and $N$ is the number of predicted values.
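Both metrics are short to compute. The sketch below uses their standard definitions and skips zero actual values in MAPE, where the relative error is undefined; the function names are illustrative.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same unit as the load values."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Mean absolute percentage error over the non-zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

Because MAPE divides by the actual value, hours with very low consumption can dominate the average even when the absolute error is small, which is worth keeping in mind when reading household-level results.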
IV-B Simulations setup
The simulations were conducted on a laptop with a 2.2 GHz Intel i7 processor, 16 GB of memory and an NVIDIA GeForce GTX 1070 graphics card. We used TensorFlow Federated 0.4.0 with a TensorFlow 1.13.1 backend.
Hyper-parameter tuning in deep learning models is important to obtain the best forecasting performance. However, in this work, we only focus on evaluating the federated learning paradigm. Previous work shows that performance is insensitive to the combination of number of layers and layer size, as long as multiple layers are used and the number of hidden nodes is sufficiently large [s1]. It was also suggested that very deep networks are prone to under-fitting and vanishing gradients. Following these rules, the initial model hyper-parameters (e.g. number of layers and time steps to be considered) were chosen by random search on a randomly selected client’s data. The retained model has two LSTM hidden layers composed of 200 neurons each. The loss function is the mean squared error and the optimiser is Adam. The model converges around the 20th epoch, and we therefore use values close to this for the numbers of rounds and epochs.
IV-C Numerical Results
1) Evaluated scenarios:
The scenarios that were evaluated are summarized in Table I. As explained in the previous section, only a subset of clients trains the model in each round. We vary the number of clients in the subset selected in each round to see the effect of larger subsets. We also vary the number of epochs of local training. In all the scenarios, the federated learning algorithm was run for 20 rounds.
|Scenarios||Clients in subset||Local Epochs|
2) Results for global models:
The evaluated scenarios resulted in global models obtained following the federated learning approach. These models are evaluated in terms of RMSE and MAPE as shown in Tables II and III. Null consumption values have been discarded when calculating MAPE. Table II summarizes the results for the participating clients in the different scenarios. In our case, the load forecast is at a granular level (single house) and over a short horizon (1 hour); therefore the values of MAPE achieved in Table II for the various models are reasonable, and this level of accuracy is expected, as similar values have been reported by previous works [s1, s3]. These works also report that the forecasting accuracy tends to be low for short-term forecasting horizons. One of the most notable observations is that the global model fits some clients better than others, since not all clients have similar profiles. We also notice that selecting a larger number of clients in each round is preferable, but in cases where sending updates is more expensive in terms of networking, the difference can be compensated for by using more local training epochs. The results are similar when the models are applied to the set of clients who did not participate in the training.
3) Behaviour of personalization:
In this section, we study the effect of personalization on the performance of the models. First, we test whether re-training the model locally for the participating clients gives better results. Then we apply the same procedure to the set of clients who did not participate in the training. The models were retrained for 5 epochs for each client. Results for the clients participating in the training are summarized in Table IV, and results for the non-participating clients in Table V. We notice an overall improvement for most of the models. For example, model 1 shows an overall improvement of 5.07% in terms of MAPE for the participating set of clients and of 4.78% for the non-participating set. However, for some clients, the performance cannot be improved despite retraining, and this, as we mentioned earlier, is related to the quality of the historical data points. Applying the models to these clients’ consumption profiles results in a very high MAPE, which affects the average results. These clients should be treated as outliers; nonetheless, this is beyond the scope of this study.
To illustrate the improvements on predictions using personalization, we randomly selected a client from the participant set (client 4313) and a client from the non-participant set (client 8467). We applied the global model 4 and the corresponding personalized models. The actual load profiles and the predicted profiles are shown in Fig.3 and Fig.4. Both models fit the overall behaviour of the consumption profiles.
We conclude that we can indeed train powerful models for a population’s consumption profiles using only a subset of the users forming it. For applications that have high accuracy requirements, the model can be retrained resulting in a personalized model that follows the profile’s curves better, yielding more accurate predictions. Nonetheless, the predictions obtained with the global model can be a good starting point for new clients who don’t have enough data for personalization.
4) Gain in network load:
To illustrate the gain in network load, we consider the most basic case, where every client is one hop away from the MEC server. The size of the model is 1.9 Kb and the size of the used data is 16 Mb. Using Eq. 3, the gain in scenarios 1 and 3 is 97%, while scenarios 2 and 4 result in a gain of 90%. This is a significant gain, especially considering that the approach could be applied at the scale of a city or larger.
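The reported gains can be approximated with a short calculation. The sketch below is one possible reading, under several assumptions that are not stated in the text: the 16 Mb figure is taken as the total data that centralized training would transfer, 1 Mb is taken as 1000 Kb, the model crosses the network twice per selected client and per round (download and upload), and the subset sizes of 5 and 20 clients are hypothetical values chosen for illustration. Only the model size, the data size, the 20 rounds and the one-hop distance come from the text.

```python
def centralized_load(total_data_kb, hops=1):
    """Load when all raw data is shipped to the server once."""
    return total_data_kb * hops


def federated_load(model_kb, rounds, clients_per_round,
                   hops=1, transfers_per_round=2):
    """Load when only the model travels (assumed: down and back up)."""
    return rounds * clients_per_round * transfers_per_round * model_kb * hops


def network_gain(fl_load, central_load):
    """Relative saving of federated over centralized training."""
    return 1.0 - fl_load / central_load


central = centralized_load(16000.0)      # 16 Mb, assuming 1 Mb = 1000 Kb
small = federated_load(1.9, 20, 5)       # hypothetical subset of 5 clients
large = federated_load(1.9, 20, 20)      # hypothetical subset of 20 clients
gain_small = network_gain(small, central)   # about 0.976
gain_large = network_gain(large, central)   # about 0.905
```

Under these assumptions the gains come out at roughly 97.6% and 90.5%, in line with the reported 97% and 90%. The number of local epochs does not enter the calculation, which is consistent with scenarios that differ only in local epochs sharing the same gain.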
V Remarks & future work
The feasibility of the proposed approach depends on the capability of the edge devices to perform local training. New IoT devices have sufficient computing hardware to run complex machine learning models, but training a neural network is very likely to compromise device performance. However, some lightweight machine learning frameworks have emerged, such as TensorFlow Lite (https://www.tensorflow.org/lite), which provides solid ground for future implementations.
The accuracy of the models, even after personalization, still varies depending on the user. To improve the results, neural networks should be coupled with other methods, such as a prior clustering of clients using criteria other than the geographical proximity. Solving the problem of outliers in this context should also be investigated.
VI Conclusion
Individual short-term load forecasting is a challenging task considering the stochastic nature of consumption profiles. In this paper, we proposed a system model using Edge computing and federated learning to tackle privacy and data diversity challenges related to short-term load forecasting in the smart grid. To the best of our knowledge, this represents one of the first studies of federated learning in the smart grid context. Unlike centralized methods, the proposed system uses edge devices to train models through federated learning, hence reducing security risks to those related to the device only. We conducted experiments to evaluate the performance of both centralized and personalized models in federated settings. The simulation results show that it is a promising approach to create high-performing models with a significantly reduced networking load compared to a centralized model, while preserving the privacy of consumption data.
The authors would like to thank the Natural Sciences and Engineering Research Council of Canada, for the financial support of this research.