Since the building sector is responsible for 20% of global energy consumption, energy conservation plans have been introduced to encourage reductions in this sector [eia2016energy]. It is essential to consider smart energy management solutions to achieve the reduction goal [hosseini2017non]. However, these solutions require an appliance load monitoring system to obtain the status of individual appliances in households and to measure their energy consumption [welikala2019incorporating].
Non-intrusive load monitoring (NILM) is a technique, proposed by Hart [hart1992nonintrusive], that enables individual appliance load monitoring with a minimum of sensors: usually, a single sensor measures the total power consumption of the building. An energy disaggregation algorithm then separates individual appliance loads from the aggregate signal containing the consumption of all appliances. While many solutions have been proposed, a few drawbacks still prevent the practical deployment of NILM: the scalability of NILM systems, due to the complexity of the disaggregation task, which increases exponentially with the number of appliances and states; the challenge of near-real-time disaggregation; and model generalization [nalmpantis2019machine].
Traditional approaches for energy disaggregation include graph signal processing, Hidden Markov Models (HMM), and their variants [kolter2012approximate, he2016non, ji2019non]. However, they suffer from a scalability problem that degrades performance as the number of appliances increases. This is problematic in real use cases, where more than 15 appliances can easily be present in a typical house.
Other approaches alleviate the scalability issue by training one model per appliance type. Generally, these approaches are based on deep neural networks (DNN), such as long short-term memory (LSTM) networks or denoising autoencoders (DAE) [kelly2015neural, mauch2015new, bonfigli2018denoising, zhang2016sequencetopoint, jiang2019deep, kaselimi2019multi]. These solutions require sub-metering appliances during the supervised training phase. The models disaggregate well when the training and testing data are extracted from the same house; however, they still need improvement to be as effective when deployed in a different house. Furthermore, some models, such as sequence-to-point (S2P) [zhang2016sequencetopoint], are non-causal. Therefore, they cannot be used in a near-real-time context, which is problematic for the time-shifting strategies used by home energy management systems (HEMS) to schedule energy-consuming appliances. From this perspective, [harell2019wavenilm] proposes an approach based on dilated convolutional layers for the energy disaggregation task. This autoregressive network has the particularity of being causal; it is therefore better adapted to the disaggregation process, as it requires no delay before a decision is taken. Although this type of model gives promising results, it is mainly effective when the number of appliances in the system remains low.
Another challenge in this field is that the performance of many approaches degrades significantly for multi-state appliances [kelly2015neural, kong2019practical, wu2019concatenate]. These multi-state appliances, such as washing machines and dishwashers, account for a significant share of total consumption. Moreover, they are in the category of delayable appliances, making them strategic appliances that a HEMS can time-shift within smart grids. Thus, NILM systems must be effective at disaggregating loads for this type of appliance.
Generative Adversarial Networks (GAN) [goodfellow2014generative] have demonstrated impressive capabilities for image synthesis and other tasks in a variety of fields, such as video games. In particular, [bao2018enhancing, pan2020sequence] propose approaches based on the GAN framework for energy disaggregation. The results show improved performance for the reconstruction of consumption signals, especially for multi-state appliances.
Variational autoencoders (VAE) [VAE] are another important group of generative models that have received particular attention in recent years [VAE_wavenet, liang2018variational]. Compared to GANs, the VAE is more stable during training. Its regularized latent space allows interpolations between two distributions learned during training to generate realistic appliance load profiles once decoded. Load profiles often vary from one activation to another for the same type of appliance; the VAE is therefore well suited to the energy disaggregation task, i.e., generating the power signal of the target appliance from the aggregate power input. Sirojan et al. introduce a convolutional VAE for energy disaggregation in [sirojan2018deep]. Their method outperforms state-of-the-art approaches based on neural networks [kelly2015neural, zhang2016sequencetopoint]. This VAE framework demonstrates a promising avenue for generative approaches in the NILM domain.
In this work, we propose a method based on the VAE neural network framework for the energy disaggregation task. As in [sirojan2018deep], the proposed approach consists of two components: an encoder, which maps information into a latent space, and a decoder, which reconstructs the power signal of the target appliance from the latent representation. However, the proposed encoder uses a succession of instance-batch normalization networks (IBN-Net) [xingangTwo] to enhance high-level feature extraction from the aggregate measurement. Furthermore, we implement skip connections between the encoder and decoder, allowing the decoder to benefit from the feature maps of the encoder, as in the U-Net architecture [ronneberger2015u]. These connections offer the decoder a global insight into the aggregate power consumption and therefore allow a better reconstruction of the target signal by the deconvolution layers.
One of the key benefits of the proposed approach is the regularized latent space provided by the VAE framework, which facilitates the encoding of relevant features of the aggregate signal. While batch and instance normalization in the model help to stabilize and improve the learning process, skip connections enhance the signal reconstruction performance. The proposed approach is extensively analyzed against state-of-the-art approaches. To assess generalization capability, tests are carried out on three different houses from the UK-DALE dataset [kelly2015uk]. The obtained results show that, on average, the proposed VAE-NILM outperforms state-of-the-art approaches and demonstrates higher accuracy for the signal reconstruction of multi-state appliances.
The rest of this paper is organized as follows. Section II gives a brief overview of DNN techniques for NILM. The proposed solution based on VAE for the energy disaggregation task is presented in Section III. Section IV describes the experiment setup, followed by results and discussions in Section V. Finally, the paper concludes and projects future research in Section VI.
NILM is a technique used to monitor the individual consumption of household appliances using a single sensor that measures the total power of the whole house. The NILM problem can be formulated as follows: let $y_t$ be the aggregate measurement containing the power consumption of all appliances at time $t$. Generally, we suppose that $y_t$ represents the active power. The aggregate signal can then be expressed as:

$$y_t = \sum_{i=1}^{N} x_t^{(i)} + \epsilon_t$$

where $x_t^{(i)}$ represents the power consumption of appliance $i$, $N$ is the number of appliances, and $\epsilon_t$ is the measurement noise. Using an energy disaggregation algorithm, the objective is to recover the individual appliance power consumptions given only the aggregate measurement.
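The additive model above can be sketched numerically; the per-appliance profiles below are synthetic toy values, not drawn from any dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic per-appliance active power traces x_t^(i) (in W), one row each.
T = 8  # number of time steps
appliance_power = np.array([
    [90, 90,    0,    0, 0, 90, 90, 90],   # fridge cycling on/off
    [0,   0, 2000, 2000, 0,  0,  0,  0],   # kettle burst
    [0, 1200, 1200,   0, 0,  0,  0,  0],   # microwave
], dtype=float)

noise = rng.normal(0.0, 1.0, size=T)  # measurement noise epsilon_t

# Aggregate signal: y_t = sum_i x_t^(i) + epsilon_t
aggregate = appliance_power.sum(axis=0) + noise

# The disaggregation task: recover each row of `appliance_power`
# given only `aggregate`.
```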
DNN approaches have been applied to energy disaggregation for many years [kelly2015neural, mauch2015new]. These approaches are mainly used for low-frequency (<1 Hz) monitoring, which requires lower-cost hardware. Methods based on recurrent neural networks (RNN) [kelly2015neural], such as LSTM [kim2017nonintrusive, mauch2015new, krystalakos2018sliding], have been primarily proposed as they are well suited for 1D time series data. In [kaselimi2019bayesian], the authors extend the RNN and propose a Bayesian-optimized bidirectional LSTM model for NILM.
To improve feature extraction, architectures based on convolutional neural networks (CNN) have been proposed for energy disaggregation. Reference [bonfigli2018denoising] proposes a DAE approach for NILM. In this context, the input signal corresponds to the aggregate consumption of all appliances and the desired output is the power consumption of a single target appliance. Thus, the power consumption of appliances other than the target appliance is considered as noise in the input signal. The work of Zhang et al. [zhang2016sequencetopoint] presents two approaches based on CNN: sequence-to-sequence (S2S) and S2P. The latter is trained to predict the target appliance's power consumption only for the midpoint of the input window. This allows the model to focus on predicting a single value rather than the entire sequence, considering that the window edges are harder to predict. The S2P approach improves energy disaggregation performance, but it significantly increases the computational complexity of the method compared to S2S. Xia et al. propose the D-ResNet model [xia2019non], which uses dilated convolutional layers with residual connections. The dilated convolutional layers allow a larger receptive field without increasing the model's complexity, whereas the residual connections facilitate the gradient flow during training. However, most of these DNN models still suffer from a lack of generalization, and the test performance is often sensitive to the training dataset. Consequently, the models have more difficulty disaggregating loads when deployed in unseen houses [DBLP:journals/corr/FaustineMKM17].
Generative approaches for energy disaggregation, such as GANs [bao2018enhancing, pan2020sequence], demonstrate improvements in signal reconstruction and generalization compared to state-of-the-art methods. In particular, [pan2020sequence] proposes a sequence-to-subsequence (S2SS) model as a trade-off between the traditional S2S and S2P methods, balancing convergence difficulty and computational complexity. The generator of the S2SS model is based on the U-Net architecture, which increases the quality of the generated power consumption signals. A generative model based on the VAE framework is proposed in [sirojan2018deep]. The regularized latent space of the VAE framework encourages the encoder to map the relevant information contained in the aggregate signal. This allows the decoder to better reconstruct the power consumption signal of the target appliance, and thus offers better disaggregation performance than CNN-based models.
III Proposed Solution
In this section, we first review the VAE framework and then describe the proposed model.
III-A VAE Framework
The VAE [VAE] proposes a probabilistic framework that maps an input $x$ to a distribution over a continuous latent space $z$ rather than to a single point. The true posterior density $p_\theta(z|x)$ is intractable, which results in an indifferentiable marginal likelihood $p_\theta(x)$. To address this, the probabilistic encoder $q_\phi(z|x)$ is introduced to approximate the true conditional inference distribution $p_\theta(z|x)$. The variational parameters $\phi$ are learned jointly with the generative model parameters $\theta$ following the variational principle, where the marginal log-likelihood can be written as:

$$\log p_\theta(x) \geq \mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] - D_{KL}\left(q_\phi(z|x) \,\|\, p_\theta(z)\right)$$

where $\mathcal{L}(\theta, \phi; x)$ is the variational lower bound to optimize. The expectation term of (III-A) encourages the reconstruction accuracy of the probabilistic decoder $p_\theta(x|z)$, while the Kullback-Leibler (KL) divergence term acts as a regularizer that constrains the approximate posterior to be close to the prior $p_\theta(z)$. Generally, $p_\theta(z)$ is assumed to be a centered isotropic multivariate Gaussian $\mathcal{N}(z; 0, I)$ and the variational approximate posterior is $q_\phi(z|x) = \mathcal{N}(z; \mu, \sigma^2 I)$. In this case, both distributions are Gaussian, so the KL term can be calculated in closed form. In practice, both the encoder and the decoder are neural networks parameterized by $\phi$ and $\theta$ respectively. This way, $\mu$ and $\sigma$ correspond to the encoder outputs, and they are learned from observed datasets through the objective function given in (III-A).

Since the decoder samples $z$ from $q_\phi(z|x)$, the gradient cannot be backpropagated through the stochastic units within the network. To address this, we use the reparameterization trick [VAE]: we first sample an auxiliary variable $\epsilon \sim \mathcal{N}(0, I)$ and then compute $z = \mu + \sigma \odot \epsilon$, where $\odot$ denotes an element-wise product.
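The reparameterization trick and the closed-form Gaussian KL term can be sketched in a few lines of numpy; the batch and latent sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def kl_divergence(mu, log_var):
    """Closed-form KL( N(mu, sigma^2 I) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Example encoder outputs for a batch of 2 inputs, latent dimension 4.
mu = np.zeros((2, 4))
log_var = np.zeros((2, 4))  # sigma = 1 everywhere

z = reparameterize(mu, log_var, rng)

# The KL term is exactly zero when the posterior equals the prior N(0, I),
# and grows as the encoder output moves away from it.
```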
III-B Proposed Model
In this work, we propose an energy disaggregation model based on the VAE [VAE] framework. The whole network consists of two components, as shown in Fig. 1 (a): The encoder distills relevant target appliance information from the aggregate signal x into the latent space z, and the decoder reconstructs only the power signal of the target appliance from z.
The inputs of the model are sequences of aggregate power containing $L$ time steps, obtained using a sliding window. Each input sequence is processed separately by the model to generate an output sequence corresponding to the power of the target appliance.
The proposed network architecture is composed of IBN-Nets, as shown in Fig. 1 (b). The IBN-Net subnetwork architecture combines instance and batch normalization [xingangTwo]. Batch normalization in convolutional layers increases the discriminating capacity of the learned features and thus gives the encoder more relevant features to map onto the latent space. Moreover, instance normalization in the shallow layers of the network improves generalization performance, which remains one of the weaknesses of many NILM approaches. The IBN-Net consists of three successive convolution layers combined with batch normalization and rectified linear unit (ReLU) activation functions. A residual connection joins the IBN-Net input to the instance normalization layer to facilitate gradient flow through the entire model during training and to prevent the vanishing gradient problem.
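The distinction between the two normalizations in the IBN-Net comes down to the axes over which statistics are computed. A numpy sketch for 1D feature maps of shape batch × time × channels (the helper names are ours, not from the paper; learnable scale/shift parameters are omitted):

```python
import numpy as np

def batch_norm_1d(x, eps=1e-5):
    """Normalize each channel over the batch AND time axes (shape: B x T x C)."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def instance_norm_1d(x, eps=1e-5):
    """Normalize each channel over the time axis only, separately per sample."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 128, 8))  # batch of 4 sequences, 128 steps, 8 channels

bn = batch_norm_1d(x)
inn = instance_norm_1d(x)

# Instance norm removes per-sample offsets (e.g., a house-specific base load),
# one intuition for its better cross-house generalization in shallow layers.
```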
The encoder architecture comprises seven IBN-Nets, each followed by a max-pooling layer to decrease the temporal resolution. This encourages the learning of high-level features describing the target appliance. Two fully connected layers translate the output of the IBN-Net stack into the distribution parameters $\mu$ and $\sigma$ of the latent space $z$.
The architecture of the decoder is similar to that of the encoder. It consists of seven IBN-Nets followed by deconvolution layers to progressively increase the temporal resolution and reconstruct the signal of the target appliance. We concatenate through skip connections the outputs of the corresponding IBN-Net from the encoder to the decoder. In this way, the decoder can also benefit from the feature maps extracted by the encoder, useful for recovering the power consumption details from the aggregate measurement. The reconstruction of the target appliance consumption is thus more accurate than if it uses only the latent representation.
IV Experiment Setup

We conduct experiments to assess the performance of the proposed VAE-NILM against state-of-the-art methods. The experiment is designed to measure the detection and reconstruction capabilities for appliance power consumption. We divide the experimental protocol into three use-case scenarios that involve evaluating the model on three different houses.
IV-A Dataset

The UK-DALE dataset [kelly2015uk] is used in our scenarios. This reference dataset comprises five houses containing 54, 20, 5, 6, and 26 sub-metered appliances respectively. Each house includes the aggregate electrical power measurement as well as the individual electrical power of all appliances, sampled at 1/6 Hz. In this paper, we focus the disaggregation on three On/Off appliances (refrigerator, kettle, and microwave) and on two multi-state appliances (washing machine and dishwasher), all available in houses 1, 2, and 5. We ignore houses 3 and 4 because they contain only a few appliances.
IV-B Reference Methods
We compare our model against four state-of-the-art methods. We select the DAE approach proposed by Kelly et al. [kelly2015neural] as a first comparison method. Second, we compare the proposed model with the work of Zhang et al. [zhang2016sequencetopoint], which proposes two CNN-based approaches, S2S and S2P. Finally, we choose the generative approach S2SS [pan2020sequence], based on the GAN framework, as a comparison algorithm. We carry out experiments using the original authors' implementations available for each method: DAE (https://github.com/JackKelly/neuralnilm), S2S (https://github.com/MingjunZhong/NeuralNetNilm), S2P (https://github.com/MingjunZhong/seq2point-nilm), and S2SS (https://github.com/DLZRMR/seq2subseq).
IV-C Experimental Protocol
In this work, we focus on scenarios where test houses are held out during the training process, as is done in [kelly2015neural, zhang2016sequencetopoint, pan2020sequence]. For each scenario, we hold out a house for the test and train the model on the other two houses, as detailed in Table I. In this way, the experiment represents a real use case where the model would be deployed in houses without the need for sub-metering during a training period.
For each house, the aggregate and sub-metered measurements are separated into sequences using a sliding window. The data available for house 1 correspond to a collection period of more than five years, which is much longer than for the other houses. Thus, to balance the training set, we randomly select 15% of the total number of sequences in house 1 and complete the training set with all sequences of the second house. We train one model for each type of appliance on 80% of the training set and use the remaining 20% for validation. We apply the same sliding-window technique to the test house and recombine overlapping portions with a median filter, as done in [bonfigli2018denoising]. We reproduce the same protocol for each reference method. Finally, we repeat the process ten times and average the scores over all repetitions. We use Welch's t-test to assess the statistical significance of the results.
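The sliding-window extraction and the median recombination of overlapping window predictions can be sketched as follows (the window width and stride are illustrative, not the paper's settings):

```python
import numpy as np

def sliding_windows(signal, width, stride):
    """Split a 1D signal into overlapping windows; return windows and offsets."""
    starts = list(range(0, len(signal) - width + 1, stride))
    return np.stack([signal[s:s + width] for s in starts]), starts

def recombine_median(windows, starts, length):
    """Recombine overlapping window predictions with a per-sample median."""
    stacked = np.full((len(starts), length), np.nan)
    for row, s in enumerate(starts):
        stacked[row, s:s + windows.shape[1]] = windows[row]
    return np.nanmedian(stacked, axis=0)  # median over the windows covering t

signal = np.arange(10, dtype=float)
wins, starts = sliding_windows(signal, width=4, stride=2)
reconstructed = recombine_median(wins, starts, len(signal))
# With perfect window "predictions", the recombined signal matches the input.
```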
|       | Scenario 1 | Scenario 2 | Scenario 3 |
| Train | 1, 5       | 2, 5       | 1, 2       |
| Test  | 2          | 1          | 5          |
IV-D Implementation Details
The encoder and decoder of the proposed model each contain seven groups of layers, i.e., IBN-Net and max-pooling for the encoder, and IBN-Net, concatenation, and deconvolution for the decoder. The numbers of filters for the three convolution layers in all IBN-Nets are 64, 64, and 256 respectively. Max-pooling halves the temporal resolution at each step in the encoder, whereas the deconvolution operation doubles the temporal resolution in the decoder. All the hyperparameters of the proposed model and training process were tuned during preliminary experiments using a grid search.
We train the proposed model through supervised learning using the objective function (III-A). We use the RMSProp (Root Mean Square Propagation) optimizer with a learning rate initialized at 0.001 and halved at each epoch. All experiments run for a maximum of 100 epochs with an early-stopping criterion (patience of 20 epochs) to prevent overfitting.
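The early-stopping criterion can be sketched as a simple patience counter over the validation loss (a generic sketch, not the authors' training code):

```python
def train_with_early_stopping(val_losses, patience=20, max_epochs=100):
    """Return the epoch at which training stops, given per-epoch val losses.

    Training stops when the validation loss has not improved for
    `patience` consecutive epochs, or when `max_epochs` is reached.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses[:max_epochs]):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch  # patience exhausted: stop here
    return min(len(val_losses), max_epochs) - 1

# A loss curve that improves for 31 epochs, then plateaus:
losses = [1.0 - 0.01 * i for i in range(30)] + [0.7] * 40
stop_epoch = train_with_early_stopping(losses, patience=20)
```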
The reference methods DAE, S2S, and S2P adapt the input size to the appliance type on the basis of the average activation time. In practice, they extract all appliance activations from the sub-metered measurements of the training houses and calculate the average activation time for each type of appliance. However, a larger window is preferable, even for appliances with short activation durations, such as the kettle and microwave. It gives the model a better overview of the other active loads, so it better discerns the activations related to the target appliance. The proposed model and S2SS use the same window size, set to $L = 1024$ samples for all appliance types. Thus, $L$ is large enough to contain the full activation of a multi-state appliance. All hyperparameters are fixed regardless of the appliance type, except for the stride of the sliding window and the batch size. For both multi-state appliances, the stride and the batch size are set to 256 and 32 respectively; they are set to 64 and 150 for the On/Off appliances. The code used to create the results in this paper is available in our "VAE-NILM" repository (https://github.com/ETSSmartRes/VAE-NILM).
The reference methods are trained according to the implementation guidelines provided in the repositories listed in Section IV-B. However, we adjusted some hyperparameters, such as the batch size and window size, to improve their performance. Furthermore, we added dropout in the S2S and S2P models to reduce overfitting.
IV-E Performance Metrics
To compare our model with the state-of-the-art, we evaluate the disaggregation performance using several metrics. First, we compute the mean absolute error (MAE) between the predicted and ground truth power:

$$\mathrm{MAE} = \frac{1}{T} \sum_{t=1}^{T} \left| \hat{x}_t - x_t \right|$$

where $T$ is the number of time points, and $\hat{x}_t$ and $x_t$ are respectively the predicted power and the ground truth power at time $t$.
Particularly useful for household users, energy per day (EpD) [d2019transfer] measures the absolute error of the predicted energy over a one-day period. The average EpD over the entire dataset is defined as follows:

$$\mathrm{EpD} = \frac{1}{D} \sum_{d=1}^{D} \left| \hat{E}_d - E_d \right|$$

where $D$ is the total number of days and $E_d$ is the energy consumed during day $d$.
In addition to metrics focused on energy consumption, we use state-based metrics to measure the ability of the model to predict the appliance state (ON/OFF). At each time step, an appliance is considered in the ON state if its power exceeds a predefined threshold $\lambda$ [kelly2015neural]. The threshold $\lambda$ varies depending on the appliance type: 50, 2000, 200, 20, and 10 W for the fridge, kettle, microwave, washing machine, and dishwasher respectively. Once states are assigned, we compute the F1-Score using the precision and recall measures.
Since appliances are mostly inactive, we define an additional metric, $\mathrm{MAE}_{on}$, that uses the threshold $\lambda$ to calculate the MAE only when the appliance is ON:

$$\mathrm{MAE}_{on} = \frac{1}{T_{on}} \sum_{t \,:\, x_t > \lambda} \left| \hat{x}_t - x_t \right|$$

where $T_{on}$ is the number of time points when the appliance is ON. $\mathrm{MAE}_{on}$ provides a more insightful evaluation of the algorithm's accuracy, since the error is not averaged over the time the appliance is actually OFF.
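The four metrics above (MAE, EpD, the ON-state F1-Score, and MAE_on) can be sketched as follows; the power values and the 200 W threshold are illustrative, not a specific appliance's setting.

```python
import numpy as np

def mae(pred, truth):
    """Mean absolute error over all time points."""
    return np.mean(np.abs(pred - truth))

def epd(pred, truth, samples_per_day):
    """Mean absolute error of the per-day energy (sum of power samples)."""
    p = pred.reshape(-1, samples_per_day).sum(axis=1)
    t = truth.reshape(-1, samples_per_day).sum(axis=1)
    return np.mean(np.abs(p - t))

def mae_on(pred, truth, threshold):
    """MAE computed only where the ground-truth appliance is ON."""
    on = truth > threshold
    return np.mean(np.abs(pred[on] - truth[on]))

def f1_score(pred, truth, threshold):
    """F1-Score of the thresholded ON/OFF states."""
    p_on, t_on = pred > threshold, truth > threshold
    tp = np.sum(p_on & t_on)
    precision = tp / max(np.sum(p_on), 1)
    recall = tp / max(np.sum(t_on), 1)
    return 2 * precision * recall / max(precision + recall, 1e-12)

truth = np.array([0, 0, 2000, 2000, 0, 0], dtype=float)
pred = np.array([0, 0, 1900, 2100, 0, 100], dtype=float)
# MAE over the whole sequence is diluted by OFF periods; MAE_on is not.
```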
V Results and Discussions
To facilitate comparison with state-of-the-art works, Table II presents the test results for energy disaggregation on house 2 with models trained on houses 1 and 5. The first two columns list the metrics and methods respectively. The next five columns correspond to the results for each appliance. Finally, the last column is the average result over all appliances. Each reported result corresponds to the mean performance and standard deviation over the ten repetitions. Bold values represent the best result for a metric across methods. The results show that VAE-NILM yields the best results on average for all the metrics. Compared to the reference methods, VAE-NILM reduces the MAE by 40% on average over the entire test sequence, with an average reduction of 226 W when appliances are ON. Moreover, the estimated EpD decreases by 32% on average, while the F1-Score increases by 14% in comparison with the other methods. We also observe that the proposed model improves the disaggregation performance evenly for multi-state appliances as well as for appliances with short activation times. To reflect the overall performance on the UK-DALE dataset, Table III presents the combined results of all the scenarios performed on houses 1, 2, and 5. Overall, we observe a trend similar to that of house 2. The MAE improves by about 18%, with a reduction of 130 W for the $\mathrm{MAE}_{on}$, whereas the EpD decreases by 13% and the F1-Score increases by 11% compared to the state-of-the-art.
V-1 Disaggregated signal qualitative analysis
Fig. 2 shows examples of disaggregated signals from house 2 for each type of appliance. The first column shows the aggregate signal and the ground truth signal of the target appliance. The following columns illustrate the ground truth signal and the predicted signal for each reference model, and the last column depicts the proposed model. The better performance can be explained by the more accurate signal reconstruction of the VAE-NILM compared to the reference methods. In particular, the proposed model improves reconstruction performance for multi-state appliances, which have a longer activation time, as shown in Fig. 2.
The U-shaped architecture of the proposed model allows the feature maps extracted by the encoder to be combined with the deconvolution layers in the decoder through skip connections. The decoder then learns to assemble a more precise output based on this information. Thus, we note that the VAE-NILM model increases the sharpness and accuracy of the reconstructed signals. Skip connections are also used in S2SS, and we find that both models reconstruct multi-state appliance signals better than the other state-of-the-art methods.
Furthermore, we notice in Fig. 2 that the activation profile of the washing machine in house 2 starts with cyclic states, which is not the case for the washing machines in training houses 1 and 5. Despite this, we observe that the VAE-NILM and S2SS reconstruct the cyclic states, whereas other reference methods experience difficulties. We therefore conclude that skip connections provide valuable information to the decoder, making it more efficient in reconstructing the power consumption of the target appliance.
V-2 State detection analysis
Overall, the results of the experiment suggest that the proposed model achieves better detection of target appliance states in the aggregate signal, with an F1-Score 11% higher than the reference methods. The VAE-NILM mainly improves the precision metric by reducing the number of false positives. We hypothesize that a larger window size helps to capture the context of the other active appliances in the aggregate signal and provides a better prediction of the target appliance states. The proposed model supports the use of larger windows and therefore benefits from these advantages. As shown in Fig. 2 for the fridge and kettle, the VAE-NILM generates fewer residual power activations than the reference methods.
V-3 Energy estimation analysis
The VAE-NILM obtains on average a more accurate EpD prediction than the reference methods. However, we notice that for some appliances, another method obtains a lower EpD, while the MAE and F1-Score are better for the VAE-NILM. We explain this discrepancy with the EpD definition. The EpD metric is defined as the absolute value of the difference between the predicted and actual energy consumption per day. The weakness of this metric is that the energy consumption used to calculate the EpD is the sum of the power measurements over a period of time. Thus, some false positive and false negative activations cancel each other out over the period of one day. This causes a lower EpD for a model that has more difficulty detecting the target appliance in the aggregate signal. Regarding this, we note that the average F1-Score is higher by more than 11% for VAE-NILM than the reference methods.
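The cancellation effect described above can be demonstrated with a small numeric example: a false negative and a false positive of equal energy leave the daily energy (and hence EpD) unchanged, while the per-sample error remains large. The values below are illustrative.

```python
import numpy as np

# One day of ground-truth power for an appliance with two activations (W).
truth = np.array([0, 1000, 1000, 0, 0, 1000, 1000, 0], dtype=float)

# A prediction that misses the first activation but hallucinates an equal
# one elsewhere: a false negative and a false positive of the same energy.
pred = np.array([0, 0, 0, 1000, 1000, 1000, 1000, 0], dtype=float)

energy_error = abs(pred.sum() - truth.sum())  # EpD-style daily error
per_sample_error = np.mean(np.abs(pred - truth))  # MAE catches the mistakes

# energy_error is 0 even though half of the ON samples are wrong.
```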
In light of the reported results, we conclude that the VAE-NILM model outperforms state-of-the-art methods for appliance state detection, with a higher F1-Score. Finally, the proposed model yields a more accurate power signal reconstruction than the reference methods, which reduces the overall MAE and, more specifically, the $\mathrm{MAE}_{on}$.
VI Conclusion

In this paper, we propose an approach based on the VAE framework for the energy disaggregation task. The regularized latent space of the VAE facilitates the encoding of the relevant features essential for an accurate reconstruction of the target appliance power signal. Furthermore, the batch and instance normalization implemented in the IBN-Net helps to stabilize and improve the learning process of the proposed approach. Skip connections between the encoder and decoder layers allow the transfer of the extracted features and contribute to enhancing the reconstruction capability of the decoder. In addition, the proposed model can generate realistic appliance activations by varying the latent variable. Thus, the model is able to create synthetic activations to enhance the training data of energy disaggregation approaches. The proposed model was compared to state-of-the-art NILM approaches on the UK-DALE dataset and yielded competitive results. VAE-NILM showed improvements for the detection of the target appliance in the aggregate signal, with an average increase of 11% in the F1-Score. Moreover, the proposed model demonstrated a more accurate reconstruction capability, especially for multi-state appliances, with an average reduction of 130 W in the $\mathrm{MAE}_{on}$.
In a future work, we will investigate the integration of multi-task learning (MTL) techniques into this model. MTL would improve both the detection of appliance activations by reducing false positives and false negatives, and the energy disaggregation performance. Furthermore, we expect MTL would increase the generalization capability of the model and yield a better reconstruction for appliances in different houses.
This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) and the Canada Research Chair (CRC).