Deep Recurrent Adversarial Learning for Privacy-Preserving Smart Meter Data Release

06/14/2019
by   Mohammadhadi Shateri, et al.

Smart Meters (SMs) are an important component of smart electrical grids, but they have also generated serious concerns about the privacy of consumers' data. In this paper, we present a general formulation of the privacy-preserving problem in SMs from an information-theoretic perspective. In order to capture the causal time series structure of the power measurements, we employ Directed Information (DI) as an adequate measure of privacy. On the other hand, to cope with a variety of potential applications of SMs data, we study different distortion measures along with the standard squared-error distortion. This formulation leads to a quite general training objective (or loss), which is optimized under a deep learning adversarial framework in which two Recurrent Neural Networks (RNNs), referred to as the releaser and the attacker, are trained with opposite goals. An exhaustive empirical study is then performed to validate the proposed approach for different privacy problems in three actual data sets. Finally, we study the impact of the data mismatch problem, which occurs when the releaser and the attacker have different training data sets, and show that privacy may not require a large level of distortion in real-world scenarios.

I Introduction

Smart Meters (SMs) are a cornerstone for the development of smart electrical grids. These devices are able to report the power consumption measurements of a house to a utility provider every hour or even every few minutes. This feature generates a considerable amount of useful data, which enables several applications in almost real time, such as power quality monitoring, timely fault detection, demand response, and energy theft prevention [1, 2, 3]. However, this fine-grained power consumption monitoring poses a threat to consumers' privacy. In fact, it has been shown that simple algorithms, known in general as Non-Intrusive Load Monitoring (NILM) methods, can readily be used to infer the types of appliances being used in a home at a given time, even without any prior knowledge about the household [4]. Since these features are highly correlated with the presence of people at the dwelling and their personal habits [5], this induces serious privacy concerns, which can have an impact on the acceptance and deployment pace of SMs [6, 7]. The natural challenge raised here is: how can privacy be enhanced while preserving the utility of the data? Although this problem has been widely studied in the context of data science [8], the time series structure of SMs data requires a particular treatment [9]. For further details, the reader may be referred to [5].

Simple approaches for privacy preservation in the context of SMs include data aggregation and encryption [10, 11], the use of pseudonyms rather than the real identities of users [12], downsampling of the data [13, 14], and random noise addition [15]. However, these methods often restrict the potential applications of the SMs data. For instance, downsampling of the data may incur time delays in the detection of critical events, while data aggregation degrades the localization and accuracy of the power measurements.

A formal approach to the privacy problem has been presented in [16] from an information-theoretic perspective, where it was proposed to assess privacy by the Mutual Information (MI) between the sensitive and released variables. More specifically, the authors model the power measurements of SMs with a hidden Markov model in which the distribution of the measurements is controlled by the state of the appliances, and for each particular state the distribution of the power consumption is modeled as Gaussian. This model is then used to obtain the privacy-utility trade-off using tools from rate-distortion theory [17]. Although this approach is very appealing, it has two important limitations for its application to real-time scenarios with actual data. First, the privacy measure does not capture the causal time dependence and processing of the data, which is an essential feature of the problem. Second, the Gaussian model is quite restrictive in practice. The first limitation has been addressed in [18], where it is shown that, for an online scenario, the privacy measure should be based on Directed Information (DI) [19], which is the privacy measure that we adopt in the present work. We further address the second limitation by taking a data-based approach in which no explicit constraints on the distributions or statistics of the involved variables are assumed.

A more sophisticated approach considers the use of Rechargeable Batteries (RBs) and Renewable Energy Sources (RESs) in homes in order to modify the actual energy consumption of users with the goal of hiding the sensitive information [20, 21, 22, 23, 24, 25, 26]. Many of these works borrow ideas from the well-known principle of differential privacy [27], which seems to be better suited to fixed databases than to time series data [9]. The main motivation to introduce physical resources into the privacy problem comes from the observation that this approach does not require any distortion of the actual SMs measurements, which means that there is no loss in terms of utility. However, the incorporation of physical resources not only makes the problem more complex and limited in scope, but can also generate a significant cost to users due to the faster wear of the RBs caused by the increased charging/discharging rate [5]. On the other hand, the level of distortion required for a specific privacy goal in a realistic scenario, in which the attacker threatening privacy has only partial information, is still an open question. Thus, the need for and convenience of these solutions is questionable. As a matter of fact, in this work we show that under some conditions the privacy-utility trade-off may be much less severe than expected. It is important to note, however, that these approaches are complementary to, rather than alternatives to, the ones based on distorting the power measurements. Thus, for simplicity, we assume that no RBs and/or RESs are available and that distortion is the only means to achieve a desired privacy level.

The use of neural networks to model an attacker has been considered in [28]. However, a more powerful formulation of the problem assumes that both the releaser (or privatizer) and the attacker are deep neural networks (DNNs) that are trained simultaneously based on a minimax game, an idea inspired by the well-known Generative Adversarial Networks (GANs) [29]. This concept can be referred to as Generative Adversarial Privacy (GAP) [30] and is the basis for our approach. It should be mentioned that the concept of GAP has been studied for different applications related to images [31, 32] but, to the best of our knowledge, not in the context of SMs time series data. In these works, the authors consider i.i.d. data and deep feed-forward neural networks for the releaser and attacker, while in this paper we consider deep Recurrent Neural Networks (RNNs) to capture and exploit the time correlation. The idea of time series generation with an adversarial approach has been considered in [33] for medical data, based on the principle of differential privacy. As mentioned previously, our approach is instead based on DI, an information-theoretic measure of privacy.

In summary, the main contributions of this paper are the following:

  1. We apply DI as a privacy measure, similarly to [18]. However, unlike this and previous works, we impose no explicit assumptions on the generating model of the power measurements, but take a more versatile data-driven approach.

  2. We study different possible distortion measures, which provide more flexibility to control the specific features to be preserved in the released signals, that is, the relevant features for the target applications.

  3. For the sake of computational tractability, we propose a loss function for training the privacy-preserving releaser based on an upper bound to DI. Then, considering an attacker that minimizes a standard cross-entropy loss, we show that this leads to an adversarial framework based on two RNNs to train the releaser.

  4. We perform an extensive statistical study with actual data from three different data sets and attack frameworks motivated by real-world threats, in order to characterize the utility-privacy trade-offs and the nature of the distortion generated by the releaser network.

  5. We investigate the data mismatch problem in the context of SMs privacy, which occurs when the data available to the attacker is not the same as the one used for training the releaser mechanism, and show that it has an important impact on the privacy-utility trade-off.

The rest of the paper is organized as follows. In Section II, we present the theoretical formulation of the problem that motivates the loss functions for the releaser and attacker. Then, in Section III, the privacy-preserving adversarial framework is introduced along with the training algorithm. Extensive results are presented and discussed in Section IV. Finally, some concluding remarks are presented in Section V.

Notation and conventions

  • $X^T = (X_1, \ldots, X_T)$: a sequence of random variables, or a time series, of length $T$;

  • $x^T$: a realization of $X^T$;

  • $x^{(i)}$: the $i$-th sample in a minibatch used for training;

  • $\mathbb{E}[X]$: the expectation of the random variable $X$;

  • $p_X$: the distribution of $X$;

  • $I(X;Y)$: mutual information between the random variables $X$ and $Y$ [17];

  • $H(X)$: entropy of the random variable $X$;

  • $I(X^T \to Y^T)$: directed information between the time series $X^T$ and $Y^T$;

  • $H(X^T \| Y^T)$: causally conditional entropy of $X^T$ given $Y^T$ [34];

  • $X - Y - Z$: Markov chain among $X$, $Y$, and $Z$.

II Problem Formulation and Training Loss

II-A Main definitions

Consider the private time series $X^T$, the useful process $Y^T$, and the observed signal $W^T$. We assume that $X_t$ takes values on a fixed discrete alphabet $\mathcal{X}$, for each $t \in \{1, \ldots, T\}$. A releaser $\mathcal{R}_\theta$ (this notation is used to denote that the releaser is controlled by a vector of parameters $\theta$) produces the release process $Z^T$, where $Z_t$ is generated based on the observation $W^t$ for each time $t$, while an attacker $\mathcal{A}_\phi$ attempts to infer $X^T$ based on $Z^T$ by finding an approximation of $p_{X_t|X^{t-1},Z^t}$, which we shall denote by $\hat{p}_{X_t|X^{t-1},Z^t}$. Thus, the Markov chain $X^t - W^t - Z_t$ holds for all $t$. In addition, due to causality, the distribution $p_{Z^T|W^T}$ can be decomposed as follows:

$$p_{Z^T|W^T}\big(z^T \mid w^T\big) \;=\; \prod_{t=1}^{T} p_{Z_t|Z^{t-1},W^t}\big(z_t \mid z^{t-1}, w^t\big). \tag{1}$$

The goal of the releaser is to minimize the flow of information from the sensitive process $X^T$ to the released process while simultaneously keeping the distortion between the release time series and the useful signal small. On the other hand, the goal of the attacker $\mathcal{A}_\phi$ (again, this notation is used to denote that the attacker is controlled by a vector of parameters $\phi$) is to learn the optimal decision rule, i.e., the conditional distributions $p_{X_t|X^{t-1},Z^t}$ for each $t$, as accurately as possible. Note that, after the approximation $\hat{p}_{X_t|X^{t-1},Z^t}$ is obtained, the attacker can estimate the realization $\hat{x}^T$ corresponding to $x^T$ in an online fashion, by solving the following problems:

$$\hat{x}_t \;=\; \underset{x \in \mathcal{X}}{\arg\max}\;\; \hat{p}_{X_t|X^{t-1},Z^t}\big(x \mid \hat{x}^{t-1}, z^t\big), \qquad t = 1, \ldots, T. \tag{2}$$

Thus, the attacker can be interpreted as a hypothesis test, as stated in [35]. However, in the present case, we consider the more realistic scenario in which the statistical test is suboptimal due to the fact that the attacker has no access to the actual conditional distributions $p_{X_t|X^{t-1},Z^t}$ but only to $\hat{p}_{X_t|X^{t-1},Z^t}$, i.e., an estimate of them.
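As an illustration of (2), the following minimal sketch performs the attacker's greedy per-step MAP decoding, assuming the attacker network has already produced the estimated conditional distributions as an array of per-step probabilities (the function name and array layout are illustrative, not from the paper):

```python
import numpy as np

def online_map_inference(p_hat: np.ndarray) -> np.ndarray:
    """Greedy per-step MAP decoding of the sensitive sequence, as in (2).

    p_hat: array of shape (T, |X|), where p_hat[t, x] is the attacker's
    estimate of the probability that X_t = x given (x_hat^{t-1}, z^t).
    Returns the estimated realization x_hat^T of length T.
    """
    # For each time step t, pick the symbol with the highest estimated probability.
    return p_hat.argmax(axis=1)
```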

In order to take into account the causal relation between $X^T$ and its estimate $\hat{X}^T$, the flow of information is quantified by the DI [19]:

$$I\big(\hat{X}^T \to X^T\big) \;=\; \sum_{t=1}^{T} I\big(\hat{X}^t; X_t \,\big|\, X^{t-1}\big), \tag{3}$$

where $I(\hat{X}^t; X_t \mid X^{t-1})$ is the conditional mutual information between $\hat{X}^t$ and $X_t$ conditioned on $X^{t-1}$ [17]. The normalized expected distortion between $Z^T$ and $Y^T$ is defined as:

$$\mathcal{D}\big(Z^T, Y^T\big) \;=\; \frac{\mathbb{E}\big[d\big(Z^T, Y^T\big)\big]}{T}, \tag{4}$$

where $d(\cdot,\cdot)$ is any distortion function (i.e., a metric on $\mathbb{R}^T$). To ensure the quality of the release, it is natural to impose the constraint $\mathcal{D}(Z^T, Y^T) \le \varepsilon$ for some given $\varepsilon \ge 0$. In previous works, the normalized squared error was considered as a distortion function (e.g., [16]). Besides this, other distortion measures can be relevant within the framework of SMs. For instance, demand response programs usually require accurate knowledge of peak power consumption, so a distortion function closer to the infinity norm would be more meaningful for those particular applications. Thus, for the sake of generality, and to keep the distortion function simple, we propose to use an $\ell_p$ distance:

$$d_p\big(z^T, y^T\big) \;=\; \big\lVert z^T - y^T \big\rVert_p \;=\; \left(\sum_{t=1}^{T} \big\lvert z_t - y_t \big\rvert^p\right)^{1/p}, \tag{5}$$

where $p \ge 1$ is a fixed parameter. Note that this distortion function contains the squared error case as a particular case for $p = 2$, while it converges to the maximum error between the components of $z^T$ and $y^T$ as $p \to \infty$.
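For concreteness, a minimal sketch of the minibatch estimate of the normalized expected $\ell_p$ distortion (4)-(5) could look as follows (PyTorch is our assumption here; the paper does not specify a framework):

```python
import torch

def lp_distortion(z: torch.Tensor, y: torch.Tensor, p: float = 2.0) -> torch.Tensor:
    """Minibatch estimate of the normalized expected l_p distortion (4)-(5).

    z, y: tensors of shape (batch, T) holding released and useful sequences.
    Returns a scalar estimate of E[d_p(Z^T, Y^T)] / T.
    """
    T = y.shape[1]
    # l_p distance between each released sequence and its useful counterpart.
    d = (z - y).abs().pow(p).sum(dim=1).pow(1.0 / p)
    return d.mean() / T
```

Setting p = 2 recovers the Euclidean case, while large values of p increasingly penalize the worst per-sample error, mimicking the infinity norm.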

Therefore, the problem of finding an optimal releaser subject to the aforementioned attacker and distortion constraint can be formally written as follows:

$$\min_{\theta}\; I\big(\hat{X}^T \to X^T\big) \quad \text{subject to} \quad \frac{\mathbb{E}\big[d_p\big(Z^T, Y^T\big)\big]}{T} \;\le\; \varepsilon. \tag{6}$$

Note that the solution to this optimization problem depends on $\hat{p}_{X_t|X^{t-1},Z^t}$, i.e., the conditional distributions that represent the attacker $\mathcal{A}_\phi$. Thus, a joint optimization of the releaser and the attacker is required.

II-B A novel training loss

The optimization problem in (6) can be exploited to motivate an objective function for $\mathcal{R}_\theta$. However, note that the cost of computing the DI term grows exponentially with $T$ (as $|\mathcal{X}|^T$, where $|\mathcal{X}|$ is the size of $\mathcal{X}$). Thus, for the sake of tractability, the DI will be replaced with the following surrogate upper bound:

$$I\big(\hat{X}^T \to X^T\big) \;=\; H\big(X^T\big) - H\big(X^T \,\big\|\, \hat{X}^T\big) \;\le\; H\big(X^T\big) - H\big(X^T \,\big\|\, Z^T\big), \tag{7}$$

where the equality follows from the chain rule for entropy and the definition of the causally conditional entropy $H(X^T \| \hat{X}^T) = \sum_{t=1}^{T} H(X_t \mid X^{t-1}, \hat{X}^t)$ [34], and the inequality follows from the fact that conditioning reduces entropy together with the Markov chain $X_t - (X^{t-1}, Z^t) - \hat{X}^t$, which holds because the attacker computes its estimates causally from the release. Therefore, the loss function for $\mathcal{R}_\theta$ can be written as:

$$\mathcal{L}_{\mathcal{R}}(\theta, \phi) \;=\; -\frac{\lambda}{T} \sum_{t=1}^{T} \mathbb{E}\Big[-\log \hat{p}_{X_t|X^{t-1},Z^t}\big(X_t \mid X^{t-1}, Z^t\big)\Big] \;+\; \frac{1}{T}\,\mathbb{E}\big[d_p\big(Z^T, Y^T\big)\big], \tag{8}$$

where the first term is an attacker-based estimate of $-H(X^T \| Z^T)$ (note that $H(X^T)$ does not depend on $\theta$ and can thus be dropped from the bound in (7)), $\lambda \ge 0$ controls the privacy-utility trade-off, and the factor $1/T$ has been introduced for normalization purposes. It should be noted that, for $\lambda = 0$, the loss function reduces to the expected distortion, being independent of the attacker $\mathcal{A}_\phi$. In such a scenario, $\mathcal{R}_\theta$ offers no privacy guarantees. Conversely, for very large values of $\lambda$, the loss function is dominated by the upper bound on the DI, so that privacy is the only goal of $\mathcal{R}_\theta$. In this regime, we expect the attacker to fail completely to infer $X^T$, i.e., to approach random guessing performance.

On the other hand, the attacker $\mathcal{A}_\phi$ is a classifier which optimizes the following cross-entropy loss:

$$\mathcal{L}_{\mathcal{A}}(\theta, \phi) \;=\; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Big[-\log \hat{p}_{X_t|X^{t-1},Z^t}\big(X_t \mid X^{t-1}, Z^t\big)\Big], \tag{9}$$

where the expectation should be understood w.r.t. the true distribution $p_{X^T,Z^T}$. It is important to note that

$$\mathcal{L}_{\mathcal{A}}(\theta, \phi) \;=\; \frac{1}{T}\, H\big(X^T \,\big\|\, Z^T\big) \;+\; \frac{1}{T} \sum_{t=1}^{T} \mathbb{E}\Big[D_{\mathrm{KL}}\big(p_{X_t|X^{t-1},Z^t} \,\big\|\, \hat{p}_{X_t|X^{t-1},Z^t}\big)\Big] \;\ge\; \frac{1}{T}\, H\big(X^T \,\big\|\, Z^T\big), \tag{10}$$

since the second term in (10) is a Kullback-Leibler divergence, which is non-negative. Thus, by minimizing $\mathcal{L}_{\mathcal{R}}$, the releaser prevents the attacker from inferring the sensitive process while also minimizing the distortion between the useful and released processes. This shows that $\mathcal{R}_\theta$ and $\mathcal{A}_\phi$ are indeed trained in an adversarial fashion. It should be noted that $\mathcal{A}_\phi$ is here an artificial attacker used for training $\mathcal{R}_\theta$. Once the training is complete and $\mathcal{R}_\theta$ is fixed, a new attacker should be trained from scratch, using the loss (9), in order to assess the privacy-utility trade-off in an unbiased way.
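Putting (8) and (9) together, a minimal sketch of the two losses could be written as below, reusing the lp_distortion helper from the earlier sketch; the tensor layout and helper names are our assumptions, not code from the paper:

```python
import torch
import torch.nn.functional as F

def attacker_loss(logits: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
    """Cross-entropy loss of the attacker, a sample estimate of (9).

    logits: (batch, T, |X|) unnormalized attacker scores for X_t;
    x: (batch, T) integer-coded true sensitive symbols.
    """
    b, T, n = logits.shape
    # Average the per-step cross-entropy over both time and the minibatch.
    return F.cross_entropy(logits.reshape(b * T, n), x.reshape(b * T))

def releaser_loss(logits, x, z, y, lam: float, p: float = 2.0) -> torch.Tensor:
    """Adversarial releaser loss (8): distortion minus lambda times the
    attacker's cross-entropy (the surrogate for the DI upper bound (7))."""
    return lp_distortion(z, y, p) - lam * attacker_loss(logits, x)
```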

III Privacy-Preserving Adversarial Learning

Based on the previous theoretical formulation, an adversarial modeling framework consisting of two RNNs, a releaser $\mathcal{R}_\theta$ and an attacker $\mathcal{A}_\phi$, is considered (see Fig. 1). Note that independent noise is appended to the input of the releaser in order to randomize the released variables, which is a popular approach in privacy-preserving methods. In addition, the available theoretical results show that, for Gaussian distributions, the optimal release contains such a noise component [16, 36]. For both networks, an LSTM architecture is selected (see Fig. 2), which has been shown to be successful in several problems dealing with sequences of data (e.g., see [37] and references therein for more details). Training in the suggested framework is performed using Algorithm 1, which requires $k$ gradient steps to train $\mathcal{A}_\phi$ followed by one gradient step to train $\mathcal{R}_\theta$. It is worth emphasizing that $k$ should be large enough to ensure that $\mathcal{A}_\phi$ represents a strong attacker. However, if $k$ is too large, this could lead to overfitting and thus a poor attacker.
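A minimal PyTorch sketch of such a releaser is given below; the layer sizes and the uniform seed-noise dimension are illustrative placeholders rather than the exact architectures of Table I:

```python
import torch
import torch.nn as nn

class Releaser(nn.Module):
    """Releaser RNN: consumes the observed sample W_t concatenated with a
    fresh uniform seed-noise vector U_t at each step and emits Z_t."""

    def __init__(self, noise_dim: int = 8, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        self.noise_dim = noise_dim
        # Unidirectional LSTM, so Z_t depends only on (W^t, U^t), as required by causality.
        self.lstm = nn.LSTM(1 + noise_dim, hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, w: torch.Tensor) -> torch.Tensor:
        # w: (batch, T, 1) observed power measurements.
        u = torch.rand(w.shape[0], w.shape[1], self.noise_dim, device=w.device)
        h, _ = self.lstm(torch.cat([w, u], dim=-1))
        return self.out(h)  # (batch, T, 1) released sequence Z^T
```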

Fig. 1: Privacy-preserving framework. The seed noise is generated from i.i.d. samples drawn according to a uniform distribution.
Fig. 2: LSTM recurrent network cell diagram. The cell includes four gating units that control the flow of information. All the gating units have a sigmoid activation function ($\sigma$), except for the input unit, which by default uses a hyperbolic tangent activation function ($\tanh$). The parameters of the cell are the biases, input weights, and recurrent weights. In the LSTM architecture, the forget gate uses the output of the previous cell (called the hidden state) to control the cell state and remove irrelevant information. On the other hand, the input gate and the input unit add new information to the cell state from the current input. Finally, the output gate generates the output of the cell from the current input and the cell state.
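For reference, the cell update sketched in Fig. 2 can be written in its standard form (this is the common LSTM formulation, e.g. from [37]; $W$, $R$, and $b$ denote input weights, recurrent weights, and biases, and $\odot$ is the element-wise product):

$$
\begin{aligned}
f_t &= \sigma\big(W_f x_t + R_f h_{t-1} + b_f\big) && \text{(forget gate)}\\
i_t &= \sigma\big(W_i x_t + R_i h_{t-1} + b_i\big) && \text{(input gate)}\\
\tilde{c}_t &= \tanh\big(W_c x_t + R_c h_{t-1} + b_c\big) && \text{(input unit)}\\
o_t &= \sigma\big(W_o x_t + R_o h_{t-1} + b_o\big) && \text{(output gate)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state)}\\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
$$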

Input: Data set (which includes sample sequences of useful data $y^T$ and sensitive data $x^T$); seed noise samples; seed noise dimension; batch size $m$; number of steps $k$ to apply to the attacker; gradient clipping value; recurrent regularization parameter.
Output: Releaser network $\mathcal{R}_\theta$.

1:  for number of training iterations do
2:     for $k$ steps do
3:        Sample a minibatch of $m$ examples.
4:        Compute the gradient of $\mathcal{L}_{\mathcal{A}}$ (9), approximated with the minibatch, w.r.t. $\phi$.
5:        Update the attacker by applying the RMSprop optimizer with gradient clipping.
6:     end for
7:     Sample a minibatch of $m$ examples.
8:     Compute the gradient of $\mathcal{L}_{\mathcal{R}}$ (8), approximated with the minibatch, w.r.t. $\theta$.
9:     Apply recurrent regularization and update the releaser using the RMSprop optimizer with gradient clipping.
10:  end for

Algorithm 1: Training of the privacy-preserving data releaser neural network.
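A minimal PyTorch sketch of Algorithm 1 is shown below, reusing the Releaser module and the loss sketches introduced earlier. The attacker model (assumed to map released sequences to per-step logits), the sample_minibatch helper, and the hyperparameter values are placeholders, and the recurrent regularization step is omitted for brevity:

```python
import torch

def train(releaser, attacker, sample_minibatch, lam: float, k: int = 4,
          clip: float = 1.0, iterations: int = 1000, p: float = 2.0):
    """Adversarial training loop of Algorithm 1: k RMSprop steps on the
    attacker followed by one RMSprop step on the releaser, both with
    gradient clipping.

    sample_minibatch() is an assumed helper returning (w, y, x): observed
    (batch, T, 1), useful (batch, T), and sensitive (batch, T) sequences.
    """
    opt_a = torch.optim.RMSprop(attacker.parameters())
    opt_r = torch.optim.RMSprop(releaser.parameters())
    for _ in range(iterations):
        for _ in range(k):  # inner loop: strengthen the attacker first
            w, y, x = sample_minibatch()
            z = releaser(w).detach()  # the releaser is frozen in this step
            loss_a = attacker_loss(attacker(z), x)
            opt_a.zero_grad()
            loss_a.backward()
            torch.nn.utils.clip_grad_norm_(attacker.parameters(), clip)
            opt_a.step()
        w, y, x = sample_minibatch()  # one adversarial step on the releaser
        z = releaser(w)
        loss_r = releaser_loss(attacker(z), x, z.squeeze(-1), y, lam, p)
        opt_r.zero_grad()
        loss_r.backward()
        torch.nn.utils.clip_grad_norm_(releaser.parameters(), clip)
        opt_r.step()
```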

IV Results and Discussion

IV-A Description of data sets

Three different data sets are considered:

  • The Electricity Consumption & Occupancy (ECO) data set, collected and published by [38], which includes 1 Hz power consumption measurements and occupancy information of five houses in Switzerland over a period of several months. In this study, we re-sampled the data to obtain hourly samples.

  • The Pecan Street data set, which contains hourly SMs data of houses in Austin, Texas, collected by Pecan Street Inc. [39]. The Pecan Street project is a smart grid demonstration research program that provides electricity, water, natural gas, and solar energy generation measurements for a large number of houses in Austin, Texas.

  • The Low Carbon London (LCL) data set, which includes half-hourly energy consumption readings for a large number of households in London [40]. Each household is allocated to a CACI Acorn group [41], which comprises three categories: affluent, comfortable, and adversity.

We model the time dependency over each day, so the data sets were reshaped into sample sequences of length 24 for ECO and Pecan Street (a data rate of 1 sample per hour) and into sample sequences of length 48 for the LCL data set (a data rate of 1 sample per half hour). The data sets were split into train and test sets with a ratio of roughly 85:15, while a portion of the training data was used as the validation set. The network architectures and hyperparameters used for training and testing in the different applications are summarized in Table I.
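As an illustration, the daily reshaping described above might be implemented with pandas as follows (the resampling rule and split logic are our assumptions, not code from the paper; the input series is assumed to carry a DatetimeIndex):

```python
import numpy as np
import pandas as pd

def make_daily_sequences(series: pd.Series, samples_per_day: int = 24):
    """Resample a raw consumption series and cut it into per-day sequences.

    Use samples_per_day=24 for the hourly ECO/Pecan Street data and
    samples_per_day=48 for the half-hourly LCL data. Returns (train, test)
    arrays split roughly 85:15.
    """
    rule = f"{24 * 60 // samples_per_day}min"  # '60min' or '30min'
    resampled = series.resample(rule).mean().dropna()
    n_days = len(resampled) // samples_per_day
    seqs = resampled.values[: n_days * samples_per_day].reshape(n_days, samples_per_day)
    split = int(0.85 * n_days)
    return seqs[:split], seqs[split:]
```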

| SMs Application | Releaser | Training Attacker | Test Attacker | Batch size | Other hyperparameters |
|---|---|---|---|---|---|
| Inference of households occupancy | LSTM layers with 64 cells each, followed by LSTM layers with 32 cells each | LSTM layers with 32 cells each | LSTM layers with 32 cells each | 128 | 4, 8 |
| Inference of households identity | LSTM layers with 128 cells each, followed by LSTM layers with 32 cells each | LSTM layers with 32 cells each | LSTM layers with 32 cells each | 128 | 5, 3 |
| Inference of households acorn type | LSTM layers with 100 cells each, followed by LSTM layers with 32 cells each | LSTM layers with 32 cells each | LSTM layers with 32 cells each | 128 | 7, 3 |

TABLE I: Model architectures and hyperparameter values used for each application.

To assess the distortion with respect to the actual power consumption measurements, we define the Normalized Error (NE) for the different distortion functions as follows:

$$\mathrm{NE}_p \;=\; \frac{\mathbb{E}\big[\lVert Z^T - Y^T \rVert_p\big]}{\mathbb{E}\big[\lVert Y^T \rVert_p\big]}. \tag{11}$$
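A minimal sketch of an empirical NE computation over a set of test sequences, assuming the normalization in (11) (i.e., the average $\ell_p$ error of the release relative to the average $\ell_p$ norm of the useful signal), is:

```python
import numpy as np

def normalized_error(z: np.ndarray, y: np.ndarray, p: float = 2.0) -> float:
    """Empirical Normalized Error following (11).

    z, y: arrays of shape (N, T) with N released/useful sequence pairs.
    """
    num = np.linalg.norm(z - y, ord=p, axis=1).mean()  # average l_p error
    den = np.linalg.norm(y, ord=p, axis=1).mean()      # average l_p norm of Y^T
    return float(num / den)
```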

IV-B $\ell_2$ Distortion

First, the $\ell_2$ distortion function is considered (i.e., $p = 2$ in (5)). In the following subsections, three different privacy applications are studied (one for each of the data sets presented in Section IV-A).

IV-B1 Inference of households occupancy

The first practical case study regarding privacy preservation in time series data concerns the inference of the presence/absence of residents at home from the total power consumption collected by SMs [42, 43]. For this application, the electricity consumption measurements from the ECO data set are considered as the useful data, while the occupancy labels are defined as the private data. Therefore, in this case, the releaser attempts to minimize a trade-off between the distortion incurred on the total electricity consumption and the probability of inferring the presence of an individual at home from the released signal. Note from Table I that a stronger attacker composed of 3 LSTM layers is used for the test.

In Fig. 3 we show the empirically obtained privacy-utility trade-off for this application. Note that, by adding distortion, the accuracy of the attacker drops from well above the random-guessing level (no privacy) down to approximately 50% (full privacy), which corresponds to the performance of a random guessing classifier.

Fig. 3: Privacy-utility trade-off for the house occupancy inference application. Since in this application the attacker is a binary classifier, the random guessing (balanced) accuracy is 50%. The fitted curve is based on an exponential function and is included only for illustration purposes.

In order to provide more insight into the release mechanism, the Power Spectral Density (PSD) of the input signal and the PSD of the error signal for three different cases along the privacy-utility trade-off curve of Fig. 3 are estimated using Welch's method [44]. For each case, we use 10 release signals and average the PSD estimates to reduce the variance of the estimator. The results are shown in Fig. 4. Looking at the PSD of the input signal (useful data), some harmonics are clearly visible. The PSDs of the error signals show that the model controls the privacy-utility trade-off by modifying the noise floor and the distortion on these harmonics.
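The averaging of the Welch estimates can be sketched with SciPy as follows (the sampling frequency and segment length are illustrative choices):

```python
import numpy as np
from scipy.signal import welch

def average_psd(signals: np.ndarray, fs: float = 1.0 / 3600.0):
    """Welch PSD estimate averaged over several signals (e.g., 10 release or
    error signals) to reduce the variance of the estimator, as in Fig. 4.

    signals: array of shape (N, T); fs: sampling frequency in Hz
    (1/3600 Hz corresponds to one sample per hour).
    """
    psds, freqs = [], None
    for s in signals:
        freqs, pxx = welch(s, fs=fs, nperseg=min(256, len(s)))
        psds.append(pxx)
    return freqs, np.mean(psds, axis=0)
```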

Fig. 4: PSD of the actual electricity consumption and error signals for the house occupancy inference application.

It should be mentioned that two stationarity tests, the Augmented Dickey-Fuller (ADF) test [45] and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test [46], were applied to the data. These confirmed that there is enough evidence to consider the data stationary, thus supporting our PSD analysis.
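Both tests are available in statsmodels; a minimal sketch of how such a check might be run on a consumption series (function name and decision rule are our assumptions) is:

```python
from statsmodels.tsa.stattools import adfuller, kpss

def stationarity_check(x, alpha: float = 0.05) -> bool:
    """Run the ADF test (null: unit root) and the KPSS test (null:
    stationarity) on a 1-D series x; return True when both tests
    point towards stationarity."""
    adf_p = adfuller(x)[1]                              # small p -> reject unit root
    kpss_p = kpss(x, regression="c", nlags="auto")[1]   # large p -> keep stationarity
    return adf_p < alpha and kpss_p > alpha
```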

IV-B2 Inference of household identity

The second practical case study regarding privacy preservation in SMs measurements is identity recognition from the total power consumption of households [12]. It is assumed that both the releaser and the attacker have access to the total power consumption of different households in a region (the training data), and the attacker then attempts to determine the identity of a house using the released data obtained from the test data. Thus, our model aims at generating released total-power data in a way that prevents the attacker from performing identity recognition while keeping the distortion of the total power consumption small. For this task, the total power consumption of five houses is used.

The empirical privacy-utility trade-off curve obtained for this application is presented in Fig. 5. Comparing Fig. 5 with Fig. 3, we see that a high level of privacy now requires a high level of distortion. For instance, in order to reduce the attacker accuracy to 30%, the NE should be approximately equal to 0.30. This is attributed to the fact that this task is harder from the learning viewpoint than the one considered in Section IV-B1.

Fig. 5: Privacy-utility trade-off for the house identity inference application. Since in this application the attacker is a five-class classifier, the random guessing (balanced) accuracy is 20%. The fitted curve is based on an exponential function and is included only for illustration purposes.

IV-B3 Inference of households acorn type

As the third practical case study, we consider household acorn type identification, which can reveal the household's economic status to any third party having access to the SMs data [47]. Thus, for this application, the SMs power consumption is used as the useful data while the acorn type is considered as the private one.

The empirical privacy-utility trade-off curve obtained for this application is presented in Fig. 6. Once again, we see a large variation in the accuracy of the attacker as the distortion is modified.

Fig. 6: Privacy-utility trade-off for the acorn type inference application. Since in this application the attacker is a three-class classifier, the random guessing (balanced) accuracy is 33%. The fitted curve is based on an exponential function and is included only for illustration purposes.

It should be noted that the PSD analysis for this application and the previous one leads to results similar to those of the first application and is therefore not reported.

To assess the quality of the released signal, utility providers may be interested in several different indicators, including, for instance, the mean, skewness, kurtosis, standard-deviation-to-mean ratio, and maximum-to-mean ratio [48]. Thus, for completeness, we present these indicators in Table II for three different cases along the privacy-utility trade-off curve of each application. These results show that, in general, the error in these indicators is small when the privacy constraints are lax and increases as they become strict, although no simple relation should be expected between the NE and the errors in the corresponding indicators.

| SMs Application | NE | Accuracy (%) | Mean | Skewness | Kurtosis | Std. Dev./Mean | Max./Mean |
|---|---|---|---|---|---|---|---|
| Inference of households occupancy | 0.04 | 78 | 1.42 | 1.06 | 0.70 | 0.67 | 0.46 |
| | 0.12 | 65 | 9.69 | 4.32 | 5.81 | 4.58 | 4.92 |
| | 0.18 | 57 | 13.26 | 12.83 | 2.57 | 16.44 | 13.89 |
| Inference of households identity | 0.05 | 54 | 3.42 | 2.22 | 2.01 | 3.50 | 2.51 |
| | 0.17 | 39 | 4.63 | 3.18 | 1.79 | 15.74 | 9.32 |
| | 0.36 | 29 | 12.49 | 6.71 | 1.44 | 19.12 | 9.98 |
| Inference of households acorn type | 0.03 | 85 | 1.86 | 0.66 | 0.44 | 0.02 | 0.02 |
| | 0.29 | 47 | 2.49 | 9.46 | 14.54 | 24.97 | 13.24 |
| | 0.60 | 35 | 13.21 | 45.92 | 24.03 | 55.38 | 41.68 |

The last five columns report the absolute relative error (%) of the corresponding power quality indicator.

TABLE II: Errors in power quality indicators for the three applications along the privacy-utility trade-off.

IV-C Higher-Order $\ell_p$ Distortion

As already discussed in Section II, the distortion function should be properly matched to the intended application of the released variables in order to preserve the characteristics of the target variables that are considered essential. In this section, we consider the distortion (5) with $p > 2$ as an alternative to the $\ell_2$ distortion function of Section IV-B and study its potential benefits.

The privacy-utility trade-off curves for the inference of households occupancy application are shown in Fig. 7. As a first observation, it appears clear that the choice of the distortion measure has a non-negligible impact on the privacy-utility trade-off curve. In fact, it can be seen that, for a given amount of distortion, the releasers trained with the higher-order distortion measures achieve a higher level of privacy than the one trained with the $\ell_2$ distortion function. It should be mentioned that we also considered other norms, for which the privacy-utility trade-off was observed to be similar to, and slightly better than, the ones reported here.

Fig. 7: Privacy-utility trade-off for the house occupancy inference application based on the different distortion functions. In each panel, the dashed line, shown for comparison purposes, is the fitted curve found in Fig. 3 for the $\ell_2$ distortion function.

As discussed in Section II, in demand response programs the utilities are mostly interested in the peak power consumption of the customers. It is also expected that higher-order norms are better at preserving these signal characteristics than the $\ell_2$ norm. To verify this notion, we considered 60 random days of the ECO data set in a full privacy scenario (i.e., with an attacker accuracy very close to 50%) and plotted the actual power consumption along with the corresponding released signals for both the $\ell_2$ and the higher-order distortion functions. The results are shown in Fig. 8, which clearly indicates that the releaser trained with the higher-order distortion function preserves many more of the consumption peaks than the releaser trained with the $\ell_2$ distortion function. This suggests that, for the demand response application, higher-order distortion functions should be considered.

Fig. 8: Example of the released power consumption in the time domain compared with the actual power consumption over 60 random days with almost full privacy, for the $\ell_2$ and the higher-order $\ell_p$ distortion functions.

IV-D Attacker with Data Mismatch Problem

All the previous results are based on the assumption that the attacker has access to exactly the same training data set used by the releaser. This case should be considered as a worst-case analysis of the performance of the privacy-preserving networks. However, this assumption may not hold in practice. To examine this scenario, we revisit the application of Section IV-B1 in two different cases. In the first case, we assume that, out of the data set of five houses (ECO data set), the releaser uses the data of four houses for training while the attacker has access only to the data of the remaining house. In the second case, we assume that the releaser is trained on the data set of all five houses but only the data sets of two of the houses are available to the attacker. These scenarios try to capture different degrees of the data mismatch problem, which could have an impact on the privacy-utility trade-off due to the different generalization errors.

The results are presented in Fig. 9 along with the worst-case scenario. They clearly show how the overlap between the training data sets of the releaser and the attacker affects the performance of the model. In fact, when the attacker does not have access to the full data set of the releaser but only to a portion of it, the performance of the attacker largely degrades, which means that a target level of privacy requires much less distortion. In the extreme case where the attacker has no access to the releaser training data set, a very high level of privacy can be achieved with negligible distortion. This should be considered as a best-case scenario. It should be mentioned that we repeated this experiment with different shufflings of the houses and similar results were obtained.

Fig. 9: Privacy-utility trade-off for house occupancy inference application when an attacker (trained separately to infer private data from the release) does not have full access to the releaser training data set.

V Summary and Discussion

Privacy problems associated with smart meter measurements are an important concern in society, which can have an impact on the deployment pace of these devices and on the advancement of smart grid technologies. Thus, it is essential to understand the real privacy risks associated with them in order to provide an adequate solution to this problem.

In this paper, we proposed to measure privacy based on the Directed Information (DI) between the sensitive time series and its inference by a potential attacker optimized for that task. The DI captures the causal time dependencies present in the time series data and its processing. Unlike previous approaches, we impose no explicit assumptions on the statistics or distributions of the involved random variables. We believe that this data-driven approach can provide a more accurate assessment of the information leakage in practice than purely theoretical studies based on worst-case assumptions.

We considered a privacy-preserving adversarial learning framework that balances the trade-off between privacy and distortion of the released data. More precisely, we defined a tractable training objective (or loss) based on an upper bound to the DI and a general distortion measure. The desired releaser is then trained in an adversarial framework using RNNs to optimize this objective, while an artificial attacker is trained with the opposite goal. After convergence, a new attacker is trained to test the level of privacy achieved by the releaser. A detailed study of different applications, including the inference of households occupancy (ECO data set), household identity (Pecan Street data set), and household acorn type (LCL data set), shows that the privacy-utility trade-off is strongly dependent upon the considered application and distortion measure. We showed that the usual $\ell_2$-norm based distortion measure can have a worse privacy-utility trade-off than higher-order $\ell_p$ norms. In addition, we showed that a higher-order distortion measure generates a release that preserves most of the power consumption peaks even under a full privacy regime, which is not the case for the $\ell_2$ distortion function. This result is of considerable importance for demand response applications.

Finally, we studied the impact of the data mismatch problem in this application, which occurs when the training data set of the releaser is not the same as the one used by the attacker. The results show that this effect can greatly affect the privacy-utility trade-off. Since this phenomenon is expected in practice, at least to some degree, these findings suggest that the level of distortion required to achieve a desired privacy target may be modest in several cases of interest. In such scenarios, our approach may offer a simpler and more general solution than methods based on rechargeable batteries and renewable energy sources.

References

  • [1] D. Alahakoon and X. Yu, “Smart electricity meter data intelligence for future energy systems: A survey,” IEEE Transactions on Industrial Informatics, vol. 12, pp. 425–436, Feb 2016.
  • [2] Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of smart meter data analytics: Applications, methodologies, and challenges,” IEEE Transactions on Smart Grid, vol. 10, pp. 3125–3148, May 2019.
  • [3] S. S. S. R. Depuru, L. Wang, V. Devabhaktuni, and N. Gudi, “Smart meters for power grid — challenges, issues, advantages and status,” in 2011 IEEE/PES Power Systems Conference and Exposition, pp. 1–7, March 2011.
  • [4] A. Molina-Markham, P. Shenoy, K. Fu, E. Cecchet, and D. Irwin, “Private memoirs of a smart meter,” in Proceedings of the 2Nd ACM Workshop on Embedded Sensing Systems for Energy-Efficiency in Building, BuildSys ’10, (New York, NY, USA), pp. 61–66, ACM, 2010.
  • [5] G. Giaconi, D. Gunduz, and H. V. Poor, “Privacy-aware smart metering: Progress and challenges,” IEEE Signal Processing Magazine, vol. 35, no. 6, pp. 59–78, 2018.
  • [6] E. McKenna, I. Richardson, and M. Thomson, “Smart meter data: Balancing consumer privacy concerns with legitimate applications,” Energy Policy, vol. 41, pp. 807 – 814, 2012. Modeling Transport (Energy) Demand and Policies.
  • [7] C. Cuijpers and B.-J. Koops, “Smart metering and privacy in europe: Lessons from the dutch case,” in European Data Protection, 2013.
  • [8] P. Jain, M. Gyanchandani, and N. Khare, “Big data privacy: a technological perspective and review,” Journal of Big Data, vol. 3, p. 25, Nov 2016.
  • [9] M. R. Asghar, G. Dán, D. Miorandi, and I. Chlamtac, “Smart meter data privacy: A survey,” IEEE Communications Surveys Tutorials, vol. 19, pp. 2820–2835, Fourthquarter 2017.
  • [10] F. Li, B. Luo, and P. Liu, “Secure information aggregation for smart grids using homomorphic encryption,” in 2010 First IEEE International Conference on Smart Grid Communications, pp. 327–332, Oct 2010.
  • [11] C. Rottondi, G. Verticale, and C. Krauss, “Distributed privacy-preserving aggregation of metering data in smart grids,” IEEE Journal on Selected Areas in Communications, vol. 31, pp. 1342–1354, July 2013.
  • [12] C. Efthymiou and G. Kalogridis, “Smart grid privacy via anonymization of smart metering data,” in 2010 First IEEE International Conference on Smart Grid Communications, pp. 238–243, IEEE, 2010.
  • [13] D. Mashima, “Authenticated down-sampling for privacy-preserving energy usage data sharing,” in 2015 IEEE International Conference on Smart Grid Communications (SmartGridComm), pp. 605–610, IEEE, 2015.
  • [14] G. Eibl and D. Engel, “Influence of data granularity on smart meter privacy,” IEEE Transactions on Smart Grid, vol. 6, pp. 930–939, March 2015.
  • [15] P. Barbosa, A. Brito, and H. Almeida, “A technique to provide differential privacy for appliance usage in smart metering,” Information Sciences, vol. 370-371, pp. 355 – 367, 2016.
  • [16] L. Sankar, S. R. Rajagopalan, S. Mohajer, and H. V. Poor, “Smart meter privacy: A theoretical framework,” IEEE Transactions on Smart Grid, vol. 4, pp. 837–846, June 2013.
  • [17] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd edition. Wiley-Interscience: NJ, 2006.
  • [18] M. A. Erdogdu and N. Fawaz, “Privacy-utility trade-off under continual observation,” in 2015 IEEE International Symposium on Information Theory (ISIT), pp. 1801–1805, June 2015.
  • [19] J. Massey, “Causality, feedback and directed information,” in Proc. Int. Symp. Inf. Theory Applic.(ISITA-90), pp. 303–305, Citeseer, 1990.
  • [20] G. Kalogridis, C. Efthymiou, S. Z. Denic, T. A. Lewis, and R. Cepeda, “Privacy for smart meters: Towards undetectable appliance load signatures,” in 2010 First IEEE International Conference on Smart Grid Communications, pp. 232–237, Oct 2010.
  • [21] O. Tan, D. Gunduz, and H. V. Poor, “Increasing smart meter privacy through energy harvesting and storage devices,” IEEE Journal on Selected Areas in Communications, vol. 31, pp. 1331–1341, July 2013.
  • [22] G. Ács and C. Castelluccia, “DREAM: differentially private smart metering,” CoRR, vol. abs/1201.2531, 2012.
  • [23] J. Zhao, T. Jung, Y. Wang, and X. Li, “Achieving differential privacy of data disclosure in the smart grid,” in IEEE INFOCOM 2014-IEEE Conference on Computer Communications, pp. 504–512, IEEE, 2014.
  • [24] S. Li, A. Khisti, and A. Mahajan, “Information-theoretic privacy for smart metering systems with a rechargeable battery,” IEEE Transactions on Information Theory, vol. 64, no. 5, pp. 3679–3695, 2018.
  • [25] G. Giaconi, D. Gündüz, and H. V. Poor, “Smart meter privacy with renewable energy and an energy storage device,” IEEE Transactions on Information Forensics and Security, vol. 13, pp. 129–142, Jan 2018.
  • [26] E. Erdemir, P. L. Dragotti, and D. Gunduz, “Privacy-cost trade-off in a smart meter system with a renewable energy source and a rechargeable battery,” arXiv preprint arXiv:1902.07739, 2019.
  • [27] C. Dwork, “Differential privacy: A survey of results,” in Theory and Applications of Models of Computation (M. Agrawal, D. Du, Z. Duan, and A. Li, eds.), (Berlin, Heidelberg), pp. 1–19, Springer Berlin Heidelberg, 2008.
  • [28] Y. Wang, N. Raval, P. Ishwar, M. Hattori, T. Hirano, N. Matsuda, and R. Shimizu, “On methods for privacy-preserving energy disaggregation,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6404–6408, March 2017.
  • [29] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems 27 (Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, eds.), pp. 2672–2680, Curran Associates, Inc., 2014.
  • [30] C. Huang, P. Kairouz, X. Chen, L. Sankar, and R. Rajagopal, “Generative adversarial privacy,” CoRR, vol. abs/1807.05306, 2018.
  • [31] A. Tripathy, Y. Wang, and P. Ishwar, “Privacy-preserving adversarial networks,” CoRR, vol. abs/1712.07008, 2019.
  • [32] C. Feutry, P. Piantanida, Y. Bengio, and P. Duhamel, “Learning Anonymized Representations with Adversarial Neural Networks,” arXiv e-prints, p. arXiv:1802.09386, Feb 2018.
  • [33] C. Esteban, S. L. Hyland, and G. Rätsch, “Real-valued (medical) time series generation with recurrent conditional gans,” CoRR, vol. abs/1706.02633, 2018.
  • [34] G. Kramer, “Capacity results for the discrete memoryless network,” IEEE Transactions on Information Theory, vol. 49, pp. 4–21, Jan 2003.
  • [35] Z. Li, T. J. Oechtering, and D. Gündüz, “Privacy against a hypothesis testing adversary,” IEEE Transactions on Information Forensics and Security, vol. 14, pp. 1567–1581, June 2019.
  • [36] A. Tripathy, Y. Wang, and P. Ishwar, “Privacy-preserving adversarial networks,” arXiv preprint arXiv:1712.07008, 2017.
  • [37] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.
  • [38] C. Beckel, W. Kleiminger, R. Cicchetti, T. Staake, and S. Santini, “The eco data set and the performance of non-intrusive load monitoring algorithms,” in Proceedings of the 1st ACM Conference on Embedded Systems for Energy-Efficient Buildings, pp. 80–89, ACM, 2014.
  • [39] Pecan Street Inc, “Dataport: the world’s largest energy data resource,” 2019. https://dataport.cloud/.
  • [40] UK Power Networks, Low Carbon London Project., “SmartMeter Energy Consumption Data in London Households.” https://data.london.gov.uk/dataset/smartmeter-energy-use-data-in-london-households.
  • [41] CACI, “The acorn user guide,” 1989.
  • [42] W. Kleiminger, C. Beckel, and S. Santini, “Household occupancy monitoring using electricity meters,” in Proceedings of the 2015 ACM international joint conference on pervasive and ubiquitous computing, pp. 975–986, ACM, 2015.
  • [43] W. Jia, H. Zhu, Z. Cao, X. Dong, and C. Xiao, “Human-factor-aware privacy-preserving aggregation in smart grid,” IEEE Systems Journal, vol. 8, no. 2, pp. 598–607, 2014.
  • [44] P. Stoica and R. Moses, Spectral Analysis of Signals. Pearson Prentice Hall, 2005.
  • [45] D. A. Dickey and W. A. Fuller, “Distribution of the estimators for autoregressive time series with a unit root,” Journal of the American statistical association, vol. 74, no. 366a, pp. 427–431, 1979.
  • [46] D. Kwiatkowski, P. C. Phillips, P. Schmidt, and Y. Shin, “Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root?,” Journal of Econometrics, vol. 54, no. 1-3, pp. 159–178, 1992.
  • [47] S. Thorve, L. Kotut, and M. Semaan, “Privacy preserving smart meter data,” 2018.
  • [48] Y. Shen, M. Abubakar, H. Liu, and F. Hussain, “Power quality disturbance monitoring and classification based on improved PCA and convolution neural network for wind-grid distribution systems,” Energies, vol. 12, no. 7, 2019.