Measuring privacy in smart metering anonymized data

02/12/2020 ∙ by Santi Martínez, et al. ∙ Universität Saarland, Universitat de Lleida

In recent years, many proposals have arisen from research on privacy in smart metering. In one of the considered approaches, referred to as anonymization, smart meters transmit fine-grained electricity consumption values in such a way that the energy supplier cannot determine exactly which meter they come from. This paper measures the real privacy provided by such an approach, taking into account that at the end of a billing period the energy supplier collects the overall electricity consumption of each meter for billing purposes. An entropy-based measure is proposed to quantify privacy and to determine the extent to which knowledge of the overall consumption of meters allows anonymous fine-grained consumption values to be re-identified.


1 Introduction

Smart meters measure and transmit information about electricity consumption to the energy supplier at short intervals (every 15 or 30 minutes). Such data allows an accurate prediction of consumption so that production can be managed in real time. The replacement of old analog meters with smart ones is in progress in many developed countries. For instance, the European Union expects to replace 80% of electricity meters by 2020. In Spain, such deployment was almost complete by the end of 2018.

Unfortunately, such fine-grained data about household electricity consumption allows sensitive information to be inferred, such as the types of electrical devices being used, or even the exact times at which people leave or arrive home.

Encryption of electricity readings prior to their transmission protects against external eavesdroppers but not against a malicious energy supplier which may process them for non-authorized purposes. Hence, privacy-preserving solutions for smart metering data transmission should assume that the energy supplier is an untrusted entity.

Solutions providing privacy in smart metering can be classified into the following main classes:

  • Obfuscation-based methods: Controllable batteries and alternative generation devices Kalogridis et al. (2010), or devices with an adjustable consumption Egarter et al. (2014) are placed inside the household so as to alter energy usage curves.

  • Perturbative methods: Meters mask a fine-grained electricity consumption value by adding some random noise to it prior to its transmission to the energy supplier. In this way, the energy supplier gets only an approximation of the real household consumption profiles. The type and magnitude of the added noise have to be accurately tuned to enforce privacy while preserving the accuracy of the transmitted data. The proposal Barbosa et al. (2016) provides differential privacy Dwork (2008) by Laplacian noise addition. The proposal Bohli et al. (2010) is based on adding Gaussian noise.

  • Anonymous methods: Data are transmitted so that the link between electricity readings and the identity of customers is removed. The proposals Efthymiou and Kalogridis (2010); Finster and Baumgart (2013) use pseudonyms for sending high-frequency metering data. In these proposals, each household has a unique pseudonym which is transmitted together with all its consumption values, so that the electricity providers are able to get anonymous consumption patterns (linked to pseudonyms). In Jawurek et al. (2011) it is proven that pseudonymized consumption traces can be re-identified if combined with external indicators. The authors show the high re-identification performance of two attack vectors called anomaly detection and behavior pattern matching. Both attacks are shown to be robust against basic countermeasures like resolution reduction or frequent re-pseudonymization.

    A different approach is taken in Petrlic (2010); Stegelmann and Kesdogan (2012). In these proposals, the electricity consumptions received by the electricity supplier at a given period come without a pseudonym Petrlic (2010) or linked to a pseudonym shared among several meters Stegelmann and Kesdogan (2012). In this way, a given consumption value could come from any meter belonging to the originating community.

  • Aggregation-based methods: The meters are grouped into neighborhoods or communities with a substation which has direct communication with each meter. In each metering period, the meters and the substation engage in a protocol which results in the substation receiving the aggregated (added) readings of all the meters. Such proposals employ homomorphic cryptosystems to ensure that the substation gets no information other than the sum of all the meters' readings in a metering period. This problem, when not focused on smart metering data, is referred to as privacy-preserving aggregation of time-series data Joye and Libert (2013); Rastogi and Nath (2010); Shi et al. (2011).

    The authors in Garcia and Jacobs (2011) consider a neighborhood with $n$ smart meters, $m_1, \dots, m_n$, and a substation. Each meter $m_i$ stores a private key whose related public key is publicly known. Each meter $m_i$ takes its energy consumption $c_i$ at a given period and generates $n$ values $s_{i,j}$, $1 \le j \le n$, satisfying $c_i = \sum_{j=1}^{n} s_{i,j}$. Then each meter $m_i$ encrypts each value $s_{i,j}$, $j \ne i$, under the public key of $m_j$ and transmits the resulting ciphertexts to the substation. The substation homomorphically aggregates all ciphertexts encrypted under the public key of each meter $m_j$ and sends the result to it. Each meter $m_j$ decrypts the received ciphertext and adds $s_{j,j}$ to it. Finally, it sends the result to the substation. The substation gets the aggregated consumption as the sum of the results received from all the meters. The proposal has a high, $O(n^2)$, communication cost per execution.

    The authors in Rastogi and Nath (2010) suggest a method which can be applied to aggregate smart meter readings. There exists a neighborhood public key whose private key is distributed among all the meters. Each meter in a neighborhood encrypts its consumption value using the Paillier cryptosystem under the neighborhood public key. The substation aggregates all the encrypted consumptions and then requests each meter to compute a partial decryption of the resulting ciphertext. From all the partial decryptions, the substation obtains the aggregated consumption. The proposal includes a method to prevent a malicious coalition including the substation and some corrupted meters from getting information about the consumption of honest meters other than their aggregated sum. The authors also include a way to add Laplacian noise to the consumptions so as to get differential privacy. A drawback of this proposal comes from the use of a distributed Paillier cryptosystem, whose setup is very complicated when a trusted dealer is not desirable Nishide and Sakurai (2011). The proposal Busom et al. (2016), enhanced in Garra et al. (2019), describes a similar aggregation mechanism which makes use of the homomorphic property of the ElGamal cryptosystem, with a considerably simpler distributed private key setup.

    The proposals Shi et al. (2011); Xie and Zhang (2015) require the presence of a trusted dealer that generates a set of random values that sum up to zero. As pointed out in Jung et al. (2015), and implemented in Garra et al. (2018), the presence of a trusted dealer during setup can be avoided by making use of a trusted-dealer-free protocol for secure addition like Garcia and Jacobs (2011); Rastogi and Nath (2010); Li et al. (2010). Each meter and the substation receive one of these values. Each meter uses its secret share to encrypt its reading prior to transmitting it. The substation aggregates the received encrypted data and gets the sum of all the readings after solving an easy instance of the discrete logarithm problem. The authors in Shi et al. (2011) extend their basic proposal to provide differential privacy by adding symmetric geometric noise. The proposal is very lightweight and only requires unidirectional meter-to-substation data transmission. Unfortunately, in dynamic scenarios, each time a meter is added or removed, the presence of a trusted dealer or the execution of a trusted-dealer-free protocol for secure addition is required. A similar proposal, which avoids the computation of a discrete logarithm after decryption at the cost of requiring a trusted dealer that generates an RSA modulus $N$, is proposed in Joye and Libert (2013). A proposal equivalent to Joye and Libert (2013) can be found in Borges and Mühlhäuser (2014).
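The zero-sum idea behind these aggregation proposals can be illustrated with a deliberately simplified sketch. It uses plain additive masking modulo a constant for a single metering period, which is not the construction of Shi et al. (2011) (that scheme works in a group and ends with a discrete logarithm); the modulus `Q` and all function names are assumptions made for this illustration.

```python
import secrets

# Hypothetical modulus, assumed to be far larger than any aggregated reading.
Q = 2**61 - 1


def deal_zero_sum_shares(n_meters, q=Q):
    """Trusted-dealer step: n_meters + 1 random values that sum to 0 modulo q."""
    shares = [secrets.randbelow(q) for _ in range(n_meters)]
    shares.append((-sum(shares)) % q)  # last share is kept by the substation
    return shares


def mask_reading(reading, share, q=Q):
    """Meter side: hide one reading behind the meter's secret share."""
    return (reading + share) % q


def aggregate(masked_readings, substation_share, q=Q):
    """Substation side: the shares cancel out, leaving only the sum."""
    return (sum(masked_readings) + substation_share) % q


readings = [312, 87, 1450, 5]  # hypothetical readings in Wh
*meter_shares, substation_share = deal_zero_sum_shares(len(readings))
masked = [mask_reading(r, s) for r, s in zip(readings, meter_shares)]
total = aggregate(masked, substation_share)
```

In this sketch a share protects one reading only; reusing the same shares in a later period would reveal differences between readings, which is one reason the actual proposals derive fresh per-period masks.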

Germany’s information security agency has designed a smart metering system architecture which considers the privacy of consumers. The solution belongs to the anonymous methods paradigm. As discussed in Stegelmann and Kesdogan (2012), the key component of that architecture is a smart metering gateway which is installed in a consumer’s house. The main tasks of a gateway are to collect the readings of local meters and to communicate remotely with the energy service provider and the grid operator. Remote communications can be identifiable or pseudonymised.

Identifiable communications are used for transmitting the sum of electricity consumption values (e.g., once per month) to the service provider for billing purposes. Pseudonymised communications are used for privately sending fine-grained electricity consumption values. First, the gateway generates a message composed of the consumption value together with a pseudonym. That message is encrypted under the grid operator’s public key. Next, the resulting ciphertext is digitally signed and transmitted to a gateway operator which verifies the digital signature, removes it, and forwards the encrypted message to the grid operator. The grid operator finally decrypts the received message and gets the consumption value together with the pseudonym. So as to keep the source of data secret, a given pseudonym must be shared among a certain number of gateways. A pseudonym shared among $k$ gateways is assumed to provide $k$-anonymity, since a dishonest grid operator cannot determine which of the $k$ gateways sharing a pseudonym is the source of a given consumption value.

2 Problem statement

As detailed in the model Bohli et al. (2010), a smart metering application consists of an electricity supplier (ES) and a set of $n$ smart meters $M = \{m_1, \dots, m_n\}$. Let $c_{i,t}$ be the electricity consumption measured by $m_i$ in period $t$. We denote by $T$ the number of time periods included in a billing period. In a scenario in which consumption values are sent sharing the same pseudonym, at the end of a given billing period, the information collected by the ES is:

  • For each meter $m_i$, the sum of its electricity consumption values over the billing period. That is, $C_i = \sum_{t=1}^{T} c_{i,t}$.

  • For each period $t$, the set of pseudonymized electricity consumption values transmitted by all the meters in $M$. We denote these values as $v_{t,1}, \dots, v_{t,n}$. Note that there exists a permutation $\pi_t$ of $\{1, \dots, n\}$, unknown to the ES, so that $c_{i,t} = v_{t,\pi_t(i)}$.

The privacy of customers is preserved as long as each permutation $\pi_t$ stays secret. Hence, the objective of a malicious ES aiming to get information about the consumption habits of a given meter $m_i$ is to determine $\pi_t(i)$ for each time period $t$, so that the consumption values of $m_i$ can be re-identified as $c_{i,t} = v_{t,\pi_t(i)}$.
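The two kinds of information collected by the ES can be made concrete with a small simulation. This is a minimal sketch under stated assumptions: the function name, the nested-list layout and the sample readings are hypothetical and do not come from the paper.

```python
import random


def collect_billing_period(readings, rng):
    """Simulate what the ES holds at the end of a billing period.

    readings[i][t] is the consumption of meter i in period t (integer Wh).
    Returns (totals, anonymized): totals[i] is the billed sum of meter i,
    and anonymized[t] holds the values of period t in an order unrelated
    to the meter identities.
    """
    n_meters = len(readings)
    n_periods = len(readings[0])
    totals = [sum(row) for row in readings]
    anonymized = []
    for t in range(n_periods):
        values = [readings[i][t] for i in range(n_meters)]
        rng.shuffle(values)  # the secret per-period permutation
        anonymized.append(values)
    return totals, anonymized


# Hypothetical readings: 3 meters, 4 periods.
sample = [[120, 80, 300, 45], [60, 75, 90, 400], [10, 500, 20, 35]]
totals, anonymized = collect_billing_period(sample, random.Random(0))
```

The attacker's task described above is to undo the shuffling using only `totals` and `anonymized`.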

2.1 Example

Let us assume a smart metering application composed of three smart meters and a billing period composed of nine time periods. The aggregated consumptions of the meters are , and . The pseudonymized consumption values are shown in Table 1.

Table 1: Pseudonymized consumptions (in Wh).

Now the ES can search for sets of three-element permutations satisfying that , , and . There exist three such solutions shown in Table 2.

Table 2: Solutions to the problem.

All the three possible solutions satisfy that , hence the ES can deduce that . In the same way, the ES also deduces that , , and , so that the consumption of at four periods is revealed. Regarding , the ES can exactly determine six consumption values, namely , , , , , and . For , the ES can infer , , , and . Hence, partial information about the consumption habits of the three smart meters actually leaked.

This example shows that a smart metering application composed of $n$ smart meters sharing the same pseudonym cannot guarantee $n$-anonymity.
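The kind of deduction made in this example can be reproduced by exhaustive search on a toy instance. The sketch below is only an illustration: the instance is hypothetical (it is not the data of Table 1), and the function names are assumptions.

```python
from itertools import permutations, product


def consistent_assignments(totals, anonymized):
    """All choices of one permutation per period that match every billed total.

    Each result is a tuple perms where perms[t][i] is the position, within
    anonymized[t], of the value attributed to meter i in period t.
    """
    n = len(totals)
    periods = range(len(anonymized))
    found = []
    for perms in product(permutations(range(n)), repeat=len(anonymized)):
        sums = [sum(anonymized[t][perms[t][i]] for t in periods) for i in range(n)]
        if sums == list(totals):
            found.append(perms)
    return found


def revealed_values(found, anonymized, meter):
    """Periods in which every consistent assignment gives `meter` the same value."""
    revealed = {}
    for t, values in enumerate(anonymized):
        candidates = {values[perms[t][meter]] for perms in found}
        if len(candidates) == 1:
            revealed[t] = candidates.pop()
    return revealed


# Hypothetical instance: 3 meters, 3 periods, values in Wh.
anonymized = [[10, 20, 30], [5, 5, 40], [7, 8, 9]]
totals = [57, 33, 44]  # meter 0 actually consumed 10, 40 and 7
solutions = consistent_assignments(totals, anonymized)
leaked = revealed_values(solutions, anonymized, meter=0)
```

Any period that shows up in `leaked` is one where the billed totals alone pin down the reading of the chosen meter.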

2.2 Relaxed problem statement

Consider a smart metering application with $n$ meters and a billing period composed of $T$ periods. For each period there exist $n!$ possible $n$-element permutations. Hence the search space for the previous problem has size $(n!)^T$. This makes the problem solvable through exhaustive search only for instances with very small values of $n$ and $T$. In our experiments, even an implementation optimized with dynamic programming failed to provide reasonable running times.

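We consider a relaxed version of the problem in which the ES focuses on just one of the meters, for instance $m_1$. Now, for each time period $t$, the ES only needs the position $\pi_t(1)$ of the value sent by $m_1$, so it reduces its search to sets of integers $\{\pi_1(1), \dots, \pi_T(1)\}$, with $1 \le \pi_t(1) \le n$, satisfying $\sum_{t=1}^{T} v_{t,\pi_t(1)} = C_1$, where $v_{t,j}$ is the $j$-th pseudonymized value received in period $t$ and $C_1$ is the billed total of $m_1$.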

In this case, for each period $t$, there exist $n$ possible values for $\pi_t(1)$ and the search space becomes $n^T$. Although the search space remains exponential, the problem of finding one such solution corresponds to an instance of the multiple-choice subset-sum problem (MCSSP) which, as stated in Pisinger (1999), can be solved in polynomial time when the weights and profits (in our case the consumption values) are bounded by a constant. In our case, we need to find all such solutions, but dynamic programming techniques allow us to efficiently solve even moderately large instances of the problem.
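A dynamic-programming pass over the periods is enough to count the solutions of the relaxed problem without enumerating them. The following is a minimal sketch, not the authors' solver; it assumes non-negative integer readings, and the function name and the sample instance are hypothetical.

```python
def count_solutions(anonymized, target):
    """Number of ways to pick one value per period so that the picks sum to target.

    anonymized[t] lists the non-negative integer values received in period t.
    Equal values at different positions of a period count as different picks.
    """
    ways = {0: 1}  # partial sum -> number of ways to reach it so far
    for values in anonymized:
        next_ways = {}
        for partial, count in ways.items():
            for v in values:
                s = partial + v
                if s <= target:  # readings are non-negative, so prune overshoots
                    next_ways[s] = next_ways.get(s, 0) + count
        ways = next_ways
    return ways.get(target, 0)


# Hypothetical instance: 3 periods with 3 anonymous values each.
periods = [[1, 2, 3], [2, 3, 4], [1, 3, 5]]
n_solutions = count_solutions(periods, 7)  # 4
```

At most `target + 1` partial sums are kept per period, so the work grows with the product of the number of periods, the group size and the billed total, rather than with the exponential number of candidate selections.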

2.3 Measuring privacy

As said before, in a smart metering application with $n$ meters and $T$ time periods, the objective of an attacker aiming to compromise the privacy of $m_1$ is to determine $\pi_t(1)$, the position of the value sent by $m_1$ among the pseudonymized values of period $t$, for each period $t$. From the attacker's point of view, each $\pi_t(1)$ is a random variable whose sample space is $\{1, \dots, n\}$.

When the attacker is able to determine $\pi_t(1)$ exactly, the entropy (uncertainty) on $\pi_t(1)$ is 0 bits. On the other hand, when it has no information at all about it, since there are $n$ possible values for $\pi_t(1)$, its entropy is $\log_2 n$ bits. An attacker with partial information about plausible values for $\pi_t(1)$ will get an entropy ranging between $0$ and $\log_2 n$.
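For reference, the quantity discussed here is the standard Shannon entropy. The notation below is an assumption made for this note: $X_t$ stands for the position, unknown to the attacker, of the targeted meter's value among the $n$ anonymous values of period $t$, and $p_{t,j}$ for the probability the attacker assigns to $X_t = j$.

```latex
H(X_t) = -\sum_{j=1}^{n} p_{t,j} \log_2 p_{t,j},
\qquad 0 \le H(X_t) \le \log_2 n .

% Worked example with n = 4 (maximum entropy \log_2 4 = 2 bits):
% for (p_{t,1}, p_{t,2}, p_{t,3}, p_{t,4}) = (1/2, 1/4, 1/4, 0),
H(X_t) = \tfrac{1}{2}\cdot 1 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 = 1.5 \text{ bits}.
```

Terms with $p_{t,j} = 0$ contribute nothing to the sum, and the upper bound is reached only when all $n$ positions are equally likely.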

2.4 Our approach

In our approach, we search for all the solutions satisfying $\sum_{t=1}^{T} v_{t,\pi_t(1)} = C_1$. Then, for each $\pi_t(1)$, we count the number of times that each value from its range ($\{1, \dots, n\}$) appears in a solution. Finally, we assign to each value a probability proportional to its appearance rate, and compute the entropy of the random variable $\pi_t(1)$.
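The counting step described above does not require listing every solution: prefix and suffix tables of reachable partial sums give, for each period and each position, the number of solutions in which that position is selected. This is a sketch under stated assumptions (non-negative integer readings, hypothetical function names and data), not the authors' implementation.

```python
from math import log2


def reachable_counts(periods, target):
    """tables[k][s] = number of ways to reach sum s using the first k periods."""
    tables = [{0: 1}]
    for values in periods:
        nxt = {}
        for partial, count in tables[-1].items():
            for v in values:
                if partial + v <= target:
                    nxt[partial + v] = nxt.get(partial + v, 0) + count
        tables.append(nxt)
    return tables


def period_entropies(anonymized, target):
    """Entropy (in bits) of the targeted meter's position in each period."""
    n_periods = len(anonymized)
    prefix = reachable_counts(anonymized, target)
    suffix = reachable_counts(anonymized[::-1], target)
    total = prefix[n_periods].get(target, 0)
    if total == 0:
        raise ValueError("no selection of values matches the billed total")
    entropies = []
    for t, values in enumerate(anonymized):
        before = prefix[t]                 # sums over periods 0 .. t-1
        after = suffix[n_periods - t - 1]  # sums over periods t+1 .. end
        entropy = 0.0
        for v in values:  # each position of the period is a separate outcome
            appearances = sum(
                count * after.get(target - v - partial, 0)
                for partial, count in before.items()
            )
            if appearances:
                p = appearances / total
                entropy -= p * log2(p)
        entropies.append(entropy)
    return entropies


# Hypothetical instance: 3 periods, 3 anonymous values each, billed total 7.
entropies = period_entropies([[1, 2, 3], [2, 3, 4], [1, 3, 5]], 7)
# entropies == [1.5, 1.5, 1.0]
```

In this toy instance four selections match the total; in the last period the value 5 never appears in a solution, so only two positions remain plausible and the entropy drops to 1 bit, below the maximum of $\log_2 3 \approx 1.585$ bits.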

2.5 Example

In our example, in its reduced version focused on meter $m_1$, an attacker finds twenty-two solutions to the problem, which are shown in Table 3.

Table 3: Solutions to the relaxed problem.

From these results, we estimate that the probability

(which implies ) is , while (with ) is . In this case, the entropy on is as low as bits.

A higher entropy is obtained for . In that case, is , is , and is . In this case, the entropy we get on is maximum, namely bits.

3 Experimental results

This section shows the results obtained from several experiments over real and synthetic data.

3.1 Experiments over real data

In our experiments we have used data from the set “Electricity Smart Meter CBT” kindly provided by the Irish Social Science Data Archive (ISSDA, https://www.ucd.ie/issda/). That data set contains data collected from several households in minute intervals. Electricity readings are provided in kWh with three-decimal precision, which is equivalent to providing them in Wh.

In order to evaluate the privacy obtained for different group sizes and amount of time periods we have first filtered the data set. The provided file contains the readings of smart meters during about time periods. For each experiment, we have selected a random subset of smart meters , and consecutive time periods.

We generated several problem instances with and . For each instance, we ran our solver so as to find all the solutions focusing on the first smart meter . Then, we computed the entropy at each time period.

Max. entropy
Table 4: Average entropy for experiments over real data.

Table 4 summarizes the results obtained from these experiments. Each column corresponds to a value for $n$. The first row shows the maximum obtainable entropy for each $n$, namely $\log_2 n$ bits. The rest of the rows correspond to different values for $T$.

As it can be seen, a maximum (or almost maximum) entropy is reached in some problem instances, but, in general, the result is about bits below.

Regarding the influence of the amount of time periods, , it seems that this parameter has little influence on the results. In some cases, the highest entropy has been obtained for , while others required . Notice that for , provides a result better than (although in that case was the best).

Other problem instances were analyzed with similar results. Due to the nature of this data set, it is difficult to extract precise conclusions.

Solving instances with very large values for $n$ and $T$ is very difficult due to the high requirements in both time and space. Hence, the adversary may even be unable to obtain a set of possible solutions.

3.2 Experiments over synthetic data

We next detail several experiments conducted over synthetic data. So as to generate synthetic data sets we first inferred the probability distribution of the smart meter readings.

We analyzed the data from a single meter so as to estimate the parameters for several possible distributions. When possible, we used the uniformly minimum-variance unbiased estimators (UMVUE) Voinov and Nikulin (1993); Lam et al. (1994). Then, we used the Cramér-von Mises criterion Cramér (1928); Von Mises (1928) to evaluate the goodness of fit of each of the considered distributions. We repeated this process with data from other smart meters to verify the consistency of results.

Our conclusion was that, in general, smart meter readings follow an exponential distribution (although for some meters the Cramér-von Mises criterion pointed more to a normal distribution).
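The goodness-of-fit step can be sketched with the standard computational form of the Cramér-von Mises statistic, $\frac{1}{12n} + \sum_{i=1}^{n} \left(F(x_{(i)}) - \frac{2i-1}{2n}\right)^2$, where $x_{(i)}$ are the sorted observations and $F$ the candidate cumulative distribution function. The code below is an illustration with hypothetical function names; it fits only a one-parameter exponential (whose mean is estimated by the sample mean) and does not reproduce the authors' full procedure.

```python
from math import exp


def exponential_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1.0 - exp(-x / mean) if x > 0 else 0.0


def cramer_von_mises_statistic(sample, cdf):
    """Cramér-von Mises statistic of `sample` against a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum(
        (cdf(x) - (2 * i - 1) / (2 * n)) ** 2 for i, x in enumerate(xs, start=1)
    )


def fit_exponential(sample):
    """Estimate the mean and score the exponential fit (lower is better)."""
    mean = sum(sample) / len(sample)  # sample mean as estimator of the mean
    statistic = cramer_von_mises_statistic(
        sample, lambda x: exponential_cdf(x, mean)
    )
    return mean, statistic
```

Because the parameter is estimated from the same sample, the statistic is best used here to rank candidate distributions against each other rather than to read off p-values from tables for a fully specified distribution.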

Thus, we created our synthetic problem instances as follows: The first smart meter (the one we are trying to identify) follows an exponential distribution with being the mean of the distribution. Each of the remaining meters follows an exponential distribution , so that their readings have a mean value of . Several problem instances were generated at random under the mentioned distributions.
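Generating such instances takes only a few lines. This is a minimal sketch: the function name, the rounding of readings to whole Wh and the example means are assumptions for illustration, not the parameters used in the experiments.

```python
import random


def synthetic_instance(n_meters, n_periods, target_mean, others_mean, rng):
    """Exponentially distributed integer readings (Wh).

    Row 0 is the targeted meter, drawn with mean target_mean; every other
    row is drawn with mean others_mean.
    """
    means = [target_mean] + [others_mean] * (n_meters - 1)
    return [
        # expovariate takes the rate, i.e. the inverse of the mean
        [round(rng.expovariate(1.0 / mean)) for _ in range(n_periods)]
        for mean in means
    ]


# Hypothetical parameters: the targeted meter has half the mean of the others.
instance = synthetic_instance(
    n_meters=8, n_periods=48, target_mean=250.0, others_mean=500.0,
    rng=random.Random(1),
)
```

Passing an explicit `random.Random` instance keeps each generated problem reproducible, which makes it easier to compare entropy results across parameter settings.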

Table 5: Average entropy for experiments over synthetic data for
Table 6: Average entropy for experiments over synthetic data for
Table 7: Average entropy for experiments over synthetic data for
Table 8: Average entropy for experiments over synthetic data for
Table 9: Average entropy for experiments over synthetic data for

Tables 5, 6, 7, 8, 9 show the results obtained from these new experiments.

In each table, the data of the first smart meter was generated under a different value for . Parameter was in Table 5, in Table 6, in Table 7, in Table 8, and in Table 9.

The results were similar to those obtained from real data from the set provided by the ISSDA. As before, except for the extreme case with only smart meters, the number of periods had little effect on the results. It seems, however, that has a larger tendency to produce the highest entropy, but that was not the case for larger values of . This is especially noticeable in Table 9.

The experiments show that the attainable entropy depends on the value of . It is higher when the distribution of the targeted meter is close to that of the remaining meters, and decreases as its distribution differs from the others.

When (Table 7), which corresponds to the case in which all the meters follow the same distribution, the highest privacy is obtained, as a maximum or almost maximum entropy for all combinations of and .

When the mean of the targeted meter readings is half of (Table 6, ) or doubles (Table 8, ) the mean of the others, the highest attainable entropy is about bits below the maximum.

For (Table 5) and (Table 9), the highest attainable entropy is about bits below the maximum for , but as increases, the gap increases to about bit. In the case for and , the highest entropy (obtained with ) is , which is bits below the maximum.

Our experiments allow us to conclude that, when defining the household groups, we should select their size, $n$, considering the desired level of privacy and assuming that the entropy will be about 1 bit below the maximum ($\log_2 n$).

The number of time intervals $T$ cannot be chosen, as it depends on the billing period, but in general it will tend to take high values (one-month billing periods are quite usual).

4 Conclusion

In this paper, an entropy-based measure for quantifying the real privacy provided by anonymous privacy-preserving smart metering methods has been proposed. As the problem in its original formulation has proven too hard to solve, we have performed several experiments on a relaxed formulation of it.

The experiments have shown that the attained privacy, measured as entropy, is about one bit below the theoretical maximum. Privacy increases when all the electricity readings of the involved meters are similar.

Acknowledgments

This study was funded by the European Regional Development Fund of the European Union in the scope of the “Programa Operatiu FEDER de Catalunya 2014–2020” (project number COMRDI16-1-0060), by the Spanish Ministry of Science, Innovation and Universities (project number MTM2017-83271-R), and by the Federal Ministry for Economic Affairs and Energy of Germany in the SINTEG project DESIGNETZ (project number 03SIN224).

References

  • Barbosa et al. (2016) Barbosa, P., Brito, A., and Almeida, H. (2016). A technique to provide differential privacy for appliance usage in smart metering. Information Sciences, 370–371:355–367.
  • Bohli et al. (2010) Bohli, J., Sorge, C., and Ugus, O. (2010). A privacy model for smart metering. In 2010 IEEE International Conference on Communications Workshops, pages 1–5.
  • Borges and Mühlhäuser (2014) Borges, F. and Mühlhäuser, M. (2014). Eppp4sms: Efficient privacy-preserving protocol for smart metering systems and its simulation using real-world data. IEEE Transactions on Smart Grid, 5(6):2701–2708.
  • Busom et al. (2016) Busom, N., Petrlic, R., Sebé, F., Sorge, C., and Valls, M. (2016). Efficient smart metering based on homomorphic encryption. Computer Communications, 82:95–101.
  • Cramér (1928) Cramér, H. (1928). On the composition of elementary errors. Scandinavian Actuarial Journal, 1928(1):13–74.
  • Dwork (2008) Dwork, C. (2008). Differential privacy: A survey of results. In Agrawal, M., Du, D., Duan, Z., and Li, A., editors, Theory and Applications of Models of Computation, pages 1–19, Berlin, Heidelberg. Springer Berlin Heidelberg.
  • Efthymiou and Kalogridis (2010) Efthymiou, C. and Kalogridis, G. (2010). Smart grid privacy via anonymization of smart metering data. In 2010 First IEEE International Conference on Smart Grid Communications, pages 238–243.
  • Egarter et al. (2014) Egarter, D., Prokop, C., and Elmenreich, W. (2014). Load hiding of household’s power demand. In 2014 IEEE International Conference on Smart Grid Communications (SmartGridComm), pages 854–859. IEEE.
  • Finster and Baumgart (2013) Finster, S. and Baumgart, I. (2013). Pseudonymous smart metering without a trusted third party. In 2013 12th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pages 1723–1728.
  • Garcia and Jacobs (2011) Garcia, F. D. and Jacobs, B. (2011). Privacy-friendly energy-metering via homomorphic encryption. In Cuellar, J., Lopez, J., Barthe, G., and Pretschner, A., editors, Security and Trust Management, pages 226–238, Berlin, Heidelberg. Springer Berlin Heidelberg.
  • Garra et al. (2019) Garra, R., Leibenger, D., Miret, J. M., and Sebé, F. (2019). Repairing an aggregation-based smart metering system. International Journal of Information Security, 18(5):637–646.
  • Garra et al. (2018) Garra, R., Martínez, S., Miret, J. M., and Sebé, F. (2018). Improving a smart metering system using elliptic curves and removing the trusted dealer. In XV Reunión Española sobre Criptología y Seguridad de la Información, pages 62–65.
  • Jawurek et al. (2011) Jawurek, M., Johns, M., and Rieck, K. (2011). Smart metering de-pseudonymization. In Proceedings of the 27th Annual Computer Security Applications Conference, ACSAC ’11, pages 227–236, New York, NY, USA. ACM.
  • Joye and Libert (2013) Joye, M. and Libert, B. (2013). A scalable scheme for privacy-preserving aggregation of time-series data. In Sadeghi, A.-R., editor, Financial Cryptography and Data Security, pages 111–125, Berlin, Heidelberg. Springer Berlin Heidelberg.
  • Jung et al. (2015) Jung, T., Li, X., and Wan, M. (2015). Collusion-tolerable privacy-preserving sum and product calculation without secure channel. IEEE Transactions on Dependable and Secure Computing, 12(1):45–57.
  • Kalogridis et al. (2010) Kalogridis, G., Efthymiou, C., Denic, S. Z., Lewis, T. A., and Cepeda, R. (2010). Privacy for smart meters: Towards undetectable appliance load signatures. In 2010 First IEEE International Conference on Smart Grid Communications, pages 232–237.
  • Lam et al. (1994) Lam, K., Sinha, B. K., and Wu, Z. (1994). Estimation of parameters in a two-parameter exponential distribution using ranked set sample. Annals of the Institute of Statistical Mathematics, 46(4):723–736.
  • Li et al. (2010) Li, F., Luo, B., and Liu, P. (2010). Secure information aggregation for smart grids using homomorphic encryption. In 2010 First IEEE International Conference on Smart Grid Communications, pages 327–332.
  • Nishide and Sakurai (2011) Nishide, T. and Sakurai, K. (2011). Distributed Paillier cryptosystem without trusted dealer. In Chung, Y. and Yung, M., editors, Information Security Applications, pages 44–60, Berlin, Heidelberg. Springer Berlin Heidelberg.
  • Petrlic (2010) Petrlic, R. (2010). A privacy-preserving concept for smart grids. Sicherheit in vernetzten Systemen, 18:B1–B14.
  • Pisinger (1999) Pisinger, D. (1999). Linear time algorithms for knapsack problems with bounded weights. Journal of Algorithms, 33(1):1–14.
  • Rastogi and Nath (2010) Rastogi, V. and Nath, S. (2010). Differentially private aggregation of distributed time-series with transformation and encryption. In Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD ’10, pages 735–746, New York, NY, USA. ACM.
  • Shi et al. (2011) Shi, E., Chan, H., Rieffel, E., Chow, R., and Song, D. (2011). Privacy-preserving aggregation of time-series data. In Annual Network & Distributed System Security Symposium (NDSS). Citeseer.
  • Stegelmann and Kesdogan (2012) Stegelmann, M. and Kesdogan, D. (2012). Gridpriv: A smart metering architecture offering k-anonymity. In 2012 IEEE 11th International Conference on Trust, Security and Privacy in Computing and Communications, pages 419–426.
  • Voinov and Nikulin (1993) Voinov, V. and Nikulin, M. (1993). Unbiased Estimators and Their Applications. Volume 1: Univariate Case, volume 263 of Mathematics and Its Applications. Springer Netherlands.
  • Von Mises (1928) Von Mises, R. (1928). Wahrscheinlichkeit, Statistik und Wahrheit. Number v. 3 in Schriften zur wissenschaftlichen Weltauffassung. J. Springer.
  • Xie and Zhang (2015) Xie, C. and Zhang, R. (2015). Privacy-preserving power consumption data measuring protocol for smart grid. In International Conference on Computer Information Systems and Industrial Applications. Atlantis Press.