A massive amount of fine-grained electricity consumption data is being collected by smart meters. Identifying the load patterns from these smart meter data, i.e., residential load profiling, helps retailers and distribution network operators (DSOs) better understand the consumption behavior of consumers. For example, retailers can provide personalized tariffs for different types of consumers, and DSOs can perform detailed voltage simulation or micro-grid operation of the distribution network based on the identified load patterns.
Ideally, residential load profiling is carried out on a very large and diverse data set to capture all the different types of customers and behaviors. Such a diverse data set is particularly important for retailers and third-party providers, as they wish to design diversified electricity products to attract new consumers. However, residential load data are only monitored or collected by the corresponding retailers, i.e., each retailer only has the data of the consumers it serves. No center has access to all the smart meter data. Besides, since the smart meter data contain highly private information about the consumers, data sharing between retailers is not allowed. Thus, a privacy-preserving distributed clustering scheme is required, in which retailers can cooperate with others to jointly obtain the clustering results on their union consumption data set via local calculation and communication. During the cooperation, the information of each retailer, e.g., the raw data or the number of consumers, cannot be deduced by others.
So far, various clustering algorithms have been applied to load profiling, such as hierarchical clustering using different linkages, CFSFDP, k-means, the fuzzy C-means algorithm (FCA), the Gaussian mixture model (GMM) [22], etc. However, to the best of our knowledge, there is no relevant research on privacy-preserving distributed clustering for load profiling.
To bridge this gap, this paper proposes a privacy-preserving distributed clustering framework for load profiling. This framework can be used to transform three commonly used clustering methods, i.e., k-means, FCA, and GMM, into distributed clustering algorithms for the purpose of privacy-preserving load profiling. Among these three methods, k-means is a ‘hard’ clustering method that delivers deterministic clustering results, while FCA and GMM are ‘soft’ methods that provide, respectively, a degree or a probability of membership of each observation in each cluster, which can be leveraged to observe overlapping clusters or uncertain cluster memberships.
In fact, many works on privacy-preserving clustering have been conducted in other fields, such as marketing and medicine. Among them, cryptography-based methods are the most commonly used. These methods use secure multiparty computation [15, 5], homomorphic encryption [33, 11], or a combination of both to turn the clustering methods into privacy-preserving k-means [33, 26], privacy-preserving FCA, or privacy-preserving GMM [5, 11]. However, methods based on secure multiparty computation are extremely computationally expensive. Besides, the encryption overhead of homomorphic encryption limits the scope of the corresponding clustering methods and results in time-consuming computations. To reduce overheads, secret sharing can be adopted to design privacy-preserving k-means clustering [29, 21]. However, these secret-sharing-based methods, like the aforementioned cryptography-based methods, are not fully distributed algorithms, because each party (a data owner, such as a retailer in this paper) has to interact with a data center [33, 29, 15], communicate with all the other parties [21, 26], or share its information along a pre-selected information transmission path [5, 11, 6]. These algorithms have the following drawbacks: (1) the existence of a data center or a preset information sharing path greatly increases the risk of a single-point or single-line failure; (2) full communication between any two parties results in low scalability.
The proposed privacy-preserving distributed clustering framework aims to solve the above issues. We first perform a commonality analysis of the traditional k-means, FCA, and GMM, and point out that the key to the clustering framework lies in how to calculate the summation of retailers’ private information in a fully distributed and privacy-preserving way. The average consensus (AC) algorithm, an important fully distributed computing method in the automatic control area, provides the means to achieve the summation. However, its slow convergence towards the average is the major deficiency of this algorithm. Besides, the AC algorithm reveals the retailers’ private information during the interaction between neighbors. Therefore, we first introduce an accelerated AC (AAC) algorithm to significantly improve the rate of convergence without sacrificing the simplicity of the original AC algorithm. Then, we adapt the AAC algorithm to provide a privacy-preserving version by leveraging the exponentially decaying disturbance with zero-sum property proposed in the literature. The convergence of the proposed privacy-preserving AAC (PP-AAC) algorithm is also proved. After that, we develop the privacy-preserving distributed clustering framework based on the proposed algorithm. This framework can convert the traditional k-means, FCA, and GMM into fully distributed privacy-preserving clustering methods, where each retailer only needs to communicate with its neighbors to obtain the exact load pattern identification results of all the consumers. Finally, we provide the privacy and complexity analyses of the proposed framework.
This paper makes the following contributions:
- Propose a privacy-preserving distributed clustering framework for load profiling. This framework is based on an original PP-AAC algorithm, which is theoretically proven to be convergent.
- Provide the privacy and complexity analyses of the proposed framework theoretically and practically. Results show that this framework not only protects the data privacy of retailers but also greatly reduces the computational overhead.
- Develop the privacy-preserving distributed k-means, FCA, and GMM clustering methods using the proposed framework. These methods are applied to identify electrical load patterns, and their results are the same as those of the centralized clustering methods.
To the best of our knowledge, this is the first time that the electrical load data has been analyzed using privacy-preserving distributed clustering methods.
The rest of this paper is organized as follows. Section II analyzes the commonality of k-means, FCA, and GMM. The PP-AAC algorithm is proposed in Section III. Section IV develops the privacy-preserving distributed clustering framework for the three clustering methods. Case studies are provided in Section V, and Section VI concludes this paper.
II Problem Formulation
This section first briefly reviews the standard clustering methods: k-means , FCA , and GMM , and then gives the commonality analyses of them. Before that, we assume that the union data set consists of observations. These observations are distributed among retailers, where retailer has consumers, i.e., observations. Besides, the centroid of cluster , described by , is considered as the -th load pattern of the union data set.
K-means partitions observations into clusters by minimizing the within-cluster variances as follows:
where is the -th observation of retailer . represents the index set of the observations belonging to cluster .
Although finding the optimal solution is NP-hard, Lloyd’s algorithm is guaranteed to converge to a local minimum, typically within a few iterations. First, the initial cluster centroids are randomly assigned. Then, in each iteration, the cluster index of is computed by
and the centroid of cluster is updated by
sequentially. These two steps are repeated until convergence is achieved. Note that equals if and otherwise.
FCA is the best-known method for fuzzy clustering with the objective function given as follows:
where is the fuzziness index and is the degree to which belongs to . The following iterative procedure solves this problem: the degree to which the observation belongs to cluster is first calculated by
Then, the centroid of cluster is updated by
Different from k-means, where each observation either belongs to a cluster or not, FCA assigns degrees for each observation to be in every cluster, i.e., FCA is a type of soft clustering.
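For reference, the standard fuzzy C-means updates can be written as below. The notation is an assumption made here for illustration ($x_i$ for an observation, $\mu_k$ for the centroid of cluster $k$, $u_{ik}$ for the membership degree, $q > 1$ for the fuzziness index, $K$ for the number of clusters) and may differ from the symbols used in the numbered equations of this paper.

```latex
u_{ik} = \left[ \sum_{j=1}^{K}
  \left( \frac{\lVert x_i - \mu_k \rVert}{\lVert x_i - \mu_j \rVert} \right)^{\frac{2}{q-1}}
\right]^{-1},
\qquad
\mu_k = \frac{\sum_{i} u_{ik}^{\,q}\, x_i}{\sum_{i} u_{ik}^{\,q}} .
```

Written this way, the centroid update is a ratio of two sums over observations, so each sum can be split into per-retailer partial sums.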
As a convex combination of Gaussian components with weight and covariance , GMM is given by
where each Gaussian component represents a cluster.
To divide the union data set into clusters by GMM, one should train the GMM by maximum likelihood estimation, which is given as follows:
The most commonly used maximum likelihood estimation method is the expectation-maximization (EM) algorithm, which can be summarized as two iterative steps: the E-step and the M-step. The E-step, as given in
computes the probability that an observation belongs to cluster . The M-step updates the parameters in (9) according to
II-D Commonality Analysis
The clustering procedures of k-means, FCA, and GMM have two points in common, which are listed in Remarks 2.1 and 2.2.
Remark 2.1: The clustering processes of k-means, FCA, and GMM can all be summarized in two parts: the local calculation part and the global calculation part, where the local one can be performed by each retailer, and the global one is essentially the summation of each retailer’s local calculation results.
In fact, each retailer can directly perform the first steps of the three algorithms via its own data, i.e., the calculation in (1), (5) or (10). Then, retailer is able to compute the following local results :
depending on the algorithm used. Once each retailer obtains the local results, the global summation of those local results from all retailers is required to continue the clustering method. For example, k-means algorithm needs to sum the local results and of all retailers respectively to update the centroid of cluster in (2). Let be the global summation result, then we have:
Therefore, the relationship between the local and the global calculation parts can be generalized to:
where can be calculated by each retailer locally using (17), while the computation of requires cooperation among all retailers. Once in (18) is obtained, the second steps of the three algorithms can be carried out and the iterative procedure continues.
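The local/global split described in Remark 2.1 can be illustrated for k-means with a minimal sketch. The function names (`local_kmeans_results`, `global_centroid_update`), the synthetic data, and the equal three-way split among retailers are assumptions for illustration only. The summation is done directly here, without any privacy protection, purely to show that summing per-retailer results reproduces the centralized centroid update.

```python
import numpy as np

def local_kmeans_results(X, centroids):
    """Local part: assign each local observation to its nearest centroid and
    return the per-cluster sums and counts computed from local data only."""
    # Squared Euclidean distance from every observation to every centroid.
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    onehot = np.eye(centroids.shape[0])[labels]  # shape (n_local, K)
    sums = onehot.T @ X                          # shape (K, d)
    counts = onehot.sum(axis=0)                  # shape (K,)
    return sums, counts

def global_centroid_update(local_results, old_centroids):
    """Global part: add up the local sums and counts, then divide."""
    total_sums = sum(s for s, _ in local_results)
    total_counts = sum(c for _, c in local_results)
    new_centroids = old_centroids.copy()
    nonempty = total_counts > 0                  # keep old centroid if a cluster is empty
    new_centroids[nonempty] = total_sums[nonempty] / total_counts[nonempty, None]
    return new_centroids

# Synthetic union data set, split among three hypothetical retailers.
rng = np.random.default_rng(0)
X_union = rng.normal(size=(300, 4))
retailer_data = np.split(X_union, 3)
centroids = X_union[:3].copy()

distributed = global_centroid_update(
    [local_kmeans_results(X, centroids) for X in retailer_data], centroids)
centralized = global_centroid_update(
    [local_kmeans_results(X_union, centroids)], centroids)
```

Both `distributed` and `centralized` hold the same centroids up to floating-point rounding, which is why only the summation step needs to be made distributed and privacy-preserving.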
Remark 2.2: Each retailer’s local calculation results from k-means, FCA, and GMM contain private information, so that retailer will refuse to share its with others.
In fact, if retailer shares its () with retailer , the latter can derive the following private information of retailer :
II-D1 The number of retailer ’s consumers
II-D2 The proportion or number of retailer ’s consumers belonging to cluster
Once retailer receives , it will also obtain the proportion of retailer ’s consumers belonging to cluster by:
Particularly, retailer can directly know the specific number of retailer ’s consumers belonging to cluster by receiving in (4).
II-D3 Retailer ’s local load pattern of cluster
which will reveal the approximate load pattern of retailer . For example, we choose in (3) and in (4), then is essentially the mean of retailer ’s observations belonging to cluster , which can be considered as its approximate load pattern in cluster . The approximation lies in the fact that and are calculated using the global centroid in (2) in the last iteration, not the local centroid of retailer in the last iteration; otherwise it will be the exact load pattern based on retailer ’s data set.
Definition 2.3: We define the “privacy” of retailer () as the information set .
III PP-AAC Algorithm
To achieve a distributed summation algorithm, this section first introduces an AAC algorithm with a fast convergence rate. After that, we further improve the AAC algorithm by leveraging an exponentially decaying disturbance with zero-sum property to propose a PP-AAC algorithm. Finally, the convergence of the proposed algorithm is proved.
III-A AAC Algorithm
The AAC algorithm is graph-theory-based. Therefore, we consider a graph consisting of nodes and edges. Each node represents a retailer, and an edge between a pair of nodes means that there is bidirectional noise-free communication between the two retailers. This graph is publicly known by all retailers. Denote the node set by and the edge set by . The neighborhood of retailer is represented by , and the degree of retailer is denoted by . Let be the Metropolis weight matrix with elements as follows:
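As an illustration, a Metropolis weight matrix can be built from an edge list as sketched below. The sketch assumes the commonly used Metropolis rule, in which the weight on an edge is one over one plus the larger of the two endpoint degrees and each diagonal entry absorbs the remainder of its row. The function name `metropolis_weights` and the four-node example graph are hypothetical.

```python
import numpy as np

def metropolis_weights(n_nodes, edges):
    """Build a symmetric weight matrix using the common Metropolis rule
    (an assumed form): w_ij = 1 / (1 + max(d_i, d_j)) on each edge, with
    the diagonal chosen so that every row sums to one."""
    degree = np.zeros(n_nodes, dtype=int)
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    W = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1 + max(degree[i], degree[j]))
    # The diagonal is still zero here, so W.sum(axis=1) is the off-diagonal row sum.
    W[np.diag_indices(n_nodes)] = 1.0 - W.sum(axis=1)
    return W

# Hypothetical undirected communication graph with four retailers.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
W = metropolis_weights(4, edges)
```

Each retailer only needs its own degree and the degrees of its neighbors to fill in its row, and the resulting matrix is symmetric with rows and columns summing to one.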
In the AAC algorithm, each retailer has a state value that will be updated through iterations. Let be the state of retailer in the AAC algorithm, then the state update equation of the AAC algorithm in the -th iteration is given by
which is a convex combination of the value from the original AC algorithm and the predictor given respectively by
The matrix form of the update is given as follows:
where , and is the identity matrix. We call  the accelerated Metropolis weight matrix.
In this way, will converge to the mean of all retailers’ initial standardized state values
with the fastest asymptotic worst-case convergence rate if the weighted coefficient equals the optimal value :
is the smallest eigenvalue of, and is the second largest eigenvalue of . Since the graph is publicly known by all retailers, each retailer can easily compute using (20). Then, can be obtained by all retailers using (24).
Note that the AAC algorithm is fully distributed, i.e., each retailer only needs to communicate with its neighbors. Besides, after convergence, retailers can obtain the summation of their initial standardized state values by multiplying the mean in (26) by . Thus, let be equal to , then each retailer can obtain in (18) in a fully distributed manner using the AAC algorithm. However, in the first iteration, retailer will send to its neighbors, which directly reveals the private information of retailer .
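The idea of obtaining a network-wide sum from repeated neighbor averaging can be sketched as follows. This is a minimal sketch under stated assumptions, not the exact update of this paper: the accelerated matrix is formed with an assumed two-tap linear predictor with weights (-1, 2), which gives `W_alpha = (1 + alpha) * W - alpha * I`. The value `alpha = 0.5` is hand-picked for this toy four-node path graph, and the function name `consensus_sum` is hypothetical. Setting `alpha = 0` recovers the plain AC iteration.

```python
import numpy as np

def consensus_sum(W, x0, alpha=0.0, n_iter=200):
    """Iterate x <- W_alpha x and return n times the state, i.e. each
    node's estimate of the sum of all initial values."""
    n = len(x0)
    # Assumed accelerated matrix from a two-tap predictor with weights (-1, 2).
    W_alpha = (1 + alpha) * W - alpha * np.eye(n)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_iter):
        x = W_alpha @ x  # node i only needs the states of its neighbors
    return n * x

# Metropolis weights of a four-node path graph 0-1-2-3 (toy example).
W = np.array([[2/3, 1/3, 0.0, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [0.0, 0.0, 1/3, 2/3]])
x0 = np.array([1.0, 2.0, 3.0, 4.0])

# Worst-node error of the sum estimate after 20 iterations.
err_plain = np.abs(consensus_sum(W, x0, alpha=0.0, n_iter=20) - x0.sum()).max()
err_accel = np.abs(consensus_sum(W, x0, alpha=0.5, n_iter=20) - x0.sum()).max()
```

On this toy graph the accelerated iteration reaches a smaller error than the plain one after the same number of iterations, and both recover the exact sum when run long enough. Note that the very first product uses the unmasked `x0`, which is the privacy leak discussed in the text.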
III-B PP-AAC Algorithm
To endow the AAC algorithm with privacy-preserving characteristics, we utilize the exponentially decaying disturbance with zero-sum property from  to mask the state values exchanged among neighbors during the AAC iterations, so that no retailer can derive the private information of the others.
The proposed PP-AAC algorithm is defined by
where is the state value masked by the disturbance as follows:
The noise is randomly selected from by retailer , where , , and . This design leads to the two features of , which will be used for the following proof of Theorem 3.1:
The noise is exponentially decaying as and grows with the number of iterations. So is also exponentially decaying.
The disturbance has zero-sum property, which means that if we sum up from to infinity (or to a relatively large number), the result will be close to 0, i.e.,
Proof: See Appendix.
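The two features of the disturbance can be illustrated with a short sketch. Everything in it is an assumption for illustration: the noise is built as a telescoping difference of decaying uniform draws (one common construction with the zero-sum property), the parameters `scale` and `phi` and the function names are hypothetical, and the plain weight matrix is used instead of the accelerated one to keep the example short.

```python
import numpy as np

def zero_sum_decaying_noise(rng, n_nodes, n_iter, scale=10.0, phi=0.6):
    """Assumed construction: theta(0) = nu(0) and
    theta(t) = phi**t * nu(t) - phi**(t-1) * nu(t-1), with nu(t) uniform in
    [-scale/2, scale/2]. The partial sums telescope to phi**t * nu(t)."""
    nu = rng.uniform(-scale / 2, scale / 2, size=(n_iter, n_nodes))
    decayed = nu * (phi ** np.arange(n_iter))[:, None]
    theta = decayed.copy()
    theta[1:] -= decayed[:-1]
    return theta

def masked_consensus_sum(W, x0, theta):
    """Each node adds its own disturbance before sharing its state.
    Returns the sum estimates and the values that were actually shared."""
    n = len(x0)
    x = np.asarray(x0, dtype=float)
    shared = []
    for noise in theta:
        x_masked = x + noise        # what the neighbors get to see
        shared.append(x_masked.copy())
        x = W @ x_masked            # averaging step on masked values
    return n * x, np.array(shared)

# Metropolis weights of a four-node path graph 0-1-2-3 (toy example).
W = np.array([[2/3, 1/3, 0.0, 0.0],
              [1/3, 1/3, 1/3, 0.0],
              [0.0, 1/3, 1/3, 1/3],
              [0.0, 0.0, 1/3, 2/3]])
x0 = np.array([1.0, 2.0, 3.0, 4.0])
rng = np.random.default_rng(1)
theta = zero_sum_decaying_noise(rng, n_nodes=4, n_iter=200)
estimate, shared = masked_consensus_sum(W, x0, theta)
```

Because each node's disturbances sum to approximately zero over the iterations, `estimate` still converges to the true sum of `x0`, while the first shared values in `shared[0]` differ from the true initial values.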
IV Privacy-preserving Distributed Clustering Framework
This section describes the privacy-preserving distributed clustering framework for k-means, FCA, and GMM incorporating the proposed PP-AAC algorithm. In addition, we provide the privacy and complexity analyses of the proposed framework.
IV-A Clustering Framework
The idea of the clustering framework is that independent of the employed clustering method, in every iteration, each retailer first performs its local calculation according to (17); then each retailer sets its local result as the initial state of the proposed PP-AAC algorithm; after convergence, each retailer obtains the global summation of all the local results in (18); finally, using the global summations, each retailer can perform the rest of the clustering method to update the global information, e.g., the centroids of all clusters. The detailed clustering framework is demonstrated in Algorithm 1.
IV-B Privacy Analysis
As mentioned above, the AAC algorithm directly reveals the initial value in the first iteration. In contrast, in the first iteration of the proposed PP-AAC algorithm, retailer () receives () instead of . Since is masked using an independent disturbance by retailer , retailer cannot derive the original value of from ; thus retailer will not learn the private of its neighbors, which protects the private information of retailer . In the remaining iterations, the process of adding disturbance continues; meanwhile, begins to converge to the mean value in (26) and moves away from its initial value, which further masks the true initial value. Quantitative illustrations will be shown in the next section.
In addition, we should note that if and , i.e., retailer can receive all the information that retailer has received, including retailer ’s information, then retailer can deduce retailer ’s initial value even if the disturbance is introduced. Therefore, the authors in  and  both consider it necessary to assume that retailer cannot receive all the information that retailer has. This assumption is also adopted in this paper. Since is publicly known by all retailers, retailer can tell whether is a subset of its neighbor’s . If such a situation occurs, retailer can refuse to communicate with retailer . Therefore, the assumption will hold in practice.
IV-C Complexity Analysis
For the distributed framework, we investigate each retailer’s computation and communication overhead.
The proposed clustering framework not only keeps all the multiplication calculations in the original clustering methods, but also introduces new multiplication calculations by integrating the proposed PP-AAC algorithm. The multiplication calculations in the original clustering methods are divided among retailers according to their number of observations, i.e., if the computation overhead of the original clustering method is , then the overhead of retailer is . Moreover, in each iteration of the PP-AAC algorithm, although the disturbance can be queried from the preset lookup table, retailer () still needs to compute and (), which requires multiplications. Let denote the iteration number of the selected clustering method, and represent the iteration number of the proposed AAC algorithm, then the computation overhead of retailer is . Taking k-means as an example, where is , retailer ’s overhead is . Please note that , because the number of retailers in a distribution network (DN) is small, and the proposed PP-AAC algorithm’s convergence is accelerated, thus is generally also small. However, is in the thousands and . Moreover, we know that . Therefore, the computation overhead of retailer is significantly smaller than that of the centralized k-means. Detailed illustrations are shown in the next section.
Besides, in each iteration of the proposed AAC algorithm, the communication number of retailer is . Therefore, the communication overhead of retailer is .
V Case Study
V-A Data Description and Experiment Setup
We utilize the smart meter data from Ireland for verification, which contains 509660 half-hourly daily electrical consumption observations of 1000 consumers . The representative load profile (RLP) of each consumer is obtained via the method presented in . Thus we get the union data set consisting of 1000 48-dimensional RLPs. For the verification of the proposed PP-AAC algorithm and the clustering framework, e.g., the correctness, the efficiency, the privacy-preserving feature, and the effectiveness, we assume that there are 10 retailers in a DN, and each of them has access to 100 consumers. Their initial communication topology is shown in Fig. 1, where each retailer only communicates with its one-hop neighbors, and retailer () cannot receive all the information that any of its neighbors has. We also use different topologies to investigate the trend of the computational cost of the proposed clustering framework with respect to different topologies. Besides, we set and for randomly selecting the disturbance. Meanwhile, the initial centroids for all clustering methods are randomly chosen.
V-B Verification of the PP-AAC Algorithm
To verify the correctness and efficiency of the proposed PP-AAC algorithm, we compare it with three algorithms: the original AC algorithm in , the AAC algorithm proposed in  and the PP-AC algorithm proposed in . We use the four algorithms to compute the summation of the observations from each retailer’s first consumer. We then illustrate the average error of all retailers relative to the accurate summation result. The errors of the four algorithms for each iteration are shown in Fig. 2. It can be observed that the average error of the proposed PP-AAC algorithm converges to 0, indicating the correctness of this method. In addition, the proposed algorithm has the same convergence rate as the AAC algorithm. The PP-AC algorithm also has the same convergence rate as the AC algorithm. Please note that the proposed algorithm converges faster than both the AC and the PP-AC algorithm, indicating the efficiency of the proposed algorithm. Therefore, the correctness and efficiency of the proposed PP-AAC algorithm are verified.
Compared to the AC algorithm and the AAC algorithm, the proposed algorithm also has the privacy-preserving feature. To illustrate this feature, we provide the value that retailer 1 shares with its neighbors during the above summation calculation at each iteration. The shared values of the four algorithms are shown in Fig. 3.
These shared values all converge to the real average value, but we should note that retailer 1 shares its real initial value with its neighbors in the first iteration when performing the AC and the AAC algorithm, which directly reveals the private information of retailer 1. However, after introducing the disturbance for masking, the proposed algorithm enables retailer 1 to share its masked initial value with its neighbors, which is far away from the real one as indicated by the black arrow. Thus, the proposed algorithm protects the privacy of retailer 1. Moreover, the proposed algorithm still converges faster than the PP-AC algorithm, even though they both start from the same masked initial point.
V-C Verification of the Proposed Clustering Framework
We can employ the proposed clustering framework to obtain privacy-preserving distributed k-means, FCA, and GMM clustering methods. Then, we use them for load pattern identification on the distributed data sets. As benchmarks, we also use the centralized k-means, FCA, and GMM for load pattern identification on the corresponding union data set.
To verify the correctness of the clustering framework, in Fig. 4, we use the Silhouette coefficient index (SCI) to evaluate the above distributed and centralized algorithms for different numbers of clusters. Note that the abbreviation ‘PPD’ in Fig. 4 represents ‘privacy-preserving distributed’. This figure clearly shows that the SCI results of the proposed privacy-preserving distributed algorithms are identical to those of the centralized algorithms. This means that the clustering results on the distributed data sets using the proposed clustering framework are exactly the same as those on the union data set computed via the centralized methods, indicating the correctness of the proposed clustering framework.
To verify the effectiveness of the clustering framework, we choose k-means for demonstration, as it is a hard clustering method and therefore convenient to illustrate. We use the most common criterion, i.e., the sum of squared errors (SSE), to find the optimal number of clusters. From this, we find that the optimal cluster number of the union data set (1000 RLPs, i.e., 1000 consumers) is , while that of the data set of retailer 1 (100 RLPs) is . After that, we perform the centralized k-means on retailer 1’s data set, and the results are shown in Fig. 5(a). Besides, we also perform the proposed privacy-preserving distributed k-means and the centralized k-means on the union data set. The results are shown in Fig. 5(b). The number of RLPs in each cluster is listed in each subfigure’s title. Meanwhile, the RLPs and the load patterns of retailer 1 are highlighted in Fig. 5(b) as well.
First, from Fig. 5(b), we can observe that the centroids of the proposed algorithm coincide with the centroids of the centralized k-means. Second, the two load patterns of retailer 1’s data set in Fig. 5(a) approximately match the 2nd and the 3rd load patterns of the union data set in Fig. 5(b). However, retailer 1 misses the remaining four categories of consumers. If retailer 1 only uses its own two load patterns for tariff design, its products will struggle to attract the 608 consumers in the remaining clusters. In contrast, with the proposed clustering framework, each retailer can use the six load patterns of all consumers for tariff design to attract all of them. Therefore, the effectiveness of the proposed clustering framework is demonstrated.
To verify the efficiency of the clustering framework, we provide the computational time and iteration numbers (I-Ns) of the centralized and the privacy-preserving distributed clustering methods. Note that for the distributed methods, the retailers’ computational times are different. Thus the maximum computational time of all retailers is chosen to represent the time of the distributed methods. Details are given in Table I. From this table, it is obvious that the iteration numbers of the corresponding centralized and distributed clustering methods are the same, but the computational times of the corresponding methods differ by an order of magnitude: the time consumed by each retailer in distributed clustering is significantly less than that of the centralized clustering, indicating the high efficiency of the proposed clustering framework.
| Methods | K-means | PPD K-means | FCA | PPD FCA | GMM | PPD GMM |
Please note that the above computational time does not include communication time. However, this time is probably negligible. In k-means, for example, each retailer shares its masked value with its neighbors, which consists of the masked and for . Thus each retailer actually shares 294 floating-point numbers with its neighbors, i.e., 1.15 kbytes. We know that , , and . Meanwhile, as shown in Fig. 2, and the degree of the retailer that consumes the most time is (retailer 1), which is also the maximum degree among the retailers. According to the communication overhead analysis in Section IV-C, the maximum total amount of upstream data of all the retailers will be kbytes Mbytes. Since the global average broadband internet speed is 11.03 Mbps, the actual maximum communication time for retailers will not exceed 0.1 seconds. This cost will be even lower in Europe, which has the world’s highest concentration of countries with the fastest internet, e.g., Sweden’s average speed is 55.18 Mbps.
V-D Verification of Different Topologies
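The per-message payload quoted above can be checked with a short calculation. It assumes six clusters (the six load patterns of the union data set), 48-dimensional RLPs, one extra scalar per cluster for the count, and 4 bytes per floating-point number (single precision), with 1 kbyte taken as 1024 bytes.

```latex
\underbrace{6}_{\text{clusters}} \times \bigl(\underbrace{48}_{\text{sum of RLPs}} + \underbrace{1}_{\text{count}}\bigr) = 294 \ \text{numbers},
\qquad
294 \times 4\ \text{bytes} = 1176\ \text{bytes} \approx 1.15\ \text{kbytes}.
```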
Although the computational time of retailers’ local calculation is not affected by the change of communication topology, different topologies directly affect the degree of retailers as well as the iteration numbers of the proposed PP-AAC algorithm, resulting in a change in the computational time of the AAC algorithm, which in turn changes the time of the clustering framework. To investigate this trend, we randomly change the communication topology to obtain 9 topologies as shown in Fig. 6.
Then we measure the total execution time of the AAC algorithm part in the clustering framework for each retailer. Finally, we show the average time of all retailers when performing the AAC algorithm part in the clustering framework in Fig. 7. The average of the retailers and the average iteration numbers of the AAC algorithm under different topologies are also provided in Table II. Please note that the assumption that retailer cannot receive all the information which its neighbors have received causes the number of possible communication lines to saturate quickly and results in only minor differences between different topologies. Thus, we temporarily ignore this assumption in order to show the variation of cost across topologies more clearly.
| Average iteration number | 121 | 81 | 77 | 25 | 17 | 21 | 14 | 12 | 9 |
Theoretically, the average computation overhead of the proposed AAC algorithm part is , where denotes the average of all retailers. From Table II, we know that although the average of the retailers increases with the number of the topology map shown in Fig. 6, the increase is much smaller than the decrease of iteration numbers, so the computational time in Fig. 7 is dominated by the iteration numbers. In fact, the worst-case measure of the proposed AAC algorithm’s asymptotic convergence rate is proportional to the spectral radius of matrix , where is the averaging matrix . Since the convergence rate determines the iteration numbers, and the computational time is dominated by the iteration numbers, the trend of the computational time is coincident with the trend of the spectral radius. For verification, we also illustrate the variation in the spectral radius under the different topologies in Fig. 7. As we can see, the decreasing trends of the computational times and the spectral radius are the same.
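The link between topology and convergence can be sketched numerically. The example below computes the spectral radius of a weight matrix minus the averaging matrix for two toy four-node graphs. The plain Metropolis matrices are used rather than the accelerated ones, and the function name `consensus_spectral_radius` and both graphs are assumptions for illustration.

```python
import numpy as np

def consensus_spectral_radius(W):
    """Spectral radius of W - J, where J is the averaging matrix (all
    entries 1/n). A smaller value means faster asymptotic convergence."""
    n = W.shape[0]
    J = np.full((n, n), 1.0 / n)
    # W - J is symmetric here, so eigvalsh returns real eigenvalues.
    return np.abs(np.linalg.eigvalsh(W - J)).max()

# Metropolis weights of a four-node path graph 0-1-2-3 (three edges).
W_path = np.array([[2/3, 1/3, 0.0, 0.0],
                   [1/3, 1/3, 1/3, 0.0],
                   [0.0, 1/3, 1/3, 1/3],
                   [0.0, 0.0, 1/3, 2/3]])
# Metropolis weights of a four-node ring 0-1-2-3-0 (four edges).
W_ring = np.array([[1/3, 1/3, 0.0, 1/3],
                   [1/3, 1/3, 1/3, 0.0],
                   [0.0, 1/3, 1/3, 1/3],
                   [1/3, 0.0, 1/3, 1/3]])

radius_path = consensus_spectral_radius(W_path)
radius_ring = consensus_spectral_radius(W_ring)
```

Adding a single edge to close the path into a ring lowers the spectral radius, which is consistent with the observation that better-connected topologies need fewer consensus iterations.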
VI Conclusion
In this paper, we propose a privacy-preserving distributed clustering framework, which can directly modify the traditional k-means, FCA, and GMM clustering methods to provide privacy-preserving distributed variants. To achieve this, we first perform a commonality analysis of the three clustering methods and point out that the key to the clustering framework lies in calculating the summation of the retailers’ private information in a fully distributed and privacy-preserving way. Then we develop a PP-AAC algorithm with proven convergence to achieve the summation. Finally, we present the privacy-preserving distributed clustering framework based on the proposed algorithm, together with theoretical privacy and complexity analyses.
The proposed PP-AAC algorithm converges faster than the privacy-preserving AC algorithm and the original AC algorithm. Besides, in contrast to the original AC algorithm and the AAC algorithm, the proposed algorithm is privacy-preserving because it introduces the exponentially decaying disturbance with zero-sum property into the shared information. The proposed clustering framework enables each retailer to obtain the exact residential load pattern identification of all consumers instead of only its own consumers. Thus, this framework can help retailers design better tariff products to attract new users. Meanwhile, the clustering framework not only protects every retailer’s privacy, but also greatly reduces the computation overhead of each retailer compared to the centralized method. Moreover, under different communication topologies, the decreasing trends of the PP-AAC part’s computational times and the spectral radius are the same.
Appendix
First, we need to prove that is doubly stochastic, i.e., that
Define  as a vector of all ones; then we have:
Since  is a doubly stochastic matrix, as proved in , the following holds:
Substituting into , we obtain
Similarly, we can obtain with the property that .
Second, define , and . Then we have the matrix form of the proposed PP-AAC algorithm:
- (2011) A clustering method combining differential evolution with the k-means algorithm. Pattern Recognit. Lett. 32 (12), pp. 1613–1621.
- (2009-04) Accelerated distributed average consensus via localized node state prediction. IEEE Trans. Signal Process. 57 (4), pp. 1563–1576.
- (2006-05) Comparisons among clustering techniques for electricity customer classification. IEEE Trans. Power Syst. 21 (2), pp. 933–940.
- (2011) The impact of vehicle-to-grid on the distribution grid. Electr. Pow. Syst. Res. 81 (1), pp. 185–192.
- (2002) Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations Newsletter 4 (2), pp. 28–34.
- (2016-08) Efficient and privacy-preserving k-means clustering for big data mining. In 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 791–798.
- (2019) Consensus-based data-privacy preserving data aggregation. IEEE Trans. Autom. Control, pp. 1–1.
- (2012) Commission for energy regulation (CER) smart metering project. Note: http://www.ucd.ie/issda/data/commissionforenergyregulationcer/
- (2011-10) Energy management and operational planning of a microgrid with a PV-based active generator for smart grid applications. IEEE Trans. Ind. Electron. 58 (10), pp. 4583–4592.
- (2019-02) Countries with the fastest internet in the world 2019. ATLAS and BOOTS.
- (2017) Corruption-resistant privacy preserving distributed EM algorithm for model-based clustering. In 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 1082–1089.
- (2016) A novel time-of-use tariff design based on Gaussian mixture model. Appl. Energy 162, pp. 1530–1536.
- (2004) . Anal. Chim. Acta 515 (1), pp. 87–100.
- (1982-03) Least squares quantization in PCM. IEEE Trans. Inf. Theory 28 (2), pp. 129–137.
- (2018) Privacy preserving data mining using threshold based fuzzy cmeans clustering. ICTACT Journal on Soft Computing 9 (1).
- (2004-11) A comparative analysis of neural and fuzzy cluster techniques applied to the characterization of electric load in substations. In 2004 IEEE/PES Transmission and Distribution Conference and Exposition, pp. 908–913.
- (2009-05) Security and privacy challenges in the smart grid. IEEE Secur. Priv. 7 (3), pp. 75–77.
-  (2012) Privacy preserving k-means clustering: a survey research.. Int. Arab J. Inf. Technol. 9 (2), pp. 194–200. Cited by: §I.
-  (2017-02) Privacy preserving average consensus. IEEE Trans. Autom. Control. 62 (2), pp. 753–765. External Links: Cited by: §IV-B.
-  (2010) Communication complexity and energy efficient consensus algorithm. IFAC Proceedings Volumes. 43 (19), pp. 209 – 214. External Links: Cited by: §IV-C.
-  (2012) An efficient approach for privacy preserving distributed k-means clustering based on shamir’s secret sharing scheme. In IFIP International Conference on Trust Management, pp. 129–141. Cited by: §I.
-  (2010) Data-based method for creating electricity use load profiles using large amount of customer-specific hourly measured electricity use data. Appl. Energy. 87 (11), pp. 3538 – 3545. External Links: Cited by: §I.
-  (2007) Privacy preserving k-means clustering in multi-party environment.. In SECRYPT, pp. 381–385. Cited by: §I.
-  (2010-02) Statistical representation of distribution system loads using gaussian mixture model. IEEE Trans. Power Syst. 25 (1), pp. 29–37. External Links: Cited by: §II-C, §II.
-  (2014-02) Enhanced load profiling for residential network customers. IEEE Trans. Power Del. 29 (1), pp. 88–96. External Links: Cited by: §I.
-  (2007-05) Privacy-preserving two-party k-means clustering via secure approximation. In 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW’07), Vol. 1, pp. 385–391. External Links: Cited by: §I.
-  (2017-05) C-vine copula mixture model for clustering of residential electrical load pattern data. IEEE Trans. Power Syst. 32 (3), pp. 2382–2393. External Links: Cited by: §V-A.
-  (2007-08) Two-stage pattern recognition of load curves for classification of electricity customers. IEEE Trans. Power Syst. 22 (3), pp. 1120–1128. External Links: Cited by: §I.
-  (2010) Efficient privacy preserving k-means clustering. In Pacific-Asia Workshop on Intelligence and Security Informatics, pp. 154–166. Cited by: §I.
-  (2016-Sep.) Clustering of electricity consumption behavior dynamics toward big data applications. IEEE Trans. Smart Grid. 7 (5), pp. 2437–2447. External Links: Cited by: §I.
-  (2012) Analysis of parameter selections for fuzzy c-means. Pattern Recognit. 45 (1), pp. 407 – 415. External Links: Cited by: §II.
-  (2005-04) A scheme for robust distributed sensor fusion based on average consensus. In IPSN 2005. Fourth International Symposium on Information Processing in Sensor Networks, 2005., Vol. , pp. 63–70. External Links: Cited by: §III-A, §V-B, Appendix.
-  (2017-08) Mutual privacy preserving -means clustering in social participatory sensing. IEEE Trans. Ind. Informat. 13 (4), pp. 2066–2076. External Links: Cited by: §I.