I Introduction
A massive amount of fine-grained electricity consumption data is being collected by smart meters. Identifying the load patterns from these smart meter data, i.e., residential load profiling, supports retailers and distribution system operators (DSOs) in better understanding the consumption behavior of consumers. For example, retailers can provide personalized tariffs for different types of consumers; a DSO can perform detailed voltage simulation [4] or microgrid operation [9] of the distribution network based on the identified load patterns.
Ideally, residential load profiling is carried out on a very large and diverse data set that captures all different types of customers and behaviors. Such a diverse data set is particularly important for retailers and third-party providers, as they wish to design diversified electricity products to attract new consumers. However, residential load data are only monitored or collected by the corresponding retailers, i.e., each retailer only has the data of the consumers it serves. No center has access to all the smart meter data. Besides, since smart meter data contain highly private information about consumers [17], data sharing between retailers is not allowed. Thus, a privacy-preserving distributed clustering scheme is required, where retailers can cooperate to jointly obtain the clustering results on their union consumption data set via local calculation and communication. During this cooperation, the information of each retailer, e.g., the raw data or the number of consumers, cannot be deduced by others.
So far, various clustering algorithms have been applied to load profiling, such as hierarchical clustering using different linkages [28], CFSFDP [30], k-means [3], the fuzzy C-means algorithm (FCA) [16], the Gaussian mixture model (GMM) [25, 22], etc. However, to the best of our knowledge, there is no relevant research on privacy-preserving distributed clustering for load profiling. To bridge this gap, this paper proposes a privacy-preserving distributed clustering framework for load profiling. This framework can transform three commonly used clustering methods, i.e., k-means, FCA, and GMM, into distributed clustering algorithms for privacy-preserving load profiling. Among these three methods, k-means is a 'hard' clustering method that delivers deterministic clustering results [12], while FCA and GMM are 'soft' methods that provide, respectively, a degree or a probability of membership of each observation in each cluster, which can be leveraged to observe overlapping clusters or uncertain cluster memberships [14].

In fact, many works on privacy-preserving clustering have been conducted in fields such as marketing and medicine [23]. Among them, cryptography-based methods are the most common. These methods use secure multiparty computation [15, 5], homomorphic encryption [33, 11], or a combination of both [26] to turn clustering methods into privacy-preserving k-means [33, 26], privacy-preserving FCA [15], or privacy-preserving GMM [5, 11]. However, methods using secure multiparty computation are extremely computationally expensive [21]. Besides, the overhead of encryption in homomorphic schemes also limits the scope of the corresponding clustering methods [6] and results in time-consuming computations [18]. To reduce this overhead, secret sharing can be adopted to design privacy-preserving k-means clustering [29, 21]. However, these secret-sharing-based methods, like the aforementioned cryptography-based methods, are not fully distributed algorithms, because each party (a data owner, such as a retailer in this paper) either has to interact with a data center [33, 29, 15], communicate with all the other parties [21, 26], or share its information along a preselected transmission path [5, 11, 6]. These algorithms have the following drawbacks: (1) the existence of a data center or a preset information-sharing path greatly increases the risk of a single-point or single-line failure; (2) full communication between any two parties results in low scalability.
The proposed privacy-preserving distributed clustering framework aims to solve the above issues. We first perform a commonality analysis of the traditional k-means, FCA, and GMM, and point out that the key to the clustering framework lies in computing the summation of retailers' private information in a fully distributed and privacy-preserving way. The average consensus (AC) algorithm, an important fully distributed computing method from the automatic control area, provides the means to achieve this summation. However, its slow rate of convergence towards the average is its major deficiency [2]. Besides, the AC algorithm reveals the retailers' private information during the interaction between neighbors. Therefore, we first introduce an accelerated AC (AAC) algorithm to significantly improve the rate of convergence without sacrificing the simplicity of the original AC algorithm [2]. Then, we adapt the AAC algorithm into a privacy-preserving version by leveraging the exponentially decaying disturbance with the zero-sum property proposed in [7]. The convergence of the proposed privacy-preserving AAC (PPAAC) algorithm is also proved. After that, we develop the privacy-preserving distributed clustering framework based on the proposed algorithm. This framework can convert the traditional k-means, FCA, and GMM into fully distributed privacy-preserving clustering methods, where each retailer only needs to communicate with its surrounding neighbors to obtain the exact load pattern identification results of all the consumers. Finally, we provide privacy and complexity analyses of the proposed framework.
This paper makes the following contributions:

1) We propose a privacy-preserving distributed clustering framework for load profiling. This framework is based on an original PPAAC algorithm, which is theoretically proven to be convergent.

2) We provide privacy and complexity analyses of the proposed framework, both theoretically and practically. Results show that this framework not only protects the data privacy of retailers but also greatly reduces the computational overhead.

3) We develop privacy-preserving distributed k-means, FCA, and GMM clustering methods using the proposed framework. These methods are applied to identify electrical load patterns, and their results are identical to those of the centralized clustering methods.

To the best of our knowledge, this is the first time that electrical load data have been analyzed using privacy-preserving distributed clustering methods.
The rest of this paper is organized as follows. Section II analyzes the commonality of k-means, FCA, and GMM. The PPAAC algorithm is proposed in Section III. Section IV develops the privacy-preserving distributed clustering framework for the three clustering methods. Case studies are provided in Section V, and Section VI concludes this paper.
II Problem Formulation
This section first briefly reviews the standard clustering methods, k-means [14], FCA [31], and GMM [24], and then analyzes their commonality. Before that, we assume that the union data set consists of observations. These observations are distributed among retailers, where retailer has consumers, i.e., observations. Besides, the centroid of cluster , described by , is considered as the th load pattern of the union data set.
II-A K-means
K-means partitions the observations into clusters by minimizing the within-cluster variances as follows:
where is the th observation of retailer . represents the index set of the observations belonging to cluster .
Although finding the exact solution is NP-hard, Lloyd's algorithm is guaranteed to find a local minimum within a few iterations [14]. First, initial cluster centroids are assigned arbitrarily. Then, in each iteration, the cluster index of is computed by
(1) 
and the centroid of cluster is updated by
(2)  
(3)  
(4) 
sequentially. These two steps are repeated until convergence is achieved. Note that equals if and otherwise.
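The two alternating steps above can be sketched as follows. This is a minimal, centralized Lloyd's iteration in Python; the notation and the deterministic initialization are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def lloyd_kmeans(X, K, n_iter=50):
    """Centralized Lloyd's algorithm as summarized above. Notation and
    the (deterministic) initialization are illustrative assumptions."""
    mu = X[:K].astype(float)                  # arbitrary initial centroids
    c = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step, cf. (1): nearest centroid per observation.
        c = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1).argmin(axis=1)
        # Update step, cf. (2)-(4): per-cluster sum divided by count.
        for k in range(K):
            members = X[c == k]
            if len(members):                  # skip empty clusters
                mu[k] = members.mean(axis=0)
    return mu, c
```

On well-separated data, the assignments stabilize after a few iterations and each centroid equals the mean of its members.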
II-B FCA
FCA is the best-known method for fuzzy clustering, with the objective function given as follows:
where is the fuzziness index and is the degree to which belongs to . The following iterative procedure solves this problem: the degree to which the observation belongs to cluster is first calculated by
(5) 
Then, the centroid of cluster is updated by
(6)  
(7)  
(8) 
Unlike k-means, where each observation either belongs to a cluster or not, FCA assigns each observation a degree of membership in every cluster, i.e., FCA is a type of soft clustering.
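The membership and centroid updates above can be sketched as follows. This is a standard fuzzy C-means iteration; the initialization and notation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def fuzzy_cmeans(X, K, m=2.0, n_iter=50):
    """Fuzzy C-means sketch following the two steps above. m is the
    fuzziness index; initialization is an illustrative assumption."""
    mu = X[:K].astype(float)                  # arbitrary initial centroids
    for _ in range(n_iter):
        # Membership update, cf. (5): u proportional to d^(-2/(m-1)),
        # normalized so each observation's memberships sum to 1.
        d2 = ((X[:, None, :] - mu[None]) ** 2).sum(-1)
        d2 = np.maximum(d2, 1e-12)            # guard observation == centroid
        w = d2 ** (-1.0 / (m - 1.0))
        U = w / w.sum(axis=1, keepdims=True)
        # Centroid update, cf. (6)-(8): mean weighted by u^m.
        Um = U ** m
        mu = (Um.T @ X) / Um.sum(axis=0)[:, None]
    return mu, U
```

Each row of the membership matrix U sums to one, which is exactly the 'soft' assignment property discussed above.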
II-C GMM
As a convex combination of Gaussian components with weight and covariance , GMM is given by
(9) 
where each Gaussian component represents a cluster.
To divide the union data set into clusters by GMM, one trains the GMM via maximum likelihood estimation, which is formulated as follows:
s.t. 
The most commonly used maximum likelihood estimation method is the expectation-maximization (EM) algorithm [24], which can be summarized as two iterative steps: the E-step and the M-step. The E-step, as given in (10),
computes the probability that an observation belongs to cluster . The Mstep updates the parameters in (9) according to
(11)  
(12)  
(13)  
(14)  
(15)  
(16) 
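The E-step/M-step alternation above can be sketched as follows. For brevity this uses diagonal covariances, a simplification of the full-covariance model in (9)-(16); the initialization and notation are illustrative assumptions.

```python
import numpy as np

def em_gmm(X, K, n_iter=50):
    """EM for a GMM with *diagonal* covariances -- a simplification of
    the full-covariance model in (9)-(16), kept short for illustration.
    Initialization and notation are assumptions, not the paper's."""
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                       # mixture weights
    mu = X[::max(N // K, 1)][:K].astype(float)     # spread-out initial means
    var = np.tile(X.var(axis=0) + 1e-6, (K, 1))
    for _ in range(n_iter):
        # E-step, cf. (10): responsibility of each cluster for each x.
        logp = -0.5 * (((X[:, None, :] - mu[None]) ** 2 / var[None]).sum(-1)
                       + np.log(var).sum(-1)[None] + D * np.log(2 * np.pi))
        logp = logp + np.log(pi)[None]
        logp -= logp.max(axis=1, keepdims=True)    # numerical stability
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        # M-step, cf. (11)-(16): reweight, re-center, re-estimate spread.
        Nk = gamma.sum(axis=0)
        pi = Nk / N
        mu = (gamma.T @ X) / Nk[:, None]
        var = (gamma.T @ X ** 2) / Nk[:, None] - mu ** 2 + 1e-6
    return pi, mu, var, gamma
```

The responsibilities gamma play the role of the soft cluster assignments: each row sums to one, and the M-step updates are responsibility-weighted averages.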
II-D Commonality Analysis
The clustering processes of k-means, FCA, and GMM have two points in common, listed in Remarks 2.1 and 2.2.
Remark 2.1: The clustering processes of k-means, FCA, and GMM can all be summarized in two parts: a local calculation part and a global calculation part, where the local part can be performed by each retailer, and the global part is essentially the summation of each retailer's local calculation results.
In fact, each retailer can directly perform the first step of the three algorithms using its own data, i.e., the calculation in (1), (5), or (10). Then, retailer is able to compute the following local results :
(17) 
depending on the algorithm used. Once each retailer obtains its local results, the global summation of those local results from all retailers is required to continue the clustering method. For example, the k-means algorithm needs to sum the local results and of all retailers respectively to update the centroid of cluster in (2). Let be the global summation result; then we have:
(18) 
Therefore, the relationship between the local and the global calculation parts can be generalized to:
(19) 
where can be calculated by each retailer locally using (17), while the computation of requires cooperation among all retailers. Once in (18) is obtained, the second step of the three algorithms can be carried out and the iterative procedure continues.
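Remark 2.1 can be illustrated in miniature for k-means: each retailer only needs to publish local per-cluster sufficient statistics (sum and count), and the global centroid update needs nothing but the summation of these local results. The three-retailer split and the notation below are illustrative assumptions.

```python
import numpy as np

# Remark 2.1 in miniature, for k-means: local statistics per retailer,
# then a global summation, reproduce the centralized update exactly.
rng = np.random.default_rng(0)
X_union = rng.normal(size=(30, 4))
retailers = np.array_split(X_union, 3)       # 3 retailers (illustrative)
mu = X_union[:2].copy()                      # K = 2 current centroids

def local_stats(Xj, mu):
    # Local calculation part, cf. (17): assignments via (1), then
    # per-cluster sums and counts over this retailer's data only.
    c = ((Xj[:, None, :] - mu[None]) ** 2).sum(-1).argmin(1)
    sums = np.stack([Xj[c == k].sum(0) for k in range(len(mu))])
    counts = np.array([(c == k).sum() for k in range(len(mu))])
    return sums, counts

stats = [local_stats(Xj, mu) for Xj in retailers]
S = sum(s for s, _ in stats)                 # global part, cf. (18)
m = sum(cnt for _, cnt in stats)
mu_new = S / m[:, None]                      # centroid update, cf. (2)

# The same update computed centrally on the union data set:
c_all = ((X_union[:, None, :] - mu[None]) ** 2).sum(-1).argmin(1)
```

The distributed update `mu_new` coincides with the per-cluster means computed centrally on the union data set, which is exactly the commonality the remark states.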
Remark 2.2: Each retailer's local calculation results from k-means, FCA, and GMM contain private information, so retailer will refuse to share its with others.
In fact, if retailer shares its () with retailer , the latter can derive the following private information of retailer :
II-D1 The number of retailer 's consumers
II-D2 The proportion or number of retailer 's consumers belonging to cluster
Once retailer receives , it will also obtain the proportion of retailer ’s consumers belonging to cluster by:
Particularly, retailer can directly know the specific number of retailer ’s consumers belonging to cluster by receiving in (4).
II-D3 Retailer 's local load pattern of cluster
Once retailer has received in (3), (7) or (14), along with in hand, retailer can compute the local centroid of retailer in cluster by:
which reveals the approximate load pattern of retailer . For example, choose in (3) and in (4); then is essentially the mean of retailer 's observations belonging to cluster , which can be considered its approximate load pattern in cluster . The approximation lies in the fact that and are calculated using the global centroid in (2) from the last iteration, not retailer 's local centroid from the last iteration; otherwise it would be the exact load pattern based on retailer 's data set.
Definition 2.3: We define the “privacy” of retailer () as the information set .
III PPAAC Algorithm
To achieve a distributed summation algorithm, this section first introduces an AAC algorithm with a fast convergence rate [2]. We then improve the AAC algorithm by leveraging an exponentially decaying disturbance with the zero-sum property to propose a PPAAC algorithm. Finally, the convergence of the proposed algorithm is proved.
III-A AAC Algorithm
The AAC algorithm is based on graph theory. Therefore, we consider a graph consisting of nodes and edges. Each node represents a retailer, and an edge between a pair of nodes means that there is bidirectional noise-free communication between the two retailers. This graph is publicly known to all retailers. Denote the node set by and the edge set by . The neighborhood of retailer is represented by , and the degree of retailer is denoted by . Let be the Metropolis weight matrix with elements as follows [32]:
(20) 
In the AAC algorithm, each retailer has a state value that will be updated through iterations. Let be the state of retailer in the AAC algorithm, then the state update equation of the AAC algorithm in the th iteration is given by
(21) 
which is a convex combination of the value from the original AC algorithm and the predictor given respectively by
(22)  
(23) 
The matrix form of the update is given as follows:
(24)  
(25) 
where , and is the identity matrix. We call the accelerated Metropolis weight matrix. In this way, will converge to the mean of all retailers' initial standardized state values
(26) 
with the fastest asymptotic worstcase convergence rate if the weighted coefficient equals the optimal value [2]:
(27) 
where is the smallest eigenvalue of , and is the second largest eigenvalue of . Since the graph is publicly known to all retailers, each retailer can easily compute using (20). Then, can be obtained by all retailers using (24). Note that the AAC algorithm is fully distributed, i.e., each retailer only needs to communicate with its neighbors. Besides, after convergence, retailers can obtain the summation of their initial standardized state values by multiplying the mean in (26) by . Thus, by setting equal to , each retailer can obtain in (18) in a fully distributed manner using the AAC algorithm. However, in the first iteration, retailer sends to its neighbors, which directly reveals its private information.
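The consensus mechanism underlying (20)-(26) can be sketched as follows. This shows only the unaccelerated core: a Metropolis-weight update in which each node mixes its own value with its neighbors' values. The predictor mixing of (21)-(25) and the PPAAC masking are omitted; the graph and initial values are illustrative assumptions.

```python
import numpy as np

# Metropolis-weight average consensus on a small illustrative graph --
# the unaccelerated core of (20)-(26). The AAC refinement additionally
# mixes each step with a local predictor, cf. (21)-(25).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # 4 retailers
n = 4
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1

# Metropolis weights, cf. (20): 1/(1 + max(d_i, d_j)) on edges, with the
# diagonal chosen so every row sums to 1 (the matrix is then doubly
# stochastic, which is what the Appendix relies on).
W = np.zeros((n, n))
for i, j in edges:
    W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)

x = np.array([4.0, 8.0, 15.0, 16.0])               # initial local values
target = x.mean()
for _ in range(200):
    x = W @ x                     # each node mixes only neighbor values
# Multiplying the consensus value by n recovers the global summation.
```

Every state converges to the average of the initial values, and multiplying by the number of retailers recovers the summation needed in (18), exactly as described above.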
III-B PPAAC Algorithm
To equip the AAC algorithm with privacy-preserving characteristics, we utilize the exponentially decaying disturbance with the zero-sum property from [7] to mask the state values exchanged among neighbors during the AAC iterations, so that no retailer can derive the private information of the others.
The proposed PPAAC algorithm is defined by
(28) 
where is the state value masked by the disturbance as follows:
(29) 
The noise is randomly selected from by retailer , where , , and . This design gives two features of , which will be used in the proof of Theorem 3.1:

The noise decays exponentially as the number of iterations grows, so is also exponentially decaying.

The disturbance has the zero-sum property, which means that if we sum up from to infinity (or up to a relatively large number), the result will be close to 0, i.e.,
(30) 
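One standard way to realize such a disturbance is a telescoping construction: scale independent noise by a geometric factor and emit successive differences. This is in the spirit of [7], though the exact recipe there may differ; the decay rate and noise distribution below are assumed values.

```python
import numpy as np

# Exponentially decaying disturbance with the zero-sum property: with
# theta(k) = phi**k * nu(k), 0 < phi < 1, set d(0) = theta(0) and
# d(k) = theta(k) - theta(k-1). Partial sums then telescope to
# theta(k), which decays to zero -- cf. (30).
rng = np.random.default_rng(1)
phi = 0.8                          # decay rate (assumed value)
K = 200                            # number of iterations
nu = rng.uniform(-1.0, 1.0, size=K)
theta = phi ** np.arange(K) * nu
d = np.empty(K)
d[0] = theta[0]
d[1:] = theta[1:] - theta[:-1]     # exponentially decaying disturbance

partial_sum = d.cumsum()           # telescopes: equals theta(k)
```

Because the partial sums equal theta(k), the total injected disturbance vanishes as the iterations proceed, which is why the masked consensus can still converge to the true average.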
Theorem 3.1: The proposed PPAAC algorithm in (28) will make each retailer’s state value converge to the average of all retailers’ initial state values, i.e., (26) still holds.
Proof: See Appendix.
IV Privacy-Preserving Distributed Clustering Framework
This section describes the privacy-preserving distributed clustering framework for k-means, FCA, and GMM, incorporating the proposed PPAAC algorithm. In addition, we provide privacy and complexity analyses of the proposed framework.
IV-A Clustering Framework
The idea of the clustering framework is that, regardless of the employed clustering method, in every iteration each retailer first performs its local calculation according to (17); each retailer then sets its local result as the initial state of the proposed PPAAC algorithm; after convergence, each retailer obtains the global summation of all local results in (18); finally, using the global summations, each retailer performs the rest of the clustering method to update the global information, e.g., the centroids of all clusters. The detailed clustering framework is presented in Algorithm 1.
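The loop just described can be sketched end to end for k-means. Plain Metropolis averaging (times the retailer count) stands in for the PPAAC summation here; the topology, data, and helper names (`metropolis`, `consensus_sum`) are illustrative assumptions, not Algorithm 1 verbatim.

```python
import numpy as np

# End-to-end sketch of the framework for k-means: local statistics,
# consensus-based global summation, shared centroid update.

def metropolis(edges, n):
    deg = np.bincount(np.array(edges).ravel(), minlength=n)
    W = np.zeros((n, n))
    for i, j in edges:
        W[i, j] = W[j, i] = 1.0 / (1 + max(deg[i], deg[j]))
    W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)
    return W

def consensus_sum(values, W, iters=300):
    # values[j] is retailer j's local result; neighbor-only averaging
    # followed by multiplication with the retailer count recovers the
    # global summation in (18).
    x = values.reshape(len(values), -1).astype(float)
    for _ in range(iters):
        x = W @ x
    return (len(values) * x[0]).reshape(values.shape[1:])

rng = np.random.default_rng(2)
parts = [rng.normal(loc=l, size=(20, 3)) for l in (0.0, 0.0, 5.0, 5.0)]
W = metropolis([(0, 1), (1, 2), (2, 3), (3, 0)], 4)   # ring of 4 retailers
mu = np.vstack([parts[0][0], parts[2][0]])            # K = 2 initial centroids
for _ in range(15):
    sums, counts = [], []
    for Xj in parts:                                  # local part, cf. (17)
        c = ((Xj[:, None, :] - mu[None]) ** 2).sum(-1).argmin(1)
        sums.append(np.stack([Xj[c == k].sum(0) for k in range(2)]))
        counts.append(np.array([(c == k).sum() for k in range(2)]))
    S = consensus_sum(np.stack(sums), W)              # global part, cf. (18)
    m = consensus_sum(np.stack(counts), W)
    mu = S / np.maximum(m, 1e-9)[:, None]             # shared centroid update
```

After the loop, every retailer holds the same global centroids, even though no retailer ever saw another retailer's raw observations; in the real framework the consensus routine would be the PPAAC algorithm, which additionally masks the exchanged states.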
IV-B Privacy Analysis
As aforementioned, the AAC algorithm directly reveals the initial value in the first iteration. By contrast, in the first iteration of the proposed PPAAC algorithm, retailer () receives () instead of . Since is masked with an independent disturbance by retailer , retailer cannot derive the original value of from ; thus retailer will not learn the private information of its neighbors, and the private information of retailer is protected. In the remaining iterations, the process of adding disturbances continues; meanwhile, begins to converge to the mean value in (26) and moves away from its initial value, which further masks the true initial value. Quantitative illustrations are given in the next section.
In addition, we should note that if and , i.e., retailer can receive all the information that retailer has received, including retailer 's own information, then retailer can deduce retailer 's initial value even if the disturbance is introduced [7]. Therefore, the authors of [7] and [19] both consider it necessary to assume that retailer cannot receive all the information that retailer has. This assumption is also adopted in this paper. Since is publicly known to all retailers, retailer can tell whether is a subset of its neighbor's . If such a situation occurs, retailer can refuse to communicate with retailer . Therefore, the assumption holds in practice.
IV-C Complexity Analysis
For the distributed framework, we investigate each retailer’s computation and communication overhead.
The proposed clustering framework not only keeps all the multiplication operations of the original clustering methods, but also introduces new multiplications by integrating the proposed PPAAC algorithm. The multiplications of the original clustering methods are divided among the retailers according to their numbers of observations, i.e., if the computation overhead of the original clustering method is , then the overhead of retailer is . Moreover, in each iteration of the PPAAC algorithm, although the disturbance can be queried from a preset lookup table, retailer () still needs to compute and (), which requires multiplications. Let denote the iteration number of the selected clustering method, and the iteration number of the proposed AAC algorithm; then the computation overhead of retailer is . Take k-means as an example, where is ; then retailer 's overhead is . Please note that , because the number of retailers in a distribution network (DN) is small, and since the proposed PPAAC algorithm's convergence is accelerated, is generally also small. However, is in the thousands and . Moreover, we know that . Therefore, the computation overhead of retailer is significantly smaller than that of centralized k-means. Detailed illustrations are given in the next section.
Besides, in each iteration of the proposed AAC algorithm, the communication number of retailer is [20]. Therefore, the communication overhead of retailer is .
V Case Study
V-A Data Description and Experiment Setup
We utilize smart meter data from Ireland for verification, which contain 509,660 half-hourly daily electrical consumption observations of 1000 consumers [8]. The representative load profile (RLP) of each consumer is obtained via the method presented in [27]. Thus we get a union data set consisting of 1000 48-dimensional RLPs. To verify the proposed PPAAC algorithm and the clustering framework in terms of correctness, efficiency, the privacy-preserving feature, and effectiveness, we assume that there are 10 retailers in a DN, each with access to 100 consumers. Their initial communication topology is shown in Fig. 1, where each retailer only communicates with its one-hop neighbors, and retailer () cannot receive all the information that any of its neighbors has. We also use different topologies to investigate how the computational cost of the proposed clustering framework varies with the topology. Besides, we set and for randomly selecting the disturbance. Meanwhile, the initial centroids for all clustering methods are randomly chosen.
V-B Verification of the PPAAC Algorithm
To verify the correctness and efficiency of the proposed PPAAC algorithm, we compare it with three algorithms: the original AC algorithm in [32], the AAC algorithm proposed in [2], and the PPAC algorithm proposed in [7]. We use the four algorithms to compute the summation of the observations from each retailer's first consumer, and then plot the average error of all retailers relative to the exact summation result. The errors of the four algorithms at each iteration are shown in Fig. 2. It can be observed that the average error of the proposed PPAAC algorithm converges to 0, indicating its correctness. In addition, the proposed algorithm has the same convergence rate as the AAC algorithm, and the PPAC algorithm has the same convergence rate as the AC algorithm. Note that the proposed algorithm converges faster than both the AC and PPAC algorithms, indicating its efficiency. Therefore, the correctness and efficiency of the proposed PPAAC algorithm are verified.
Compared to the AC algorithm and the AAC algorithm, the proposed algorithm also has the privacypreserving feature. To illustrate this feature, we provide the value that retailer 1 shares with its neighbors during the above summation calculation at each iteration. The shared values of the four algorithms are shown in Fig. 3.
These shared values all converge to the real average value, but note that retailer 1 shares its real initial value with its neighbors in the first iteration of the AC and AAC algorithms, which directly reveals retailer 1's private information. In contrast, after introducing the masking disturbance, the proposed algorithm lets retailer 1 share a masked initial value with its neighbors, which is far from the real one, as indicated by the black arrow. Thus, the proposed algorithm protects the privacy of retailer 1. Moreover, the proposed algorithm still converges faster than the PPAC algorithm, even though they both start from the same masked initial point.
V-C Verification of the Proposed Clustering Framework
We can employ the proposed clustering framework to obtain privacy-preserving distributed k-means, FCA, and GMM clustering methods. We then use them for load pattern identification on the distributed data sets. As benchmarks, we also use centralized k-means, FCA, and GMM for load pattern identification on the corresponding union data set.
To verify the correctness of the clustering framework, in Fig. 4 we use the Silhouette coefficient index (SCI) [13] to evaluate the above distributed and centralized algorithms for different numbers of clusters. Note that the abbreviation 'PPD' in Fig. 4 represents 'privacy-preserving distributed'. This figure clearly shows that the SCI results of the proposed privacy-preserving distributed algorithms are identical to those of the centralized algorithms. This means that the clustering results on the distributed data sets using the proposed clustering framework are exactly the same as those on the union data set computed via the centralized methods, indicating the correctness of the proposed clustering framework.
To verify the effectiveness of the clustering framework, we choose k-means for demonstration, as it is a hard clustering method and thus convenient for illustration. We use the most common approach, the sum of squared errors (SSE), to find the optimal number of clusters [1]. From this, we find that the optimal cluster number of the union data set (1000 RLPs, i.e., 1000 consumers) is , while that of the data set of retailer 1 (100 RLPs) is . After that, we perform centralized k-means on retailer 1's data set, with results shown in Fig. 5(a). Besides, we also perform the proposed privacy-preserving distributed k-means and the centralized k-means on the union data set. The results are demonstrated in Fig. 5(b). The number of RLPs in each cluster is listed in each subfigure's title. Meanwhile, the RLPs and load patterns of retailer 1 are highlighted in Fig. 5(b) as well.
First, from Fig. 5(b), we can observe that the centroids of the proposed algorithm coincide with the centroids of the centralized k-means. Second, the two load patterns of retailer 1's data set in Fig. 5(a) approximately match the 2nd and 3rd load patterns of the union data set in Fig. 5(b). However, retailer 1 misses the remaining four categories of consumers. Certainly, if retailer 1 only uses its own two load patterns for tariff design, its products will hardly attract the 608 consumers in the remaining clusters. On the contrary, with the proposed clustering framework, each retailer can use the six load patterns of all consumers for tariff design to attract all of them. Therefore, the effectiveness of the proposed clustering framework is demonstrated.
To verify the efficiency of the clustering framework, we provide the computational times and iteration numbers (INs) of the centralized and privacy-preserving distributed clustering methods. Note that for the distributed methods, the retailers' computational times differ; thus the maximum computational time over all retailers is chosen to represent the time of the distributed methods. Details are given in Table I. From this table, it is obvious that the iteration numbers of the corresponding centralized and distributed clustering methods are the same, but the computational times differ by an order of magnitude: the time consumed by each retailer in distributed clustering is significantly less than that of centralized clustering, indicating the high efficiency of the proposed clustering framework.
Methods    K-means   PPD K-means   FCA     PPD FCA   GMM      PPD GMM
Time (s)   0.321     0.046         1.169   0.163     18.698   2.081
IN         7         7             24      24        6        6
Please note that the above computational time does not include communication time. However, this time is likely negligible. Take k-means as an example: each retailer shares its masked value with its neighbors, which consists of the masked and for . Thus each retailer actually shares 294 floating-point numbers with its neighbors, i.e., 1.15 kbytes. We know that , , and . Meanwhile, as shown in Fig. 2, and the degree of the retailer that consumes the most time is (retailer 1), which is also the maximum degree among the retailers. According to the communication overhead analysis in Section IV-C, the maximum total amount of upstream data over all the retailers will be kbytes Mbytes. Since the global average broadband internet speed is 11.03 Mbps, the actual maximum communication time for retailers will not exceed 0.1 seconds. This cost will be further reduced in Europe, which has the world's highest concentration of countries with fast internet, e.g., Sweden's average speed is 55.18 Mbps [10].
V-D Verification of Different Topologies
Although the computational time of the retailers' local calculation is not affected by changes in the communication topology, different topologies directly affect the degree of the retailers as well as the iteration number of the proposed PPAAC algorithm, changing the computational time of the AAC algorithm and, in turn, the time of the clustering framework. To investigate this trend, we randomly change the communication topology to obtain 9 topologies, as shown in Fig. 6.
Then we measure the total execution time of the AAC-algorithm part of the clustering framework for each retailer. Finally, we show the average time of all retailers when performing the AAC-algorithm part of the clustering framework in Fig. 7. The average of the retailers and the average iteration numbers of the AAC algorithm under different topologies are also provided in Table II. Please note that the assumption that retailer cannot receive all the information its neighbors have received causes the number of possible communication lines to saturate quickly, resulting in only minor differences between topologies. Thus we temporarily ignore this assumption to demonstrate the cost variation across topologies more clearly.
Topology                   1     2    3    4    5    6    7    8    9
Average                    1.2   1.3  1.7  1.9  2.7  2.9  3.5  3.7  4.1
Average iteration number   121   81   77   25   17   21   14   12   9
Theoretically, the average computation overhead of the proposed AAC-algorithm part is , where denotes the average over all retailers. From Table II, we see that although the average of the retailers increases with the index of the topology shown in Fig. 6, this increase is much smaller than the decrease in iteration numbers, so the computational time in Fig. 7 is dominated by the iteration numbers. In fact, the worst-case measure of the proposed AAC algorithm's asymptotic convergence rate is proportional to the spectral radius of the matrix , where is the averaging matrix [2]. Since the convergence rate determines the iteration numbers, and the computational time is dominated by the iteration numbers, the trend of the computational time coincides with the trend of the spectral radius. For verification, we also show the variation of the spectral radius under the different topologies in Fig. 7. As can be seen, the decreasing trends of the computational time and the spectral radius are the same.
VI Conclusions
In this paper, we propose a privacy-preserving distributed clustering framework, which directly converts the traditional k-means, FCA, and GMM clustering methods into privacy-preserving distributed variants. To achieve this, we first performed a commonality analysis of the three clustering methods and pointed out that the key to the clustering framework lies in calculating the summation of the retailers' private information in a fully distributed and privacy-preserving way. We then developed a PPAAC algorithm with proven convergence to achieve this summation. Finally, we presented the privacy-preserving distributed clustering framework based on the proposed algorithm, together with theoretical privacy and complexity analyses.
The proposed PPAAC algorithm converges faster than both the privacy-preserving AC algorithm and the original AC algorithm. Besides, compared to the original AC and AAC algorithms, the proposed algorithm is privacy-preserving, achieved by introducing an exponentially decaying disturbance with the zero-sum property into the shared information. The proposed clustering framework enables each retailer to obtain the exact residential load patterns of all consumers instead of only its own. Thus, this framework can support retailers in designing better tariff products to attract new users. Meanwhile, the clustering framework not only protects every retailer's privacy but also greatly reduces the computation overhead of each retailer compared to the centralized method. Moreover, under different communication topologies, the decreasing trends of the PPAAC part's computational time and the spectral radius are the same.
Appendix
First, we need to prove that is doubly stochastic, i.e., that
(31)
holds. Define as the vector of all ones; then we have:
Since is a doubly stochastic matrix, as proved in [32], the following holds:
Substituting into , we obtain
Similarly, we can obtain with the property that .
References
 [1] (2011) A clustering method combining differential evolution with the k-means algorithm. Pattern Recognit. Lett. 32(12), pp. 1613–1621.
 [2] (2009-04) Accelerated distributed average consensus via localized node state prediction. IEEE Trans. Signal Process. 57(4), pp. 1563–1576.
 [3] (2006-05) Comparisons among clustering techniques for electricity customer classification. IEEE Trans. Power Syst. 21(2), pp. 933–940.
 [4] (2011) The impact of vehicle-to-grid on the distribution grid. Electr. Pow. Syst. Res. 81(1), pp. 185–192.
 [5] (2002) Tools for privacy preserving distributed data mining. ACM SIGKDD Explorations Newsletter 4(2), pp. 28–34.
 [6] (2016-08) Efficient and privacy-preserving k-means clustering for big data mining. In 2016 IEEE Trustcom/BigDataSE/ISPA, pp. 791–798.
 [7] (2019) Consensus-based data-privacy preserving data aggregation. IEEE Trans. Autom. Control, pp. 1–1.
 [8] (2012) Commission for Energy Regulation (CER) smart metering project. Note: http://www.ucd.ie/issda/data/ commissionforenergyregulationcer/
 [9] (2011-10) Energy management and operational planning of a microgrid with a PV-based active generator for smart grid applications. IEEE Trans. Ind. Electron. 58(10), pp. 4583–4592.
 [10] (2019-02) Countries with the fastest internet in the world 2019. ATLAS and BOOTS.
 [11] (2017) Corruption-resistant privacy preserving distributed EM algorithm for model-based clustering. In 2017 IEEE Trustcom/BigDataSE/ICESS, pp. 1082–1089.
 [12] (2016) A novel time-of-use tariff design based on Gaussian mixture model. Appl. Energy 162, pp. 1530–1536.
 [13] (2004) Selecting variables for k-means cluster analysis by using a genetic algorithm that optimises the silhouettes. Anal. Chim. Acta 515(1), pp. 87–100.
 [14] (1982-03) Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), pp. 129–137.
 [15] (2018) Privacy preserving data mining using threshold based fuzzy c-means clustering. ICTACT Journal on Soft Computing 9(1).
 [16] (2004-11) A comparative analysis of neural and fuzzy cluster techniques applied to the characterization of electric load in substations. In 2004 IEEE/PES Transmission and Distribution Conference and Exposition, pp. 908–913.
 [17] (2009-05) Security and privacy challenges in the smart grid. IEEE Secur. Priv. 7(3), pp. 75–77.
 [18] (2012) Privacy preserving k-means clustering: a survey research. Int. Arab J. Inf. Technol. 9(2), pp. 194–200.
 [19] (2017-02) Privacy preserving average consensus. IEEE Trans. Autom. Control 62(2), pp. 753–765.
 [20] (2010) Communication complexity and energy efficient consensus algorithm. IFAC Proceedings Volumes 43(19), pp. 209–214.
 [21] (2012) An efficient approach for privacy preserving distributed k-means clustering based on Shamir's secret sharing scheme. In IFIP International Conference on Trust Management, pp. 129–141.
 [22] (2010) Data-based method for creating electricity use load profiles using large amount of customer-specific hourly measured electricity use data. Appl. Energy 87(11), pp. 3538–3545.
 [23] (2007) Privacy preserving k-means clustering in multi-party environment. In SECRYPT, pp. 381–385.
 [24] (2010-02) Statistical representation of distribution system loads using Gaussian mixture model. IEEE Trans. Power Syst. 25(1), pp. 29–37.
 [25] (2014-02) Enhanced load profiling for residential network customers. IEEE Trans. Power Del. 29(1), pp. 88–96.
 [26] (2007-05) Privacy-preserving two-party k-means clustering via secure approximation. In 21st International Conference on Advanced Information Networking and Applications Workshops (AINAW'07), Vol. 1, pp. 385–391.
 [27] (2017-05) C-vine copula mixture model for clustering of residential electrical load pattern data. IEEE Trans. Power Syst. 32(3), pp. 2382–2393.
 [28] (2007-08) Two-stage pattern recognition of load curves for classification of electricity customers. IEEE Trans. Power Syst. 22(3), pp. 1120–1128.
 [29] (2010) Efficient privacy preserving k-means clustering. In Pacific-Asia Workshop on Intelligence and Security Informatics, pp. 154–166.
 [30] (2016-09) Clustering of electricity consumption behavior dynamics toward big data applications. IEEE Trans. Smart Grid 7(5), pp. 2437–2447.
 [31] (2012) Analysis of parameter selections for fuzzy c-means. Pattern Recognit. 45(1), pp. 407–415.
 [32] (2005-04) A scheme for robust distributed sensor fusion based on average consensus. In IPSN 2005, Fourth International Symposium on Information Processing in Sensor Networks, pp. 63–70.
 [33] (2017-08) Mutual privacy preserving k-means clustering in social participatory sensing. IEEE Trans. Ind. Informat. 13(4), pp. 2066–2076.