With rapid economic and social development, electricity consumption in China rose to 5802.00 billion kWh in 2015, 74% of which was supplied by thermal power plants [1, 2]. Thermal power plants emit large amounts of greenhouse gases, aggravating environmental pollution, so increasing energy efficiency and reducing emissions is becoming more pressing. Analyzing customers' electricity consumption is an essential step in developing demand-response strategies that curtail electricity demand and balance supply and demand through load shifting, peak clipping, and load profile reshaping [3, 4], thereby improving energy efficiency and reducing emissions. Smart meters record customers' electricity consumption as household load curves (LCs), which provide rich information about customers' consumption behaviors and lifestyles. Knowledge extracted from household LCs, such as typical load profiles (TLPs), gives a better understanding of occupancy behavior, making it possible to infer habitual consumption patterns and identify potential energy-efficiency and demand-response options. This in turn helps network operators design demand-side management strategies that increase energy efficiency and reduce emissions.
Clustering, the most common technique for LC analysis, is an unsupervised method that discovers unknown knowledge (such as TLPs) in a data set by grouping similar LCs into the same sub-group. Various clustering methods, such as K-means, fuzzy clustering, hierarchical clustering, and self-organizing maps, have been applied to LC clustering [5, 6, 7, 8, 9]. To improve LC clustering performance, recent state-of-the-art work introduced Dynamic Time Warping (DTW) to measure the distances between LCs and combined it with K-medoids to cluster the LCs. Unfortunately, previous LC clustering methods still have the following weaknesses.
Each LC in the data set is treated as an individual time series, so the inherent relationships among LCs are ignored, which restrains clustering performance.
Although the state-of-the-art clustering method that adopts DTW performs better than LC clustering approaches that use Euclidean distance as the distance measure, it cannot obtain cluster centers that represent the TLPs with the common averaging method, which hampers the extraction of consumer electricity consumption patterns.
Owing to the significant volatility and uncertainty of LCs [10, 11], clustering methods inevitably produce either a large number of clusters or large variances within a cluster. This is unacceptable for practical applications, which require a trade-off between the number of clusters and the variance within a cluster.
In this paper, we propose an integrated approach to address the above issues. Our approach consists of two integrated parts: a new clustering method incorporated with community detection (CICD for short) that improves LC clustering performance, and a best-cluster-number determination approach that makes the trade-off between the variance within a cluster and the number of clusters.
As shown in Figure 1, CICD consists of network construction, community detection, and typical load profile (TLP) extraction. First, we convert the LC data set into a nearest neighbor network using DTW as the distance measure, characterizing both the local and the global inherent relationships between any pair or any group of LCs. Second, we employ the modularity-based Louvain algorithm to optimize the local and global modularity simultaneously and obtain the optimal community partition, where each community represents a cluster. Third, we extract the cluster centers, each of which represents a TLP, using the DTW Barycenter Averaging method to obtain the typical electricity consumption patterns of the customers. Compared with the K-medoids & DTW clustering method, our method shows a significant improvement as measured by common cluster validity indices.
For the best-cluster-number determination approach, we first segment the range of cluster numbers into intervals. Second, for each interval, we select the best cluster number based on the performance of CICD at that cluster number. We can thus construct a multi-layer TLP directory, each layer of which is the clustering result of CICD with the best cluster number in one interval. A layer with many TLPs has a small variance within each cluster, whereas a layer with few TLPs has a large variance within each cluster; across the layers, the variance within a cluster decreases as the number of TLPs increases. The TLP directory therefore provides the trade-off between the variance within a cluster and the number of clusters, and enables researchers to assess LCs and customers at different layers according to practical application requirements.
The paper is structured as follows. Section 2 summarizes previous LC clustering methods. Section 3 introduces our approach. Section 4 presents and discusses the results. Section 5 introduces the TLP directory construction method. Section 6 draws concluding remarks.
2 Related Work
Various methods have been applied to LC clustering, such as K-means, self-organizing maps, and hierarchical clustering. These commonly used methods are roughly grouped into four categories according to the clustering criterion: partitioning, hierarchical, density-based, and model-based methods [5, 6, 8, 9, 12, 13, 14, 15, 16].
Partitioning methods are the most commonly used clustering techniques in LC clustering because of their simplicity and low time complexity. They initially select centroids and then iteratively update these points to optimize a cost function. These methods include K-means [17, 18, 19, 20, 21, 22, 23], K-medoids, and fuzzy C-means [24, 25, 26].
The hierarchical methods applied to LC clustering are of two types, agglomerative and divisive, of which the agglomerative type is the most commonly used. In agglomerative methods, each LC is first initialized as its own cluster. Then the two closest clusters are repeatedly merged into a new cluster, reducing the number of clusters. In addition, hierarchical clustering has been employed to merge the clusters generated by K-means and so reduce the number of clusters.
Density-based methods treat the LCs in high-density regions of the space as clusters and those in low-density regions as outliers or noise. One density-based method, which clusters objects with the fast search and find of density peaks algorithm, has been used to cluster customers from their daily load curves. The classic density-based method Density-Based Spatial Clustering of Applications with Noise (DBSCAN) has been applied to cluster daily residential meter data.
Among model-based methods, Gaussian mixture models, a type of finite mixture model, have been applied to cluster smart meter data. In [33, 34], the self-organizing map method is used to cluster smart meter data.
In addition to the methods above, some newer methods, such as spectral clustering [35, 36], hierarchical K-means, Support Vector Clustering, and the iterative self-organizing data-analysis technique algorithm, have also been applied to LC clustering.
In the LC clustering methods above, each LC in the data set is treated as an individual time series, ignoring the inherent relationships between LCs, which restrains clustering performance. A network is a powerful mechanism for characterizing a time series data set, as has been demonstrated in time series analysis. This paper fills this research gap by converting the LC data set into a network and clustering the LCs via community detection in the network, achieving an improvement over the state-of-the-art method.
In this section, we give a detailed description of our method, which consists of four steps: data preparation, network construction, community detection, and center extraction.
3.1 Data preparation
Normalization is an indispensable step in LC clustering, as it removes amplitude interference and makes the consumption patterns contained in the LCs easier to identify. In our work, for a given daily LC, we obtain the normalized daily LC by Equation 1.
3.2 Network Construction
3.2.1 Dynamic Time Warping (DTW)
The distance between LCs determines the formation of edges in the network. DTW, one of the best-known shape-based distance measures for time series, is used to measure the distance between LCs, since demand response focuses on understanding the electricity consumption patterns of customers. The purpose of DTW is to find the optimal alignment path between two time series and to compute the distance between them. For two given LCs X and Y, the warping path is obtained subject to the following restrictions.
Boundary condition : and .
Monotonicity condition : and .
Continuity condition : and .
Warping window : .
Here, we assume that a typical consumer exhibits less than hour variation in time of electricity consumption. Customer electricity consumption data are collected every 15 minutes. Thus the warping window of the DTW is set to .
The minimum total cost of the warping path equals the distance between the LCs X and Y, and it is obtained by Equation 2.
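The dynamic-programming recursion behind a windowed DTW distance can be sketched as follows. This is a minimal illustration, not the paper's Equation 2: the squared-difference local cost, the final square root, and the names `dtw_distance` and `window` are assumptions made for the example, and the window value is left as a free parameter.

```python
import math


def dtw_distance(x, y, window):
    """DTW distance between sequences x and y under a warping window.

    `window` is the largest allowed |i - j| between aligned indices,
    expressed in samples (hypothetical parameter name).
    """
    n, m = len(x), len(y)
    # The band must be at least |n - m| wide, or the end point is unreachable.
    w = max(window, abs(n - m))
    # cost[i][j]: minimum cumulative cost of aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # boundary condition: the path starts at the first points
    for i in range(1, n + 1):
        # Warping window: only cells with |i - j| <= w are visited.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (x[i - 1] - y[j - 1]) ** 2  # assumed local cost
            # Monotonicity and continuity: a step may only come from
            # (i-1, j), (i, j-1), or (i-1, j-1).
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Boundary condition: the path ends at the last points of both series.
    return math.sqrt(cost[n][m])
```

With 15-minute readings, a window of `w` samples tolerates a time shift of up to `15 * w` minutes, so two LCs whose peaks are slightly offset can still be matched at low cost.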
3.2.2 Conversion of the LC data set into a network
A network is a powerful mechanism that can represent the complex relationships between LCs. A network consists of vertices and edges, where each edge connects a pair of vertices. In this work, we convert the LC data set into a nearest neighbor network. First, we assign each LC in the original data set to a vertex. Second, an edge is formed between two vertices when the distance between the corresponding LC vectors is less than the threshold in Equation 3.
Here, the threshold depends on the mean distance between a vector and all of the other vectors, and on a parameter that adjusts the number of edges in the network. The larger this parameter is, the more edges the network contains and the more complex the network is. In addition, the weight of each edge is determined by Equation 4.
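The construction described above can be sketched as follows. Since Equations 3 and 4 are not reproduced here, both the threshold and the weight in this sketch are assumptions: the threshold is taken as a hypothetical parameter `alpha` times the larger of the two endpoints' mean distances, and the weight as `1 / (1 + distance)`. The function name `build_network` is also illustrative.

```python
def build_network(dist, alpha):
    """Build a weighted neighbor network from a symmetric distance matrix.

    dist[i][j] is the distance (for example DTW) between LC i and LC j.
    Returns a dict mapping (i, j), with i < j, to an edge weight.
    Requires at least two LCs.
    """
    n = len(dist)
    # Mean distance from each LC to all of the others.
    mean_dist = [
        sum(row[j] for j in range(n) if j != i) / (n - 1)
        for i, row in enumerate(dist)
    ]
    edges = {}
    for i in range(n):
        for j in range(i + 1, n):
            # Assumed threshold: alpha times the larger mean distance.
            threshold = alpha * max(mean_dist[i], mean_dist[j])
            if dist[i][j] < threshold:
                # Assumed weight: closer LCs get heavier edges.
                edges[(i, j)] = 1.0 / (1.0 + dist[i][j])
    return edges
```

Under these assumptions, raising `alpha` raises every threshold, so the network gains edges, which matches the qualitative behavior described for the parameter.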
3.3 Community Detection
Community detection is an important method for finding a priori unknown modules in a network and has attracted much attention. In this work, the well-known modularity-based Louvain algorithm is applied to extract the communities from the network, as it offers advantages in both computation time and the quality of the detected communities as measured by the modularity in Equation 5 [42, 43, 44]. As shown in Algorithm 1, Louvain iteratively extracts communities from the network to find the optimal division.
Here, is the weight of the edge between vertex and , is the sum of the weights of the edges attached to , and is the community which is assigned to.
Here, the terms denote, in order, the sum of the weights of the edges inside the community, the sum of the weights of the edges incident to the vertices inside the community, the sum of the weights of the edges connected to the vertex, the sum of the weights of the edges from the vertex to the vertices in the community, and the sum of the weights of all edges in the network. A further parameter adjusts the number of communities: commonly, the smaller it is, the more communities are extracted from the network.
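To make the quantity that Louvain optimizes concrete, the sketch below computes the standard weighted modularity of a given partition. It assumes the usual Newman definition without the adjusting parameter mentioned above, an undirected network without self-loops, and illustrative names (`modularity`, `edges`, `community`); it only evaluates a partition and does not implement the Louvain search itself.

```python
def modularity(edges, community):
    """Weighted modularity of a partition of an undirected network.

    edges: dict mapping (i, j) to a positive weight, each edge listed once.
    community: dict mapping each vertex to its community label.
    """
    m = sum(edges.values())  # total edge weight of the network
    strength = {}  # sum of edge weights attached to each vertex
    inside = {}    # sum of edge weights inside each community
    for (i, j), w in edges.items():
        strength[i] = strength.get(i, 0.0) + w
        strength[j] = strength.get(j, 0.0) + w
        if community[i] == community[j]:
            c = community[i]
            inside[c] = inside.get(c, 0.0) + w
    total = {}  # sum of vertex strengths per community
    for v, k in strength.items():
        c = community[v]
        total[c] = total.get(c, 0.0) + k
    # Q = sum over communities of inside / m - (total / 2m)^2.
    return sum(inside.get(c, 0.0) / m - (total[c] / (2 * m)) ** 2 for c in total)
```

For two triangles joined by a single unit-weight edge, splitting the network into the two triangles gives a modularity of 5/14, whereas placing every vertex in one community gives 0, so the split is preferred.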
3.4 Center Extraction
The center of an LC cluster is a TLP that reflects the electricity consumption pattern, and it is usually obtained by an averaging method. However, when DTW is used as the distance measure, averaging LCs is a difficult task, because the average has to be consistent with the ability of DTW to realign sequences over time. In this paper, DTW Barycenter Averaging, an averaging technique proposed by Petitjean et al. and shown in Algorithm 2, is applied to extract the TLPs from the clusters.
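The idea of averaging under DTW alignment can be sketched as follows. This is a simplified illustration rather than a transcription of Algorithm 2: it uses unconstrained DTW (no warping window), a squared-difference local cost, initialization from the first cluster member, and a fixed iteration count, all of which are assumptions; `dtw_path` and `dba` are illustrative names.

```python
import math


def dtw_path(center, series):
    """Return an optimal DTW alignment as a list of (i, j) index pairs."""
    n, m = len(center), len(series)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (center[i - 1] - series[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end point to the start point.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)  # move to the cheapest predecessor
    path.reverse()
    return path


def dba(series_list, n_iter=10):
    """Iteratively refine an average sequence under DTW alignment."""
    center = list(series_list[0])  # assumed initialization
    for _ in range(n_iter):
        # buckets[i] collects every value aligned to position i of the center.
        buckets = [[] for _ in center]
        for s in series_list:
            for i, j in dtw_path(center, s):
                buckets[i].append(s[j])
        # Each center point becomes the mean of the values aligned to it.
        center = [sum(b) / len(b) for b in buckets]
    return center
```

For the two shifted curves `[0, 0, 1, 0]` and `[0, 1, 0, 0]`, a pointwise mean would flatten the peak to `[0, 0.5, 0.5, 0]`, whereas this sketch returns `[0, 0, 1, 0]` and keeps the peak shape, which is why a DTW-consistent average is preferred for TLP extraction.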
4 Results and discussion
To verify the effectiveness of our method, the clustering methods are validated on the same data set in the same task.
4.1 The cluster validity indices
It is important to select indices that reflect the performance of a clustering algorithm. Based on the performance of the common cluster validity indices on various data sets [46, 47], we selected five cluster validity indices to evaluate the clustering results: the Davies-Bouldin index (DB), the VCN index (VCN, an improvement of Silhouette), the S_Dbw index (S_Dbw), the Score function (SF), and the COP index (COP). In addition, the , a common index for identifying potential demand-response customers, is also considered to measure consumer variability in our work. The smaller DB, S_Dbw, and COP are, and the higher VCN and SF are, the higher the cluster quality is.
4.2 Data Description
The data used to validate the clustering algorithm are provided by Pecan Street Inc. The data set contains 22113 daily LCs from 351 households. The LCs were collected over a period of 63 days between July 6, 2015 and September 6, 2015, with readings taken every 15 minutes. To obtain a baseline, the data set is also clustered with the state-of-the-art method (K-medoids & DTW).
4.3 Cluster Performance Evaluation
The is the mean value of the index over the clustering results of K-medoids & DTW. The is the mean value of the index over the clustering results of the CICD method. The index value is the value of the K-medoids & DTW method at which CICD achieves its maximum improvement rate. The is the value of the CICD method at which CICD achieves its maximum improvement rate.
We evaluated the quality of CICD with the five cluster validity indices. The smaller DB, S_Dbw, and COP are, and the higher VCN and SF are, the higher the cluster quality is. Figures 2, 3, 4, 5, and 6 display these indices versus the parameter of Equation 6 when our method is used to cluster the LCs. For comparison, the results of K-medoids & DTW are also shown in Figures 2 to 6, with the cluster number of K-medoids & DTW set equal to the one that our method obtains for each parameter value. In other words, the same parameter value means the same cluster number in Figures 2 to 6. When the cluster numbers are the same, CICD yields smaller DB, S_Dbw, and COP and higher VCN and SF, outperforming K-medoids & DTW.
Furthermore, CICD shows a significant improvement over K-medoids & DTW in , , and . As shown in Table 1, compared with the K-medoids & DTW method, CICD improves significantly on all cluster validity indices. CICD achieves its largest improvement in , with a mean improvement of 184.13% and a maximum improvement of 476.78%. It achieves its smallest improvement in , but even there the mean improvement is 36.45% and the maximum improvement is 53.82%. The , , , and show that the maximum improvement is obtained when the cluster number is small and is larger. In addition, the mean improvement in the of consumers is 60.87%, and the maximum improvement is 68.87%.
It is concluded that the larger is, the more prominent the improvement of CICD in , , , and is, compared with K-medoids & DTW. This demonstrates that CICD tends to cluster the LCs into fewer clusters. Moreover, CICD has a lower average of consumers (Figure 7), which indicates that CICD classifies consumers into more stable representative groups. This is important for predicting individual energy consumption patterns and identifying potential options for demand response.
5 The TLP Directory Construction
Ideally, the LCs should be assigned to few clusters, and the LCs in the same cluster should be sufficiently similar (that is, the variance within a cluster should be small), which provides a better understanding of the electricity consumption patterns. In practice, however, when clustering is applied to LCs, the variance within a cluster inevitably increases as the number of clusters decreases, owing to the volatility and uncertainty of electricity consumption behavior data. Researchers therefore usually make a trade-off between the cluster number and the variance within a cluster according to the application requirements.
In this paper, we propose a best-cluster-number determination approach to construct a multi-layer TLP directory. A layer with many TLPs has a small variance within each cluster, whereas a layer with few TLPs has a large variance within each cluster; across the layers, the variance within a cluster decreases as the number of TLPs increases. Researchers can thus assess LCs and customers using a single layer or the whole TLP directory according to the application requirements. In detail, we first vary the parameter of Equation 6 to obtain different clustering results with the CICD method and extract the TLPs. Second, for each clustering result, we calculate its cluster number and an index that has been shown to perform excellently in determining the optimal number of clusters. Third, we segment the range of cluster numbers into intervals. For each interval, we select the TLPs that correspond to the maximum value of this index as a layer of the TLP directory.
In this work, we constructed a TLP directory with three layers. To select the number of clusters for every layer, we vary the parameter of Equation 6 from to with the interval , and calculate the of each clustering result. As shown in Figure 8, we obtain three maxima, each of which corresponds to the best cluster number of CICD in its interval. For the first layer, the cluster number is restricted to , and the best number of clusters, 297, is obtained when , with a mean variance of . For the second layer, the cluster number is restricted to , and the best number of clusters is obtained when , with a mean variance of . For the third layer, the cluster number is restricted to , and the best number of clusters is obtained when , with a mean variance of . Thus, a TLP directory with three layers is obtained, in which the number of TLPs is respectively , and in the different layers, and each layer has a different level of mean variance within a cluster. The first layer has the maximum cluster number but the minimum mean variance within a cluster, which means that the LCs within a cluster are most similar to each other and the TLP of the cluster is most similar to the LCs within the cluster. (The TLP is extracted from the LCs within a cluster to represent the cluster; when the variance of the cluster is small, the TLP is similar to all of the LCs within the cluster.) Conversely, the last layer has the minimum cluster number but the maximum mean variance within a cluster, which means that the LCs within a cluster differ most from each other and the TLP of the cluster differs most from the LCs within the cluster. Researchers can assess LCs and customers using a single layer or the whole TLP directory according to the application requirements.
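The layer selection step can be sketched as follows. The function name `select_layers`, the tuple layout, and all numbers in the usage example are hypothetical and are not results from this study; the sketch only assumes that each clustering run yields a parameter value, a cluster number, and an index value for which larger is better.

```python
def select_layers(results, intervals):
    """Pick one clustering result per cluster-number interval.

    results: list of (parameter, cluster_number, index_value) tuples,
        one per clustering run.
    intervals: list of (low, high) inclusive bounds on the cluster number.
    Returns, for each interval, the run with the highest index value,
    or None when no run falls inside the interval.
    """
    layers = []
    for low, high in intervals:
        candidates = [r for r in results if low <= r[1] <= high]
        best = max(candidates, key=lambda r: r[2]) if candidates else None
        layers.append(best)
    return layers


# Toy usage with made-up values.
runs = [(0.1, 300, 0.4), (0.2, 250, 0.6), (0.3, 120, 0.5), (0.4, 90, 0.7), (0.5, 20, 0.3)]
layers = select_layers(runs, [(200, 400), (50, 199), (1, 49)])
```

Each selected run then supplies the TLPs of one layer, so the directory orders the layers from many low-variance clusters down to few high-variance clusters.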
This paper proposed an integrated approach to load curve (LC) clustering. First, we proposed a clustering approach incorporated with community detection to improve LC clustering performance; it consists of network construction, community detection, and typical load profile extraction. Second, we constructed a multi-layer typical load profile (TLP) directory to make the trade-off between the variance within a cluster and the number of clusters. Measured by five cluster validity indices (Davies-Bouldin index, VCN index, S_Dbw index, Score function, and COP index), our method is shown to be effective, outperforming the state-of-the-art method.
The smart grid paradigm relies on exploiting smart meter data to improve customer experience and utility operations and to advance power management. Our future work will focus on how to respond to demand-response needs by analyzing the TLPs and the customers. In addition, external factors (such as customer activity information) will be considered to capture the features of customer behavior behind the LCs.
This work is supported by the Major Program of the National Natural Science Foundation of China (Grant No. 61432006) and the National Key Research and Development Program of China (2016YFB1000600, 2016YFB1000601).
-  K. Zhou and S. Yang, “Understanding household energy consumption behavior: The contribution of energy big data analytics,” Renewable and Sustainable Energy Reviews, vol. 56, pp. 810–819, 2016.
-  National Bureau of Statistics of China, “Annual data.” http://data.stats.gov.cn/easyquery.htm?cn=C01. Accessed February 4, 2017.
-  R. Alasseri, A. Tripathi, T. J. Rao, and K. Sreekanth, “A review on implementation strategies for demand side management (dsm) in kuwait through incentive-based demand response programs,” Renewable and Sustainable Energy Reviews, vol. 77, pp. 617–635, 2017.
-  H. Lund and E. Münster, “Management of surplus electricity-production from a fluctuating renewable-energy source,” Applied Energy, vol. 76, no. 1-3, pp. 65–74, 2003.
-  Y. Wang, Q. Chen, T. Hong, and C. Kang, “Review of smart meter data analytics: Applications, methodologies, and challenges,” IEEE Transactions on Smart Grid, 2018.
-  B. Yildiz, J. Bilbao, J. Dore, and A. Sproul, “Recent advances in the analysis of residential electricity consumption and applications of smart meter data,” Applied Energy, vol. 208, pp. 402–427, 2017.
-  G. Chicco, R. Napoli, and F. Piglione, “Comparisons among clustering techniques for electricity customer classification,” IEEE Transactions on Power Systems, vol. 21, no. 2, pp. 933–940, 2006.
-  S.-l. Yang, C. Shen, et al., “A review of electric load classification in smart grid environment,” Renewable and Sustainable Energy Reviews, vol. 24, pp. 103–110, 2013.
-  Y. Wei, X. Zhang, Y. Shi, L. Xia, S. Pan, J. Wu, M. Han, and X. Zhao, “A review of data-driven approaches for prediction and classification of building energy consumption,” Renewable and Sustainable Energy Reviews, vol. 82, pp. 1027–1047, 2018.
-  T. Teeraratkul, D. O’Neill, and S. Lall, “Shape-based approach to household electric load curve clustering and prediction,” IEEE Transactions on Smart Grid, 2017.
-  R. Li, F. Li, and N. D. Smith, “Multi-resolution load profile clustering for smart metering data,” IEEE Transactions on Power Systems, vol. 31, no. 6, pp. 4473–4482, 2016.
-  T. Räsänen, D. Voukantsis, H. Niska, K. Karatzas, and M. Kolehmainen, “Data-based method for creating electricity use load profiles using large amount of customer-specific hourly measured electricity use data,” Applied Energy, vol. 87, no. 11, pp. 3538–3545, 2010.
-  G. Chicco, “Overview and performance assessment of the clustering methods for electrical load pattern grouping,” Energy, vol. 42, no. 1, pp. 68–80, 2012.
-  S. Ramos, J. M. Duarte, F. J. Duarte, and Z. Vale, “A data-mining-based methodology to support mv electricity customers’ characterization,” Energy and Buildings, vol. 91, pp. 16–25, 2015.
-  L. G. Costacurta and M. A. Sanz-Bobi, “Application of clustering methods for discovering patterns of energy use in regional areas for the residential sector,” in PowerTech, 2017 IEEE Manchester, pp. 1–6, IEEE, 2017.
-  A. Rajabi, L. Li, J. Zhang, J. Zhu, S. Ghavidel, and M. J. Ghadi, “A review on clustering of residential electricity customers and its applications,” in Electrical Machines and Systems (ICEMS), 2017 20th International Conference on, pp. 1–6, IEEE, 2017.
-  J. Kwac, J. Flora, and R. Rajagopal, “Household energy consumption segmentation using hourly data,” IEEE Transactions on Smart Grid, vol. 5, no. 1, pp. 420–430, 2014.
-  J. D. Rhodes, W. J. Cole, C. R. Upshaw, T. F. Edgar, and M. E. Webber, “Clustering analysis of residential electricity demand profiles,” Applied Energy, vol. 135, pp. 461–471, 2014.
-  F. L. Quilumba, W.-J. Lee, H. Huang, D. Y. Wang, and R. L. Szabados, “Using smart meter data to improve the accuracy of intraday load forecasting considering customer behavior similarities.,” IEEE Trans. Smart Grid, vol. 6, no. 2, pp. 911–918, 2015.
-  A. Lavin and D. Klabjan, “Clustering time-series energy data from smart meters,” Energy efficiency, vol. 8, no. 4, pp. 681–689, 2015.
-  Y. Zhang, W. Chen, R. Xu, and J. Black, “A cluster-based method for calculating baselines for residential loads,” IEEE Transactions on smart grid, vol. 7, no. 5, pp. 2368–2377, 2016.
-  J. du Toit, R. Davimes, A. Mohamed, K. Patel, and J. Nye, “Customer segmentation using unsupervised learning on daily energy load profiles,” Journal of Advances in Information Technology, vol. 7, no. 2, 2016.
-  A. Al-Wakeel, J. Wu, and N. Jenkins, “K-means based load estimation of domestic smart meter measurements,” Applied Energy, vol. 194, pp. 333–342, 2017.
-  H. Yang, L. Zhang, Q. He, and Q. Niu, “Study of power load classification based on adaptive fuzzy c means,” Power System Protection and Control, vol. 16, no. 111-115, p. 2238, 2010.
-  T. G. Nikolaou, D. S. Kolokotsa, G. S. Stavrakakis, and I. D. Skias, “On the application of clustering techniques for office buildings’ energy and thermal comfort classification,” IEEE Transactions on Smart Grid, vol. 3, no. 4, pp. 2196–2210, 2012.
-  M. M. Selvam, R. Gnanadass, and N. Padhy, “Fuzzy based clustering of smart meter data using real power and thd patterns,” Energy Procedia, vol. 117, pp. 401–408, 2017.
-  P. R. Jota, V. R. Silva, and F. G. Jota, “Building load management using cluster and statistical analyses,” International Journal of Electrical Power & Energy Systems, vol. 33, no. 8, pp. 1498–1505, 2011.
-  Y. Wang, Q. Chen, C. Kang, and Q. Xia, “Clustering of electricity consumption behavior dynamics toward big data applications,” IEEE transactions on smart grid, vol. 7, no. 5, pp. 2437–2447, 2016.
-  A. Rodriguez and A. Laio, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
-  L. Jin, D. Lee, A. Sim, S. Borgeson, K. Wu, C. A. Spurlock, and A. Todd, “Comparison of clustering techniques for residential energy behavior using smart meter data,” in AAAI Workshops: Artificial Intelligence for Smart Grids and Buildings, 2017.
-  W. Labeeuw and G. Deconinck, “Residential electrical load model based on mixture model clustering and markov models,” IEEE Transactions on Industrial Informatics, vol. 9, no. 3, pp. 1561–1569, 2013.
-  S. Haben, C. Singleton, and P. Grindrod, “Analysis and clustering of residential customers energy behavioral demand using smart meter data,” IEEE transactions on smart grid, vol. 7, no. 1, pp. 136–144, 2016.
-  G. Chicco, R. Napoli, and F. Piglione, “Application of clustering algorithms and self organising maps to classify electricity customers,” in Power Tech Conference Proceedings, 2003 IEEE Bologna, vol. 1, pp. 7–pp, IEEE, 2003.
-  F. McLoughlin, A. Duffy, and M. Conlon, “A clustering approach to domestic electricity load profile characterisation using smart metering data,” Applied energy, vol. 141, pp. 190–199, 2015.
-  R. J. Sánchez-García, M. Fennelly, S. Norris, N. Wright, G. Niblo, J. Brodzki, and J. W. Bialek, “Hierarchical spectral clustering of power grids,” IEEE Transactions on Power Systems, vol. 29, no. 5, pp. 2229–2237, 2014.
-  S. Lin, F. Li, E. Tian, Y. Fu, and D. Li, “Clustering load profiles for demand response applications,” IEEE Transactions on Smart Grid, pp. 1–1, 2017.
-  T.-S. Xu, H.-D. Chiang, G.-Y. Liu, and C.-W. Tan, “Hierarchical k-means method for clustering large-scale advanced metering infrastructure data,” IEEE Transactions on Power Delivery, vol. 32, no. 2, pp. 609–616, 2017.
-  M. Gavrilas, G. Gavrilas, and C. V. Sfintes, “Application of honey bee mating optimization algorithm to load profile clustering,” in Computational Intelligence for Measurement Systems and Applications (CIMSA), 2010 IEEE International Conference on, pp. 113–118, IEEE, 2010.
-  A. Mutanen, M. Ruska, S. Repo, and P. Jarventausta, “Customer classification and load profiling method for distribution systems,” IEEE Transactions on Power Delivery, vol. 26, no. 3, pp. 1755–1763, 2011.
-  L. N. Ferreira and L. Zhao, “Time series clustering via community detection in networks,” Information Sciences, vol. 326, pp. 227–242, 2016.
-  D. J. Berndt and J. Clifford, “Using dynamic time warping to find patterns in time series.,” in KDD workshop, vol. 10, pp. 359–370, Seattle, WA, 1994.
-  V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of statistical mechanics: theory and experiment, vol. 2008, no. 10, p. P10008, 2008.
-  M. E. Newman, “Analysis of weighted networks,” Physical review E, vol. 70, no. 5, p. 056131, 2004.
-  M. E. Newman, “Modularity and community structure in networks,” Proceedings of the national academy of sciences, vol. 103, no. 23, pp. 8577–8582, 2006.
-  F. Petitjean, A. Ketterlin, and P. Gançarski, “A global averaging method for dynamic time warping, with applications to clustering,” Pattern Recognition, vol. 44, no. 3, pp. 678–693, 2011.
-  O. Arbelaitz, I. Gurrutxaga, J. Muguerza, J. M. Pérez, and I. Perona, “An extensive comparative study of cluster validity indices,” Pattern Recognition, vol. 46, no. 1, pp. 243–256, 2013.
-  S. Zhou and Z. Xu, “A novel internal validity index based on the cluster centre and the nearest neighbour cluster,” Applied Soft Computing, 2018.
-  D. L. Davies and D. W. Bouldin, “A cluster separation measure,” IEEE transactions on pattern analysis and machine intelligence, no. 2, pp. 224–227, 1979.
-  M. Halkidi and M. Vazirgiannis, “Clustering validity assessment: Finding the optimal partitioning of a data set,” in Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pp. 187–194, IEEE, 2001.
-  S. Saitta, B. Raphael, and I. F. Smith, “A bounded index for cluster validity,” in International Workshop on Machine Learning and Data Mining in Pattern Recognition, pp. 174–187, Springer, 2007.
-  I. Gurrutxaga, I. Albisua, O. Arbelaitz, J. I. Martín, J. Muguerza, J. M. Pérez, and I. Perona, “Sep/cop: An efficient method to find the best partition in hierarchical clustering based on a new cluster validity index,” Pattern Recognition, vol. 43, no. 10, pp. 3364–3373, 2010.
-  P. S. R. Consortium et al., “Dataport2017.” https://www.pecanstreet.org/, 2017. Accessed February 4, 2017.
-  R. Li, F. Li, and N. D. Smith, “Multi-resolution load profile clustering for smart metering data,” IEEE Transactions on Power Systems, vol. 31, no. 6, pp. 4473–4482, 2016.