Using competency questions to select optimal clustering structures for residential energy consumption patterns

06/01/2020 ∙ by Wiebke Toussaint, et al. ∙ University of Cape Town ∙ Delft University of Technology

During cluster analysis, domain experts and visual analysis are frequently relied on to identify the optimal clustering structure. This process tends to be ad hoc, subjective and difficult to reproduce. This work shows how competency questions can be used to formalise expert knowledge and application requirements for context-specific evaluation of a clustering application in the residential energy consumption sector.







1 Background and Previous Work

While cluster analysis is an established unsupervised machine learning technique, identifying the optimal set of clusters for a specific application requires extensive experimentation and domain knowledge. Cluster compactness and distinctness are two important attributes that characterise a good cluster set (Sarle et al., 1990), and different metrics, such as the Mean Index Adequacy (MIA), the Davies-Bouldin Index (DBI) and the Silhouette Index, have been proposed to measure them. In practice, a combination of measures together with additional expert guidance and visual inspection of clustering results is often used during the experimental process to identify the best cluster set (Jin et al., 2017), (Dent et al., 2014). However, these qualitative approaches can be ad hoc and time-consuming, subjective and difficult to reproduce, and biased by the expert's interpretation of the visual representation (Gogolou et al., 2019). This work shows how competency questions from the ontology engineering community can be used to guide cluster set selection for generating representative daily load profiles that are suitable for developing customer archetypes of residential consumers in South Africa.

A daily load profile describes the energy consumption pattern of a household over a 24 hour period. Representative daily load profiles (RDLPs) are indicative of distinct daily energy usage behaviour for different types of households. Customer archetypes are developed to represent groupings of energy users that consume energy in a similar manner. RDLPs have been well explored for generating customer archetypes for applications in long-term energy modelling (Figueiredo et al., 2005), (McLoughlin et al., 2015). Traditionally, the most common approaches used for clustering load profiles are centroid-based approaches and variants of kmeans, self-organising maps (SOM) and hierarchical clustering (Jin et al., 2017). For residential consumers the variable nature of individual households makes the interpretation of clustering results ambiguous (Swan and Ugursal, 2009), a challenge that is exacerbated in highly diverse, developing-country populations, where economic volatility, income inequality, and geographic and social diversity contribute to increased variability of residential energy demand (Heunis and Dekenah, 2014). Xu et al. (2017) have used pre-binning, which involves applying a two-stage clustering algorithm that first clusters load profiles by overall consumption and then by load shape, to improve clustering results for highly variable households spread across the United States. In addition to the general clustering metrics, Kwac et al. (2014) also propose the notion of entropy as a metric for capturing the variability of electricity consumption of a household. To evaluate the result of segmenting a large number of daily load profiles into interpretable consumption patterns, Xu et al. (2017) use peak overlap, percentage error in overall consumption and entropy as metrics.
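To make the standard quantitative metrics concrete, a short sketch with scikit-learn follows. DBI and silhouette are built into scikit-learn; MIA is not, so an implementation under its usual definition (root mean of the mean squared member-to-centroid distances) is included here as an assumption, not the paper's exact formulation:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, silhouette_score

rng = np.random.default_rng(0)
# Toy stand-in for an array of normalised daily load profiles (n x 24).
X = rng.random((500, 24))

labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(X)

dbi = davies_bouldin_score(X, labels)   # lower is better
sil = silhouette_score(X, labels)       # in [-1, 1], higher is better

def mean_index_adequacy(X, labels):
    """MIA: root mean of the mean squared member-to-centroid distances."""
    variances = []
    for k in np.unique(labels):
        members = X[labels == k]
        centroid = members.mean(axis=0)
        variances.append(np.mean(np.sum((members - centroid) ** 2, axis=1)))
    return float(np.sqrt(np.mean(variances)))

mia = mean_index_adequacy(X, labels)    # lower is better
```

In practice these three numbers disagree often enough that neither one alone identifies the best cluster set, which motivates the combined and qualitative scoring developed later in the paper.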

In ontology engineering, competency questions are an established methodology used to specify the requirements of an ontology and to evaluate the extent to which a particular ontology meets these requirements (Grüninger and Fox, 1995). Brainstorming, expert interviews and consultation of established sources of domain knowledge are processes that can be used to identify competency questions (De Nicola et al., 2009). Informal competency questions can be expressed in natural language and connect a proposed ontology to its application scenarios, thus providing an informal justification for the ontology (Uschold and Gruninger, 1996). To our knowledge, competency questions have not previously been used to evaluate clustering structures.

2 Data

The Domestic Electrical Load Metering Hourly (DELMH) (Toussaint, 2019) dataset contains 3 295 194 daily load profiles for 14 945 South African households over a period of 20 years, from 1994 to 2014. The daily load profile x_{h,d} is a 24-element vector representing the hourly consumption (measured in Amperes) of household h on day d. Each interval is labelled by its start time, so that the first element captures the interval 00:00:00 - 00:59:59. X_h is the array of all daily load profile vectors for household h, and X (dim 3 295 848 × 24) is the array of all daily load profiles.

We can then use clustering to find an optimal clustering structure C, given the dataset X.
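The construction of the array of daily load profiles from hourly metering records can be sketched with pandas; the column names and toy timestamps below are illustrative placeholders, not the DELMH schema:

```python
import numpy as np
import pandas as pd

# Toy hourly readings: 3 days for each of 2 households (column names
# and values are illustrative, not the DELMH schema).
idx = pd.date_range("2014-01-01", periods=72, freq="h")
records = pd.DataFrame({
    "household": np.repeat(["H1", "H2"], 72),
    "timestamp": np.tile(idx, 2),
    "amperes": np.random.default_rng(1).random(144),
})

records["date"] = records["timestamp"].dt.date
records["hour"] = records["timestamp"].dt.hour

# Pivot to one 24-element daily load profile per (household, day).
profiles = records.pivot_table(index=["household", "date"],
                               columns="hour", values="amperes")
X = profiles.to_numpy()   # shape: (n_profiles, 24)
```

Each row of X is then one daily load profile x_{h,d}, and the (household, date) index preserves the household grouping X_h until identifiers are deliberately dropped.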

3 Developing Competency Questions

We used a combination of analysing existing standards and engaging with domain experts to formulate informal competency questions, expressed in natural language. The Geo-based Load Forecast Standard (4) contains manually constructed load profiles and guiding principles for load forecasting in South Africa. The competency questions were developed after analysis of this standard and continuous engagement with a panel of five industry experts. Initial interviews were conducted with all experts to elicit the usage requirements. Preliminary competency questions were then presented at a workshop with key stakeholders in the community, and the final version of the competency questions incorporated the stakeholders' feedback. The competency questions were subsequently used to construct associated qualitative evaluation measures and a cluster scoring matrix that weights these measures to provide a qualitative ranking of cluster sets in terms of the application requirements.

The following five core competency questions were identified:

  1. Can the load shape and demand be deduced from clusters?

  2. Do clusters distinguish between low, medium and high demand consumers?

  3. Can clusters represent specific loading conditions for different day types and seasons?

  4. Can a zero-consumption profile be represented in the cluster set? (This was deemed important for considering energy access in low income contexts, as households may go through periods where they cannot afford to buy electricity and thus have no consumption.)

  5. Is the number of households assigned to clusters reasonable, given knowledge of the sample population?

Based on these questions, we defined a good cluster set as having expressive clusters and being usable. An expressive cluster must convey specific information related to particular socio-economic and temporal energy consumption behaviour. A usable cluster set must represent energy consumption behaviour that makes sense in relation to the clustering context and that carries the necessary information to make it pertinent to domain users. Next we developed qualitative measures to assess the competency questions. They are explained briefly below and in detail in Appendix A.

Expressivity (from competency questions 2 and 3) requires that the RDLP of a cluster is representative of the energy consumption behaviour of the individual daily load profiles that are members of that cluster, as expressed by the mean consumption error of total and peak demand and the mean peak coincidence ratio. An expressive cluster must also have the ability to convey specific meaning, especially in contexts where populations are highly variable. Cluster entropy can be used as a measure to establish the information embedded in a cluster and thus its specificity. The lower the entropy, the more information is embedded in the cluster, the more specific (homogeneous) the cluster, the better the cluster. In a specific cluster all members share the same context, e.g. daily load profiles of low consumption households on Sundays in summer.

The characteristic of cluster usability was derived from competency questions 4 and 5. Question 4 requires a manual evaluation based on expert judgement and is evaluated as either true or false. Question 5 is calculated as the percentage of clusters whose membership exceeds a threshold value of 10 490 members (the threshold was selected as a value approximately equal to 5% of households using a particular cluster for 14 days). An additional consideration is that fewer clusters typically ease interpretation and are thus preferable to larger numbers of clusters. The maximum number of clusters should be limited to 220, based on population diversity and existing expert models, which account for 11 socio-demographic groups, 2 seasons, 2 day types and 5 climatic zones.

3.1 Cluster Scoring Matrix

The cluster scoring matrix in Table 1 presents a summary of the attributes and competency questions, the corresponding evaluation measures and their weights. The weights are based on the relative importance that experts assigned to the measure. Experiments are ranked by performance in each measure, with a score of 1 indicating the best cluster set. A weighted score is then computed for each experiment by multiplying its rank with the corresponding measure’s weight, and summing over all measures. The lower the total score, the better the cluster set.

Attribute                    Qu.  Evaluation measure             Weight
usable                       5    sensible count per cluster     2
usable                       4    zero-profile representation    1
expressive (representative)  1    mean consumption error: total  6
expressive (representative)  1    mean consumption error: peak   6
expressive (representative)  1    mean peak coincidence          3
expressive (specific)        3    temporal entropy: weekday      4
expressive (specific)        3    temporal entropy: monthly      4
expressive (specific)        2    demand entropy: total daily    5
expressive (specific)        2    demand entropy: peak daily     5
Table 1: Cluster Scoring Matrix
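The rank-and-weight aggregation behind Table 1 can be sketched as follows; the measure names and weights follow the table, while the per-experiment scores are illustrative toy values:

```python
import pandas as pd

# Per-experiment scores for three of the qualitative measures
# (values are illustrative, not results from the paper).
scores = pd.DataFrame({
    "mean consumption error total": [0.12, 0.30, 0.25],
    "mean peak coincidence": [0.80, 0.60, 0.70],
    "demand entropy total daily": [1.1, 1.5, 1.3],
}, index=["exp5", "exp8", "exp2"])

weights = {"mean consumption error total": 6,
           "mean peak coincidence": 3,
           "demand entropy total daily": 5}

# Rank each measure across experiments (1 = best experiment).
ranks = scores.copy()
ranks["mean consumption error total"] = scores["mean consumption error total"].rank()
ranks["demand entropy total daily"] = scores["demand entropy total daily"].rank()
# Higher peak coincidence is better, so rank it descending.
ranks["mean peak coincidence"] = scores["mean peak coincidence"].rank(ascending=False)

# Weighted score: rank times weight, summed over all measures.
total = sum(ranks[m] * w for m, w in weights.items())
best = total.idxmin()   # lowest weighted score wins
```

With these toy numbers the first experiment ranks best on every measure, so it receives the minimum possible weighted score (the sum of the weights).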

4 Clustering Experiments and Results

Various clustering experiments were performed to find a set of clusters that represents the best RDLPs for dataset X. The clustering process was set up as a typical data processing pipeline, using hourly daily load profiles from DELMH as input. Depending on the experiment, different pre-processing steps were performed. These include the selection of pre-binning by average monthly consumption (AMC) or integral k-means, and retaining or dropping zero values. Each of the experiments was run with four different normalisation algorithms, and without normalisation. Algorithms were initialised with different parameter values to generate cluster sets with a range of membership sizes. Details on the algorithms, normalisation and pre-binning are provided in Appendix B.

4.1 Evaluation

Based on the experiment details defined in Table 2 in Appendix B, 2083 individual experiment runs were conducted across all parameters. Each run was first evaluated with traditional quantitative clustering metrics. To ease the quantitative evaluation process and allow for comparison across metrics, Mean Index Adequacy (MIA), the Davies-Bouldin Index (DBI) and Silhouette Index were combined into a Combined Index (CI) score. The top 10 ranked experiment runs based on the CI score are shown in Table 4 in Appendix C. The highest ranked experiments were then further evaluated with the cluster scoring matrix.

4.2 Qualitative Clustering Results

Table 5 in Appendix C summarises the scores and ranking produced by the cluster scoring matrix. The scores span a greater range of values than the CI scores and are grounded in interpretable measures, which makes the results more meaningful and eases the selection of the best experiment. While the top two runs lie only 8 points apart, they comfortably outperform the third best run, which has double the score. The potential of the qualitative evaluation measures is evident when contrasting the quantitative and qualitative results of exp. 5 (kmeans, zero-one) with those of exp. 8 (kmeans, unit norm). Exp. 5 (kmeans, zero-one) had the second best run based on the CI score but was ranked second last in the cluster scoring matrix. Exp. 8 (kmeans, unit norm) on the other hand only ranked ninth by quantitative score, yet convincingly claimed the top position based on qualitative measures.

Comparing the RDLPs in Figure 1 in Appendix C gives confidence in the reranking. Exp. 5 (kmeans, zero-one) has only 18 clusters; on average 2.125 clusters per bin. The five smallest clusters combined have fewer than 1500 member profiles and are barely visible in the bar chart of cluster sizes at the bottom of Figure 1(a). The ragged shapes of clusters 16, 17 and 18 are also an indication that very few profiles were aggregated in these RDLPs. Over half of all load profiles belong to only three clusters: clusters 5, 6 and 9. As a whole, the individual RDLPs lack distinguishing features and are neither expressive nor useful, making them poorly suited for creating customer archetypes.

Exp 8 (kmeans, unit norm) on the other hand has 59 clusters, varying between 2 and 15 clusters per bin. With the exception of cluster 33 which accounts for roughly 15% of all daily load profiles, cluster membership for the remaining clusters varies in a range from 15 000 to 100 000 members. Cluster 33 is one of only two clusters in its bin, which has a large bin membership in line with expectations given our sample population. Collectively, the individual RDLPs are expressive, featured and distinct, which promises that they will be useful for constructing customer archetypes.

5 Discussion and Conclusion

This work formalises competency questions, formulated in consultation with domain experts, as quantifiable, qualitative evaluation measures. The qualitative measures are summarised in a cluster scoring matrix which weights, ranks and compares the measures across clustering experiments. By combining traditional clustering metrics and qualitative evaluation measures, clustering structures with good compactness and distinctness are thus ranked by their usability and expressivity, which guides our selection of a clustering structure that is useful for our intended application of creating customer archetypes in the residential energy sector in South Africa.

The cluster scoring matrix eases the scoring and ranking of experiments, while also making the reliance on expert validation explicit and repeatable. It clearly indicates that, of the top 10 experiments, unit norm normalisation and pre-binning produced the most expressive and usable clusters. While the best experiment was pre-binned with integral kmeans, pre-binning by average monthly consumption produced comparable scores. The difference in scores between the two pre-binning approaches was strongly influenced by the weights assigned to different evaluation measures and by the threshold determining the minimum cluster membership. These are subjective constraints determined by our application context; in a different application, they may be set differently. The cluster scoring matrix could be improved by making it less susceptible to changes in the weights, the threshold and the ranking method. A limitation of the work is that we used well-established clustering techniques and have not tested more recent clustering algorithms or dynamic time warping.

Our work presents a novel application of machine learning in the energy domain in South Africa, with potential for application in other developing country contexts. The approach shows promise for generating clusters that are useful for application in a real-world, long-term energy planning scenario and demonstrates the use of cluster analysis techniques for building real world systems.


This research was funded in part by the South African Centre for Artificial Intelligence Research (CAIR).


  • A. De Nicola, M. Missikoff, and R. Navigli (2009) A software engineering approach to ontology building. Inf. Syst. 34 (2), pp. 258–275. External Links: Document, ISBN 0306-4379, ISSN 03064379 Cited by: §1.
  • I. Dent, T. Craig, U. Aickelin, and T. Rodden (2014) Variability of behaviour in electricity load profile clustering; Who does things at the same time each day?. Lect. Notes Comput. Sci. (including Subser. Lect. Notes Artif. Intell. Lect. Notes Bioinformatics) 8557 LNAI, pp. 70–84. External Links: Document, arXiv:1409.1043v1, ISBN 9783319089751, ISSN 16113349 Cited by: §1.
  • V. Figueiredo, F. Rodrigues, Z. Vale, and J. B. Gouveia (2005) An electric energy consumer characterization framework based on data mining techniques. IEEE Trans. Power Syst. 20 (2), pp. 596–602. External Links: Document, ISBN 0885-8950, ISSN 08858950 Cited by: §1.
  • [4] (2012) Geo-based Load Forecast Standard. Technical Report, June 2012, Eskom, Johannesburg. Cited by: §3.
  • A. Gogolou, T. Tsandilas, T. Palpanas, and A. Bezerianos (2019) Comparing similarity perception in time series visualizations. IEEE Transactions on Visualization and Computer Graphics 25 (1), pp. 523–533. External Links: Document, ISSN Cited by: §1.
  • M. Grüninger and M. S. Fox (1995) The role of competency questions in enterprise engineering. In Benchmarking — Theory and Practice, A. Rolstadås (Ed.), pp. 22–31. External Links: ISBN 978-0-387-34847-6, Document, Link Cited by: §1.
  • S. Heunis and M. Dekenah (2014) Manual for Eskom Distribution Pre- Electrification Tool ( DPET ). Eskom Holdings Limited, Johannesburg. Cited by: §1.
  • L. Jin, D. Lee, A. Sim, S. Borgeson, K. Wu, C. A. Spurlock, and A. Todd (2017) Comparison of Clustering Techniques for Residential Energy Behavior Using Smart Meter Data. AAAI Work. Artif. Intell. Smart Grids Smart Build., pp. 260–266. Cited by: §1, §1, footnote 4.
  • J. Kwac, J. Flora, and R. Rajagopal (2014) Household energy consumption segmentation using hourly data. IEEE Trans. Smart Grid 5 (1), pp. 420–430. External Links: Document, ISBN 1949-3053, ISSN 19493053 Cited by: §1.
  • F. McLoughlin, A. Duffy, and M. Conlon (2015) A clustering approach to domestic electricity load profile characterisation using smart metering data. Appl. Energy 141, pp. 190–199. External Links: Document, ISBN 0306-2619, ISSN 03062619, Link Cited by: §1.
  • S. K. Morley (2016) Alternatives to accuracy and bias metrics based on percentage errors for radiation belt modeling applications. External Links: Document, Link Cited by: §A.1.
  • W. S. Sarle, A. K. Jain, and R. C. Dubes (1990) Algorithms for Clustering Data. Vol. 32. External Links: Document, ISBN 013022278X, ISSN 00401706, Link Cited by: §1.
  • L. G. Swan and V. I. Ugursal (2009) Modeling of end-use energy consumption in the residential sector: A review of modeling techniques. Renew. Sustain. Energy Rev. 13 (8), pp. 1819–1835. External Links: Document, ISBN 1364-0321, ISSN 13640321 Cited by: §1.
  • W. Toussaint (2019) Domestic electrical load metering, hourly data 1994-2014. version 1. DataFirst. External Links: Document, Link Cited by: §2.
  • M. Uschold and M. Gruninger (1996) Ontologies: principles, methods and applications. Knowledge Engineering Review 11, pp. 93–136. Cited by: §1.
  • S. Xu, E. Barbour, and M. C. González (2017) Household Segmentation by Load Shape and Daily Consumption. Proc. of. ACM SigKDD 2017 Conf., pp. 1–9. External Links: Document, ISBN 1234567245, Link Cited by: §B.3.2, §1.

Appendix A Qualitative Evaluation Measures

We use c to denote a single cluster in clustering structure C. The score of a qualitative measure for cluster set C is the mean of the scores of all clusters with more than 10 490 members. Clusters with a small membership were excluded when calculating mean measures, as they tend to overestimate the performance of poor clusters. Individual cluster performance is weighted by cluster size to account for the overall effect that a particular cluster has on the set.

A.1 Mean Consumption Error

The total daily demand and peak daily demand for an actual daily load profile x and a predicted cluster representative daily load profile x̂ are given by the equations below:

P_total(x) = Σ_{t=1}^{24} x_t  and  P̂_total(x̂) = Σ_{t=1}^{24} x̂_t   (4)

P_peak(x) = max_t x_t  and  P̂_peak(x̂) = max_t x̂_t   (5)

Four mean error metrics are calculated to characterise the extent of deviation between the total and peak demand of a cluster and those of its member profiles. Mean absolute percentage error (MAPE) and median absolute percentage error (MdAPE) are well-known error metrics. The median log accuracy ratio (MdLQ) overcomes some of the drawbacks of the absolute percentage errors (Morley, 2016), as the log transformation tends to induce symmetry in positively skewed distributions, thus reducing bias. Interpreting MdLQ is not intuitive, a problem overcome by the median symmetric accuracy (MdSymA), which can be interpreted as a percentage error, similar to MAPE. Peak and total consumption errors can be calculated using the same formulae and are equivalent to the corresponding demand errors.

The consumption error measures are calculated over all daily load profiles x assigned to cluster c, with accuracy ratio Q = P̂(x̂)/P(x):

Absolute Percentage Error:   APE = 100 · |(P̂(x̂) − P(x)) / P(x)|
Median Log Accuracy Ratio:   MdLQ = median(ln Q)
Median Symmetric Accuracy:   MdSymA = 100 · (exp(median |ln Q|) − 1)
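A minimal numpy sketch of the four error measures, following the definitions in Morley (2016) and assuming paired positive demand values:

```python
import numpy as np

def error_measures(actual, predicted):
    """MAPE, MdAPE, MdLQ and MdSymA for paired positive demand values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ape = np.abs((predicted - actual) / actual)   # absolute % errors
    log_q = np.log(predicted / actual)            # log accuracy ratio
    return {
        "MAPE": 100 * np.mean(ape),
        "MdAPE": 100 * np.median(ape),
        "MdLQ": np.median(log_q),
        "MdSymA": 100 * (np.exp(np.median(np.abs(log_q))) - 1),
    }

# Toy comparison of member-profile demands against their RDLP's demand.
m = error_measures([10.0, 20.0, 40.0], [12.0, 18.0, 44.0])
```

The log transformation makes over- and under-prediction symmetric (a factor of 1.2 and a factor of 1/1.2 contribute equal magnitudes), which is the bias-reduction property the text refers to.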

A.2 Mean Peak Coincidence Ratio

For each daily load profile, the peaks are identified as all those values that are greater than half the maximum daily load profile value. The python package peakutils was used to extract the peak values and peak times for all daily load profiles and all representative daily load profiles.

The mean peak coincidence (denoted MPC) of a cluster c was calculated from the intersection of the actual and cluster peak times for all daily load profiles x assigned to c:

MPC(c) = (1/|c|) Σ_{x ∈ c} |peaktimes(x) ∩ peaktimes(x̂_c)|

The mean peak coincidence ratio for a single cluster is then a value between 0 and 1 that represents the ratio of the mean peak coincidence to the count of peaks in cluster c. The magnitude of the peak is not taken into account in calculating the mean peak coincidence ratio.
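The paper extracts peaks with peakutils; the sketch below is a dependency-free numpy stand-in using the same half-of-maximum threshold (boundary hours 0 and 23 are ignored for simplicity, which is an assumption of this sketch):

```python
import numpy as np

def peak_hours(profile, thres=0.5):
    """Hours whose value exceeds `thres` times the daily maximum and is
    a local maximum (a simplified stand-in for peakutils.indexes)."""
    p = np.asarray(profile, dtype=float)
    cutoff = thres * p.max()
    hours = [t for t in range(1, len(p) - 1)
             if p[t] > cutoff and p[t] >= p[t - 1] and p[t] >= p[t + 1]]
    return set(hours)

def peak_coincidence(actual_profile, rdlp):
    """Number of peak hours shared by a member profile and its RDLP."""
    return len(peak_hours(actual_profile) & peak_hours(rdlp))
```

Averaging `peak_coincidence` over all member profiles of a cluster and dividing by the RDLP's peak count yields the mean peak coincidence ratio described above.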
A.3 Entropy as a Measure of Cluster Specificity

Entropy H is used to quantify the specificity of clusters and is calculated as follows:

H(c) = − Σ_i p_i log p_i

Here the v_i are the values of a feature V, and p_i is the probability that daily load profiles with value v_i for feature V are assigned to cluster c. For example, H_weekday(c) expresses the specificity of a cluster with regard to the day of the week, with V = {Monday, …, Sunday} and i = 1, …, 7, where p_7 is the likelihood that daily load profiles observed on a Sunday are assigned to cluster c.

To calculate peak and total daily demand entropy, we created percentile demand bins. The values of feature V are then the demand percentiles, and p_i is the likelihood that daily load profiles with peak demand corresponding to, say, the 60th peak demand percentile are assigned to cluster c.
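A sketch of the entropy calculation for one cluster (log base 2 is used here for illustration; any base preserves the ranking of clusters):

```python
import numpy as np

def cluster_entropy(feature_values):
    """Entropy of a feature (e.g. weekday) over a cluster's members:
    H = -sum_i p_i * log2(p_i); lower entropy = more specific cluster."""
    values, counts = np.unique(feature_values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

# A cluster whose members are mostly Sunday profiles is more specific
# (lower entropy) than one spread evenly across the week.
specific = cluster_entropy(["Sun"] * 90 + ["Sat"] * 10)
diffuse = cluster_entropy(["Mon", "Tue", "Wed", "Thu",
                           "Fri", "Sat", "Sun"] * 10)
```

A perfectly uniform spread over the seven weekdays gives the maximum weekday entropy of log2(7) ≈ 2.81, while a single-weekday cluster has entropy 0.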

Appendix B Clustering Experiments

We implemented our experiments in python 3.6.5, using k-means algorithms from the scikit-learn (0.19.1) library and self-organising maps from the SOMOCLU (1.7.5) library. (The codebase is available online.)

Table 2 summarises the algorithms, parameters and pre-processing steps for each experiment, with 'True' in the Zeros column indicating that zero consumption values were retained in the input dataset.

Exp. Algorithm Parameters Pre-bin Zeros
1 kmeans True
2 kmeans True
SOM True
SOM+kmeans True
3 kmeans False
SOM False
SOM+kmeans False
4 kmeans AMC True
SOM+kmeans AMC True
5 kmeans AMC True
SOM+kmeans AMC True
6 kmeans AMC False
7 kmeans integral kmeans True
8 kmeans integral kmeans False
Table 2: Experiment details

B.1 Clustering Algorithms

An experiment run takes input array X to produce cluster set C and predicts a cluster c for each normalised daily load profile of household h observed on day d. Variations of kmeans, self-organising maps (SOM) and a combination of the two algorithms were implemented to cluster X. The kmeans algorithm was initialised with a range of values for the cluster count k. The SOM algorithm was initialised as a square map with dimensions n × n. Combining SOM and kmeans first creates an n × n map, which acts as a form of dimensionality reduction on X. For each n, kmeans then clusters the map into k clusters. The mapping only makes sense if n² is greater than k. k and n are the algorithm parameters.
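The two-stage SOM+kmeans variant can be sketched as below. The paper trains the map with SOMOCLU; for a dependency-free sketch, the trained SOM codebook is stood in for here by a large set of k-means centroids, since the principle is the same: reduce X to n × n map vectors, then cluster those into k < n² clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
X_norm = rng.random((1000, 24))   # toy normalised daily load profiles

n = 10   # SOM dimension: an n x n map
k = 20   # final number of clusters; requires k < n * n

# Stage 1: reduce X to n*n "map" vectors (a SOM codebook in the paper;
# approximated here by k-means centroids).
codebook = KMeans(n_clusters=n * n, n_init=5, random_state=0).fit(X_norm)
map_vectors = codebook.cluster_centers_   # shape: (n*n, 24)

# Stage 2: cluster the map vectors into k clusters.
stage2 = KMeans(n_clusters=k, n_init=10, random_state=0).fit(map_vectors)

# Each profile's final cluster: map it to its nearest map vector, then
# to that vector's stage-2 cluster.
final_labels = stage2.labels_[codebook.labels_]
```

The label composition in the last line is what makes the two-stage approach cheap: stage 2 operates on only n² vectors rather than the full profile array.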

B.2 Normalisation

The table below lists the normalisation techniques applied.

Normalisation  Comments
Unit norm      Scales input vectors individually to unit norm.
De-minning     Subtracts the daily minimum demand from each hourly value, then divides each value by the de-minned daily total (proposed by Jin et al. (2017)).
Zero-one       Scales all values to the range [0, 1]; retains profile shape but is very sensitive to outliers (also known as the min-max scaler).
SA norm        Normalises all input vectors to a mean of 1; retains profile shape but is very sensitive to outliers (introduced as a comparative measure, as it is frequently used by South African domain experts).
Table 3: Data normalisation algorithms and descriptions
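The four normalisation algorithms in Table 3 can be sketched as follows; the formulas are reconstructed from the descriptions, and 'SA norm' is taken to mean scaling each profile so that its mean is 1 (an assumption of this sketch):

```python
import numpy as np

def unit_norm(x):
    """Scale the vector to unit (Euclidean) norm."""
    return x / np.linalg.norm(x)

def de_min(x):
    """Subtract the daily minimum, then divide by the de-minned total."""
    shifted = x - x.min()
    return shifted / shifted.sum()

def zero_one(x):
    """Min-max scaling to [0, 1]."""
    return (x - x.min()) / (x.max() - x.min())

def sa_norm(x):
    """Scale the profile to a mean of 1."""
    return x / x.mean()

x = np.array([1.0, 2.0, 3.0, 4.0])   # toy daily load profile
```

Note that de-minning and zero-one are undefined for constant profiles (zero de-minned total or zero range), which is one reason the treatment of zero-consumption profiles matters in the experiments.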

B.3 Pre-binning

B.3.1 Pre-binning by average monthly consumption (AMC)

To pre-bin by average monthly consumption, we selected 8 expert-approved bin ranges based on South African electricity tariff ranges. The average monthly consumption AMC_h for household h over one year is the household's total annual consumption divided by 12:

AMC_h = (1/12) Σ_d Σ_{t=1}^{24} x_{h,d}(t)

All the daily load profiles X_h of household h were assigned to one of the 8 consumption bins based on the value of AMC_h. Individual household identifiers were removed from X after pre-binning.
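Bin assignment by AMC can be sketched with numpy; the eight bin edges below are illustrative placeholders, not the expert-approved tariff ranges from the paper:

```python
import numpy as np

# Illustrative monthly-consumption bin edges (NOT the expert-approved
# South African tariff ranges used in the paper).
bin_edges = [0, 50, 150, 400, 600, 1200, 2500, 4000]

def amc_bin(amc):
    """Index (1-8) of the consumption bin containing a household's AMC."""
    return int(np.digitize(amc, bin_edges))

households = {"H1": 30.0, "H2": 500.0, "H3": 3000.0}
bins = {h: amc_bin(v) for h, v in households.items()}
```

Every daily load profile of a household inherits that household's bin, after which the household identifier can be dropped as described above.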

B.3.2 Pre-binning by integral k-means

Pre-binning by integral k-means is a data-driven approach that draws on the work of Xu et al. (2017). For the simple case where x̄ represents the hourly values of a unit-normalised daily load profile, pre-binning by integral k-means followed these steps:

  1. Construct a new sequence s from the cumulative sum of the profile x̄, normalised with unit norm.

  2. Append max(x̄) to s – this ensures that both peak demand and relative demand increase are taken into consideration.

  3. Gather all features in array S and remove individual household identifiers.

  4. Use the kmeans algorithm to cluster S into 8 bins, corresponding to the number of bins created for AMC pre-binning.
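The four steps above can be sketched as follows (variable names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

def integral_features(profile):
    """Steps 1 and 2: cumulative sum of the unit-normalised profile,
    with the normalised peak value appended."""
    x = np.asarray(profile, dtype=float)
    x_norm = x / np.linalg.norm(x)
    return np.append(np.cumsum(x_norm), x_norm.max())

rng = np.random.default_rng(3)
profiles = rng.random((200, 24))   # toy daily load profiles

# Step 3: gather features (household identifiers already dropped here).
S = np.array([integral_features(p) for p in profiles])   # (200, 25)

# Step 4: cluster the feature array into 8 bins, matching AMC pre-binning.
bin_labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(S)
```

Clustering the cumulative-sum curve rather than the raw profile groups households by how their consumption accumulates over the day, which is what makes this pre-binning sensitive to load shape as well as level.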

Appendix C Cluster Evaluation

C.1 CI Score and Quantitative Results

To ease the quantitative evaluation process and allow for comparison across metrics, the Mean Index Adequacy (MIA), Davies-Bouldin Index (DBI) and Silhouette Index were combined into a Combined Index (CI) score. An interim score c computes the product of the DBI, the MIA and the inverse Silhouette Index. The CI is the log of the weighted sum of c across all experiment bins. A lower CI is desirable and an indication of a better clustering structure. The logarithmic relationship between c and the CI means that the CI is negative when the weighted sum of c is between 0 and 1, 0 when it equals 1, and greater than 0 otherwise. For experiments with pre-binning, the experiment with the lowest score in each bin is selected, as it represents the best clustering structure for that bin. For experiments without pre-binning, the dataset forms a single bin and CI = log c. Table 4 shows the top ten experiments based on CI score.
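A sketch of the CI computation under this description; the bin weights are assumed here to be proportional to bin membership, which is an assumption rather than a detail stated in the text:

```python
import numpy as np

def combined_index(bin_metrics, bin_sizes):
    """CI = log of the size-weighted sum of DBI * MIA / Silhouette.

    bin_metrics: list of (dbi, mia, silhouette) tuples, one per bin;
    bin_sizes:   number of profiles per bin (weights are assumed to be
                 proportional to membership).
    """
    c = np.array([dbi * mia / sil for dbi, mia, sil in bin_metrics])
    w = np.asarray(bin_sizes, dtype=float)
    w = w / w.sum()
    return float(np.log(np.sum(w * c)))

# Single-bin case (no pre-binning): CI is just log(DBI * MIA / Sil).
ci = combined_index([(2.111, 0.476, 0.128)], [1])
```

Because DBI and MIA reward low values while silhouette rewards high values, inverting the silhouette makes all three factors point the same way, so a smaller interim product consistently indicates a better bin.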

# CI DBI MIA Sil. Exp. Alg. m Norm.
1 2.282 2.125 0.438 0.095 2 kmeans 47 unit
2 2.289 1.616 1.220 0.262 5 kmeans 17 zero-one
3 2.296 1.616 1.220 0.260 4 kmeans 17 zero-one
4 2.301 2.152 0.485 0.119 6 kmeans 82 unit
5 2.316 2.115 0.447 0.093 2 kmeans 35 unit
6 2.320 2.199 0.486 0.121 5 kmeans 71 unit
7 2.349 2.152 0.481 0.143 7 kmeans 49 unit
8 2.351 2.189 0.434 0.090 2 kmeans 50 unit
9 2.354 2.111 0.476 0.128 8 kmeans 59 unit
10 2.355 2.173 0.453 0.093 2 kmeans 32 unit
Table 4: Top 10 runs ranked by CI score

C.2 Experiments Ranked by Qualitative Score

# Score Exp. Norm. Pre-binning Zeros
1 57.0 8 unit integral kmeans False
2 65.0 5 unit AMC True
3 117.5 6 unit AMC False
4 143.5 7 unit integral kmeans True
5 150.0 2 unit True
6 205.0 5 zero-one AMC True
7 208.0 4 zero-one AMC True
Table 5: Top runs ranked by qualitative scores

C.3 Comparison of Two Clustering Experiments


(a) RDLPs of exp. 5 (kmeans, zero-one)


(b) RDLPs of exp. 8 (kmeans, unit norm)
Figure 1: Comparison of RDLPs of clustering experiments