1 Introduction
Clustering is one of the most fundamental tasks in data mining and machine learning. It aims to divide a set of objects into homogeneous groups by maximizing the similarity between objects in the same group and minimizing the similarity between objects in different groups Aggarwal and Reddy (2014). As an active research topic, new approaches are constantly being proposed, because the usage and interpretation of clustering depend on each particular application. Clustering is currently applied in a variety of fields, such as computer vision
Yang et al. (2019), communications Tam et al. (2019), biology Wang et al. (2020), and commerce Song et al. (2020). Based on the properties of the clusters generated, clustering techniques can be classified into partitional clustering and hierarchical clustering
Han et al. (2011). Partitional clustering conducts one-level partitioning on datasets. In contrast, hierarchical clustering conducts multi-level partitioning on datasets, in an agglomerative or divisive way. Model-based clustering is a classical and powerful approach for partitional clustering. It attempts to optimize the fit between the observed data and some mathematical model using a probabilistic approach, under the assumption that the data are generated by a mixture of underlying probability distributions. Many mixture models can be adopted to represent the data, among which the Gaussian mixture model (GMM) is by far the most commonly used representation
Melnykov and Maitra (2010). As a model-based clustering approach, the GMM provides a principled statistical answer to the practical issues that arise in clustering, e.g., how many clusters there are. Besides, its statistical properties also make it suitable for inference Fraley and Raftery (2002). The GMM has shown promising results in many clustering applications, ranging from image registration Ma et al. (2017) and topic modeling Costa and Ortale (2019) to traffic prediction Jia et al. (2019); Li et al. (2020). However, the GMM is limited to probabilistic (or fuzzy) partitions of datasets: it does not allow ambiguity or imprecision in the assignment of objects to clusters. Actually, in many applications, it is more reasonable to assign the objects in overlapping regions to a set of clusters rather than to some single cluster. Recently, the notion of evidential partition Denœux and Masson (2004); Masson and Denœux (2008) was introduced based on the theory of belief functions Dempster (1967); Shafer (1976); Denœux (2016); Denœux et al. (2020). As a general extension of the probabilistic (or fuzzy), possibilistic, and rough partitions, it allows an object not only to belong to single clusters, but also to belong to any subset of the frame of discernment that describes the possible clusters Denœux and Kanjanatarakul (2016). Therefore, the evidential partition provides more refined partitioning results than the other ones, which makes it very appealing for solving complex data clustering problems. Up to now, different evidential clustering algorithms have been proposed to build an evidential partition for object datasets. Most of these algorithms fall into the category of prototype-based clustering, including evidential c-means (ECM) Masson and Denœux (2008) and its variants such as constrained ECM (CECM) Antoine et al. (2012) and median ECM (MECM) Zhou et al. (2015), among others. Besides, in Denœux et al. (2015), a decision-directed clustering procedure, called EK-NNclus, was developed based on the evidential K-nearest-neighbor rule, and in Su and Denœux (2019), a belief-peaks evidential clustering (BPEC) algorithm was developed by fast search and find of density peaks. Although the above-mentioned algorithms can generate powerful evidential partitions, they are purely descriptive and unsuitable for statistical inference. A recent model-based evidential clustering algorithm was proposed in Denœux (2020) by bootstrapping Gaussian mixture models (called bootGMM). This algorithm builds calibrated evidential partitions in an approximate way, but the high computational complexity of the bootstrapping and calibration procedures limits its application to large datasets.
In this paper, we propose a new model-based evidential clustering algorithm, called EGMM (evidential GMM), by extending the classical GMM in the belief function framework directly. Unlike the GMM, the EGMM associates a distribution not only with each single cluster, but also with sets of clusters. Specifically, with a mass function representing the cluster membership of each object, an evidential Gaussian mixture distribution composed of components over the powerset of the desired clusters is proposed to model the entire dataset. After that, the maximum likelihood solution of the EGMM is derived via a specially designed Expectation-Maximization (EM) algorithm. With the estimated parameters, the clustering is performed by calculating the $n$-tuple of evidential memberships, which provides an evidential partition for the considered objects. Besides, in order to determine the number of clusters automatically, an evidential Bayesian inference criterion (EBIC) is also presented as the validity index. The proposed EGMM is as convenient as the classical GMM: it has no open parameters and does not require the number of clusters to be fixed in advance. More importantly, the proposed EGMM generates an evidential partition, which is more informative than a probabilistic partition.
The rest of this paper is organized as follows. Section 2 recalls the necessary preliminaries about the theory of belief functions and the Gaussian mixture model from which the proposal is derived. Our proposed EGMM is then presented in Section 3. In Section 4, we conduct experiments to evaluate the performance of the proposal using both synthetic and real-world datasets. Finally, Section 5 concludes the paper.
2 Preliminaries
We first briefly introduce necessary concepts about belief function theory in Section 2.1. The Gaussian mixture model for clustering is then recalled in Section 2.2.
2.1 Basics of the Belief Function Theory
The theory of belief functions Dempster (1967); Shafer (1976), also known as Dempster-Shafer theory or evidence theory, is a generalization of probability theory. It offers a well-founded and workable framework to model a large variety of uncertain information. In belief function theory, a problem domain is represented by a finite set $\Omega$ called the frame of discernment. A mass function expressing the belief committed to the elements of $2^\Omega$ by a given source of evidence is a mapping $m: 2^\Omega \to [0, 1]$, such that

(1) $\sum_{A \subseteq \Omega} m(A) = 1$.

Subsets $A \subseteq \Omega$ such that $m(A) > 0$ are called the focal sets of the mass function $m$. The mass function has several special cases, which represent different types of information. A mass function is said to be

Bayesian
, if all of its focal sets are singletons. In this case, the mass function reduces to a probability distribution;

Certain, if the whole mass is allocated to a unique singleton. This corresponds to a situation of complete knowledge;

Vacuous, if the whole mass is allocated to $\Omega$. This situation corresponds to complete ignorance.
Shafer Shafer (1976) also defined the belief and plausibility functions as follows:

(2) $\mathrm{Bel}(A) = \sum_{\emptyset \neq B \subseteq A} m(B), \quad \mathrm{Pl}(A) = \sum_{B \cap A \neq \emptyset} m(B), \quad \forall A \subseteq \Omega$.

$\mathrm{Bel}(A)$ represents the exact support to $A$ and its subsets, and $\mathrm{Pl}(A)$ represents the total possible support to $A$. The interval $[\mathrm{Bel}(A), \mathrm{Pl}(A)]$ can be seen as the lower and upper bounds of support to $A$. The functions $m$, $\mathrm{Bel}$, and $\mathrm{Pl}$ are in one-to-one correspondence.
For decision-making support, Smets Smets (2005) proposed the pignistic probability $\mathrm{BetP}$ to approximate the unknown probability distribution as follows:

(3) $\mathrm{BetP}(\omega) = \sum_{\omega \in A \subseteq \Omega} \frac{m(A)}{|A|}, \quad \forall \omega \in \Omega$,

where $|A|$ is the cardinality of set $A$.
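The definitions above can be sketched in a few lines of Python (an illustrative sketch, not the authors' code; focal sets are represented as frozensets, and the frame and mass values are hypothetical):

```python
# Sketch of Bel, Pl (Eq. (2)) and the pignistic transform BetP (Eq. (3))
# for a mass function over the frame Omega = {a, b, c}.

def bel(m, A):
    # Bel(A): total mass of non-empty subsets of A
    return sum(v for B, v in m.items() if B and B <= A)

def pl(m, A):
    # Pl(A): total mass of focal sets intersecting A
    return sum(v for B, v in m.items() if B & A)

def betp(m, omega):
    # BetP(x): each focal set A shares its mass equally among its elements
    return {x: sum(v / len(B) for B, v in m.items() if x in B) for x in omega}

omega = frozenset({"a", "b", "c"})
m = {frozenset({"a"}): 0.5, frozenset({"a", "b"}): 0.3, omega: 0.2}
A = frozenset({"a", "b"})
print(bel(m, A), pl(m, A))   # Bel(A) <= Pl(A) always holds
print(betp(m, omega))        # a probability distribution summing to 1
```

Note that $\mathrm{BetP}$ sums to one over the singletons, which is what makes it usable for decision making.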
2.2 Gaussian Mixture Model for Clustering
Suppose we have a set of objects $X = \{x_1, \ldots, x_n\}$ consisting of $n$ observations of a $d$-dimensional random variable $x$. The random variable $x$ is assumed to be distributed according to a mixture of $c$ components (i.e., clusters), with each one represented by a parametric distribution. Then, the entire dataset can be modeled by the following mixture distribution:

(4) $p(x) = \sum_{k=1}^{c} \pi_k\, p(x \mid \theta_k)$,

where $\theta_k$ is the set of parameters specifying the $k$-th component, and $\pi_k$ is the probability that an observation belongs to the $k$-th component ($0 \leq \pi_k \leq 1$, and $\sum_{k=1}^{c} \pi_k = 1$).
The most commonly used mixture model is the Gaussian mixture model (GMM) Aggarwal and Reddy (2014); Melnykov and Maitra (2010); McLachlan et al. (2019)
, where each component is represented by a parametric Gaussian distribution as

(5) $p(x \mid \theta_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k) = \frac{1}{(2\pi)^{d/2} |\Sigma_k|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu_k)^T \Sigma_k^{-1} (x - \mu_k)\right)$,

where $\mu_k$ is a $d$-dimensional mean vector, $\Sigma_k$ is a $d \times d$ covariance matrix, and $|\Sigma_k|$ denotes the determinant of $\Sigma_k$. The basic goal of clustering using the GMM is to estimate the unknown parameter $\Theta = \{\pi_k, \mu_k, \Sigma_k\}_{k=1}^{c}$ from the set of observations $X$. This can be done using maximum likelihood estimation (MLE), with the log-likelihood function given by
(6) $\log p(X \mid \Theta) = \sum_{i=1}^{n} \log \left( \sum_{k=1}^{c} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k) \right)$.

The above MLE problem is well solved by the Expectation-Maximization (EM) algorithm Dempster et al. (1977), with solutions given by

(7) $\pi_k = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ik}$,

(8) $\mu_k = \frac{\sum_{i=1}^{n} \gamma_{ik}\, x_i}{\sum_{i=1}^{n} \gamma_{ik}}$,

(9) $\Sigma_k = \frac{\sum_{i=1}^{n} \gamma_{ik}\, (x_i - \mu_k)(x_i - \mu_k)^T}{\sum_{i=1}^{n} \gamma_{ik}}$,

where $\gamma_{ik}$ is the posterior probability that object $x_i$ belongs to the $k$-th component, given the current parameter estimates, computed as

(10) $\gamma_{ik} = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{c} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$.
With initialized parameters $\pi_k$, $\mu_k$, and $\Sigma_k$, the posterior probabilities and the parameters are updated alternately until the change in the log-likelihood becomes smaller than some threshold. Finally, the clustering is performed by calculating the posterior probabilities with the estimated parameters using Eq. (10).
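The alternation between Eqs. (7)-(10) can be sketched in Python on synthetic data (an illustrative implementation, not the paper's Matlab code; the two-component data, the crude initialization from two data points, and the fixed iteration count are assumptions, and no safeguard against singular covariances is included):

```python
import numpy as np

rng = np.random.default_rng(0)
# two well-separated Gaussian clusters in 2-D (hypothetical data)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
n, d, K = X.shape[0], X.shape[1], 2

pi = np.full(K, 1.0 / K)        # mixing probabilities
mu = np.array([X[0], X[-1]])    # crude init: one point from each end
Sigma = np.stack([np.eye(d)] * K)

def gauss(X, mu_k, Sigma_k):
    # multivariate normal density N(x | mu_k, Sigma_k), Eq. (5)
    diff = X - mu_k
    expo = -0.5 * np.einsum("ni,ij,nj->n", diff, np.linalg.inv(Sigma_k), diff)
    return np.exp(expo) / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma_k))

for _ in range(100):
    # E-step: posterior responsibilities, Eq. (10)
    dens = np.stack([pi[k] * gauss(X, mu[k], Sigma[k]) for k in range(K)], axis=1)
    gamma = dens / dens.sum(axis=1, keepdims=True)
    # M-step: update pi, mu, Sigma, Eqs. (7)-(9)
    Nk = gamma.sum(axis=0)
    pi = Nk / n
    mu = (gamma.T @ X) / Nk[:, None]
    Sigma = np.stack([(gamma[:, k, None] * (X - mu[k])).T @ (X - mu[k]) / Nk[k]
                      for k in range(K)])

print(np.sort(mu[:, 0]))  # the fitted means approach the generating centers
```

A production implementation would instead monitor the log-likelihood of Eq. (6) and stop once its change falls below a threshold.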
3 EGMM: Evidential Gaussian Mixture Model for Clustering
Considering the advantages of belief function theory for representing uncertain information, we extend the classical GMM in the belief function framework and develop an evidential Gaussian mixture model (EGMM) for clustering. In Section 3.1, the evidential membership is first introduced to represent the cluster membership of each object. Based on this representation, Section 3.2 describes how the EGMM is derived in detail. Then, the parameters of the EGMM are estimated by a specially designed EM algorithm in Section 3.3. The whole algorithm is summarized and analyzed in Section 3.4. Finally, the determination of the number of clusters is further studied in Section 3.5.
3.1 Evidential Membership
Suppose the desired number of clusters is $c$ ($c \geq 2$). The purpose of EGMM clustering is to assign to the $n$ objects in dataset $X$ soft labels represented by an $n$-tuple evidential membership structure

(11) $M = (m_1, \ldots, m_n)$,

where $m_i$, $i = 1, \ldots, n$, are mass functions defined on the frame of discernment $\Omega = \{\omega_1, \ldots, \omega_c\}$.
The above evidential membership, modeled by the mass function $m_i$, provides a general representation of the cluster membership of object $x_i$:
Example 1
Let us consider a set of four objects with evidential membership regarding a set of classes $\Omega = \{\omega_1, \omega_2, \omega_3\}$. The mass functions $m_1, \ldots, m_4$ for the four objects are given in Table 1. They illustrate various situations: the case of object $x_1$ corresponds to a situation of probabilistic uncertainty ($m_1$ is Bayesian); the class of object $x_2$ is known with precision and certainty ($m_2$ is certain), whereas the class of object $x_3$ is completely unknown ($m_3$ is vacuous); finally, the mass function $m_4$ models the general situation where the class of object $x_4$ is both imprecise and uncertain.
Focal set     m_1   m_2   m_3   m_4
{ω_1}         0.2   0     0     0
{ω_2}         0.3   0     0     0.1
{ω_1, ω_2}    0     0     0     0
{ω_3}         0.5   1     0     0.2
{ω_1, ω_3}    0     0     0     0
{ω_2, ω_3}    0     0     0     0.4
Ω             0     0     1     0.3
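The special cases illustrated in Table 1 can be checked mechanically (a small sketch under the assumption that mass functions are dictionaries from frozenset focal sets to masses; the cluster labels 1, 2, 3 stand for ω_1, ω_2, ω_3):

```python
# Classify a mass function as Bayesian, certain, or vacuous (Section 2.1).
OMEGA = frozenset({1, 2, 3})

def is_bayesian(m):
    # all focal sets are singletons
    return all(len(A) == 1 for A, v in m.items() if v > 0)

def is_certain(m):
    # the whole mass sits on one singleton
    support = [A for A, v in m.items() if v > 0]
    return len(support) == 1 and len(support[0]) == 1

def is_vacuous(m):
    # the whole mass sits on the frame Omega
    return m.get(OMEGA, 0.0) == 1.0

m1 = {frozenset({1}): 0.2, frozenset({2}): 0.3, frozenset({3}): 0.5}
m2 = {frozenset({3}): 1.0}
m3 = {OMEGA: 1.0}
m4 = {frozenset({2}): 0.1, frozenset({3}): 0.2,
      frozenset({2, 3}): 0.4, OMEGA: 0.3}
print(is_bayesian(m1), is_certain(m2), is_vacuous(m3))  # True True True
```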
As illustrated in the above example, the evidential membership is a powerful model for representing the imprecise and uncertain information existing in datasets. In the following, we study how to derive a soft label represented by the evidential membership for each object in dataset $X$, given a desired number of clusters $c$.
3.2 From GMM to EGMM
In the GMM, each component $\omega_k$ in the desired cluster set $\Omega$ is represented by the following cluster-conditional probability density:

(12) $p(x \mid \omega_k) = \mathcal{N}(x \mid \mu_k, \Sigma_k)$,

where $\theta_k = \{\mu_k, \Sigma_k\}$ is the set of parameters specifying the $k$-th component $\omega_k$, $k = 1, \ldots, c$. It means that any object in set $X$ is drawn from one single cluster in $\Omega$.
Unlike the probabilistic membership in the GMM, the evidential membership introduced in the EGMM enables an object to belong to any subset of $\Omega$, including not only the individual clusters but also the meta-clusters composed of several clusters. In order to model each evidential component $A_j$ ($\emptyset \neq A_j \subseteq \Omega$), we construct the following evidential cluster-conditional probability density:

(13) $p(x \mid A_j) = \mathcal{N}(x \mid \bar{\mu}_j, \bar{\Sigma}_j)$,

where $\bar{\theta}_j = \{\bar{\mu}_j, \bar{\Sigma}_j\}$ is the set of parameters specifying the $j$-th evidential component $A_j$, $j = 1, \ldots, 2^c - 1$.
Notice that, as different evidential components may be nested (e.g., $\{\omega_1\} \subset \{\omega_1, \omega_2\}$), the cluster-conditional probability densities are no longer independent. To model this correlation, we propose to associate to each component $A_j$ the mean vector $\bar{\mu}_j$ defined as the average of the mean vectors associated to the clusters composing $A_j$:

(14) $\bar{\mu}_j = \frac{1}{|A_j|} \sum_{k=1}^{c} s_{kj}\, \mu_k$,

where $|A_j|$ denotes the cardinality of $A_j$, and $s_{kj}$ is defined as

(15) $s_{kj} = \mathbb{1}(\omega_k \in A_j)$,

with $\mathbb{1}(\cdot)$ being the indicator function.
As for the covariance matrices, the values for different components can in principle be free. Some researchers have also proposed different assumptions on the component covariance matrices in order to simplify the mixture model Banfield and Raftery (1993). In this paper, we adopt the following constant covariance matrix:

(16) $\bar{\Sigma}_j = \Sigma, \quad j = 1, \ldots, 2^c - 1$,

where $\Sigma$ is an unknown symmetric matrix. This assumption results in clusters that have the same geometry but need not be spherical.
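The constraint of Eq. (14) can be sketched directly (an illustrative sketch: cluster subsets are encoded as tuples of singleton indices, and the example means are hypothetical):

```python
import numpy as np

# Eq. (14): the mean of a meta-cluster A_j is the average of the mean
# vectors of the singleton clusters it contains.
def meta_means(mu_singletons, focal_sets):
    # mu_singletons: (c, d) array of singleton-cluster means
    return {A: np.mean([mu_singletons[k] for k in A], axis=0)
            for A in focal_sets}

mu = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
focal = [(0,), (1,), (2,), (0, 1), (0, 2), (0, 1, 2)]
mm = meta_means(mu, focal)
print(mm[(0, 1)])  # midpoint of clusters 1 and 2: [2. 0.]
```

Singleton components keep their own mean, so only the meta-cluster means are derived quantities.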
In the EGMM, each object is assumed to be distributed according to a mixture of components over the powerset of the desired cluster set, with each one defined as the evidential cluster-conditional probability density in Eq. (13). Formally, the evidential Gaussian mixture distribution can be formulated as

(17) $p(x) = \sum_{j=1}^{2^c - 1} \pi_j\, \mathcal{N}(x \mid \bar{\mu}_j, \Sigma)$,

where $\pi_j$ is called the mixing probability, denoting the prior probability that the object was generated from the $j$-th component. Similar to the GMM, the mixing probabilities must satisfy $0 \leq \pi_j \leq 1$ and $\sum_{j=1}^{2^c - 1} \pi_j = 1$.

Remark 1
The above EGMM is a generalization of the classical GMM in the framework of belief functions. When the evidential membership reduces to the probabilistic membership, all the meta-cluster components are assigned zero prior probability, i.e., $\pi_j = 0$ for all $A_j$ with $|A_j| > 1$. In this case, the evidential Gaussian mixture distribution reduces to the classical Gaussian mixture distribution.
In this formulation of the mixture model, we need to infer a set of parameters from the observations, including the mixing probabilities $\pi_j$ and the parameters of the component distributions. Considering the constraints on the mean vectors and covariance matrices indicated in Eqs. (14) and (16), the overall parameter of the mixture model is $\Theta = \{\pi_1, \ldots, \pi_{2^c-1}, \mu_1, \ldots, \mu_c, \Sigma\}$. If we assume that the objects in set $X$ are drawn independently from the mixture distribution, then we can obtain the observed-data log-likelihood of generating all the objects as

(18) $\log p(X \mid \Theta) = \sum_{i=1}^{n} \log \left( \sum_{j=1}^{2^c - 1} \pi_j\, \mathcal{N}(x_i \mid \bar{\mu}_j, \Sigma) \right)$.
In statistics, maximum likelihood estimation (MLE) is an important approach for parameter estimation. The maximum likelihood estimate of $\Theta$ is defined as

(19) $\hat{\Theta} = \arg\max_{\Theta}\, \log p(X \mid \Theta)$,

which is the best estimate in the sense that it maximizes the probability density of generating all the observations. Different from the usual solutions of the GMM, the MLE of the EGMM is rather complicated, as additional constraints (see Eqs. (14) and (16)) are imposed on the estimated parameters. Next, we derive the maximum likelihood solution for the EGMM via a specially designed EM algorithm.
3.3 Maximum Likelihood Estimation via the EM algorithm
In order to use the EM algorithm to solve the MLE problem for the EGMM in Eq. (19), we artificially introduce a latent variable $z_i$ to denote the component (cluster) label of each object $x_i$, $i = 1, \ldots, n$, in the form of a $(2^c - 1)$-dimensional binary vector $z_i = (z_{i1}, \ldots, z_{i,2^c-1})$, where

(20) $z_{ij} = 1$ if object $x_i$ belongs to component $A_j$, and $z_{ij} = 0$ otherwise.

The latent variables $z_i$ are independent and identically distributed (i.i.d.) according to a multinomial distribution of one draw from $2^c - 1$ components with mixing probabilities $\pi_1, \ldots, \pi_{2^c-1}$. In conjunction with the observed data $x_i$, the complete data are considered to be $(x_i, z_i)$, $i = 1, \ldots, n$. Then, the corresponding complete-data log-likelihood can be formulated as

(21) $\log p(X, Z \mid \Theta) = \sum_{i=1}^{n} \sum_{j=1}^{2^c - 1} z_{ij} \log \left( \pi_j\, \mathcal{N}(x_i \mid \bar{\mu}_j, \Sigma) \right)$.
The EM algorithm approaches the problem of maximizing the observed-data log-likelihood in Eq. (18) by proceeding iteratively with the above complete-data log-likelihood. Each iteration of the algorithm involves two steps, called the expectation step (E-step) and the maximization step (M-step). The derivation of the EM solution for the EGMM is detailed in the Appendix. Only the main equations are given here without proof.
As the complete-data log-likelihood depends explicitly on the unobservable data $Z$, the E-step is performed on the so-called $Q$-function, which is the conditional expectation of the complete-data log-likelihood given $X$, using the current fit for $\Theta$. More specifically, on the $t$-th iteration of the EM algorithm, the E-step computes

(22) $Q(\Theta, \Theta^{(t)}) = \sum_{i=1}^{n} \sum_{j=1}^{2^c - 1} \gamma_{ij} \log \left( \pi_j\, \mathcal{N}(x_i \mid \bar{\mu}_j, \Sigma) \right)$,

where $\gamma_{ij}$ is the evidential membership of the $i$-th object to the $j$-th component, or the responsibility of the hidden variable $z_{ij}$, given the current fit of the parameters $\Theta^{(t)}$. Using Bayes' theorem, we obtain

(23) $\gamma_{ij} = \frac{\pi_j^{(t)}\, \mathcal{N}(x_i \mid \bar{\mu}_j^{(t)}, \Sigma^{(t)})}{\sum_{l=1}^{2^c - 1} \pi_l^{(t)}\, \mathcal{N}(x_i \mid \bar{\mu}_l^{(t)}, \Sigma^{(t)})}$.
In the M-step, we need to maximize the $Q$-function to update the parameters:

(24) $\Theta^{(t+1)} = \arg\max_{\Theta}\, Q(\Theta, \Theta^{(t)})$.
Different from the observed-data log-likelihood in Eq. (18), the logarithm in the $Q$-function in Eq. (22) acts directly on the Gaussian distributions. By keeping the evidential memberships fixed, we can maximize $Q$ with respect to the involved parameters: the mixing probabilities of the components $\pi_j$, the mean vectors of the single clusters $\mu_k$, and the common covariance matrix $\Sigma$. This leads to closed-form solutions for updating these parameters as follows.

The mixing probabilities of the components $\pi_j$, $j = 1, \ldots, 2^c - 1$:

(25) $\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \gamma_{ij}$.
The mean vectors of the single clusters $\mu_k$, $k = 1, \ldots, c$:

(26) $M^{(t+1)} = H^{-1} B$,

where $M$ is a matrix of size ($c \times d$) composed of all the single-cluster mean vectors, i.e., $M = (\mu_1, \ldots, \mu_c)^T$, $B$ is a matrix of size ($c \times d$) defined by

(27) $B_{lq} = \sum_{i=1}^{n} \sum_{j=1}^{2^c-1} \gamma_{ij} \frac{s_{lj}}{|A_j|}\, x_{iq}$,

and $H$ is a matrix of size ($c \times c$) defined by

(28) $H_{lk} = \sum_{i=1}^{n} \sum_{j=1}^{2^c-1} \gamma_{ij} \frac{s_{lj}\, s_{kj}}{|A_j|^2}$.

The common covariance matrix $\Sigma$:

(29) $\Sigma^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{2^c-1} \gamma_{ij}\, (x_i - \bar{\mu}_j^{(t+1)})(x_i - \bar{\mu}_j^{(t+1)})^T$.
This algorithm is started by initializing with guesses about the parameters $\Theta^{(0)}$. Then the two updating steps (i.e., E-step and M-step) alternate until the change in the observed-data log-likelihood falls below some threshold $\epsilon$. The convergence properties of the EM algorithm are discussed in detail in McLachlan and Krishnan (2007). It is proved that each EM cycle increases the observed-data log-likelihood, and the algorithm is guaranteed to converge to a local maximum.
3.4 Summary and Analysis
With the parameters estimated via the above EM algorithm, the clustering is performed by calculating the evidential memberships $\gamma_{ij}$, $i = 1, \ldots, n$, $j = 1, \ldots, 2^c - 1$, using Eq. (23). The computed $n$-tuple of evidential memberships provides an evidential partition of the considered objects. As indicated in Denœux and Kanjanatarakul (2016), the evidential partition provides a complex clustering structure, which can boil down to several alternative clustering structures, including the traditional hard partition, probabilistic (or fuzzy) partition Bezdek (1981); D'Urso and Massari (2019), possibilistic partition Krishnapuram and Keller (1993); Ferone and Maratea (2019), and rough partition Peters (2014, 2015). We summarize the EGMM for clustering in Algorithm 1.
Generality Analysis: The proposed EGMM algorithm provides a general framework for clustering, which boils down to the classical GMM when we constrain all the evidential components to be singletons, i.e., $|A_j| = 1$, $j = 1, \ldots, c$. Compared with the GMM algorithm, the evidential one allocates for each object a mass of belief to any subset of possible clusters, which makes it possible to gain a deeper insight into the data.
Convergence Analysis: As indicated in Park and Ozeki (2009), the EM algorithm for mixture models may take many iterations to reach convergence, and may reach different local maxima when started from different initializations. In order to find a suitable initialization and speed up the convergence of the proposed EGMM algorithm, it is recommended to run the k-means algorithm Jain (2010) and choose the means of the clusters and the average covariance of the clusters for initializing the $\mu_k$ and $\Sigma$, respectively. As for the mixing probabilities $\pi_j$, if no prior information is available, these values can be initialized equally as $\pi_j = 1/(2^c - 1)$.
Complexity Analysis: For each object, the proposed EGMM algorithm distributes a fraction of the unit mass to each non-empty element of $2^\Omega$. Consequently, the number of parameters to be estimated is exponential in the number of clusters $c$ and linear in the number of objects $n$. Considering that, in most cases, the objects assigned to elements of high cardinality are of less interpretability, in practice we can reduce the complexity by constraining the focal sets to be composed of at most two clusters. In this way, the number of parameters to be estimated is drastically reduced from exponential to quadratic in $c$.
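The restriction described above can be sketched as a focal-set generator (an illustrative sketch; the tuple encoding of subsets and the option of keeping the whole frame $\Omega$ as an extra focal set are assumptions of this sketch):

```python
from itertools import combinations

# Enumerate focal sets of at most max_card clusters out of c,
# optionally keeping the whole frame Omega for total ignorance.
def focal_sets(c, max_card=2, include_frame=True):
    sets_ = [s for r in range(1, max_card + 1)
             for s in combinations(range(c), r)]
    frame = tuple(range(c))
    if include_frame and frame not in sets_:
        sets_.append(frame)
    return sets_

print(len(focal_sets(4)))               # 4 singletons + 6 pairs + frame = 11
print(len(focal_sets(4, max_card=4)))   # full powerset minus empty set = 15
```

For c = 4 the restriction already cuts the component count from 15 to 11, and the gap widens quickly as c grows.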
3.5 Determining the Number of Clusters
One important issue arising in clustering is the determination of the number of clusters. This problem is often referred to as cluster validity. Most of the methods for GMM clustering usually start by obtaining a set of partitions for a range of values of $c$ (from $c_{\min}$ to $c_{\max}$) which is assumed to contain the optimal $c$. The number of clusters is then selected according to

(30) $c^{*} = \arg\max_{c}\; \mathcal{C}(\hat{\Theta}_c, c)$,

where $\hat{\Theta}_c$ is the estimated parameter with $c$ clusters, and $\mathcal{C}$ is some validity index. A very common criterion can be expressed in the form Aggarwal and Reddy (2014)

(31) $\mathcal{C}(\hat{\Theta}_c, c) = \log p(X \mid \hat{\Theta}_c) - \mathcal{P}(c)$,

where $\log p(X \mid \hat{\Theta}_c)$ is the maximized mixture log-likelihood when the number of clusters is chosen as $c$, and $\mathcal{P}(c)$ is an increasing function penalizing higher values of $c$.
Many examples of such criteria have been proposed for the GMM, including Bayesian approximation criteria, such as the Laplace-empirical criterion (LEC) McLachlan and Peel (2000) and the Bayesian inference criterion (BIC) Fraley and Raftery (2002), and information-theoretic criteria, such as the minimum description length (MDL) Grünwald (2007), the minimum message length (MML) Yatracos (2015), and Akaike's information criterion (AIC) Charkhi and Claeskens (2018). Among these criteria, the BIC has given good results in a wide range of applications of model-based clustering. For general mixture models, the BIC is defined as

(32) $\mathrm{BIC}(c) = \log p(X \mid \hat{\Theta}_c) - \frac{\nu_c}{2} \log n$,

where $\nu_c$ is the number of independent parameters to be estimated in $\Theta$ when the number of clusters is chosen as $c$.
For our proposed clustering approach, we adopt the above BIC as the validity index to determine the number of clusters. For the EGMM, the mixture log-likelihood is replaced by the evidential Gaussian mixture log-likelihood defined in Eq. (18), and the number of independent parameters is counted in the EGMM parameter $\Theta$. Consequently, the evidential version of the BIC for the EGMM is derived as

(33) $\mathrm{EBIC}(c) = \log p(X \mid \hat{\Theta}_c) - \frac{\nu_c}{2} \log n$,

where $\nu_c$ is the number of independent parameters in $\Theta$: the independent mixing probabilities $\pi_j$ (one fewer than the number of components), the $cd$ independent parameters in the mean vectors $\mu_k$, and the $d(d+1)/2$ independent parameters in the common covariance matrix $\Sigma$. This index has to be maximized to determine the optimal number of clusters.
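The parameter count and the resulting index can be sketched as follows (an illustrative sketch under the parameterization stated above; the log-likelihood value and the dataset size in the example are hypothetical):

```python
import math

# EBIC of Eq. (33): penalized evidential mixture log-likelihood.
def ebic(loglik, n, c, d, n_components):
    # nu: free mixing probabilities + singleton means + shared covariance
    nu = (n_components - 1) + c * d + d * (d + 1) // 2
    return loglik - 0.5 * nu * math.log(n)

# e.g. c = 2 clusters in d = 2 dimensions with focal sets {w1}, {w2}, {w1, w2}
print(ebic(loglik=-350.0, n=200, c=2, d=2, n_components=3))
```

In model selection, this value would be computed for each candidate $c$ and the $c$ with the largest EBIC retained.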
4 Experiments
This section consists of two parts. In Section 4.1, some numerical examples are used to illustrate the behavior of the EGMM algorithm (the Matlab source code can be downloaded from https://github.com/jlm138/EGMM). In Section 4.2, we compare the performance of our proposal with those of related clustering algorithms on several real datasets.
4.1 Illustrative examples
In this subsection, we consider three numerical examples to illustrate the interest of the proposed EGMM algorithm for deriving evidential partitions that better characterize cluster-membership uncertainty.
4.1.1 Diamond dataset
In the first example, we consider the famous Diamond dataset to illustrate the behavior of the EGMM compared with the general GMM Aggarwal and Reddy (2014). This dataset is composed of 11 objects, as shown in Fig. 1. We first calculated the cluster validity indices by running the EGMM algorithm under different numbers of clusters. Table 2 shows the EBIC indices with the desired number of clusters ranging from 2 to 6. It can be seen that the maximum is obtained for $c = 2$ clusters, which is consistent with our intuitive understanding of the partition of this dataset. Figs. 2 and 3 show the clustering results (with $c = 2$) by the GMM and the EGMM, respectively. For the GMM, object 6, which lies at the cluster boundary, is assigned a high probability of belonging to one single cluster. But for our proposed EGMM, object 6 is assigned a high evidential membership to the set of clusters $\{\omega_1, \omega_2\}$, which reveals that this point is ambiguous: it could be assigned either to $\omega_1$ or to $\omega_2$. In addition, the EGMM can find the approximate locations of both cluster centers, whereas the GMM gives a biased estimation of the center location of one cluster. This example demonstrates that the proposed EGMM is more powerful in detecting ambiguous objects, and thus can reveal the underlying structure of the considered data in a more comprehensive way.
Cluster number   2      3      4      5      6
EBIC index      -55.9  -63.1  -70.6  -91.5  -132.3
4.1.2 Twoclass dataset
In the second example, a dataset generated by two Gaussian distributions is considered to demonstrate the superiority of the proposed EGMM compared with the prototype-based ECM Masson and Denœux (2008) and the model-based bootGMM Denœux (2020), which are two representative evidential clustering algorithms developed in the belief function framework. This dataset is composed of two classes of 400 points, generated from Gaussian distributions with the same covariance matrix and different mean vectors. The dataset and the contours of the distributions are represented in Fig. 4 (a). We first calculated the cluster validity indices by running the EGMM algorithm under different numbers of clusters. Table 3 shows the EBIC indices with the desired number of clusters ranging from 2 to 6. It indicates that the number of clusters should be chosen as $c = 2$, which is consistent with the real class distributions. Figs. 4 (b)-(d) show the clustering results (with $c = 2$) by ECM, bootGMM, and EGMM, respectively. It can be seen that the ECM fails to recover the underlying structure of the dataset, because its Euclidean distance-based similarity measure can only discover hyperspherical clusters. The proposed EGMM accurately recovers the two underlying hyperellipsoidal clusters thanks to the adaptive similarity measure derived via MLE. This example demonstrates that the proposed EGMM is more powerful than ECM in distinguishing hyperellipsoidal clusters with arbitrary orientation and shape. As for the bootGMM, it successfully recovers the two underlying hyperellipsoidal clusters by fitting the model based on mixtures of Gaussian distributions with arbitrary geometries. However, it fails to detect the ambiguous objects lying at the cluster boundary via the hard evidential partition, as quite small evidential membership is assigned to the meta-cluster for these objects. By comparison, the proposed EGMM can automatically detect these ambiguous objects thanks to the mixture components constructed over the powerset of the desired clusters.
Cluster number   2        3        4        5        6
EBIC index      -3395.6  -3413.1  -3425.6  -3443.7  -3466.8
4.1.3 Fourclass dataset
In the third example, a more complex dataset is considered to illustrate the interest of the evidential partition obtained by the proposed EGMM. This dataset is composed of four classes of 200 points, generated from Gaussian distributions with the same covariance matrix and different mean vectors. The dataset and the contours of the distributions are represented in Fig. 5 (a). We first calculated the cluster validity indices by running the EGMM algorithm under different numbers of clusters. Table 4 shows the EBIC indices with the desired number of clusters ranging from 2 to 6. Noting that the maximum is obtained for $c = 4$ clusters, the underlying structure of the dataset is correctly discovered. Fig. 5 (b) shows the hard evidential partition result (represented by convex hulls) and the cluster centers (marked by red crosses) with $c = 4$. It can be seen that the four clusters are accurately recovered, and the points that lie at the cluster boundaries are assigned to ambiguous sets of clusters. Apart from the hard evidential partition, it is also possible to characterize each cluster $\omega_k$ by two sets: the set of objects which can be classified in $\omega_k$ without any ambiguity, and the set of objects which could possibly be assigned to $\omega_k$ Masson and Denœux (2008). These two sets, referred to as the lower and upper approximations of $\omega_k$, are obtained from the hard evidential partition: the lower approximation contains the objects whose highest-mass focal set is exactly $\{\omega_k\}$, while the upper approximation contains the objects whose highest-mass focal set includes $\omega_k$. Figs. 5 (c) and (d) show the lower and upper approximations of each cluster, which provide a pessimistic and an optimistic clustering result, respectively. This example demonstrates that the evidential partition generated by the proposed EGMM is quite intuitive and easier to interpret than the numerical probabilities obtained by the GMM, and can provide much richer partition information than the classical hard partition.
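The lower and upper approximations can be sketched from a mass matrix (an illustrative sketch with a hypothetical three-object mass matrix; rows hold the masses an object assigns to the focal sets, here the singletons and their pair):

```python
import numpy as np

# Focal sets: {w1}, {w2}, {w1, w2}; one row of masses per object.
focal = [frozenset({1}), frozenset({2}), frozenset({1, 2})]
M = np.array([[0.9, 0.05, 0.05],   # clearly w1
              [0.1, 0.8,  0.1],    # clearly w2
              [0.2, 0.2,  0.6]])   # ambiguous: between w1 and w2

# Hard evidential partition: each object goes to its highest-mass focal set.
hard = [focal[j] for j in M.argmax(axis=1)]

# Lower approx.: objects assigned exactly to {wk}; upper: focal set contains wk.
lower = {k: [i for i, A in enumerate(hard) if A == frozenset({k})] for k in (1, 2)}
upper = {k: [i for i, A in enumerate(hard) if k in A] for k in (1, 2)}
print(lower)  # {1: [0], 2: [1]}
print(upper)  # {1: [0, 2], 2: [1, 2]}
```

The ambiguous object 2 falls outside both lower approximations but inside both upper approximations, which is exactly the pessimistic/optimistic reading described above.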
Cluster number   2        3        4        5        6
EBIC index      -3648.5  -3666.5  -3635.2  -3655.1  -3679.9
4.2 Real data test
In this subsection, we aim to evaluate the performance of the proposed EGMM on several classical benchmark datasets from the UCI Machine Learning Repository Dua and Karra Taniskidou (2020), whose characteristics are summarized in Table 5. The clustering results obtained with the proposed EGMM are compared with those of the following representative clustering algorithms:

HCM Jain (2010): hard c-means (function kmeans in the MATLAB Statistics toolbox).

FCM Bezdek (1981): fuzzy c-means (function fcm in the MATLAB Fuzzy Logic toolbox).

ECM Masson and Denœux (2008): evidential c-means (function ECM in the MATLAB Evidential Clustering package, available at https://www.hds.utc.fr/~tdenoeux/dokuwiki/en/software).

GMM Aggarwal and Reddy (2014): general Gaussian mixture model without constraints on covariance (function fitgmdist in the MATLAB Statistics toolbox).

GMM (constrained) Aggarwal and Reddy (2014): Gaussian mixture model with constant covariance across clusters (function fitgmdist with ‘SharedCovariance’ = true).

bootGMM Denœux (2020): calibrated model-based evidential clustering by bootstrapping the best-fitting GMM (function bootclus in the R package evclust).
For determining the number of clusters, the validity indices of modified partition coefficient (MPC) Davé (1996) and average normalized specificity (ANS) Masson and Denœux (2008) are used for FCM and ECM, respectively, and the classical BIC Fraley and Raftery (2002) is used for the three modelbased algorithms including GMM, GMM (constrained) and bootGMM.
Datasets  # Instances  # Features  # Classes 

Iris  150  4  3 
Knowledge  403  5  4 
Seeds  210  7  3 
Vehicle  846  18  4 
Wine  178  13  3 
To perform a fair evaluation of the clustering results, hard partitions are adopted for all the considered algorithms. For the three evidential clustering algorithms, hard partitions are obtained by selecting the cluster with maximum pignistic probability for each object. The following three common external criteria are used for evaluation Manning et al. (2008):

Purity: Purity is a simple and transparent evaluation measure. To compute purity, each cluster is assigned to the class which is most frequent in the cluster, and then the accuracy of this assignment is measured by counting the number of correctly assigned objects and dividing by the total number of objects $n$. Formally,

(34) $\mathrm{purity}(C, L) = \frac{1}{n} \sum_{k} \max_{j} |c_k \cap l_j|$,

where $C = \{c_1, c_2, \ldots\}$ is the set of partitioned clusters and $L = \{l_1, l_2, \ldots\}$ is the set of actual classes.

NMI (Normalized Mutual Information): NMI is an information-theoretic evaluation measure, which is defined as

(35) $\mathrm{NMI}(C, L) = \frac{I(C; L)}{[H(C) + H(L)]/2}$,

where $I(\cdot;\cdot)$ and $H(\cdot)$ denote mutual information and entropy, respectively.

ARI (Adjusted Rand Index): ARI is a pair-counting-based evaluation measure, which is defined as

(36) $\mathrm{ARI} = \frac{2\,(TP \cdot TN - FN \cdot FP)}{(TP + FN)(FN + TN) + (TP + FP)(FP + TN)}$,

where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positive, true negative, false positive, and false negative pairs of objects, respectively.
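The three criteria can be sketched as follows (an illustrative sketch: purity follows Eq. (34) directly, while NMI and ARI are delegated to scikit-learn as an implementation choice, not part of the paper; the toy labelings are hypothetical):

```python
import numpy as np
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def purity(labels_true, labels_pred):
    # Eq. (34): assign each predicted cluster to its most frequent true class
    total = 0
    for k in np.unique(labels_pred):
        members = labels_true[labels_pred == k]
        total += np.bincount(members).max()
    return total / len(labels_true)

y_true = np.array([0, 0, 0, 1, 1, 1])
y_pred = np.array([0, 0, 1, 1, 1, 1])
print(purity(y_true, y_pred))                        # 5 of 6 objects correct
print(normalized_mutual_info_score(y_true, y_pred))
print(adjusted_rand_score(y_true, y_pred))
```

Unlike purity, both NMI and ARI are corrected for the number of clusters, which is why they are the ones used when the number of clusters is being estimated.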
For each dataset, two cases of experiment were conducted. In the first case, the number of clusters was unknown and had to be determined based on the affiliated validity indices. For all the algorithms except HCM (which requires the number of clusters to be known in advance), the number of clusters was searched between 2 and 6. All algorithms were run 10 times, and the average estimated number of clusters was calculated for each algorithm. For evaluating the clustering performance, the average NMI and ARI were calculated for each algorithm using the corresponding estimated number of clusters. Note that the purity measure was not used here because it is severely affected by the number of clusters as indicated in Manning et al. (2008) (high purity is easy to achieve when the number of clusters is large). In the second case, the number of clusters was assumed to be known in advance. All algorithms were run 10 times using the correct number of clusters, and the average purity, NMI, and ARI were calculated for each algorithm.
Tables 6–10 show the clustering results of the different algorithms on the five considered datasets. We can see that the proposed EGMM performed best for determining the number of clusters (obtaining the best estimation accuracy on four of the five datasets, the exception being Vehicle), while the performance of the other algorithms was generally unstable. Comparing the quality of the obtained partitions, as measured by purity, NMI, and ARI, the proposed EGMM performed well in both the unknown and known number-of-clusters cases. When the number of clusters was assumed unknown, it obtained the best results for Iris, Knowledge, and Seeds, and the second best results for the other two datasets. When the number of clusters was known in advance, it obtained the best results for Knowledge, Seeds, and Vehicle, and the second best results for the other two datasets. These results show the superiority of the proposed EGMM both in finding the number of clusters and in clustering the data.
Table 6  Clustering results of different algorithms on the Iris dataset (mean ± standard deviation over 10 runs)

Measures  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM
c is unknown:
Estimated c  –  4.1±1.45  2.0±0  2.0±0  4.5±0.85  2.0±0  3.4±0.52
NMI  –  0.70±0.01  0.57±0.05  0.73±0  0.77±0.06  0.73±0  0.87±0.05
ARI  –  0.62±0.02  0.54±0  0.57±0  0.72±0.11  0.57±0  0.85±0.10
c is fixed at 3:
Purity  0.87±0.07  0.89±0  0.89±0  0.89±0.15  0.87±0.14  0.97±0.01  0.93±0.09
NMI  0.74±0  0.75±0  0.76±0  0.84±0.13  0.83±0.11  0.91±0.01  0.87±0.05
ARI  0.70±0.10  0.73±0  0.73±0  0.81±0.21  0.78±0.21  0.92±0.01  0.85±0.10
Table 7  Clustering results of different algorithms on the Knowledge dataset (mean ± standard deviation over 10 runs)

Measures  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM
c is unknown:
Estimated c  –  4.9±1.10  2.0±0  2.5±0.53  2.5±0.85  3.0±0  3.8±0.63
NMI  –  0.29±0.02  0.33±0  0.36±0.12  0.02±0.02  0.39±0  0.43±0.05
ARI  –  0.22±0.02  0.28±0  0.26±0.11  0.01±0  0.29±0  0.31±0.04
c is fixed at 4:
Purity  0.57±0.05  0.51±0.01  0.51±0.01  0.62±0.02  0.38±0.10  0.49±0.01  0.63±0.04
NMI  0.36±0.03  0.29±0.03  0.29±0.02  0.39±0.02  0.10±0.17  0.26±0.01  0.43±0.05
ARI  0.25±0.03  0.23±0.03  0.23±0.03  0.28±0.02  0.07±0.14  0.21±0.01  0.31±0.04
Table 8  Clustering results of different algorithms on the Seeds dataset (mean ± standard deviation over 10 runs)

Measures  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM
c is unknown:
Estimated c  –  4.7±1.06  5.4±0.52  2.0±0  3.1±0.31  4.0±0  3.0±0
NMI  –  0.61±0  0.58±0  0.60±0  0.78±0.07  0.59±0.01  0.80±0
ARI  –  0.52±0  0.52±0  0.51±0  0.81±0.13  0.53±0.03  0.85±0
c is fixed at 3:
Purity  0.89±0  0.90±0  0.89±0  0.84±0.09  0.92±0.09  0.89±0.01  0.95±0
NMI  0.70±0.01  0.69±0  0.66±0  0.68±0.08  0.78±0.07  0.69±0.02  0.80±0
ARI  0.71±0  0.72±0  0.72±0  0.65±0.11  0.81±0.13  0.72±0.02  0.85±0
Table 9  Clustering results of different algorithms on the Vehicle dataset (mean ± standard deviation over 10 runs)

Measures  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM
c is unknown:
Estimated c  –  4.1±1.45  6.0±0  5.1±0.57  4.7±0.95  6.0±0  5.9±0.32
NMI  –  0.18±0  0.12±0  0.19±0.03  0.15±0.04  0.35±0.01  0.24±0
ARI  –  0.12±0  0.14±0  0.13±0.03  0.10±0.03  0.21±0.01  0.13±0
c is fixed at 4:
Purity  0.44±0.01  0.45±0  0.40±0  0.43±0.03  0.41±0.03  0.44±0.01  0.45±0.18
NMI  0.19±0.01  0.18±0  0.13±0  0.17±0.04  0.17±0.03  0.20±0.01  0.21±0.02
ARI  0.12±0  0.12±0  0.12±0  0.12±0.03  0.10±0.02  0.16±0.01  0.14±0.02
Table 10  Clustering results of different algorithms on the Wine dataset (mean ± standard deviation over 10 runs)

Measures  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM
c is unknown:
Estimated c  –  4.1±1.45  6.0±0  2.0±0  5.1±0.88  4.0±0  3.8±0.42
NMI  –  0.36±0  0.35±0.01  0.58±0.07  0.81±0.03  0.95±0  0.82±0.05
ARI  –  0.27±0  0.24±0  0.44±0.06  0.80±0.05  0.97±0  0.83±0.06
c is fixed at 3:
Purity  0.69±0.01  0.69±0  0.69±0  0.87±0.12  0.84±0.13  0.98±0.01  0.85±0.13
NMI  0.42±0.01  0.42±0  0.40±0  0.70±0.19  0.73±0.15  0.95±0.01  0.81±0.14
ARI  0.36±0.02  0.35±0  0.35±0  0.70±0.19  0.69±0.20  0.97±0.01  0.75±0.21
At the end of this section, we evaluate the run time of the considered clustering algorithms. The computations were executed on a Microsoft Surface Book with an Intel(R) Core(TM) i5-6300U CPU @ 2.40 GHz and 8 GB of memory. All algorithms were tested on the MATLAB platform, except bootGMM, which was tested on the R platform. As both MATLAB and R are scripting languages, their execution efficiency is at roughly the same level. Table 11 shows the average run time of the different algorithms on the five considered datasets. It can be seen that the three evidential clustering algorithms (i.e., ECM, bootGMM, EGMM) generally cost more time than the non-evidential ones, mainly because more independent parameters need to be estimated in the evidential partition. Among these three evidential clustering algorithms, the proposed EGMM runs fastest. In particular, it shows an obvious advantage over bootGMM, which spends a great deal of time on the bootstrapping and calibration procedures.
Table 11  Average run time (in seconds) of different algorithms on the five considered datasets

Datasets  HCM  FCM  ECM  GMM  GMM (constrained)  bootGMM  EGMM

Iris  0.004  0.004  0.085  0.012  0.008  7.610  0.029 
Knowledge  0.004  0.014  0.080  0.030  0.031  56.660  0.035 
Seeds  0.003  0.003  0.078  0.013  0.011  18.160  0.045 
Vehicle  0.006  0.022  4.134  0.048  0.034  167.790  2.255 
Wine  0.003  0.005  0.116  0.009  0.006  46.770  0.052 
5 Conclusions
In this paper, a new model-based approach to evidential clustering has been proposed. It is based on the notion of evidential partition, which extends the probabilistic (or fuzzy), possibilistic, and rough ones. Different from the approximately calibrated approach in Denœux (2020), our proposal generates the evidential partition directly by searching for the maximum likelihood solution of the new EGMM via the EM algorithm. In addition, a validity index is presented to determine the number of clusters automatically. The proposed EGMM is convenient to use, as it has no open parameters, and its convergence properties are well guaranteed. More importantly, the generated evidential partition provides a more complete description of the clustering structure than does the probabilistic (or fuzzy) partition provided by the GMM. Examples have shown that more meaningful partitions of the datasets can be obtained. We have also demonstrated the applicability of this approach to several real datasets, showing that the proposed method generally performs better than other prototype-based and model-based algorithms at finding a partition with an unknown or known number of clusters.
As indicated in Banfield and Raftery (1993), different kinds of constraints can be imposed on the covariance matrices of the GMM, resulting in a total of fourteen models with different assumptions on the shape, volume, and orientation of the clusters. In our work, the commonly used model with equal covariance matrices is adopted to develop the evidential clustering algorithm. It would be quite interesting to further study evidential versions of the GMM under other constraints. This research direction will be explored in future work.
Appendix A EM solution for the EGMM: The Estep
In the E-step, we need to derive the Q function by computing the conditional expectation of the complete-data log-likelihood, given the observed data X and using the current fit \Theta^{(s)} for \Theta, i.e.,

Q(\Theta; \Theta^{(s)}) = \mathbb{E}\left[\log L_c(\Theta) \mid X; \Theta^{(s)}\right]. \qquad (37)

By bringing the expression of the complete-data log-likelihood in Eq. (21) into the above formula, we have

Q(\Theta; \Theta^{(s)}) = \sum_{i=1}^{N} \sum_{j=1}^{f} \gamma_{ij}^{(s)} \left[\log \pi_j + \log \varphi(x_i \mid \bar{\mu}_j, \Sigma)\right], \qquad (38)

with

\gamma_{ij}^{(s)} = \frac{\pi_j^{(s)} \varphi(x_i \mid \bar{\mu}_j^{(s)}, \Sigma^{(s)})}{\sum_{l=1}^{f} \pi_l^{(s)} \varphi(x_i \mid \bar{\mu}_l^{(s)}, \Sigma^{(s)})}. \qquad (39)
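For a standard Gaussian mixture with a shared covariance matrix, the E-step responsibilities can be computed stably in log space. The sketch below uses generic mixture notation (mixing proportions, component means, common covariance) rather than the paper's exact EGMM symbols:

```python
import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pis, means, cov):
    """Compute responsibilities gamma_ij for a Gaussian mixture with a
    shared covariance matrix, working in log space for stability."""
    logp = np.stack([
        np.log(pi) + multivariate_normal.logpdf(X, mean=mu, cov=cov)
        for pi, mu in zip(pis, means)
    ], axis=1)                                  # shape (n, f)
    logp -= logp.max(axis=1, keepdims=True)     # stabilize before exp
    gamma = np.exp(logp)
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each row of the returned array sums to 1, giving the posterior probability that the corresponding object was generated by each mixture component.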
Appendix B EM solution for the EGMM: The Mstep
In the M-step, we need to maximize the Q function derived in the E-step with respect to the involved parameters: the mixing probabilities of the