Addressing overfitting in spectral clustering via a non-parametric bootstrap
Finite mixture modelling is a popular method in the field of clustering and is beneficial largely due to its soft cluster membership probabilities. However, the most common algorithm for fitting finite mixture models, the EM algorithm, falls victim to a number of issues. We address these issues that plague clustering using finite mixture models, including convergence to solutions corresponding to local maxima and algorithm speed concerns in high dimensional cases. This is done by developing two novel algorithms that incorporate a spectral decomposition of the data matrix and a non-parametric bootstrap sampling scheme. Simulations show the validity of our algorithms and demonstrate not only their flexibility but also their ability to avoid solutions corresponding to local-maxima, when compared to other (bootstrapped) clustering algorithms for estimating finite mixture models. Our novel algorithms have a typically more consistent convergence criteria as well as a significant increase in speed over other bootstrapped algorithms that fit finite mixture models.
READ FULL TEXT