Cluster analysis and outlier detection with missing data
A mixture of multivariate contaminated normal (MCN) distributions is a useful model-based clustering technique for data sets with mild outliers. However, the model can only be fitted to complete data sets, and data in real applications are often incomplete. In this paper, we develop a framework for fitting a mixture of MCN distributions to incomplete data sets, i.e., data sets with some values missing at random. We employ the expectation-conditional maximization algorithm for parameter estimation. We use a simulation study to compare our model with a mixture of Student's t distributions for incomplete data.
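As a rough illustration of the density involved, the sketch below evaluates a single contaminated normal component on only the observed coordinates of an observation. It assumes the usual parameterization, which the abstract does not spell out: a proportion `alpha` of "good" points drawn from N(mu, Sigma) and the remainder from N(mu, eta * Sigma) with inflation factor `eta` > 1. All function names are hypothetical, `None` is an assumed marker for a missing value, and this is not the estimation algorithm itself, only the marginal density that an observed-data likelihood would be built from.

```python
import math

def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low

def normal_pdf(x, mu, sigma):
    """Multivariate normal density, computed through a Cholesky factor."""
    n = len(x)
    low = cholesky(sigma)
    # Forward substitution: solve low @ z = x - mu.
    z = []
    for i in range(n):
        resid = x[i] - mu[i] - sum(low[i][k] * z[k] for k in range(i))
        z.append(resid / low[i][i])
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    maha = sum(v * v for v in z)  # squared Mahalanobis distance
    return math.exp(-0.5 * (n * math.log(2.0 * math.pi) + log_det + maha))

def cn_pdf(x, mu, sigma, alpha, eta):
    """Contaminated normal: alpha * N(mu, Sigma) + (1 - alpha) * N(mu, eta * Sigma)."""
    inflated = [[eta * v for v in row] for row in sigma]
    good = normal_pdf(x, mu, sigma)
    bad = normal_pdf(x, mu, inflated)
    return alpha * good + (1.0 - alpha) * bad

def observed_cn_pdf(x, mu, sigma, alpha, eta):
    """Density of the observed coordinates of x; None marks a missing value."""
    obs = [i for i, v in enumerate(x) if v is not None]
    x_o = [x[i] for i in obs]
    mu_o = [mu[i] for i in obs]
    sigma_oo = [[sigma[i][j] for j in obs] for i in obs]
    return cn_pdf(x_o, mu_o, sigma_oo, alpha, eta)

# Second coordinate is missing, so only the first contributes.
density = observed_cn_pdf([1.0, None], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]], 0.9, 4.0)
```

Marginalizing works here because both the good and the inflated parts are normal, so dropping the missing coordinates simply selects the matching sub-vector of the mean and sub-block of the scale matrix.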