Estimation of Classification Rules from Partially Classified Data

04/13/2020
by Geoffrey J. McLachlan, et al.

We consider the situation where the observed sample contains some observations whose class of origin is known (that is, they are classified with respect to the g underlying classes of interest), while the remaining observations are unclassified (that is, their class labels are unknown). With the class-conditional distributions taken to be known up to a vector of unknown parameters, the aim is to estimate the Bayes' rule for allocating subsequent unclassified observations. Estimation on the basis of both the classified and unclassified data can be undertaken in a straightforward manner by fitting a g-component mixture model by maximum likelihood (ML) via the EM algorithm, provided the observed data can be assumed to be a random sample from the adopted mixture distribution. This assumption holds if the missing-data mechanism is ignorable in the terminology pioneered by Rubin (1976). An early likelihood-based approach was the so-called classification ML approach, in which the missing labels are treated as parameters to be estimated along with the parameters of the class-conditional distributions. However, because this approach can lead to inconsistent estimates, attention switched to the mixture ML approach after the appearance of the EM algorithm (Dempster et al., 1977). Particular attention is given here to the asymptotic relative efficiency (ARE) of the Bayes' rule estimated from a partially classified sample. Lastly, we briefly consider some recent results for situations where the missing-label pattern is non-ignorable for the purposes of ML estimation for the mixture model.
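
The mixture ML approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes univariate normal class-conditional densities with a common variance, an ignorable missing-label mechanism, and several classified observations in every class; the function names `fit_mixture_ml` and `bayes_rule`, and the convention that a label of -1 marks an unclassified observation, are hypothetical choices made for this example.

```python
import numpy as np


def normal_pdf(y, mean, var):
    """Univariate normal density, broadcast over y and mean."""
    return np.exp(-0.5 * (y - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def fit_mixture_ml(y, labels, g, n_iter=200):
    """EM for a g-component normal mixture from partially classified data.

    labels holds class indices 0..g-1, or -1 where the label is missing
    (a convention assumed for this sketch). A common variance is assumed.
    """
    y = np.asarray(y, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n = y.shape[0]
    known = labels >= 0
    idx = np.flatnonzero(known)

    # Classified observations keep fixed one-hot class memberships.
    tau = np.zeros((n, g))
    tau[idx, labels[idx]] = 1.0

    # Initialise the parameters from the classified observations only.
    counts = tau[idx].sum(axis=0)
    pi = counts / counts.sum()
    mu = (tau[idx] * y[idx, None]).sum(axis=0) / counts
    var = ((y[idx] - mu[labels[idx]]) ** 2).sum() / idx.size

    unknown = ~known
    for _ in range(n_iter):
        # E-step: posterior class probabilities for unclassified observations.
        joint = pi * normal_pdf(y[:, None], mu[None, :], var)
        post = joint / joint.sum(axis=1, keepdims=True)
        tau[unknown] = post[unknown]

        # M-step: weighted ML updates using all n observations.
        nk = tau.sum(axis=0)
        pi = nk / n
        mu = (tau * y[:, None]).sum(axis=0) / nk
        var = (tau * (y[:, None] - mu[None, :]) ** 2).sum() / n
    return pi, mu, var


def bayes_rule(y_new, pi, mu, var):
    """Allocate each new observation to the class with the largest
    estimated posterior probability (plug-in Bayes' rule)."""
    y_new = np.atleast_1d(np.asarray(y_new, dtype=float))
    joint = pi * normal_pdf(y_new[:, None], mu[None, :], var)
    return joint.argmax(axis=1)
```

In this sketch the mixing proportions are updated from all n observations, which is only appropriate when the labels are missing in an ignorable way. A non-ignorable missing-label pattern, as mentioned at the end of the abstract, would require modelling the missingness mechanism as well.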


research
04/05/2019

On missing label patterns in semi-supervised learning

We investigate model based classification with partially labelled traini...
research
02/26/2023

Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R

Semi-supervised learning is being extensively applied to estimate classi...
research
03/03/2021

Maximum likelihood estimation for a general mixture of Markov jump processes

We estimate a general mixture of Markov jump processes. The key novel fe...
research
06/23/2014

Exact fit of simple finite mixture models

How to forecast next year's portfolio-wide credit default rate based on ...
research
03/25/2021

Margin-free classification and new class detection using finite Dirichlet mixtures

We present a margin-free finite mixture model which allows us to simulta...
research
10/25/2022

Some Simulation and Empirical Results for Semi-Supervised Learning of the Bayes Rule of Allocation

There has been increasing attention to semi-supervised learning (SSL) ap...
