Semi-supervised Gaussian mixture modelling with a missing-data mechanism in R

02/26/2023
by Ziyang Lyu, et al.

Semi-supervised learning is widely used to estimate classifiers from training data in which not all the labels of the feature vectors are available. We present gmmsslm, an R package for estimating the Bayes' classifier from such partially classified data in the case where the feature vector has a multivariate Gaussian (normal) distribution in each of the predefined classes. Our package implements a recently proposed Gaussian mixture modelling framework that incorporates a missingness mechanism for the missing labels, in which the probability of a missing label is represented via a logistic model with covariates that depend on the entropy of the feature vector. Under this framework, it has been shown that the Bayes' classifier formed from the Gaussian mixture model fitted to the partially classified training data can even have a lower error rate than the classifier estimated from the same sample when it is completely classified. This result was established in the particular case of two Gaussian classes with a common covariance matrix. Here, we focus on the effective implementation of an algorithm for multiple Gaussian classes with arbitrary covariance matrices. A strategy for initialising the algorithm is discussed and illustrated. The new package is demonstrated on some real data.
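The missingness mechanism described above can be illustrated with a small sketch. The abstract does not give the exact form of the logistic model, so the version below makes several assumptions: the covariate is taken to be the log of the Shannon entropy of the posterior class probabilities, the coefficients `xi0` and `xi1` are hypothetical, and the classes are univariate Gaussians purely to keep the example dependency-free. It is written in Python, not R, and none of the function names correspond to the gmmsslm API.

```python
import math


def normal_pdf(y, mean, sd):
    """Density of a univariate normal distribution at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_probs(y, weights, means, sds):
    """Posterior class-membership probabilities under a Gaussian mixture."""
    joint = [w * normal_pdf(y, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


def entropy(probs):
    """Shannon entropy of a probability vector (0 * log 0 treated as 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    ex = math.exp(x)
    return ex / (1.0 + ex)


def missing_label_prob(y, weights, means, sds, xi0, xi1):
    """Assumed model: P(label missing | y) = sigmoid(xi0 + xi1 * log entropy(y))."""
    e = entropy(posterior_probs(y, weights, means, sds))
    # Guard against log(0) when one class has posterior probability 1.
    return sigmoid(xi0 + xi1 * math.log(max(e, 1e-300)))


# Hypothetical two-class mixture and hypothetical logistic coefficients.
weights, means, sds = [0.5, 0.5], [0.0, 3.0], [1.0, 1.0]
xi0, xi1 = 1.0, 2.0

p_boundary = missing_label_prob(1.5, weights, means, sds, xi0, xi1)  # between the classes
p_interior = missing_label_prob(-2.0, weights, means, sds, xi0, xi1)  # deep inside class 1
print(p_boundary, p_interior)
```

With a positive `xi1`, a feature vector near the class boundary (high entropy) is more likely to have a missing label than one deep inside a class, which is the kind of informative missingness the framework is designed to exploit.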

research
10/25/2022

Some Simulation and Empirical Results for Semi-Supervised Learning of the Bayes Rule of Allocation

There has been increasing attention to semi-supervised learning (SSL) ap...
research
04/13/2020

Estimation of Classification Rules from Partially Classified Data

We consider the situation where the observed sample contains some observ...
research
07/09/2020

Structural Gaussian mixture vector autoregressive model

A structural version of the Gaussian mixture vector autoregressive model...
research
04/08/2021

Semi-Supervised Learning of Classifiers from a Statistical Perspective: A Brief Review

There has been increasing attention to semi-supervised learning (SSL) ap...
research
05/04/2017

Semi-supervised model-based clustering with controlled clusters leakage

In this paper, we focus on finding clusters in partially categorized dat...
research
11/09/2017

A random matrix analysis and improvement of semi-supervised learning for large dimensional data

This article provides an original understanding of the behavior of a cla...
