Semi-Supervised learning with Density-Ratio Estimation

04/18/2012
by   Masanori Kawakita, et al.

In this paper, we study the statistical properties of semi-supervised learning, which is regarded as an important problem in the machine learning community. In standard supervised learning, only labeled data are observed, and classification and regression problems are formalized in this setting. In semi-supervised learning, unlabeled data are obtained in addition to labeled data, so exploiting the unlabeled data is important for improving prediction accuracy. This problem can be regarded as a semiparametric estimation problem with missing data. Under discriminative probabilistic models, unlabeled data had been thought to be useless for improving estimation accuracy. Recently, it was shown that a weighted estimator using the unlabeled data achieves better prediction accuracy than learning methods that use only labeled data, especially when the discriminative probabilistic model is misspecified. That is, improvement under the semiparametric model with missing data is possible when the semiparametric model is misspecified. In this paper, we apply a density-ratio estimator to obtain the weight function in semi-supervised learning. The benefit of our approach is that the proposed estimator does not require a well-specified probabilistic model for the distribution of the unlabeled data. Using asymptotic statistical theory, we prove that our method achieves better estimation accuracy than supervised learning using only labeled data. Numerical experiments demonstrate the usefulness of our methods.
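The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the density-ratio model or the loss, so everything below is an assumption for illustration: the weight is taken to be the ratio of the pooled (labeled plus unlabeled) covariate density to the labeled covariate density, it is estimated with a logistic-regression classifier on a hypothetical quadratic basis (one common density-ratio technique, not necessarily the paper's estimator), and the downstream model is a weighted least-squares line fit. All function names are hypothetical.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def features(x):
    # Assumed basis for the log density ratio: 1, x, x^2.
    return (1.0, x, x * x)


def estimate_density_ratio(x_labeled, x_pooled, steps=500, lr=0.5):
    """Return w(x) approximating p_pooled(x) / p_labeled(x).

    Assumption: the ratio is estimated by classifying pooled points (z=1)
    against labeled points (z=0) with logistic regression.
    """
    data = [(features(x), 0.0) for x in x_labeled]
    data += [(features(x), 1.0) for x in x_pooled]
    theta = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(steps):
        # Gradient ascent on the average logistic log-likelihood.
        grad = [0.0, 0.0, 0.0]
        for phi, z in data:
            p = sigmoid(sum(t * f for t, f in zip(theta, phi)))
            for j in range(3):
                grad[j] += (z - p) * phi[j]
        theta = [t + lr * g / n for t, g in zip(theta, grad)]
    # Bayes' rule: the density ratio is the class odds rescaled by sample sizes.
    prior = len(x_labeled) / len(x_pooled)

    def ratio(x):
        return prior * math.exp(sum(t * f for t, f in zip(theta, features(x))))

    return ratio


def weighted_least_squares(xs, ys, ws):
    """Fit y = intercept + slope * x by minimising sum_i w_i * residual_i^2."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


# Synthetic data (illustrative only): a few labeled points, many unlabeled ones.
rng = random.Random(0)
x_lab = [rng.uniform(-1.0, 1.0) for _ in range(30)]
y_lab = [math.sin(2.0 * x) + rng.gauss(0.0, 0.1) for x in x_lab]  # linear model is misspecified
x_unl = [rng.uniform(-1.0, 1.0) for _ in range(200)]

ratio = estimate_density_ratio(x_lab, x_lab + x_unl)
weights = [ratio(x) for x in x_lab]

plain_fit = weighted_least_squares(x_lab, y_lab, [1.0] * len(x_lab))
weighted_fit = weighted_least_squares(x_lab, y_lab, weights)
print("labeled-only fit:", plain_fit)
print("density-ratio weighted fit:", weighted_fit)
```

In this sketch the unlabeled points enter only through the estimated weights, so no probabilistic model for the unlabeled data is fitted beyond the ratio itself. Whether the weighted fit improves on the labeled-only fit in a given sample depends on the data and the basis chosen; the sketch makes no claim about reproducing the paper's asymptotic results.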


research
05/22/2020

Semi-Supervised Learning: the Case When Unlabeled Data is Equally Useful

Semi-supervised learning algorithms attempt to take advantage of relativ...
research
07/24/2019

Discriminative Consistent Domain Generation for Semi-supervised Learning

Deep learning based task systems normally rely on a large amount of manu...
research
07/08/2019

Asymptotic Bayes risk for Gaussian mixture in a semi-supervised setting

Semi-supervised learning (SSL) uses unlabeled data for training and has ...
research
03/15/2012

Modeling Multiple Annotator Expertise in the Semi-Supervised Learning Scenario

Learning algorithms normally assume that there is at most one annotation...
research
02/15/2023

Are labels informative in semi-supervised learning? – Estimating and leveraging the missing-data mechanism

Semi-supervised learning is a powerful technique for leveraging unlabele...
research
05/01/2017

Towards well-specified semi-supervised model-based classifiers via structural adaptation

Semi-supervised learning plays an important role in large-scale machine ...
research
08/18/2013

Reference Distance Estimator

A theoretical study is presented for a simple linear classifier called r...
