Reference Distance Estimator

08/18/2013
by Yanpeng Li, et al.

A theoretical study is presented for a simple linear classifier called the reference distance estimator (RDE), which assigns each feature j the weight P(r|j)-P(r), where r is a reference feature relevant to the target class y. The analysis shows that if r predicts y better than random guessing and is conditionally independent of each feature j, then RDE achieves the same classification performance as the classifier with weights P(y|j)-P(y), i.e., one trained with the gold-standard labels y. Since estimating P(r|j)-P(r) requires no labeled data, under this assumption an RDE trained on a large number of unlabeled examples approaches one trained on infinitely many labeled examples. For the case where the assumption does not hold, we theoretically analyze the factors that determine how close the RDE comes to the ideal one under the assumption, and present an algorithm that selects reference features and combines multiple RDEs built from different reference features, using both labeled and unlabeled data. Experimental results on 10 text classification tasks show that this semi-supervised method, using 5,000 labeled examples and 13 million unlabeled ones, improves on supervised methods, and on many tasks its performance is even close to that of a classifier trained with 13 million labeled examples. In addition, the bounds in the theorems provide good estimates of classification performance and can be useful for designing new algorithms.
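The weighting scheme described above can be sketched in a few lines. This is a minimal illustration, assuming binary features and a binary reference feature r, and classifying by the sign of the summed weights of active features (a standard linear decision rule used here for illustration; the paper's full algorithm additionally selects and combines multiple reference features):

```python
import numpy as np

def rde_weights(X, r):
    """Estimate RDE feature weights from unlabeled data.

    X : (n_examples, n_features) binary feature matrix
    r : (n_examples,) binary values of the reference feature
    Returns w with w[j] = P(r=1 | j=1) - P(r=1); no class labels needed.
    """
    p_r = r.mean()                      # P(r=1) over all examples
    counts_j = X.sum(axis=0)            # how often each feature fires
    # P(r=1 | feature j present); fall back to the prior for unseen features
    p_r_given_j = np.where(counts_j > 0,
                           (X.T @ r) / np.maximum(counts_j, 1),
                           p_r)
    return p_r_given_j - p_r

def rde_score(X, w):
    """Linear score: sum of the weights of each example's active features."""
    return X @ w
```

Because `rde_weights` touches only X and r, it can be run over millions of unlabeled examples; labeled data would only be needed to pick good reference features or to combine several RDEs.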


