Estimation of Squared-Loss Mutual Information from Positive and Unlabeled Data

10/15/2017
by   Tomoya Sakai, et al.

Capturing input-output dependency is an important task in statistical data analysis. Mutual information (MI) is a vital tool for this purpose, but it is known to be sensitive to outliers. To cope with this problem, a squared-loss variant of MI (SMI) was proposed, and a supervised estimator for it has been developed. On the other hand, in real-world classification problems, it is often the case that only positive and unlabeled (PU) data are available. In this paper, we propose a novel estimator of SMI computed only from PU data, and prove that it converges to the true SMI at the optimal rate. Based on the PU-SMI estimator, we further propose a dimension reduction method that can be executed without estimating the class-prior probabilities of unlabeled data. Such PU class-prior estimation is often required in PU classification algorithms, but it is unreliable particularly in high-dimensional problems, yielding a biased classifier. Our dimension reduction method significantly boosts the accuracy of PU class-prior estimation, as demonstrated through experiments. We also develop an independence testing method based on our PU-SMI estimator and experimentally show its superiority.
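The paper estimates SMI from PU data via density-ratio techniques; as a point of reference for what the quantity measures, SMI for fully observed discrete variables has the closed form SMI = (1/2) Σ_{x,y} p(x)p(y) (p(x,y)/(p(x)p(y)) − 1)², i.e. the Pearson divergence between the joint and the product of marginals. A minimal sketch of this population quantity (the function name `smi_discrete` is illustrative, not from the paper, and this is not the paper's PU estimator):

```python
def smi_discrete(joint):
    """Squared-loss mutual information of a discrete joint distribution.

    joint[i][j] = p(x=i, y=j); the entries must sum to 1.
    SMI = 1/2 * sum_{x,y} p(x)p(y) * (p(x,y)/(p(x)p(y)) - 1)^2
    """
    px = [sum(row) for row in joint]          # marginal p(x)
    py = [sum(col) for col in zip(*joint)]    # marginal p(y)
    smi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            prod = px[i] * py[j]              # independence baseline p(x)p(y)
            if prod > 0:
                ratio = pxy / prod            # density ratio p(x,y) / (p(x)p(y))
                smi += 0.5 * prod * (ratio - 1.0) ** 2
    return smi

# Independent variables: the density ratio is 1 everywhere, so SMI = 0.
indep = [[0.25, 0.25], [0.25, 0.25]]
print(smi_discrete(indep))  # 0.0

# Perfectly dependent variables yield a strictly positive SMI.
dep = [[0.5, 0.0], [0.0, 0.5]]
print(smi_discrete(dep))    # 0.5
```

SMI = 0 if and only if the variables are independent, which is what makes it usable both for dependence-maximizing dimension reduction and for independence testing, as in the abstract above.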


