Condition Number Analysis of Kernel-based Density Ratio Estimation
The ratio of two probability densities can be used for solving various machine learning tasks such as covariate shift adaptation (importance sampling), outlier detection (likelihood-ratio test), and feature selection (mutual information). Recently, several methods of directly estimating the density ratio have been developed, e.g., kernel mean matching, maximum likelihood density ratio estimation, and least-squares density ratio fitting. In this paper, we consider a kernelized variant of the least-squares method and investigate its theoretical properties from the viewpoint of the condition number using smoothed analysis techniques--the condition number of the Hessian matrix determines the convergence rate of optimization and the numerical stability. We show that the kernel least-squares method has a smaller condition number than a version of kernel mean matching and other M-estimators, implying that the kernel least-squares method has preferable numerical properties. We further give an alternative formulation of the kernel least-squares estimator which is shown to possess an even smaller condition number. We show that numerical studies meet our theoretical analysis.
READ FULL TEXT