A witness function based construction of discriminative models using Hermite polynomials

01/10/2019
by   H. N. Mhaskar, et al.

In machine learning, we are given a dataset {(x_j, y_j)}_{j=1}^M, drawn as i.i.d. samples from an unknown probability distribution μ, with μ^* denoting the marginal distribution of the x_j's. We propose that, rather than estimating these measures with a positive kernel such as the Gaussian, using a non-positive kernel that preserves a large number of their moments yields an optimal approximation. We use multivariate Hermite polynomials for this purpose and prove optimal and local approximation results in the supremum norm in a probabilistic sense. Together with a permutation test developed with the same kernel, we prove that the kernel estimator serves as a 'witness function' in classification problems: if the value of this estimator at a point x exceeds a certain threshold, then the point reliably belongs to a certain class. This approach can be used to modify pretrained algorithms, such as neural networks or nonlinear dimension reduction techniques, to identify in-class vs. out-of-class regions for the purposes of generative models, classification uncertainty, or finding robust centroids. We demonstrate this on a number of real-world data sets, including MNIST, CIFAR10, Science News documents, and the LaLonde data set.
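To make the idea concrete, the following is a minimal one-dimensional sketch of a witness function built from a moment-preserving, non-positive Hermite kernel. The kernel here is the plain degree-n reproducing kernel of the orthonormal Hermite functions (a Christoffel–Darboux-type sum); the paper uses a localized multivariate construction with a smooth filter, so this is an illustrative simplification, not the authors' exact estimator. The witness is the difference of mean kernel evaluations over the two classes, and its sign/magnitude at a point x indicates class membership.

```python
import numpy as np

def hermite_functions(x, n):
    """Orthonormal Hermite functions psi_0..psi_n at points x,
    via the standard three-term recurrence."""
    x = np.asarray(x, dtype=float)
    psi = np.empty((n + 1,) + x.shape)
    psi[0] = np.pi ** -0.25 * np.exp(-x ** 2 / 2)
    if n >= 1:
        psi[1] = np.sqrt(2.0) * x * psi[0]
    for k in range(1, n):
        psi[k + 1] = (np.sqrt(2.0 / (k + 1)) * x * psi[k]
                      - np.sqrt(k / (k + 1)) * psi[k - 1])
    return psi

def hermite_kernel(x, y, n):
    """Degree-n Hermite kernel K_n(x, y) = sum_k psi_k(x) psi_k(y).
    Non-positive in general, but preserves moments up to degree n."""
    px = hermite_functions(x, n)   # shape (n+1, len(x))
    py = hermite_functions(y, n)   # shape (n+1, len(y))
    return px.T @ py               # shape (len(x), len(y))

def witness(x, class1, class2, n=10):
    """Witness function: difference of mean kernel evaluations
    over samples from the two classes."""
    return (hermite_kernel(x, class1, n).mean(axis=1)
            - hermite_kernel(x, class2, n).mean(axis=1))

# Two well-separated 1-D classes: the witness is positive near
# class 1 and negative near class 2.
rng = np.random.default_rng(0)
c1 = rng.normal(-2.0, 0.2, 200)
c2 = rng.normal(2.0, 0.2, 200)
w = witness(np.array([-2.0, 2.0]), c1, c2, n=10)
```

Thresholding |w(x)| then gives the "reliably in-class" decision described above: points where the witness is small in magnitude lie in the uncertain region between classes.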
