Nonparametric Divergence Estimation with Applications to Machine Learning on Distributions

02/14/2012
by   Barnabas Poczos, et al.

Low-dimensional embedding, manifold learning, clustering, classification, and anomaly detection are among the most important problems in machine learning. Existing methods usually assume that each instance has a fixed, finite-dimensional feature representation. Here we consider a different setting: each instance corresponds to a continuous probability distribution. These distributions are unknown, but we are given i.i.d. samples from each of them. Our goal is to estimate the distances between these distributions and to use those distances for low-dimensional embedding, clustering, classification, or anomaly detection on the distributions. We present estimation algorithms, describe how to apply them to machine learning tasks on distributions, and show empirical results on synthetic data, real-world images, and astronomical data sets.
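
To make the setting concrete, here is a minimal sketch of estimating a divergence between two unknown distributions from samples alone. The abstract does not say which estimator the paper uses, so this example substitutes a standard k-nearest-neighbor estimator of the Kullback-Leibler divergence as an illustration; the function names `knn_distances` and `knn_kl_divergence` and the default `k=5` are assumptions made for this sketch, not the authors' code.

```python
import numpy as np


def knn_distances(queries, points, k, exclude_self=False):
    """Distance from each query row to its k-th nearest neighbor in `points`.

    Set exclude_self=True when `queries` and `points` are the same sample,
    so that the zero distance from a point to itself is skipped.
    """
    diff = queries[:, None, :] - points[None, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=-1))
    dists.sort(axis=1)
    # After sorting, column 0 holds the self-distance when exclude_self=True.
    idx = k if exclude_self else k - 1
    return dists[:, idx]


def knn_kl_divergence(x, y, k=5):
    """k-NN estimate of KL(p || q) from samples x ~ p and y ~ q.

    x has shape (n, d) and y has shape (m, d). This is an illustrative
    stand-in estimator, assumed here; it needs n > k and m >= k, and
    assumes no duplicate points within x.
    """
    n, d = x.shape
    m = y.shape[0]
    rho = knn_distances(x, x, k, exclude_self=True)  # within-sample distances
    nu = knn_distances(x, y, k)                      # cross-sample distances
    return d * np.mean(np.log(nu / rho)) + np.log(m / (n - 1))
```

Applying such an estimator to every pair of sample sets yields a matrix of pairwise divergences, which can then be passed to a distance-based embedding, clustering, or anomaly detection method. Note that the estimate is asymmetric in its arguments and can be slightly negative when the two distributions are close.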

