
Kernel Distribution Embeddings: Universal Kernels, Characteristic Kernels and Kernel Metrics on Distributions

Kernel mean embeddings have recently attracted the attention of the machine learning community. They map measures μ from some set M to functions in a reproducing kernel Hilbert space (RKHS) with kernel k. The RKHS distance of two mapped measures is a semi-metric d_k over M. We study three questions. (I) For a given kernel, what sets M can be embedded? (II) When is the embedding injective over M (in which case d_k is a metric)? (III) How does the d_k-induced topology compare to other topologies on M? The existing machine learning literature has addressed these questions in cases where M is (a subset of) the finite regular Borel measures. We unify, improve and generalise those results. Our approach naturally leads to continuous and possibly even injective embeddings of (Schwartz-) distributions, i.e., generalised measures, but the reader is free to focus on measures only. In particular, we systemise and extend various (partly known) equivalences between different notions of universal, characteristic and strictly positive definite kernels, and show that on an underlying locally compact Hausdorff space, d_k metrises the weak convergence of probability measures if and only if k is continuous and characteristic.
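To make the semi-metric d_k concrete, here is a minimal sketch for empirical measures. It uses the identity d_k(P, Q)^2 = E k(X, X') + E k(Y, Y') - 2 E k(X, Y), with X, X' drawn from P and Y, Y' drawn from Q, and replaces each expectation with a sample average (the biased V-statistic estimate). The Gaussian kernel, the bandwidth value, the sample sizes and the names gaussian_kernel and mmd are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Kernel matrix with entries exp(-||x_i - y_j||^2 / (2 * bandwidth^2)).

    The Gaussian kernel is an assumed choice for illustration.
    """
    sq_dists = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of d_k between the empirical measures of x and y."""
    k_xx = gaussian_kernel(x, x, bandwidth).mean()
    k_yy = gaussian_kernel(y, y, bandwidth).mean()
    k_xy = gaussian_kernel(x, y, bandwidth).mean()
    # Clip at zero to guard against tiny negative values from rounding.
    return float(np.sqrt(max(k_xx + k_yy - 2.0 * k_xy, 0.0)))


# Hypothetical data: two samples from N(0, 1) and one from N(2, 1).
rng = np.random.default_rng(0)
same_a = rng.normal(0.0, 1.0, size=(500, 1))
same_b = rng.normal(0.0, 1.0, size=(500, 1))
shifted = rng.normal(2.0, 1.0, size=(500, 1))

d_same = mmd(same_a, same_b)    # small: both samples share one distribution
d_shift = mmd(same_a, shifted)  # larger: the means differ
```

On these samples d_shift should come out clearly larger than d_same, which is the behaviour one expects from a characteristic kernel: distinct distributions are separated by a positive distance, while samples from the same distribution yield a distance that shrinks as the sample size grows.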




Strictly proper kernel scores and characteristic kernels on compact spaces

Strictly proper kernel scores are a well-known tool in probabilistic forec...

Metrizing Weak Convergence with Maximum Mean Discrepancies

Theorem 12 of Simon-Gabriel and Schölkopf (JMLR, 2018) seemed to close a...

Characteristic Kernels and Infinitely Divisible Distributions

We connect shift-invariant characteristic kernels to infinitely divisibl...

D2KE: From Distance to Kernel and Embedding

For many machine learning problem settings, particularly with structured...

Approximation capability of neural networks on spaces of probability measures and tree-structured domains

This paper extends the proof of density of neural networks in the space ...

Distribution Regression with Sliced Wasserstein Kernels

The problem of learning functions over spaces of probabilities - or dist...