New Nearly-Optimal Coreset for Kernel Density Estimation
Given a point set $P \subset \mathbb{R}^d$, the kernel density estimate with the Gaussian kernel is defined as $\overline{\mathcal{G}}_P(x) = \frac{1}{|P|} \sum_{p \in P} e^{-\|x-p\|^2}$ for any $x \in \mathbb{R}^d$. We study how to construct a small subset $Q$ of $P$ such that the kernel density estimate of $P$ is approximated by the kernel density estimate of $Q$. Such a subset $Q$ is called a coreset. The primary technique in this work is to construct a $\pm 1$ coloring of the point set $P$ using discrepancy theory and to apply this coloring algorithm recursively. Our result leverages Banaszczyk's Theorem. When $d > 1$ is constant, our construction gives a coreset of size $O\left(\frac{1}{\varepsilon}\sqrt{\log\log\frac{1}{\varepsilon}}\right)$, as opposed to the best-known result of $O\left(\frac{1}{\varepsilon}\sqrt{\log\frac{1}{\varepsilon}}\right)$. It is the first result to break the barrier of the $\sqrt{\log}$ factor, even when $d = 2$.
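To make the definition and the recursive halving idea concrete, here is a minimal Python sketch. It is not the construction from this work: the function names (`kde`, `halve`, `coreset_by_halving`) are hypothetical, and a balanced uniformly random coloring stands in for the discrepancy-based coloring derived from Banaszczyk's Theorem, so it carries none of the stated size guarantees. The link between coloring and halving is that, for a balanced $\pm 1$ coloring $\chi$ of an even-sized $P$ with $P^+$ the points colored $+1$, the error $|\overline{\mathcal{G}}_P(x) - \overline{\mathcal{G}}_{P^+}(x)|$ equals $\frac{1}{|P|}\left|\sum_{p \in P} \chi(p) e^{-\|x-p\|^2}\right|$, so a low-discrepancy coloring gives a half-sized subset with a small error.

```python
import math
import random


def kde(points, x):
    """Gaussian KDE of `points` at `x`: (1/|P|) * sum_p exp(-||x - p||^2)."""
    total = 0.0
    for p in points:
        sq_dist = sum((xi - pi) ** 2 for xi, pi in zip(x, p))
        total += math.exp(-sq_dist)
    return total / len(points)


def halve(points, rng):
    """One halving round: color the points +1/-1 and keep the +1 class.

    ASSUMPTION: a balanced random coloring is used here as a placeholder.
    The work summarized above uses a discrepancy-theoretic coloring instead.
    """
    order = list(range(len(points)))
    rng.shuffle(order)
    # Indices in the first half (rounded up) play the role of color +1.
    keep = sorted(order[: (len(points) + 1) // 2])
    return [points[i] for i in keep]


def coreset_by_halving(points, target_size, rng):
    """Apply the halving step recursively until at most `target_size` points remain."""
    if target_size < 1:
        raise ValueError("target_size must be at least 1")
    q = list(points)
    while len(q) > target_size:
        q = halve(q, rng)
    return q
```

Under these assumptions, one could draw a sample with `rng = random.Random(0)`, build `Q = coreset_by_halving(P, 8, rng)`, and compare `kde(P, x)` against `kde(Q, x)` at a few query points `x` to observe the approximation error empirically.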