Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

08/14/2020
by Puja Sahu, et al.

We investigate optimal posteriors for the recently introduced <cit.> chi-squared divergence based PAC-Bayesian bounds in terms of the nature of their distribution, the scalability of their computation, and their test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear distance, and squared distance. The optimal posterior weights are proportional to deviations of empirical risks, usually with support on a subset of the classifiers. For a uniform prior, it suffices to search among posteriors supported on classifier subsets ordered by these risks. We show that the bound minimization for linear distance is a convex program and obtain a closed-form expression for its optimal posterior. The minimization for squared distance is a quasi-convex program under a specific condition, while that for KL-divergence is a non-convex optimization (a difference of convex functions). To compute these optimal posteriors, we derive fast-converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We then perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances in risk values. The chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an underlying regularization effect of the KL-divergence based posteriors. Our study highlights the impact of the divergence function on the performance of PAC-Bayesian classifiers. Finally, we compare our stochastic classifiers with a cross-validation based deterministic classifier: the latter has better test errors, but ours are more robust to sampling, carry quantifiable generalization guarantees, and are computationally much faster.
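To make the structure of such optimal posteriors concrete, below is a minimal Python sketch of a water-filling style solution for the linear-distance case with a uniform prior. The objective q·r + C·||q||_2 used here is an assumed stand-in for the paper's linear-distance chi-squared bound (the abstract does not give the exact constants), and the function name chi2_linear_posterior and the constant C are hypothetical.

```python
import numpy as np

def chi2_linear_posterior(risks, C, tol=1e-12):
    """Posterior q minimizing  q . r + C * ||q||_2  over the simplex.

    Illustrative stand-in for a linear-distance, chi-squared-divergence
    PAC-Bayes objective with a uniform prior; C is a hypothetical constant
    absorbing the divergence and confidence terms.  The KKT conditions give
    q_i proportional to max(0, lam - r_i), where lam solves
    sum_i max(0, lam - r_i)^2 = C^2 (found here by bisection).
    """
    r = np.asarray(risks, dtype=float)
    g = lambda lam: np.sum(np.maximum(0.0, lam - r) ** 2)
    lo, hi = r.min(), r.min() + C          # g(lo) = 0 and g(hi) >= C^2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < C * C else (lo, mid)
    w = np.maximum(0.0, hi - r)            # deviations of empirical risks
    return w / w.sum()                     # support: classifiers with r_i < lam

# Toy example: five SVM regularization candidates with these empirical risks.
risks = [0.12, 0.15, 0.16, 0.30, 0.45]
q = chi2_linear_posterior(risks, C=0.1)
print(q)
```

With these toy risks, the two worst classifiers receive zero weight, illustrating the subset support and the weights-proportional-to-risk-deviations structure described in the abstract.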


