Optimal Posteriors for Chi-squared Divergence based PAC-Bayesian Bounds and Comparison with KL-divergence based Optimal Posteriors and Cross-Validation Procedure

by Puja Sahu et al.

We investigate optimal posteriors for the recently introduced <cit.> chi-squared divergence based PAC-Bayesian bounds in terms of the nature of their distributions, the scalability of their computation, and their test set performance. For a finite classifier set, we deduce bounds for three distance functions: KL-divergence, linear distance, and squared distance. The optimal posterior weights are proportional to deviations of empirical risks, usually with support on a subset of classifiers. For a uniform prior, it suffices to search among posteriors on classifier subsets ordered by these risks. We show that the bound minimization for linear distance is a convex program and obtain a closed-form expression for its optimal posterior. The minimization for squared distance, in contrast, is a quasi-convex program under a specific condition, and that for KL-divergence is a non-convex optimization problem (a difference of convex functions). To compute these optimal posteriors, we derive fast-converging fixed point (FP) equations. We apply these approaches to a finite set of SVM regularization parameter values to yield stochastic SVMs with tight bounds. We perform a comprehensive performance comparison between our optimal posteriors and known KL-divergence based posteriors on a variety of UCI datasets with varying ranges and variances in risk values. Chi-squared divergence based posteriors have weaker bounds and worse test errors, hinting at an implicit regularization effected by KL-divergence based posteriors. Our study highlights the impact of the choice of divergence function on the performance of PAC-Bayesian classifiers. We also compare our stochastic classifiers with a cross-validation based deterministic classifier: the latter has better test errors, but ours are more sample-robust, carry quantifiable generalization guarantees, and are computationally much faster.
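The paper's actual closed-form posterior is not reproduced in this abstract, so the following is only an illustrative sketch of the qualitative structure it describes. Assuming a hypothetical surrogate objective q·r̂ + λ Σ_h q_h² over the probability simplex (a chi-squared-style quadratic penalty under a uniform prior, with λ a placeholder for the confidence/sample-size term), the minimizer has exactly the shape the abstract states: weights proportional to deviations of empirical risks below a threshold, with support on a subset of lowest-risk classifiers found by ordering the risks.

```python
import numpy as np

def chi2_linear_posterior(risks, lam):
    """Illustrative water-filling solution (not the paper's exact bound) of
        min_q  q . r  +  lam * sum(q_i^2)   s.t.  q in the simplex.
    KKT conditions give q_i = max(0, (mu - r_i)) / (2*lam), so weights are
    proportional to deviations of empirical risks below a threshold mu,
    and the support is a prefix of the classifiers sorted by risk."""
    risks = np.asarray(risks, dtype=float)
    r = np.sort(risks)
    m = len(r)
    # Candidate thresholds: mu_k = (2*lam + sum of k smallest risks) / k
    k = np.arange(1, m + 1)
    mu = (2 * lam + np.cumsum(r)) / k
    # Largest support size K with mu_K > r_K (valid KKT threshold);
    # index 0 always qualifies since mu_1 = 2*lam + r_1 > r_1.
    K = np.max(np.where(mu > r)[0]) + 1
    q_sorted = np.maximum(0.0, (mu[K - 1] - r) / (2 * lam))
    # Undo the sort to report weights in the original classifier order.
    q = np.empty(m)
    q[np.argsort(risks)] = q_sorted
    return q
```

A small λ concentrates the posterior on the single lowest-risk classifier, while a large λ spreads it toward uniform, mirroring how a tighter divergence penalty pulls the optimal posterior back toward the prior.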


