Gradient conjugate priors and deep neural networks

02/07/2018
by   Pavel Gurevich, et al.
0

The paper deals with learning the probability distribution of the observed data by artificial neural networks. We suggest a so-called gradient conjugate prior (GCP) update appropriate for neural networks, which is a modification of the classical Bayesian update for conjugate priors. We establish a connection between the gradient conjugate prior update and the maximization of the log-likelihood of the predictive distribution. Unlike for the Bayesian neural networks, we do not impose a prior on the weights of the neural networks, but rather assume that the ground truth distribution is normal with unknown mean and variance and learn by neural networks the parameters of a prior (normal-gamma distribution) for these unknown mean and variance. The update of the parameters is done, using the gradient that, at each step, directs towards minimizing the Kullback--Leibler divergence from the prior to the posterior distribution (both being normal-gamma). We obtain a corresponding dynamical system for the prior's parameters and analyze its properties. In particular, we study the limiting behavior of all the prior's parameters and show how it differs from the case of the classical full Bayesian update. The results are validated on synthetic and real world data sets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/21/2019

Robustness Against Outliers For Deep Neural Networks By Gradient Conjugate Priors

We analyze a new robust method for the reconstruction of probability dis...
research
03/06/2023

Thompson Sampling for Linear Bandit Problems with Normal-Gamma Priors

We consider Thompson sampling for linear bandit problems with finitely m...
research
04/02/2019

BCMA-ES: A Bayesian approach to CMA-ES

This paper introduces a novel theoretically sound approach for the celeb...
research
06/09/2022

Normalized power priors always discount historical data

Power priors are used for incorporating historical data in Bayesian anal...
research
05/15/2021

On the Distributional Properties of Adaptive Gradients

Adaptive gradient methods have achieved remarkable success in training d...
research
07/27/2019

Bayesian Robustness: A Nonasymptotic Viewpoint

We study the problem of robustly estimating the posterior distribution f...
research
07/25/2017

Some Computational Aspects to Find Accurate Estimates for the Parameters of the Generalized Gamma distribution

In this paper, we discuss computational aspects to obtain accurate infer...

Please sign up or login with your details

Forgot password? Click here to reset