Ensemble Model Patching: A Parameter-Efficient Variational Bayesian Neural Network

05/23/2019
by Oscar Chang, et al.

Two main obstacles prevent the widespread adoption of variational Bayesian neural networks: the high parameter overhead that makes them infeasible on large networks, and the difficulty of implementation, which can be thought of as "programming overhead." MC dropout [Gal and Ghahramani, 2016] is popular because it sidesteps both obstacles. Nevertheless, dropout is often harmful to model performance when used in networks with batch normalization layers [Li et al., 2018], which are an indispensable part of modern neural networks. We construct a general variational family for ensemble-based Bayesian neural networks that encompasses dropout as a special case. We further present two specific members of this family that work well with batch normalization layers while retaining the benefits of low parameter and programming overhead, comparable to non-Bayesian training. Our proposed methods improve predictive accuracy and achieve almost perfect calibration on a ResNet-18 trained on ImageNet.
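For context, MC dropout approximates Bayesian prediction by keeping dropout active at test time and averaging several stochastic forward passes, which is why it carries almost no parameter or programming overhead. The sketch below illustrates that baseline only (not the ensemble model patching method proposed in the paper), assuming a PyTorch classifier; the names `model`, `x`, and `n_samples` are illustrative and not taken from the paper.

```python
# Minimal sketch of MC dropout prediction (Gal and Ghahramani, 2016).
# Assumes a PyTorch model with nn.Dropout layers; names are illustrative.
import torch


def mc_dropout_predict(model, x, n_samples=20):
    """Approximate the posterior predictive by averaging stochastic forward passes."""
    model.eval()  # keep batch-norm layers in inference mode (running statistics)
    # Re-enable only the dropout layers so each pass samples a different mask.
    for module in model.modules():
        if isinstance(module, torch.nn.Dropout):
            module.train()
    with torch.no_grad():
        probs = torch.stack(
            [torch.softmax(model(x), dim=-1) for _ in range(n_samples)]
        )
    return probs.mean(dim=0)  # predictive mean over the sampled dropout masks
```

Keeping batch-norm layers in eval mode while re-enabling dropout is exactly the kind of interaction the abstract flags: the stochastic dropout masks shift activation statistics away from the ones batch normalization collected during training.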


Related research

Generalized Dropout (11/21/2016): Deep Neural Networks often require good regularizers to generalize well....

Ex uno plures: Splitting One Model into an Ensemble of Subnetworks (06/09/2021): Monte Carlo (MC) dropout is a simple and efficient ensembling method tha...

On Batch Normalisation for Approximate Bayesian Inference (12/24/2020): We study batch normalisation in the context of variational inference met...

Rethinking the Usage of Batch Normalization and Dropout in the Training of Deep Neural Networks (05/15/2019): In this work, we propose a novel technique to boost training efficiency ...

Bayesian Dropout (08/12/2015): Dropout has recently emerged as a powerful and simple method for trainin...

How to Use Dropout Correctly on Residual Networks with Batch Normalization (02/13/2023): For the stable optimization of deep neural networks, regularization meth...

Understanding the Disharmony between Dropout and Batch Normalization by Variance Shift (01/16/2018): This paper first answers the question "why do the two most powerful tech...
