Rate Optimal Variational Bayesian Inference for Sparse DNN
Sparse deep neural networks (DNNs) have drawn much attention in recent studies because they possess great approximation power and are easier to implement and store in practice compared to fully connected DNNs. In this work, we consider variational Bayesian inference, a computationally efficient alternative to Markov chain Monte Carlo methods, for sparse DNN modeling under a spike-and-slab prior. Our theoretical investigation shows that, for any α-Hölder smooth function, the variational posterior distribution shares the (near-)optimal contraction property, and variational inference leads to (near-)optimal generalization error, as long as the network architecture is properly tuned according to the smoothness parameter α. Furthermore, an adaptive variational inference procedure is developed to automatically select an optimal network structure even when α is unknown. Our results also apply to the case where the true function is itself a ReLU neural network, for which a corresponding contraction bound is obtained.
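To make the spike-and-slab variational family concrete, below is a minimal, illustrative PyTorch sketch (not the authors' implementation): each weight receives a mean-field spike-and-slab variational factor, the Bernoulli inclusion indicators are relaxed with a binary concrete trick, and the KL term combines a Bernoulli KL with an inclusion-weighted Gaussian-slab KL. The module name `SpikeSlabLinear` and all hyperparameter values are hypothetical choices, not quantities from the paper.

```python
import torch
import torch.nn.functional as F

class SpikeSlabLinear(torch.nn.Module):
    """Linear layer with a mean-field spike-and-slab variational posterior over weights.
    q(w_i) = gamma_i * N(mu_i, sigma_i^2) + (1 - gamma_i) * delta_0 (illustrative sketch)."""
    def __init__(self, in_dim, out_dim, slab_std=1.0, prior_incl=0.1, temperature=0.5):
        super().__init__()
        self.mu = torch.nn.Parameter(0.1 * torch.randn(out_dim, in_dim))
        self.log_sigma = torch.nn.Parameter(torch.full((out_dim, in_dim), -3.0))
        self.logit_gamma = torch.nn.Parameter(torch.zeros(out_dim, in_dim))
        self.slab_std, self.prior_incl, self.temperature = slab_std, prior_incl, temperature

    def forward(self, x):
        # Relaxed Bernoulli sample for the inclusion indicators (binary concrete relaxation).
        u = torch.rand_like(self.logit_gamma).clamp(1e-6, 1 - 1e-6)
        z = torch.sigmoid((self.logit_gamma + torch.log(u) - torch.log(1 - u)) / self.temperature)
        # Reparameterized Gaussian slab sample, masked by the (relaxed) indicator.
        eps = torch.randn_like(self.mu)
        w = z * (self.mu + torch.exp(self.log_sigma) * eps)
        return F.linear(x, w)

    def kl(self):
        gamma = torch.sigmoid(self.logit_gamma)
        sigma2 = torch.exp(2 * self.log_sigma)
        # KL between Bernoulli(gamma) and the prior inclusion Bernoulli(prior_incl).
        kl_bern = gamma * (torch.log(gamma + 1e-8) - torch.log(torch.tensor(self.prior_incl))) \
            + (1 - gamma) * (torch.log(1 - gamma + 1e-8) - torch.log(torch.tensor(1 - self.prior_incl)))
        # Inclusion-weighted KL between the slab N(mu, sigma^2) and the prior slab N(0, slab_std^2);
        # the spike components cancel under the shared point mass at zero.
        kl_slab = gamma * 0.5 * ((sigma2 + self.mu ** 2) / self.slab_std ** 2
                                 - 1 - torch.log(sigma2 / self.slab_std ** 2))
        return (kl_bern + kl_slab).sum()
```

A toy usage sketch, assuming a squared-error likelihood and a two-layer ReLU network, optimizes the negative ELBO (data-fit term plus summed KL terms):

```python
# Hypothetical toy regression example; data and hyperparameters are arbitrary.
net = torch.nn.Sequential(SpikeSlabLinear(5, 32), torch.nn.ReLU(), SpikeSlabLinear(32, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)
x, y = torch.randn(200, 5), torch.randn(200, 1)
for _ in range(200):
    opt.zero_grad()
    nll = F.mse_loss(net(x), y, reduction="sum")
    kl = sum(m.kl() for m in net.modules() if isinstance(m, SpikeSlabLinear))
    (nll + kl).backward()
    opt.step()
```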