
A Dual Control Variate for doubly stochastic optimization and black-box variational inference

by Xi Wang, et al.

In this paper, we aim to reduce the variance of doubly stochastic optimization, a class of stochastic optimization algorithms with two independent sources of randomness: the subsampling of training data and the Monte Carlo estimation of expectations. Such an optimization regime often suffers from large gradient variance, which leads to a slow rate of convergence. We therefore propose the dual control variate, a new type of control variate that reduces gradient variance from both sources jointly. The dual control variate builds on approximation-based control variates and incremental gradient methods. We show that, on doubly stochastic optimization problems, compared with past variance reduction approaches that account for only one source of randomness, the dual control variate yields a gradient estimator with significantly smaller variance and achieves superior performance on real-world applications, such as generalized linear models with dropout and black-box variational inference.
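To make the setting concrete, here is a minimal, hypothetical sketch (not the paper's actual method) of a doubly stochastic gradient estimator and a control variate that corrects both noise sources at once. The toy problem is one-dimensional least squares with multiplicative input noise (a dropout-like perturbation), so each gradient sample depends on which data point is drawn and on the Monte Carlo noise. Because the toy model is quadratic, the per-datapoint expected gradient has a closed form; in general an approximation (e.g. a Taylor expansion) would play this role, as in approximation-based control variates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 1-D linear regression.
N = 50
x = rng.normal(size=N)
y = 2.0 * x + 0.1 * rng.normal(size=N)
theta, sigma = 1.5, 0.5  # current parameter and input-noise scale

def grad_sample(i, eps):
    """One doubly stochastic gradient sample: data index i, MC noise eps."""
    xi = x[i] * (1.0 + sigma * eps)  # dropout-like perturbed input
    return 2.0 * (theta * xi - y[i]) * xi

def grad_exact(i):
    """Per-datapoint gradient with the eps-expectation taken in closed form."""
    return 2.0 * ((theta * x[i] - y[i]) * x[i] + theta * sigma**2 * x[i]**2)

full_grad = np.mean([grad_exact(i) for i in range(N)])

def naive(i, eps):
    # Plain estimator: variance from both subsampling and MC noise.
    return grad_sample(i, eps)

def controlled(i, eps):
    # Subtract a surrogate and add back its known expectation; this is
    # unbiased and corrects the MC noise (via grad_exact) and the
    # subsampling noise (via full_grad) jointly.
    return grad_sample(i, eps) - grad_exact(i) + full_grad

S = 20000
idx = rng.integers(0, N, size=S)
eps = rng.normal(size=S)
g_naive = np.array([naive(i, e) for i, e in zip(idx, eps)])
g_cv = np.array([controlled(i, e) for i, e in zip(idx, eps)])

print("naive variance:     ", g_naive.var())
print("controlled variance:", g_cv.var())
```

Both estimators have the same expectation (the full-data, full-expectation gradient), but the controlled one has much smaller empirical variance, which is the effect the paper pursues with a single control variate targeting both randomness sources.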

