COCO Denoiser: Using Co-Coercivity for Variance Reduction in Stochastic Convex Optimization

09/07/2021
by Manuel Madeira, et al.

First-order methods for stochastic optimization have undeniable relevance, in part due to their pivotal role in machine learning. Variance reduction for these algorithms has become an important research topic. In contrast to common approaches, which rarely leverage global models of the objective function, we exploit convexity and L-smoothness to improve the noisy estimates output by the stochastic gradient oracle. Our method, named COCO denoiser, is the joint maximum likelihood estimator of multiple function gradients from their noisy observations, subject to co-coercivity constraints between them. The resulting estimate is the solution of a convex Quadratically Constrained Quadratic Program (QCQP). Although this problem is expensive to solve by interior point methods, we exploit its structure to apply an accelerated first-order algorithm, the Fast Dual Proximal Gradient method. Besides analytically characterizing the proposed estimator, we show empirically that increasing the number and proximity of the queried points leads to better gradient estimates. We also apply COCO in stochastic settings by plugging it into existing algorithms, such as SGD, Adam, or STRSAGA, outperforming their vanilla versions, even in scenarios where our modelling assumptions are mismatched.
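To make the construction concrete, below is a minimal sketch (not the authors' FDPG solver) that poses the co-coercivity QCQP with the off-the-shelf cvxpy modelling library, assuming isotropic Gaussian gradient noise so that the maximum likelihood objective reduces to least squares. The function name coco_denoise, the toy data, and the use of a generic conic solver are illustrative assumptions, not details from the paper.

```python
# Minimal sketch of COCO-style gradient denoising, assuming cvxpy is available.
import numpy as np
import cvxpy as cp

def coco_denoise(xs, gs, L):
    """Jointly denoise gradients gs observed at points xs of an L-smooth
    convex objective by solving the co-coercivity-constrained QCQP."""
    K, d = gs.shape
    theta = cp.Variable((K, d))  # denoised gradient estimates

    # Under isotropic Gaussian noise, maximum likelihood = least squares fit.
    objective = cp.Minimize(cp.sum_squares(theta - gs))

    constraints = []
    for i in range(K):
        for j in range(i + 1, K):
            # Co-coercivity of the gradient of an L-smooth convex function:
            # ||grad f(x_i) - grad f(x_j)||^2 <= L <grad f(x_i) - grad f(x_j), x_i - x_j>
            diff = theta[i] - theta[j]
            constraints.append(cp.sum_squares(diff) <= L * (xs[i] - xs[j]) @ diff)

    cp.Problem(objective, constraints).solve()
    return theta.value

# Toy example: noisy gradients of f(x) = 0.5 * x^2 (true gradients 0.0 and 0.1, L = 1).
xs = np.array([[0.0], [0.1]])
gs = np.array([[0.3], [-0.2]])
print(coco_denoise(xs, gs, L=1.0))
```

On this toy data the unconstrained estimates would simply keep the raw noisy values, whereas the co-coercivity constraint pulls the two estimates toward each other, reducing the error with respect to the true gradients. The paper exploits the problem's structure with the Fast Dual Proximal Gradient method instead of an interior point or generic conic solver, which is what makes the denoiser cheap enough to plug into SGD, Adam, or STRSAGA.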

