Implicit Regularization of Bregman Proximal Point Algorithm and Mirror Descent on Separable Data

08/15/2021
by Yan Li et al.

The Bregman proximal point algorithm (BPPA), one of the centerpieces of the optimization toolbox, has seen a growing number of applications. Despite its simple and easy-to-implement update rule and several compelling intuitions for its empirical success, rigorous justifications for the algorithm remain largely unexplored. We study the computational properties of BPPA through classification tasks with separable data and demonstrate provable algorithmic regularization effects associated with BPPA. We show that BPPA attains a non-trivial margin, which depends closely on the condition number of the distance-generating function that induces the Bregman divergence. We further show that this dependence on the condition number is tight for a class of problems, demonstrating how the choice of divergence affects the quality of the obtained solutions. In addition, we extend our findings to mirror descent (MD), for which we establish a similar connection between the margin and the Bregman divergence. Through a concrete example, we show that BPPA/MD converges in direction to the maximal-margin solution with respect to the Mahalanobis distance. Our theoretical findings are among the first to demonstrate the benign learning properties of BPPA/MD, and they also corroborate the importance of a careful choice of divergence in algorithmic design.
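For intuition, the BPPA update with distance-generating function \(\phi\) solves \(x_{k+1} = \arg\min_x \{ f(x) + \tfrac{1}{\eta_k} D_\phi(x, x_k) \}\), where \(D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y \rangle\) is the Bregman divergence. The sketch below is not the authors' code; it is a minimal illustration (toy data, hypothetical names) of the concrete example the abstract mentions: mirror descent with a quadratic distance-generating function \(\phi(w) = \tfrac{1}{2} w^\top Q w\), whose Bregman divergence is the Mahalanobis distance, run on linearly separable data with the exponential loss, followed by a check of the normalized margin in the induced norm \(\|w\|_Q\).

```python
import numpy as np

# Minimal sketch (not the authors' code): mirror descent with a quadratic
# distance-generating function phi(w) = 0.5 * w^T Q w on separable data.
# The induced Bregman divergence is the Mahalanobis distance
# D_phi(w, v) = 0.5 * (w - v)^T Q (w - v).

rng = np.random.default_rng(0)

# Toy linearly separable data: labels in {-1, +1} induced by a planted direction.
n, d = 100, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)

Q = np.diag(rng.uniform(1.0, 5.0, size=d))  # assumed positive-definite
Q_inv = np.linalg.inv(Q)

def exp_loss_grad(w):
    # Gradient of the empirical exponential loss (1/n) * sum_i exp(-y_i x_i^T w).
    margins = y * (X @ w)
    return -(X * (y * np.exp(-margins))[:, None]).sum(axis=0) / n

w = np.zeros(d)
eta = 0.1
for t in range(5000):
    # Mirror descent step: grad phi(w_{t+1}) = grad phi(w_t) - eta * grad f(w_t).
    # For phi(w) = 0.5 * w^T Q w this reduces to a preconditioned gradient step.
    w = w - eta * Q_inv @ exp_loss_grad(w)

# Normalized margin with respect to the Mahalanobis norm ||w||_Q = sqrt(w^T Q w);
# the direction of w is expected to approach the corresponding max-margin solution.
margin = (y * (X @ w)).min() / np.sqrt(w @ Q @ w)
print(f"normalized Mahalanobis margin: {margin:.4f}")
```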


Related research

05/25/2022  Mirror Descent Maximizes Generalized Margin and Can Be Implemented Efficiently
06/12/2018  Convergence of SGD in Learning ReLU Models with Separable Data
10/20/2018  Condition Number Analysis of Logistic Regression, and its Implications for Standard First-Order Solution Methods
06/07/2019  Inductive Bias of Gradient Descent based Adversarial Training on Separable Data
04/22/2019  Provable Bregman-divergence based Methods for Nonconvex and Non-Lipschitz Problems
10/21/2022  The Stochastic Proximal Distance Algorithm
01/16/2023  Tale of two c(omplex)ities
