Accelerated Zeroth-Order Momentum Methods from Mini to Minimax Optimization

by Feihu Huang et al.

In this paper, we propose a new accelerated zeroth-order momentum (Acc-ZOM) method for solving non-convex stochastic mini-optimization problems. We prove that the Acc-ZOM method achieves a lower query complexity of O(d^{3/4}ϵ^{-3}) for finding an ϵ-stationary point, which improves the best known result by a factor of O(d^{1/4}), where d denotes the parameter dimension. Moreover, Acc-ZOM does not require the large batches that existing zeroth-order stochastic algorithms rely on. We further extend the Acc-ZOM method to non-convex stochastic minimax-optimization problems and propose an accelerated zeroth-order momentum descent ascent (Acc-ZOMDA) method. We prove that the Acc-ZOMDA method reaches the best known query complexity of Õ(κ_y^3 (d_1+d_2)^{3/2} ϵ^{-3}) for finding an ϵ-stationary point, where d_1 and d_2 denote the dimensions of the mini and max optimization parameters, respectively, and κ_y is the condition number. In particular, our theoretical result does not rely on the large batches required by existing methods. Moreover, we propose a momentum-based accelerated framework for minimax-optimization problems. Within this framework, we present an accelerated momentum descent ascent (Acc-MDA) method for solving white-box minimax problems and prove that it achieves the best known gradient complexity of Õ(κ_y^3 ϵ^{-3}) without large batches. Extensive experimental results on black-box adversarial attacks on deep neural networks (DNNs) and poisoning attacks demonstrate the efficiency of our algorithms.
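To make the idea concrete, below is a minimal Python sketch of the zeroth-order momentum scheme the abstract describes, reconstructed from the abstract alone: the two-point Gaussian-smoothing gradient estimator, the STORM-style momentum recursion, and all names and hyperparameters (zo_gradient, acc_zom_sketch, mu, eta, alpha) are illustrative assumptions rather than the paper's exact Acc-ZOM algorithm.

import numpy as np

def zo_gradient(f, x, mu=1e-3, rng=None):
    # Two-point zeroth-order gradient estimate of f at x along one random
    # Gaussian direction u; only two function queries per estimate, so no
    # large batch of samples is needed.
    rng = np.random.default_rng() if rng is None else rng
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u

def acc_zom_sketch(f, x0, T=1000, eta=0.01, alpha=0.1, seed=None):
    # Zeroth-order descent driven by a variance-reduction-style momentum
    # estimate v (STORM-like); step size eta and momentum weight alpha are
    # placeholders, not the tuned schedules from the paper.
    rng = np.random.default_rng(seed)
    x = x0.copy()
    v = zo_gradient(f, x, rng=rng)               # initial gradient estimate
    for _ in range(T):
        x_new = x - eta * v                      # descent step
        g_new = zo_gradient(f, x_new, rng=rng)
        g_old = zo_gradient(f, x, rng=rng)
        v = g_new + (1.0 - alpha) * (v - g_old)  # momentum correction
        x = x_new
    return x

if __name__ == "__main__":
    # Smooth non-convex test function queried by function values only.
    f = lambda x: np.sum(x ** 2) + 0.1 * np.sum(np.sin(5 * x))
    x_final = acc_zom_sketch(f, x0=np.ones(10), T=2000, seed=0)
    print("final objective:", f(x_final))

Running the same momentum-corrected estimator on both blocks of variables, descending in the min variable and ascending in the max variable, gives the flavor of the Acc-ZOMDA extension sketched in the abstract.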




Gradient Descent Ascent for Min-Max Problems on Riemannian Manifold

In this paper, we study a class of useful non-convex minimax optimization...

AdaGDA: Faster Adaptive Gradient Descent Ascent Methods for Minimax Optimization

In this paper, we propose a class of faster adaptive gradient descent asc...

Bregman Gradient Policy Optimization

In this paper, we design a novel Bregman gradient policy optimization fr...

On the Application of Danskin's Theorem to Derivative-Free Minimax Optimization

Motivated by Danskin's theorem, gradient-based methods have been applied...

Enhanced Bilevel Optimization via Bregman Distance

Bilevel optimization has been widely applied to many machine learning probl...

Direct Acceleration of SAGA using Sampled Negative Momentum

Variance reduction is a simple and effective technique that accelerates ...

Katyusha X: Practical Momentum Method for Stochastic Sum-of-Nonconvex Optimization

The problem of minimizing sum-of-nonconvex functions (i.e., convex funct...