
Boosting One-Point Derivative-Free Online Optimization via Residual Feedback

by Yan Zhang, et al.

Zeroth-order optimization (ZO) typically relies on two-point feedback to estimate the unknown gradient of the objective function. However, two-point feedback cannot be used for online optimization of time-varying objective functions, where only a single query of the function value is possible at each time step. In this work, we propose a new one-point feedback method for online optimization that estimates the objective function gradient using the residual between two feedback points at consecutive time instants. Moreover, we develop regret bounds for ZO with residual feedback for both convex and nonconvex online optimization problems. Specifically, for both deterministic and stochastic problems and for both Lipschitz and smooth objective functions, we show that residual feedback produces gradient estimates with much smaller variance than conventional one-point feedback methods. As a result, our regret bounds are much tighter than existing regret bounds for ZO with conventional one-point feedback, which suggests that ZO with residual feedback can better track the optimizer of online optimization problems. Additionally, our regret bounds rely on weaker assumptions than those used in conventional one-point feedback methods. Numerical experiments show that ZO with residual feedback also significantly outperforms existing one-point feedback methods in practice.
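
As a rough illustration of the idea, the sketch below runs one-point ZO descent with residual feedback over a sequence of loss functions. The abstract does not state the exact form of the estimator, so the formula used here, g_t = (u_t / delta) * (f_t(x_t + delta * u_t) - f_{t-1}(x_{t-1} + delta * u_{t-1})) with Gaussian perturbations u_t, is an assumption. The function name `residual_feedback_zo`, the parameters `delta`, `step`, and `seed`, and the plain gradient-descent update are likewise hypothetical choices made for this example.

```python
import random


def residual_feedback_zo(losses, x0, delta=0.1, step=0.01, seed=0):
    """One-point zeroth-order descent with residual feedback (illustrative sketch).

    losses: sequence of callables f_t(x) -> float, one per time step.
    x0:     initial point as a list of floats.
    Returns the list of iterates [x_0, x_1, ..., x_T].
    """
    rng = random.Random(seed)
    x = list(x0)
    prev_value = None  # holds f_{t-1}(x_{t-1} + delta * u_{t-1})
    iterates = [list(x)]

    for f in losses:
        # Draw a Gaussian perturbation direction (assumed smoothing scheme).
        u = [rng.gauss(0.0, 1.0) for _ in x]

        # The only query of f_t made at this time step.
        value = f([xi + delta * ui for xi, ui in zip(x, u)])

        if prev_value is not None:
            # Residual between the current and the previous feedback value.
            residual = value - prev_value
            grad = [residual / delta * ui for ui in u]
            x = [xi - step * gi for xi, gi in zip(x, grad)]
        # On the first step there is no previous value, so x is left unchanged.

        prev_value = value
        iterates.append(list(x))

    return iterates
```

For example, calling `residual_feedback_zo([f] * 2000, [1.0, -1.0])` with `f = lambda x: sum(xi * xi for xi in x)` applies the update to a fixed quadratic. Passing a different callable at each step models a time-varying objective. Each step makes a single function query and reuses the value stored from the previous step, which is what distinguishes this scheme from two-point feedback.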


