Gradient Descent for Low-Rank Functions

06/16/2022
by Romain Cosson, et al.

Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an ϵ-approximate stationary point of a p-dimensional function by first identifying r ≤ p significant directions, and then estimating the true p-dimensional gradient at every iteration by computing directional derivatives only along those r directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are 𝒪(r log(1/ϵ) + rp) and 𝒪(r/ϵ^2 + rp), respectively. When r ≪ p, these complexities are smaller than the known complexities of 𝒪(p log(1/ϵ)) and 𝒪(p/ϵ^2) of GD in the strongly convex and non-convex settings, respectively. Thus, LRGD significantly reduces the computational cost of gradient-based methods for sufficiently low-rank functions. In the course of our analysis, we also formally define and characterize the classes of exact and approximately low-rank functions.
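To make the two-phase idea in the abstract concrete, the sketch below gives one plausible, illustrative implementation of an LRGD-style method: a first phase that estimates r significant directions from full gradients at a few points (about r·p directional derivatives, matching the 𝒪(rp) term), and a second phase that takes GD steps using only r directional derivatives per iteration. The directional-derivative oracle is realized here by finite differences, and the function/parameter names (lrgd, eta, n_iters, h) and the SVD-based choice of directions are our own assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def directional_derivative(f, x, v, h=1e-6):
    """Forward-difference estimate of the derivative of f at x along the unit vector v."""
    return (f(x + h * v) - f(x)) / h

def lrgd(f, x0, r, eta=0.1, n_iters=100, h=1e-6, seed=0):
    """Illustrative sketch of Low-Rank Gradient Descent (LRGD).
    Hyperparameter names and defaults are assumptions for this example."""
    rng = np.random.default_rng(seed)
    p = x0.size
    I = np.eye(p)

    # Phase 1: full finite-difference gradients at r random points
    # (roughly r * p directional-derivative calls).
    probes = [x0 + rng.standard_normal(p) for _ in range(r)]
    G = np.array([[directional_derivative(f, z, I[j], h) for j in range(p)]
                  for z in probes])                     # shape (r, p)

    # Top-r right singular vectors of the sampled gradients serve as the
    # "significant directions" (one natural way to identify them).
    U = np.linalg.svd(G, full_matrices=False)[2][:r].T  # shape (p, r)

    x = x0.astype(float).copy()
    for _ in range(n_iters):
        # Phase 2: only r directional derivatives per iteration.
        d = np.array([directional_derivative(f, x, U[:, i], h) for i in range(r)])
        x -= eta * (U @ d)   # GD step with the low-rank gradient estimate
    return x
```

As a toy usage example, for f(x) = (x[0] - 1)^2 + (x[1] + 2)^2 embedded in p = 100 dimensions (a rank-2 objective), calling lrgd(f, np.zeros(100), r=2) should approach the minimizer while spending two directional derivatives per iteration instead of one hundred.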


