DeepAI AI Chat
Log In Sign Up

Gradient Descent for Low-Rank Functions

by   Romain Cosson, et al.
Purdue University

Several recent empirical studies demonstrate that important machine learning tasks, e.g., training deep neural networks, exhibit low-rank structure, where the loss function varies significantly in only a few directions of the input space. In this paper, we leverage such low-rank structure to reduce the high computational cost of canonical gradient-based methods such as gradient descent (GD). Our proposed Low-Rank Gradient Descent (LRGD) algorithm finds an ϵ-approximate stationary point of a p-dimensional function by first identifying r ≤ p significant directions, and then estimating the true p-dimensional gradient at every iteration by computing directional derivatives only along those r directions. We establish that the "directional oracle complexities" of LRGD for strongly convex and non-convex objective functions are 𝒪(r log(1/ϵ) + rp) and 𝒪(r/ϵ^2 + rp), respectively. When r ≪ p, these complexities are smaller than the known complexities of 𝒪(p log(1/ϵ)) and 𝒪(p/ϵ^2) of in the strongly convex and non-convex settings, respectively. Thus, LRGD significantly reduces the computational cost of gradient-based methods for sufficiently low-rank functions. In the course of our analysis, we also formally define and characterize the classes of exact and approximately low-rank functions.


page 1

page 2

page 3

page 4


Gradient-Based Empirical Risk Minimization using Local Polynomial Regression

In this paper, we consider the problem of empirical risk minimization (E...

Fast and Minimax Optimal Estimation of Low-Rank Matrices via Non-Convex Gradient Descent

We study the problem of estimating a low-rank matrix from noisy measurem...

Global Convergence of Gradient Descent for Asymmetric Low-Rank Matrix Factorization

We study the asymmetric low-rank factorization problem: min_𝐔∈ℝ^m ×...

Blind Super-resolution of Point Sources via Projected Gradient Descent

Blind super-resolution can be cast as a low rank matrix recovery problem...

Quadratic speedup of global search using a biased crossover of two good solutions

The minimisation of cost functions is crucial in various optimisation fi...

Why Learning of Large-Scale Neural Networks Behaves Like Convex Optimization

In this paper, we present some theoretical work to explain why simple gr...