Optimizing Spectral Sums using Randomized Chebyshev Expansions

02/18/2018
by Insu Han, et al.

The traces of matrix functions, often called spectral sums (e.g., rank, log-determinant, and nuclear norm), appear in many machine learning tasks. However, optimizing or computing such (parameterized) spectral sums typically requires a matrix decomposition whose cost is cubic in the matrix dimension, which is prohibitive for large-scale applications. Several recent works approximate large-scale spectral sums using polynomial function approximations and stochastic trace estimators. However, all prior works along this line study biased estimators, and directly adapting them to optimization under stochastic gradient descent (SGD) often fails because the accumulated bias prevents stable convergence to the optimum. To address this issue, we propose a provably optimal unbiased estimator obtained by randomizing the Chebyshev polynomial degree. We further introduce two techniques for accelerating SGD, whose key idea is to share randomness among the many estimations performed during the iterative procedure. Finally, we showcase two applications of the proposed SGD schemes, matrix completion and Gaussian process learning, on real-world datasets.
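The estimator described in the abstract combines Hutchinson's stochastic trace estimator with a Chebyshev expansion of the matrix function, where the expansion degree is drawn at random and the retained terms are reweighted so that the truncation bias cancels in expectation. The sketch below illustrates this idea; it is not the authors' code, and the geometric degree distribution, the helper names (chebyshev_coeffs, unbiased_spectral_sum), and all parameter values are illustrative assumptions rather than the paper's optimal choices.

```python
# Minimal NumPy sketch (not the authors' implementation) of an unbiased
# spectral-sum estimator: Hutchinson probes plus a Chebyshev expansion whose
# truncation degree is randomized and reweighted by inverse tail probabilities.
# The geometric degree distribution is an illustrative assumption; the paper
# derives an optimal distribution, which this sketch does not reproduce.
import numpy as np

def chebyshev_coeffs(f, degree, num_nodes=200):
    """Chebyshev coefficients of f on [-1, 1] via Gauss-Chebyshev quadrature."""
    theta = np.pi * (np.arange(num_nodes) + 0.5) / num_nodes
    fx = f(np.cos(theta))
    c = np.array([2.0 / num_nodes * np.sum(fx * np.cos(j * theta))
                  for j in range(degree + 1)])
    c[0] /= 2.0
    return c

def unbiased_spectral_sum(A, f, num_probes=100, q=0.1, max_degree=100, seed=None):
    """Estimate tr(f(A)) for a symmetric A whose eigenvalues lie in [-1, 1].

    For each Rademacher probe v, a truncation degree N is drawn from a
    (capped) geometric distribution and the Chebyshev terms j <= N are
    reweighted by 1 / P(N >= j), so the truncation bias vanishes in
    expectation (a randomized-truncation estimator).
    """
    rng = np.random.default_rng(seed)
    d = A.shape[0]
    coeffs = chebyshev_coeffs(f, max_degree)
    tail = (1.0 - q) ** np.arange(max_degree + 1)   # P(N >= j)

    estimates = []
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=d)          # Rademacher probe vector
        N = min(int(rng.geometric(q)) - 1, max_degree)
        # Three-term recurrence: w_curr = T_j(A) v, using matrix-vector products only.
        w_prev, w_curr = v, A @ v
        est = coeffs[0] * (v @ w_prev) / tail[0]
        if N >= 1:
            est += coeffs[1] * (v @ w_curr) / tail[1]
        for j in range(2, N + 1):
            w_prev, w_curr = w_curr, 2.0 * (A @ w_curr) - w_prev
            est += coeffs[j] * (v @ w_curr) / tail[j]
        estimates.append(est)
    return float(np.mean(estimates))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((200, 200))
    A = B @ B.T
    A /= 1.1 * np.linalg.norm(A, 2)                  # push eigenvalues into [0, 1)
    f = lambda x: np.log(x + 1.1)                    # tr f(A) = log det(A + 1.1 I)
    print("estimate:", unbiased_spectral_sum(A, f, num_probes=200, seed=1))
    print("exact:   ", float(np.sum(f(np.linalg.eigvalsh(A)))))
```

Only matrix-vector products with A are required, so each Chebyshev term costs O(d^2) for a dense d x d matrix (or O(nnz(A)) for a sparse one), avoiding the cubic cost of an explicit decomposition; this is what makes such estimators usable inside an SGD loop.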


Related research

07/31/2018
Stochastic Gradient Descent with Biased but Consistent Gradient Estimators
Stochastic gradient descent (SGD), which dates back to the 1950s, is one...

05/25/2023
A Guide Through the Zoo of Biased SGD
Stochastic Gradient Descent (SGD) is arguably the most important single...

06/28/2015
Stochastic Gradient Made Stable: A Manifold Propagation Approach for Large-Scale Optimization
Stochastic gradient descent (SGD) holds as a classical method to build l...

08/24/2022
Accelerating SGD for Highly Ill-Conditioned Huge-Scale Online Matrix Completion
The matrix completion problem seeks to recover a d × d ground truth matri...

11/19/2021
Gaussian Process Inference Using Mini-batch Stochastic Gradient Descent: Convergence Guarantees and Empirical Benefits
Stochastic gradient descent (SGD) and its variants have established them...

07/31/2020
Analysis of SGD with Biased Gradient Estimators
We analyze the complexity of biased stochastic gradient methods (SGD), w...

11/04/2022
Spectral Regularization: an Inductive Bias for Sequence Modeling
Various forms of regularization in learning tasks strive for different n...
