
Exponential Convergence Rates of Classification Errors on Learning with SGD and Random Features
Although kernel methods are widely used in many learning problems, they scale poorly to large datasets. To address this, sketching and stochastic gradient methods are the most commonly used techniques for deriving efficient large-scale learning algorithms. In this study, we consider solving a binary classification problem using random features and stochastic gradient descent. Recent research has shown an exponential convergence rate of the expected classification error under a strong low-noise condition. We extend these analyses to the random features setting by bounding the error induced by the random features approximation in terms of the distance between the generated hypotheses, including population risk minimizers and empirical risk minimizers, for general Lipschitz loss functions. We show that exponential convergence of the expected classification error is achieved even when the random features approximation is applied. Additionally, we demonstrate that the convergence rate does not depend on the number of features, so under the strong low-noise condition there is a significant computational benefit to using random features in classification problems.
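To make the setting concrete, the following is a minimal sketch (not the paper's actual algorithm or experiments) of the two ingredients the abstract combines: random Fourier features approximating an RBF kernel, and averaged SGD on a Lipschitz loss (logistic loss) for binary classification. The data, feature count M, step size, and iteration budget are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary classification data: two well-separated Gaussian blobs,
# labels in {-1, +1}. Purely illustrative, not from the paper.
n, d = 400, 2
X = np.vstack([rng.normal(-2.0, 1.0, (n // 2, d)),
               rng.normal(2.0, 1.0, (n // 2, d))])
y = np.concatenate([-np.ones(n // 2), np.ones(n // 2)])

# Random Fourier features approximating the RBF kernel
# k(x, x') = exp(-||x - x'||^2 / 2):
# z(x) = sqrt(2/M) * cos(W x + b), with rows of W ~ N(0, I), b ~ U[0, 2*pi].
M = 100  # number of random features (illustrative)
W = rng.normal(size=(M, d))
b = rng.uniform(0.0, 2.0 * np.pi, size=M)

def features(X):
    return np.sqrt(2.0 / M) * np.cos(X @ W.T + b)

Z = features(X)

# SGD on the logistic loss in the random-feature space,
# with Polyak-Ruppert averaging of the iterates.
theta = np.zeros(M)
theta_bar = np.zeros(M)
eta = 1.0    # constant step size (a simple choice for this sketch)
T = 5000
for t in range(T):
    i = rng.integers(n)
    margin = y[i] * (Z[i] @ theta)
    # Gradient of log(1 + exp(-margin)) with respect to theta.
    grad = -y[i] * Z[i] / (1.0 + np.exp(margin))
    theta -= eta * grad
    theta_bar += (theta - theta_bar) / (t + 1)

# Classification error of the averaged predictor on the training sample.
train_error = np.mean(np.sign(Z @ theta_bar) != y)
```

On easy, well-separated data like this, the averaged random-feature classifier reaches a small classification error even though M is modest, which is the kind of behavior the abstract's analysis addresses at a theoretical level.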