Quickly Finding the Best Linear Model in High Dimensions

07/03/2019
by Yahya Sattar, et al.

We study the problem of finding the best linear model that minimizes least-squares loss given a dataset. While this problem is trivial in the low-dimensional regime, it becomes more interesting in high dimensions, where the population minimizer is assumed to lie on a manifold such as the set of sparse vectors. We propose a projected gradient descent (PGD) algorithm to estimate the population minimizer in the finite-sample regime. We establish a linear convergence rate and data-dependent estimation error bounds for PGD. Our contributions include: 1) the results hold for heavier-tailed sub-exponential distributions in addition to sub-Gaussian ones; 2) we directly analyze empirical risk minimization and do not require a realizable model connecting input data and labels; 3) our PGD algorithm is augmented to learn bias terms, which boosts performance. Numerical experiments validate our theoretical results.
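To make the algorithm concrete, here is a minimal sketch of projected gradient descent for sparse least squares, where the projection step is hard thresholding onto the set of k-sparse vectors. This is an illustrative reconstruction, not the paper's exact method: the function names, step-size choice, and iteration count are assumptions, and the paper's bias-learning augmentation is omitted.

```python
import numpy as np

def hard_threshold(v, k):
    # Euclidean projection onto the set of k-sparse vectors:
    # keep the k largest-magnitude entries, zero out the rest.
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def pgd_sparse_least_squares(X, y, k, step=None, n_iters=200):
    # Hypothetical PGD sketch for min_theta ||X @ theta - y||^2 / (2n)
    # subject to theta being k-sparse. Step size and iteration count
    # are illustrative defaults, not values from the paper.
    n, d = X.shape
    if step is None:
        # Conservative step from the largest singular value of X,
        # so the gradient step is a contraction on the quadratic loss.
        step = n / np.linalg.norm(X, 2) ** 2
    theta = np.zeros(d)
    for _ in range(n_iters):
        grad = X.T @ (X @ theta - y) / n   # gradient of the empirical risk
        theta = hard_threshold(theta - step * grad, k)  # projection step
    return theta
```

On a well-conditioned noiseless instance, the iterates converge linearly to the sparse minimizer, matching the qualitative behavior the abstract describes.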


