Fast and Accurate Least-Mean-Squares Solvers

06/11/2019
by Alaa Maalouf, et al.

Least-mean-squares (LMS) solvers such as Linear / Ridge / Lasso Regression, SVD and Elastic-Net not only solve fundamental machine learning problems, but are also the building blocks of a variety of other methods, such as decision trees and matrix factorizations. We suggest an algorithm that gets a finite set of n d-dimensional real vectors and returns a weighted subset of d+1 vectors whose sum is exactly the same. The proof of Caratheodory's Theorem (1907) computes such a subset in O(n^2 d^2) time and is thus not used in practice. Our algorithm computes this subset in O(nd) time, using O(log n) calls to Caratheodory's construction on small but "smart" subsets. This is based on a novel paradigm of fusion between different data summarization techniques, known as sketches and coresets. As an example application, we show how it can be used to boost the performance of existing LMS solvers, such as those in the scikit-learn library, by up to 100x. Generalization to streaming and distributed (big) data is trivial. Extensive experimental results and complete open source code are also provided.
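
To make the classical construction concrete, the following is a minimal Python sketch of the slow, textbook Caratheodory procedure that the abstract contrasts with, not the paper's fast algorithm. It assumes the weighted-mean formulation (non-negative weights summing to 1) and uses NumPy; the function name `caratheodory`, the tolerance `tol`, and the use of an SVD to find a null-space vector are illustrative choices, not taken from the paper's code.

```python
import numpy as np

def caratheodory(P, u, tol=1e-12):
    """Classical Caratheodory reduction (illustrative sketch, assumed names).

    P: (n, d) array of points; u: (n,) non-negative weights summing to 1.
    Returns (idx, w) with at most d + 1 indices such that
    w @ P[idx] equals u @ P up to floating-point error.
    """
    P = np.asarray(P, dtype=float)
    w = np.asarray(u, dtype=float).copy()
    idx = np.arange(len(P))
    keep = w > tol                      # drop points that carry no weight
    idx, w = idx[keep], w[keep]
    d = P.shape[1]
    while idx.size > d + 1:
        Q = P[idx]
        # More than d difference vectors in R^d are linearly dependent,
        # so the d x (m-1) matrix A has a non-trivial null space.
        A = (Q[1:] - Q[0]).T
        _, _, Vt = np.linalg.svd(A, full_matrices=True)
        v_rest = Vt[-1]                 # null-space vector of A
        # Extend to v with sum(v) = 0 and sum_i v_i * Q_i = 0.
        v = np.concatenate(([-v_rest.sum()], v_rest))
        pos = v > tol
        ratios = np.full(idx.size, np.inf)
        ratios[pos] = w[pos] / v[pos]
        j = np.argmin(ratios)           # largest step keeping weights >= 0
        w = w - ratios[j] * v
        w[j] = 0.0                      # this point is eliminated exactly
        keep = w > tol
        idx, w = idx[keep], w[keep]
    return idx, w

# Usage: reduce 50 uniformly weighted points in R^3 to at most 4 points.
rng = np.random.default_rng(0)
P = rng.standard_normal((50, 3))
u = np.full(50, 1.0 / 50)
idx, w = caratheodory(P, u)
```

Each pass removes at least one point while preserving both the total weight and the weighted mean, which is why the loop ends with at most d+1 points. Running a full SVD in every pass is what makes this version expensive, and it is this cost that motivates applying the construction only to small subsets.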

research
07/02/2019

Tight Sensitivity Bounds For Smaller Coresets

An ε-coreset for Least-Mean-Squares (LMS) of a matrix A ∈ R^{n×d} is a smal...
research
11/30/2015

Coresets for Kinematic Data: From Theorems to Real-Time Systems

A coreset (or core-set) of a dataset is its semantic compression with re...
research
10/07/2021

Coresets for Decision Trees of Signals

A k-decision tree t (or k-tree) is a recursive partition of a matrix (2D...
research
11/04/2021

Introduction to Coresets: Approximated Mean

A strong coreset for the mean queries of a set P in ℝ^d is a small weigh...
research
03/03/2021

Advancing Mixture Models for Least Squares Optimization

Gaussian mixtures are a powerful and widely used tool to model non-Gauss...
research
06/09/2020

Faster PAC Learning and Smaller Coresets via Smoothed Analysis

PAC-learning usually aims to compute a small subset (ε-sample/net) from ...
research
10/19/2019

Introduction to Coresets: Accurate Coresets

A coreset (or core-set) of an input set is its small summation, such tha...
