Sketching Meets Random Projection in the Dual: A Provable Recovery Algorithm for Big and High-dimensional Data

10/10/2016
by Jialei Wang, et al.

Sketching techniques have become popular for scaling up machine learning algorithms by reducing the sample size or dimensionality of massive data sets, while still maintaining the statistical power of big data. In this paper, we study sketching from an optimization point of view. We first show that the iterative Hessian sketch is an optimization process with preconditioning, and develop an accelerated iterative Hessian sketch by searching along conjugate directions. We then establish primal-dual connections between the Hessian sketch and dual random projection, and apply the preconditioned conjugate gradient approach to the dual problem, which leads to accelerated iterative dual random projection methods. Finally, to tackle the challenges of both large sample size and high dimensionality, we propose the primal-dual sketch, which iteratively sketches the primal and dual formulations. We show that, using a logarithmic number of calls to solvers for small-scale problems, the primal-dual sketch is able to recover the optimum of the original problem up to arbitrary precision. The proposed algorithms are validated via extensive experiments on synthetic and real data sets, which complement our theoretical results.
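
To make the "optimization process with preconditioning" reading concrete, here is a minimal sketch of an iterative Hessian sketch loop for (optionally ridge-regularized) least squares. It is an illustration under stated assumptions, not the paper's implementation: the objective scaling, the Gaussian sketch, the fresh sketch drawn at every iteration, and the names `iterative_hessian_sketch`, `sketch_size`, and `lam` are all choices made here for the example.

```python
import numpy as np


def iterative_hessian_sketch(X, y, sketch_size, n_iters=20, lam=0.0, seed=0):
    """Approximately minimize 1/(2n) * ||X w - y||^2 + lam/2 * ||w||^2.

    Each step uses the exact gradient but a sketched Hessian, so the update
    is a gradient step preconditioned by the inverse of the sketched Hessian.
    Assumption: a dense Gaussian sketch, redrawn at every iteration.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        # Gaussian sketch scaled so that E[S.T @ S] is the identity.
        S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
        SX = S @ X  # sketch_size x d: the only "small" matrix the solve sees
        H_sketch = SX.T @ SX / n + lam * np.eye(d)
        # Exact gradient of the full objective at the current iterate.
        grad = X.T @ (X @ w - y) / n + lam * w
        # Preconditioned gradient step.
        w = w - np.linalg.solve(H_sketch, grad)
    return w
```

With `sketch_size` chosen a sizable multiple of the dimension `d`, the sketched Hessian is a good preconditioner and the error should shrink geometrically, which is the intuition behind needing only a logarithmic number of small-scale solves to reach a given precision.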


