Theory of Dual-sparse Regularized Randomized Reduction

04/15/2015
by Tianbao Yang, et al.

In this paper, we study randomized reduction methods, which map high-dimensional features into a low-dimensional space by randomized methods (e.g., random projection, random hashing), for large-scale high-dimensional classification. Previous theoretical results on randomized reduction methods hinge on strong assumptions about the data, e.g., low rank of the data matrix or a large separable margin of classification, which limit their applicability in broad domains. To address these limitations, we propose dual-sparse regularized randomized reduction methods that introduce a sparse regularizer into the reduced dual problem. Under the mild condition that the original dual solution is a (nearly) sparse vector, we show that the resulting dual solution is close to the original dual solution and concentrates on its support set. We present numerical experiments that support the analysis, along with a novel application of the dual-sparse regularized randomized reduction methods to reducing the communication cost of distributed learning from large-scale high-dimensional data.
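The two ingredients named in the abstract, a randomized reduction of the features and a sparse regularizer on the reduced dual, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a Gaussian random projection, a squared loss (chosen only because its conjugate is smooth), an l1 penalty as the sparse regularizer, and a plain proximal-gradient solver. The function names and the parameters `lam` (primal regularization weight) and `tau` (dual sparsity weight) are hypothetical.

```python
import numpy as np

def random_projection(X, m, rng):
    """Map the rows of X from d to m dimensions with a Gaussian matrix (assumed reduction)."""
    d = X.shape[1]
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(d, m))
    return X @ A

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(alpha, K, y, lam, tau):
    """Negated reduced dual for squared loss plus the l1 penalty (to be minimized)."""
    n = len(y)
    smooth = (0.5 * alpha @ alpha + alpha @ y) / n
    smooth += alpha @ K @ alpha / (2.0 * lam * n**2)
    return smooth + tau / n * np.abs(alpha).sum()

def solve_reduced_dual(X_hat, y, lam, tau, iters=500):
    """Proximal gradient (ISTA) on the l1-regularized dual built from reduced features."""
    n = len(y)
    K = X_hat @ X_hat.T  # Gram matrix of the reduced data
    # Lipschitz constant of the gradient of the smooth part; eigvalsh sorts ascending.
    L = 1.0 / n + np.linalg.eigvalsh(K)[-1] / (lam * n**2)
    alpha = np.zeros(n)
    for _ in range(iters):
        grad = (alpha + y) / n + K @ alpha / (lam * n**2)
        alpha = soft_threshold(alpha - grad / L, tau / (n * L))
    return alpha, K

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1000))           # synthetic high-dimensional data
    y = np.where(X[:, 0] >= 0, 1.0, -1.0)      # synthetic +/-1 labels
    X_hat = random_projection(X, 50, rng)      # reduce 1000 features to 50
    alpha, _ = solve_reduced_dual(X_hat, y, lam=0.1, tau=0.5)
    print("nonzero dual coordinates:", np.count_nonzero(alpha), "of", len(alpha))
```

In this sketch a larger `tau` drives more dual coordinates to exactly zero, which is the sense in which the regularizer is sparse. Whether the recovered support matches that of the original dual solution depends on the conditions analyzed in the paper, which the sketch does not check.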


