
Comparison of Classification Methods for Very High-Dimensional Data in Sparse Random Projection Representation

by Anton Akusok, et al.

The big data trend has inspired feature-driven learning tasks that conventional machine learning models cannot handle. Unstructured data produces very large binary matrices with millions of columns when converted to vector form. Such data is often sparse, however, and can therefore be handled with sparse random projections. This work studies efficient non-iterative and iterative methods suitable for such data, evaluating the results on two representative machine learning tasks with millions of samples and features. An efficient Jaccard kernel is introduced as an alternative to the sparse random projection. Findings indicate that non-iterative methods can find larger, more accurate models than iterative methods in different application scenarios.
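
To make the two representations named in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes each sample is stored as a set of active feature indices (a sparse binary row), uses an Achlioptas-style sparse random matrix whose rows are regenerated on demand from a seed, and uses the textbook Jaccard similarity |A ∩ B| / |A ∪ B|. The function names, the density parameter `s`, the seeding scheme, and the convention that two empty samples have similarity 1.0 are all assumptions made for illustration.

```python
import math
import random


def projection_row(feature, k, s=3, seed=0):
    """Return the nonzero entries of one row of a sparse random matrix.

    Each of the k entries is +sqrt(s/k) or -sqrt(s/k) with probability
    1/(2s) each, and zero otherwise. The row is regenerated from
    (seed, feature), so the full matrix is never stored (assumed scheme).
    """
    rng = random.Random(seed * 1_000_003 + feature)
    scale = math.sqrt(s / k)
    row = {}
    for j in range(k):
        u = rng.random()
        if u < 1 / (2 * s):
            row[j] = scale
        elif u < 1 / s:
            row[j] = -scale
    return row


def project(active, k, s=3, seed=0):
    """Project a sparse binary sample (set of active indices) to k dimensions.

    For binary data the product x @ R is the sum of the rows of R
    that correspond to the active features.
    """
    out = [0.0] * k
    for feature in active:
        for j, value in projection_row(feature, k, s, seed).items():
            out[j] += value
    return out


def jaccard(a, b):
    """Jaccard similarity of two sets of active feature indices."""
    if not a and not b:
        return 1.0  # assumed convention for two empty samples
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = {3, 1_000_000, 2_500_000}
    y = {3, 7, 2_500_000}
    print(len(project(x, k=16)))
    print(jaccard(x, y))
```

The projection cost depends only on the number of active features in a sample, not on the total number of columns, which is why sparsity makes the approach practical for data with millions of features.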




Function Approximation via Sparse Random Features

Random feature methods have been successful in various machine learning ...

SparseChem: Fast and accurate machine learning model for small molecules

SparseChem provides fast and accurate machine learning models for bioche...

Binary Random Projections with Controllable Sparsity Patterns

Random projection is often used to project higher-dimensional vectors on...

Modeling Winner-Take-All Competition in Sparse Binary Projections

Inspired by the advances in biological science, the study of sparse bina...

Random Projection Estimation of Discrete-Choice Models with Large Choice Sets

We introduce sparse random projection, an important dimension-reduction ...

Comparison of Several Sparse Recovery Methods for Low Rank Matrices with Random Samples

In this paper, we will investigate the efficacy of IMAT (Iterative Metho...

A Non-iterative Parallelizable Eigenbasis Algorithm for Johnson Graphs

We present a new O(k^2 nk^2) method for generating an orthogonal basis o...