Hashing Algorithms for Large-Scale Learning

06/06/2011
by Ping Li, et al.

In this paper, we first demonstrate that b-bit minwise hashing, whose estimators are positive definite kernels, can be naturally integrated with learning algorithms such as SVM and logistic regression. We adopt a simple scheme to transform the nonlinear (resemblance) kernel into a linear (inner product) kernel, so that large-scale problems can be solved extremely efficiently. Our method provides a simple and effective solution to large-scale learning on massive and extremely high-dimensional datasets, especially when the data do not fit in memory. We then compare b-bit minwise hashing with the Vowpal Wabbit (VW) algorithm (which is related to the Count-Min (CM) sketch). Interestingly, VW has the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is usually significantly more accurate (at the same storage) than VW (and random projections) on binary data. Furthermore, b-bit minwise hashing can be combined with VW to achieve further improvements in training speed, especially when b is large.
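The abstract does not spell out the transformation from the resemblance kernel to an inner product, so the following is only a minimal sketch under stated assumptions. It assumes that each of k minwise hash values is truncated to its lowest b bits and expanded into a one-hot block of length 2^b, so that the inner product of two expanded vectors counts how many b-bit values match. The function names (`make_hash_params`, `bbit_minhash`, `expand_one_hot`, `inner_product`) are hypothetical, and the random permutations are approximated by simple hash functions of the form (a*x + c) mod p, which is an assumption of this sketch.

```python
import random

# Assumption: a Mersenne prime larger than any feature index used here.
PRIME = 2_147_483_647


def make_hash_params(k, seed=0):
    """Draw k (a, c) pairs; each pair stands in for one random permutation."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def bbit_minhash(feature_set, params, b):
    """Return k codes: the lowest b bits of each minimum hashed feature index."""
    mask = (1 << b) - 1
    return [min((a * x + c) % PRIME for x in feature_set) & mask
            for a, c in params]


def expand_one_hot(codes, b):
    """Map the k codes to the indices of the nonzero entries of a sparse
    binary vector of length k * 2**b (exactly one nonzero per block)."""
    width = 1 << b
    return [j * width + code for j, code in enumerate(codes)]


def inner_product(idx1, idx2):
    """Inner product of two sparse binary vectors given by their nonzero indices."""
    return len(set(idx1) & set(idx2))


if __name__ == "__main__":
    k, b = 200, 2
    params = make_hash_params(k, seed=42)
    doc1 = {1, 5, 9, 200, 3000}   # nonzero feature indices of a binary vector
    doc2 = {1, 5, 9, 200, 4000}
    x1 = expand_one_hot(bbit_minhash(doc1, params, b), b)
    x2 = expand_one_hot(bbit_minhash(doc2, params, b), b)
    # Fraction of matching b-bit codes; it grows with the resemblance of the sets.
    print(inner_product(x1, x2) / k)
```

Under these assumptions, each expanded vector has exactly k nonzero entries, so the index lists could be passed to a linear SVM or logistic regression solver in any sparse format.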

research
05/23/2011

b-Bit Minwise Hashing for Large-Scale Linear SVM

In this paper, we propose to (seamlessly) integrate b-bit minwise hashin...
research
08/15/2011

Training Logistic Regression and SVM on 200GB Data Using b-Bit Minwise Hashing and Comparisons with Vowpal Wabbit (VW)

We generated a dataset of 200 GB with 10^9 features, to test our recent ...
research
03/05/2015

Min-Max Kernels

The min-max kernel is a generalization of the popular resemblance kernel...
research
04/24/2014

CoRE Kernels

The term "CoRE kernel" stands for correlation-resemblance kernel. In man...
research
01/07/2022

GCWSNet: Generalized Consistent Weighted Sampling for Scalable and Accurate Training of Neural Networks

We develop the "generalized consistent weighted sampling" (GCWS) for has...
research
08/03/2011

Accurate Estimators for Improving Minwise Hashing and b-Bit Minwise Hashing

Minwise hashing is the standard technique in the context of search and d...
research
11/14/2014

Asymmetric Minwise Hashing

Minwise hashing (Minhash) is a widely popular indexing scheme in practic...
