Min-Max Kernels

03/05/2015
by Ping Li, et al.

The min-max kernel is a generalization of the popular resemblance kernel (which is designed for binary data). In this paper, we demonstrate, through an extensive classification study using kernel machines, that the min-max kernel often provides an effective measure of similarity for nonnegative data. Because the min-max kernel is nonlinear and can be difficult to use in industrial applications with massive data, we show that it can be linearized via hashing techniques. This allows practitioners to apply the min-max kernel to large-scale applications using mature linear algorithms such as linear SVM or logistic regression. The earlier, remarkable work on consistent weighted sampling (CWS) produces samples of the form (i^*, t^*), where i^* records location information (and in fact also weight information), analogous to the samples produced by classical minwise hashing on binary data. Because t^* is theoretically unbounded, it was not immediately clear how to implement CWS effectively for building large-scale linear classifiers. In this paper, we provide a simple solution: discard t^* (which we refer to as the "0-bit" scheme). Via an extensive empirical study, we show that this 0-bit scheme does not lose essential information. We then apply 0-bit CWS to build linear classifiers that approximate min-max kernel classifiers, and validate the approach extensively on a wide range of publicly available classification datasets. We expect this work to generate interest among data mining practitioners who would like to efficiently utilize the nonlinear information in non-binary, nonnegative data.
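
To make the abstract concrete, here is a minimal sketch of the two ideas it names: the min-max kernel itself and the 0-bit variant of CWS. The abstract gives neither the kernel formula nor the sampling steps, so both are assumptions here. The kernel is taken to be the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima, and the sampler follows the commonly published CWS recipe (two Gamma(2, 1) draws and one Uniform(0, 1) draw per coordinate). All function names (`min_max_kernel`, `make_cws_params`, `cws_sample`, `estimate_kernel`) are hypothetical and chosen for illustration only.

```python
import math
import random


def min_max_kernel(u, v):
    """Assumed definition: sum_i min(u_i, v_i) / sum_i max(u_i, v_i)."""
    num = sum(min(a, b) for a, b in zip(u, v))
    den = sum(max(a, b) for a, b in zip(u, v))
    return num / den if den > 0 else 0.0


def make_cws_params(dim, k, seed=0):
    """Draw (r, c, beta) for each of k hashes and each coordinate.

    The same parameters must be reused for every vector so that
    samples from different vectors are comparable.
    """
    rng = random.Random(seed)
    return [
        [(rng.gammavariate(2.0, 1.0), rng.gammavariate(2.0, 1.0), rng.random())
         for _ in range(dim)]
        for _ in range(k)
    ]


def cws_sample(u, params_j):
    """Return one CWS sample (i_star, t_star) for nonnegative vector u."""
    best = None
    for i, (w, (r, c, beta)) in enumerate(zip(u, params_j)):
        if w <= 0:
            continue  # zero weights can never be selected
        t = math.floor(math.log(w) / r + beta)
        y = math.exp(r * (t - beta))
        a = c / (y * math.exp(r))
        if best is None or a < best[0]:
            best = (a, i, t)
    if best is None:
        raise ValueError("vector has no positive entries")
    return best[1], best[2]


def estimate_kernel(u, v, params, zero_bit=True):
    """Fraction of hashes on which the samples of u and v collide.

    zero_bit=True compares only i_star (t_star is discarded);
    zero_bit=False compares the full pair (i_star, t_star).
    """
    hits = 0
    for params_j in params:
        su, sv = cws_sample(u, params_j), cws_sample(v, params_j)
        if zero_bit:
            hits += su[0] == sv[0]
        else:
            hits += su == sv
    return hits / len(params)


if __name__ == "__main__":
    u, v = [1.0, 2.0, 0.0, 3.0], [2.0, 1.0, 0.5, 3.0]
    params = make_cws_params(dim=len(u), k=2000, seed=1)
    print("exact  :", min_max_kernel(u, v))
    print("full   :", estimate_kernel(u, v, params, zero_bit=False))
    print("0-bit  :", estimate_kernel(u, v, params, zero_bit=True))
```

In this sketch the 0-bit collision rate can never be lower than the full (i^*, t^*) collision rate, because matching pairs always match on i^*. On very low-dimensional toy vectors like those above, the gap may be noticeable; the abstract's claim that discarding t^* loses no essential information rests on the paper's empirical study on real datasets. For linear learning, each retained i^* would typically be one-hot encoded per hash and the encodings concatenated before training a linear SVM or logistic regression, which is again a common convention and not something the abstract spells out.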


