Additive Feature Hashing

02/07/2021
by M. Andrecut, et al.

The hashing trick is a machine learning technique used to encode categorical features into a numerical vector representation of pre-defined fixed length. It works by using the hash values of the categorical features as vector indices and updating the vector values at those indices. Here we discuss a different approach based on additive hashing and the "almost orthogonal" property of high-dimensional random vectors. That is, we show that additive feature hashing can be performed directly by adding the hash values and converting them into high-dimensional numerical vectors. We show that the performance of additive feature hashing is similar to that of the hashing trick, and we illustrate the results numerically using synthetic, language recognition, and SMS spam detection data.
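
The two ideas in the abstract can be illustrated with a minimal sketch. The first function follows the hashing trick as described above (hash value used as an index, vector updated at that index). The second part is only one possible reading of an additive scheme, not the authors' construction: each token's hash is expanded into a ±1 vector and the vectors are summed, relying on the fact that independent high-dimensional random vectors are nearly orthogonal. The names `stable_hash`, `token_vector`, and `additive_encode`, the choice of BLAKE2b and SHAKE-256, and the dimensions are all assumptions made for illustration.

```python
import hashlib


def stable_hash(token: str) -> int:
    """Deterministic 64-bit hash (the built-in hash() is salted per process)."""
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def hashing_trick(tokens, dim=16):
    """Hashing trick: the hash picks an index, and the vector is updated there."""
    vec = [0] * dim
    for tok in tokens:
        vec[stable_hash(tok) % dim] += 1
    return vec


def token_vector(token: str, dim=1024):
    """Hypothetical helper: expand a token's hash into a +1/-1 vector of length dim."""
    raw = hashlib.shake_256(token.encode("utf-8")).digest((dim + 7) // 8)
    return [1 if (raw[i // 8] >> (i % 8)) & 1 else -1 for i in range(dim)]


def additive_encode(tokens, dim=1024):
    """Assumed additive scheme: sum the +1/-1 vectors of all tokens."""
    vec = [0] * dim
    for tok in tokens:
        for i, value in enumerate(token_vector(tok, dim)):
            vec[i] += value
    return vec


def dot(u, v):
    """Plain inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    dim = 1024
    encoded = additive_encode(["cat", "dog"], dim)
    # A token that was added correlates strongly with the sum;
    # an unrelated token's vector is nearly orthogonal to it.
    print("cat :", dot(encoded, token_vector("cat", dim)) / dim)
    print("fish:", dot(encoded, token_vector("fish", dim)) / dim)
```

In this sketch the normalized inner product with an included token stays close to 1, while for a token that was not included it stays close to 0, which is the "almost orthogonal" behaviour the abstract refers to. How the paper actually combines and converts the hash values should be taken from the full text.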

research
08/27/2023

Locally Uniform Hashing

Hashing is a common technique used in data processing, with a strong imp...
research
02/23/2018

High-Dimensional Vector Semantics

In this paper we explore the "vector semantics" problem from the perspec...
research
01/26/2022

Rapid solution for searching similar audio items

A naive approach for finding similar audio items would be to compare eac...
research
06/06/2020

Chromatic Learning for Sparse Datasets

Learning over sparse, high-dimensional data frequently necessitates the ...
research
09/01/2022

Johnson-Lindenstrauss embeddings for noisy vectors – taking advantage of the noise

This paper investigates theoretical properties of subsampling and hashin...
research
11/16/2021

A Unified and Fast Interpretable Model for Predictive Analytics

In this paper, we propose FXAM (Fast and eXplainable Additive Model), a ...
research
06/28/2023

Pb-Hash: Partitioned b-bit Hashing

Many hashing algorithms including minwise hashing (MinHash), one permuta...
