Random Forests Can Hash

12/16/2014
by   Qiang Qiu, et al.
0

Hash codes are a very efficient data representation needed to be able to cope with the ever growing amounts of data. We introduce a random forest semantic hashing scheme with information-theoretic code aggregation, showing for the first time how random forest, a technique that together with deep learning have shown spectacular results in classification, can also be extended to large-scale retrieval. Traditional random forest fails to enforce the consistency of hashes generated from each tree for the same class data, i.e., to preserve the underlying similarity, and it also lacks a principled way for code aggregation across trees. We start with a simple hashing scheme, where independently trained random trees in a forest are acting as hashing functions. We the propose a subspace model as the splitting function, and show that it enforces the hash consistency in a tree for data from the same class. We also introduce an information-theoretic approach for aggregating codes of individual trees into a single hash code, producing a near-optimal unique hash for each class. Experiments on large-scale public datasets are presented, showing that the proposed approach significantly outperforms state-of-the-art hashing methods for retrieval tasks.

READ FULL TEXT
research
11/22/2017

ForestHash: Semantic Hashing With Shallow Random Forests and Tiny Convolutional Networks

Hash codes are efficient data representations for coping with the ever g...
research
05/27/2019

Best-scored Random Forest Classification

We propose an algorithm named best-scored random forest for binary class...
research
03/17/2016

Variable-Length Hashing

Hashing has emerged as a popular technique for large-scale similarity se...
research
06/03/2019

Unsupervised Neural Generative Semantic Hashing

Fast similarity search is a key component in large-scale information ret...
research
02/05/2020

Seeker or Avoider? User Modeling for Inspiration Deployment in Large-Scale Ideation

People react differently to inspirations shown to them during brainstorm...
research
06/06/2022

Hashing Learning with Hyper-Class Representation

Existing unsupervised hash learning is a kind of attribute-centered calc...
research
07/17/2020

Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme

We present a novel approach for the problem of frequency estimation in d...

Please sign up or login with your details

Forgot password? Click here to reset