Representation Learning for Efficient and Effective Similarity Search and Recommendation

09/04/2021
by Casper Hansen, et al.

How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors, denoted hash codes, which require little storage and enable efficient similarity search through direct indexing into a hash table or through similarity computations in an appropriate space. Due to the limited expressiveness of hash codes compared to real-valued representations, a core open challenge is how to generate hash codes that capture semantic content or latent properties well using a small number of bits, while ensuring that the hash codes are distributed in a way that does not reduce their search efficiency. State-of-the-art methods use representation learning to generate such hash codes, focusing on neural autoencoder architectures in which semantics are encoded into the hash codes by learning to reconstruct the original inputs from the hash codes. This thesis addresses the above challenge and makes a number of contributions to representation learning that (i) improve the effectiveness of hash codes through more expressive representations and a more effective similarity measure than the current state of the art, the Hamming distance, and (ii) improve the efficiency of hash codes by learning representations that are especially suited to the choice of search method. The contributions are empirically validated on several tasks related to similarity search and recommendation.
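
As a rough illustrative sketch (not taken from the thesis itself), the Python snippet below shows why binary hash codes are cheap to search over: comparing two codes reduces to an XOR followed by a popcount, which is exactly the Hamming distance, and exact matches can be retrieved by direct indexing into a hash table keyed on the code. The 16-bit codes and document names are invented for the example.

def hamming_distance(a: int, b: int) -> int:
    # Number of differing bits between two integer-packed hash codes.
    return bin(a ^ b).count("1")

# Invented 16-bit hash codes for a few toy documents.
codes = {
    "doc_a": 0b1010110011001010,
    "doc_b": 0b1010110011001000,  # differs from doc_a in one bit
    "doc_c": 0b0101001100110101,  # bitwise complement of doc_a
}

query = 0b1010110011001110

# Linear scan: one XOR + popcount per stored code.
ranked = sorted(codes.items(), key=lambda kv: hamming_distance(query, kv[1]))
for name, code in ranked:
    print(name, hamming_distance(query, code))

# Exact lookup by direct indexing into a hash table keyed on the full code;
# multi-index hashing generalizes this idea by indexing substrings of each
# code so that near neighbours within a small Hamming radius can be found
# without a full linear scan.
table = {code: name for name, code in codes.items()}
print(table.get(query, "no exact match"))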

Related research

Projected Hamming Dissimilarity for Bit-Level Importance Coding in Collaborative Filtering (03/26/2021)
When reasoning about tasks that involve large amounts of data, a common ...

Unsupervised Multi-Index Semantic Hashing (03/26/2021)
Semantic hashing represents documents as compact binary vectors (hash co...

Efficient end-to-end learning for quantizable representations (05/15/2018)
Embedding representation learning via neural networks is at the core fou...

Embarrassingly Simple Binary Representation Learning (08/26/2019)
Recent binary representation learning models usually require sophisticat...

Learning Hash Codes via Hamming Distance Targets (10/01/2018)
We present a powerful new loss function and training scheme for learning...

Simultaneously Learning Robust Audio Embeddings and Balanced Hash Codes for Query-by-Example (11/20/2022)
Audio fingerprinting systems must efficiently and robustly identify quer...

Constant Sequence Extension for Fast Search Using Weighted Hamming Distance (06/06/2023)
Representing visual data using compact binary codes is attracting increa...
