scikit-hubness: Hubness Reduction and Approximate Neighbor Search

12/02/2019
by Roman Feldbauer, et al.

This paper introduces scikit-hubness, a Python package for efficient nearest neighbor search in high-dimensional spaces. Hubness, the tendency of a few points (hubs) to appear among the nearest neighbors of many other points, is an aspect of the curse of dimensionality and is known to impair various learning tasks, including classification, clustering, and visualization. scikit-hubness provides algorithms for hubness analysis ("Is my data affected by hubness?"), hubness reduction ("How can we improve neighbor retrieval in high dimensions?"), and approximate neighbor search ("Does it work for large data sets?"). It is integrated into the scikit-learn environment, enabling rapid adoption by Python-based machine learning researchers and practitioners. Users will find all functionality of the scikit-learn neighbors module, plus additional support for transparent hubness reduction and approximate nearest neighbor search. scikit-hubness is developed using several quality assessment tools and principles, such as PEP8 compliance, unit tests with high code coverage, continuous integration on all major platforms (Linux, macOS, Windows), and additional checks by LGTM. The source code is available at https://github.com/VarIr/scikit-hubness under the BSD 3-clause license. The package can be installed from the Python Package Index (PyPI) with pip install scikit-hubness.
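To make "hubness analysis" and "hubness reduction" concrete, here is a minimal NumPy sketch of two standard techniques from the hubness literature. It does not use the scikit-hubness API. Hubness is measured as the skewness of the k-occurrence distribution, that is, how often each point appears in the k-nearest-neighbor lists of the other points, and it is reduced with the empirical variant of mutual proximity, which rescales a distance d(x, y) by how many other points are farther from both x and y. The sketch assumes a precomputed symmetric distance matrix. The function names k_occurrence, k_skewness and mutual_proximity are hypothetical illustrations.

```python
import numpy as np


def k_occurrence(dist, k):
    """Count how often each point appears among the k nearest neighbors of the others.

    Hypothetical helper; `dist` is assumed to be a symmetric (n, n) distance matrix.
    """
    d = np.array(dist, dtype=float)          # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)              # a point is never its own neighbor
    # Stable sort so that ties (common with secondary distances) break by index.
    knn = np.argsort(d, axis=1, kind="stable")[:, :k]
    return np.bincount(knn.ravel(), minlength=d.shape[0])


def k_skewness(dist, k=10):
    """Skewness of the k-occurrence distribution; large positive values indicate hubs."""
    occ = k_occurrence(dist, k).astype(float)
    std = occ.std()
    if std == 0:
        return 0.0
    return float(np.mean((occ - occ.mean()) ** 3) / std ** 3)


def mutual_proximity(dist):
    """Empirical mutual proximity turned into a secondary distance (hypothetical sketch).

    For each pair (i, j) it counts the points m with d(i, m) > d(i, j) and
    d(j, m) > d(j, i), divides by the n - 2 candidate points, and returns 1 minus
    that fraction, so pairs that are mutually close get small secondary distances.
    Costs O(n^3) time, so it only suits small samples.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    if n < 3:
        raise ValueError("mutual proximity needs at least 3 points")
    mp = np.empty_like(d)
    for i in range(n):
        # farther_from_i[j, m]: point m is farther from i than j is.
        farther_from_i = d[i][None, :] > d[i][:, None]
        # farther_from_j[j, m]: point m is farther from j than i is.
        farther_from_j = d > d[:, i][:, None]
        counts = (farther_from_i & farther_from_j).sum(axis=1)
        mp[i] = 1.0 - counts / (n - 2)
    np.fill_diagonal(mp, 0.0)
    return mp


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 100))      # i.i.d. Gaussian data in 100 dimensions
    sq = (X ** 2).sum(axis=1)
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0))
    D = (D + D.T) / 2.0                      # remove floating-point asymmetry
    np.fill_diagonal(D, 0.0)
    print("k-skewness, Euclidean:        ", round(k_skewness(D, k=10), 2))
    print("k-skewness, mutual proximity: ", round(k_skewness(mutual_proximity(D), k=10), 2))
```

The exhaustive pairwise computation is deliberate for clarity; on large data sets one would compute such statistics only over candidate neighbors, which is where approximate neighbor search comes in.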


research
10/19/2020

LANNS: A Web-Scale Approximate Nearest Neighbor Lookup System

Nearest neighbor search (NNS) has a wide range of applications in inform...
research
03/24/2022

Hierarchical Nearest Neighbor Graph Embedding for Efficient Dimensionality Reduction

Dimensionality reduction is crucial both for visualization and preproces...
research
02/10/2021

Leveraging Reinforcement Learning for evaluating Robustness of KNN Search Algorithms

The problem of finding K-nearest neighbors in the given dataset for a gi...
research
12/03/2020

Approximate kNN Classification for Biomedical Data

We are in the era where the Big Data analytics has changed the way of in...
research
10/25/2022

Redistributor: Transforming Empirical Data Distributions

We present an algorithm and package, Redistributor, which forces a colle...
research
11/05/2021

SPANN: Highly-efficient Billion-scale Approximate Nearest Neighbor Search

The in-memory algorithms for approximate nearest neighbor search (ANNS) ...
research
10/22/2019

Lucene for Approximate Nearest-Neighbors Search on Arbitrary Dense Vectors

We demonstrate three approaches for adapting the open-source Lucene sear...
