Approximate kNN Classification for Biomedical Data

12/03/2020
by   Panagiotis Anagnostou, et al.

Big Data analytics has changed the way biomedical phenomena are interpreted, and as the volume of generated data increases, so does the need for new machine learning methods that can handle it. An indicative example is single-cell RNA-seq (scRNA-seq), an emerging DNA sequencing technology with promising capabilities but significant computational challenges due to the large scale of the generated data. For classifying scRNA-seq data, the k Nearest Neighbor (kNN) classifier is an appropriate method, since it is commonly used for large-scale prediction tasks thanks to its simplicity, minimal parameterization, and model-free nature. However, the ultra-high dimensionality that characterizes scRNA-seq imposes a computational bottleneck, while prediction power can be affected by the "Curse of Dimensionality". In this work, we propose the use of approximate nearest neighbor search algorithms for kNN classification of scRNA-seq data, focusing on a particular methodology tailored for high-dimensional data. We argue that even relaxed approximate solutions will not affect the prediction performance significantly. The experimental results confirm this assumption, indicating potential for broader applicability.
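To make the general idea concrete, the following is a minimal sketch of kNN classification on top of an approximate neighbor search. It is not the methodology evaluated in the paper, which the abstract does not specify. It assumes a generic random-hyperplane hashing scheme as the approximate search, uses synthetic two-class Gaussian data in place of scRNA-seq profiles, and all function names (`make_planes`, `build_index`, `approx_knn_predict`, `demo`) are hypothetical, chosen only for illustration.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((u - v) ** 2 for u, v in zip(a, b))


def knn_vote(q, X, y, candidates, k):
    """Majority label among the k candidate points closest to q."""
    nearest = sorted(candidates, key=lambda i: sq_dist(q, X[i]))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def make_planes(dim, n_bits, rng):
    """Draw n_bits random hyperplanes through the origin (assumed hashing scheme)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(x, planes):
    """Hash a vector to one sign bit per hyperplane."""
    return tuple(sum(p * v for p, v in zip(plane, x)) >= 0.0 for plane in planes)


def build_index(X, planes):
    """Group training indices by hash signature."""
    buckets = {}
    for i, x in enumerate(X):
        buckets.setdefault(signature(x, planes), []).append(i)
    return buckets


def approx_knn_predict(q, X, y, planes, buckets, k=5):
    """Search only the query's bucket; fall back to all points if it is too small."""
    candidates = buckets.get(signature(q, planes), [])
    if len(candidates) < k:
        candidates = range(len(X))
    return knn_vote(q, X, y, candidates, k)


def exact_knn_predict(q, X, y, k=5):
    """Brute-force kNN over every training point, for comparison."""
    return knn_vote(q, X, y, range(len(X)), k)


def demo(seed=0, dim=20, n_train=200, n_test=50, n_bits=8, k=5):
    """Return (approximate accuracy, exact accuracy) on synthetic data."""
    rng = random.Random(seed)

    def sample(label):
        # Synthetic stand-in for real data: two well-separated Gaussian classes.
        center = 10.0 if label == 1 else -10.0
        return [center + rng.gauss(0.0, 1.0) for _ in range(dim)]

    y_train = [i % 2 for i in range(n_train)]
    X_train = [sample(label) for label in y_train]
    y_test = [i % 2 for i in range(n_test)]
    X_test = [sample(label) for label in y_test]

    planes = make_planes(dim, n_bits, rng)
    buckets = build_index(X_train, planes)

    approx_hits = sum(
        approx_knn_predict(q, X_train, y_train, planes, buckets, k) == t
        for q, t in zip(X_test, y_test)
    )
    exact_hits = sum(
        exact_knn_predict(q, X_train, y_train, k) == t
        for q, t in zip(X_test, y_test)
    )
    return approx_hits / n_test, exact_hits / n_test


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` returns the accuracy of the approximate and exact classifiers on the same synthetic test set, which allows a rough check of how much the restricted candidate search costs in prediction quality. The fallback to a full scan when a bucket holds fewer than `k` points is a design choice of this sketch, not something stated in the abstract.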


research
02/04/2019

2-D Embedding of Large and High-dimensional Data with Minimal Memory and Computational Time Requirements

In the advent of big data era, interactive visualization of large data s...
research
09/03/2019

Rates of Convergence for Large-scale Nearest Neighbor Classification

Nearest neighbor is a popular class of classification methods with many ...
research
12/02/2019

scikit-hubness: Hubness Reduction and Approximate Neighbor Search

This paper introduces scikit-hubness, a Python package for efficient nea...
research
07/19/2011

Unsupervised K-Nearest Neighbor Regression

In many scientific disciplines structures in high-dimensional data have ...
research
03/17/2023

High-Dimensional Approximate Nearest Neighbor Search: with Reliable and Efficient Distance Comparison Operations

Approximate K nearest neighbor (AKNN) search is a fundamental and challe...
research
02/15/2012

The Future of Search and Discovery in Big Data Analytics: Ultrametric Information Spaces

Consider observation data, comprised of n observation vectors with value...
research
08/19/2020

Neural Neighborhood Encoding for Classification

Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter [Dasgu...
