Feature space reduction method for ultrahigh-dimensional, multiclass data: Random forest-based multiround screening (RFMS)

05/25/2023
by Gergely Hanczár, et al.

In recent years, numerous screening methods have been published for ultrahigh-dimensional data that contain hundreds of thousands of features; however, most of these methods cannot handle data with thousands of classes. Prediction models built to authenticate users from multichannel biometric data give rise to exactly this type of problem. In this study, we present a novel method, random forest-based multiround screening (RFMS), that can be applied effectively under such circumstances. The proposed algorithm divides the feature space into small subsets and executes a series of partial model builds. These partial models are used to implement tournament-based sorting and selection of features based on their importance. To benchmark RFMS, a synthetic biometric feature space generator known as BiometricBlender is employed. Based on the results, RFMS is on par with industry-standard feature screening methods while also possessing many advantages over them.
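The subset-and-tournament idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `tournament_screen` and `f_ratio_scores` and the parameters `subset_size`, `keep_per_subset` and `target` are hypothetical, and a simple between-class/within-class variance ratio stands in for the random-forest importances that RFMS actually uses. Any scorer that maps a feature subset and its labels to per-feature scores (for example, one that fits a random forest on that subset) could be passed in instead.

```python
import random
from statistics import fmean
from typing import Callable, Sequence

# A scorer takes the rows restricted to one feature subset plus the class labels
# and returns one importance score per column (higher means more important).
Scorer = Callable[[list[list[float]], Sequence[int]], list[float]]


def f_ratio_scores(rows: list[list[float]], labels: Sequence[int]) -> list[float]:
    """Assumed stand-in for random-forest importances: per column, the ratio of
    between-class to within-class variance (an ANOVA-F-like statistic)."""
    scores = []
    for j in range(len(rows[0])):
        groups: dict[int, list[float]] = {}
        for row, c in zip(rows, labels):
            groups.setdefault(c, []).append(row[j])
        grand = fmean(row[j] for row in rows)
        means = {c: fmean(g) for c, g in groups.items()}
        between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
        within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
        scores.append(between / (within + 1e-12))
    return scores


def tournament_screen(
    X: list[list[float]],
    y: Sequence[int],
    target: int,
    subset_size: int = 8,
    keep_per_subset: int = 2,
    scorer: Scorer = f_ratio_scores,
    seed: int = 0,
) -> list[int]:
    """Hypothetical multiround screening: return indices of `target` features."""
    if not 0 < keep_per_subset < subset_size:
        raise ValueError("need 0 < keep_per_subset < subset_size")
    rng = random.Random(seed)
    survivors = list(range(len(X[0])))
    while len(survivors) > target:
        rng.shuffle(survivors)  # draw fresh random subsets every round
        winners: list[int] = []
        for start in range(0, len(survivors), subset_size):
            group = survivors[start:start + subset_size]
            sub_rows = [[row[j] for j in group] for row in X]
            scores = scorer(sub_rows, y)  # one "partial model" per subset
            ranked = sorted(zip(scores, group), reverse=True)
            winners.extend(j for _, j in ranked[:keep_per_subset])
        if len(winners) < target or len(winners) == len(survivors):
            break  # another round would overshoot the target or make no progress
        survivors = winners
    # Final round: rank the remaining survivors together and keep the best `target`.
    sub_rows = [[row[j] for j in survivors] for row in X]
    final = sorted(zip(scorer(sub_rows, y), survivors), reverse=True)
    return sorted(j for _, j in final[:target])


if __name__ == "__main__":
    rng = random.Random(1)
    y = [c for c in range(3) for _ in range(30)]  # 3 classes, 30 samples each
    # Columns 0 and 1 depend on the class; the other 38 columns are pure noise.
    X = [[3 * c + rng.gauss(0, 1), -3 * c + rng.gauss(0, 1)]
         + [rng.gauss(0, 1) for _ in range(38)] for c in y]
    print(tournament_screen(X, y, target=2))  # expected to pick columns 0 and 1
```

Because each partial model only sees `subset_size` columns, no single fit has to handle the full feature space. The trade-off is that a feature is judged only against the competitors it happens to be drawn with, which is why this sketch reshuffles the subsets every round.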

06/21/2022

BiometricBlender: Ultra-high dimensional, multi-class synthetic data generator to imitate biometric feature space

The lack of freely available (real-life or synthetic) high or ultra-high...
11/27/2019

Single Sample Feature Importance: An Interpretable Algorithm for Low-Level Feature Analysis

Have you ever wondered how your feature space is impacting the predictio...
11/07/2020

Machine learning applications to DNA subsequence and restriction site analysis

Based on the BioBricks standard, restriction synthesis is a novel catabo...
02/15/2023

A model-free feature selection technique of feature screening and random forest based recursive feature elimination

In this paper, we propose a model-free feature selection method for ultr...
08/27/2016

Random Forest for Label Ranking

Label ranking aims to learn a mapping from instances to rankings over a ...
12/02/2020

A Novel Approach to Radiometric Identification

This paper demonstrates that highly accurate radiometric identification ...
01/12/2022

Predicting Terrorist Attacks in the United States using Localized News Data

Terrorism is a major problem worldwide, causing thousands of fatalities ...
