Imbalance-XGBoost: Leveraging Weighted and Focal Losses for Binary Label-Imbalanced Classification with XGBoost

08/05/2019
by Chen Wang, et al.

The paper presents Imbalance-XGBoost, a Python package that combines the XGBoost software with weighted and focal losses to tackle binary label-imbalanced classification tasks. Though small in size, the package is, to the best of the authors' knowledge, the first to provide an integrated implementation of the two losses on XGBoost, and it brings a general-purpose extension of XGBoost to label-imbalanced scenarios. The paper describes the design and usage of the package with exemplar code listings and illustrates how conveniently it can be integrated into Python-driven machine learning projects. Furthermore, because the first- and second-order derivatives of the loss functions are essential to the implementation, their algebraic derivation is discussed and can be regarded as a separate algorithmic contribution. The performance of the algorithms implemented in the package is empirically evaluated on a Parkinson's disease classification data set, and multiple state-of-the-art results are observed. Given the scalable nature of XGBoost, the package has great potential for real-life binary classification tasks, which are usually large-scale and label-imbalanced.
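To make the role of the first- and second-order derivatives concrete, the following is a minimal NumPy sketch, not the package's own code. It assumes the commonly used forms of the two losses: a weighted cross-entropy in which a hypothetical parameter `alpha` scales the positive-class term, and a focal loss with a hypothetical focusing parameter `gamma`. Labels are assumed to be 0 or 1, and derivatives are taken with respect to the raw (pre-sigmoid) score `z`, which is what a gradient-boosting custom objective typically needs. All function names are illustrative.

```python
import numpy as np


def sigmoid(z):
    """Map raw scores to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def weighted_loss(z, y, alpha):
    """Per-sample weighted cross-entropy (assumed form).

    alpha > 1 penalises errors on positive samples more heavily.
    """
    p = sigmoid(z)
    return -(alpha * y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


def focal_loss(z, y, gamma):
    """Per-sample focal loss (assumed form); gamma = 0 gives plain cross-entropy."""
    p = sigmoid(z)
    return -(y * (1.0 - p) ** gamma * np.log(p)
             + (1.0 - y) * p ** gamma * np.log(1.0 - p))


def weighted_grad_hess(z, y, alpha):
    """Analytic gradient and Hessian of weighted_loss with respect to z.

    Valid for labels y in {0, 1}: the factor is alpha for positives, 1 for negatives.
    """
    p = sigmoid(z)
    scale = 1.0 + (alpha - 1.0) * y
    grad = scale * (p - y)
    hess = scale * p * (1.0 - p)
    return grad, hess


if __name__ == "__main__":
    # Illustrative values only; these are not data from the paper.
    z = np.array([-2.0, -0.5, 0.3, 1.7])
    y = np.array([1.0, 0.0, 1.0, 0.0])
    grad, hess = weighted_grad_hess(z, y, alpha=3.0)

    # Compare the analytic gradient with a central finite difference.
    eps = 1e-5
    numeric = (weighted_loss(z + eps, y, 3.0) - weighted_loss(z - eps, y, 3.0)) / (2 * eps)
    print("max gradient error:", np.max(np.abs(grad - numeric)))
```

The finite-difference comparison is a quick sanity check on any hand-derived gradient. The focal loss derivatives are longer, and the same check can be applied to them once they are written out.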


