Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization

10/16/2022
by Jonathan Wilton, et al.

The need to learn from positive and unlabeled data, or PU learning, arises in many applications and has attracted increasing interest. While random forests are known to perform well on many tasks with positive and negative data, recent PU algorithms are generally based on deep neural networks, and the potential of tree-based PU learning is under-explored. In this paper, we propose new random forest algorithms for PU learning. Key to our approach is a new interpretation of decision tree algorithms for positive and negative data as recursive greedy risk minimization algorithms. We extend this perspective to the PU setting to develop new decision tree learning algorithms that directly minimize PU-data based estimators of the expected risk. This allows us to develop an efficient PU random forest algorithm, PU extra trees. Our approach features three desirable properties: it is robust to the choice of loss function, in the sense that various loss functions lead to the same decision trees; it requires little hyperparameter tuning compared to neural network based PU learning; and it supports a feature importance measure that directly quantifies a feature's contribution to risk minimization. Our algorithms demonstrate strong performance on several datasets. Our code is available at <https://github.com/puetpaper/PUExtraTrees>.
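To make the "PU-data based estimator of the expected risk" concrete, the sketch below implements a standard non-negative PU risk estimate computed from a positive sample, an unlabeled sample, and a known class prior. This is a common estimator from the PU learning literature, not necessarily the exact split criterion used in the paper; the function names and the 0-1 loss are illustrative choices.

```python
import numpy as np

def pu_risk(scores_p, scores_u, prior, loss):
    """Non-negative PU risk estimate (a common PU estimator; the paper's
    exact criterion may differ). scores_* are real-valued predictions,
    prior is the positive-class prior pi = P(y = +1)."""
    # Positive-class part of the risk, weighted by the class prior.
    risk_pos = prior * np.mean(loss(scores_p, +1))
    # Negative-class part, estimated from unlabeled data with the positive
    # contribution subtracted out; clipped at zero for numerical stability.
    risk_neg = np.mean(loss(scores_u, -1)) - prior * np.mean(loss(scores_p, -1))
    return risk_pos + max(0.0, risk_neg)

def zero_one(scores, y):
    # 0-1 loss on sign predictions (labels are +1 / -1).
    return (np.sign(scores) != y).astype(float)

# Toy example: 3 labeled positives, 4 unlabeled points, prior pi = 0.4.
scores_p = np.array([1.0, 1.0, -1.0])
scores_u = np.array([1.0, -1.0, -1.0, -1.0])
print(pu_risk(scores_p, scores_u, 0.4, zero_one))
```

In a tree-growing loop, such an estimate can be evaluated for the leaves induced by each candidate split, with the split chosen to minimize the resulting risk, which is the greedy recursive view of tree learning the abstract describes.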
